# NEZHA 22.02 metric display issue in the Xinjiang Mobile capacity-expansion environment

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-508 | 2022-06-02T17:37:43.000+0800 | 史振东 | Resolved |

---
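During the routine inspection of the provincial-interface runtime environment on May 29, NEZHA monitoring showed a sharp drop in the volume of logs ingested into ClickHouse (CK), as shown in Figure 1:

![Figure 1](图1.png)

The issue was reported to the big-data team, whose investigation found no problem with ClickHouse.

NEZHA was then investigated as follows: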
1. Expression exploration: the curve matches Figure 1.

2. Raw-metric exploration: in the early hours of May 29, the number of CK hosts whose metrics NEZHA pulls dropped from 37 to 11.

   ![Figure 2](图2.png)

3. Data source check: the metrics exposed by CK can be viewed normally.

   ![Figure 3](图3.png)

4. Endpoint status: up.

   ![Figure 4](图4.png)

5. Prometheus target status: up.

   ![Figure 5](图5.png)

6. Prometheus metric query: Prometheus returns metrics for 37 CK instances.

   ![Figure 6](图6.png)

7. Checked the logs of all NEZHA-related programs and restarted them. The logs showed no obvious errors, and after the restart the NEZHA UI still showed only 11 CK instances.
8. Still to be investigated.

---

**fangshunjian** commented on *2022-06-07T11:28:02.297+0800*:

On the APM - Explore page:

* Run `count(up{module="NC-Clickhouse"})` and check whether the value equals 37. @jiayimeng
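The `count(up{...})` check above, and the target-status and metric-count checks in steps 5 and 6, can also be run directly against Prometheus through its standard HTTP API (`/api/v1/query` for instant queries, `/api/v1/targets` for scrape targets). Below is a minimal Python sketch, assuming a reachable Prometheus-compatible server at a hypothetical `PROM_URL` and assuming the `module` label seen in the query is also a target label (target labels become the labels of the `up` series). The helper names are illustrative only; NEZHA's own query API is not described in this issue, so it is not shown.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus base URL; replace with the real server address.
PROM_URL = "http://prometheus.example:9090"
CK_QUERY = 'count(up{module="NC-Clickhouse"})'


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for the Prometheus instant-query endpoint."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a Prometheus API URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def scalar_from_vector(response: dict) -> float:
    """Return the value of a one-sample vector result such as count(...)."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    result = response["data"]["result"]
    if not result:
        # count() over zero matching series returns an empty vector, not 0.
        return 0.0
    # Each sample is [unix_timestamp, "value as string"].
    return float(result[0]["value"][1])


def target_health(response: dict, label: str, value: str) -> tuple[int, int]:
    """Return (targets up, targets total) from /api/v1/targets for label=value."""
    targets = [
        t for t in response["data"]["activeTargets"]
        if t.get("labels", {}).get(label) == value
    ]
    return sum(t.get("health") == "up" for t in targets), len(targets)


# Usage (requires network access to the server):
#   n = scalar_from_vector(fetch_json(instant_query_url(PROM_URL, CK_QUERY)))
#   up, total = target_health(fetch_json(f"{PROM_URL}/api/v1/targets"),
#                             "module", "NC-Clickhouse")
```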
---
**jiayimeng** commented on *2022-06-07T11:46:40.957+0800*:

The value is 37.

---
**fangshunjian** commented on *2022-06-07T16:44:56.017+0800*:

Some responses from the nz-agent Cortex proxy endpoint are errors, reporting a null pointer exception.

![image-2022-06-07-16-44-04-571.png](image-2022-06-07-16-44-04-571.png)

---
**shizhendong** commented on *2022-06-13T10:28:22.221+0800*:

Root cause: the Cortex ingester exceeded the maximum number of active series allowed per user, so writes failed. This produced the mismatch described above between the metric counts queried through NEZHA and through Prometheus.

Diagnosis:

1. The error from the nz-agent Cortex proxy endpoint was fixed in code; it is not directly related to this bug.
2. Checked the Prometheus logs.
3. Checked the Cortex logs and found `push error` entries; filtering them revealed the specific cause:

   ![11111.png](11111.png)

Fix: resolved by adjusting the `limits_config.max_series_per_user` configuration parameter.
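To make the failure mode concrete, here is a toy Python model of a per-user active-series limit. It is a simplified sketch, not Cortex's implementation: the class and method names are hypothetical, `max_series_per_user` here only stands in for the real setting, and the limit of 11 in the demo is chosen purely for illustration. In the real system the limit counts all of a tenant's series (Cortex documents it as per user, per ingester), not just the `up` series of the CK hosts. The model shows the key behavior: samples for series already in memory keep being accepted, while any push that would create a new series beyond the limit is rejected, which is one way a subset of hosts can go missing from query results until the limit is raised.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIngester:
    """Toy model of a per-user active-series limit (illustrative, not Cortex code)."""

    # Hypothetical stand-in for limits_config.max_series_per_user;
    # 0 is assumed here to mean "no limit".
    max_series_per_user: int
    # user -> set of series keys currently held in memory
    series: dict[str, set[str]] = field(default_factory=dict)

    def push(self, user: str, series_key: str) -> bool:
        """Accept or reject one sample; returns False when the write fails."""
        active = self.series.setdefault(user, set())
        if series_key in active:
            return True  # existing series: the sample is appended
        limit = self.max_series_per_user
        if limit and len(active) >= limit:
            return False  # a new series would exceed the per-user limit
        active.add(series_key)
        return True


if __name__ == "__main__":
    # Illustrative numbers only: a limit of 11 against 37 distinct series.
    ingester = ToyIngester(max_series_per_user=11)
    keys = [f'up{{instance="ck-{i}"}}' for i in range(37)]
    print(sum(ingester.push("tenant-a", k) for k in keys))  # prints 11

    # Raising the limit lets the previously rejected series be written.
    ingester.max_series_per_user = 100
    print(sum(ingester.push("tenant-a", k) for k in keys))  # prints 37
```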
Other issues found: the nz-agent proxy endpoint and the way logs are recorded (both resolved).

---
## Attachments
**28658/11111.png**

---

**28447/image-2022-06-07-16-44-04-571.png**

---

**28400/图1.png**

---

**28401/图2.png**

---

**28402/图3.png**

---

**28403/图4.png**

---

**28404/图5.png**

---

**28405/图6.png**

---