[E21-NZ] BJR-IGW NPB frequently raises short-lived asset_ping_failed alerts
| ID | Creation Date | Assignee | Status |
|---|---|---|---|
| OMPUB-575 | 2022-08-04T21:30:17.000+0800 | 方顺健 | Closed |
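Recently the NPBs at BJR-IGW and DIR-IGW and the national center servers have raised short-lived asset_ping_failed alerts. The BJR-IGW NPB alerts have been especially frequent; each one involves a single NPB, and most last 4m30s.
Progress:
While an alert was active, pinging the NPB from nz-agent succeeded with no packet loss, yet the ping status collected by NZ was failed (value 0). The switch logs for the alert periods show no related anomalies.
See the attachments for the alert message details.
fangshunjian commented on 2022-08-05T11:00:03.392+0800: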
{code:java}
# Log in to the agent server
ssh 10.243.12.3
# Back up the configuration file
cp /opt/nezha/nz-agent/blackbox_exporter/config.conf /opt/nezha/nz-agent/blackbox_exporter/config.conf.bak
# Change the log level
echo "OPTION="--web.listen-address='0.0.0.0:19115' --config.file='/opt/nezha/nz-agent/blackbox_exporter/blackbox.yml' --log.level=debug"" > /opt/nezha/nz-agent/blackbox_exporter/config.conf
# Restart blackbox-exporter
systemctl restart blackbox-exporter.service
{code}
[~liuju] Please follow the steps above to enable debug-level logging, so that we can investigate further the next time the alert fires.
liuju commented on 2022-08-05T15:28:25.953+0800:
Got it. Debug-level logging is now enabled on 10.243.12.3; monitoring.
fangshunjian commented on 2022-08-08T14:39:13.968+0800:
Export the blackbox exporter log:
journalctl --since "2022-08-06" -u blackbox-exporter.service > blackbox.log && tar -zcvf blackbox.tar.gz ./blackbox.log
fangshunjian commented on 2022-08-08T15:08:58.422+0800:
Investigation shows that the ping response timed out when the alert fired:
Cannot get TTL from the received packet
!image-2022-08-08-15-07-58-373.png!
!image-2022-08-08-16-09-06-021.png!!image-2022-08-08-16-09-49-835.png!
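To check how often this error occurs, the exported log can be searched for the message quoted above. The following is a minimal sketch, not the procedure actually used in this ticket. It assumes the log was exported with journalctl in its default short format, where each line starts with a 15-character timestamp such as `Aug 08 16:24:58`; the function names are hypothetical.

```shell
# Message text taken from the ticket; everything else here is illustrative.
TTL_ERROR="Cannot get TTL from the received packet"

# Print the number of log lines that contain the TTL error.
count_ttl_errors() {
    grep -c -F "$TTL_ERROR" "$1"
}

# Print the timestamp (first 15 characters) of every line with the TTL error.
# Assumes journalctl's default short output format.
ttl_error_times() {
    grep -F "$TTL_ERROR" "$1" | cut -c1-15
}

# Example usage against the exported file:
#   count_ttl_errors blackbox.log
#   ttl_error_times blackbox.log
```

Comparing the printed timestamps with the alert start times in NZ would show whether each alert lines up with a timed-out probe.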
fangshunjian commented on 2022-08-08T17:07:34.954+0800:
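1. Upload the script to 10.243.12.3
- [^ping_test.sh]
2. chmod +x ping_test.sh # make it executable
3. nohup ./ping_test.sh 10.243.11.2 & # runs a ping every 10s
4. The next time 10.243.11.2 raises the alert, check whether ping_test.log shows the same error, to determine whether this is a blackbox exporter bug.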
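The attached ping_test.sh is not reproduced in this ticket. For readers without access to the attachment, a script of this kind could look roughly like the sketch below. It is an assumption about the general approach, not the attached file: the log line format, the 2-second timeout, and the function names are all hypothetical, and `ping -W` is the Linux iputils timeout option.

```shell
# Hypothetical stand-in for a periodic ping probe; NOT the attached ping_test.sh.
# Usage: ./ping_probe.sh <target> [interval_seconds]

LOG_FILE="${LOG_FILE:-ping_test.log}"

# Format one log line: timestamp, target, status, and optional detail text.
format_line() {
    printf '%s target=%s status=%s %s\n' "$1" "$2" "$3" "$4"
}

# Send one echo request (2 s timeout) and append the outcome to the log.
probe_once() {
    target="$1"
    out=$(ping -c 1 -W 2 "$target" 2>&1)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        format_line "$(date '+%F %T')" "$target" ok "" >> "$LOG_FILE"
    else
        # Keep the last line of ping's output so the failure reason is visible.
        detail=$(printf '%s\n' "$out" | tail -n 1)
        format_line "$(date '+%F %T')" "$target" fail "$detail" >> "$LOG_FILE"
    fi
    return "$rc"
}

# Loop only when a target was given on the command line.
if [ "$#" -ge 1 ]; then
    while true; do
        probe_once "$1"
        sleep "${2:-10}"
    done
fi
```

Because each probe is logged with its own timestamp, a failure recorded here at the same moment as an NZ alert would point to the network path, while a clean log during an alert would point to the exporter.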
liuju commented on 2022-08-08T18:03:28.525+0800:
Got it, will do.
liuju commented on 2022-08-08T18:16:35.375+0800:
The ping status check script has been added.
liuju commented on 2022-08-08T21:47:50.140+0800:
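At 2022-08-08 16:24:58 a ping alert fired for 10.243.11.2. While the alert was active it showed a duration of 9m, but after it cleared the same alert message showed a duration of 4m30s. During the active period, at 2022-08-08 16:30:00, I pinged 10.243.11.2 from NZ-agent and took a screenshot: no packet loss.
After the alert cleared, I ran the following command:
journalctl --since "2022-08-08" -u blackbox-exporter.service > blackbox20220808.log && tar -zcvf blackbox20220808.tar.gz ./blackbox20220808.log
I packed the exported blackbox20220808.tar.gz, the ping_test.log log, and screenshots of the alert messages in NZ into the folder ping20220808, compressed it, and uploaded it as an attachment.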
fangshunjian commented on 2022-08-10T17:33:27.292+0800:
[~liuju] Please follow the steps below to update the blackbox_exporter binary:
1. ssh 10.243.12.3
2. mv /opt/nezha/nz-agent/blackbox_exporter/blackbox_exporter /opt/nezha/nz-agent/blackbox_exporter/blackbox_exporter_bak # back up
3. Upload the blackbox_exporter file to /opt/nezha/nz-agent/blackbox_exporter/
- [^blackbox_exporter]
4. systemctl restart blackbox-exporter.service # restart the service
5. systemctl status blackbox-exporter.service # check the status
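The backup and replace steps above can also be wrapped in a small helper that stops if the backup fails and makes sure the uploaded file is executable. This is only a sketch: `replace_binary` is a hypothetical name, and the `chmod +x` step is an added precaution that the original instructions do not mention.

```shell
# Back up the current binary in <dir> and install <new_file> in its place.
# Returns non-zero without touching anything further if a step fails.
replace_binary() {
    dir="$1"
    new_file="$2"
    mv "$dir/blackbox_exporter" "$dir/blackbox_exporter_bak" || return 1
    cp "$new_file" "$dir/blackbox_exporter" || return 1
    # An uploaded file may not carry the executable bit.
    chmod +x "$dir/blackbox_exporter"
}

# Example usage on the agent server (the upload path is hypothetical):
#   replace_binary /opt/nezha/nz-agent/blackbox_exporter /tmp/blackbox_exporter \
#     && systemctl restart blackbox-exporter.service \
#     && systemctl status blackbox-exporter.service
```

Keeping the old binary as `blackbox_exporter_bak` leaves a straightforward rollback path if the new build misbehaves.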
liuju commented on 2022-08-11T14:33:12.745+0800:
OK, updated. Continuing to monitor.
liuju commented on 2022-08-12T14:52:44.939+0800:
I updated blackbox-exporter.service on 10.243.12.3 at 2022-08-11 09:30:46. Checking NZ for ping alerts since the update, there has still been one so far. !image-2022-08-12-09-52-24-114.png!
fangshunjian commented on 2022-08-16T10:43:22.687+0800:
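To avoid false alarms, adjust the alert rule duration so that an alert is raised only after the failure has persisted for more than two check cycles.
1. Log in to the NZ system
2. Go to Configuration / APM Settings
- Set ping interval to 60s
- Save
3. Go to Alerts / Rules
- Set the duration of the asset_ping_failed rule to 120 s
- Save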
!image-2022-08-16-10-36-52-190.png!
liuju commented on 2022-08-16T21:07:38.243+0800:
Changed ping interval under NZ Configuration / APM Settings from 300s to 60s.
Changed the asset_ping_failed duration under NZ Alerts / Rules from 60s to 130s.
Continuing to monitor the effect.
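The relationship between the ping interval and the rule duration can be checked with simple arithmetic. The sketch below assumes that the ping status is refreshed once per interval and that the rule fires only when the failure condition holds for the whole duration; how NZ actually evaluates rules is not documented in this ticket, and the function names are hypothetical.

```shell
# Shortest rule duration (seconds) that spans <cycles> probe intervals.
# Arguments: <interval_seconds> <cycles>
min_duration() {
    echo $(( $1 * $2 ))
}

# Succeeds when <duration> covers at least <cycles> probe intervals.
# Arguments: <interval_seconds> <duration_seconds> <cycles>
covers_cycles() {
    [ "$2" -ge "$(min_duration "$1" "$3")" ]
}

# Example usage:
#   min_duration 60 2          # prints 120
#   covers_cycles 60 130 2     # succeeds: 130s spans two 60s probes
#   covers_cycles 300 60 2     # fails: 60s is shorter than one 300s interval
```

Under these assumptions, the previous settings (interval 300s, duration 60s) allowed a single failed probe to raise an alert, whereas the new settings (interval 60s, duration 130s) require the failure to persist across at least two consecutive probes.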
liuju commented on 2022-08-19T16:42:07.187+0800:
As of today, 2022-08-19, a query of ping alert messages in NZ over the past two days shows no recurrence of the problem reported in this issue.
liuju commented on 2022-08-26T15:46:12.825+0800:
As of 2022-08-26, the problem reported in this issue has not recurred since the update, so I am closing this issue.
Attachments
Attachment: 20220804_Last_7days_ping_failed.xlsx
Attachment: alert-message-2022-08-04+16-28-01.xlsx
Attachment: blackbox_exporter
Attachment: image-2022-08-08-15-07-58-373.png
Attachment: image-2022-08-08-16-09-06-021.png
Attachment: image-2022-08-08-16-09-49-835.png
Attachment: image-2022-08-12-09-52-24-114.png
Attachment: image-2022-08-16-10-36-52-190.png
Attachment: ping_test.sh
Attachment: ping20220808.rar
Attachment: 微信图片_20220804162536.png
Attachment: 微信图片_20220804162542.png
Attachment: 微信图片_20220804162548.png
Attachment: 微信图片_20220804162619.png
Attachment: 微信图片_20220804162907.png









