# [E21-NZ] BJR-IGW NPB frequently raises short-lived asset_ping_failed alerts
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-575 | 2022-08-04T21:30:17.000+0800 | 方顺健 | Closed |
---
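Recently the NPBs at BJR-IGW and DIR-IGW and the national center servers have raised short-lived asset_ping_failed alerts. The BJR-IGW NPB alerts have been especially frequent lately; each alert involves a single NPB, and most last 4m30s.

Progress so far:
While an alert is active, pinging the NPB from nz-agent succeeds with no packet loss, yet the ping status collected by NZ is failed (value 0). The switch logs for the alert period show no related anomalies either.

See the attachments for the alert message details.

---
**fangshunjian** commented on *2022-08-05T11:00:03.392+0800*: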
```shell
# Log in to the agent server
ssh 10.243.12.3
# Back up the config file
cp /opt/nezha/nz-agent/blackbox_exporter/config.conf /opt/nezha/nz-agent/blackbox_exporter/config.conf.bak
# Raise the log level to debug (this overwrites config.conf)
echo "OPTION=\"--web.listen-address='0.0.0.0:19115' --config.file='/opt/nezha/nz-agent/blackbox_exporter/blackbox.yml' --log.level=debug\"" > /opt/nezha/nz-agent/blackbox_exporter/config.conf
# Restart blackbox-exporter
systemctl restart blackbox-exporter.service
```
[~liuju] Please enable debug-level logging using the steps above so that we can investigate further the next time the alert occurs.
---
**liuju** commented on *2022-08-05T15:28:25.953+0800*:
Received. Debug-level logging is now enabled on 10.243.12.3; monitoring.
---
**fangshunjian** commented on *2022-08-08T14:39:13.968+0800*:
Export the blackbox exporter logs:
journalctl --since "2022-08-06" -u blackbox-exporter.service > blackbox.log && tar -zcvf blackbox.tar.gz ./blackbox.log
---
**fangshunjian** commented on *2022-08-08T15:08:58.422+0800*:
Investigation shows that ping responses timed out when the alerts fired. The log reports:
Cannot get TTL from the received packet
!image-2022-08-08-15-07-58-373.png!
!image-2022-08-08-16-09-06-021.png!!image-2022-08-08-16-09-49-835.png!
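
To see how often that message appears in an exported log, a fixed-string search is usually enough. This is a minimal sketch: `count_matches` is a hypothetical helper, not part of nz-agent or blackbox exporter, and it assumes the log was exported to a plain text file such as `blackbox.log`.

```shell
#!/usr/bin/env bash
# count_matches FILE PATTERN
# Print how many lines of FILE contain the fixed string PATTERN.
# Hypothetical helper for illustration only.
count_matches() {
    local file="$1" pattern="$2"
    # -F: treat PATTERN as a literal string; -c: print the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps callers
    # that run under "set -e" from aborting; the printed count is still 0.
    grep -F -c -- "$pattern" "$file" || true
}
```

For example, `count_matches blackbox.log "Cannot get TTL from the received packet"` prints the number of log lines containing that text.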
---
**fangshunjian** commented on *2022-08-08T17:07:34.954+0800*:
1. Upload the script to 10.243.12.3
* [^ping_test.sh]
2. chmod +x ping_test.sh  # make it executable
3. nohup ./ping_test.sh 10.243.11.2 &  # runs ping every 10 s
4. The next time 10.243.11.2 raises an alert, check whether ping_test.log shows the same error, to determine whether this is a blackbox exporter bug.
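
The contents of the attached ping_test.sh are not reproduced in this thread. As a rough idea of what such an independent check can look like, here is a hypothetical stand-in; the function names, log line format, and 2-second reply timeout are all assumptions, and the real attachment may differ.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ping_test.sh; the attached script may differ.

# ping_once TARGET LOGFILE
# Send one ICMP echo request and append a timestamped result line to LOGFILE.
ping_once() {
    local target="$1" logfile="$2" status
    # -c 1: send a single probe; -W 2: wait at most 2 s for the reply (Linux iputils ping).
    if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
        status="ok"
    else
        status="FAIL"
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$target" "$status" >> "$logfile"
}

# ping_loop TARGET [LOGFILE] [INTERVAL_SECONDS] [COUNT]
# Repeat ping_once every INTERVAL_SECONDS; COUNT=0 (the default) means run forever.
ping_loop() {
    local target="$1" logfile="${2:-ping_test.log}" interval="${3:-10}" count="${4:-0}" i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        ping_once "$target" "$logfile"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Calling `ping_loop 10.243.11.2` would log one result line every 10 s to ping_test.log. Comparing the timestamps of any `FAIL` lines against the NZ alert times then shows whether a plain ping fails at the same moments as the blackbox exporter probe.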
 
---
**liuju** commented on *2022-08-08T18:03:28.525+0800*:
Received, will do.
---
**liuju** commented on *2022-08-08T18:16:35.375+0800*:
The ping status check script has been added.
---
**liuju** commented on *2022-08-08T21:47:50.140+0800*:
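At 2022-08-08 16:24:58, 10.243.11.2 raised a ping alert. While the alert was active, its duration was shown as 9m, but after it cleared the same alert message showed a duration of 4m30s. During the active period, at 2022-08-08 16:30:00, I pinged 10.243.11.2 from NZ-agent and took a screenshot; there was no packet loss.
After the alert cleared, I ran the following command:
 journalctl --since "2022-08-08" -u blackbox-exporter.service > blackbox20220808.log && tar -zcvf blackbox20220808.tar.gz ./blackbox20220808.log
I packed blackbox20220808.tar.gz, ping_test.log, and screenshots of the NZ alert messages into the folder ping20220808, compressed it, and uploaded it as an attachment.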
---
**fangshunjian** commented on *2022-08-10T17:33:27.292+0800*:
[~liuju] Please update the blackbox_exporter binary using the following steps:
1. ssh 10.243.12.3
2. mv /opt/nezha/nz-agent/blackbox_exporter/blackbox_exporter /opt/nezha/nz-agent/blackbox_exporter/blackbox_exporter_bak  # back up the current binary
3. Upload the blackbox_exporter file to /opt/nezha/nz-agent/blackbox_exporter/
* [^blackbox_exporter]
4. systemctl restart blackbox-exporter.service  # restart the service
5. systemctl status blackbox-exporter.service  # check the status
 
 
---
**liuju** commented on *2022-08-11T14:33:12.745+0800*:
OK, updated. Continuing to monitor.
---
**liuju** commented on *2022-08-12T14:52:44.939+0800*:
I updated blackbox-exporter.service on 10.243.12.3 at 2022-08-11 09:30:46. Checking NZ after the update, there has still been one ping alert so far. !image-2022-08-12-09-52-24-114.png!
---
**fangshunjian** commented on *2022-08-16T10:43:22.687+0800*:
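To avoid false positives, adjust the alert rule duration so that an alert fires only after the failure has persisted for more than two check cycles.
1. Log in to the NZ system
2. Go to Configuration / APM Settings
* Change ping interval to 60s
* Save
3. Go to Alerts / Rules
* Change the duration of the asset_ping_failed rule to 120 s
* Save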
 
!image-2022-08-16-10-36-52-190.png!
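
The relationship between the two settings is simple arithmetic: the rule duration should cover the number of consecutive failed checks you want to require. A minimal sketch with a hypothetical helper name; it assumes the rule duration and the ping interval are both expressed in seconds.

```shell
#!/usr/bin/env bash
# min_alert_duration INTERVAL_SECONDS CYCLES
# Print how many seconds a failure must persist to span CYCLES consecutive checks.
# Hypothetical helper for illustration only.
min_alert_duration() {
    local interval="$1" cycles="$2"
    echo $((interval * cycles))
}

min_alert_duration 60 2   # prints 120
```

With a 60 s ping interval and two required cycles this gives the 120 s used in the steps above. A duration shorter than one interval can be satisfied by a single failed probe, which is the false-positive case the change is meant to avoid.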
---
**liuju** commented on *2022-08-16T21:07:38.243+0800*:
Changed the ping interval under NZ Configuration / APM Settings from 300s to 60s.
Changed the asset_ping_failed duration under NZ Alerts / Rules from 60s to 130s.
Continuing to monitor the effect.
 
---
**liuju** commented on *2022-08-19T16:42:07.187+0800*:
As of today, 2022-08-19, a query of the NZ ping alert messages for the past two days shows that the problem reported in this issue has not recurred.
---
**liuju** commented on *2022-08-26T15:46:12.825+0800*:
As of 2022-08-26, the problem reported in this issue has not recurred since the update. Closing this issue.
---
# Attachments
Attachment: 20220804_Last_7days_ping_failed.xlsx
[20220804_Last_7days_ping_failed.xlsx](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30119/20220804_Last_7days_ping_failed.xlsx)
Attachment: alert-message-2022-08-04+16-28-01.xlsx
[alert-message-2022-08-04+16-28-01.xlsx](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30120/alert-message-2022-08-04+16-28-01.xlsx)
Attachment: blackbox_exporter
[blackbox_exporter](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30290/blackbox_exporter)
Attachment: image-2022-08-08-15-07-58-373.png
![image-2022-08-08-15-07-58-373.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30187/image-2022-08-08-15-07-58-373.png)
Attachment: image-2022-08-08-16-09-06-021.png
![image-2022-08-08-16-09-06-021.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30196/image-2022-08-08-16-09-06-021.png)
Attachment: image-2022-08-08-16-09-49-835.png
![image-2022-08-08-16-09-49-835.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30195/image-2022-08-08-16-09-49-835.png)
Attachment: image-2022-08-12-09-52-24-114.png
![image-2022-08-12-09-52-24-114.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30377/image-2022-08-12-09-52-24-114.png)
Attachment: image-2022-08-16-10-36-52-190.png
![image-2022-08-16-10-36-52-190.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30456/image-2022-08-16-10-36-52-190.png)
Attachment: ping_test.sh
[ping_test.sh](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30201/ping_test.sh)
Attachment: ping20220808.rar
[ping20220808.rar](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30212/ping20220808.rar)
Attachment: 微信图片_20220804162536.png
![微信图片_20220804162536.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30115/微信图片_20220804162536.png)
Attachment: 微信图片_20220804162542.png
![微信图片_20220804162542.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30116/微信图片_20220804162542.png)
Attachment: 微信图片_20220804162548.png
![微信图片_20220804162548.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30117/微信图片_20220804162548.png)
Attachment: 微信图片_20220804162619.png
![微信图片_20220804162619.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30118/微信图片_20220804162619.png)
Attachment: 微信图片_20220804162907.png
![微信图片_20220804162907.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30121/微信图片_20220804162907.png)