2025-09-14 21:52:36 +00:00
# [K18 site] NZ system raises alarms, but the hardware is normal
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-703 | 2022-11-23T13:13:32.000+0800 | 雷军 | Closed |

---
The customer reports that some NP servers are raising alarms although the servers themselves are running normally. Please help investigate.

December 7

The fault has occurred again; see 故障截图1 and 故障截图2 (fault screenshots 1 and 2).

---

**jiaojianzhi** commented on *2022-11-23T13:16:57.307+0800*:

Attached is the information for the last 1 hour and the last 7 days, together with the error messages.

---
**leijun** commented on *2022-11-23T14:51:32.335+0800*:

[~jiaojianzhi] Please confirm whether the following component URLs are reachable:

bifang-api : [http://10.4.62.3:8080/bifang/prometheus]

nginx : [https://10.4.62.3/status/format/prometheus]

mysql : [http://10.4.62.3:9104/metrics]
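The reachability check requested above could also be scripted rather than done by hand. The following is a minimal sketch, not the team's actual tooling: the helper name `check_url`, the 10-second timeout, and the injectable `opener` parameter are assumptions introduced for illustration. Note that the nginx URL uses HTTPS, so a self-signed certificate would be reported here as a failure.

```python
import urllib.error
import urllib.request

# URLs copied from the comment above.
METRICS_URLS = {
    "bifang-api": "http://10.4.62.3:8080/bifang/prometheus",
    "nginx": "https://10.4.62.3/status/format/prometheus",
    "mysql": "http://10.4.62.3:9104/metrics",
}


def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (reachable, detail) for one URL.

    `opener` is injectable (hypothetical parameter) so the function can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 502.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError, connection refused, socket timeout, TLS failure, ...
        return False, f"{type(exc).__name__}: {exc}"


def report(urls=METRICS_URLS):
    """Print one line per component; call this from a host that can reach them."""
    for name, url in urls.items():
        ok, detail = check_url(url)
        print(f"{name:12s} {'OK' if ok else 'FAIL':4s} {detail}")
```

Calling `report()` from a machine on the same network prints one OK/FAIL line per component.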

---
**jiaojianzhi** commented on *2022-11-24T18:51:07.489+0800*:

The site owner rebooted the servers on their own. After the reboot, the fault was resolved.

---
**shizhendong** commented on *2022-12-19T15:09:07.413+0800*:

After investigation, NZ is displaying the Endpoint statuses correctly. The Endpoints in the Down state are down for the following reasons:

Endpoint id: 1421

Cause: Prometheus timed out while scraping the metrics data for this endpoint.

The system is currently configured with default_scrape_timeout=30s. A curl request to [http://10.1.61.1:9904/metrics] did not respond within 30s, and the error shown for this target on the Prometheus targets page is context deadline exceeded, which confirms that the timeout is the cause.
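The curl check described above can be reproduced with a small timing script. This is a sketch under stated assumptions: `timed_fetch`, `would_time_out`, and the injectable `fetch` and `clock` parameters are hypothetical names, and only the 30s value comes from the comment. One caveat: the `timeout` passed to `urllib.request.urlopen` applies to each blocking socket operation, whereas the Prometheus scrape timeout is a deadline for the whole request, so the sketch measures total elapsed time and compares that against the limit.

```python
import time
import urllib.request

SCRAPE_TIMEOUT_S = 30.0  # default_scrape_timeout quoted in the comment above


def timed_fetch(url, timeout=SCRAPE_TIMEOUT_S, fetch=None, clock=time.monotonic):
    """Fetch `url` once and return (elapsed_seconds, error).

    `error` is None on success. `fetch` and `clock` are injectable
    (hypothetical parameters) so the logic can be checked offline.
    """
    if fetch is None:
        def fetch(u, t):
            with urllib.request.urlopen(u, timeout=t) as resp:
                return resp.read()
    start = clock()
    error = None
    try:
        fetch(url, timeout)
    except OSError as exc:  # covers URLError, socket timeouts, refused connections
        error = f"{type(exc).__name__}: {exc}"
    return clock() - start, error


def would_time_out(elapsed, error, timeout=SCRAPE_TIMEOUT_S):
    """True when the fetch failed or took at least as long as the scrape timeout."""
    return error is not None or elapsed >= timeout
```

For example, `would_time_out(*timed_fetch("http://10.1.61.1:9904/metrics"))` returning `True` would be consistent with the context deadline exceeded error reported by Prometheus.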
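
Endpoint id: 1423, 1424, etc.

Cause: the clock on the server hosting these endpoints is out of sync with the nz-agent node, which makes the metric data abnormal and in turn puts the endpoints into an abnormal state.

Resolution: restore normal clock synchronization.

Once the clocks were synchronized, the endpoint statuses returned to normal.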
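To quantify how far a server's clock has drifted from the nz-agent node, the offset can be estimated by sampling the local clock before and after reading the remote one. The sketch below assumes a caller-supplied `read_remote_epoch` function (hypothetical; it might wrap something like running `date +%s.%N` on the remote host over SSH) and an assumed tolerance of 1 second; neither appears in the ticket.

```python
import time


def estimate_offset(read_remote_epoch, clock=time.time):
    """Estimate (remote clock - local clock) in seconds.

    `read_remote_epoch` is a caller-supplied function (hypothetical) that
    returns the remote host's Unix time as a float. The local clock is
    sampled before and after the call and the midpoint is used, which
    splits the round-trip delay evenly between both directions.
    """
    t0 = clock()
    remote = read_remote_epoch()
    t1 = clock()
    return remote - (t0 + t1) / 2.0


def is_in_sync(offset_s, tolerance_s=1.0):
    """True when the absolute offset is within the tolerance (assumed 1 s)."""
    return abs(offset_s) <= tolerance_s
```

The midpoint estimate is only as good as the assumption that the outbound and return delays are similar, so it is suited to spotting offsets of seconds or more, not sub-millisecond drift.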

---
**jiaojianzhi** commented on *2022-12-21T19:01:50.104+0800*:

The customer reports many new ADC-related alarms; see the attached image.

---
**shizhendong** commented on *2023-01-09T10:03:44.882+0800*:

After investigation, NZ is displaying the Endpoint statuses correctly. The Endpoints in the Down state are down for the following reason:

Error Msg: server returned HTTP status 502 Bad Gateway. A board fault put the Endpoints configured on the device into an abnormal state.

The affected endpoints are listed in the attachment [^ADC 设备异常 Endpoint 详细信息.xlsx]

[~jiaojianzhi] Please review.

---
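**luqiuwen** commented on *2023-01-09T10:57:10.476+0800*:

On January 6, 2023, [~jiaojianzhi] reported that a large number of endpoints on the on-site ADC boards were down. A check from the switch board showed that MCN0 could not be pinged (ping 192.168.100.1), so the preliminary conclusion is that MCN0 has crashed. Because on-site staff could not go to the data center that day, the issue was put on hold for the time being.

---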
**leijun** commented on *2023-01-11T11:39:05.321+0800*:

[~jiaojianzhi] Please help collect the following information:

1. Log in to server 10.1.62.2 and run docker ps to check whether each component's STATUS is Up, i.e. whether it is running normally.

   If you see: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

   run systemctl start docker and then run docker ps again.

2. Use a browser to confirm whether the following URLs are reachable:

   redis: [http://10.1.62.2:9121/metric|http://10.1.62.2:9121/metrics]

   minio: [http://10.1.62.2:9090/minio/prometheus/metrics|http://10.1.62.2:9121/metrics]

3. Confirm whether the clock on 10.1.62.2 is synchronized correctly.
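Step 1 above can be partly automated by parsing the output of docker ps. The sketch below is illustrative only: the helper names and the container names in the usage example are hypothetical, and using `-a` (to also list exited containers) and flagging `(unhealthy)` statuses go beyond the plain `docker ps` the comment asks for.

```python
import subprocess

# `--format` with a Go template is a documented `docker ps` option;
# `-a` also lists stopped containers (an addition to the step above).
DOCKER_PS_CMD = ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"]


def not_up(ps_output):
    """Return [(name, status)] for containers that are not healthy and Up.

    A container is flagged when its STATUS does not start with "Up" or
    when Docker reports it as "(unhealthy)".
    """
    problems = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if not status.startswith("Up") or "(unhealthy)" in status:
            problems.append((name, status))
    return problems


def run_docker_ps():
    """Run docker ps locally and return its stdout.

    Raises subprocess.CalledProcessError when the command fails, for
    example when the Docker daemon is not running.
    """
    result = subprocess.run(DOCKER_PS_CMD, capture_output=True, text=True, check=True)
    return result.stdout
```

On the server, `not_up(run_docker_ps())` returning an empty list would mean every container is Up; a `CalledProcessError` corresponds to the "Cannot connect to the Docker daemon" case, where `systemctl start docker` is the next step.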

---
# Attachments
Attachment: 59705d3cbe846d4df56282ee66ef462.png

![59705d3cbe846d4df56282ee66ef462.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/33341/59705d3cbe846d4df56282ee66ef462.png)

Attachment: ADC+设备异常+Endpoint+详细信息.xlsx

[ADC+设备异常+Endpoint+详细信息.xlsx](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/34216/ADC+设备异常+Endpoint+详细信息.xlsx)

Attachment: photo_2022-11-23_11-14-14.jpg

![photo_2022-11-23_11-14-14.jpg](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/33336/photo_2022-11-23_11-14-14.jpg)

Attachment: photo_2022-11-23_11-14-34.jpg

![photo_2022-11-23_11-14-34.jpg](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/33337/photo_2022-11-23_11-14-34.jpg)

Attachment: photo.jpg

![photo.jpg](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/33338/photo.jpg)

Attachment: 故障截图1.jpg

![故障截图1.jpg](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/33686/故障截图1.jpg)

Attachment: 故障截图2.jpg

![故障截图2.jpg](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/33687/故障截图2.jpg)

Attachment: 客户反应新增的ADC告警.png

![客户反应新增的ADC告警.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/33996/客户反应新增的ADC告警.png)