# BOL-IGW-T9K001-NPB002 rx_missed alert
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1106 | 2024-01-11T10:04:26.000+0800 | 陆秋文 | Closed |
---
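For the alert and monitoring information, see the attachments.

**luqiuwen** commented on *2024-01-12T16:16:32.946+0800*:
Investigation shows that the memory on this board is reporting MCE (machine check) errors. These are generally caused by hardware problems such as poor contact or a damaged module, and may be the cause of the packet loss.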
{code:java}
[Sat Jan  6 13:42:39 2024] mce: [Hardware Error]: Machine check events logged
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: CPU 14: Machine Check Event: 0 Bank 8: 8c00004000010091
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: TSC 24bdbab4310381e 
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: ADDR 172a3fc900 
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: MISC 1425c9486 
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1704537757 SOCKET 1 APIC 20
[Sat Jan  6 13:42:39 2024] EDAC MC2: 1 CE memory read error on CPU_SrcID#1_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x172a3fc offset:0x900 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0091 socket:1 ha:1 channel_mask:2 rank:1)
[Sat Jan  6 23:25:10 2024] mce: [Hardware Error]: Machine check events logged
{code}
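We recommend visiting the server room at a convenient time to reseat the memory modules and clean out dust; if a spare is available, we recommend replacing the module. In addition, this issue no longer appears in the alert lists reported over the last few days, so continuing to observe is also an option.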
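
As a cross-check on the log above, the status word `8c00004000010091` from the `Bank 8` line can be decoded by hand. The following is a minimal sketch, not part of the original investigation: the function name `decode_mci_status` is hypothetical, the bit positions are the architectural `IA32_MCi_STATUS` fields as described in the Intel SDM machine-check chapter, and the corrected-error count field is assumed to be implemented on this CPU.

```python
# Sketch: decode architectural fields of an IA32_MCi_STATUS value.
# decode_mci_status is a hypothetical helper, not a kernel or tool API.

# MMM field of a memory-controller MCA error code (000F 0000 1MMM CCCC).
MEMORY_OPS = {
    0b000: "generic undefined request",
    0b001: "memory read",
    0b010: "memory write",
    0b011: "address/command",
    0b100: "memory scrubbing",
}


def decode_mci_status(status: int) -> dict:
    """Return the main flag bits and error codes of a 64-bit status word."""
    mca_code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "enabled": bool((status >> 60) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        # Bits 52:38; assumed to be implemented on this processor.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Memory-controller errors match 000F 0000 1MMM CCCC (bit 12 is ignored).
    if mca_code & 0xEF80 == 0x0080:
        fields["memory_op"] = MEMORY_OPS.get((mca_code >> 4) & 0x7, "reserved")
        fields["channel"] = mca_code & 0xF  # 0xF means "not specified"
    return fields


if __name__ == "__main__":
    for name, value in decode_mci_status(0x8C00004000010091).items():
        print(f"{name}: {value}")
```

For the value in the log, this decodes to a valid, corrected (not uncorrected) error with a count of 1, a memory read operation, and channel 1, which is consistent with the EDAC line `1 CE memory read error ... channel:1 ... err_code:0001:0091`.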
 
---
# Attachments
Attachment: alert-message-2024-01-02+09-13-31.xlsx
[alert-message-2024-01-02+09-13-31.xlsx](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/50018/alert-message-2024-01-02+09-13-31.xlsx)
Attachment: BOL-IGW-T9K001-NPB02(2).html
[BOL-IGW-T9K001-NPB02(2).html](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/50017/BOL-IGW-T9K001-NPB02(2).html)