
BOL-IGW-T9K001-NPB002 rx_missed alert

ID Creation Date Assignee Status
OMPUB-1106 2024-01-11T10:04:26.000+0800 陆秋文 Closed

See the attachments for alert and monitoring information.

luqiuwen commented on 2024-01-12T16:16:32.946+0800:

Investigation shows that the memory on this board is reporting MCE (machine check) errors. These are generally caused by poor hardware contact, physical damage, or similar faults, and may be the cause of the packet loss.

{code:java}
[Sat Jan  6 13:42:39 2024] mce: [Hardware Error]: Machine check events logged
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: CPU 14: Machine Check Event: 0 Bank 8: 8c00004000010091
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: TSC 24bdbab4310381e
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: ADDR 172a3fc900
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: MISC 1425c9486
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1704537757 SOCKET 1 APIC 20
[Sat Jan  6 13:42:39 2024] EDAC MC2: 1 CE memory read error on CPU_SrcID#1_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x172a3fc offset:0x900 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0091 socket:1 ha:1 channel_mask:2 rank:1)
[Sat Jan  6 23:25:10 2024] mce: [Hardware Error]: Machine check events logged
{code}

We recommend visiting the machine room at a convenient time to reseat the memory modules and clean out any dust, and to replace the module if a spare is available. In addition, this issue no longer appears in the alert lists reported over the past few days, so continuing to observe is also an option.
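When planning the on-site visit, it may help to pull the DIMM location out of the corrected-error (CE) line in the log above. The following is a minimal sketch, not part of the original investigation: the `parse_ce_line` helper and its regular expression are hypothetical and only cover the line format seen in this ticket, and the 4 KiB page size is an assumption. Under that assumption, the `page` and `offset` fields recombine into the same value as the `ADDR` line (`172a3fc900`).

```python
import re

# Hypothetical pattern covering only the EDAC CE line format seen in this ticket.
CE_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE (?P<kind>.+?) on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+) "
    r"page:0x(?P<page>[0-9a-f]+) offset:0x(?P<offset>[0-9a-f]+)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_ce_line(line):
    """Return the DIMM location and physical address from one CE line, or None."""
    m = CE_LINE.search(line)
    if m is None:
        return None
    page = int(m.group("page"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "mc": int(m.group("mc")),
        "count": int(m.group("count")),
        "kind": m.group("kind"),
        "label": m.group("label"),
        "channel": int(m.group("channel")),
        "slot": int(m.group("slot")),
        # Physical address = page frame number shifted by the page size, plus offset.
        "address": (page << PAGE_SHIFT) | offset,
    }


# The CE line copied from the log in this ticket (tail of the line omitted).
SAMPLE = (
    "[Sat Jan  6 13:42:39 2024] EDAC MC2: 1 CE memory read error on "
    "CPU_SrcID#1_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x172a3fc "
    "offset:0x900 grain:32 syndrome:0x0"
)

if __name__ == "__main__":
    info = parse_ce_line(SAMPLE)
    print(info["label"], hex(info["address"]))
```

The `label` field (`CPU_SrcID#1_Ha#1_Chan#1_DIMM#0`) together with `channel` and `slot` identifies the module to reseat or replace. How that label maps to a physical slot on this board is not stated in the ticket and would need to be checked against the board documentation.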

 


Attachments

Attachment: alert-message-2024-01-02+09-13-31.xlsx


Attachment: BOL-IGW-T9K001-NPB02(2).html
