BOL-IGW-T9K001-NPB002 rx_missed alert
| ID | Creation Date | Assignee | Status |
|---|---|---|---|
| OMPUB-1106 | 2024-01-11T10:04:26.000+0800 | 陆秋文 | Closed |
For alert and monitoring information, see the attachments.

luqiuwen commented on 2024-01-12T16:16:32.946+0800:
Investigation shows that the memory on this board is reporting MCE (machine check) errors. These are usually caused by hardware problems such as poor contact or a damaged module, and may be the cause of the packet loss.

{code:java}
[Sat Jan 6 13:42:39 2024] mce: [Hardware Error]: Machine check events logged
[Sat Jan 6 13:42:39 2024] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Sat Jan 6 13:42:39 2024] EDAC sbridge MC0: CPU 14: Machine Check Event: 0 Bank 8: 8c00004000010091
[Sat Jan 6 13:42:39 2024] EDAC sbridge MC0: TSC 24bdbab4310381e
[Sat Jan 6 13:42:39 2024] EDAC sbridge MC0: ADDR 172a3fc900
[Sat Jan 6 13:42:39 2024] EDAC sbridge MC0: MISC 1425c9486
[Sat Jan 6 13:42:39 2024] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1704537757 SOCKET 1 APIC 20
[Sat Jan 6 13:42:39 2024] EDAC MC2: 1 CE memory read error on CPU_SrcID#1_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x172a3fc offset:0x900 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0091 socket:1 ha:1 channel_mask:2 rank:1)
[Sat Jan 6 23:25:10 2024] mce: [Hardware Error]: Machine check events logged
{code}

Recommendation: when convenient, visit the machine room to reseat the memory modules and clean out any dust; if a spare is available, replace the module. In addition, this issue no longer appears in the alert lists reported over the past few days, so continuing to observe is also an option.
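To make the "continue to observe" option concrete, the sketch below counts corrected (CE) memory errors per DIMM label from kernel log text in the format shown above. It is only an illustration: the function name `count_corrected_errors`, the regular expression, and the embedded sample are assumptions written for this ticket, not part of any existing tooling, and the pattern is matched only against the single EDAC line format seen in this log.

```python
import re
from collections import Counter

# Two lines copied from the log in this ticket, used as sample input.
SAMPLE = (
    "[Sat Jan 6 13:42:39 2024] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR\n"
    "[Sat Jan 6 13:42:39 2024] EDAC MC2: 1 CE memory read error on "
    "CPU_SrcID#1_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x172a3fc "
    "offset:0x900 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0091 "
    "socket:1 ha:1 channel_mask:2 rank:1)\n"
)

# Hypothetical pattern: matches "EDAC MC<n>: <count> CE ... on <label> (channel:..."
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*? on (?P<label>\S+) \(channel:"
)


def count_corrected_errors(log_text):
    """Return a Counter mapping DIMM label -> total corrected-error count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EDAC_CE.search(line)
        if match:
            counts[match.group("label")] += int(match.group("count"))
    return counts


if __name__ == "__main__":
    for label, total in count_corrected_errors(SAMPLE).items():
        print(label, total)
```

Run periodically against saved `dmesg` output, a growing count for the same label would suggest the DIMM is still degrading, while a flat count would be consistent with the alert having cleared.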
Attachments
Attachment: alert-message-2024-01-02+09-13-31.xlsx
Attachment: BOL-IGW-T9K001-NPB02(2).html