# BOL-IGW-T9K001-NPB002 rx_missed alert
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1106 | 2024-01-11T10:04:26.000+0800 | 陆秋文 | Closed |
---
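For the alert and monitoring details, see the attachments.

**luqiuwen** commented on *2024-01-12T16:16:32.946+0800*:

Investigation shows that the memory on this board is reporting MCE errors. These are usually caused by hardware problems such as poor contact or physical damage, and they may be the cause of the packet loss.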
```
[Sat Jan  6 13:42:39 2024] mce: [Hardware Error]: Machine check events logged
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: CPU 14: Machine Check Event: 0 Bank 8: 8c00004000010091
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: TSC 24bdbab4310381e
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: ADDR 172a3fc900
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: MISC 1425c9486
[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: PROCESSOR 0:406f1 TIME 1704537757 SOCKET 1 APIC 20
[Sat Jan  6 13:42:39 2024] EDAC MC2: 1 CE memory read error on CPU_SrcID#1_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x172a3fc offset:0x900 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0091 socket:1 ha:1 channel_mask:2 rank:1)
[Sat Jan  6 23:25:10 2024] mce: [Hardware Error]: Machine check events logged
```
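The status word logged for Bank 8 (`8c00004000010091`) can be cross-checked against the EDAC summary line. The following is a minimal sketch that decodes a few architectural fields of an `IA32_MCi_STATUS` value, assuming the bit layout described in the Intel SDM machine-check chapter. The function name `decode_mci_status` is hypothetical and is not part of any kernel or vendor tool.

```python
# Sketch only: decode selected architectural fields of an IA32_MCi_STATUS value.
# Bit positions are taken from the Intel SDM machine-check chapter (assumption:
# the platform follows the architectural layout). The function name is hypothetical.

MEM_OPS = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}

def decode_mci_status(status: int) -> dict:
    mca_code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),         # VAL
        "overflow": bool((status >> 62) & 1),      # OVER
        "uncorrected": bool((status >> 61) & 1),   # UC; 0 means hardware corrected it
        "misc_valid": bool((status >> 59) & 1),    # MISCV
        "addr_valid": bool((status >> 58) & 1),    # ADDRV
        # Bits 52:38; only meaningful when the CPU supports corrected-error counting.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Memory-controller compound error code: 0000 0000 1MMM CCCC
    if (mca_code & 0xFF80) == 0x0080:
        fields["mem_op"] = MEM_OPS.get((mca_code >> 4) & 0x7, "reserved")
        fields["channel"] = mca_code & 0xF
    return fields

decoded = decode_mci_status(0x8C00004000010091)
for name, value in decoded.items():
    print(f"{name}: {value}")
```

For this value the UC bit is clear, the operation decodes as a memory read on channel 1, and the model-specific and MCA codes are `0001` and `0091`, which agrees with the `1 CE memory read error ... err_code:0001:0091` line in the log.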
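We recommend visiting the machine room at a convenient time to reseat the memory modules and clean out the dust, and to replace the module if a spare is available. Separately, this issue no longer appears in the alert lists reported over the past few days, so continuing to observe is also an option.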
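If the decision is to keep observing, one way to track whether the same DIMM keeps reporting errors is to tally the corrected-error (CE) lines per DIMM label. The sketch below assumes kernel log text in the same format as the excerpt above (for example, saved `dmesg -T` output). The regular expression and the function name `count_ce_by_dimm` are illustrative assumptions, not an existing tool.

```python
import re
from collections import Counter

# Sketch only: matches EDAC lines such as
#   "EDAC MC2: 1 CE memory read error on CPU_SrcID#1_Ha#1_Chan#1_DIMM#0 (channel:1 ..."
# The pattern is an assumption based on the log excerpt in this ticket.
CE_LINE = re.compile(r"EDAC MC\d+: (\d+) CE .*? on (\S+) \(")

def count_ce_by_dimm(log_text: str) -> Counter:
    """Sum corrected-error counts per DIMM label found in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CE_LINE.search(line)
        if match:
            counts[match.group(2)] += int(match.group(1))
    return counts

sample = (
    "[Sat Jan  6 13:42:39 2024] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR\n"
    "[Sat Jan  6 13:42:39 2024] EDAC MC2: 1 CE memory read error on "
    "CPU_SrcID#1_Ha#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x172a3fc "
    "offset:0x900 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0091 "
    "socket:1 ha:1 channel_mask:2 rank:1)\n"
)
ce_counts = count_ce_by_dimm(sample)
for dimm, total in ce_counts.items():
    print(dimm, total)
```

A count that keeps growing for the same DIMM label across several days would support replacing the module rather than only reseating it.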
 
---
## Attachments
**50018/alert-message-2024-01-02+09-13-31.xlsx**
---
**50017/BOL-IGW-T9K001-NPB02(2).html**
---