
[WMS-UTR] msh01: marsio hit a segmentation fault, causing the pod to restart

| ID | Creation Date | Assignee | Status |
|---|---|---|---|
| OMPUB-1278 | 2024-05-09T16:28:00.000+0800 | 宋延超 (Song Yanchao) | Resolved |

Time: 2024-05-08T14:15:23

Device: MSH TSGX01

Version: mrzcpd v4.6.71

On site:

!anydesk00001.png|width=537,height=302!

!anydesk00000.png|width=508,height=286!

songyanchao commented on 2024-05-10T10:41:54.428+0800:

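Inspecting the coredump shows that the crash occurred in the CQE handling path inside mlx5_rx_burst_vec. A Google search turned up a report of a similar crash, although in that case it was triggered with VF enabled and the PF MTU larger than the VF MTU. Checking the site shows that rxq_cqe_comp_en is enabled. This issue needs further follow-up. https://www.mail-archive.com/users@dpdk.org/msg07151.html https://bugs.dpdk.org/show_bug.cgi?id=334 !image-2024-05-10-10-44-13-537.png|thumbnail! !image-2024-05-10-10-43-27-154.png|thumbnail! !image-2024-05-10-10-43-37-359.png|thumbnail!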
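One place to confirm whether Rx CQE compression is active on a port is the mlx5 device arguments passed to the DPDK EAL. The minimal Python sketch below parses an allow-list argument such as `0000:3b:00.0,rxq_cqe_comp_en=1` and reports whether `rxq_cqe_comp_en` is on. The PCI address and the helper names `parse_devargs` and `cqe_comp_enabled` are hypothetical. Treating an absent key as "enabled" is an assumption based on the mlx5 PMD default, so check the DPDK mlx5 guide for the version actually deployed.

```python
from typing import Dict, Tuple


def parse_devargs(arg: str) -> Tuple[str, Dict[str, str]]:
    """Split an EAL device argument like 'BDF,key=val,...' into (BDF, {key: val})."""
    bdf, *pairs = arg.split(",")
    opts: Dict[str, str] = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        opts[key.strip()] = value.strip()
    return bdf, opts


def cqe_comp_enabled(arg: str, default: bool = True) -> bool:
    """Return True if rxq_cqe_comp_en is on for this device argument.

    Assumption: when the key is absent, the PMD default (enabled) applies.
    """
    _, opts = parse_devargs(arg)
    value = opts.get("rxq_cqe_comp_en", "")
    if value == "":
        return default
    # Any nonzero value enables Rx CQE compression.
    return int(value, 0) != 0


if __name__ == "__main__":
    for arg in (
        "0000:3b:00.0",
        "0000:3b:00.0,rxq_cqe_comp_en=0",
        "0000:3b:00.0,rxq_cqe_comp_en=1,mprq_en=1",
    ):
        print(arg, "->", cqe_comp_enabled(arg))
```

Only the devargs string is inspected here; the firmware-level CQE compression setting is a separate knob and is not visible from this string.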


luqiuwen commented on 2024-05-10T15:55:22.750+0800:

Suggest adjusting the mlx5 firmware parameters: change the CQE_COMPRESS level from BALANCED to AGGRESSIVE.
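This kind of firmware change is usually made with NVIDIA's `mlxconfig` tool, where the parameter appears as `CQE_COMPRESSION` with values `BALANCED(0)` and `AGGRESSIVE(1)`. The Python sketch below reads the current level from captured `mlxconfig -d <device> query` output and builds, without executing, the command that would switch it to aggressive. The device path `/dev/mst/mt4119_pciconf0`, the helper names, and the exact column layout of the query output are assumptions. A new `mlxconfig` value takes effect only after a firmware reset or host reboot.

```python
import re
import shlex
from typing import List, Optional, Tuple

# Assumed layout of the relevant line in `mlxconfig -d <device> query` output:
#          CQE_COMPRESSION                     BALANCED(0)
_CQE_LINE = re.compile(r"^\s*CQE_COMPRESSION\s+(\w+)\((\d+)\)", re.MULTILINE)


def current_cqe_level(query_output: str) -> Optional[Tuple[str, int]]:
    """Return (name, value), e.g. ('BALANCED', 0), or None if the parameter is absent."""
    match = _CQE_LINE.search(query_output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def set_aggressive_cmd(device: str) -> List[str]:
    """Build (but do not run) the mlxconfig command selecting AGGRESSIVE (1)."""
    # -y answers the confirmation prompt non-interactively.
    return ["mlxconfig", "-d", device, "-y", "set", "CQE_COMPRESSION=1"]


if __name__ == "__main__":
    sample = (
        "Configurations:                              Next Boot\n"
        "         CQE_COMPRESSION                     BALANCED(0)\n"
    )
    print(current_cqe_level(sample))
    print(shlex.join(set_aggressive_cmd("/dev/mst/mt4119_pciconf0")))
```

Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` later, and lets an operator review it before applying it to each device.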


songyanchao commented on 2024-05-17T10:17:40.950+0800:

Changed the CQE_COMPRESS level on the msh-tsgx01 and pcap-tsgx06 devices from BALANCED to AGGRESSIVE.


Attachments

57081/anydesk00000.png


57080/anydesk00001.png


57139/image-2024-05-10-10-42-26-990.png


57137/image-2024-05-10-10-43-27-154.png


57136/image-2024-05-10-10-43-37-359.png


57141/image-2024-05-10-10-44-13-537.png