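[E21 on-site] Suspected uneven traffic distribution in the LGH-PE environment at certain moments: single-core CPU overload causes repeated SAPP packet drops or bypass on 10.230.11.2 and 10.230.11.5 (traffic 2-3 Gbps)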
| ID | Creation Date | Assignee | Status |
|---|---|---|---|
| OMPUB-525 | 2022-06-14T14:51:16.000+0800 | 刘学利 | Closed |
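Over the past 12 hours, LGH-PE hosts 10.230.11.2 and 10.230.11.5 raised tsg_9140_packet_io_rxdrop a total of 6 times.
The site's traffic history and the per-NPB traffic show that traffic was not high.
The attachments show LGH-PE traffic over the past 12 hours, along with per-NPB traffic, drops, and CPU for 10.230.11.2.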
liuxueli commented on 2022-06-14T15:01:06.660+0800:
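- The screenshots show that the drops did not coincide with peak traffic or peak overall CPU. The suspicion is that one core reaches 100% utilization and causes the drops.
** monit_stream output: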
{code:java}
Time: Tue Jun 14 09:44:19 2022, App: sapp4, Device: eth_vf_raw
-------------- -------------- ------------------ ---------- -------------- ------------------ ---------
Queue                  RxPkts             RxBits    RxDrops         TxPkts             TxBits   TxDrops
-------------- -------------- ------------------ ---------- -------------- ------------------ ---------
RX[0]TX[0]        18118616828    117919364207224     242457    18118607519    117919349595168         0
RX[1]TX[1]        18739592533    126167043182680       1759    18739578775    126167022900376         0
RX[2]TX[2]        22336079693    139275566256408          0    22336070152    139275550694744         0
RX[3]TX[3]        15554678830     95438493215088          0    15554593979     95437715629384         0
RX[4]TX[4]        15597051373     98738108753488    1378164    15597039748     98738090047920         0
RX[5]TX[5]        18268950776    124341803283256      16719    18268938500    124341783897480         0
RX[6]TX[6]        15916702833     97191714326912      87183    15916691219     97191694764312         0
RX[7]TX[7]        19545059575    123704962444096      15687    19545043366    123704938734584         0
RX[8]TX[8]        18467073100    114256124743464      10813    18340388546    114189224209936         0
RX[9]TX[9]        15833501424     97740776572952          0    15833479103     97740746268160         0
RX[10]TX[10]      18490845392    117901811495584    3161921    18490829767    117901779211480         0
RX[11]TX[11]      14781040384     90513697403600    4405337    14781028663     90513679421072         0
RX[12]TX[12]      16919643592    104770001451840       6107    16919605817    104769942990008         0
RX[13]TX[13]      14191754122     87835579579256          0    14191738745     87835557519144         0
RX[14]TX[14]      14369544542     89895028061584      31435    14369524724     89894998238384         0
RX[15]TX[15]      15300064013     94514785029648      21247    15300053232     94514768861480         0
RX[16]TX[16]      17595387699     96501425267848          0    17595371664     96501394201576         0
RX[17]TX[17]      18342845361    114243339038840          0    18342834712    114243322467360         0
RX[18]TX[18]      17094039811    110045040984736          0    17094020736    110045011218528         0
RX[19]TX[19]      15193779072     94926618317040        470    15193766050     94926598854176         0
RX[20]TX[20]      14959883403     92881645177544     220900    14959869912     92881625033008         0
RX[21]TX[21]      16274704096    105992637911784    2522943    16274689980    105992617479336         0
RX[22]TX[22]      17390029665    106462840177984      11739    17390020802    106462826139784         0
RX[23]TX[23]      17657806077    117652302140504          0    17657792673    117652282576792         0
RX[24]TX[24]      19782169261    131152562971392    5655795    19782156753    131152543781344         0
RX[25]TX[25]      19292736727    133725911304488     597482    19292725995    133725893999760         0
RX[26]TX[26]      15306380665     96479651150896      18434    15306369017     96479633343928         0
RX[27]TX[27]      18527182896    117553965629560    2847837    18527165625    117553941311360         0
RX[28]TX[28]      14677234732     90759294724456     217519    14677214444     90759268004936         0
RX[29]TX[29]      17094714790    109728625721096    1089635    17094696229    109728600231400         0
RX[30]TX[30]      15231028977     96225434135536       8769    15231017623     96225416842904         0
RX[31]TX[31]      15604471934     98123065745712    3851679    15604459014     98123046665928         0
RX[32]TX[32]      17767682894    110118173638272          0    17767666502    110118150575056         0
RX[33]TX[33]      15929845233     98214775281488       8812    15929829366     98214751660440         0
RX[34]TX[34]      16521164413    102971141389792          0    16521148268    102971118989896         0
RX[35]TX[35]      17263635819    109982026420192     126824    17263623039    109982005020192         0
RX[36]TX[36]      13850646744     85731740632712       3216    13850636883     85731725130824         0
RX[37]TX[37]      21317076179    151080622443848          0    21317062863    151080601979952         0
RX[38]TX[38]      15498956793     98990329382960      24691    15498943838     98990304732576         0
RX[39]TX[39]      15092389997     92865868787200     136665    15092376756     92865845423448         0
RX[40]TX[40]      14639467871     91525691682704      46260    14639454197     91525671515816         0
RX[41]TX[41]      14865833720     92716587594768          0    14865818768     92716566346960         0
RX[42]TX[42]      15289672779     97127305089520     920373    15289663087     97127289764608         0
Total            720490966618   4563983482749952   27688872   720363606651   4563914896275520         0
-------------- -------------- ------------------ ---------- -------------- ------------------ ---------
{code}
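The monit_stream counters support the single-core hypothesis. Overall, only about 3.8e-5 of received packets were dropped (27688872 / 720490966618), but the drops are concentrated on a few RX queues. RX[24] dropped 5,655,795 and RX[11] dropped 4,405,337, while a dozen queues report zero. The sketch below ranks queues by drop ratio. It assumes that every per-queue row has exactly the `RX[n]TX[n]` plus six integer columns shown above. The function names are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import List, NamedTuple

# Assumed row layout, taken from the monit_stream output above:
# RX[n]TX[n] RxPkts RxBits RxDrops TxPkts TxBits TxDrops
ROW_RE = re.compile(
    r"^RX\[(\d+)\]TX\[\d+\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$"
)


class QueueStats(NamedTuple):
    queue: int
    rx_pkts: int
    rx_drops: int

    @property
    def drop_ratio(self) -> float:
        """Fraction of received packets dropped on this queue."""
        return self.rx_drops / self.rx_pkts if self.rx_pkts else 0.0


def parse_monit_stream(text: str) -> List[QueueStats]:
    """Extract per-queue RX counters; header, separator and Total lines are skipped."""
    stats = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            queue, rx_pkts, _rx_bits, rx_drops = (int(g) for g in m.groups()[:4])
            stats.append(QueueStats(queue, rx_pkts, rx_drops))
    return stats


def hottest_queues(stats: List[QueueStats], top: int = 5) -> List[QueueStats]:
    """Queues sorted by drop ratio, worst first."""
    return sorted(stats, key=lambda s: s.drop_ratio, reverse=True)[:top]
```

To use it, pass the text between the `{code}` markers to `parse_monit_stream` and print `hottest_queues(...)`. Sorting by ratio rather than by absolute drops keeps queues that see more traffic from dominating the list.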
liuxueli commented on 2022-06-14T15:51:41.131+0800:
- Use cpusage to collect CPU utilization. [~liuju]
liuxueli commented on 2022-06-15T17:02:39.808+0800:
- The cpusage results confirm that individual cores do reach 100% utilization, which causes the SAPP packet drops. Results:
** [^cpu10.230.11.2.over90.txt]
** !image-2022-06-15-17-17-02-086.png!
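This ticket does not show how cpusage is implemented. As a stand-in, the sketch below computes per-core busy percentages on Linux from two `/proc/stat` snapshots and keeps cores at or above a threshold. The 90% default mirrors the attachment name `cpu10.230.11.2.over90.txt`, but that is an assumption about what the file filters. The function names are hypothetical. One caveat: if SAPP worker threads busy-poll, OS-level counters like these read near 100% regardless of load, so treat this as illustrative.

```python
import time
from typing import Dict, Tuple


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {"cpuN": (idle_ticks, total_ticks)} for each per-core line of /proc/stat."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines (intr, ctxt, ...).
        if len(fields) < 9 or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal; guest time is already counted in user/nice.
        ticks = [int(x) for x in fields[1:9]]
        idle = ticks[3] + ticks[4]  # idle + iowait
        cores[fields[0]] = (idle, sum(ticks))
    return cores


def busy_percent(before: Dict[str, Tuple[int, int]],
                 after: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Per-core busy percentage between two snapshots."""
    usage = {}
    for core, (idle1, total1) in after.items():
        idle0, total0 = before[core]
        elapsed = total1 - total0
        usage[core] = 100.0 * (1 - (idle1 - idle0) / elapsed) if elapsed else 0.0
    return usage


def hot_cores(interval: float = 1.0, threshold: float = 90.0) -> Dict[str, float]:
    """Sample /proc/stat twice and keep cores at or above the threshold (Linux only)."""
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_proc_stat(f.read())
    return {c: u for c, u in busy_percent(before, after).items() if u >= threshold}
```

Running `hot_cores()` in a loop and logging non-empty results gives a per-core record that can be lined up against the drop timestamps. An overall CPU graph cannot show this, because one saturated core is averaged away.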
liuxueli commented on 2022-06-15T17:29:01.405+0800:
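- [~liuju] Pick one NPB (230.11.2) and try switching its distribution mode to 4-tuple. In /opt/tsg/mrzcpd/etc/mrglobal.conf, change the distmode parameter from 2 to 3, then restart the driver and SAPP.
** Use cpusage to collect CPU utilization.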
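Here is why switching distmode from 2 (2-tuple) to 3 (4-tuple) can help. With a 2-tuple key, presumably the source and destination addresses, every connection between the same address pair hashes to the same RX queue. One busy pair can therefore pin a single core while the other cores stay idle. A 4-tuple key adds the ports and spreads those connections across queues. The simulation below uses hypothetical flows and crc32 as the hash. marsio's real hash function is not described in this ticket, and all helper names are illustrative.

```python
import zlib
from collections import Counter
from typing import Iterable, List, Tuple

Flow = Tuple[str, str, int, int]  # (src_ip, dst_ip, src_port, dst_port)
NUM_QUEUES = 43  # RX[0]..RX[42] in the monit_stream output


def flow_key(flow: Flow, distmode: int) -> tuple:
    src_ip, dst_ip, src_port, dst_port = flow
    if distmode == 2:  # 2-tuple: addresses only
        return (src_ip, dst_ip)
    if distmode == 3:  # 4-tuple: addresses and ports
        return (src_ip, dst_ip, src_port, dst_port)
    raise ValueError(f"distmode {distmode} not modelled here")


def queue_for(key: tuple, num_queues: int = NUM_QUEUES) -> int:
    # crc32 is a stand-in hash; the real marsio hash may differ.
    return zlib.crc32("|".join(map(str, key)).encode()) % num_queues


def flows_per_queue(flows: Iterable[Flow], distmode: int) -> Counter:
    """Count how many flows land on each RX queue."""
    return Counter(queue_for(flow_key(f, distmode)) for f in flows)


def synthetic_flows() -> List[Flow]:
    # Hypothetical traffic: one busy address pair with many connections, plus background flows.
    heavy = [("10.0.0.1", "10.0.0.2", 1024 + i, 443) for i in range(1000)]
    background = [(f"192.168.{i // 256}.{i % 256}", "10.0.0.9", 40000 + i, 80) for i in range(1000)]
    return heavy + background
```

Comparing `max(flows_per_queue(synthetic_flows(), 2).values())` with the same call for mode 3 shows the effect. Under mode 2, all 1000 connections of the busy pair share one queue. Under mode 3, they are spread out. This only helps when the hot spot consists of many connections sharing an address pair. A single elephant connection still lands on one queue under either mode.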
liuxueli commented on 2022-06-16T14:32:00.150+0800:
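- According to [~liuju], after the marsio distribution mode on NPB (230.11.2) was switched from 2-tuple to 4-tuple, no packet drops were seen over one night (about 15 hours).
** {code:java}
Above are the traffic and drop curves from 16:17:00 yesterday until now for LGH-PE 10.230.11.2 (driver config parameter updated at 16:17:00 yesterday; driver and program restarted) and 10.230.11.5 (no changes). The comparison shows that yesterday's change was effective: since then, 10.230.11.2 has raised no drop alerts, while 10.230.11.5 still raises them. cpusage collection on 10.230.11.2 was not yet deployed yesterday afternoon and has only just started running. @刘学利
{code}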
- Suggest that [~liuju] also switch NPB (230.11.5) to 4-tuple distribution and check whether drops persist.
** Continue collecting CPU utilization with cpusage.
liuxueli commented on 2022-06-17T15:45:09.782+0800:
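- According to [~liuju], after the marsio distribution mode on LGH-PE 10.230.11.5 was switched from 2-tuple to 4-tuple, no packet drops were seen over one night (about 15 hours). 10.230.11.2 has run for two days without drops.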
liuxueli commented on 2022-06-17T15:58:07.125+0800:
- Even with plenty of spare CPU, some connections are still bypassed by SAPP. [~yangwei]
Attachments
Attachment: cpu10.230.11.2.over90
Attachment: cpu10.230.11.2.over90.txt
Attachment: image-2022-06-15-17-17-02-086.png
Attachment: 微信图片_20220614094824.png
Attachment: 微信图片_20220614094830.png
Attachment: 微信图片_20220614094836.png
Attachment: 微信图片_20220614094841.png
Attachment: 微信图片_20220614094847.png





