
# [XJ-TEST] Pilot-site TSG-OS drops packets when processing more than 25 Gbps of traffic
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1039 | 2023-10-19T18:07:53.000+0800 | 刘洋 | Closed |
---
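The TSG-OS version is v23.07.19-9dc3a7e. At night, when traffic reaches 25 Gbps, SAPP drops packets (CPU usage around 50%, memory around 30%). We have now steered the traffic of all 4 devices onto a single device (3.17), about 60 Gbps in total. SAPP drops half of the traffic, CPU hard-interrupt load is high, and CPU usage will not climb (it stays at about 30% at both 25 Gbps and 60 Gbps).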
**luqiuwen** commented on *2023-10-19T18:19:13.697+0800*:
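Investigation shows that when sapp drops packets, its packet-processing thread is waiting on a lock and is not receiving packets. The receive queue fills up, which causes the drops.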
{code:java}
   658.234 ( 1.660 ms): futex(uaddr: 0x7f1ac4cfcd88, op: WAIT_BITSET|PRIVATE_FLAG|CLOCK_REALTIME, val3: MATCH_ANY) = 0
                                       syscall (/usr/lib64/libc-2.28.so)
                                       [0xb62227] (/opt/tsg/sapp/plug/business/tsg_vulpes/libonnxruntime.so.1.10.0)
                                       [0xb616d0] (/opt/tsg/sapp/plug/business/tsg_vulpes/libonnxruntime.so.1.10.0)
                                       [0xb617be] (/opt/tsg/sapp/plug/business/tsg_vulpes/libonnxruntime.so.1.10.0)
                                       [0x8206ac] (/opt/tsg/sapp/plug/business/tsg_vulpes/libonnxruntime.so.1.10.0)
                                       [0x8a82f1] (/opt/tsg/sapp/plug/business/tsg_vulpes/libonnxruntime.so.1.10.0)
                                       [0x82cd71] (/opt/tsg/sapp/plug/business/tsg_vulpes/libonnxruntime.so.1.10.0)
                                       [0x1857a1] (/opt/tsg/sapp/plug/business/tsg_vulpes/libonnxruntime.so.1.10.0)
                                       [0x1895d0] (/opt/tsg/sapp/plug/business/tsg_vulpes/libonnxruntime.so.1.10.0)
                                       auto_label_call_ML_c (/opt/tsg/sapp/plug/business/tsg_vulpes/tsg_vulpes.so)
                                       traffic_process (/opt/tsg/sapp/plug/business/tsg_vulpes/tsg_vulpes.so)
                                       plugin_call_streamentry (/opt/tsg/sapp/sapp)
                                       call_streamentry (/opt/tsg/sapp/sapp)
                                       stream_process (/opt/tsg/sapp/sapp)
                                       stream_process_udp (/opt/tsg/sapp/sapp)
                                       udp_free_stream (/opt/tsg/sapp/sapp)
                                       streamaddlist (/opt/tsg/sapp/sapp)
                                       [0x43948] (/opt/tsg/sapp/sapp)
                                       dealipv4udppkt (/opt/tsg/sapp/sapp)
                                       ipv4_entry (/opt/tsg/sapp/sapp)
                                       eth_entry (/opt/tsg/sapp/sapp)
                                       [0x2e9e1] (/opt/tsg/sapp/sapp)
                                       [0x107c66] (/opt/tsg/sapp/sapp)
                                       [0x108051] (/opt/tsg/sapp/sapp)
                                       start_thread (/usr/lib64/libpthread-2.28.so)
                                       __GI___clone (inlined) {code}
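To see which module the blocked thread was executing in, the call stack above can be grouped by shared object. The sketch below is a minimal, hypothetical helper: `parse_frames`, `frames_by_module`, and `first_named_frame` are illustrative names rather than part of any TSG-OS tooling, and it assumes each frame is printed as `symbol (/path/to/object)`, as in the trace above.

```python
import re
from collections import Counter

# Assumed frame layout: "<symbol or [0xaddr]> (/absolute/path/to/object)".
# Lines such as the futex header or "(inlined)" frames do not match and are skipped.
FRAME_RE = re.compile(r"^\s*(?P<sym>\S+)\s+\((?P<path>/[^)]+)\)")


def parse_frames(stack_text):
    """Return a list of (symbol, module_file_name) tuples in stack order."""
    frames = []
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            module = m.group("path").rsplit("/", 1)[-1]
            frames.append((m.group("sym"), module))
    return frames


def frames_by_module(stack_text):
    """Count how many frames belong to each shared object or binary."""
    return Counter(module for _, module in parse_frames(stack_text))


def first_named_frame(frames, skip_prefixes=("libc", "libpthread")):
    """First symbolized frame (not a raw address) outside the skipped modules."""
    for sym, module in frames:
        if sym.startswith("["):
            continue  # unresolved address, no symbol available
        if module.startswith(skip_prefixes):
            continue  # generic runtime frame, not the caller of interest
        return sym, module
    return None
```

Applied to the stack above, `first_named_frame(parse_frames(text))` returns the `auto_label_call_ML_c` frame in `tsg_vulpes.so`, the first symbolized caller above the unresolved `libonnxruntime.so.1.10.0` frames.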
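This lock is used by tsg_vulpes. After the feature was disabled through tsg-os-cli, packets were no longer dropped and the system ran normally.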
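The failure mode described in this comment (a worker blocked on a lock stops polling, so a bounded receive queue overflows even though CPU usage stays low) can be illustrated with a small tick-based model. This is a sketch under stated assumptions, not a model of sapp itself: the function name, queue depth, arrival rate, and stall pattern are all hypothetical values chosen for illustration.

```python
def simulate_rx_queue(ticks, arrivals_per_tick, service_per_tick,
                      queue_depth, stall_every=0, stall_ticks=0):
    """Model one bounded receive queue drained by one worker thread.

    Every `stall_every` ticks the worker blocks for `stall_ticks` ticks
    (standing in for a lock wait) and polls nothing during that time.
    Returns (processed, dropped) packet counts.
    """
    queued = 0
    processed = 0
    dropped = 0
    stall_left = 0
    for tick in range(ticks):
        # Producer side: packets beyond the queue depth are dropped.
        accepted = min(arrivals_per_tick, queue_depth - queued)
        queued += accepted
        dropped += arrivals_per_tick - accepted

        # Worker side: a stalled worker is asleep, so it neither
        # drains the queue nor consumes CPU.
        if stall_every and tick % stall_every == 0:
            stall_left = stall_ticks
        if stall_left > 0:
            stall_left -= 1
            continue
        done = min(service_per_tick, queued)
        queued -= done
        processed += done
    return processed, dropped


if __name__ == "__main__":
    # Hypothetical numbers: the worker is faster than the arrival rate,
    # so without stalls nothing is dropped.
    print("no stalls:  ", simulate_rx_queue(1000, 10, 12, 64))
    # Same worker, but blocked for 8 of every 10 ticks.
    print("with stalls:", simulate_rx_queue(1000, 10, 12, 64,
                                            stall_every=10, stall_ticks=8))
```

In this model the worker has spare capacity whenever it runs, yet the stalled configuration still drops packets, which matches the observed pattern of packet loss with CPU usage that does not rise.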
---
**yangwei** commented on *2023-10-19T18:35:57.760+0800*:
Temporary workaround: disable the encrypted voice identification feature by applying the following setting in tsg-os-cli:
set template name tsg_traffic_engine_default encrypt_traffic_identify voice_bahavior_engine no
---
**xiapeng** commented on *2023-10-20T12:20:30.973+0800*:
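After upgrading TSG-OS to tsg-os-v23.07.22-c92d517 and disabling the encrypted voice identification feature, no severe packet loss occurred while single-device traffic stayed below 75 Gbps. Packet loss begins above 75 Gbps and is greatest when traffic peaks at 98 Gbps. During the periods when traffic exceeds 75 Gbps, memory usage stays below 35% and CPU usage fluctuates frequently between 70% and 95%.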
!image-2023-10-20-12-52-09-665.png|width=394,height=141!
!image-2023-10-20-12-52-33-703.png|width=394,height=139!
!image-2023-10-20-12-53-12-337.png|width=157,height=178!
---
**yangwei** commented on *2023-10-20T12:36:33.918+0800*:
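Please post the on-site monitoring graphs. The text description does not show the magnitude of the packet loss or the resource usage.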
---
**yangwei** commented on *2023-10-20T12:36:59.957+0800*:
How is the on-site test environment connected? Is there a topology diagram?
---
**xiapeng** commented on *2023-10-20T13:04:53.878+0800*:
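Design topology of the test environment: [https://docs.geedge.net/pages/viewpage.action?pageId=94778025]
The actual environment differs from the design as follows:
1. The return-traffic switch for the inline devices and the RCP switch were removed, along with the links they were on.
2. The direct link between the ATCA general traffic access device and the optical protection device was removed and replaced with the path: optical protection device -> optical amplifier device -> ATCA general traffic.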
---
**yangwei** commented on *2023-10-20T13:09:08.065+0800*:
Please upload the complete monitoring data for the device from NZ.
---
**yangwei** commented on *2023-10-20T18:31:45.663+0800*:
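The cause of the >25 Gbps packet loss described in this issue has been identified and resolved, so I am closing it for now. If other problems come up, open a separate bug.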
---
# Attachments
Attachment: 1697705757545.png
![1697705757545.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/46184/1697705757545.png)
Attachment: 1697705784889.png
![1697705784889.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/46183/1697705784889.png)
Attachment: image-2023-10-20-12-52-09-665.png
![image-2023-10-20-12-52-09-665.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/46205/image-2023-10-20-12-52-09-665.png)
Attachment: image-2023-10-20-12-52-33-703.png
![image-2023-10-20-12-52-33-703.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/46204/image-2023-10-20-12-52-33-703.png)
Attachment: image-2023-10-20-12-53-12-337.png
![image-2023-10-20-12-53-12-337.png](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/46203/image-2023-10-20-12-53-12-337.png)