# Microbursts Cause TX Packet Loss on EtherFabric Traffic Links
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1295 | 2024-05-20T10:50:57.000+0800 | Yang Wei (杨威) | Closed |

---
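The carrier (Ooredoo) reported packet loss on some MDY-ORD links; rapid ping tests showed a loss rate of roughly 0.1% (about 1 packet in 1,000). MYTEL also reported that BFD sessions on its links (mainly in YGN) failed to come up during peak hours.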
* Packet loss on each link, as reported by the carrier:

!image-2024-05-22-10-41-39-879.png|width=779,height=318!
* Results of a rapid ping from the carrier's PoP in Singapore to the IGW router:

!image-2024-05-22-10-44-02-318.png|width=779,height=430!

!image-2024-05-22-10-45-12-528.png|width=357,height=129!
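* Ooredoo provided a summary of rapid ping results for every time period of the day (from 2024-05-11 through 2024-05-14), which shows severe packet loss on the 100G links during peak hours.

!image-2024-05-22-10-52-24-554.png|width=751,height=305!

During those periods, the EF statistics show the link running at very high utilization, effectively at full load (the figure below shows the link throughput as measured by the EF).

!image-2024-05-22-10-53-41-781.png|width=659,height=642!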
* After a pass-through rule for ping packets was configured on the EtherFabric, Ooredoo reported that the ping packet loss improved significantly.

!image-2024-05-22-10-48-47-290.png|width=276,height=263!
* At this point, flow control was enabled on both the EtherFabric and the ASW, and no packet loss was observed on the EF, the ASW, or the TSGX.

**luqiuwen** commented on *2024-05-22T12:04:48.117+0800*:
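* With flow control disabled on the EtherFabric, the EF's link-side interfaces showed a packet loss rate of about 0.1%. We contacted the EF vendor, who said that the link-side TX drops are probably caused by insufficient buffer space in the switching chip, which cannot absorb the microbursts in the traffic. The effect is more pronounced when multiple 100GE (service-side) ports forward to the same link-side interface.

!https://download.huawei.com/mdl/image/download?uuid=dd77a5d969254077a17721e8084dd28e!

[https://support.huawei.com/enterprise/en/doc/EDOC1100086962]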
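The vendor's explanation can be illustrated with a toy model. The following is a minimal sketch, assuming a single tail-drop egress queue with a fixed buffer that drains at line rate. The function name and every number in it (port count, burst size, buffer size, drain rate) are hypothetical and are not taken from the EF hardware; the point is only that two traffic patterns with the same average load can behave very differently when several ingress ports burst at once.

```python
def simulate_egress(arrivals_per_tick, drain_per_tick, buffer_limit):
    """Simulate a tail-drop egress queue and return (sent, dropped).

    arrivals_per_tick: packets offered to the queue in each time slot
    drain_per_tick:    packets the egress port can transmit per slot
    buffer_limit:      maximum number of packets the queue can hold
    """
    queue = sent = dropped = 0
    for arriving in arrivals_per_tick:
        # Accept what fits in the buffer and tail-drop the rest.
        accepted = min(arriving, buffer_limit - queue)
        dropped += arriving - accepted
        queue += accepted
        # Transmit up to line rate during this slot.
        tx = min(queue, drain_per_tick)
        queue -= tx
        sent += tx
    return sent, dropped


# Hypothetical scenario: the egress port drains 10 packets per tick.
# Smooth traffic: 8 packets every tick (80% average utilization).
smooth = [8] * 100
# Bursty traffic with the same average: 4 ingress ports each deliver
# 20 packets in the same tick, once every 10 ticks.
ports, burst_per_port = 4, 20
bursty = [ports * burst_per_port if t % 10 == 0 else 0 for t in range(100)]

print("smooth, small buffer:", simulate_egress(smooth, 10, 20))
print("bursty, small buffer:", simulate_egress(bursty, 10, 20))
print("bursty, large buffer:", simulate_egress(bursty, 10, 80))
```

In this model the average utilization never reaches 100%, yet the small buffer drops packets whenever the ingress ports burst together, while a buffer large enough to hold one full burst drops nothing.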
* Enabled the TX rate-limiting feature of MARSIO with the following parameters:

{code:bash}
# Enable the TX meter on the bond device
crudini --set /opt/tsg/mrzcpd/etc/mrglobal.conf device:bond en_tx_meter 1
# CIR in Kbps (50000000 Kbps = 50 Gbps)
crudini --set /opt/tsg/mrzcpd/etc/mrglobal.conf device:bond tx_meter_cir_in_Kbps 50000000
# CBS in KB
crudini --set /opt/tsg/mrzcpd/etc/mrglobal.conf device:bond tx_meter_cbs_in_KB 8192
# EBS in KB
crudini --set /opt/tsg/mrzcpd/etc/mrglobal.conf device:bond tx_meter_ebs_in_KB 131072
{code}
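The parameter names above (CIR, CBS, EBS) match those of a single-rate three-color marker (srTCM, RFC 2697). Whether MARSIO implements exactly that algorithm is an assumption; the sketch below only illustrates how such a meter would classify packets in color-blind mode. The class name `SrTcm` and the small numbers in the example are hypothetical.

```python
class SrTcm:
    """Color-blind single-rate three-color marker, after RFC 2697."""

    def __init__(self, cir_bytes_per_s, cbs_bytes, ebs_bytes):
        self.cir = cir_bytes_per_s
        self.cbs = cbs_bytes
        self.ebs = ebs_bytes
        self.tc = cbs_bytes  # committed bucket starts full
        self.te = ebs_bytes  # excess bucket starts full
        self.last = 0.0      # time of the previous refill, in seconds

    def _refill(self, now):
        tokens = (now - self.last) * self.cir
        self.last = now
        # Fill the committed bucket first; any surplus spills into the
        # excess bucket, which is capped at EBS.
        to_committed = min(tokens, self.cbs - self.tc)
        self.tc += to_committed
        self.te = min(self.ebs, self.te + (tokens - to_committed))

    def mark(self, now, size):
        """Return the color for a packet of `size` bytes arriving at `now`."""
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return "green"   # within the committed burst
        if self.te >= size:
            self.te -= size
            return "yellow"  # exceeds CBS but fits within EBS
        return "red"         # exceeds both buckets


# Hypothetical small meter: 1000 B/s, CBS 1500 B, EBS 3000 B.
meter = SrTcm(cir_bytes_per_s=1000, cbs_bytes=1500, ebs_bytes=3000)
burst = [meter.mark(0.0, 1000) for _ in range(5)]
print(burst)                  # five back-to-back packets at t = 0
print(meter.mark(1.0, 1000))  # one packet after a 1 s pause
```

Under this reading, a Yellow mark means the packet arrived faster than the committed burst allows but still fit within the excess burst, so the Yellow counter acts as a burst indicator. With the values configured here, and assuming Kbps means 1000 bit/s and KB means 1024 bytes, the equivalent meter would be `SrTcm(50_000_000 * 1000 // 8, 8192 * 1024, 131072 * 1024)`.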
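* The statistics show that the count of packets marked Yellow rises periodically, about once every 10 s. Because the duplicate-traffic detection feature swaps its new and old Bloom filters every 10 s, that swap may have caused the microbursts. After duplicate-traffic detection was disabled, the periodic rise disappeared.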
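* At MDY-ORD, duplicate-traffic detection was disabled on the FW, the maximum UDP flow table size was set to 10000, and the policy that passed ICMP packets through on the EF was deactivated. The carrier then reported no packet loss on any link at this site.

!image-2024-05-22-12-03-27-412.png!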
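* These changes have now been applied at all sites across the network: duplicate-traffic detection is disabled on the FW, the maximum number of UDP flow table entries is set to 10000, and the ICMP pass-through policy on the EF is deactivated.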
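The new/old Bloom filter swap mentioned in the analysis above can be pictured with a minimal sketch. It assumes a two-generation scheme in which lookups consult both filters, inserts go to the current one, and every 10 s the current filter becomes the old one and a fresh empty filter takes its place. The class names, filter size, hash function, and packet key are all hypothetical; the ticket does not describe the FW's actual implementation, nor the exact mechanism by which the swap would produce a burst. The sketch only shows that the swap is a discrete event recurring at a fixed interval, which is consistent with the 10 s periodicity seen in the Yellow counter.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte strings."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive one bit position per hash from salted BLAKE2b digests.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class RotatingDedup:
    """Duplicate detector with a current and a previous Bloom filter."""

    def __init__(self, period_s=10.0):
        self.period_s = period_s
        self.current = BloomFilter()
        self.previous = BloomFilter()
        self.next_swap = period_s

    def is_duplicate(self, now, packet_key):
        # Swap generations once per elapsed period: the current filter
        # becomes the previous one and a fresh, empty filter replaces it.
        while now >= self.next_swap:
            self.previous = self.current
            self.current = BloomFilter()
            self.next_swap += self.period_s
        seen = packet_key in self.current or packet_key in self.previous
        self.current.add(packet_key)
        return seen


dedup = RotatingDedup(period_s=10.0)
print(dedup.is_duplicate(0.0, b"pkt-1"))   # first sighting
print(dedup.is_duplicate(1.0, b"pkt-1"))   # repeat within the same period
print(dedup.is_duplicate(12.0, b"pkt-1"))  # repeat after one swap
```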
---
## Attachments

**58164/image-2024-05-22-10-41-39-879.png**

---

**58165/image-2024-05-22-10-44-02-318.png**

---

**58166/image-2024-05-22-10-45-12-528.png**

---

**58167/image-2024-05-22-10-48-47-290.png**

---

**58168/image-2024-05-22-10-52-24-554.png**

---

**58169/image-2024-05-22-10-53-41-781.png**

---

**58182/image-2024-05-22-12-03-27-412.png**

---