# [E21 Site] Multiple NPBs at BOLE-IGW raising tsg_9140_packet_io_rxdrop packet-loss alarms
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1290 | 2024-05-15T16:33:59.000+0800 | 杨威 | In Progress |
---
Alarm counts:
* 2024.05.13 09:00 to 2024.05.14 09:00
** BOL-IGW-T9K002-NPB02: 11
* 2024.05.14 09:00 to 2024.05.15 09:00
** BOL-IGW-T9K001-NPB03: 1
** BOL-IGW-T9K002-NPB02: 16
** BOL-IGW-T9K002-NPB03: 1
** BOL-IGW-T9K002-NPB04: 1
**yangwei** commented on *2024-05-16T09:55:57.372+0800*:
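Based on the monitoring data returned from the site, there are two situations:
* 5.13 to 5.14: BOL-IGW-T9K002-NPB02 dropped packets continuously
* Drops occurred from 10:00 until 22:00 to 23:00, and the drop volume grew with traffic
* Overload Protection was triggered between 20:00 and 21:00, when the traffic peak begins
** This indicates that the daytime drops were not accompanied by any single core exceeding its CPU usage limit
Conclusion: the cause of the continuous drops on Bole-IGW NPB02 requires further observation of the daytime drops, to determine on site whether they are caused by traffic being steered to a single core or by high traffic-processing latency.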
!image-2024-05-16-09-43-59-546.png|width=278,height=350!!image-2024-05-16-09-43-25-976.png|width=307,height=356!
!image-2024-05-16-09-45-17-408.png|width=287,height=300!!image-2024-05-16-09-45-36-112.png|width=302,height=327!
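* 5.14: around 18:20, all related devices at the Bole-IGW site saw drops of 10 to 15 Kpps
* The drop volume and timing were fairly consistent across devices, and monitoring showed no spike in new TCP sessions
Conclusion: abnormal UDP traffic is suspected. We plan to add UDP session metrics to the 2307 monitoring dashboard to confirm whether abnormal traffic was present at that time.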
!image-2024-05-16-09-40-51-957.png|width=296,height=345!!image-2024-05-16-09-40-25-601.png|width=271,height=352!!image-2024-05-16-09-41-26-969.png|width=280,height=351!
---
**liuxueli** commented on *2024-05-16T21:37:21.940+0800*:
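* At 2024/5/16 21:30:00 Beijing time, BOL-IGW-T9K002-NPB02 was dropping packets continuously. Logging in to the device showed high UDP flow creation/deletion rates (20000 to 35000/s) on the threads with drops. The SAPP parameters were adjusted to limit the UDP flow creation and eviction rates to 5000/s each; monitoring continues.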
** [^20240516213221.BOL-IGW-T9K002-NPB02.sysinfo.log.txt] 
** !20240516213221.BOL-IGW-T9K002-NPB02.png!
---
**liuxueli** commented on *2024-05-17T10:37:36.454+0800*:
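* At 2024/5/16 22:00:00 Beijing time, packets were captured on BOL-IGW-T9K002-NPB02. Analysis of the capture found DNS packets with a fixed three-tuple but incrementing source ports (the packets are well-formed DNS), so a DNS flood is suspected.
** Packet captures:
*** NAS: E21_pcap/Bole-IGW02-NPB02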
** ip.addr==196.188.52.10 && ip.addr==208.87.242.217 && udp.port==53
*** [^dns.query.196.188.52.10-208.87.242.217.53.pcap]
** ip.addr==108.181.2.147 && ip.addr==213.55.125.42 && udp.port==53
*** [^dns.query.213.55.125.42-108.181.2.147.53.pcap]
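The "fixed three-tuple, incrementing source port" pattern described above can be checked mechanically. The following is a minimal sketch, not the method used in this ticket: it assumes the three-tuple is (source IP, destination IP, destination port 53), assumes the packets have already been reduced to simple records in capture order, and uses hypothetical names and thresholds (`UdpRecord`, `find_incrementing_sport_groups`, `min_packets`, `min_increasing_ratio`) chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class UdpRecord(NamedTuple):
    """One UDP packet, reduced to the fields the heuristic needs (hypothetical type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def find_incrementing_sport_groups(
    records: Iterable[UdpRecord],
    min_packets: int = 100,
    min_increasing_ratio: float = 0.9,
) -> Dict[Tuple[str, str, int], Tuple[int, float]]:
    """Flag (src_ip, dst_ip, 53) groups whose source port mostly increases.

    Assumption: the "three-tuple" is (source IP, destination IP, destination port).
    The thresholds are illustrative and would need tuning on real traffic.
    """
    groups: Dict[Tuple[str, str, int], List[int]] = defaultdict(list)
    for r in records:  # records are assumed to be in capture order
        if r.dst_port == 53:
            groups[(r.src_ip, r.dst_ip, r.dst_port)].append(r.src_port)

    suspects = {}
    for key, ports in groups.items():
        if len(ports) < max(min_packets, 2):
            continue  # too few packets to judge a trend
        # Fraction of consecutive packet pairs where the source port went up.
        increases = sum(1 for prev, cur in zip(ports, ports[1:]) if cur > prev)
        ratio = increases / (len(ports) - 1)
        if ratio >= min_increasing_ratio:
            suspects[key] = (len(ports), ratio)
    return suspects
```

One way to produce the records is `tshark -r <file> -Y udp.port==53 -T fields -e ip.src -e ip.dst -e udp.srcport -e udp.dstport`, splitting each output line into a `UdpRecord`. A ratio threshold, rather than requiring every port to increase, tolerates source-port wrap-around and occasional reordering.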
---
**liuxueli** commented on *2024-05-17T14:52:24.760+0800*:
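* From 2024/5/17 01:15:00 to 01:25:00 Beijing time, BOL-IGW-T9K002-NPB02 showed rxdrop packet loss peaking at about 9 Kpps. The SAPP parameters were tightened further, limiting UDP flow creation and eviction to 3000/s each; monitoring continues.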
** max_opening_per_sec=3000
** max_timeouts_per_sec=3000 
** !image-2024-05-17-14-52-13-953.png|width=1297,height=682!
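The effect of a per-second cap such as `max_opening_per_sec` or `max_timeouts_per_sec` can be illustrated with a fixed one-second window counter. This is a hypothetical sketch: the ticket does not describe how SAPP enforces these limits internally or what happens to flows above the limit, and `PerSecondLimiter` and `simulate` are illustrative names, not SAPP APIs. It assumes each parameter acts as an independent cap on its own event type (new UDP flows vs. timed-out flows).

```python
import math


class PerSecondLimiter:
    """Fixed one-second window: admit at most `limit_per_sec` events per window.

    Hypothetical illustration of a "max N per second" knob; not SAPP's actual code.
    """

    def __init__(self, limit_per_sec: int) -> None:
        self.limit_per_sec = limit_per_sec
        self.window = None  # integer second currently being counted
        self.count = 0

    def try_acquire(self, now: float) -> bool:
        window = math.floor(now)
        if window != self.window:  # a new second started: reset the counter
            self.window = window
            self.count = 0
        if self.count < self.limit_per_sec:
            self.count += 1
            return True
        return False  # over the cap for this second


def simulate(attempts_per_sec: int, limit_per_sec: int, seconds: int = 1) -> int:
    """Spread attempts evenly over each second and count how many are admitted."""
    limiter = PerSecondLimiter(limit_per_sec)
    admitted = 0
    for s in range(seconds):
        for i in range(attempts_per_sec):
            if limiter.try_acquire(s + i / attempts_per_sec):
                admitted += 1
    return admitted
```

In this sketch, 35000 new-flow attempts in one second against a 3000/s cap admit exactly 3000, and creation and eviction would each get their own `PerSecondLimiter(3000)`. Whether SAPP applies the cap per thread or per device is not stated in the ticket.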
---
**yangwei** commented on *2024-05-17T15:19:07.001+0800*:
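* Lookups of the two server IPs in the packets returned from the site, 208.87.242.217 and 108.181.2.147, show that both belong to ASN "AS40676 Psychz Networks", a company that provides CDN and DDoS mitigation services
* DNS queries sent to these servers received no response
Presumably, this DNS traffic is DoS attack traffic targeting this provider, or DoS attack traffic being diverted to it for scrubbing.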
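The no-response check on the two servers can be reproduced with a plain UDP DNS query. This is a minimal, standard-library-only sketch under stated assumptions: the ticket does not say which tool or query name was used, so `example.com`, the 2-second timeout, and the helper names `build_dns_query` and `probe` are hypothetical. A missing reply only shows that nothing answered within the timeout. It does not by itself prove the host is not a DNS server, since filtering anywhere on the path would look the same.

```python
import random
import socket
import struct


def build_dns_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RFC 1035 wire format): one question, class IN, RD set."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # root label terminates QNAME
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def probe(server: str, name: str = "example.com", timeout: float = 2.0) -> bool:
    """Send one UDP DNS query to server:53; True only if a matching reply arrives."""
    qid = random.randint(0, 0xFFFF)
    query = build_dns_query(name, qid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and other socket errors
            return False
    # A reply must echo our query ID and have the QR (response) bit set.
    return len(data) >= 12 and struct.unpack("!H", data[:2])[0] == qid and bool(data[2] & 0x80)
```

For example, `probe("203.0.113.5")` returns `False` when nothing answers within the timeout. The same check can be done by hand with `dig @<server> example.com`.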
---
## Attachments
**57777/1715775533880.jpg**
---
**57874/20240516213221.BOL-IGW-T9K002-NPB02.png**
---
**57875/20240516213221.BOL-IGW-T9K002-NPB02.sysinfo.log.txt**
---
**57886/dns.query.196.188.52.10-208.87.242.217.53.pcap**
---
**57887/dns.query.213.55.125.42-108.181.2.147.53.pcap**
---
**57779/image-2024-05-16-09-38-50-887.png**
---
**57780/image-2024-05-16-09-40-25-601.png**
---
**57781/image-2024-05-16-09-40-51-957.png**
---
**57782/image-2024-05-16-09-41-26-969.png**
---
**57783/image-2024-05-16-09-43-25-976.png**
---
**57784/image-2024-05-16-09-43-59-546.png**
---
**57785/image-2024-05-16-09-45-17-408.png**
---
**57786/image-2024-05-16-09-45-36-112.png**
---
**57923/image-2024-05-17-14-52-13-953.png**
---