geedge-jira/md/OMPUB-1290.md
2025-09-14 22:27:11 +00:00


[E21 Site] BOLE-IGW: multiple NPBs raise tsg_9140_packet_io_rxdrop packet-drop alerts

| ID | Creation Date | Assignee | Status |
|---|---|---|---|
| OMPUB-1290 | 2024-05-15T16:33:59.000+0800 | 杨威 | In Progress |

2024.05.13 09:00--2024.05.14 09:00: BOL-IGW-T9K002-NPB02 11 times

2024.05.14 09:00--2024.05.15 09:00: BOL-IGW-T9K001-NPB03 1 time, BOL-IGW-T9K002-NPB02 16 times, BOL-IGW-T9K002-NPB03 1 time, BOL-IGW-T9K002-NPB04 1 time

yangwei commented on 2024-05-16T09:55:57.372+0800:

The monitoring data returned from the site shows the following two situations:

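  • From 5.13 to 5.14, BOL-IGW-T9K002-NPB02 dropped packets continuously.
  • Drops occur from 10:00 until 22:00~23:00, and the drop volume grows with traffic.
  • Overload Protection triggers at 20:00~21:00, at the start of the traffic peak.
    ** This indicates that single-core CPU usage did not exceed its limit while the daytime drops were occurring.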

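Conclusion: the cause of the continuous drops on Bole-IGW NPB02 requires further observation of the daytime drops. The site should determine whether they are caused by traffic being steered to a single core or by high traffic-processing latency.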

!image-2024-05-16-09-43-59-546.png|width=278,height=350!!image-2024-05-16-09-43-25-976.png|width=307,height=356!

!image-2024-05-16-09-45-17-408.png|width=287,height=300!!image-2024-05-16-09-45-36-112.png|width=302,height=327!

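  • On 5.14, around 18:20, all affected devices at the Bole-IGW site dropped packets at a rate of 10~15 Kpps.
  • The drop volume and onset time are fairly consistent across devices, and monitoring shows no surge in new TCP sessions.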

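Conclusion: the drops are presumed to be caused by abnormal UDP traffic. We plan to add UDP session metrics to the 2307 monitoring dashboard to confirm whether abnormal traffic was present at that time.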
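The proposed UDP session metric can be approximated offline from a packet capture. The following is a minimal sketch, assuming UDP packets are available as `(timestamp, src_ip, src_port, dst_ip, dst_port)` tuples; the function names and the tuple layout are hypothetical and are not taken from the dashboard or from SAPP.

```python
from collections import Counter


def new_udp_sessions_per_second(packets):
    """Count first-seen UDP 4-tuples per one-second bucket.

    `packets` is an iterable of
    (timestamp, src_ip, src_port, dst_ip, dst_port) tuples, in any order.
    """
    first_seen = {}
    for ts, src_ip, src_port, dst_ip, dst_port in packets:
        key = (src_ip, src_port, dst_ip, dst_port)
        # Keep only the earliest timestamp seen for each session key.
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    return Counter(int(ts) for ts in first_seen.values())


def surge_seconds(per_second, threshold):
    """Return the seconds in which new sessions exceeded `threshold`."""
    return sorted(sec for sec, n in per_second.items() if n > threshold)
```

Comparing the seconds returned by `surge_seconds` with the times of the rxdrop alerts would show whether the drops coincide with bursts of new UDP sessions.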

!image-2024-05-16-09-40-51-957.png|width=296,height=345!!image-2024-05-16-09-40-25-601.png|width=271,height=352!!image-2024-05-16-09-41-26-969.png|width=280,height=351!


liuxueli commented on 2024-05-16T21:37:21.940+0800:

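  • At 2024/5/16 21:30:00 Beijing time, BOL-IGW-T9K002-NPB02 was dropping packets continuously. Logging in to the device showed that the threads dropping packets had a high rate of UDP flow creation/deletion (20000~35000 per second). The SAPP parameters were adjusted to limit the UDP flow creation and eviction rates to 5000 per second each. Observation continues.
    ** [^20240516213221.BOL-IGW-T9K002-NPB02.sysinfo.log.txt]
    ** !20240516213221.BOL-IGW-T9K002-NPB02.png!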

liuxueli commented on 2024-05-17T10:37:36.454+0800:

  • At 2024/5/16 22:00:00 Beijing time, packets were captured on BOL-IGW-T9K002-NPB02. Analysis of the capture found DNS packets with a fixed 3-tuple but an incrementing source port (the packets conform to the DNS format). A DNS flood is suspected.
    ** Packets:
    *** NAS: E21_pcap/Bole-IGW02-NPB02
    ** ip.addr==196.188.52.10 && ip.addr==208.87.242.217 && udp.port==53
    *** [^dns.query.196.188.52.10-208.87.242.217.53.pcap]
    ** ip.addr==108.181.2.147 && ip.addr==213.55.125.42 && udp.port==53
    *** [^dns.query.213.55.125.42-108.181.2.147.53.pcap]
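The pattern described above (a fixed source IP, destination IP, and destination port with a steadily incrementing source port) can be checked mechanically. The following is a minimal sketch, assuming packets have already been extracted from the pcap as `(src_ip, src_port, dst_ip, dst_port)` tuples in capture order; `find_port_sweeps` and its thresholds are hypothetical and were not part of the original analysis.

```python
from collections import defaultdict


def find_port_sweeps(packets, min_packets=100, min_increasing_ratio=0.9):
    """Find 3-tuples whose source port mostly increases packet to packet.

    `packets` is an iterable of (src_ip, src_port, dst_ip, dst_port)
    tuples in capture order. Returns a list of
    ((src_ip, dst_ip, dst_port), packet_count, increasing_ratio).
    """
    ports_by_triple = defaultdict(list)
    for src_ip, src_port, dst_ip, dst_port in packets:
        ports_by_triple[(src_ip, dst_ip, dst_port)].append(src_port)

    suspects = []
    for triple, ports in ports_by_triple.items():
        # At least two packets are needed to compare consecutive ports.
        if len(ports) < max(min_packets, 2):
            continue
        # Steps where the port wraps around count as non-increasing.
        increasing = sum(1 for a, b in zip(ports, ports[1:]) if b > a)
        ratio = increasing / (len(ports) - 1)
        if ratio >= min_increasing_ratio:
            suspects.append((triple, len(ports), ratio))
    return suspects
```

Ordinary resolvers randomize source ports, so a ratio close to 1.0 over many packets points to generated traffic rather than normal client queries.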

liuxueli commented on 2024-05-17T14:52:24.760+0800:

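  • At 2024/5/17 01:15:00~01:25:00 Beijing time, BOL-IGW-T9K002-NPB02 showed rxdrop packet loss with a peak of about 9 Kpps. The SAPP parameters were adjusted further to limit the UDP flow creation and eviction rates to 3000 per second each. Observation continues.
    ** max_opening_per_sec=3000
    ** max_timeouts_per_sec=3000
    ** !image-2024-05-17-14-52-13-953.png|width=1297,height=682!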
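To illustrate what a per-second cap such as `max_opening_per_sec=3000` implies, the sketch below counts events in fixed one-second windows and refuses anything beyond the cap. The ticket does not describe how SAPP enforces the limit, so the fixed-window approach and the `PerSecondLimiter` class are assumptions made purely for illustration.

```python
class PerSecondLimiter:
    """Allow at most `max_per_sec` events in each one-second window."""

    def __init__(self, max_per_sec):
        self.max_per_sec = max_per_sec
        self.window = None  # integer second currently being counted
        self.count = 0

    def allow(self, now):
        """Return True if an event at time `now` (seconds) is within the cap."""
        second = int(now)
        if second != self.window:
            # A new one-second window starts, so reset the counter.
            self.window = second
            self.count = 0
        if self.count >= self.max_per_sec:
            return False
        self.count += 1
        return True
```

Under this model, 35000 flow-creation attempts arriving within one second result in 3000 accepted flows and 32000 refusals, which bounds the flow-table work each thread performs per second.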


yangwei commented on 2024-05-17T15:19:07.001+0800:

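  • The two server IPs in the packets returned from the site, 208.87.242.217 and 108.181.2.147, both belong to ASN "AS40676 Psychz Networks", a company that provides CDN and DDoS mitigation services.
  • DNS queries sent to these servers received no response.

The DNS traffic is presumed to be either DoS attack traffic targeting this provider or DoS attack traffic being diverted to it.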
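The "no response" check can be reproduced with a plain UDP socket. The following is a minimal sketch, assuming a standard A-record query over UDP port 53; the ticket does not say which tool or query name was used, so `probe_dns`, the default name `example.com`, and the 3-second timeout are all hypothetical.

```python
import socket
import struct


def build_dns_query(name, query_id=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe_dns(server_ip, name="example.com", timeout=3.0):
    """Return True if `server_ip` answers a DNS query within `timeout`."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and unreachable errors
            return False
    # A valid reply echoes the query ID and has the QR (response) bit set.
    return len(data) >= 12 and data[:2] == query[:2] and bool(data[2] & 0x80)
```

A `False` result only shows that no reply arrived within the timeout. It does not distinguish a host that runs no DNS service from one that silently filters the query.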


Attachments

  • 1715775533880.jpg
  • 20240516213221.BOL-IGW-T9K002-NPB02.png
  • 20240516213221.BOL-IGW-T9K002-NPB02.sysinfo.log.txt
  • dns.query.196.188.52.10-208.87.242.217.53.pcap
  • dns.query.213.55.125.42-108.181.2.147.53.pcap
  • image-2024-05-16-09-38-50-887.png
  • image-2024-05-16-09-40-25-601.png
  • image-2024-05-16-09-40-51-957.png
  • image-2024-05-16-09-41-26-969.png
  • image-2024-05-16-09-43-25-976.png
  • image-2024-05-16-09-43-59-546.png
  • image-2024-05-16-09-45-17-408.png
  • image-2024-05-16-09-45-36-112.png
  • image-2024-05-17-14-52-13-953.png