# [XJ-NPM] Data anomalies in the CN tag monthly report

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1023 | 2023-09-13T11:17:26.000+0800 | 杨威 | Resolved |

---
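The CN tag monthly report for August has the following issues that need investigation:

1. IDC overall traffic: average latency was too high from the 16th to the 18th.
2. IDC business traffic: fixed-broadband average latency was too high for bytedance on the 17th and for alibaba on the 18th.

Initial investigation points to a few session log records with abnormal latency values, reaching upwards of a hundred million seconds, which inflate the average. Attachments and comments have been added.

---

**sunjiajia** commented on *2023-09-13T15:30:19.859+0800*:

As shown in the attached screenshots:

bytedance on the 17th has two abnormal records, with common_establish_latency_ms reaching 1.69227E+12.

Alibaba on the 18th has one abnormal record, with common_establish_latency_ms reaching 1.69233E+12.

---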
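To illustrate why a couple of such records are enough to distort a daily average, here is a minimal sketch. The 9,998 "ordinary" sessions at 50 ms are an assumed, hypothetical sample; only the two outlier values come from the comment above. It compares the mean with the median, which is much less sensitive to extreme values.

```python
from statistics import mean, median

# Hypothetical sample: 9,998 ordinary sessions at 50 ms (assumed value),
# plus the two bytedance outliers reported for the 17th.
normal_latencies_ms = [50.0] * 9998
outliers_ms = [1.69227e12, 1.69227e12]
samples = normal_latencies_ms + outliers_ms

# The mean is dominated by the two outliers; the median is not.
print(f"mean   = {mean(samples):,.0f} ms")
print(f"median = {median(samples):,.0f} ms")
```

Under these assumptions the mean lands in the hundreds of millions of milliseconds while the median stays at 50 ms, which matches the symptom described in the report.

---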
**liuyang** commented on *2023-09-14T10:56:04.924+0800*:

[~yangwei] Please look into the log records where common_establish_latency_ms reaches 1.69227E+12.

---
**yangwei** commented on *2023-10-24T12:02:50.435+0800*:

[~sunjiajia] Please query the session logs of the Xinjiang provincial gateway for the number of records with this kind of abnormal common_establish_latency_ms over the past 7 days, and export them.

---
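**sunjiajia** commented on *2023-10-24T13:05:32.712+0800*:

Checked the logs for the past 7 days: 42,215,875,949 records have latency <200 ms, 3,214,283,160 records have latency between 200 and 1000 ms, and 742,398,421 records have latency above 1000 ms. 37 records have latency above 1,000,000 ms, with a maximum of 23,410,417 ms. Details of the 37 records above 1,000,000 ms are in the attachment latency_logs.csv.

---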
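For a sense of scale, the counts reported above can be turned into shares of the total. This is only a back-of-the-envelope sketch: it assumes the three main buckets are disjoint and that the 37 extreme records are already counted inside the ">1000 ms" bucket, which the comment does not state explicitly.

```python
# Counts reported in the preceding comment (past 7 days of session logs).
under_200_ms = 42_215_875_949
from_200_to_1000_ms = 3_214_283_160
over_1000_ms = 742_398_421       # assumed to include the 37 extreme records
over_1_000_000_ms = 37

# Assumption: the three buckets above partition all records.
total = under_200_ms + from_200_to_1000_ms + over_1000_ms


def share(count: int) -> float:
    """Fraction of all records that fall into a bucket."""
    return count / total


print(f"total records : {total:,}")
print(f"<200 ms       : {share(under_200_ms):.4%}")
print(f"200-1000 ms   : {share(from_200_to_1000_ms):.4%}")
print(f">1000 ms      : {share(over_1000_ms):.4%}")
print(f">1,000,000 ms : {share(over_1_000_000_ms):.2e}")
```

The extreme records are a vanishingly small fraction of the traffic, so their effect on a report comes from their magnitude rather than their number.

---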
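**yangwei** commented on *2023-10-24T16:43:54.391+0800*:

As of 2023-10-24, the abnormal common_establish_latency_ms values fall into two cases:

Problem 1: values on the order of 1.69227E+12, seen 2 times on the 17th and 1 time on the 18th. **The magnitude is close to that of a timestamp**, so presumably the current time fetched while computing the establishment latency was 0 (or some small number), which produced the abnormal value.

Problem 2: on 10.23 a query of the provincial gateway session logs for the past 7 days found no records with values above 1.69E+12. Analysis of the 37 high-latency records (>1,000,000 ms) in the attachment shows:

* All sessions are unidirectional flows
* 25 records were output by 10.111.192.161, 5 by 10.111.192.161
  * For the 25 records from 10.111.192.161, the corresponding sessions all start and end between 10.18 12:45 and 13:15
  * The duration of every session is close to its common_establish_latency_ms value
  * Some sessions show retransmissions and packet loss
* Suspected cause: **the way the function side currently computes TCP session establishment latency (from the SYN to the first packet carrying payload) has a large error for unidirectional flows**

---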
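The observation that the Problem 1 values look like timestamps can be checked by reading them as Unix epoch milliseconds. The sketch below does only that; the helper name `as_epoch_ms` is illustrative, and the inputs are the rounded six-digit values quoted in the ticket, so the resulting times are approximate.

```python
from datetime import datetime, timedelta, timezone

# +0800, the UTC offset used by the timestamps in this ticket.
CST = timezone(timedelta(hours=8))


def as_epoch_ms(value_ms: float) -> datetime:
    """Interpret a millisecond value as a Unix epoch timestamp in +0800."""
    return datetime.fromtimestamp(value_ms / 1000, tz=CST)


# Rounded values reported for the abnormal records.
for label, value_ms in [("bytedance", 1.69227e12), ("Alibaba", 1.69233e12)]:
    print(label, as_epoch_ms(value_ms).isoformat())
```

Read this way, the two values fall on 17 and 18 August 2023, the same days on which the abnormal records were logged. That is consistent with a subtraction in which one of the two time operands was 0 or close to it, although it does not show which operand or why.

---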
**zhengchao** commented on *2023-10-26T17:54:04.504+0800*:

Has the buggy code been located?

---
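**yangwei** commented on *2023-10-27T11:11:30.254+0800*:

* Problem 2
  * The cause is unsound calculation logic. When determining the establishment latency of a unidirectional flow, the current implementation treats the connection as established only once the first TCP payload is seen, so the latency value can be large (hundreds to thousands of seconds).
* Problem 1
  * Suspected to be an anomaly caused by clock synchronization. The code logic should not produce latency values of Problem 1's magnitude, so clock synchronization is suspected of causing a wrong timestamp calculation. 孙佳佳 was asked to query the last 7 days of logs for all of Xinjiang and Problem 1 did not reproduce; it remains under observation.

---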
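The Problem 2 explanation can be pictured with a small model. This is a hypothetical sketch, not the product's code: the `Packet` type, the function name, and the sample capture are all assumptions made for illustration. It implements the rule as described in the ticket (latency measured from the SYN to the first packet carrying payload) and applies it to a one-direction capture where data arrives long after the handshake.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    ts_ms: int        # capture time in milliseconds
    syn: bool         # True if the SYN flag is set
    payload_len: int  # number of TCP payload bytes


def establish_latency_first_payload(packets: List[Packet]) -> Optional[int]:
    """SYN-to-first-payload latency, following the rule described in the ticket."""
    syn_ts = next((p.ts_ms for p in packets if p.syn), None)
    if syn_ts is None:
        return None  # no SYN observed, nothing to measure from
    data_ts = next(
        (p.ts_ms for p in packets if p.payload_len > 0 and p.ts_ms >= syn_ts),
        None,
    )
    return None if data_ts is None else data_ts - syn_ts


# Hypothetical one-direction capture: SYN, a bare ACK 30 ms later,
# then the first payload only after a long idle gap.
one_way = [
    Packet(ts_ms=0, syn=True, payload_len=0),
    Packet(ts_ms=30, syn=False, payload_len=0),
    Packet(ts_ms=1_500_000, syn=False, payload_len=120),
]
print(establish_latency_first_payload(one_way))
```

In this made-up capture the handshake packets are 30 ms apart, yet the rule reports 1,500,000 ms because it keeps waiting for payload. That is the kind of gap between "establishment latency" and session duration that the comment above describes for unidirectional flows.

---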
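**sunjiajia** commented on *2023-12-01T16:56:22.569+0800*:

The CN ingress/egress network quality report for November 28 shows abnormal Tencent TCP session latency; see the image CN日报-Tencent延迟异常.

A query of that day's logs suggests that a single latency value as high as 17 trillion ms pulled up the mean. The corresponding tsg log is in the attachment tencent_latency_logs.csv. The magnitude of this latency value is again close to that of its timestamp.

---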
**zhengchao** commented on *2024-11-19T16:53:25.882+0800*:

Issue closed due to no activity

---

# Attachments

Attachment: Alibaba_18.txt

[Alibaba_18.txt](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/44589/Alibaba_18.txt)

Attachment: Bytedance_17.txt

[Bytedance_17.txt](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/44590/Bytedance_17.txt)

Attachment: CN日报-Tencent延迟异常.png

Attachment: latency_logs.csv

[latency_logs.csv](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/46284/latency_logs.csv)

Attachment: screenshot-1.png

Attachment: screenshot-2.png

Attachment: tencent_latency_logs.csv

[tencent_latency_logs.csv](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/47612/tencent_latency_logs.csv)