2025-09-14 21:52:36 +00:00
# [E21 Site] Traffic Statistics in reports differ significantly from the actual statistics

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-598 | 2022-08-25T15:27:12.000+0800 | 戚岱杰 | Closed |

---

While reviewing the report data for the top IGW application statistics, we found that Traffic Statistics differ significantly from the actual statistics.

**qidaijie** commented on *2022-09-02T13:59:23.225+0800*:

Background on the report statistics:

* The data source for Traffic Statistics is TRAFFIC-METRICS.
* The data source for the report statistics is the raw SESSION-RECORD logs.

Preliminary conclusions:

* Querying each site's data directly in the database with SQL shows a calculated difference of 70 Gbps.
* Because of its large data volume, the IGW center cannot keep up during peak hours and loses logs.

!日志曲线.png|thumbnail!
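
A gap stated in Gbps, like the one in the preliminary conclusions, comes from converting byte totals over a time window into an average bit rate and subtracting. The sketch below shows only that conversion. The function names and the sample totals are hypothetical; they are not taken from the actual SQL queries or from site data.

```python
def avg_gbps(total_bytes: int, window_seconds: int) -> float:
    """Average bit rate in Gbps for a byte total observed over a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return total_bytes * 8 / window_seconds / 1e9


def gap_gbps(metrics_bytes: int, session_bytes: int, window_seconds: int) -> float:
    """Gap between two sources (e.g. TRAFFIC-METRICS minus SESSION-RECORD), in Gbps."""
    return avg_gbps(metrics_bytes, window_seconds) - avg_gbps(session_bytes, window_seconds)


# Hypothetical one-hour byte totals, chosen only to keep the arithmetic easy to follow.
HOUR = 3600
example_gap = gap_gbps(
    metrics_bytes=45 * 10**12,    # averages 100 Gbps over one hour
    session_bytes=405 * 10**11,   # averages 90 Gbps over one hour
    window_seconds=HOUR,
)
print(f"{example_gap:.1f} Gbps")
```

Both sources must be summed over the same window and the same set of sites for the subtraction to be meaningful.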

---

**qidaijie** commented on *2022-09-23T16:54:45.517+0800*:

Kafka at the national center is currently malfunctioning, so we cannot determine with certainty whether there is a problem here. A detailed investigation will follow once the Kafka issue is fixed.

---

**qidaijie** commented on *2022-11-03T17:23:56.890+0800*:

After the national center issue was fixed, log aggregation returned to normal, which rules that issue out as a factor in the statistics.

Follow-up investigation:

# Data for the three full days 2022-10-26 to 2022-10-28 was compared with data for 00:00-02:30 (the low-traffic period).

!image-2022-11-01-18-25-00-630.png|thumbnail!

As confirmed with [~liuxueli], the gap between Traffic Metrics and Session Metrics during the early-morning low-traffic period is normal.

---

**qidaijie** commented on *2022-11-03T17:44:53.063+0800*:

Current status:

The monitoring charts in [^TSG-9140 Status (Device Groups).html] show that some sub-centers (BOL-IGW|MWV-IGW|LGH-PE) fail to write roughly 10%~15% of logs to Kafka during traffic peaks.

!image-2022-11-01-18-21-12-894.png|thumbnail!

Details:

# The evidence for log loss at the sub-centers currently comes from NZ monitoring on the functional side; the sub-center Kafka reports no related errors, so the cause of the write failures on the functional side cannot be confirmed.
** We also learned how the functional side handles log-sending failures: if sending to Kafka raises an exception, 50% of the current logs are dropped.
** A rough estimate from monitoring: at peak, BOL-IGW and MWV-IGW each reach about 160,000 session logs/s and LGH-PE about 70,000 logs/s. After the update, a single log averages 2KB~2.7KB, so peak IO is estimated at 320MB/s ~ 432MB/s, which already exceeds the processing capacity of a single node.
# Disk IO usage at the three sub-centers:

!image-2022-11-03-17-57-53-460.png|thumbnail! !image-2022-11-03-17-45-34-816.png|thumbnail! !image-2022-11-03-17-45-17-233.png|thumbnail!

# BOL-IGW and MWV-IGW show disk reads, which means Flink processing lags somewhat; this also affects Kafka processing performance.
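
The peak IO estimate above is log rate multiplied by average log size. A minimal sketch of that arithmetic follows, using the figures quoted in the comment and assuming decimal units (1 MB = 1000 KB); the function name is hypothetical.

```python
def peak_io_mb_per_s(logs_per_second: int, avg_log_kb: float) -> float:
    """Estimated write rate in MB/s, using decimal units (1 MB = 1000 KB)."""
    return logs_per_second * avg_log_kb / 1000


# Figures quoted in the comment: about 160,000 logs/s at 2 KB to 2.7 KB per log.
PEAK_LOGS_PER_S = 160_000
low = peak_io_mb_per_s(PEAK_LOGS_PER_S, 2.0)
high = peak_io_mb_per_s(PEAK_LOGS_PER_S, 2.7)
print(f"{low:.0f} MB/s ~ {high:.0f} MB/s")
```

This covers only the incoming write stream; the disk reads noted for BOL-IGW and MWV-IGW add to the total IO load.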
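
Recommendations:

* LGH-PE: the session log Topic currently has 10 partitions, which is not enough for the current log volume; increase the Kafka partitions to 20 to raise processing capacity.
* BOL-IGW and MWV-IGW:
** Change the ack for log aggregation to the national center to 0, i.e. do not verify that the data was received, to improve processing performance. The drawback is that data may be lost during aggregation.
** Stop writing some raw-log fields to Kafka to reduce disk IO. The investigation found these to be the larger fields:
*** common_link_info_s2c : 1.3TB (7%)
*** common_link_info_c2s : 9.6TB (5%)
*** common_app_id : 8TB (4.5%)
*** dns_rr : 4.1TB (2.3%)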
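
As a rough illustration of the field-trimming suggestion, the sketch below removes the listed fields from a log record and sums their quoted shares. The helper names and the sample record are hypothetical; which fields can really be dropped depends on what the downstream reports need.

```python
# Field names and percentage shares are the ones listed above.
LARGE_FIELD_SHARE_PCT = {
    "common_link_info_s2c": 7.0,
    "common_link_info_c2s": 5.0,
    "common_app_id": 4.5,
    "dns_rr": 2.3,
}


def strip_large_fields(record: dict) -> dict:
    """Return a copy of a log record without the large fields (hypothetical helper)."""
    return {key: value for key, value in record.items() if key not in LARGE_FIELD_SHARE_PCT}


def total_share_pct() -> float:
    """Combined share of the listed fields: an upper bound on the volume saved."""
    return sum(LARGE_FIELD_SHARE_PCT.values())


# Hypothetical record; real session logs carry many more fields.
sample = {"common_app_id": 1, "dns_rr": "example", "other_field": "kept"}
print(strip_large_fields(sample))
print(f"{total_share_pct():.1f}%")
```

Dropping all four fields would save at most the sum of their shares, a bit under one fifth of the stored volume.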

---

**qidaijie** commented on *2022-11-30T14:31:48.802+0800*:

Comparing monitoring before and after the change: log writes are more stable after the change, with 2.5K-4K more logs written on average.

!功能端日志情况记录-更新前.png|thumbnail! !功能端日志情况记录-更新后.png|thumbnail!

!更新前后会话日志情况.png|thumbnail!

However, IO utilization at this site still routinely stays around 95% during peak hours.

!MWV-IGW IO使用率.png|thumbnail!

More logs are written after the change, but because of the IO situation a small portion is still lost.
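
---

**qidaijie** commented on *2023-02-27T11:44:44.927+0800*:

The main cause of the bug above is log data loss in the Data Transporter for the MWV-IGW/SSM-IGW/BOL-IGW/LGH-PE Device Groups.

Based on recent test results, the Data Transporter can be optimized in the following areas:

# Change the Kafka configuration to disable the flush-to-disk policy and reduce how often data is flushed; this can raise throughput by about 30% over the current level.
## Producers can raise the batch.size/linger.ms settings to increase the batch size and reduce the number of requests.
# Enable Snappy compression on producers; compressed data is 70% smaller, which can raise throughput by roughly 2x over the current level.
## The cost is producer CPU usage when sending data, which increases by 30%.
## Compression reduces the impact of disk reads on overall IO and protects write capacity.
# SSL requests for log aggregation to the national center have high response latency, 1.3 times more than SASL; the ETL program currently accumulates a backlog and loses data on wait timeouts. Optimizations for the ETL program:
## Raise the producer batch.size/linger.ms/request.timeout.ms thresholds.
## Disable producer.ack so the producer no longer waits for acknowledgements from the national center Kafka.
## Disable the flush-to-disk policy on the national center Kafka; this requires a cluster restart, so it will be done together with the upgrade to version 22.11.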
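
The producer-side items above correspond to standard Kafka producer properties. The sketch below collects them in one place and renders them as a properties file. The property names are real Kafka producer settings, but every value is an illustrative assumption and not the setting used on site; `to_properties` is a hypothetical helper.

```python
# Kafka producer property names are standard; all values are assumed for illustration.
PRODUCER_OVERRIDES = {
    "compression.type": "snappy",  # item 2: compress batches before sending
    "batch.size": 262144,          # items 1 and 3: larger batches, in bytes (assumed value)
    "linger.ms": 50,               # items 1 and 3: wait briefly to fill a batch (assumed value)
    "request.timeout.ms": 60000,   # item 3: tolerate slow SSL responses (assumed value)
    "acks": "0",                   # item 3: do not wait for broker acknowledgement
}


def to_properties(overrides: dict) -> str:
    """Render the overrides in Java .properties format, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(overrides.items()))


print(to_properties(PRODUCER_OVERRIDES))
```

With `acks=0` the producer cannot detect lost records, so this trades delivery guarantees for throughput, as the earlier recommendation already notes.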

Optimizations 1 and 2 need help from [~liuxueli] to test how increasing the batch size and enabling compression affect the functional side.

---

**qidaijie** commented on *2023-03-27T10:03:35.039+0800*:

Based on the joint test results with the functional side (TSG-14382), compression will be enabled when upgrading to version 22.11. [~liuxueli]

---

**doufenghu** commented on *2023-03-29T11:17:18.960+0800*:

To raise throughput for large messages (2KB on average), the E site is upgrading to 22.11 with Snappy compression enabled on the Kafka Producers, including:

* The functional-side Kafka Producer writing to the Data Transporter Kafka
* The OLAP Data Transporter Kafka Producer sending data back to the national center Kafka

{quote}Enable Snappy compression on producers; compressed data is 70% smaller, which can raise throughput by roughly 2x over the current level. The cost is producer CPU usage when sending data, which increases by 30%. Compression reduces the impact of disk reads on overall IO and protects write capacity.{quote}
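
To put the quoted 70% size reduction next to the peak IO range estimated earlier in this issue (320MB/s ~ 432MB/s), the sketch below computes the resulting on-disk write rate. It is simple arithmetic under the assumption that the reduction applies uniformly to all logs; the function name is hypothetical.

```python
def compressed_rate_mb_per_s(raw_mb_per_s: float, size_reduction: float) -> float:
    """Disk write rate after compression, given the fraction of size removed."""
    if not 0 <= size_reduction < 1:
        raise ValueError("size_reduction must be in [0, 1)")
    return raw_mb_per_s * (1 - size_reduction)


REDUCTION = 0.70  # "compressed data is 70% smaller", from the quote
for raw in (320, 432):  # peak IO range estimated earlier in this issue
    on_disk = compressed_rate_mb_per_s(raw, REDUCTION)
    print(f"{raw} MB/s raw -> {on_disk:.1f} MB/s on disk")
```

By byte volume alone the reduction would allow more than the quoted 2x gain; the measured figure presumably also reflects CPU and request overheads that this arithmetic ignores.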

---

**qidaijie** commented on *2023-04-14T11:21:27.177+0800*:

After upgrading to version 22.11 and enabling the data compression optimization, daily traffic from 2023-04-07 to 2023-04-13 was compared (based on the daily traffic report produced by the site owner), as follows:

Traffic Statistics-Total Bytes Transferred vs. Report-Total Bytes:

!image-2023-04-14-11-23-23-628.png|width=569,height=259!

Traffic Statistics-Total Packets Transferred vs. Report-Total Packets:

!image-2023-04-14-11-52-59-810.png|width=569,height=259!

---

**qidaijie** commented on *2023-07-03T15:38:40.458+0800*:

Traffic from 2023-06-21 to 2023-06-30 was compared, based on the daily traffic report produced by the site owner (report name: Total Blocked Applications Traffic Report, report ID: 313).

Results:

Traffic Statistics-Total Bytes Transferred vs. Report-Total Bytes:

!image-2023-07-03-15-39-04-652.png|width=486,height=250!

Traffic Statistics-Total Packets Transferred vs. Report-Total Packets:

!image-2023-07-03-15-39-57-113.png|width=492,height=253!

---
# Attachments

Attachment: Device+Group+traffic2022-08-21-101810_2022-08-22-101810.xlsx

[Device+Group+traffic2022-08-21-101810_2022-08-22-101810.xlsx](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30794/Device+Group+traffic2022-08-21-101810_2022-08-22-101810.xlsx)

Attachment: image-2022-11-01-18-21-12-894.png

Attachment: image-2022-11-01-18-25-00-630.png

Attachment: image-2022-11-03-17-45-17-233.png

Attachment: image-2022-11-03-17-45-34-816.png

Attachment: image-2022-11-03-17-57-53-460.png

Attachment: image-2023-04-14-11-23-23-628.png

Attachment: image-2023-04-14-11-52-59-810.png

Attachment: image-2023-07-03-15-39-04-652.png

Attachment: image-2023-07-03-15-39-20-544.png

Attachment: image-2023-07-03-15-39-57-113.png

Attachment: MWV-IGW+IO使用率.png

Attachment: Top+100+Applications.pdf

[Top+100+Applications.pdf](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/30795/Top+100+Applications.pdf)

Attachment: TSG-9140+Status+(Device+Groups).html

[TSG-9140+Status+(Device+Groups).html](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/32554/TSG-9140+Status+(Device+Groups).html)

Attachment: 更新前后会话日志情况.png

Attachment: 功能端日志情况记录-更新后.png

Attachment: 功能端日志情况记录-更新前.png

Attachment: 日志曲线.png

Attachment: 微信图片_20220825102505.png