# Fujian project: functional nodes drop a large number of logs when sending

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-513 | 2022-06-08T09:22:56.000+0800 | 戚岱杰 | Closed |

---
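
At 14:00 on 2022-06-07, after the user enabled a port Monitor policy, some functional nodes began dropping a large number of logs.

On the evening of 2022-06-07 the policy was disabled, and the functional nodes stopped dropping logs.

During the day on 2022-06-08 the policy was enabled again so that the log loss could be investigated.

The figure below shows the log-sending statistics for each functional node, collected on the afternoon of 2022-06-07:

![screenshot-1.png](28451/screenshot-1.png)

Notes:

* The Fuzhou OLAP and Quanzhou OLAP sites currently have 3 Kafka servers each.
* While the functional nodes were dropping logs, server NIC bandwidth was about 7~9 MB/s.

**liuxueli** commented on *2022-06-08T09:29:26.376+0800*: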

* According to feedback from 杨阳: there are three Kafka servers in total, and their bandwidth is 88 MB, 74 MB and 104 MB respectively. We suspect this has already reached the servers' disk write speed.
* Insufficient Kafka broker performance causes the Kafka client on the functional nodes to report `_QUEUE_FULL(Local: Queue full)`; see <https://github.com/confluentinc/confluent-kafka-go/issues/346>
  * ![image-2022-06-08-09-33-57-327.png](28453/image-2022-06-08-09-33-57-327.png)

```text
Tue Jun 7 18:35:58 2022, INFO, ./tsglog/tsglog, TSG_SEND_LOG, tsg_send_log to kafka is error of _QUEUE_FULL(Local: Queue full), status: -1, topic: SECURITY-EVENT payload:
Tue Jun 7 18:35:58 2022, INFO, ./tsglog/tsglog, TSG_SEND_LOG, tsg_send_log to kafka is error of _QUEUE_FULL(Local: Queue full), status: -1, topic: SECURITY-EVENT payload:
```
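
The suspected chain above (slow broker, local producer queue fills up, the send fails with `_QUEUE_FULL`, the log is lost) can be illustrated with a toy model. This is only a sketch and is not librdkafka or tsglog code: `simulate_producer`, its parameters and the numbers used are all hypothetical, chosen to show the difference between a broker that keeps up and one that does not. It assumes the caller discards a message when the enqueue fails, which is what the log lines above suggest but do not prove.

```python
from collections import deque


def simulate_producer(produce_per_tick, drain_per_tick, queue_limit, ticks):
    """Toy model of a bounded local producer queue (hypothetical, not librdkafka).

    Each tick the application tries to enqueue `produce_per_tick` messages and
    the broker side acknowledges at most `drain_per_tick` of them. A message
    that finds the queue full is counted as dropped, mirroring a send that
    fails with a queue-full error and is then discarded by the caller.
    """
    queue = deque()
    sent = dropped = 0
    for tick in range(ticks):
        for seq in range(produce_per_tick):
            if len(queue) >= queue_limit:
                dropped += 1  # local queue full: this log is lost
            else:
                queue.append((tick, seq))
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()  # broker acknowledged one message
            sent += 1
    return {"sent": sent, "dropped": dropped, "queued": len(queue)}


if __name__ == "__main__":
    # Made-up rates: the broker either keeps up with the producer or not.
    print("fast broker:", simulate_producer(100, 120, 500, 50))
    print("slow broker:", simulate_producer(100, 60, 500, 50))
```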

---

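**zhangzhihan** commented on *2022-06-08T16:56:37.690+0800*:

On the three Quanzhou OLAP servers where Kafka is deployed, disk IO utilization already reaches 100% at a write rate of 100 MB/s, which is why the functional nodes drop logs.

After Kafka was finally switched from HDD to SSD, writes returned to normal, the high-IO problem did not recur, and the functional nodes no longer dropped logs.

We will continue to observe for a while.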
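
For the follow-up observation, the write rate and IO utilization quoted above can be derived from two samples of Linux `/proc/diskstats`. The sketch below is one possible way to do that and is not the tool used on site; the helper names `parse_diskstats` and `write_load` are hypothetical. It assumes the standard field layout (writes completed, sectors written and milliseconds spent doing I/O in the 8th, 10th and 13th columns) and 512-byte sectors. To use it, read `/proc/diskstats` twice, a known number of seconds apart, and pass both texts together with the device name.

```python
def parse_diskstats(text, device):
    """Return (writes_completed, sectors_written, io_ms) for one device."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[2] == device:
            return int(fields[7]), int(fields[9]), int(fields[12])
    raise KeyError(f"device {device!r} not found")


def write_load(before, after, device, interval_s):
    """Compute write rate and utilization between two /proc/diskstats samples."""
    w0, s0, t0 = parse_diskstats(before, device)
    w1, s1, t1 = parse_diskstats(after, device)
    return {
        "writes_per_s": (w1 - w0) / interval_s,
        # /proc/diskstats counts sectors in 512-byte units.
        "write_mb_per_s": (s1 - s0) * 512 / 1e6 / interval_s,
        # Share of wall-clock time during which the device had I/O in flight.
        "util_pct": 100.0 * (t1 - t0) / (interval_s * 1000),
    }
```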

---

**doufenghu** commented on *2022-06-08T17:13:24.504+0800*:

The problem of IO reaching 100% at 100 MB/s needs to be tracked. [~qidaijie]

---
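
**qidaijie** commented on *2022-06-08T17:32:50.466+0800*:

On-site findings:

1. On the three co-located servers, only Kafka uses the disks. With HDD, the number of writes per second was around 110, which rules out a large number of small files.

   ![FJ-Kafka-IO.png](28502/FJ-Kafka-IO.png)

2. After Kafka was moved to SSD, with the total volume of logs written and the Kafka configuration unchanged, utilization is around 20%.
3. The HDD IO performance measured in testing does not match what is seen in practice; the HDD array needs to be analyzed and its IO retested.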
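
The reasoning in point 1 can be checked with a line of arithmetic. The sketch below assumes that the roughly 110 writes per second were observed at about the 100 MB/s write rate reported earlier in this thread; the two figures come from different comments, so pairing them is an assumption. Under that assumption each write carries close to 1 MB, which points to large sequential writes rather than many small files. The function name `average_write_size_mb` is hypothetical.

```python
def average_write_size_mb(throughput_mb_per_s, writes_per_s):
    """Average payload per write operation, in MB."""
    return throughput_mb_per_s / writes_per_s


# Figures taken from the comments in this issue (assumed to be simultaneous).
avg = average_write_size_mb(100, 110)
print(f"{avg:.2f} MB per write")  # prints "0.91 MB per write"
```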

---

## Attachments

**28502/FJ-Kafka-IO.png**

---

**28453/image-2022-06-08-09-33-57-327.png**

---

**28451/screenshot-1.png**

---