2025-09-14 21:52:36 +00:00

# Slow session record log queries at site K (version 21.09)

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-306 | 2021-12-23T13:43:37.000+0800 | 窦凤虎 | Closed |

---

November 2021, reported by [~jiaojianzhi]:

When querying session records in the TSG UI, the bar chart on the page refreshes, but the log entries below it fail to load.

See the attached video.

**liuyang** commented on *2021-12-23T13:45:28.710+0800*:

Feedback from [~doufenghu]: troubleshoot step by step according to the document.

---

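**jiaojianzhi** commented on *2021-12-23T23:30:56.272+0800*:

Troubleshooting followed the document 《ck查询慢问题排查方案》 (CK slow-query troubleshooting plan).

First, the second command in Part 1 of the document was used.

It showed unstable network connections on three ClickHouse servers, 10.4.61.46, .55 and .56, which were suspected to be the cause. Further checks found that other servers had logged similar errors as well.

For each machine, 200 ICMP packets were pinged in both directions against two servers (61.38 and 61.39), with no packet loss or errors.

A large-file transfer was also tested: a 20 GB file was created with `dd` and transferred without problems.

The network was therefore ruled out.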
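
The ping check described above could be scripted roughly as follows. This is a minimal sketch, assuming Linux iputils `ping`, whose summary line has the form `N packets transmitted, M received, X% packet loss`. The function names `parse_ping_summary` and `ping_loss` are illustrative and do not come from the ticket or the troubleshooting document.

```python
import re
import subprocess
from typing import Tuple

# Matches the summary line printed by Linux iputils ping (assumed format).
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)


def parse_ping_summary(output: str) -> Tuple[int, int, float]:
    """Return (sent, received, loss_percent) parsed from ping output."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def ping_loss(host: str, count: int = 200) -> float:
    """Send `count` ICMP echo requests to `host` and return the loss percentage."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return parse_ping_summary(result.stdout)[2]
```

Running `ping_loss` from each node toward its peers and comparing the results against zero would reproduce the manual check with 200 packets.
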

Next, the commands in Part 2 were used.

Queries on server 61.39 were found to be much slower than on 61.38.

The following parameters were changed on all ClickHouse servers (see gohangout日志查询v2 for details):

!image-2021-12-23-21-27-06-607.png|width=348,height=164!

After the change, querying the December logs returned in under one second and the feature worked normally again.

---

**jiaojianzhi** commented on *2021-12-30T23:27:22.754+0800*:

Dec 30, morning

61.4 may be faulty; its logs and related configuration have been sent to 王宽.

Dec 30, afternoon

Collected the configuration and logs from three machines: 61.38, 61.39 and 35.31.

Restarting all ClickHouse services resolved the problem.

---

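**wangkuan** commented on *2021-12-31T17:22:42.593+0800*:

21.12.30

The main troubleshooting steps were as follows. For the detailed steps, see 《日志慢查询排查复盘文档21.12.30》.

1. Check the server connection count: normal.
2. Check the server network status: normal, with a small number of connections in the close_wait state.
3. Check the configuration files of the CK query nodes and data nodes: normal, except that two parameters were not in sync.
4. Change the log level and check the SQL query time on each data node: many data nodes showed connection timeouts.
5. Check the status of the servers where login timed out: normal.
6. Log in to the CK command line on the timed-out servers: timed out.
7. Check the CK metrics on the timed-out servers: normal.
8. Check the CK logs on the timed-out servers: entries for failed network connections were present.
9. Restart all CK data nodes.
10. Queries returned to normal.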
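
Step 2 above looks at how many connections sit in each TCP state. A minimal sketch of that tally, assuming the text comes from `ss -tan` (whose first column is the state, with one header row); the function name `count_tcp_states` is illustrative and not part of the original troubleshooting document.

```python
from collections import Counter


def count_tcp_states(ss_output: str) -> Counter:
    """Count sockets per TCP state in the text printed by `ss -tan`."""
    states: Counter = Counter()
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            states[fields[0]] += 1  # first column is the state
    return states
```

A persistently growing `CLOSE-WAIT` count on a node would be worth comparing against the nodes that answer normally.
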
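
Follow-up troubleshooting

Detailed troubleshooting steps will be provided.

1. Check whether logs load slowly in the UI.
2. Check how quickly logins to the CK data nodes complete.
3. If slow queries still occur, stop gohangout writes and check whether they persist, to determine whether the slow queries are related to gohangout.

Summary

At this point the problem can largely be attributed to CK. Until it is reproduced and the specific cause is located, a watchdog or heartbeat check should be added for the CK nodes so that a node is restarted automatically when its connections time out.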
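
The heartbeat idea in the summary could look roughly like the sketch below. It assumes the ClickHouse HTTP interface is reachable on its default port 8123 and uses its `/ping` endpoint as the probe. The `restart` callback, the failure threshold and the interval are hypothetical placeholders; how a node is actually restarted is site-specific and is not described in this ticket.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def node_is_alive(host: str, port: int = 8123, timeout: float = 5.0) -> bool:
    """Return True if the node's HTTP /ping endpoint answers within `timeout` seconds."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/ping", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):  # refused, unreachable, timed out
        return False


def watch(
    host: str,
    restart: Callable[[str], None],
    probe: Callable[[str], bool] = node_is_alive,
    max_failures: int = 3,
    interval: float = 30.0,
    rounds: Optional[int] = None,
) -> int:
    """Probe `host` repeatedly and call `restart(host)` after `max_failures`
    consecutive failed probes. Returns the number of restarts triggered.
    `rounds=None` loops forever; a finite value is useful for testing.
    """
    failures = restarts = done = 0
    while rounds is None or done < rounds:
        if probe(host):
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart(host)  # hypothetical, site-specific restart action
                restarts += 1
                failures = 0
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return restarts
```

Requiring several consecutive failures before restarting is a design choice to avoid restarting a node over a single transient timeout.
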

---

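**doufenghu** commented on *2022-04-02T11:44:58.538+0800*:

The issue above was reproduced at Xinjiang Telecom; the corresponding bug is https://jira.geedge.net/browse/OMPUB-422.

* Analysis
  * Analysis and troubleshooting indicate that this is a bug in ClickHouse itself: after running continuously for a long time (more than 6 months in the cases seen so far), `clickhouse-client` can no longer log in to the local ClickHouse instance and HTTP SQL queries time out.
  * The community describes the problem as related to thread-pool connections and has not been able to reproduce it reliably; see [https://github.com/ClickHouse/ClickHouse/issues/19409].
  * An issue was filed with the community: [https://github.com/ClickHouse/ClickHouse/issues/35606]. Because of the ClickHouse version in use (v20.3), the maintainers replied that outdated versions are not maintained, so the exact cause of the failure could not be determined.
* Handling
  * Add monitoring and alerting for slow ClickHouse nodes (TSG 22.03) so that query timeouts are detected promptly.
  * Upgrade ClickHouse from v20.3 to v21.8.13.1 and keep tracking the problem afterwards. (PoC testing is in progress; the upgrade is expected in TSG 22.06.)
  * The current workaround is to restart each ClickHouse node to recover; see [https://docs.geedge.net/display/PDG/Deployment+Specifiction].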
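
As an illustration of the slow-node monitoring listed above (not the TSG 22.03 implementation, which this ticket does not describe), the sketch below times a trivial `SELECT 1` over the ClickHouse HTTP interface on each node and reports nodes that fail or exceed a threshold. The default port 8123, passwordless access, the 1-second threshold and both function names are assumptions made for the example.

```python
import http.client
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def query_latency(host: str, port: int = 8123, timeout: float = 10.0) -> Optional[float]:
    """Time `SELECT 1` over HTTP; return seconds, or None on timeout or failure."""
    url = f"http://{host}:{port}/?query=" + urllib.parse.quote("SELECT 1")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        return None
    return time.monotonic() - start


def find_slow_nodes(
    hosts: Iterable[str],
    threshold: float = 1.0,
    measure: Callable[[str], Optional[float]] = query_latency,
) -> Dict[str, Optional[float]]:
    """Return {host: latency} for nodes that failed (None) or exceeded `threshold` seconds."""
    slow: Dict[str, Optional[float]] = {}
    for host in hosts:
        latency = measure(host)
        if latency is None or latency > threshold:
            slow[host] = latency
    return slow
```

A non-empty result from `find_slow_nodes` would be the condition that raises an alert; the alert channel itself is left out of the sketch.
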

---

# Attachments

Attachment: 77575209bf0f65d24830ad103cf00aba.mp4

[77575209bf0f65d24830ad103cf00aba.mp4](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/24130/77575209bf0f65d24830ad103cf00aba.mp4)

Attachment: ck查询慢问题排查方案21.6.7.docx

[ck查询慢问题排查方案21.6.7.docx](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/24131/ck查询慢问题排查方案21.6.7.docx)

Attachment: ck慢查询排查文档及检测脚本_22.01.05.zip

[ck慢查询排查文档及检测脚本_22.01.05.zip](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/24485/ck慢查询排查文档及检测脚本_22.01.05.zip)

Attachment: gohangout日志查询v2.txt

[gohangout日志查询v2.txt](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/24160/gohangout日志查询v2.txt)

Attachment: image-2021-12-23-21-27-06-607.png

Attachment: 会话日志查询慢排查20211124.docx

[会话日志查询慢排查20211124.docx](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/24132/会话日志查询慢排查20211124.docx)

Attachment: 日志慢查询排查复盘文档21.12.30.docx

[日志慢查询排查复盘文档21.12.30.docx](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/24397/日志慢查询排查复盘文档21.12.30.docx)