# [WMS-UTR Project] tsg_os_node_disk_pressure alert raised

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1258 | 2024-04-25T11:47:36.000+0800 | 付明卫 | Closed |

---
A tsg_os_node_disk_pressure alert was raised on April 24 (local time). The alert details and the corresponding monitoring data are attached.

---

**caoshanfeng** commented on *2024-04-26T09:59:28.209+0800*:

The tsg_os_node_disk_pressure alert was raised again on April 25 (local time). The alert details and monitoring data are attached: !4月25日告警.png|thumbnail! [^4月25日pcap-tsgx03.html]

---
**fu_mingwei** commented on *2024-04-26T19:11:15.874+0800*:

After 17:00 Beijing time on 2024/4/16, the SD-related pods on pcap-tsgx03 were found to be restarting. SD Redis details:

* All pods in the tsg-os-system namespace ([^tsg-os-system-pods.txt])
* SD pod logs ([^sd-redis-pod-logs.txt])
* SD pod `describe` output ([^sd-redis-describe-info.txt])
* SD Redis `INFO` output ([^sd-redis-cli-info.txt])
* RDB files in the SD Redis pod:
  * !sd-redis-rdb-size.png!
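
The `sd-redis-cli-info.txt` attachment is `redis-cli INFO` output. On a replica, the `master_link_status` and `master_sync_in_progress` fields indicate whether its sync with the master is healthy. The following is a minimal Python sketch for pulling those fields out of such a file. The helper names, the script name, and the "looks broken" heuristic are illustrative assumptions, not part of the SD tooling:

```python
import sys


def parse_redis_info(text: str) -> dict[str, str]:
    """Parse `redis-cli INFO` output into a flat {field: value} dict.

    Section headers such as "# Replication" and blank lines are skipped;
    values are kept as strings.
    """
    fields = {}
    for line in text.splitlines():  # also handles the \r\n line endings INFO uses
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replica_sync_looks_broken(fields: dict[str, str]) -> bool:
    """Heuristic (assumption): a replica whose master link is down or that is mid full sync."""
    if fields.get("role") != "slave":
        return False
    return (fields.get("master_link_status") == "down"
            or fields.get("master_sync_in_progress") == "1")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_info.py sd-redis-cli-info.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        info = parse_redis_info(f.read())
    for key in ("role", "master_link_status", "master_sync_in_progress"):
        print(f"{key}: {info.get(key, '<absent>')}")
    print("replica sync looks broken:", replica_sync_looks_broken(info))
```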

---

**fu_mingwei** commented on *2024-04-26T19:45:16.780+0800*:

Root cause analysis:

When OS SD Redis starts, the replica Redis syncs data from the master. If this sync fails because of a network problem, Redis leaves a temp RDB file behind. The failed sync also makes the SD Redis container's health probe fail, so the container is restarted. This cycle keeps repeating and more and more temp RDB files accumulate, until k3s reports DiskPressure. DiskPressure triggers pod eviction, and once the pod is evicted, the files it left behind are deleted.
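
The failure loop in the root cause analysis can be illustrated with a toy Python model. It assumes each failed sync leaves exactly one temp RDB file of a fixed size, and that DiskPressure is reported once free space drops below 10% of capacity (the upstream kubelet default for `nodefs.available`; the k3s nodes here may be configured differently). All sizes are made up:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a node's disk; all sizes are hypothetical, in GiB."""
    capacity_gib: float
    used_gib: float
    # Assumption: DiskPressure once free space < 10% of capacity (kubelet default).
    eviction_fraction: float = 0.10

    def disk_pressure(self) -> bool:
        return self.capacity_gib - self.used_gib < self.eviction_fraction * self.capacity_gib


@dataclass
class RedisPod:
    # The emptyDir volume lives as long as the pod, so container restarts keep its files.
    empty_dir_gib: list[float] = field(default_factory=list)


def failed_syncs_until_eviction(node: Node, temp_rdb_gib: float, limit: int = 10_000) -> int:
    """Count failed syncs (each followed by a container restart) until DiskPressure,
    then model the eviction that frees the pod's emptyDir."""
    pod = RedisPod()
    for attempt in range(1, limit + 1):
        # Assumption: each failed sync leaves one temp RDB of temp_rdb_gib behind.
        pod.empty_dir_gib.append(temp_rdb_gib)
        node.used_gib += temp_rdb_gib
        if node.disk_pressure():
            # Eviction deletes the pod and, with it, every temp file in its emptyDir.
            node.used_gib -= sum(pod.empty_dir_gib)
            pod.empty_dir_gib.clear()
            return attempt
    raise RuntimeError("no DiskPressure reached within the attempt limit")


if __name__ == "__main__":
    node = Node(capacity_gib=100, used_gib=41)
    print(failed_syncs_until_eviction(node, temp_rdb_gib=2))  # 25 with these made-up sizes
    print(node.used_gib)  # back to 41 after the modeled eviction
```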
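
Question 1:

Q: Why aren't the temp RDB files deleted when the SD Redis container restarts?

A: The SD Redis container stores its temp files in an emptyDir volume, and an emptyDir volume has the same lifetime as the pod, not the container. Restarting the Redis container therefore does not delete the temp RDB files; they are removed only when the pod itself is deleted, for example when it is evicted.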
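
Because a container restart leaves the emptyDir contents in place, the leftover files can be inspected from inside the running pod. The following is a minimal Python sketch that lists them. It assumes Redis's naming convention for in-progress RDB files, `temp-*.rdb`; the data directory path and script name in the usage comment are hypothetical:

```python
import sys
from pathlib import Path


def find_temp_rdb_files(data_dir: str) -> list[tuple[str, int]]:
    """Return (name, size_in_bytes) for leftover temp-*.rdb files, largest first.

    The finished snapshot (e.g. dump.rdb) does not match the pattern and is ignored.
    """
    found = []
    for path in Path(data_dir).glob("temp-*.rdb"):
        if path.is_file():
            found.append((path.name, path.stat().st_size))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python find_temp_rdb.py /data
    files = find_temp_rdb_files(sys.argv[1])
    for name, size in files:
        print(f"{size:>12}  {name}")
    total = sum(size for _, size in files)
    print(f"{len(files)} temp RDB file(s), {total} bytes total")
```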

Question 2:

Q: Why does the OS SD Redis data sync fail?

A: See https://jira.geedge.net/browse/TSG-19928 for the cause.

References:

* https://github.com/kubernetes/kubernetes/issues/117924
* https://github.com/kubernetes/kubernetes/issues/94861
* https://stackoverflow.com/questions/54824752/two-rdb-files-in-var-lib-redis-dir
---

**fu_mingwei** commented on *2024-04-26T19:47:05.608+0800*:

Fixed in: OS v24.02 and later.

This issue has the same root cause as https://github.com/kubernetes/kubernetes/issues/117924.

---
**caoshanfeng** commented on *2024-08-29T17:34:34.037+0800*:

Updated; no issues observed during follow-up monitoring.

---
## Attachments

**56244/4月24日告警信息.png**

---

**56293/4月25日pcap-tsgx03.html**

---

**56292/4月25日告警.png**

---

**56337/sd-redis-cli-info.txt**

---

**56336/sd-redis-describe-info.txt**

---

**56335/sd-redis-pod-logs.txt**

---

**56338/sd-redis-rdb-size.png**

---

**56334/tsg-os-system-pods.txt**

---

**56243/twa-tsgx05+(1).html**

---