# P19 environment: Prometheus WAL directory on NZ server 10.10.20.159 uses excessive disk space
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-967 | 2023-07-17T14:16:54.000+0800 | 史振东 | Closed |
---
!image-2023-07-17-11-16-07-018.png!
!image-2023-07-17-11-16-30-474.png!
**shizhendong** commented on *2023-07-18T16:01:03.467+0800*:
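Configuration: metric data is stored locally; federation is disabled; 10.159 and 20.159 run Global Nz-agent in a horizontally scaled deployment and write metric data to each other.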
Root cause: when Prometheus on 10.159 pushed data to 20.159 via remote write, it wrote dirty (invalid) data.
Resolution:

* Optimize the Prometheus remote write configuration
* Delete the WAL directory and restart the Prometheus service
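
Before deleting the WAL directory, and again after the restart, it can help to record how much disk space it actually occupies. The following is a minimal sketch, not part of the original ticket: the helper names `dir_size_bytes` and `format_size` are hypothetical, and the directory path is supplied by the caller because the ticket does not state where the WAL lives on this server.

```python
import os
import sys


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def format_size(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__" and len(sys.argv) > 1:
    # The WAL path is passed on the command line; no default is assumed.
    wal_dir = sys.argv[1]
    print(f"{wal_dir}: {format_size(dir_size_bytes(wal_dir))}")
```

Run it with the WAL directory as the only argument, once before the cleanup and once after the restart, to confirm that the directory no longer grows abnormally.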
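Investigation:

* Reviewed the Prometheus service logs and found a large number of "out of order sample" errors
* Suspected that the "out of order sample" errors were causing the sharp growth of data in the WAL directory
* Reproduced the P site deployment mode in a test environment and confirmed that the data sent by Prometheus remote write was incorrect, and that this was the cause of the WAL directory growth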
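
The first investigation step, scanning the service log for "out of order sample" errors, can be sketched roughly as below. This is an illustration only: the function name `count_out_of_order` is hypothetical, and the per-minute grouping assumes the log lines carry a logfmt-style `ts=<ISO timestamp>` field, which the ticket does not confirm. Lines without such a field are still counted in the total.

```python
import re
import sys
from collections import Counter

# Error text quoted in the ticket; matched case-insensitively.
PATTERN = "out of order sample"

# Assumption: logfmt-style timestamp such as ts=2023-07-17T03:16:07.018Z.
# Only the part up to the minute is captured, for per-minute grouping.
TS_RE = re.compile(r"\bts=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")


def count_out_of_order(lines):
    """Return (total matching lines, Counter of matches per minute)."""
    total = 0
    per_minute = Counter()
    for line in lines:
        if PATTERN in line.lower():
            total += 1
            match = TS_RE.search(line)
            if match:
                per_minute[match.group(1)] += 1
    return total, per_minute


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log file path is passed on the command line.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        total, per_minute = count_out_of_order(log)
    print(f"lines containing '{PATTERN}': {total}")
    for minute, count in sorted(per_minute.items()):
        print(f"{minute}  {count}")
```

A steadily high per-minute count would be consistent with the suspicion that rejected remote write samples coincide with the WAL growth, although the count alone does not prove causation.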
---
## Attachments
**41394/image-2023-07-17-11-16-07-018.png**
---
**41393/image-2023-07-17-11-16-30-474.png**
---
**41475/prometheus-20.159.log**
---