# P19 environment: Prometheus WAL directory on NZ server 10.10.20.159 uses too much disk space

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-967 | 2023-07-17T14:16:54.000+0800 | 史振东 | Closed |

---

!image-2023-07-17-11-16-07-018.png!

!image-2023-07-17-11-16-30-474.png!

**shizhendong** commented on *2023-07-18T16:01:03.467+0800*:

Configuration mode: metric data = local storage; federation disabled; 10.159 and 20.159 run Global Nz-agent in a horizontally scaled deployment and write metric data to each other.

Root cause: when Prometheus on 10.159 pushed data to 20.159 via remote write, it wrote dirty (invalid) data.

Resolution:

* Optimize the Prometheus remote write configuration
* Delete the WAL directory and restart the Prometheus service
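
Before and after deleting the WAL directory, it can help to record how much space it actually occupies. The following is a minimal sketch, not part of the original ticket: the function names `dir_size_bytes` and `human_readable` are illustrative, and the default path `/var/lib/prometheus/wal` is an assumption, since the ticket does not state where the Prometheus data directory lives on this server. Files that disappear mid-scan are skipped because the WAL is rewritten while Prometheus is running.

```python
import os
import sys


def dir_size_bytes(path: str) -> int:
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                # Skip symlinks so linked files are not counted.
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # A segment may be removed while we walk a live WAL.
                continue
    return total


def human_readable(n: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Hypothetical default; pass the real WAL path as the first argument.
    wal_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/prometheus/wal"
    print(wal_dir, human_readable(dir_size_bytes(wal_dir)))
```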
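
Troubleshooting process:

* Inspected the Prometheus service logs and found a large number of “out of order sample” errors
* Suspected that the “out of order sample” errors were causing the surge in WAL directory data
* Reproduced the P site deployment mode in a test environment and confirmed that the data sent by Prometheus remote write was incorrect, and that this was the cause of the surge in WAL directory data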
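
The log check in the first step can be sketched as a small script that counts how many log lines mention “out of order sample”. This is an illustrative sketch rather than the tooling used in the investigation: the function name `count_matches` is hypothetical, and the script makes no assumption about the log line format beyond a case-insensitive substring match. The log path is taken from the command line, for example the attached `prometheus-20.159.log`.

```python
import sys
from typing import Iterable

NEEDLE = "out of order sample"


def count_matches(lines: Iterable[str], needle: str = NEEDLE) -> int:
    """Count the lines that contain needle, ignoring case."""
    needle = needle.lower()
    return sum(1 for line in lines if needle in line.lower())


if __name__ == "__main__":
    # Usage: python count_ooo.py <path-to-prometheus-log>
    if len(sys.argv) > 1:
        # errors="replace" keeps the scan going if the log has odd bytes.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            print(count_matches(fh))
```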
---
# Attachments

Attachment: image-2023-07-17-11-16-07-018.png

Attachment: image-2023-07-17-11-16-30-474.png

Attachment: prometheus-20.159.log

[prometheus-20.159.log](https://gfwleak.exec.li/admin/geedge-jira/raw/branch/master/attachment/41475/prometheus-20.159.log)