geedge-jira/md/OMPUB-1207.md
2025-09-14 21:52:36 +00:00

# [WMS-UTR Project] tsg_os_container_restart alerts on tsgx devices at multiple sites
| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1207 | 2024-03-29T20:47:45.000+0800 | 杨威 | Closed |
---
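According to the alerts for March 28 (local time), msh-tsgx01, pcap-tsgx01, and twa-tsgx01 through twa-tsgx05 each raised one or more tsg_os_container_restart alerts, although no hotfix or other operation involving a restart was performed on the 28th. The attachment contains the alert messages for the 28th (local time).
---
**yangwei** commented on *2024-03-31T15:20:32.134+0800*: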
*Symptoms*
The tsg_os_container_restart alerts fall into two time windows:
1. 2024-03-28, 13:00 to 16:00 hours: Firewall container restart
* pcap-tsgx01 16:15:14
* twa-tsgx05 15:31:14
* msh-tsgx01 14:51:14
* twa-tsgx02 13:52:44
2. 2024-03-28 00:00-00:26
     twa-tsgx01 through twa-tsgx05 recorded a total of 18 restarts of the packetIO engine and Firewall container
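
The split into two windows can be reproduced from the alert export with a small script. The sketch below is illustrative only: it uses just the four alerts itemized above (the 18 alerts of window 2 are not listed individually in this ticket), and the names `ALERTS`, `WINDOWS`, `classify`, and `group_alerts` are hypothetical. The window boundaries are also an assumption; window 1 is extended to the end of the 16:00 hour so that the 16:15:14 alert falls inside it.

```python
from datetime import datetime, time

# Restart alerts itemized in this ticket (local time).
ALERTS = [
    ("pcap-tsgx01", "2024-03-28 16:15:14"),
    ("twa-tsgx05", "2024-03-28 15:31:14"),
    ("msh-tsgx01", "2024-03-28 14:51:14"),
    ("twa-tsgx02", "2024-03-28 13:52:44"),
]

# Assumed window boundaries (not taken from any tool output).
WINDOWS = {
    "window_2": (time(0, 0, 0), time(0, 26, 59)),    # on-site upgrade period
    "window_1": (time(13, 0, 0), time(16, 59, 59)),  # afternoon Firewall restarts
}


def classify(timestamp: str) -> str:
    """Return the name of the window containing the timestamp, if any."""
    t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").time()
    for name, (start, end) in WINDOWS.items():
        if start <= t <= end:
            return name
    return "unclassified"


def group_alerts(alerts):
    """Group (host, timestamp) pairs by window, ordered by time."""
    groups = {}
    for host, ts in sorted(alerts, key=lambda a: a[1]):
        groups.setdefault(classify(ts), []).append((host, ts))
    return groups


if __name__ == "__main__":
    for window, items in group_alerts(ALERTS).items():
        print(window, items)
```

Alerts that land in neither window are reported as `unclassified`, which makes it easy to spot restarts that need a separate explanation.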
 
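*Analysis*
* Checked the sos report exported on site from the twa-tsgx02 device (generated with `sos report --log-size=0`). The operation audit log in the file sos_command/auditd_info shows that during window 2 (00:00-00:26) the NIC driver was upgraded and containers were restarted with tsg-os-cli. **The 18 restarts in window 2 were therefore most likely caused by the on-site upgrade operations.**
  * Screenshot: image-2024-03-31-15-31-25-944.png (see Attachments)
* Window 1, twa-tsgx02 at 13:52:44: same cause as OMPUB-1196 ([WMS-UTR Project]: Firewall releases memory slowly, which triggers a watchdog timeout and restarts the SAPP application).
* Window 1, pcap-tsgx01, twa-tsgx05, and msh-tsgx01: the cause will be analyzed further once the Firewall watchdog logs have been exported from the site. It is presumed to be the same as for the twa-tsgx02 restart at 13:52:44.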
 
 
---
**yangwei** commented on *2024-05-06T14:38:35.587+0800*:
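Two container restarts occurred between April 30 and May 6. Their causes are as follows:
* msh06, May 1: segmentation fault in the ssl decoder. The crash occurred while parsing chello->BtoL1BytesNum; the ssl decoder needs a hotfix.
* twa02, April 30: watchdog timeout. The local log files had already been reclaimed, so the cause is presumed to be the same as https://jira.geedge.net/browse/OMPUB-1196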
---
**yangwei** commented on *2024-05-17T16:00:33.358+0800*:
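As of May 16, no further segmentation faults triggered by the ssl decoder have occurred at the WMS site, so this issue is closed for now.
Restarts caused by watchdog timeouts are tracked in https://jira.geedge.net/browse/OMPUB-1196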
---
**caoshanfeng** commented on *2024-08-29T17:30:03.070+0800*:
Updated; testing complete.
---
## Attachments
**54400/alert-message-2024-03-29+06-39-46.xlsx**
---
**54405/image-2024-03-31-15-31-25-944.png**
---