# [WMS-UTR Project] tsg_os_container_restart alerts on tsgx devices at multiple sites

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1207 | 2024-03-29T20:47:45.000+0800 | 杨威 | Closed |

---
The alerts from March 28 (local time) show that msh-tsgx01, pcap-tsgx01, and twa-tsgx01 through twa-tsgx05 each raised one or more tsg_os_container_restart alerts. No hotfix or other operation involving a restart was performed that day. The attached file contains the alert messages for March 28 (local time).

---

**yangwei** commented on *2024-03-31T15:20:32.134+0800*:

*Symptoms*

The tsg_os_container_restart alerts fall into two time windows:

1. 2024-03-28, 13:00 to 16:59: Firewall container restarts

* pcap-tsgx01 16:15:14
* twa-tsgx05 15:31:14
* msh-tsgx01 14:51:14
* twa-tsgx02 13:52:44

2. 2024-03-28, 00:00 to 00:26

twa-tsgx01 through twa-tsgx05 recorded a total of 18 packet-IO engine and Firewall container restarts.
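As a quick cross-check of the grouping above, the illustrative Python sketch below assigns each alert timestamp to one of the two windows. It is not part of the original analysis. The window boundaries are an assumption read from this ticket (window 1 taken as 13:00 to 16:59, window 2 as 00:00 to 00:26), and `WINDOWS`, `WINDOW1_ALERTS`, and `classify` are hypothetical names.

```python
from datetime import datetime

# Window boundaries assumed from the ticket (local time, 2024-03-28).
# Window 1 is read as the 13:00-16:59 range, because the latest
# window-1 alert (pcap-tsgx01) fired at 16:15:14.
WINDOWS = [
    ("window 1", datetime(2024, 3, 28, 13, 0), datetime(2024, 3, 28, 17, 0)),
    ("window 2", datetime(2024, 3, 28, 0, 0), datetime(2024, 3, 28, 0, 27)),
]

# The four window-1 Firewall container restart alerts listed above.
WINDOW1_ALERTS = [
    ("pcap-tsgx01", "2024-03-28 16:15:14"),
    ("twa-tsgx05", "2024-03-28 15:31:14"),
    ("msh-tsgx01", "2024-03-28 14:51:14"),
    ("twa-tsgx02", "2024-03-28 13:52:44"),
]


def classify(ts: str) -> str:
    """Return the name of the window that contains ts, or 'outside'."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    for name, start, end in WINDOWS:
        # Half-open interval: start <= t < end.
        if start <= t < end:
            return name
    return "outside"


if __name__ == "__main__":
    # Print the alerts in chronological order with their window label.
    for device, ts in sorted(WINDOW1_ALERTS, key=lambda alert: alert[1]):
        print(f"{ts}  {device:<12} {classify(ts)}")
```

Using half-open intervals keeps a timestamp such as 00:27:00 out of window 2, which matches the stated 00:00 to 00:26 range.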

*Analysis*

* Checked the operation audit log (the sos_command/auditd_info file) in the sos report that the site exported from twa-tsgx02 (generated with `sos report --log-size=0`). During window 2 (00:00 to 00:26), the log shows a NIC driver upgrade and container restarts performed with tsg-os-cli, so **the 18 restarts in window 2 were most likely caused by the on-site upgrade work**.
  * Screenshot: attachment image-2024-03-31-15-31-25-944.png
* The window-1 restart of twa-tsgx02 at 13:52:44 has the same cause as OMPUB-1196 ([WMS-UTR Project]: Firewall releases memory slowly, which triggers a watchdog timeout and causes the SAPP application to restart).
* The window-1 restarts of pcap-tsgx01, twa-tsgx05, and msh-tsgx01 will be analyzed further once the site exports the relevant Firewall watchdog logs. They are presumed to have the same cause as the twa-tsgx02 restart at 13:52:44.

---
**yangwei** commented on *2024-05-06T14:38:35.587+0800*:

Between April 30 and May 6 there were two container restarts, with the following causes:

* msh06, May 1: segmentation fault in ssl decoder. The crash occurred while parsing `chello->BtoL1BytesNum`; ssl decoder needs a hotfix.
* twa02, April 30: watchdog timeout. The local log files had already been purged, so the cause is presumed to be the same as https://jira.geedge.net/browse/OMPUB-1196.

---
**yangwei** commented on *2024-05-17T16:00:33.358+0800*:

As of May 16, the WMS site has seen no further segmentation faults triggered by ssl decoder, so this issue is closed for now.

Progress on the restarts caused by watchdog timeouts is tracked in https://jira.geedge.net/browse/OMPUB-1196.

---
**caoshanfeng** commented on *2024-08-29T17:30:03.070+0800*:

Updated; testing is complete.

---
## Attachments

**54400/alert-message-2024-03-29+06-39-46.xlsx**

---
**54405/image-2024-03-31-15-31-25-944.png**

---