# [M22 Project] Websites fail to open after changing the rate of a Split By Shaping Profile

| ID | Creation Date | Assignee | Status |
|----|----------------|----------|--------|
| OMPUB-1269 | 2024-05-03T18:08:04.000+0800 | Yang Wei (杨威) | In Progress |

---

Description: After the rate limit is changed, websites can no longer be opened; once the Rule is disabled, they open normally.

Steps to reproduce:

* Create a Shaping Rule as shown in the figure below: !image-2024-05-03-16-35-44-425.png|thumbnail!
* Change the rate limit of the Shaping Profile as shown below: !image-2024-05-03-16-36-34-036.png|thumbnail!
* Visit: [https://www.youtube.com|https://www.youtube.com/]

Current problem: YouTube cannot be opened.

The packet capture has been placed in the attachments.

---

**liuxueli** commented on *2024-05-04T12:25:33.291+0800*:
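
* Symptoms:
** After the shaping policy is hit, the client's in-progress file download connection is terminated (download rate drops to 0), and the client fails to open web pages (YouTube).
* Interference factors:
** During the test, Monitor, statistics, and shaping policies all existed, but only one shaping policy was in effect.
** Devices were being restarted because of the Hotfix; only one device was restarted per unit of time.
* Reproduction:
** Killing the consul and shaping processes did not reproduce the problem.
** Restarting a randomly chosen device that carried no traffic reproduced the problem; the test had only one shaping policy in effect and only one profile.
*** Only one device was restarted, and it was restarted only once.
* Conclusion:
** While the problem was being reproduced, the cluster sanity check command was run to look for abnormal keys; the output is as follows: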
{code:java}
[root@tsg-traffic-engine-vsys-1-shaping-c7d76f9c9-sgkd4 shaping_engine]# exit
[root@MDY-ATOM-TSGX009 tsg-traffic-engine-vsys-1]# kubectl exec -it tsg-traffic-engine-vsys-1-shaping-c7d76f9c9-sgkd4 -- bash
Defaulted container "shaping" out of: shaping, telegraf-shaping, log-dir-hook, init-default-svc (init), init-announce-svc (init), init-cm-svc (init), init-packet-io-engine-ready (init), shaping-init (init)
[root@tsg-traffic-engine-vsys-1-shaping-c7d76f9c9-sgkd4 shaping_engine]# /opt/tsg/framework/bin/swarmkv-cli -n tsg-shaping-vsys1 -c 10.172.12.9:8500
tsg-shaping-vsys1> cluster sanity check
1) "tsg-shaping-4023-incoming"
tsg-shaping-vsys1> cluster sanity check
1) "tsg-shaping-4023-incoming"
tsg-shaping-vsys1> cluster sanity check
1) "tsg-shaping-4023-incoming"
2) "tsg-shaping-4023-incoming"
tsg-shaping-vsys1> cluster sanity check
(integer) 0
tsg-shaping-vsys1> cluster sanity check
(integer) 0
tsg-shaping-vsys1>
{code}
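
* Open question:
** The working assumption was that only restarting the swarmkv cluster's leader or key owner would cause the anomaly, yet restarting a randomly chosen device with no traffic reproduced the problem.
* Next retest:
** When the problem is reproduced, confirm whether the restarted device correlates with the swarmkv cluster's leader or key-owner device.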
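
For the next retest it may help to summarize repeated `cluster sanity check` runs automatically rather than by eye. The following is a minimal sketch that counts the keys listed by each invocation in a pasted transcript. The helper name `abnormal_keys_per_run` is hypothetical, and the sketch assumes that `(integer) 0` means no abnormal key was reported; it only parses text and does not talk to SwarmKV.

```python
import re

# Matches a CLI prompt line such as: tsg-shaping-vsys1> cluster sanity check
PROMPT = re.compile(r"^\S+>\s*cluster sanity check\s*$", re.IGNORECASE)
# Matches a result line such as: 1) "tsg-shaping-4023-incoming"
KEY_LINE = re.compile(r'^\s*\d+\)\s+"([^"]+)"\s*$')


def abnormal_keys_per_run(transcript: str) -> list[list[str]]:
    """Return, for each `cluster sanity check` invocation, the keys it listed."""
    runs: list[list[str]] = []
    for line in transcript.splitlines():
        if PROMPT.match(line):
            runs.append([])  # a new invocation starts here
        elif runs:
            match = KEY_LINE.match(line)
            if match:
                runs[-1].append(match.group(1))
            # "(integer) 0" and any other line adds no key (assumption)
    return runs


# Transcript copied from the session in this comment.
transcript = '''tsg-shaping-vsys1> cluster sanity check
1) "tsg-shaping-4023-incoming"
tsg-shaping-vsys1> cluster sanity check
1) "tsg-shaping-4023-incoming"
tsg-shaping-vsys1> cluster sanity check
1) "tsg-shaping-4023-incoming"
2) "tsg-shaping-4023-incoming"
tsg-shaping-vsys1> cluster sanity check
(integer) 0
tsg-shaping-vsys1> cluster sanity check
(integer) 0
'''

counts = [len(keys) for keys in abnormal_keys_per_run(transcript)]
print(counts)  # prints [1, 1, 2, 0, 0]
```

Recording these per-run counts next to the time of each device restart would make it easier to judge whether the abnormal keys track the leader or key-owner node.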

---

**yangwei** commented on *2024-05-04T13:49:56.142+0800*:

Retest at the M22 site on 2024-05-04: with no node in the cluster restarted, a phenomenon similar to https://jira.geedge.net/browse/TSG-17649 occurs intermittently, i.e. after some time the client loses network connectivity.

The Shaping Engine statistics output shows that asynchronous calls have a P80 latency of 17.7 ms and an average latency of 6.2 ms.

!image-2024-05-04-13-47-43-156.png|width=720,height=210!

---

**yangwei** commented on *2024-05-04T17:49:59.229+0800*:

Further verification testing was carried out with the same policy deployed as before:

* Split By token bucket, rate limit 7.5 Mbps
* The rate-limit condition is Client IP

{*}After testing continuously for a while, the client lost network connectivity{*}.

!image-2024-05-04-17-41-45-545.png|width=613,height=305!

The {*}client traffic in the test was spread across different nodes at different sites{*}. At the {*}YGN-MYTEL site{*}, the Shaping Engine's {*}request-to-response latency was checked and averaged 20 ms.{*}

!image-2024-05-04-17-45-02-867.png|width=834,height=240!

CLUSTER SANITY check was used to inspect the cluster state, and it {*}reported abnormal keys{*}.

!image-2024-05-04-17-46-14-535.png|width=712,height=396!

{*}The SwarmKV CLI was used to simulate a new member (1233) consuming tokens{*}. Because the Split By unit is local host, the simulated member should in principle have a 7.5 Mbps Token_bucket to itself; however, {*}consecutive requests for a small number of tokens from the command line still failed, which is not the expected behavior.{*}

!image-2024-05-04-17-48-39-582.png!
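
To make the expectation in the paragraph above concrete, here is a minimal sketch of a split-by token bucket in which every key owns an independent bucket. It is an illustrative model only, not SwarmKV or Shaping Engine code: the class names, the key strings, and the choice of a one-second burst capacity are all assumptions. Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Plain token bucket; the caller supplies the current time in seconds."""
    rate: float      # tokens added per second
    capacity: float  # maximum tokens the bucket can hold
    tokens: float = 0.0
    last: float = 0.0

    def consume(self, amount: float, now: float) -> bool:
        # Refill for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


@dataclass
class SplitByBucket:
    """One independent bucket per split key (hypothetical model)."""
    rate: float
    capacity: float
    buckets: dict = field(default_factory=dict)

    def consume(self, key: str, amount: float, now: float) -> bool:
        if key not in self.buckets:
            # A key seen for the first time starts with a full bucket of its own.
            self.buckets[key] = TokenBucket(self.rate, self.capacity, self.capacity, now)
        return self.buckets[key].consume(amount, now)


RATE_BPS = 7_500_000  # 7.5 Mbps, the profile rate used in this test
shaper = SplitByBucket(rate=RATE_BPS, capacity=RATE_BPS)  # capacity is an assumption

# An existing client drains its own bucket completely.
busy_ok = shaper.consume("client-a", RATE_BPS, now=0.0)
busy_again = shaper.consume("client-a", 1_000, now=0.0)
# A brand-new member asking for a few tokens is unaffected by client-a.
new_member_ok = shaper.consume("member-1233", 1_000, now=0.0)

print(busy_ok, busy_again, new_member_ok)  # prints True False True
```

Under this model a fresh member never fails on its first small request, which is why the repeated failures seen from the CLI point at something other than the configured rate being exhausted.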
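
The client outage described above occurs intermittently and recovers on its own after some time.

Preliminary conclusion: the problem described in the issue {*}is not directly related to modifying the profile, as the title suggests{*}; rather, {*}after a Split By rate-limit Profile is deployed{*}, {*}there is some probability that the client loses network connectivity{*}. A possible cause is that once intra-cluster communication latency becomes high, the large volume of data synchronized for the SplitBy Token Bucket makes the Shaping Engine fail to obtain tokens, so the client appears to be disconnected.

The M site will try deploying a Generic Token Bucket and test it to further confirm the above conclusion.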

---

**hebingning** commented on *2024-05-05T12:38:54.157+0800*:

Multiple tests were run on May 4. Results:

Shaping Rule referencing a Shaping Profile of Type Generic/Fair Share:

No network outage occurred.

Shaping Rule referencing a Shaping Profile of Type Split By:

The client lost network connectivity. [~yangwei]

---

**zhengchao** commented on *2024-05-06T10:46:21.985+0800*:

Use the SWARMKV INFO command to check the nodes' synchronization bandwidth.

---

**liuxueli** commented on *2024-05-06T11:51:31.355+0800*:

* These are the results of running the three commands CLUSTER SANITY check / CLUSTER NODES / CLUSTER INFO while the client was disconnected:
** [^10.161.12.26_2024-05-04_15_34_58.txt]

---

**zhengchao** commented on *2024-05-06T13:31:10.804+0800*:

Bulk Token Bucket synchronization messages are 4 Mb/msg; a single worker thread may not be able to handle that much synchronization traffic.

{quote}17) 1) "10.168.12.6:30745"

sync_err: 882383

instantaneous_input_kbps: 234599.00
instantaneous_input_msgs: 69.00{quote}
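
As a rough cross-check of the per-message size, the two instantaneous counters quoted above can be divided. This is a back-of-the-envelope sketch that assumes both counters cover the same interval and that "kb" means kilobits; neither assumption is confirmed in this thread.

```python
# Figures quoted from the node statistics for 10.168.12.6:30745.
input_kbps = 234599.00  # instantaneous_input_kbps
input_msgs = 69.00      # instantaneous_input_msgs

# Assumption: same sampling interval for both counters, "kb" = kilobits.
kbit_per_msg = input_kbps / input_msgs
mbit_per_msg = kbit_per_msg / 1000

print(f"{kbit_per_msg:.0f} kbit/msg ~ {mbit_per_msg:.1f} Mbit/msg")
# prints: 3400 kbit/msg ~ 3.4 Mbit/msg
```

Under those assumptions each synchronization message is about 3.4 Mb, the same order of magnitude as the 4 Mb/msg figure mentioned in the comment.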

---

## Attachments

**56643/10.161.12.26_2024-05-04_15_34_58.txt**

---

**56612/image-2024-05-03-16-35-44-425.png**

---

**56611/image-2024-05-03-16-36-34-036.png**

---

**56623/image-2024-05-04-13-47-43-156.png**

---

**56625/image-2024-05-04-17-41-45-545.png**

---

**56626/image-2024-05-04-17-45-02-867.png**

---

**56627/image-2024-05-04-17-46-14-535.png**

---

**56628/image-2024-05-04-17-48-39-582.png**

---

**56610/youtube.pcapng**

---