## Variable Monitor
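
changelog

```log
11.9  Support for monitoring multiple variables
11.10 Kernel structures are separated by pid; each process can register and cancel its own watches
11.13 User interface cancel_all_watch -> cancel_watch; processes no longer interfere with each other
```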
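
## Description

Monitors numeric variables (given an address and a length) and prints system stack information when a configured condition is exceeded.

Number of simultaneous watches

- Watches with the same timer interval are placed in one group, which is served by one timer.
- A group holds at most 32 variables; once it is full, a new timer is allocated.
- At most 128 timers exist globally.
- These limits are defined as macros at the top of `watch_module.h`.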
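
Taken together, these limits allow a quick capacity estimate. The helper below is a hypothetical sketch, not code from `watch_module.h`: it assumes only the two documented values (32 watches per timer, 128 timers) and that watches with different intervals never share a timer.

```c
#include <assert.h>
#include <stddef.h>

/* Limits as documented above; the module's own macros are in watch_module.h. */
#define TIMER_MAX_WATCH_NUM 32
#define MAX_TIMER_NUM 128

/* Hypothetical helper: counts[k] is the number of variables that use the
 * k-th distinct interval. Returns the number of timers needed, or -1 if
 * that would exceed MAX_TIMER_NUM. */
static int timers_needed(const int *counts, size_t n_intervals)
{
    int total = 0;
    for (size_t k = 0; k < n_intervals; k++) {
        assert(counts[k] >= 0);
        /* one timer per started group of 32: ceil(counts[k] / 32) */
        total += (counts[k] + TIMER_MAX_WATCH_NUM - 1) / TIMER_MAX_WATCH_NUM;
    }
    return total <= MAX_TIMER_NUM ? total : -1;
}
```

For instance, 33 variables at one interval need two timers (32 + 1), and the overall ceiling is 32 × 128 = 4096 watched variables, reached only when every timer is full.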
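
## Usage

See `helloworld.c` for an example.

- Add `#include "watch.h"`.
- For each variable to monitor, set its name, address and length, plus the threshold, the comparison mode, the timer interval (ns), and so on.
- Call `start_watch(watch_arg);` to start monitoring.
- Call `cancel_watch();` when monitoring should be cancelled.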
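
When a condition is exceeded, the system stack information is printed and can be viewed with `dmesg`, as in the example below:

- If several variables in one timer exceed their thresholds during the same pass, the stack information is printed only once.
- After the stack has been printed, the timer is restarted with a 1 s delay, so the next round of monitoring begins 1 s later.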

```log
[ 713.225894] -------------------------------------
[ 713.225900] -------------watch monitor-----------
[ 713.225900] Threshold reached:
[ 713.225901] name: temp0, threshold: 150, pid: 4261
[ 713.225902] name: temp1, threshold: 151, pid: 4261
[ 713.225903] name: temp2, threshold: 152, pid: 4261
[ 713.225904] name: temp3, threshold: 153, pid: 4261
[ 713.225904] name: temp4, threshold: 154, pid: 4261
[ 713.225905] name: temp5, threshold: 155, pid: 4261
[ 713.225905] name: temp6, threshold: 156, pid: 4261
[ 713.225906] name: temp7, threshold: 157, pid: 4261
[ 713.225906] name: temp8, threshold: 158, pid: 4261
[ 713.225907] name: temp9, threshold: 159, pid: 4261
[ 713.225907] name: temp10, threshold: 160, pid: 4261
[ 713.225908] name: temp11, threshold: 161, pid: 4261
[ 713.225908] name: temp12, threshold: 162, pid: 4261
[ 713.225909] name: temp13, threshold: 163, pid: 4261
[ 713.225909] name: temp14, threshold: 164, pid: 4261
[ 713.225910] name: temp15, threshold: 165, pid: 4261
[ 713.225910] name: temp16, threshold: 166, pid: 4261
[ 713.225911] name: temp17, threshold: 167, pid: 4261
[ 713.225911] name: temp18, threshold: 168, pid: 4261
[ 713.225912] name: temp19, threshold: 169, pid: 4261
[ 713.225912] name: temp20, threshold: 170, pid: 4261
[ 713.225913] name: temp21, threshold: 171, pid: 4261
[ 713.225913] name: temp22, threshold: 172, pid: 4261
[ 713.225914] name: temp23, threshold: 173, pid: 4261
[ 713.225914] name: temp24, threshold: 174, pid: 4261
[ 713.225915] name: temp25, threshold: 175, pid: 4261
[ 713.225915] name: temp26, threshold: 176, pid: 4261
[ 713.225916] name: temp27, threshold: 177, pid: 4261
[ 713.225916] name: temp28, threshold: 178, pid: 4261
[ 713.225916] name: temp29, threshold: 179, pid: 4261
[ 713.225917] name: temp30, threshold: 180, pid: 4261
[ 713.225917] name: temp31, threshold: 181, pid: 4261
[ 713.225918] Timestamp (ns): 1699846710299420862
[ 713.225919] Recent Load: 0.05, 0.12, 0.08
[ 713.225921] task: name rcu_gp, pid 3, state 1026
[ 713.225926] rescuer_thread+0x290/0x390
[ 713.225931] kthread+0xd7/0x100
[ 713.225932] ret_from_fork+0x1f/0x30
[ 713.225935] task: name rcu_par_gp, pid 4, state 1026
[ 713.225936] rescuer_thread+0x290/0x390
[ 713.225937] kthread+0xd7/0x100
[ 713.225938] ret_from_fork+0x1f/0x30
[ 713.225940] task: name netns, pid 5, state 1026
[ 713.225941] rescuer_thread+0x290/0x390
[ 713.225942] kthread+0xd7/0x100
```
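
The two printing rules described before this log (one stack dump per timer pass, then a 1 s back-off) can be modelled in user space. This is a hypothetical sketch, not the module's hrTimer callback: `struct watch_slot`, `timer_pass` and the `stack_dumps` counter are illustrative names, and values are plain `long long`s. In a kernel hrTimer callback, the returned delay would typically be applied with `hrtimer_forward_now()` before returning `HRTIMER_RESTART`.

```c
#include <assert.h>
#include <stdio.h>

#define TIMER_MAX_WATCH_NUM 32
#define BACKOFF_NS 1000000000UL /* restart delay after a stack dump: 1 s */

/* Hypothetical, simplified watch entry; the value is read directly. */
struct watch_slot {
    const char *name;
    const long long *value; /* location of the watched value */
    long long threshold;
    int greater;            /* 1: trigger on value > threshold, 0: on value < threshold */
};

static int stack_dumps; /* how many times the stacks would have been printed */

/* One timer pass over a group. Returns the delay in ns before the next pass. */
static unsigned long timer_pass(const struct watch_slot *slots, int n, unsigned long interval_ns)
{
    assert(n <= TIMER_MAX_WATCH_NUM); /* a group never exceeds 32 watches */
    int hits = 0;
    for (int k = 0; k < n; k++) {
        long long v = *slots[k].value;
        int over = slots[k].greater ? v > slots[k].threshold : v < slots[k].threshold;
        if (!over)
            continue;
        if (hits++ == 0)
            printf("Threshold reached:\n");
        printf("name: %s, threshold: %lld\n", slots[k].name, slots[k].threshold);
    }
    if (hits == 0)
        return interval_ns; /* nothing triggered: keep the normal interval */
    stack_dumps++;          /* one dump per pass, however many variables hit */
    return BACKOFF_NS;      /* next round starts 1 s later */
}
```

Listing every triggered variable but dumping stacks once keeps the output proportional to the number of passes rather than the number of variables.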

### Parameters

`start_watch` takes a `watch_arg` struct. Its fields are described below.

- `name` is limited to `MAX_NAME_LEN` (15) significant characters.

```c
typedef struct
{
    pid_t task_id;               // current process id
    char name[MAX_NAME_LEN + 1]; // name (15 chars + '\0')
    void *ptr;                   // virtual address of the variable
    int length_byte;             // length of the variable in bytes
    long long threshold;         // threshold value
    unsigned char unsigned_flag; // unsigned flag (true: unsigned, false: signed)
    unsigned char greater_flag;  // comparison direction (true: >, false: <)
    unsigned long time_ns;       // timer interval (ns)
} watch_arg;
```
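
How `length_byte`, `unsigned_flag`, `greater_flag` and `threshold` combine is not spelled out here. The sketch below shows one plausible reading, assuming a native-endian integer of 1, 2, 4 or 8 bytes that is sign- or zero-extended according to `unsigned_flag`, and a strict comparison. `read_watched` and `threshold_reached` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: load a 1/2/4/8-byte native-endian integer and widen it.
 * memcpy avoids unaligned and type-punned loads. An 8-byte value is always
 * read as signed here, because threshold is a long long. */
static long long read_watched(const void *ptr, int length_byte, int unsigned_flag)
{
    switch (length_byte) {
    case 1: {
        int8_t s; uint8_t u;
        memcpy(&s, ptr, 1); memcpy(&u, ptr, 1);
        return unsigned_flag ? (long long)u : (long long)s;
    }
    case 2: {
        int16_t s; uint16_t u;
        memcpy(&s, ptr, 2); memcpy(&u, ptr, 2);
        return unsigned_flag ? (long long)u : (long long)s;
    }
    case 4: {
        int32_t s; uint32_t u;
        memcpy(&s, ptr, 4); memcpy(&u, ptr, 4);
        return unsigned_flag ? (long long)u : (long long)s;
    }
    case 8: {
        int64_t s;
        memcpy(&s, ptr, 8);
        return s;
    }
    default:
        assert(!"unsupported length_byte");
        return 0;
    }
}

/* greater_flag selects the direction; strict comparison is an assumption. */
static int threshold_reached(long long value, long long threshold, int greater_flag)
{
    return greater_flag ? value > threshold : value < threshold;
}
```

The same byte pattern can cross or miss a threshold depending on `unsigned_flag`: the single byte `0xC8` reads as 200 unsigned but as -56 signed.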

An initialization example:

```c
// `temp` and `i` come from the surrounding code (see helloworld.c)
watch_args = (watch_arg){
    .task_id = getpid(),       // calling process
    .ptr = &temp,              // variable to watch
    .name = "temp",
    .length_byte = sizeof(int),
    .threshold = 150 + i,
    .unsigned_flag = 0,        // signed comparison
    .greater_flag = 1,         // trigger when value > threshold
    .time_ns = 2000 + (i / 33) * 5000
};
```
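
The `i` in the example suggests the initialization sits inside a loop. A hypothetical sketch of such a loop follows. It redeclares `watch_arg` locally (copied from the struct above, with `MAX_NAME_LEN` assumed to be 15) so it compiles without `watch.h`, names the variables `temp0`, `temp1`, and so on as in the dmesg output above, and leaves out the actual `start_watch` call because its return type is not documented here. `fill_watch_args` is an illustrative name.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_NAME_LEN 15 /* assumed, matches the value documented above */

typedef struct {
    pid_t task_id;
    char name[MAX_NAME_LEN + 1];
    void *ptr;
    int length_byte;
    long long threshold;
    unsigned char unsigned_flag;
    unsigned char greater_flag;
    unsigned long time_ns;
} watch_arg;

/* Hypothetical helper: one watch_arg per int variable in vars[0..n). */
static void fill_watch_args(watch_arg *args, int *vars, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        args[i] = (watch_arg){
            .task_id = getpid(),
            .ptr = &vars[i],
            .length_byte = sizeof(int),
            .threshold = 150 + i,
            .unsigned_flag = 0,
            .greater_flag = 1,
            /* every 33 consecutive indices share one interval */
            .time_ns = 2000 + (i / 33) * 5000,
        };
        /* snprintf truncates to MAX_NAME_LEN chars and always NUL-terminates */
        snprintf(args[i].name, sizeof args[i].name, "temp%d", i);
        /* in a real program, args[i] would now be passed to start_watch() */
    }
}
```

With this formula, indices 0 to 32 share the 2000 ns interval and 33 to 65 share 7000 ns. Because a group holds at most 32 watches, each block of 33 variables ends up on two timers.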

## demo

In the project root directory:

- `helloworld.c`: tests monitoring a large number of variables
- `hptest.c`: tests hugePage mounting

```bash
# build and load the module
make && insmod variable_monitor.ko
./helloworld
```

The printed stack information can be viewed with `dmesg`.

```bash
# unload the module and clean the build files
rmmod variable_monitor.ko && make clean
```

Tested only on `kernel 5.17.15-1.el8.x86_64`; other kernel versions have not been tested.

## Other

The program consists of two parts, a character device and a user-space interface, which communicate via ioctl.
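
On the user-space side, an ioctl channel like this usually amounts to issuing commands on the file descriptor of the opened device node. The sketch below is illustrative only: `struct watch_request`, the magic `'w'`, the command numbers and `watch_send` are all hypothetical, and the project's real definitions live in its own headers.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical request payload and command codes, for illustration only. */
struct watch_request {
    void *ptr;
    long long threshold;
};

#define WATCH_MAGIC      'w'
#define WATCH_IOC_START  _IOW(WATCH_MAGIC, 1, struct watch_request)
#define WATCH_IOC_CANCEL _IO(WATCH_MAGIC, 2)

/* Sends one command to the already-opened character device.
 * Returns 0 on success or -errno on failure. */
static int watch_send(int fd, unsigned long cmd, void *arg)
{
    assert(cmd == WATCH_IOC_START || cmd == WATCH_IOC_CANCEL); /* only the sketched commands */
    if (ioctl(fd, cmd, arg) < 0)
        return -errno;
    return 0;
}
```

Encoding the payload size and direction with `_IOW` lets the kernel side reject a request whose layout does not match what it expects.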
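
User-space address access

- For each variable's virtual address passed in by the user program, `get_user_pages_remote` obtains the memory page containing that address, and `kmap` maps the page into the kernel.
- In the 192.168.40.204 environment, mounting HugeTLB pages was tested and works correctly.
- The page address plus the offset is stored in the `kernel_watch_arg` belonging to the timer; when the hrTimer polls, it reads the actual value through `kernel_watch_arg`.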
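
The page-plus-offset bookkeeping comes down to plain address arithmetic. In the module, the base would be the `kmap()` address of the page returned by `get_user_pages_remote()`; the hypothetical `split_address` helper below only shows how a virtual address splits into a page-aligned base and an in-page offset, assuming a power-of-two page size (4 KiB for regular pages, larger for HugeTLB pages). `fits_in_page` reflects an assumption that one mapped page must hold the whole variable; this README does not say whether the module checks it.

```c
#include <assert.h>
#include <stdint.h>

struct page_split {
    uintptr_t page_base; /* start of the page containing the address */
    uintptr_t offset;    /* byte offset of the address inside that page */
};

/* page_size must be a power of two. */
static struct page_split split_address(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    struct page_split s = { addr & ~(page_size - 1), addr & (page_size - 1) };
    return s;
}

/* True if length bytes starting at addr stay inside a single page. */
static int fits_in_page(uintptr_t addr, int length, uintptr_t page_size)
{
    return split_address(addr, page_size).offset + (uintptr_t)length <= page_size;
}
```

The kernel-side value would then be read at the mapped base plus `offset`, which is what gets stored for the timer to poll.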
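
Timer grouping

- The hrTimer data structures are kept in the global array `kernel_wtimer_list`. When a timer is allocated, `kernel_wtimer_list` is scanned and the timer intervals are compared.
- Watches with the same timer interval are placed in the same group and share one hrTimer.
- If the number of variables watched by one timer would exceed `TIMER_MAX_WATCH_NUM` (32), a new hrTimer is created.
- The total number of hrTimers (the length of the `kernel_wtimer_list` array) is limited to `MAX_TIMER_NUM` (128).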
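
The grouping rule can be sketched as a linear scan over a fixed array. This is a user-space approximation, not the module's code: `struct wtimer`, `timer_list` and `assign_timer` are hypothetical names, and the real `kernel_wtimer_list` entries also carry the hrTimer itself and the per-watch data.

```c
#include <assert.h>

#define TIMER_MAX_WATCH_NUM 32
#define MAX_TIMER_NUM 128

/* Hypothetical, reduced timer record: only what the grouping rule needs. */
struct wtimer {
    unsigned long time_ns; /* interval shared by every watch in this group */
    int sentinel;          /* number of watches currently in the group */
};

static struct wtimer timer_list[MAX_TIMER_NUM];
static int timer_count;

/* Returns the index of the timer that takes the new watch,
 * or -1 if all MAX_TIMER_NUM timers are already in use. */
static int assign_timer(unsigned long time_ns)
{
    assert(time_ns > 0); /* assumed: intervals are positive */
    /* Reuse a timer with the same interval that still has room. */
    for (int t = 0; t < timer_count; t++) {
        if (timer_list[t].time_ns == time_ns && timer_list[t].sentinel < TIMER_MAX_WATCH_NUM) {
            timer_list[t].sentinel++;
            return t;
        }
    }
    /* Otherwise start a new group, if the global limit allows it. */
    if (timer_count == MAX_TIMER_NUM)
        return -1;
    timer_list[timer_count] = (struct wtimer){ .time_ns = time_ns, .sentinel = 1 };
    return timer_count++;
}
```

A linear scan is cheap at this size: at most 128 comparisons per new watch.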
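
Page mount/unmount

- `get_user_pages_remote`/`kmap` increase the corresponding reference counts, which must be balanced by matching `put_page`/`kunmap` calls.
- A module-wide global list, `watch_local_memory_list`, stores the page and kt of every successfully mounted variable; when the character device's close operation runs, the list is walked and every entry is unmounted.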
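
The bookkeeping that keeps these reference counts balanced can be modelled as a list that records every successful mapping and is drained on close. In this user-space sketch, `fake_get_page`/`fake_put_page` are hypothetical stand-ins that only count references (in the module they would correspond to `get_user_pages_remote` plus `kmap`, and `kunmap` plus `put_page`), and `struct mapping` is an illustrative stand-in for an entry of `watch_local_memory_list`.

```c
#include <assert.h>
#include <stdlib.h>

static int page_refs; /* stand-in for the pages' reference counts */

static void fake_get_page(void) { page_refs++; } /* ~ get_user_pages_remote + kmap */
static void fake_put_page(void) { page_refs--; } /* ~ kunmap + put_page */

/* Hypothetical entry: one per successfully mapped variable. */
struct mapping {
    void *addr; /* address of the mapped variable */
    struct mapping *next;
};

static struct mapping *mapping_list;

/* Records a mapping; the reference is taken only once the entry exists. */
static int map_variable(void *addr)
{
    struct mapping *m = malloc(sizeof *m);
    if (!m)
        return -1;
    fake_get_page();
    m->addr = addr;
    m->next = mapping_list;
    mapping_list = m;
    return 0;
}

/* What the device's close handler would do: release every recorded mapping. */
static void release_all(void)
{
    while (mapping_list) {
        struct mapping *m = mapping_list;
        mapping_list = m->next;
        fake_put_page();
        free(m);
    }
}
```

Releasing everything in one place on close means a process that exits without cancelling its watches still leaves no page references behind.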
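
Adding/removing variable monitors

- The `kernel_watch_arg` structure has a pid member, but watches are not separated by process when they are added.
- On removal, all watched variables are scanned and their pids are compared.
- The gap left by a removal is filled by moving the last variable into it, followed by `sentinel--`; hrTimers are handled the same way.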
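
The removal scheme (compare pids, move the last entry into the gap, decrement `sentinel`) can be sketched as follows. `struct watch_entry` and `remove_by_pid` are hypothetical names. One detail the sketch makes explicit: after a swap, the same index has to be checked again, because the entry moved into it may belong to the same process.

```c
#include <assert.h>
#include <sys/types.h>

#define TIMER_MAX_WATCH_NUM 32

/* Hypothetical, reduced watch entry. */
struct watch_entry {
    pid_t task_id;
    long long threshold;
};

/* Removes every entry owned by pid; returns how many were removed. */
static int remove_by_pid(struct watch_entry *w, int *sentinel, pid_t pid)
{
    assert(*sentinel <= TIMER_MAX_WATCH_NUM);
    int removed = 0;
    int k = 0;
    while (k < *sentinel) {
        if (w[k].task_id == pid) {
            w[k] = w[*sentinel - 1]; /* fill the gap with the last entry */
            (*sentinel)--;
            removed++;
            /* do not advance k: the moved entry has not been checked yet */
        } else {
            k++;
        }
    }
    return removed;
}
```

Swap-removal keeps the array dense in O(1) per deletion, at the cost of not preserving the original order of the watches.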

Stack output condition: adapted from [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209)

- A task's stack is printed if it is `TASK_RUNNING`, satisfies `__task_contributes_to_load`, or is in `TASK_IDLE` (such tasks may be blocked).
- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_load`.

```c
// https://www.spinics.net/lists/kernel/msg3582022.html
// removed in 5.8-rc3, but it still works
// whether the task contributes to the load
#define __task_contributes_to_load(task)                                                              \
    ((READ_ONCE((task)->__state) & TASK_UNINTERRUPTIBLE) != 0 && ((task)->flags & PF_FROZEN) == 0 && \
     (READ_ONCE((task)->__state) & TASK_NOLOAD) == 0)
```
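
To see which tasks pass the filter, the condition can be evaluated on plain integers. The sketch below copies the state and flag values from `include/linux/sched.h` as of Linux 5.17 (they are version-specific, and `PF_FROZEN` was removed in later kernels); `should_dump` is a hypothetical name, and an exact match on `TASK_IDLE` is assumed. It also accounts for the `state 1026` lines in the dmesg output above: 1026 is `0x402`, i.e. `TASK_UNINTERRUPTIBLE | TASK_NOLOAD`, which is `TASK_IDLE`. Such tasks do not count toward the load average, so they are printed only because of the explicit `TASK_IDLE` check.

```c
#include <assert.h>

/* Values from include/linux/sched.h in Linux 5.17 (version-specific). */
#define TASK_RUNNING         0x0000
#define TASK_UNINTERRUPTIBLE 0x0002
#define TASK_NOLOAD          0x0400
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
#define PF_FROZEN            0x00010000

/* Same test as __task_contributes_to_load, applied to plain values. */
static int contributes_to_load(unsigned int state, unsigned int flags)
{
    return (state & TASK_UNINTERRUPTIBLE) != 0 && (flags & PF_FROZEN) == 0 &&
           (state & TASK_NOLOAD) == 0;
}

/* Hypothetical filter: dump the task if any of the three conditions holds. */
static int should_dump(unsigned int state, unsigned int flags)
{
    return state == TASK_RUNNING || contributes_to_load(state, flags) || state == TASK_IDLE;
}
```

An interruptible sleeper (`TASK_INTERRUPTIBLE`, 0x0001) matches none of the three conditions, so ordinary idle waiters are left out of the dump.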