diff --git a/README.md b/README.md index 409f05b..c71deda 100644 --- a/README.md +++ b/README.md @@ -1,76 +1,83 @@ ## Variable Monitor -Monitor numerical variables (given address, length), and print system stack information when the set conditions are exceeded. - -Number of simultaneous monitoring -- Monitoring with the same timing length will be grouped into one group, corresponding to one timer. -- A set of up to 32 variables, after which a new timer is allocated. -- The global maximum number of timers is 128. -- The above quantity limit is defined in the `watch_module.h` header macro. - -Currently, monitoring is limited to the same application, and simultaneous calls from multiple applications are not currently supported. -- Multiple applications can work normally if only one program calls `cancel_all_watch();`. - -## Usage - -Example: helloworld.c -- Add `#include "watch.h"` -- Set each variable that needs to be monitored: name && address && length, set threshold, comparison method, timer interval (ns), etc. -- `start_watch(watch_arg);` Start monitoring -- Call `cancel_all_watch();` when you need to cancel monitoring - -When the set conditions are exceeded, the system stack information is printed and viewed with `dmesg`, as shown in the following example: -- Within a timer, if multiple variables exceed the threshold, the stack information will not be output repeatedly; -- The timer restart time after printing the stack is 1s, and the next round of monitoring will start after 1s. 
+changelog ```log -[86245.364861] ------------------------------------- -[86245.364864] -------------watch monitor----------- -[86245.364865] Threshold reached: - name: temp0, threshold: 150 -[86245.364866] Timestamp (ns): 1699589000606300743 -[86245.364867] Recent Load: 116.65, 126.83, 151.17 -[86245.365669] task: name lcore-worker-4, pid 803327 -[86245.365672] task: name lcore-worker-5, pid 803328 -[86245.365673] task: name lcore-worker-6, pid 803329 -[86245.365674] task: name lcore-worker-7, pid 803330 -[86245.365676] task: name lcore-worker-8, pid 803331 -[86245.365677] task: name lcore-worker-9, pid 803332 -[86245.365679] task: name lcore-worker-10, pid 803333 -[86245.365681] task: name lcore-worker-11, pid 803334 -[86245.365682] task: name lcore-worker-68, pid 803335 -[86245.365683] task: name lcore-worker-69, pid 803336 -[86245.365684] task: name lcore-worker-70, pid 803337 -[86245.365685] task: name lcore-worker-71, pid 803338 -[86245.365686] task: name lcore-worker-72, pid 803339 -[86245.365687] task: name lcore-worker-73, pid 803340 -[86245.365688] task: name lcore-worker-74, pid 803341 -[86245.365689] task: name lcore-worker-75, pid 803342 -[86245.365694] task: name pkt:worker-0, pid 803638 -[86245.365702] hrtimer_nanosleep+0x8d/0x120 -[86245.365709] __x64_sys_nanosleep+0x96/0xd0 -[86245.365711] do_syscall_64+0x37/0x80 -[86245.365716] entry_SYSCALL_64_after_hwframe+0x44/0xae -[86245.365718] task: name pkt:worker-1, pid 803639 -[86245.365721] hrtimer_nanosleep+0x8d/0x120 -[86245.365724] __x64_sys_nanosleep+0x96/0xd0 -[86245.365726] do_syscall_64+0x37/0x80 -[86245.365728] entry_SYSCALL_64_after_hwframe+0x44/0xae -[86245.365730] task: name pkt:worker-2, pid 803640 -[86245.365732] hrtimer_nanosleep+0x8d/0x120 -[86245.365734] __x64_sys_nanosleep+0x96/0xd0 -[86245.365737] do_syscall_64+0x37/0x80 -[86245.365739] entry_SYSCALL_64_after_hwframe+0x44/0xae -[86245.365740] task: name pkt:worker-3, pid 803641 -[86245.365743] hrtimer_nanosleep+0x8d/0x120 +11.9 
多个变量监控支持 +11.10 按照 pid 区分不同内核结构, 支持每个进程单独申请取消自己的监控. +11.13 用户接口 cancel_all_watch -> cancel_watch, 每个进程互不干扰. +11.28 完全重构,更新文档. ``` -### Parameter Description +## 说明 -start_watch passes in the watch_arg structure. The meaning of each field is as follows -- name limit `MAX_NAME_LEN`(15) valid characters +监控 数值变量(给定 地址,长度), 达到设定条件打印系统内 Task 信息(用户态堆栈/内核态堆栈/调用链信息). +- 支持多进程, 单个进程退出时,取消该进程的所有监控. +- 相同定时间隔会分配到同一个定时器,一个定时器最多监控 32 个变量,全局最多 128 个定时器. + - 以上数量限制定义在 `source/module/monitor_timer.h`. + - `testcase/helloworld.c` 有测试到单进程 2049 个变量; + +文件结构 + +```log +├── build // output +├── source // all source code +│ ├── buffer // 模块与用户空间通信的缓冲区 +│ ├── module // 模块代码 +│ ├── uapi // 用户空间接口 +│ ├── ucli // 用户空间命令行工具 +│ └── ucli_py // 用户空间命令行 python (仅测试用,待完成) +│ └── libunwind // python 解析堆栈信息移植库 +├── testcase // 测试用例 +└── tools // 测试工具 +``` + +## 使用 + +设定对变量监控有两种函数: 宏定义 或 定义 watch_arg 结构体 +- 都需要添加 `source/uapi` 下的头文件 `#include "monitor_user.h"` + +需要取消监控时调用 `cancel_watch();` variant_monitor 会取消该进程所有监控. +- 当进程退出后,也会执行相同的操作,取消该进程所有监控. +- 因此调用 `cancel_watch();` 是个可选项,但依然建议调用以避免可能的内存泄漏. + +获取 Task 信息是一项耗时操作,这里使用了 workqueue 处理,且一次处理后该定时器重启间隔默认为 5s. +- 此值可以在 `/proc/variable_monitor/dump_reset_sec` 查看和修改. + +### 挂载驱动 + +项目根目录 + +```bash +# 编译加载模块 +make && insmod source/variable_monitor.ko +# 卸载模块,清理编译文件 +# rmmod source/variable_monitor.ko && make clean +# 仅在 `kernel 5.17.15-1.el8.x86_64` 测试,其他内核版本未测试. +``` + +### 宏定义 + +示例如 `testcase/helloworld.c`, 对常见数值类型宏定义 方便使用: +- 其他类型见 `source/uapi/monitor_user_sw.h` +```c +// 传入变量名 | 地址 | 阈值 +START_WATCH_INT("temp", &temp, 150); +START_WATCH_INT_LESS("temp", &temp, 150); +``` + +默认情况下,使用宏定义 定时器的时间间隔为 10us; 此值可以在 `/proc/variable_monitor/def_interval_ns` 查看和修改. + +### watch_arg 结构体 + +如果需要对定时间隔等有更多控制,请定义 watch_arg 结构体,start_watch 启动监控: +- 对每个需要监控的变量 设置: 名称 && 地址 && 长度, 设置阈值, 比较方式, 定时器间隔(ns) 等. 
+- `start_watch(watch_arg);` 启动监控 +- 需要取消监控时调用 `cancel_watch();` ```c +// start_watch 传入的是 watch_arg 结构体.各个字段意义如下 +// - name 限制 `MAX_NAME_LEN`(15) 个有效字符 typedef struct { pid_t task_id; // current process id @@ -82,63 +89,156 @@ typedef struct unsigned char greater_flag; // reverse flag (true: >, false: <) unsigned long time_ns; // timer interval (ns) } watch_arg; -``` -An initialization example - -```c +//一个初始化示例 watch_args = (watch_arg){ .task_id = getpid(), .ptr = &temp, .name = "temp", .length_byte = sizeof(int), - .threshold = 150 + i, + .threshold = 150, .unsigned_flag = 0, .greater_flag = 1, - .time_ns = 2000 + (i / 33) * 5000 + .time_ns = 2000 + 5000 }; +start_watch(watch_args); +``` + +### 打印输出 + +定时器不断按照设定间隔轮询变量,当达到设定条件时,采集此时系统内符合要求的 Task 信息(用户态堆栈/内核态堆栈/调用链信息). +- `dmesg` 可以查看到具体的超出设定条件的变量信息; +- Task 信息被输出到缓存区,使用 ucli 工具查看. + +`dmesg` 打印示例如下 + +```log +[42865.640988] ------------------------------------- +[42865.640992] -----------variable monitor---------- +[42865.640993] 超出阈值:1701141698684973655 +[42865.640994] : pid: 63936, name: temp0, ptr: 00000000bade6e61, threshold:110 +[42865.648068] ------------------------------------- +[42875.640703] ------------------------------------- +[42875.640706] -----------variable monitor---------- +[42875.640706] 超出阈值:1701141708684881779 +[42875.640708] : pid: 63936, name: temp0, ptr: 00000000bade6e61, threshold:110 +[42875.640710] : pid: 63936, name: temp1, ptr: 00000000ee645b96, threshold:111 +[42875.640711] : pid: 63936, name: temp2, ptr: 00000000f62b7afe, threshold:112 +[42875.640711] : pid: 63936, name: temp3, ptr: 00000000d100fa3c, threshold:113 +[42875.640712] : pid: 63936, name: temp4, ptr: 000000006d31cae1, threshold:114 +[42875.640712] : pid: 63936, name: temp5, ptr: 00000000723c7a2a, threshold:115 +[42875.640713] : pid: 63936, name: temp6, ptr: 0000000026ef6e83, threshold:116 +[42875.640714] : pid: 63936, name: temp7, ptr: 00000000fc1e5d5e, threshold:117 +[42875.640714] : pid: 63936, name: temp8, ptr: 
0000000069b2666e, threshold:118 +[42875.640715] : pid: 63936, name: temp9, ptr: 000000000176263d, threshold:119 +[42875.648023] ------------------------------------- +``` + +默认情况下 `ucli` 编译后在 build 文件夹下 + +`ucli > output` +- ucli 会将缓存区内容解析后输出到 `output` 文件中. +- **此操作会清空缓存区** + +`ucli` 工具输出示例如下(详情见 output_example) +- userstack 是 testcase 下的堆栈信息测试程序. + +```log +##CGROUP:[/] 51666 [510] 采样命中[D] + 进程信息: [/ / userstack], PID: 51666 / 51666 +##C++ pid 51666 + 用户态堆栈SP:7ffcd5822298, BP:2, IP:7f071c720838 +#~ 0x7f071c720838 __GI___nanosleep ([symbol]) +#~ 0x7f071c72076e __sleep ([symbol]) +#~ 0x400a08 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a64 customFunction3 ([symbol]) +#~ 0x400a42 customFunction2 ([symbol]) +#~ 0x400a21 customFunction1 ([symbol]) +#~ 0x400a75 main ([symbol]) +#~ 0x7f071c661d85 __libc_start_main ([symbol]) +#~ 0x40081e _start ([symbol]) + 内核态堆栈: +#@ 0xffffffff811730dd hrtimer_nanosleep 
([kernel.kallsyms]) +#@ 0xffffffff811733a6 __x64_sys_nanosleep ([kernel.kallsyms]) +#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms]) +#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) +#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms]) +#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) +#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms]) +#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) +#* 0xffffffffffffff userstack (UNKNOWN) + 进程链信息: +#^ 0xffffffffffffff ./build/userstack (UNKNOWN) +#^ 0xffffffffffffff /bin/bash --init-file /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh (UNKNOWN) +#^ 0xffffffffffffff /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/node /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/bootstrap-fork --type=ptyHost --logsPath /root/ (UNKNOWN) +#^ 0xffffffffffffff /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/node /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/server-main.js --connection-token=remotessh --a (UNKNOWN) +#^ 0xffffffffffffff sh /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/bin/code-server-insiders --connection-token=remotessh --accept-server-license-terms --start-server --enable-remote-auto-shutdown --socket-path=/tmp/code (UNKNOWN) +#^ 0xffffffffffffff /root/.vscode-server-insiders/code-insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6 command-shell --cli-data-dir /root/.vscode-server-insiders/cli --on-port --require-token b5a047063eb7 (UNKNOWN) +#^ 0xffffffffffffff /usr/lib/systemd/systemd --switched-root --system --deserialize 17 (UNKNOWN) +## ``` ## demo -In the 
main project directory: +usercase 文件夹下 +- `helloworld.c`: 测试大量变量监控 +- `userstack.c`: 测试用户态堆栈输出 +- `hptest.c`: 测试 hugePage 挂载 -```bash -make && insmod watch_module.ko -./watch -``` +## 其他 -You can see the printed stack information in dmesg +程序分为两部分: 字符设备 和 用户空间接口, 两者通过 ioctl 通信. -```bash -# Unload module and clean compile files -rmmod watch_module.ko && make clean -``` +用户空间地址访问 +- 用户程序传入的变量 虚拟地址, 使用 `get_user_pages_remote` 获取地址所在内存页, `kmap` 将其映射到内核. + - 192.168.40.204 环境下,HugeTLB Pages 测试挂载正常. +- 内存页地址 + 偏移量存入定时器对应的 `kernel_watch_arg` 中, hrTimer 轮询时访问 `kernel_watch_arg` 得到真实值. -Only tested on kernel 5.17.15-1.el8.x86_64. +定时器分组 +- hrTimer 数据结构定义在全局数组 `kernel_wtimer_list`.分配定时器时,会检查遍历 `kernel_wtimer_list` 比较定时器间隔, +- 相同定时间隔的 watch 分配到同一组,对应同一个 hrTimer. +- 若一个定时器监控变量数量超过 `TIMER_MAX_WATCH_NUM` (32),则会创建一个新的 hrTimer. +- hrTimer 的总数量(`kernel_wtimer_list` 数组长度)限制是 `MAX_TIMER_NUM`(128). -## Other +内存页 mount/unmount +- `get_user_pages_remote`/ `kmap` 会增加对应的计数,需要对等的 `put_page`/`kunmap`. +- 一个模块内全局链表 `watch_local_memory_list` 存储每一个成功挂载的变量对应的 page 和 kt,执行字符设备的 close 操作时,遍历并卸载. -The program is divided into two parts: character device and user space interface, both of which communicate through ioctl. +variable monitor 添加/删除 +- kernel_watch_arg 数据结构中有 pid 的成员变量,但添加变量监控时,不按照进程区分. +- 删除时遍历全部监控变量,比较 pid. +- 删除造成的缺位,将最后的变量移动到空位, sentinel--; hrTimer 同理. -User space address access -- The variable virtual address passed in by the user program, use `get_user_pages_remote` to obtain the memory page where the address is located, and `kmap` maps it to the kernel. - - In the 192.168.40.204 environment, the HugeTLB Pages test mounts normally. -- The memory page address + offset is stored in the `kernel_watch_arg` corresponding to the timer, and hrTimer accesses `kernel_watch_arg` when polling to get the real value. - -timer grouping -- The hrTimer data structure is defined in the global array `kernel_wtimer_list`. 
When allocating a timer, it will check the traversal `kernel_wtimer_list` to compare the timer interval. -- Watches with the same timing interval are assigned to the same group and correspond to the same hrTimer. -- If the number of variables monitored by a timer exceeds `TIMER_MAX_WATCH_NUM` (32), a new hrTimer will be created. -- The total number of hrTimers (`kernel_wtimer_list` array length) limit is `MAX_TIMER_NUM`(128). - -Memory page mount/unmount -- `get_user_pages_remote`/ `kmap` will increase the corresponding count and requires the equivalent `put_page`/`kunmap`. -- A global linked list in the module `watch_local_memory_list` stores the page and kt corresponding to each successfully mounted variable. When performing the close operation of the character device, it is traversed and unloaded. - -Stack output conditions: The conditions are referenced from [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209) -- `TASK` must satisfy TASK_RUNNING and `__task_contributes_to_load`. -- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_loa`. +堆栈输出条件: 条件参考自 [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209) +- `TASK` 要满足 TASK_RUNNING 和 `__task_contributes_to_load` 和 `TASK_IDLE`(可能有阻塞进程). +- `__task_contributes_to_load` 对应内核宏 `task_contributes_to_loa`. ```c // https://www.spinics.net/lists/kernel/msg3582022.html diff --git a/README_en.md b/README_en.md new file mode 100644 index 0000000..409f05b --- /dev/null +++ b/README_en.md @@ -0,0 +1,150 @@ +## Variable Monitor + +Monitor numerical variables (given address, length), and print system stack information when the set conditions are exceeded. + +Number of simultaneous monitoring +- Monitoring with the same timing length will be grouped into one group, corresponding to one timer. 
+- Each group holds up to 32 variables; beyond that, a new timer is allocated. +- The global maximum number of timers is 128. +- These limits are defined as macros in the `watch_module.h` header. + +Monitoring is currently limited to a single application; simultaneous calls from multiple applications are not supported. +- Multiple applications can work normally if only one program calls `cancel_all_watch();`. + +## Usage + +Example: helloworld.c +- Add `#include "watch.h"` +- For each variable to be monitored, set its name, address, and length, plus the threshold, comparison method, timer interval (ns), etc. +- Call `start_watch(watch_arg);` to start monitoring +- Call `cancel_all_watch();` when you need to cancel monitoring + +When the set conditions are exceeded, the system stack information is printed and can be viewed with `dmesg`, as shown in the following example: +- Within one timer, if multiple variables exceed the threshold, the stack information is output only once; +- After printing the stack, the timer restarts after 1s, and the next round of monitoring begins then. 
+ +```log +[86245.364861] ------------------------------------- +[86245.364864] -------------watch monitor----------- +[86245.364865] Threshold reached: + name: temp0, threshold: 150 +[86245.364866] Timestamp (ns): 1699589000606300743 +[86245.364867] Recent Load: 116.65, 126.83, 151.17 +[86245.365669] task: name lcore-worker-4, pid 803327 +[86245.365672] task: name lcore-worker-5, pid 803328 +[86245.365673] task: name lcore-worker-6, pid 803329 +[86245.365674] task: name lcore-worker-7, pid 803330 +[86245.365676] task: name lcore-worker-8, pid 803331 +[86245.365677] task: name lcore-worker-9, pid 803332 +[86245.365679] task: name lcore-worker-10, pid 803333 +[86245.365681] task: name lcore-worker-11, pid 803334 +[86245.365682] task: name lcore-worker-68, pid 803335 +[86245.365683] task: name lcore-worker-69, pid 803336 +[86245.365684] task: name lcore-worker-70, pid 803337 +[86245.365685] task: name lcore-worker-71, pid 803338 +[86245.365686] task: name lcore-worker-72, pid 803339 +[86245.365687] task: name lcore-worker-73, pid 803340 +[86245.365688] task: name lcore-worker-74, pid 803341 +[86245.365689] task: name lcore-worker-75, pid 803342 +[86245.365694] task: name pkt:worker-0, pid 803638 +[86245.365702] hrtimer_nanosleep+0x8d/0x120 +[86245.365709] __x64_sys_nanosleep+0x96/0xd0 +[86245.365711] do_syscall_64+0x37/0x80 +[86245.365716] entry_SYSCALL_64_after_hwframe+0x44/0xae +[86245.365718] task: name pkt:worker-1, pid 803639 +[86245.365721] hrtimer_nanosleep+0x8d/0x120 +[86245.365724] __x64_sys_nanosleep+0x96/0xd0 +[86245.365726] do_syscall_64+0x37/0x80 +[86245.365728] entry_SYSCALL_64_after_hwframe+0x44/0xae +[86245.365730] task: name pkt:worker-2, pid 803640 +[86245.365732] hrtimer_nanosleep+0x8d/0x120 +[86245.365734] __x64_sys_nanosleep+0x96/0xd0 +[86245.365737] do_syscall_64+0x37/0x80 +[86245.365739] entry_SYSCALL_64_after_hwframe+0x44/0xae +[86245.365740] task: name pkt:worker-3, pid 803641 +[86245.365743] hrtimer_nanosleep+0x8d/0x120 +``` + +### Parameter 
Description + +start_watch passes in the watch_arg structure. The meaning of each field is as follows +- name limit `MAX_NAME_LEN`(15) valid characters + +```c +typedef struct +{ + pid_t task_id; // current process id + char name[MAX_NAME_LEN + 1]; // name (15+1) + void *ptr; // virtual address + int length_byte; // byte + long long threshold; // threshold value + unsigned char unsigned_flag; // unsigned flag (true: unsigned, false: signed) + unsigned char greater_flag; // reverse flag (true: >, false: <) + unsigned long time_ns; // timer interval (ns) +} watch_arg; +``` + +An initialization example + +```c +watch_args = (watch_arg){ + .task_id = getpid(), + .ptr = &temp, + .name = "temp", + .length_byte = sizeof(int), + .threshold = 150 + i, + .unsigned_flag = 0, + .greater_flag = 1, + .time_ns = 2000 + (i / 33) * 5000 +}; +``` + +## demo + +In the main project directory: + +```bash +make && insmod watch_module.ko +./watch +``` + +You can see the printed stack information in dmesg + +```bash +# Unload module and clean compile files +rmmod watch_module.ko && make clean +``` + +Only tested on kernel 5.17.15-1.el8.x86_64. + +## Other + +The program is divided into two parts: character device and user space interface, both of which communicate through ioctl. + +User space address access +- The variable virtual address passed in by the user program, use `get_user_pages_remote` to obtain the memory page where the address is located, and `kmap` maps it to the kernel. + - In the 192.168.40.204 environment, the HugeTLB Pages test mounts normally. +- The memory page address + offset is stored in the `kernel_watch_arg` corresponding to the timer, and hrTimer accesses `kernel_watch_arg` when polling to get the real value. + +timer grouping +- The hrTimer data structure is defined in the global array `kernel_wtimer_list`. When allocating a timer, it will check the traversal `kernel_wtimer_list` to compare the timer interval. 
+- Watches with the same timing interval are assigned to the same group and correspond to the same hrTimer. +- If the number of variables monitored by a timer exceeds `TIMER_MAX_WATCH_NUM` (32), a new hrTimer will be created. +- The total number of hrTimers (`kernel_wtimer_list` array length) limit is `MAX_TIMER_NUM`(128). + +Memory page mount/unmount +- `get_user_pages_remote`/ `kmap` will increase the corresponding count and requires the equivalent `put_page`/`kunmap`. +- A global linked list in the module `watch_local_memory_list` stores the page and kt corresponding to each successfully mounted variable. When performing the close operation of the character device, it is traversed and unloaded. + +Stack output conditions: The conditions are referenced from [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209) +- `TASK` must satisfy TASK_RUNNING and `__task_contributes_to_load`. +- `__task_contributes_to_load` corresponds to the kernel macro `task_contributes_to_load`. + +```c +// https://www.spinics.net/lists/kernel/msg3582022.html +// removed in 5.8.rc3, but it still works +// whether the task contributes to the load +#define __task_contributes_to_load(task) \ + ((READ_ONCE(task->__state) & TASK_UNINTERRUPTIBLE) != 0 && (task->flags & PF_FROZEN) == 0 && \ + (READ_ONCE(task->__state) & TASK_NOLOAD) == 0) +``` \ No newline at end of file diff --git a/README_zh.md b/README_zh.md deleted file mode 100644 index c71deda..0000000 --- a/README_zh.md +++ /dev/null @@ -1,250 +0,0 @@ -## Variable Monitor - -changelog - -```log -11.9 多个变量监控支持 -11.10 按照 pid 区分不同内核结构, 支持每个进程单独申请取消自己的监控. -11.13 用户接口 cancel_all_watch -> cancel_watch, 每个进程互不干扰. -11.28 完全重构,更新文档. -``` - -## 说明 - -监控 数值变量(给定 地址,长度), 达到设定条件打印系统内 Task 信息(用户态堆栈/内核态堆栈/调用链信息). -- 支持多进程, 单个进程退出时,取消该进程的所有监控. -- 相同定时间隔会分配到同一个定时器,一个定时器最多监控 32 个变量,全局最多 128 个定时器. - - 以上数量限制定义在 `source/module/monitor_timer.h`. 
- - `testcase/helloworld.c` 有测试到单进程 2049 个变量; - -文件结构 - -```log -├── build // output -├── source // all source code -│ ├── buffer // 模块与用户空间通信的缓冲区 -│ ├── module // 模块代码 -│ ├── uapi // 用户空间接口 -│ ├── ucli // 用户空间命令行工具 -│ └── ucli_py // 用户空间命令行 python (仅测试用,待完成) -│ └── libunwind // python 解析堆栈信息移植库 -├── testcase // 测试用例 -└── tools // 测试工具 -``` - -## 使用 - -设定对变量监控有两种函数: 宏定义 或 定义 watch_arg 结构体 -- 都需要添加 `source/uapi` 下的头文件 `#include "monitor_user.h"` - -需要取消监控时调用 `cancel_watch();` variant_monitor 会取消该进程所有监控. -- 当进程退出后,也会执行相同的操作,取消该进程所有监控. -- 因此调用 `cancel_watch();` 是个可选项,但依然建议调用以避免可能的内存泄漏. - -获取 Task 信息是一项耗时操作,这里使用了 workqueue 处理,且一次处理后该定时器重启间隔默认为 5s. -- 此值可以在 `/proc/variable_monitor/dump_reset_sec` 查看和修改. - -### 挂载驱动 - -项目根目录 - -```bash -# 编译加载模块 -make && insmod source/variable_monitor.ko -# 卸载模块,清理编译文件 -# rmmod source/variable_monitor.ko && make clean -# 仅在 `kernel 5.17.15-1.el8.x86_64` 测试,其他内核版本未测试. -``` - -### 宏定义 - -示例如 `testcase/helloworld.c`, 对常见数值类型宏定义 方便使用: -- 其他类型见 `source/uapi/monitor_user_sw.h` -```c -// 传入变量名 | 地址 | 阈值 -START_WATCH_INT("temp", &temp, 150); -START_WATCH_INT_LESS("temp", &temp, 150); -``` - -默认情况下,使用宏定义 定时器的时间间隔为 10us; 此值可以在 `/proc/variable_monitor/def_interval_ns` 查看和修改. - -### watch_arg 结构体 - -如果需要对定时间隔等有更多控制,请定义 watch_arg 结构体,start_watch 启动监控: -- 对每个需要监控的变量 设置: 名称 && 地址 && 长度, 设置阈值, 比较方式, 定时器间隔(ns) 等. 
-- `start_watch(watch_arg);` 启动监控 -- 需要取消监控时调用 `cancel_watch();` - -```c -// start_watch 传入的是 watch_arg 结构体.各个字段意义如下 -// - name 限制 `MAX_NAME_LEN`(15) 个有效字符 -typedef struct -{ - pid_t task_id; // current process id - char name[MAX_NAME_LEN + 1]; // name (15+1) - void *ptr; // virtual address - int length_byte; // byte - long long threshold; // threshold value - unsigned char unsigned_flag; // unsigned flag (true: unsigned, false: signed) - unsigned char greater_flag; // reverse flag (true: >, false: <) - unsigned long time_ns; // timer interval (ns) -} watch_arg; - -//一个初始化示例 -watch_args = (watch_arg){ - .task_id = getpid(), - .ptr = &temp, - .name = "temp", - .length_byte = sizeof(int), - .threshold = 150, - .unsigned_flag = 0, - .greater_flag = 1, - .time_ns = 2000 + 5000 -}; -start_watch(watch_args); -``` - -### 打印输出 - -定时器不断按照设定间隔轮询变量,当达到设定条件时,采集此时系统内符合要求的 Task 信息(用户态堆栈/内核态堆栈/调用链信息). -- `dmesg` 可以查看到具体的超出设定条件的变量信息; -- Task 信息被输出到缓存区,使用 ucli 工具查看. - -`dmesg` 打印示例如下 - -```log -[42865.640988] ------------------------------------- -[42865.640992] -----------variable monitor---------- -[42865.640993] 超出阈值:1701141698684973655 -[42865.640994] : pid: 63936, name: temp0, ptr: 00000000bade6e61, threshold:110 -[42865.648068] ------------------------------------- -[42875.640703] ------------------------------------- -[42875.640706] -----------variable monitor---------- -[42875.640706] 超出阈值:1701141708684881779 -[42875.640708] : pid: 63936, name: temp0, ptr: 00000000bade6e61, threshold:110 -[42875.640710] : pid: 63936, name: temp1, ptr: 00000000ee645b96, threshold:111 -[42875.640711] : pid: 63936, name: temp2, ptr: 00000000f62b7afe, threshold:112 -[42875.640711] : pid: 63936, name: temp3, ptr: 00000000d100fa3c, threshold:113 -[42875.640712] : pid: 63936, name: temp4, ptr: 000000006d31cae1, threshold:114 -[42875.640712] : pid: 63936, name: temp5, ptr: 00000000723c7a2a, threshold:115 -[42875.640713] : pid: 63936, name: temp6, ptr: 0000000026ef6e83, threshold:116 -[42875.640714] 
: pid: 63936, name: temp7, ptr: 00000000fc1e5d5e, threshold:117 -[42875.640714] : pid: 63936, name: temp8, ptr: 0000000069b2666e, threshold:118 -[42875.640715] : pid: 63936, name: temp9, ptr: 000000000176263d, threshold:119 -[42875.648023] ------------------------------------- -``` - -默认情况下 `ucli` 编译后在 build 文件夹下 - -`ucli > output` -- ucli 会将缓存区内容解析后输出到 `output` 文件中. -- **此操作会清空缓存区** - -`ucli` 工具输出示例如下(详情见 output_example) -- userstack 是 testcase 下的堆栈信息测试程序. - -```log -##CGROUP:[/] 51666 [510] 采样命中[D] - 进程信息: [/ / userstack], PID: 51666 / 51666 -##C++ pid 51666 - 用户态堆栈SP:7ffcd5822298, BP:2, IP:7f071c720838 -#~ 0x7f071c720838 __GI___nanosleep ([symbol]) -#~ 0x7f071c72076e __sleep ([symbol]) -#~ 0x400a08 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a64 customFunction3 ([symbol]) -#~ 0x400a42 customFunction2 ([symbol]) -#~ 0x400a21 customFunction1 ([symbol]) -#~ 0x400a75 main ([symbol]) -#~ 0x7f071c661d85 
__libc_start_main ([symbol]) -#~ 0x40081e _start ([symbol]) - 内核态堆栈: -#@ 0xffffffff811730dd hrtimer_nanosleep ([kernel.kallsyms]) -#@ 0xffffffff811733a6 __x64_sys_nanosleep ([kernel.kallsyms]) -#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms]) -#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) -#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms]) -#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) -#@ 0xffffffff819fa117 do_syscall_64 ([kernel.kallsyms]) -#@ 0xffffffff81c0007c entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) -#* 0xffffffffffffff userstack (UNKNOWN) - 进程链信息: -#^ 0xffffffffffffff ./build/userstack (UNKNOWN) -#^ 0xffffffffffffff /bin/bash --init-file /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh (UNKNOWN) -#^ 0xffffffffffffff /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/node /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/bootstrap-fork --type=ptyHost --logsPath /root/ (UNKNOWN) -#^ 0xffffffffffffff /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/node /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/out/server-main.js --connection-token=remotessh --a (UNKNOWN) -#^ 0xffffffffffffff sh /root/.vscode-server-insiders/cli/servers/Insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6/server/bin/code-server-insiders --connection-token=remotessh --accept-server-license-terms --start-server --enable-remote-auto-shutdown --socket-path=/tmp/code (UNKNOWN) -#^ 0xffffffffffffff /root/.vscode-server-insiders/code-insiders-ca9da6c177fc4cf7429e1d0c1c52f710d6d953c6 command-shell --cli-data-dir /root/.vscode-server-insiders/cli --on-port --require-token b5a047063eb7 (UNKNOWN) -#^ 
0xffffffffffffff /usr/lib/systemd/systemd --switched-root --system --deserialize 17 (UNKNOWN) -## -``` - -## demo - -usercase 文件夹下 -- `helloworld.c`: 测试大量变量监控 -- `userstack.c`: 测试用户态堆栈输出 -- `hptest.c`: 测试 hugePage 挂载 - -## 其他 - -程序分为两部分: 字符设备 和 用户空间接口, 两者通过 ioctl 通信. - -用户空间地址访问 -- 用户程序传入的变量 虚拟地址, 使用 `get_user_pages_remote` 获取地址所在内存页, `kmap` 将其映射到内核. - - 192.168.40.204 环境下,HugeTLB Pages 测试挂载正常. -- 内存页地址 + 偏移量存入定时器对应的 `kernel_watch_arg` 中, hrTimer 轮询时访问 `kernel_watch_arg` 得到真实值. - -定时器分组 -- hrTimer 数据结构定义在全局数组 `kernel_wtimer_list`.分配定时器时,会检查遍历 `kernel_wtimer_list` 比较定时器间隔, -- 相同定时间隔的 watch 分配到同一组,对应同一个 hrTimer. -- 若一个定时器监控变量数量超过 `TIMER_MAX_WATCH_NUM` (32),则会创建一个新的 hrTimer. -- hrTimer 的总数量(`kernel_wtimer_list` 数组长度)限制是 `MAX_TIMER_NUM`(128). - -内存页 mount/unmount -- `get_user_pages_remote`/ `kmap` 会增加对应的计数,需要对等的 `put_page`/`kunmap`. -- 一个模块内全局链表 `watch_local_memory_list` 存储每一个成功挂载的变量对应的 page 和 kt,执行字符设备的 close 操作时,遍历并卸载. - -variable monitor 添加/删除 -- kernel_watch_arg 数据结构中有 pid 的成员变量,但添加变量监控时,不按照进程区分. -- 删除时遍历全部监控变量,比较 pid. -- 删除造成的缺位,将最后的变量移动到空位, sentinel--; hrTimer 同理. - -堆栈输出条件: 条件参考自 [diagnose-tools::load.c](https://github.com/alibaba/diagnose-tools/blob/e285bc4626a7d207eabd4a69cb276e1a3b1b7c76/SOURCE/module/kernel/load.c#L209) -- `TASK` 要满足 TASK_RUNNING 和 `__task_contributes_to_load` 和 `TASK_IDLE`(可能有阻塞进程). -- `__task_contributes_to_load` 对应内核宏 `task_contributes_to_loa`. - -```c -// https://www.spinics.net/lists/kernel/msg3582022.html -// remove from 5.8.rc3,but it still work -// whether the task contributes to the load -#define __task_contributes_to_load(task) \ - ((READ_ONCE(task->__state) & TASK_UNINTERRUPTIBLE) != 0 && (task->flags & PF_FROZEN) == 0 && \ - (READ_ONCE(task->__state) & TASK_NOLOAD) == 0) -``` \ No newline at end of file
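The timer grouping and add/delete rules described in the README text of this diff (watches with the same interval share a timer, at most `TIMER_MAX_WATCH_NUM` watches per timer, at most `MAX_TIMER_NUM` timers, and deletion fills a hole with the last entry before decrementing the sentinel) can be illustrated with a small user-space sketch. This is not the module's code: the `sketch_*` types and functions and the `timer_list`/`timer_count` globals are hypothetical stand-ins, and only the two limit macros and the `sentinel` name are taken from the README.

```c
#include <stddef.h>

/* Limits quoted in the README. */
#define TIMER_MAX_WATCH_NUM 32
#define MAX_TIMER_NUM 128

/* Hypothetical, simplified stand-in for one monitored variable. */
typedef struct {
    int pid;               /* owning process */
    unsigned long time_ns; /* polling interval (ns) */
} sketch_watch;

/* Hypothetical stand-in for one timer group. */
typedef struct {
    unsigned long time_ns; /* interval shared by the whole group */
    int sentinel;          /* number of watch slots in use */
    sketch_watch watch[TIMER_MAX_WATCH_NUM];
} sketch_timer;

static sketch_timer timer_list[MAX_TIMER_NUM];
static int timer_count;

/* Add a watch: reuse a timer with the same interval that still has room,
 * otherwise allocate a new timer. Returns the timer index, or -1 when all
 * MAX_TIMER_NUM timers are taken. */
int sketch_add_watch(int pid, unsigned long time_ns)
{
    int idx = -1;

    for (int i = 0; i < timer_count; i++) {
        if (timer_list[i].time_ns == time_ns &&
            timer_list[i].sentinel < TIMER_MAX_WATCH_NUM) {
            idx = i;
            break;
        }
    }
    if (idx < 0) {
        if (timer_count >= MAX_TIMER_NUM)
            return -1;
        idx = timer_count++;
        timer_list[idx].time_ns = time_ns;
        timer_list[idx].sentinel = 0;
    }

    sketch_timer *t = &timer_list[idx];
    t->watch[t->sentinel].pid = pid;
    t->watch[t->sentinel].time_ns = time_ns;
    t->sentinel++;
    return idx;
}

/* Cancel every watch owned by pid: each hole is filled with the last entry
 * and the sentinel shrinks; timers left empty are removed the same way. */
void sketch_cancel_watch(int pid)
{
    for (int i = 0; i < timer_count; ) {
        sketch_timer *t = &timer_list[i];

        for (int j = 0; j < t->sentinel; ) {
            if (t->watch[j].pid == pid)
                t->watch[j] = t->watch[--t->sentinel]; /* recheck slot j */
            else
                j++;
        }
        if (t->sentinel == 0)
            timer_list[i] = timer_list[--timer_count]; /* recheck slot i */
        else
            i++;
    }
}
```

Filling the hole with the last element keeps both arrays dense without shifting entries, at the cost of not preserving insertion order, which is presumably acceptable for a polling loop that visits every slot anyway.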