diff --git a/nezha-fronted/src/components/page/config/about.vue b/nezha-fronted/src/components/page/config/about.vue
index 2c7b91250..e9e3dfeec 100644
--- a/nezha-fronted/src/components/page/config/about.vue
+++ b/nezha-fronted/src/components/page/config/about.vue
@@ -44,7 +44,7 @@ export default {
     return {
       noData: false,
       timeLineData: [],
-      language: '',
+      // language: '',
       version: {
         nezha: { name: '', version: '' },
         components: [{ name: '', version: '' }]
@@ -274,18 +274,18 @@ export default {
   methods: {
     getVersion () {
       this.$get('/about').then(response => {
-        this.language = localStorage.getItem('nz-language') || 'en'
+        // this.language = localStorage.getItem('nz-language') || 'en'
         this.version = response.data
       })
     }
   },
   mounted () {
     this.getVersion()
+  },
+  computed: {
+    language () {
+      return this.$store.getters.getLanguage
+    }
+  }
-  // computed: {
-  //   language () {
-  //     return this.$store.getters.getLanguage
-  //   }
-  // }
 }
diff --git a/nezha-fronted/src/components/page/dashboard/explore/exploreItem.vue b/nezha-fronted/src/components/page/dashboard/explore/exploreItem.vue
index 43aed7532..31b859ea1 100644
--- a/nezha-fronted/src/components/page/dashboard/explore/exploreItem.vue
+++ b/nezha-fronted/src/components/page/dashboard/explore/exploreItem.vue
@@ -200,7 +200,8 @@
Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
This document is meant as a reference. To learn, it is easier to start with a few examples.
In Prometheus's expression language, an expression or sub-expression can evaluate to one of four types: instant vector, range vector, scalar, or string.
Depending on the use case (e.g. when graphing vs. displaying the output of an expression), only some of these types are legal as the result of a user-specified expression. For example, an expression that returns an instant vector is the only type that can be graphed directly.
Strings may be specified as literals in single quotes, double quotes, or backticks.
PromQL follows the same escaping rules as Go. In single or double quotes a backslash begins an escape sequence, which may be followed by a, b, f, n, r, t, v, or \. Specific characters can be provided using octal (\nnn) or hexadecimal (\xnn, \unnnn, and \Unnnnnnnn) notation.
No escaping is processed inside backticks. Unlike Go, Prometheus does not discard newlines inside backticks.
Examples:
"this is a string"
'these are unescaped: \n \\ \t'
`these are not unescaped: \n ' " \t`
Scalar float values can be written as integer literals or floating-point numbers in the following format (whitespace is only included for better readability):
[-+]?(
      [0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
    | 0[xX][0-9a-fA-F]+
    | [nN][aA][nN]
    | [iI][nN][fF]
)
Examples:
23
-2.43
3.4e-9
0x8f
-Inf
NaN
Instant vector selectors allow the selection of a set of time series and a single sample value for each at a given timestamp (instant): in the simplest form, only a metric name is specified. This results in an instant vector containing elements for all time series that have this metric name.
This example selects all time series that have the http_requests_total metric name:
http_requests_total
It is possible to filter these time series further by appending a comma-separated list of label matchers in curly braces ({}).
This example selects only those time series with the http_requests_total metric name that also have the job label set to prometheus and their group label set to canary:
http_requests_total{job="prometheus",group="canary"}
It is also possible to negatively match a label value, or to match label values against regular expressions. The following label matching operators exist:
- =: Select labels that are exactly equal to the provided string.
- !=: Select labels that are not equal to the provided string.
- =~: Select labels that regex-match the provided string.
- !~: Select labels that do not regex-match the provided string.
Regex matches are fully anchored. A match of env=~"foo" is treated as env=~"^foo$".
This example selects all http_requests_total time series for staging, testing, and development environments and HTTP methods other than GET:
http_requests_total{environment=~"staging|testing|development",method!="GET"}
Label matchers that match empty label values also select all time series that do not have the specific label set at all. It is possible to have multiple matchers for the same label name.
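For instance, the two selectors below sketch both behaviors, reusing the environment label from the example above (the status_code label is hypothetical, added purely for illustration):
http_requests_total{environment=""}                          # series that have no environment label at all
http_requests_total{status_code=~"5..",status_code!="500"}   # two matchers on the same (hypothetical) label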
Vector selectors must either specify a name or at least one label matcher that does not match the empty string. The following expression is illegal:
{job=~".*"} # Bad!
In contrast, these expressions are valid as they both have a selector that does not match empty label values:
{job=~".+"} # Good!
{job=~".*",method="get"} # Good!
Label matchers can also be applied to metric names by matching against the internal __name__ label. For example, the expression http_requests_total is equivalent to {__name__="http_requests_total"}. Matchers other than = (!=, =~, !~) may also be used. The following expression selects all metrics that have a name starting with job::
{__name__=~"job:.*"}
Metric names must not be any of the keywords bool, on, ignoring, group_left, and group_right. The following expression is illegal:
on{} # Bad!
A workaround for this restriction is to use the __name__ label:
{__name__="on"} # Good!
All regular expressions in Prometheus use RE2 syntax.
Range vector literals work like instant vector literals, except that they select a range of samples back from the current instant. Syntactically, a time duration is appended in square brackets ([]) at the end of a vector selector to specify how far back in time values should be fetched for each resulting range vector element.
In the following example, we select all the values recorded within the last 5 minutes for all time series that have the metric name http_requests_total and a job label set to prometheus:
http_requests_total{job="prometheus"}[5m]
Time durations are specified as a number, followed immediately by one of the following units:
ms - milliseconds
s - seconds
m - minutes
h - hours
d - days (assuming a day always has 24h)
w - weeks (assuming a week always has 7d)
y - years (assuming a year always has 365d)
Time durations can be combined by concatenation. Units must be ordered from the longest to the shortest, and a given unit must only appear once in a time duration.
Here are some examples of valid time durations:
5h
1h30m
5m
10s
The offset modifier allows changing the time offset for individual instant and range vectors in a query.
For example, the following expression returns the value of http_requests_total 5 minutes in the past relative to the current query evaluation time:
http_requests_total offset 5m
Note that the offset modifier always needs to follow the selector immediately. The following is correct:
sum(http_requests_total{method="GET"} offset 5m) // GOOD.
While the following would be incorrect:
sum(http_requests_total{method="GET"}) offset 5m // INVALID.
The same works for range vectors. This returns the 5-minute rate that http_requests_total had a week ago:
rate(http_requests_total[5m] offset 1w)
For comparisons with temporal shifts forward in time, a negative offset can be specified:
rate(http_requests_total[5m] offset -1w)
Note that this allows a query to look ahead of its evaluation time.
The @ modifier allows changing the evaluation time for individual instant and range vectors in a query. The time supplied to the @ modifier is a unix timestamp, described with a float literal.
For example, the following expression returns the value of http_requests_total at 2021-01-04T07:40:00+00:00:
http_requests_total @ 1609746000
Note that the @ modifier always needs to follow the selector immediately. The following is correct:
sum(http_requests_total{method="GET"} @ 1609746000) // GOOD.
While the following would be incorrect:
sum(http_requests_total{method="GET"}) @ 1609746000 // INVALID.
The same works for range vectors. This returns the 5-minute rate that http_requests_total had at 2021-01-04T07:40:00+00:00:
rate(http_requests_total[5m] @ 1609746000)
The @ modifier supports all representations of float literals described above, within the limits of int64. It can also be used along with the offset modifier, in which case the offset is applied relative to the @ modifier time, regardless of which modifier is written first. These two queries will produce the same result:
# offset after @
http_requests_total @ 1609746000 offset 5m
# offset before @
http_requests_total offset 5m @ 1609746000
Additionally, start() and end() can be used as special values for the @ modifier.
For a range query, they resolve to the start and end of the range query respectively and remain the same for all steps.
For an instant query, start() and end() both resolve to the evaluation time.
http_requests_total @ start()
rate(http_requests_total[5m] @ end())
Note that the @ modifier allows a query to look ahead of its evaluation time.
Subqueries allow you to run an instant query for a given range and resolution. The result of a subquery is a range vector.
Syntax: <instant_query> '[' <range> ':' [<resolution>] ']' [ @ <float_literal> ] [ offset <duration> ]
<resolution> is optional. The default is the global evaluation interval.
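As a quick illustrative sketch (the subquery examples further below show more variations), the following returns the 5-minute rate of http_requests_total, resampled at a 1-minute resolution over the last 30 minutes:
rate(http_requests_total[5m])[30m:1m]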
Prometheus supports many binary and aggregation operators. These are described in detail in the expression language operators page.
Prometheus supports several functions to operate on data. These are described in detail in the expression language functions page.
PromQL supports line comments that start with #. Example:
# This is a comment
When queries are run, timestamps at which to sample data are selected independently of the actual present time series data. This is mainly to support cases like aggregation (sum, avg, and so on), where multiple aggregated time series do not precisely align in time. Because of their independence, Prometheus needs to assign a value at those timestamps for each relevant time series. It does so by simply taking the newest sample before this timestamp.
If a target scrape or rule evaluation no longer returns a sample for a time series that was previously present, that time series will be marked as stale. If a target is removed, its previously returned time series will be marked as stale soon afterwards.
If a query is evaluated at a sampling timestamp after a time series is marked stale, then no value is returned for that time series. If new samples are subsequently ingested for that time series, they will be returned as normal.
If no sample is found (by default) 5 minutes before a sampling timestamp, no value is returned for that time series at this point in time. This effectively means that time series "disappear" from graphs at times where their latest collected sample is older than 5 minutes or after they are marked stale.
Staleness will not be marked for time series that have timestamps included in their scrapes. Only the 5-minute threshold will be applied in that case.
If a query needs to operate on a substantial amount of data, graphing it might time out or overload the server or browser. Thus, when constructing queries over unknown data, always start building the query in the tabular view of Prometheus's expression browser until the result set seems reasonable (hundreds, not thousands, of time series at most). Only when you have filtered or aggregated your data sufficiently should you switch to graph mode. If the expression still takes too long to graph ad hoc, pre-record it via a recording rule.
This is especially relevant for Prometheus's query language, where a bare metric name selector like api_http_requests_total could expand to thousands of time series with different labels. Also keep in mind that expressions that aggregate over many time series will generate load on the server even if the output is only a small number of time series; this is similar to how summing all values of a column in a relational database would be slow even if the output value is only a single number.
Prometheus's query language supports basic logical and arithmetic operators. For operations between two instant vectors, the matching behavior can be modified.
The following binary arithmetic operators exist in Prometheus:
+ (addition)
- (subtraction)
* (multiplication)
/ (division)
% (modulo)
^ (power/exponentiation)
Binary arithmetic operators are defined between scalar/scalar, vector/scalar, and vector/vector value pairs.
Between two scalars, the behavior is obvious: they evaluate to another scalar that is the result of the operator applied to both scalar operands.
Between an instant vector and a scalar, the operator is applied to the value of every data sample in the vector. E.g. if a time series instant vector is multiplied by 2, the result is another vector in which every sample value of the original vector is multiplied by 2. The metric name is dropped.
Between two instant vectors, a binary arithmetic operator is applied to each entry in the left-hand side vector and its matching element in the right-hand side vector. The result is propagated into the result vector, with the grouping labels becoming the output label set. The metric name is dropped. Entries for which no matching entry in the right-hand vector can be found are not part of the result.
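As a small vector/scalar sketch, reusing the instance_memory_usage_bytes metric from the examples further below: dividing by a scalar twice converts every sample from bytes to MiB, and the metric name is dropped from the result:
instance_memory_usage_bytes / 1024 / 1024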
The following trigonometric binary operators, which work in radians, exist in Prometheus:
- atan2 (based on https://pkg.go.dev/math#Atan2)
Trigonometric operators allow trigonometric functions to be executed on two vectors using vector matching, which isn't available with normal functions. They act in the same manner as arithmetic operators.
The following binary comparison operators exist in Prometheus:
== (equal)
!= (not-equal)
> (greater-than)
< (less-than)
>= (greater-or-equal)
<= (less-or-equal)
Comparison operators are defined between scalar/scalar, vector/scalar, and vector/vector value pairs. By default they filter. Their behavior can be modified by providing bool after the operator, which will return 0 or 1 for the value rather than filtering.
Between two scalars, the bool modifier must be provided and these operators result in another scalar that is either 0 (false) or 1 (true), depending on the comparison result.
Between an instant vector and a scalar, these operators are applied to the value of every data sample in the vector, and vector elements between which the comparison result is false get dropped from the result vector. If the bool modifier is provided, vector elements that would be dropped instead have the value 0 and vector elements that would be kept have the value 1. The metric name is dropped if the bool modifier is provided.
Between two instant vectors, these operators behave as a filter by default, applied to matching entries. Vector elements for which the expression is not true, or which do not find a match on the other side of the expression, get dropped from the result, while the others are propagated into a result vector with the grouping labels becoming the output label set. If the bool modifier is provided, vector elements that would have been dropped instead have the value 0 and vector elements that would be kept have the value 1, with the grouping labels again becoming the output label set. The metric name is dropped if the bool modifier is provided.
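For instance, a brief sketch with the http_requests_total metric used throughout this page: the first query filters, the second turns the comparison into a 0/1 value per series:
http_requests_total > 100        # keeps only series whose current value exceeds 100
http_requests_total > bool 100   # keeps every series, with value 1 (true) or 0 (false)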
Logical/set binary operators are only defined between instant vectors:
- and (intersection)
- or (union)
- unless (complement)
vector1 and vector2 results in a vector consisting of the elements of vector1 for which there are elements in vector2 with exactly matching label sets. Other elements are dropped. The metric name and values are carried over from the left-hand side vector.
vector1 or vector2 results in a vector that contains all original elements (label sets + values) of vector1 and additionally all elements of vector2 which do not have matching label sets in vector1.
vector1 unless vector2 results in a vector consisting of the elements of vector1 for which there are no elements in vector2 with exactly matching label sets. All matching elements in both vectors are dropped.
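A hedged sketch of unless, assuming a hypothetical maintenance_mode metric that shares an instance label with the standard up metric: the query keeps only instances that are up and not under maintenance (the on(instance) matching is explained in the vector matching section below):
up == 1 unless on(instance) maintenance_mode == 1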
Operations between vectors attempt to find a matching element in the right-hand side vector for each entry in the left-hand side. There are two basic types of matching behavior: one-to-one and many-to-one / one-to-many.
Vector matching keywords allow matching between series with different label sets. The keywords are:
- on
- ignoring
The label list provided to a matching keyword determines how the vectors are combined. Examples can be found below under one-to-one vector matching and many-to-one / one-to-many vector matching.
These group modifiers enable many-to-one / one-to-many vector matching:
- group_left
- group_right
A label list can be provided to a group modifier; it contains labels from the "one" side to be included in the result metrics.
Many-to-one and one-to-many matching are advanced use cases that should be carefully considered. Often a proper use of ignoring(<labels>) provides the desired outcome.
Grouping modifiers can only be used for comparison and arithmetic. Operations such as and, unless, and or match with all possible entries in the right vector by default.
One-to-one finds a unique pair of entries from each side of the operation. In the default case, that is an operation following the format vector1 <operator> vector2. Two entries match if they have the exact same set of labels and corresponding values. The ignoring keyword allows ignoring certain labels when matching, while the on keyword allows reducing the set of considered labels to a provided list:
<vector expr> <bin-op> ignoring(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) <vector expr>
Example input:
method_code:http_errors:rate5m{method="get", code="500"}  24
method_code:http_errors:rate5m{method="get", code="404"}  30
method_code:http_errors:rate5m{method="put", code="501"}  3
method_code:http_errors:rate5m{method="post", code="500"} 6
method_code:http_errors:rate5m{method="post", code="404"} 21

method:http_requests:rate5m{method="get"}  600
method:http_requests:rate5m{method="del"}  34
method:http_requests:rate5m{method="post"} 120
Example query:
method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m
This returns a result vector containing the fraction of HTTP requests with status code 500 for each method, as measured over the last 5 minutes. Without ignoring(code) there would have been no match, as the metrics do not share the same set of labels. The entries with methods put and del have no match and will not show up in the result:
{method="get"}  0.04  // 24 / 600
{method="post"} 0.05  // 6 / 120
Many-to-one and one-to-many matchings refer to the case where each vector element on the "one" side can match with multiple elements on the "many" side. This has to be explicitly requested using the group_left or group_right modifier, where left/right determines which vector has the higher cardinality.
<vector expr> <bin-op> ignoring(<label list>) group_left(<label list>) <vector expr>
<vector expr> <bin-op> ignoring(<label list>) group_right(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) group_left(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) group_right(<label list>) <vector expr>
The label list provided with the group modifier contains additional labels from the "one" side to be included in the result metrics. For on, a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.
Example query:
method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m
In this case the left vector contains more than one entry per method label value. Thus, we indicate this using group_left. The elements from the right side are now matched with multiple elements with the same method label on the left:
{method="get", code="500"} 0.04 // 24 / 600
+{method="get", code="404"} 0.05 // 30 / 600
+{method="post", code="500"} 0.05 // 6 / 120
+{method="post", code="404"} 0.175 // 21 / 120
Prometheus supports the following built-in aggregation operators that can be used to aggregate the elements of a single instant vector, resulting in a new vector of fewer elements with aggregated values:
- sum (calculate sum over dimensions)
- min (select minimum over dimensions)
- max (select maximum over dimensions)
- avg (calculate the average over dimensions)
- group (all values in the resulting vector are 1)
- stddev (calculate population standard deviation over dimensions)
- stdvar (calculate population standard variance over dimensions)
- count (count number of elements in the vector)
- count_values (count number of elements with the same value)
- bottomk (smallest k elements by sample value)
- topk (largest k elements by sample value)
- quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
These operators can either be used to aggregate over all label dimensions or preserve distinct dimensions by including a without or by clause. These clauses may be used before or after the expression.
<aggr-op> [without|by (<label list>)] ([parameter,] <vector expression>)
or
<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]
label list is a list of unquoted labels that may include a trailing comma, i.e. both (label1, label2) and (label1, label2,) are valid syntax.
without removes the listed labels from the result vector, while all other labels are preserved in the output. by does the opposite and drops labels that are not listed in the by clause, even if their label values are identical between all elements of the vector.
parameter is only required for count_values, quantile, topk, and bottomk.
count_values outputs one time series per unique sample value. Each series has an additional label, whose name is given by the aggregation parameter and whose value is the unique sample value. The value of each time series is the number of times that sample value was present.
topk and bottomk are different from other aggregators in that a subset of the input samples, including the original labels, are returned in the result vector. by and without are only used to bucket the input vector.
quantile calculates the φ-quantile, the value that ranks at number φ*N among the N metric values of the dimensions aggregated over. φ is provided as the aggregation parameter. For example, quantile(0.5, ...) calculates the median, and quantile(0.95, ...) the 95th percentile. For φ = NaN, NaN is returned; for φ < 0, -Inf; for φ > 1, +Inf.
Example:
If the metric http_requests_total had time series that fan out by application, instance, and group labels, we could calculate the total number of seen HTTP requests per application and group over all instances via:
sum without (instance) (http_requests_total)
which is equivalent to:
sum by (application, group) (http_requests_total)
If we are just interested in the total of HTTP requests seen across all applications, we could simply write:
sum(http_requests_total)
To count the number of binaries running each build version we could write:
count_values("version", build_version)
To get the 5 largest HTTP request counts across all instances we could write:
topk(5, http_requests_total)
The following list shows the precedence of binary operators in Prometheus, from highest to lowest:
1. ^
2. *, /, %, atan2
3. +, -
4. ==, !=, <=, <, >=, >
5. and, unless
6. or
Operators on the same precedence level are left-associative. For example, 2 * 3 % 2 is equivalent to (2 * 3) % 2. However ^ is right-associative, so 2 ^ 3 ^ 2 is equivalent to 2 ^ (3 ^ 2).
Some functions have default arguments, e.g. year(v=vector(time()) instant-vector). This means that there is one argument, v, which is an instant vector and which defaults to the value of the expression vector(time()) if it is not provided.
abs(v instant-vector) returns the input vector with all sample values converted to their absolute value.
absent(v instant-vector) returns an empty vector if the vector passed to it has any elements, and a 1-element vector with the value 1 if the vector passed to it has no elements.
This is useful for alerting on when no time series exist for a given metric name and label combination.
+absent(nonexistent{job="myjob"})
+# => {job="myjob"}
+
+absent(nonexistent{job="myjob",instance=~".*"})
+# => {job="myjob"}
+
+absent(sum(nonexistent{job="myjob"}))
+# => {}
In the first two examples, absent() tries to be smart about deriving labels of the 1-element output vector from the input vector.
absent_over_time(v range-vector) returns an empty vector if the range vector passed to it has any elements, and a 1-element vector with the value 1 if the range vector passed to it has no elements.
This is useful for alerting on when no time series exist for a given metric name and label combination for a certain amount of time.
absent_over_time(nonexistent{job="myjob"}[1h])
# => {job="myjob"}

absent_over_time(nonexistent{job="myjob",instance=~".*"}[1h])
# => {job="myjob"}

absent_over_time(sum(nonexistent{job="myjob"})[1h:])
# => {}
In the first two examples, absent_over_time() tries to be smart about deriving labels of the 1-element output vector from the input vector.
ceil(v instant-vector) rounds the sample values of all elements in v up to the nearest integer.
For each input time series, changes(v range-vector) returns the number of times its value has changed within the provided time range, as an instant vector.
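For example (a sketch assuming the standard process_start_time_seconds gauge is exposed by the target), this approximates the number of restarts per target over the last hour:
changes(process_start_time_seconds[1h])  # the value changes on each restart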
clamp(v instant-vector, min scalar, max scalar) clamps the sample values of all elements in v to have a lower limit of min and an upper limit of max.
Special cases: return an empty vector if min > max; return NaN if min or max is NaN.
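A minimal sketch, reusing the cpu_temp_celsius metric that appears in the delta() example below: this caps every reading to the 0 to 100 range:
clamp(cpu_temp_celsius, 0, 100)  # values below 0 become 0, above 100 become 100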
clamp_max(v instant-vector, max scalar) clamps the sample values of all elements in v to have an upper limit of max.
clamp_min(v instant-vector, min scalar) clamps the sample values of all elements in v to have a lower limit of min.
day_of_month(v=vector(time()) instant-vector) returns the day of the month for each of the given times in UTC. Returned values are from 1 to 31.
day_of_week(v=vector(time()) instant-vector) returns the day of the week for each of the given times in UTC. Returned values are from 0 to 6, where 0 means Sunday.
day_of_year(v=vector(time()) instant-vector) returns the day of the year for each of the given times in UTC. Returned values are from 1 to 365 for non-leap years, and 1 to 366 in leap years.
days_in_month(v=vector(time()) instant-vector) returns the number of days in the month for each of the given times in UTC. Returned values are from 28 to 31.
delta(v range-vector) calculates the difference between the first and last value of each time series element in a range vector v, returning an instant vector with the given deltas and equivalent labels. The delta is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if the sample values are all integers.
The following example expression returns the difference in CPU temperature between now and 2 hours ago:
delta(cpu_temp_celsius{host="zeus"}[2h])
delta should only be used with gauges.
deriv(v range-vector) calculates the per-second derivative of the time series in a range vector v, using simple linear regression. The range vector must have at least two samples in order to perform the calculation. When +Inf or -Inf are found in the range vector, the slope and offset value calculated will be NaN.
deriv should only be used with gauges.
exp(v instant-vector) calculates the exponential function for all elements in v. Special cases are:
- Exp(+Inf) = +Inf
- Exp(NaN) = NaN
floor(v instant-vector) rounds the sample values of all elements in v down to the nearest integer.
histogram_quantile(φ scalar, b instant-vector) calculates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a histogram. (See histograms and summaries for a detailed explanation of φ-quantiles and the usage of the histogram metric type in general.) The samples in b are the counts of observations in each bucket. Each sample must have a label le where the label value denotes the inclusive upper bound of the bucket. (Samples without such a label are silently ignored.) The histogram metric type automatically provides time series with the _bucket suffix and the appropriate labels.
Use the rate() function to specify the time window for the quantile calculation.
Example: A histogram metric is called http_request_duration_seconds. To calculate the 90th percentile of request durations over the last 10 minutes, use the following expression:
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
The quantile is calculated for each label combination in http_request_duration_seconds. To aggregate, use the sum() aggregator around the rate() function. Since the le label is required by histogram_quantile(), it has to be included in the by clause. The following expression aggregates the 90th percentile by job:
histogram_quantile(0.9, sum by (job, le) (rate(http_request_duration_seconds_bucket[10m])))
To aggregate everything, specify only the le label:
histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[10m])))
The histogram_quantile() function interpolates quantile values by assuming a linear distribution within a bucket. The highest bucket must have an upper bound of +Inf (otherwise NaN is returned). If a quantile is located in the highest bucket, the upper bound of the second highest bucket is returned. A lower limit of the lowest bucket is assumed to be 0 if the upper bound of that bucket is greater than 0. In that case, the usual linear interpolation is applied within that bucket. Otherwise, the upper bound of the lowest bucket is returned for quantiles located in the lowest bucket.
If b has 0 observations, NaN is returned. If b contains fewer than two buckets, NaN is returned. For φ < 0, -Inf is returned. For φ > 1, +Inf is returned. For φ = NaN, NaN is returned.
holt_winters(v range-vector, sf scalar, tf scalar) produces a smoothed value for time series based on the range in v. The lower the smoothing factor sf, the more importance is given to old data. The higher the trend factor tf, the more trends in the data are considered. Both sf and tf must be between 0 and 1.
holt_winters should only be used with gauges.
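A brief sketch, assuming a node_load1 gauge (as exposed by node_exporter) is available: smooth the last 10 minutes of load with equal weight given to level and trend:
holt_winters(node_load1[10m], 0.5, 0.5)  # node_load1 is an assumed metric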
hour(v=vector(time()) instant-vector) returns the hour of the day for each of the given times in UTC. Returned values are from 0 to 23.
idelta(v range-vector) calculates the difference between the last two samples in the range vector v, returning an instant vector with the given deltas and equivalent labels.
idelta should only be used with gauges.
increase(v range-vector) calculates the increase in the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. The increase is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if a counter increases only by integer increments.
The following example expression returns the number of HTTP requests as measured over the last 5 minutes, per time series in the range vector:
increase(http_requests_total{job="api-server"}[5m])
increase should only be used with counters. It is syntactic sugar for rate(v) multiplied by the number of seconds in the specified time range window, and should be used primarily for human readability. Use rate in recording rules so that increases are tracked consistently on a per-second basis.
irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range vector. This is based on the last two data points. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
The following example expression returns the per-second rate of HTTP requests looking up to 5 minutes back for the two most recent data points, per time series in the range vector:
irate(http_requests_total{job="api-server"}[5m])
irate should only be used when graphing volatile, fast-moving counters. Use rate for alerts and slow-moving counters, as brief changes in the rate can reset the FOR clause, and graphs consisting entirely of rare spikes are hard to read.
Note that when combining irate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take an irate() first, then aggregate. Otherwise irate() cannot detect counter resets when your target restarts.
For each time series in v, label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...) joins all the values of all the src_labels using separator and returns the time series with the label dst_label containing the joined value. There can be any number of src_labels in this function.
This example will return a vector with each time series having a foo label with the value a,b,c added to it:
label_join(up{job="api-server",src1="a",src2="b",src3="c"}, "foo", ",", "src1", "src2", "src3")
For each time series in v, label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string) matches the regular expression regex against the value of the label src_label. If it matches, the value of the label dst_label in the returned time series will be the expansion of replacement, together with the original labels in the input. Capturing groups in the regular expression can be referenced with $1, $2, etc. If the regular expression doesn't match, the time series is returned unchanged.
This example will return time series with a service label of value a:c and a foo label of value a:
label_replace(up{job="api-server",service="a:c"}, "foo", "$1", "service", "(.*):.*")
ln(v instant-vector) calculates the natural logarithm for all elements in v. Special cases are:
- ln(+Inf) = +Inf
- ln(0) = -Inf
- ln(x < 0) = NaN
- ln(NaN) = NaN
log2(v instant-vector) calculates the binary logarithm for all elements in v. The special cases are equivalent to those in ln.
log10(v instant-vector) calculates the decimal logarithm for all elements in v. The special cases are equivalent to those in ln.
minute(v=vector(time()) instant-vector) returns the minute of the hour for each of the given times in UTC. Returned values are from 0 to 59.
month(v=vector(time()) instant-vector) returns the month of the year for each of the given times in UTC. Returned values are from 1 to 12, where 1 means January.
predict_linear(v range-vector, t scalar) predicts the value of time series t seconds from now, based on the range vector v, using simple linear regression. The range vector must have at least two samples in order to perform the calculation. When +Inf or -Inf are found in the range vector, the slope and offset value calculated will be NaN.
predict_linear should only be used with gauges.
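A common alerting sketch, assuming the node_filesystem_avail_bytes gauge from node_exporter: flag filesystems projected to run out of space within the next four hours, based on the trend of the last hour:
predict_linear(node_filesystem_avail_bytes[1h], 4 * 3600) < 0  # assumed node_exporter metric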
rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. Also, the calculation extrapolates to the ends of the time range, allowing for missed scrapes or imperfect alignment of scrape cycles with the range's time period.
The following example expression returns the per-second rate of HTTP requests as measured over the last 5 minutes, per time series in the range vector:
rate(http_requests_total{job="api-server"}[5m])
rate should only be used with counters. It is best suited for alerting and for graphing of slow-moving counters.
Note that when combining rate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take a rate() first, then aggregate. Otherwise rate() cannot detect counter resets when your target restarts.
For each input time series, resets(v range-vector) returns the number of counter resets within the provided time range, as an instant vector. Any decrease in the value between two consecutive samples is interpreted as a counter reset.
resets should only be used with counters.
round(v instant-vector, to_nearest=1 scalar) rounds the sample values of all elements in v to the nearest integer. Ties are resolved by rounding up. The optional to_nearest argument allows specifying the nearest multiple to which the sample values should be rounded. This multiple may also be a fraction.
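For instance, an illustrative sketch reusing http_requests_total: rounding every sample to the nearest multiple of 5, and to the nearest half:
round(http_requests_total, 5)    # e.g. 23 becomes 25
round(http_requests_total, 0.5)  # the multiple may be a fraction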
scalar(v instant-vector), given a single-element input vector, returns the sample value of that single element as a scalar. If the input vector does not have exactly one element, scalar will return NaN.
sgn(v instant-vector) returns a vector with all sample values converted to their sign, defined as: 1 if v is positive, -1 if v is negative, and 0 if v is equal to zero.
sort(v instant-vector) returns vector elements sorted by their sample values, in ascending order.
sort_desc(v instant-vector) is the same as sort, but sorts in descending order.
sqrt(v instant-vector) calculates the square root of all elements in v.
time() returns the number of seconds since January 1, 1970 UTC. Note that this does not actually return the current time, but the time at which the expression is to be evaluated.
timestamp(v instant-vector) returns the timestamp of each of the samples of the given vector as the number of seconds since January 1, 1970 UTC.
vector(s scalar) returns the scalar s as a vector with no labels.
year(v=vector(time()) instant-vector) returns the year for each of the given times in UTC.
The following functions allow aggregating each series of a given range vector over time and return an instant vector with per-series aggregation results:
- avg_over_time(range-vector): the average value of all points in the specified interval.
- min_over_time(range-vector): the minimum value of all points in the specified interval.
- max_over_time(range-vector): the maximum value of all points in the specified interval.
- sum_over_time(range-vector): the sum of all values in the specified interval.
- count_over_time(range-vector): the count of all values in the specified interval.
- quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
- stddev_over_time(range-vector): the population standard deviation of the values in the specified interval.
- stdvar_over_time(range-vector): the population standard variance of the values in the specified interval.
- last_over_time(range-vector): the most recent point value in the specified interval.
- present_over_time(range-vector): the value 1 for any series in the specified interval.
Note that all values in the specified interval have the same weight in the aggregation, even if the values are not equally spaced throughout the interval.
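A short sketch, again assuming a node_load1 gauge: the per-series average load over the last 10 minutes:
avg_over_time(node_load1[10m])  # node_load1 is an assumed metric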
+The trigonometric functions work in radians:
- acos(v instant-vector): calculates the arccosine of all elements in v (special cases).
- acosh(v instant-vector): calculates the inverse hyperbolic cosine of all elements in v (special cases).
- asin(v instant-vector): calculates the arcsine of all elements in v (special cases).
- asinh(v instant-vector): calculates the inverse hyperbolic sine of all elements in v (special cases).
- atan(v instant-vector): calculates the arctangent of all elements in v (special cases).
- atanh(v instant-vector): calculates the inverse hyperbolic tangent of all elements in v (special cases).
- cos(v instant-vector): calculates the cosine of all elements in v (special cases).
- cosh(v instant-vector): calculates the hyperbolic cosine of all elements in v (special cases).
- sin(v instant-vector): calculates the sine of all elements in v (special cases).
- sinh(v instant-vector): calculates the hyperbolic sine of all elements in v (special cases).
- tan(v instant-vector): calculates the tangent of all elements in v (special cases).
- tanh(v instant-vector): calculates the hyperbolic tangent of all elements in v (special cases).
The following are useful for converting between degrees and radians:
- deg(v instant-vector): converts radians to degrees for all elements in v.
- pi(): returns pi.
- rad(v instant-vector): converts degrees to radians for all elements in v.
Return all time series with the metric http_requests_total:
http_requests_total
Return all time series with the metric http_requests_total and the given job and handler labels:
http_requests_total{job="apiserver", handler="/api/comments"}
Return a whole range of time (in this case 5 minutes up to the query time) for the same vector, making it a range vector:
http_requests_total{job="apiserver", handler="/api/comments"}[5m]
Note that an expression resulting in a range vector cannot be graphed directly, but can be viewed in the tabular ("Console") view of the expression browser.
Using regular expressions, you could select time series only for jobs whose name matches a certain pattern, in this case, all jobs that end with server:
http_requests_total{job=~".*server"}
All regular expressions in Prometheus use RE2 syntax.
To select all HTTP status codes except 4xx ones, you could run:
http_requests_total{status!~"4.."}
Return the 5-minute rate of the http_requests_total metric for the past 30 minutes, with a resolution of 1 minute:
rate(http_requests_total[5m])[30m:1m]
This is an example of a nested subquery. The subquery for the deriv function uses the default resolution. Note that using subqueries unnecessarily is unwise.
max_over_time(deriv(rate(distance_covered_total[5s])[30s:5s])[10m:])
Return the per-second rate for all time series with the http_requests_total metric name, as measured over the last 5 minutes:
rate(http_requests_total[5m])
Assuming that the http_requests_total time series all have the labels job (fanout by job name) and instance (fanout by instance of the job), we might want to sum over the rate of all instances, so we get fewer output time series, but still preserve the job dimension:
sum by (job) (
  rate(http_requests_total[5m])
)
If we have two different metrics with the same dimensional labels, we can apply binary operators to them and elements on both sides with the same label set will get matched and propagated to the output. For example, this expression returns the unused memory in MiB for every instance (on a fictional cluster scheduler exposing these metrics about the instances it runs):
(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024
The same expression, but summed by application, could be written like this:
sum by (app, proc) (
  instance_memory_limit_bytes - instance_memory_usage_bytes
) / 1024 / 1024
If the same fictional cluster scheduler exposed CPU usage metrics like the following for every instance:
+instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
+instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
+instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
+instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
+...
+ 我们得到可以按应用程序(app )和进程类型( proc )分组的前3名 CPU 用户,如下所示:
topk(3, sum by (app, proc) (rate(instance_cpu_time_ns[5m])))+
假设此度量包含每个正在运行实例的一个时间序列,您可以按如下方式计算每个应用程序的运行实例数:
+count by (app) (instance_cpu_time_ns)+
sum by (org_id) (
  sum_over_time(
  {cluster="ops-tools1",container="loki-dev"}
    |= "metrics.go"
    | logfmt
    | unwrap bytes_processed [1m])
  )
This calculates the amount of bytes processed per organization ID.
Like PromQL, LogQL supports a subset of built-in aggregation operators that can be used to aggregate the elements of a single vector, resulting in a new vector of fewer elements but with aggregated values:
- sum: Calculate sum over labels
- avg: Calculate the average over labels
- min: Select minimum over labels
- max: Select maximum over labels
- stddev: Calculate the population standard deviation over labels
- stdvar: Calculate the population standard variance over labels
- count: Count number of elements in the vector
- topk: Select largest k elements by sample value
- bottomk: Select smallest k elements by sample value
The aggregation operators can either be used to aggregate over all label values or a set of distinct label values by including a without or a by clause:
<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]
parameter is required when using topk and bottomk. topk and bottomk are different from other aggregators in that a subset of the input samples, including the original labels, are returned in the result vector.
by and without are only used to group the input vector. The without clause removes the listed labels from the resulting vector, keeping all others. The by clause does the opposite, dropping labels that are not listed in the clause, even if their label values are identical between all elements of the vector.
Get the top 10 applications by the highest log throughput:
topk(10,sum(rate({region="us-east1"}[5m])) by (name))
Get the count of log lines for the last five minutes for a specified job, grouping by level:
sum(count_over_time({job="mysql"}[5m])) by (level)
Get the rate of HTTP GET requests to the /home endpoint for NGINX logs by region:
avg(rate(({job="nginx"} |= "GET" | json | path="/home")[10s])) by (region)
LogQL is Grafana Loki’s PromQL-inspired query language. Queries act as if they are a distributed grep to aggregate log sources. LogQL uses labels and operators for filtering.
There are two types of LogQL queries: log queries, which return the contents of log lines, and metric queries, which extend log queries to calculate values based on query results.
All LogQL queries contain a log stream selector.
Optionally, the log stream selector can be followed by a log pipeline. A log pipeline is a set of stage expressions that are chained together and applied to the selected log streams. Each expression can filter out, parse, or mutate log lines and their respective labels.
The following example shows a full log query in action:
{container="query-frontend",namespace="loki-dev"} |= "metrics.go" | logfmt | duration > 10s and throughput_mb < 500
The query is composed of:
+{container="query-frontend",namespace="loki-dev"} which targets the query-frontend container in the loki-dev namespace.|= "metrics.go" | logfmt | duration > 10s and throughput_mb < 500 which will filter out log that contains the word metrics.go, then parses each log line to extract more labels and filter with them.To avoid escaping special characters you can use the+`(backtick) instead of"when quoting strings. For example`\w+`is the same as"\\w+". This is specially useful when writing a regular expression which contains multiple
backslashes that require escaping.
The stream selector determines which log streams to include in a query’s results. A log stream is a unique source of log content, such as a file. A more granular log stream selector then reduces the number of searched streams to a manageable volume. This means that the labels passed to the log stream selector will affect the relative performance of the query’s execution.
The log stream selector is specified by one or more comma-separated key-value pairs. Each key is a log label and each value is that label's value. Curly braces ({ and }) delimit the stream selector.
Consider this stream selector:
+{app="mysql",name="mysql-backup"}
All log streams that have both a label of app whose value is mysql and a label of name whose value is mysql-backup will be included in the query results. A stream may contain other pairs of labels and values, but only the specified pairs within the stream selector are used to determine which streams will be included within the query results.
The same rules that apply for Prometheus Label Selectors apply for Grafana Loki log stream selectors.
The = operator after the label name is a label matching operator. The following label matching operators are supported:
- =: exactly equal
- !=: not equal
- =~: regex matches
- !~: regex does not match
Regex log stream examples:
- {name =~ "mysql.+"}
- {name !~ "mysql.+"}
- {name !~ `mysql-\d+`}
Note: The =~ regex operator is fully anchored, meaning regex must match against the entire string, including newlines. The regex . character does not match newlines by default. If you want the regex dot character to match newlines you can use the single-line flag, like so: (?s)search_term.+ matches search_term\n.
A log pipeline can be appended to a log stream selector to further process and filter log streams. It is composed of a set of expressions. Each expression is executed in left to right sequence for each log line. If an expression filters out a log line, the pipeline will stop processing the current log line and start processing the next log line.
Some expressions can mutate the log content and respective labels, which will then be available for further filtering and processing in subsequent expressions. An example that mutates is the expression
| line_format "{{.status_code}}"
Log pipeline expressions fall into one of three categories:
- Filtering expressions: line filter expressions and label filter expressions.
- Parsing expressions.
- Formatting expressions: line format expressions and label format expressions.
The line filter expression does a distributed grep over the aggregated logs from the matching log streams. It searches the contents of the log line, discarding those lines that do not match the case-sensitive expression.
Each line filter expression has a filter operator followed by text or a regular expression. These filter operators are supported:
- |=: Log line contains string
- !=: Log line does not contain string
- |~: Log line contains a match to the regular expression
- !~: Log line does not contain a match to the regular expression
Line filter expression examples:
Keep log lines that have the substring “error”:
|= "error"
A complete query using this example:
{job="mysql"} |= "error"
Discard log lines that have the substring “kafka.server:type=ReplicaManager”:
!= "kafka.server:type=ReplicaManager"
A complete query using this example:
{instance=~"kafka-[23]",name="kafka"} != "kafka.server:type=ReplicaManager"
Keep log lines that contain a substring that starts with tsdb-ops and ends with io:2003. A complete query with a regular expression:
{name="kafka"} |~ "tsdb-ops.*io:2003"
Keep log lines that contain a substring that starts with error=, and is followed by 1 or more word characters. A complete query with a regular expression:
{name="cassandra"} |~ `error=\w+`
Filter operators can be chained. Filters are applied sequentially. Query results will have satisfied every filter. This complete query example will give results that include the string error, and do not include the string timeout.
{job="mysql"} |= "error" != "timeout"
When using |~ and !~, Go (as in Golang) RE2 syntax regex may be used. The matching is case-sensitive by default. Switch to case-insensitive matching by prefixing the regular expression with (?i).
While line filter expressions could be placed anywhere within a log pipeline, it is almost always better to have them at the beginning. Placing them at the beginning improves the performance of the query, as it only does further processing when a line matches. For example, while the results will be the same, the query specified with
+{job="mysql"} |= "error" | json | line_format "{{.err}}"
will always run faster than
+{job="mysql"} | json | line_format "{{.message}}" |= "error"
Line filter expressions are the fastest way to filter logs once the log stream selectors have been applied.
Line filter expressions support matching IP addresses. See Matching IP addresses for details.
Label filter expressions allow filtering log lines using their original and extracted labels. They can contain multiple predicates.
A predicate contains a label identifier, an operation, and a value to compare the label with.
For example, with cluster="namespace", the cluster is the label identifier, the operation is =, and the value is "namespace". The label identifier is always on the left side of the operation.
We support multiple value types which are automatically inferred from the query input.
+"200" or `us-central1`.250, 89.923.String type work exactly like Prometheus label matchers use in log stream selector. This means you can use the same operations (=,!=,=~,!~).
The string type is the only one that can filter out a log line with a label __error__.
Using Duration, Number and Bytes will convert the label value prior to comparison and support the following comparators:
- == or = for equality.
- != for inequality.
- > and >= for greater than and greater than or equal.
- < and <= for lesser than and lesser than or equal.
For instance, logfmt | duration > 1m and bytes_consumed > 20MB
If the conversion of the label value fails, the log line is not filtered and an __error__ label is added. To filter those errors, see the pipeline errors section.
You can chain multiple predicates using and and or, which respectively express the and and or binary operations. and can be equivalently expressed by a comma, a space or another pipe. Label filters can be placed anywhere in a log pipeline.
This means that all the following expressions are equivalent:
| duration >= 20ms or size == 20kb and method!~"2.."
| duration >= 20ms or size == 20kb | method!~"2.."
| duration >= 20ms or size == 20kb , method!~"2.."
| duration >= 20ms or size == 20kb  method!~"2.."
By default the precedence of multiple predicates is right to left. You can wrap predicates with parentheses to force a different precedence, left to right.
For example, the following are equivalent:
| duration >= 20ms or method="GET" and size <= 20KB
| ((duration >= 20ms or method="GET") and size <= 20KB)
It will evaluate duration >= 20ms or method="GET" first. To evaluate method="GET" and size <= 20KB first, make sure to use proper parentheses as shown below.
| duration >= 20ms or (method="GET" and size <= 20KB)
Label filter expressions are the only expression allowed after the unwrap expression. This is mainly to allow filtering errors from the metric extraction.
Label filter expressions support matching IP addresses. See Matching IP addresses for details.
Parser expressions can parse and extract labels from the log content. Those extracted labels can then be used for filtering using label filter expressions or for metric aggregations.
Extracted label keys are automatically sanitized by all parsers to follow the Prometheus metric name convention. (They can only contain ASCII letters and digits, as well as underscores and colons. They cannot start with a digit.)
For instance, the pipeline | json will produce the following mapping:
{ "a.b": {c: "d"}, e: "f" }
->
{a_b_c="d", e="f"}
In case of errors, for instance if the line is not in the expected format, the log line won’t be filtered but instead will get a new __error__ label added.
If an extracted label key name already exists in the original log stream, the extracted label key will be suffixed with the _extracted keyword to make the distinction between the two labels. You can forcefully override the original label using a label formatter expression. However if an extracted key appears twice, only the latest label value will be kept.
Loki supports JSON, logfmt, pattern, regexp and unpack parsers.
It’s easier to use the predefined parsers json and logfmt when you can. If you can’t, the pattern and regexp parsers can be used for log lines with an unusual structure. The pattern parser is easier and faster to write; it also outperforms the regexp parser. Multiple parsers can be used by a single log pipeline. This is useful for parsing complex logs. There are examples in Multiple parsers.
The json parser operates in two modes:
1. without parameters:
Adding | json to your pipeline will extract all json properties as labels if the log line is a valid json document. Nested properties are flattened into label keys using the _ separator.
Note: Arrays are skipped.
For example, the json parser will extract from the following document:
{
  "protocol": "HTTP/2.0",
  "servers": ["129.0.1.1","10.2.1.3"],
  "request": {
    "time": "6.032",
    "method": "GET",
    "host": "foo.grafana.net",
    "size": "55",
    "headers": {
      "Accept": "*/*",
      "User-Agent": "curl/7.68.0"
    }
  },
  "response": {
    "status": 401,
    "size": "228",
    "latency_seconds": "6.031"
  }
}
The following list of labels:
"protocol" => "HTTP/2.0"
"request_time" => "6.032"
"request_method" => "GET"
"request_host" => "foo.grafana.net"
"request_size" => "55"
"response_status" => "401"
"response_size" => "228"
"response_latency_seconds" => "6.031"
2. with parameters:
Using | json label="expression", another="expression" in your pipeline will extract only the specified json fields to labels. You can specify one or more expressions in this way, the same as label_format; all expressions must be quoted.
Currently, we only support field access (my.field, my["field"]) and array access (list[0]), and any combination of these in any level of nesting (my.list[0]["field"]).
For example, | json first_server="servers[0]", ua="request.headers[\"User-Agent\"]" will extract from the following document:
{
  "protocol": "HTTP/2.0",
  "servers": ["129.0.1.1","10.2.1.3"],
  "request": {
    "time": "6.032",
    "method": "GET",
    "host": "foo.grafana.net",
    "size": "55",
    "headers": {
      "Accept": "*/*",
      "User-Agent": "curl/7.68.0"
    }
  },
  "response": {
    "status": 401,
    "size": "228",
    "latency_seconds": "6.031"
  }
}
The following list of labels:
"first_server" => "129.0.1.1"
"ua" => "curl/7.68.0"
If an array or an object is returned by an expression, it will be assigned to the label in json format.
For example, | json server_list="servers", headers="request.headers" will extract:
"server_list" => `["129.0.1.1","10.2.1.3"]`
"headers" => `{"Accept": "*/*", "User-Agent": "curl/7.68.0"}`
If the label to be extracted is the same as the original JSON field, the expression can be written as just | json <label>.
For example, to extract the servers field as a label, the expression can be written as follows:
| json servers will extract:
"servers" => `["129.0.1.1","10.2.1.3"]`
Note that | json servers is the same as | json servers="servers".
The logfmt parser can be added using | logfmt and will extract all keys and values from the logfmt formatted log line.
For example the following log line:
at=info method=GET path=/ host=grafana.net fwd="124.133.124.161" service=8ms status=200
will get those labels extracted:
"at" => "info"
"method" => "GET"
"path" => "/"
"host" => "grafana.net"
"fwd" => "124.133.124.161"
"service" => "8ms"
"status" => "200"
The pattern parser allows the explicit extraction of fields from log lines by defining a pattern expression (| pattern "<pattern-expression>"). The expression matches the structure of a log line.
Consider this NGINX log line.
0.191.12.2 - - [10/Jun/2021:09:14:29 +0000] "GET /api/plugins/versioncheck HTTP/1.1" 200 2 "-" "Go-http-client/2.0" "13.76.247.102, 34.120.177.193" "TLSv1.2" "US" ""
This log line can be parsed with the expression
<ip> - - <_> "<method> <uri> <_>" <status> <size> <_> "<agent>" <_>
to extract these fields:
+"ip" => "0.191.12.2" +"method" => "GET" +"uri" => "/api/plugins/versioncheck" +"status" => "200" +"size" => "2" +"agent" => "Go-http-client/2.0"+
A pattern expression is composed of captures and literals.
+A capture is a field name delimited by the < and > characters. <example> defines the field name example. An unnamed capture appears as <_>.
The unnamed capture skips matched content.
Captures are matched from the line beginning or the previous set of literals, to the line end or the next set of literals. If a capture is not matched, the pattern parser will stop.
+Literals can be any sequence of UTF-8 characters, including whitespace characters.
+By default, a pattern expression is anchored at the start of the log line. If the expression starts with literals, then the log line must also start with the same set of literals. Use <_> at the beginning of the expression if you don’t want to anchor the expression at the start.
Consider the log line
level=debug ts=2021-06-10T09:24:13.472094048Z caller=logging.go:66 traceID=0568b66ad2d9294c msg="POST /loki/api/v1/push (204) 16.652862ms"
To match msg=", use the expression:
<_> msg="<method> <path> (<status>) <latency>"+
A pattern expression is invalid if
+Unlike the logfmt and json, which extract implicitly all values and takes no parameters, the regexp parser takes a single parameter | regexp "<re>" which is the regular expression using the Golang RE2 syntax.
The regular expression must contain a least one named sub-match (e.g (?P<name>re)), each sub-match will extract a different label.
For example the parser | regexp "(?P<method>\\w+) (?P<path>[\\w|/]+) \\((?P<status>\\d+?)\\) (?P<duration>.*)" will extract from the following line:
POST /api/prom/api/v1/query_range (200) 1.5s
those labels:
"method" => "POST"
"path" => "/api/prom/api/v1/query_range"
"status" => "200"
"duration" => "1.5s"
The unpack parser parses a JSON log line, unpacking all embedded labels from Promtail’s pack stage. A special property _entry will also be used to replace the original log line.
For example, using | unpack with the log line:
{
  "container": "myapp",
  "pod": "pod-3223f",
  "_entry": "original log message"
}
extracts the container and pod labels; it sets original log message as the new log line.
You can combine the unpack and json parsers (or any other parsers) if the original embedded log line is of a specific format.
The line format expression can rewrite the log line content by using the text/template format. It takes a single string parameter | line_format "{{.label_name}}", which is the template format. All labels are injected variables into the template and are available to use with the {{.label_name}} notation.
For example, the following expression:
+{container="frontend"} | logfmt | line_format "{{.query}} {{.duration}}"
+ Will extract and rewrite the log line to only contains the query and the duration of a request.
+You can use double quoted string for the template or backticks `{{.label_name}}` to avoid the need to escape special characters.
line_format also supports math functions. Example:
If we have the following labels ip=1.1.1.1, status=200 and duration=3000(ms), we can divide the duration by 1000 to get the value in seconds.
{container="frontend"} | logfmt | line_format "{{.ip}} {{.status}} {{div .duration 1000}}"
The above query will give us the line as 1.1.1.1 200 3
See template functions to learn about available functions in the template format.
The | label_format expression can rename, modify or add labels. It takes as parameter a comma-separated list of equality operations, enabling multiple operations at once.
When both sides are label identifiers, for example dst=src, the operation will rename the src label to dst.
The right side can alternatively be a template string (double quoted or backtick), for example dst="{{.status}} {{.query}}", in which case the dst label value is replaced by the result of the text/template evaluation. This is the same template engine as the | line_format expression, which means labels are available as variables and you can use the same list of functions.
In both cases, if the destination label doesn’t exist, then a new one is created.
The renaming form dst=src will drop the src label after remapping it to the dst label. However, the template form will preserve the referenced labels, such that dst="{{.src}}" results in both dst and src having the same value.
A single label name can only appear once per expression. This means | label_format foo=bar,foo="new" is not allowed, but you can use two expressions for the desired effect: | label_format foo=bar | label_format foo="new"
Filtering should be done first using label matchers, then line filters (when possible) and finally using label filters. The following query demonstrates this.
{cluster="ops-tools1", namespace="loki-dev", job="loki-dev/query-frontend"} |= "metrics.go" !="out of order" | logfmt | duration > 30s or status_code!="200"
To extract the method and the path of the following logfmt log line:
level=debug ts=2020-10-02T10:10:42.092268913Z caller=logging.go:66 traceID=a9d4d8a928d8db1 msg="POST /api/prom/api/v1/query_range (200) 1.5s"
You can use multiple parsers (logfmt and regexp) like this.
+{job="loki-ops/query-frontend"} | logfmt | line_format "{{.msg}}" | regexp "(?P<method>\\w+) (?P<path>[\\w|/]+) \\((?P<status>\\d+?)\\) (?P<duration>.*)"
This is possible because the | line_format reformats the log line to become POST /api/prom/api/v1/query_range (200) 1.5s which can then be parsed with the | regexp ... parser.
The following query shows how you can reformat a log line to make it easier to read on screen.
+{cluster="ops-tools1", name="querier", namespace="loki-dev"}
+ |= "metrics.go" != "loki-canary"
+ | logfmt
+ | query != ""
+ | label_format query="{{ Replace .query \"\\n\" \"\" -1 }}"
+ | line_format "{{ .ts}}\t{{.duration}}\ttraceID = {{.traceID}}\t{{ printf \"%-100.100s\" .query }} "
+ Label formatting is used to sanitize the query while the line format reduce the amount of information and creates a tabular output.
For these given log lines:
level=info ts=2020-10-23T20:32:18.094668233Z caller=metrics.go:81 org_id=29 traceID=1980d41501b57b68 latency=fast query="{cluster=\"ops-tools1\", job=\"loki-ops/query-frontend\"} |= \"query_range\"" query_type=filter range_type=range length=15m0s step=7s duration=650.22401ms status=200 throughput_mb=1.529717 total_bytes_mb=0.994659
level=info ts=2020-10-23T20:32:18.068866235Z caller=metrics.go:81 org_id=29 traceID=1980d41501b57b68 latency=fast query="{cluster=\"ops-tools1\", job=\"loki-ops/query-frontend\"} |= \"query_range\"" query_type=filter range_type=range length=15m0s step=7s duration=624.008132ms status=200 throughput_mb=0.693449 total_bytes_mb=0.432718
The result would be:
2020-10-23T20:32:18.094668233Z 650.22401ms traceID = 1980d41501b57b68 {cluster="ops-tools1", job="loki-ops/query-frontend"} |= "query_range"
2020-10-23T20:32:18.068866235Z 624.008132ms traceID = 1980d41501b57b68 {cluster="ops-tools1", job="loki-ops/query-frontend"} |= "query_range"
Metric queries extend log queries by applying a function to log query results. This powerful feature creates metrics from logs.
Metric queries can be used to calculate the rate of error messages or the top N log sources with the greatest quantity of logs over the last 3 hours.
Combined with parsers, metric queries can also be used to calculate metrics from a sample value within the log line, such as latency or request size. All labels, including extracted ones, will be available for aggregations and generation of new series.
LogQL shares the range vector concept of Prometheus. In Grafana Loki, the selected range of samples is a range of selected log or label values.
The aggregation is applied over a time duration. Loki defines Time Durations with the same syntax as Prometheus.
Loki supports two types of range vector aggregations: log range aggregations and unwrapped range aggregations.
A log range aggregation is a query followed by a duration. A function is applied to aggregate the query over the duration. The duration can be placed after the log stream selector or at the end of the log pipeline.
The functions:
- rate(log-range): calculates the number of entries per second
- count_over_time(log-range): counts the entries for each log stream within the given range.
- bytes_rate(log-range): calculates the number of bytes per second for each stream.
- bytes_over_time(log-range): counts the amount of bytes used by each log stream for a given range.
- absent_over_time(log-range): returns an empty vector if the range vector passed to it has any elements and a 1-element vector with the value 1 if the range vector passed to it has no elements. (absent_over_time is useful for alerting on when no time series and logs stream exist for a label combination for a certain amount of time.)
Examples:
Count all the log lines within the last five minutes for the MySQL job:
count_over_time({job="mysql"}[5m])
This aggregation includes filters and parsers. It returns the per-second rate of all non-timeout errors within the last minute, per host, for the MySQL job, and only includes errors whose duration is above ten seconds.
sum by (host) (rate({job="mysql"} |= "error" != "timeout" | json | duration > 10s [1m]))
Unwrapped ranges use extracted labels as sample values instead of log lines. However, to select which label will be used within the aggregation, the log query must end with an unwrap expression and optionally a label filter expression to discard errors.
The unwrap expression is noted | unwrap label_identifier where the label identifier is the label name to use for extracting sample values.
Since label values are strings, by default a conversion into a float (64 bits) will be attempted; in case of failure the __error__ label is added to the sample. Optionally the label identifier can be wrapped by a conversion function | unwrap <function>(label_identifier), which will attempt to convert the label value from a specific format.
We currently support the functions:
- duration_seconds(label_identifier) (or its short equivalent duration), which will convert the label value in seconds from the go duration format (e.g. 5m, 24s30ms).
- bytes(label_identifier), which will convert the label value to raw bytes applying the bytes unit (e.g. 5 MiB, 3k, 1G).
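A hedged sketch of a conversion function, assuming the mysql job's logfmt output has a Go-duration-formatted query_time field and a host label (both hypothetical): unwrap duration(query_time) converts the value to seconds before aggregating per host:
avg_over_time({job="mysql"} | logfmt | unwrap duration(query_time) [5m]) by (host)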
Supported functions for operating over unwrapped ranges are:
- rate(unwrapped-range): calculates per second rate of the sum of all values in the specified interval.
- rate_counter(unwrapped-range): calculates per second rate of the values in the specified interval, treating them as "counter metric".
- sum_over_time(unwrapped-range): the sum of all values in the specified interval.
- avg_over_time(unwrapped-range): the average value of all points in the specified interval.
- max_over_time(unwrapped-range): the maximum value of all points in the specified interval.
- min_over_time(unwrapped-range): the minimum value of all points in the specified interval.
- first_over_time(unwrapped-range): the first value of all points in the specified interval.
- last_over_time(unwrapped-range): the last value of all points in the specified interval.
- stdvar_over_time(unwrapped-range): the population standard variance of the values in the specified interval.
- stddev_over_time(unwrapped-range): the population standard deviation of the values in the specified interval.
- quantile_over_time(scalar, unwrapped-range): the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
- absent_over_time(unwrapped-range): returns an empty vector if the range vector passed to it has any elements and a 1-element vector with the value 1 if the range vector passed to it has no elements. (absent_over_time is useful for alerting on when no time series and logs stream exist for a label combination for a certain amount of time.)
Except for sum_over_time, absent_over_time and rate, unwrapped range aggregations support grouping.
<aggr-op>([parameter,] <unwrapped-range>) [without|by (<label list>)]
which can be used to aggregate over distinct label dimensions by including a without or by clause.
without removes the listed labels from the result vector, while all other labels are preserved in the output. by does the opposite and drops labels that are not listed in the by clause, even if their label values are identical between all elements of the vector.
quantile_over_time(0.99,
  {cluster="ops-tools1",container="ingress-nginx"}
    | json
    | __error__ = ""
    | unwrap request_time [1m]) by (path)
This example calculates the p99 of the nginx-ingress latency by path.
sum by (org_id) (
  sum_over_time(
  {cluster="ops-tools1",container="loki-dev"}
    |= "metrics.go"
    | logfmt
@@ -1882,6 +3507,7 @@ export default {
],
selectValue: this.$t('overall.metric'),
selectIcon: 'nz-icon nz-icon-Metrics',
+ // language: localStorage.getItem('nz-language') || 'en',
showMetrics: true,
promqlCount: 1,
promqlKeys: [],
@@ -2667,6 +4293,11 @@ export default {
})
}
},
+ computed: {
+ language () {
+ return this.$store.getters.getLanguage
+ }
+ },
watch: {
promqlCount (n, o) {
this.expressionChange()