Grafana Article Series (XIV): Installing Loki with Helm

This article was last updated on July 24, 2024.

Preface

I’ve written or translated quite a few Loki-related articles, but I haven’t yet covered how to install it 😓

This article walks through installing Loki using Helm.

Preconditions

Helm is installed, and the official Grafana repository has been added:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

🐾 Warning:

Access to these repositories may be restricted on some networks; make sure your network can reach them.
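As a quick sanity check, you can list the chart versions now available from the repository (the output is illustrative and will change over time):

helm search repo grafana/loki-stack --versions | head -n 5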

Deploy

Architecture

Promtail (collection) + Loki (storage and processing) + Grafana (display)

Loki architecture diagram

Promtail

  1. Enable the Prometheus Operator ServiceMonitor for monitoring;
  2. Add an external_labels entry, cluster, to identify which K8s cluster the logs come from;
  3. Change pipeline_stages to cri to parse CRI-format logs (my cluster uses a CRI container runtime, while the Loki Helm chart defaults to docker; see the log-format comparison after the snippet below);
  4. Add collection of systemd-journal logs:
promtail:
  config:
    snippets:
      pipelineStages:
        - cri: {}

  extraArgs:
    - -client.external-labels=cluster=ctyun

  # Additional systemd-journal configuration:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'

  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal

  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true
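For context on the cri pipeline stage above: the docker runtime writes container logs as JSON lines, while CRI runtimes (containerd, CRI-O) write a plain-text format, so Promtail needs the matching stage to parse timestamps and streams correctly. Illustrative samples (not taken from a real cluster):

# docker (json-file) format, handled by the default docker stage:
{"log":"level=info msg=\"hello\"\n","stream":"stdout","time":"2024-07-24T02:00:00.000000000Z"}

# CRI format (<timestamp> <stream> <P|F> <message>), handled by the cri stage:
2024-07-24T02:00:00.000000000Z stdout F level=info msg="hello"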

Loki

  1. Enable persistent storage;
  2. Enable the Prometheus Operator ServiceMonitor for monitoring;
    1. And configure Loki-related Prometheus rules for alerting;
  3. Because my personal cluster produces a small volume of logs, adjust the ingester-related configuration accordingly.

Grafana

  1. Enable persistent storage
  2. Enable Prometheus Operator Service Monitor for monitoring
  3. Configure sidecars so that dashboards/datasources/plugins/notifiers can be updated dynamically.

Helm installation

Install with the following command:

helm upgrade --install loki --namespace=loki --create-namespace grafana/loki-stack -f values.yaml
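If you want to preview the rendered manifests before anything is applied, the same command can be run with --dry-run first (optional):

helm upgrade --install loki --namespace=loki --create-namespace grafana/loki-stack -f values.yaml --dry-run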

The custom values.yaml is as follows:

loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: local-path
    size: 20Gi
  serviceScheme: https
  user: admin
  password: changit!
  config:
    ingester:
      chunk_idle_period: 1h
      max_chunk_age: 4h
    compactor:
      retention_enabled: true
  serviceMonitor:
    enabled: true
    prometheusRule:
      enabled: true
      rules:
        # Some examples from https://awesome-prometheus-alerts.grep.to/rules.html#loki
        - alert: LokiProcessTooManyRestarts
          expr: changes(process_start_time_seconds{job=~"loki"}[15m]) > 2
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: Loki process too many restarts (instance {{ $labels.instance }})
            description: "A loki process had too many restarts (target {{ $labels.instance }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: LokiRequestErrors
          expr: 100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route) > 10
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: Loki request errors (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} and {{ $labels.route }} are experiencing errors\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: LokiRequestPanic
          expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request panic (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} is experiencing {{ printf \"%.2f\" $value }}% increase of panics\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: LokiRequestLatency
          expr: (histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route!~"(?i).*tail.*"}[5m])) by (le))) > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request latency (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - cri: {}
  extraArgs:
    - -client.external-labels=cluster=ctyun
  serviceMonitor:
    # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
    enabled: true

  # Additional systemd-journal configuration:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'

  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal

  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true

fluent-bit:
  enabled: false

grafana:
  enabled: true
  adminUser: caseycui
  adminPassword: changit!
  ## Sidecars that collect the ConfigMaps with the specified label and store the included files in the respective folders
  ## Requires at least Grafana 5 to work and can't be used together with parameters dashboardProviders, datasources and dashboards
  sidecar:
    image:
      repository: quay.io/kiwigrid/k8s-sidecar
      tag: 1.15.6
      sha: ''
    dashboards:
      enabled: true
      SCProvider: true
      label: grafana_dashboard
    datasources:
      enabled: true
      # label that the configmaps with datasources are marked with
      label: grafana_datasource
    plugins:
      enabled: true
      # label that the configmaps with plugins are marked with
      label: grafana_plugin
    notifiers:
      enabled: true
      # label that the configmaps with notifiers are marked with
      label: grafana_notifier
  image:
    tag: 8.3.5
  persistence:
    enabled: true
    size: 2Gi
    storageClassName: local-path
  serviceMonitor:
    enabled: true
  imageRenderer:
    enabled: false

filebeat:
  enabled: false

logstash:
  enabled: false

The installed resource topology is as follows:

Loki K8s resource topology
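A quick way to verify the release is healthy (names follow the release and namespace used above; your pod names will differ):

helm -n loki status loki
kubectl -n loki get pods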

Day 2 Configuration (On-Demand)

Adding Grafana Dashboards

Under the same namespace, create a ConfigMap like the one below. (Simply adding the grafana_dashboard label is enough; Grafana's sidecar will import it automatically.)

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-grafana-dashboard
  labels:
    grafana_dashboard: "1"
data:
  k8s-dashboard.json: |-
    [...]
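If the dashboard JSON already exists as a file, an equivalent ConfigMap can be created and labeled with kubectl; the file name k8s-dashboard.json here is just an example:

kubectl -n loki create configmap sample-grafana-dashboard --from-file=k8s-dashboard.json
kubectl -n loki label configmap sample-grafana-dashboard grafana_dashboard=1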

Adding a Grafana DataSource

Under the same namespace, create a ConfigMap like the one below. (Simply adding the grafana_datasource label is enough; Grafana's sidecar will import it automatically.)

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-loki-stack
  labels:
    grafana_datasource: '1'
data:
  loki-stack-datasource.yaml: |-
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki:3100
        version: 1
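To confirm the sidecar has written the datasource into Grafana's provisioning directory, something like the following should work (assuming the deployment is named loki-grafana with a main container named grafana, the chart defaults for this release name, in the loki namespace used above):

kubectl -n loki exec deploy/loki-grafana -c grafana -- ls /etc/grafana/provisioning/datasources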

Configuring a Grafana IngressRoute with Traefik

Since I am using Traefik 2, I configure ingress via the IngressRoute CRD, as follows:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`grafana.e-whisper.com`)
      middlewares:
        - name: hsts-header
          namespace: kube-system
        - name: redirectshttps
          namespace: kube-system
      services:
        - name: loki-grafana
          namespace: monitoring
          port: 80
  tls: {}
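The hsts-header and redirectshttps middlewares referenced above are assumed to already exist in kube-system. If you don't have them, a minimal sketch of the HTTPS-redirect middleware (Traefik 2 CRD) looks like this; the HSTS header one is analogous:

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: redirectshttps
  namespace: kube-system
spec:
  redirectScheme:
    scheme: https
    permanent: true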

The final result

It looks like this:

Grafana Explore Logs
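In Grafana Explore you can now filter on the labels configured earlier, for example (illustrative LogQL queries; adjust the label values to your own cluster):

{cluster="ctyun", job="systemd-journal", unit="kubelet.service"}
{cluster="ctyun", namespace="kube-system"} |= "error"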

🎉🎉🎉

📚️ Reference documentation

Grafana series of articles