Grafana Series (I): A full-stack observability demo based on Grafana

This article was last updated on: February 7, 2024

📚️Reference:

https://github.com/grafana/intro-to-mlt

This is the companion repository for a series of talks on the three pillars of observability in Grafana.

It comes in the form of a self-contained Docker sandbox that includes all the components needed to run and experiment with the services on your local machine.

Grafana full-stack observability products


Observability transformation diagram

Prerequisites

Overview

This series of demos is based on the applications and code in this repository, including:

  • Docker Compose manifest for easy setup.
  • An application consisting of three services:
    • A service that requests data from a REST API server.
    • A REST API server that receives requests and leverages a database to store/retrieve data for those requests.
    • A Postgres database for storing/retrieving data.
  • A Tempo instance to store trace information.
  • A Loki instance to store log information.
  • An instance of Prometheus, which stores metrics information.
  • A Grafana instance to visualize observability information.
  • An instance of Grafana Agent that receives traces and generates metrics and logs based on those traces.
  • A Node Exporter instance that retrieves resource metrics from localhost.

Run the demo environment

Docker Compose downloads the required Docker images and then launches the demo environment. Data is emitted from the microservices application and stored in Loki, Tempo, and Prometheus. You can log in to the Grafana instance to visualize this data. To start the environment and log in:

  1. Launch a new command line interface in your operating system and run:

    docker-compose up
  2. Log in to your local Grafana instance at http://localhost:3000/. Note: this assumes that port 3000 is not already in use. If this port is not available, edit the docker-compose.yml file and modify this line

    - "3000:3000"

    to some other free host port, for example:

    - "3123:3000"
  3. Visit the MLT dashboard (MLT: Metrics/Logging/Tracing).

  4. Use Grafana Explore to access the data sources.

🐾 Note:

For users in China, you can add a proxy to the relevant build sections as needed, as follows:

mythical-requester:
  build:
    context: ./source
    dockerfile: docker/Dockerfile
    args:
      HTTP_PROXY: http://192.168.2.9:7890
      HTTPS_PROXY: http://192.168.2.9:7890
      SERVICE: mythical-beasts-requester

mythical-server:
  build:
    context: ./source
    dockerfile: docker/Dockerfile
    args:
      HTTP_PROXY: http://192.168.2.9:7890
      HTTPS_PROXY: http://192.168.2.9:7890
      SERVICE: mythical-beasts-server

prometheus:
  build:
    context: ./prometheus
    args:
      HTTP_PROXY: http://192.168.2.9:7890
      HTTPS_PROXY: http://192.168.2.9:7890

Grafana

Grafana is a visualization tool that allows you to create dashboards from a variety of data sources. More information can be found in the Grafana documentation.

The Grafana instance is described in the grafana section of the docker-compose.yml manifest.

# The Grafana dashboarding server.
grafana:
  image: grafana/grafana
  volumes:
    - "./grafana/definitions:/var/lib/grafana/dashboards"
    - "./grafana/provisioning:/etc/grafana/provisioning"
  ports:
    - "3000:3000"
  environment:
    - GF_FEATURE_TOGGLES_ENABLE=tempoSearch,tempoServiceGraph

It:

  • Mounts two repository directories to provide prebuilt data sources (./grafana/provisioning/datasources.yaml).
  • Provides prebuilt dashboards that correlate metrics, logs, and traces (./grafana/definitions/mlt.yaml).
  • Exposes port 3000 for local login.
  • Enables two Tempo features: span search and service graph support.

No custom configuration is used.

📚️ Reference:

Grafana Agent | Grafana Labs (grafana.com)

  • “It’s often used as a trace pipeline to offload traces from applications and forward them to the storage backend. The Grafana Agent trace stack is built using OpenTelemetry.”
  • “Grafana Agent supports receiving traces in multiple formats: OTLP (OpenTelemetry), Jaeger, Zipkin, and OpenCensus.”

Generate metrics from spans - Grafana Labs (grafana.com)

Prometheus

Prometheus is a backend store and service for scraping (pulling) metric data from a variety of sources. More information can be found in the Prometheus documentation. In addition, Mimir is a long-term store for Prometheus data; more information about it can be found in the Mimir documentation.

The Prometheus instance is described in the prometheus section of the docker-compose.yml manifest.

prometheus:
  build:
    context: ./prometheus
    args:
      HTTP_PROXY: http://192.168.2.9:7890
      HTTPS_PROXY: http://192.168.2.9:7890
  ports:
    - "9090:9090"

It is built from a modified Dockerfile in the prometheus directory, which copies the configuration file into the new image and enables some features by modifying the command string used at startup (including exemplar support: "--enable-feature=exemplar-storage"). Prometheus exposes its primary interface on port 9090.

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

remote_read:

scrape_configs:
  # Scrape Prometheus' own metrics.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'prometheus'

  # Scrape from the Mythical Server service.
  - job_name: 'mythical-server'
    scrape_interval: 2s
    static_configs:
      - targets: ['mythical-server:4000']
        labels:
          group: 'mythical'

  # Scrape from the Mythical Requester service.
  - job_name: 'mythical-requester'
    scrape_interval: 2s
    static_configs:
      - targets: ['mythical-requester:4001']
        labels:
          group: 'mythical'

  # Scrape from the Node exporter, giving us resource usage.
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['nodeexporter:9100']
        labels:
          group: 'resources'

  # Scrape from Grafana Agent, giving us metrics from traces it collects.
  - job_name: 'span-metrics'
    scrape_interval: 2s
    static_configs:
      - targets: ['agent:12348']
        labels:
          group: 'mythical'

  # Scrape from Grafana Agent, giving us metrics from traces it collects.
  - job_name: 'agent-metrics'
    scrape_interval: 2s
    static_configs:
      - targets: ['agent:12345']
        labels:
          group: 'mythical'

The configuration file (prometheus/prometheus.yml) defines several scrape jobs, including:

  • Metrics from the Prometheus instance itself (job_name: 'prometheus').

  • Metrics from the microservices application (job_name: 'mythical-server' and job_name: 'mythical-requester').

  • Metrics from the installed Node Exporter instance (job_name: 'node').

  • Metrics from the Grafana Agent, derived from incoming trace data (job_name: 'span-metrics').
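Each of these scrape targets serves plain-text metrics in the Prometheus exposition format, which the pull-based scraper then parses. The following minimal sketch illustrates that model; the metric names and values are invented for illustration, not taken from the demo services.

```python
# Minimal sketch of the Prometheus pull model: a target exposes plain-text
# metrics, and the scraper parses a value out of each sample line.
def parse_exposition(text):
    """Parse a tiny subset of the Prometheus exposition format."""
    samples = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name_and_labels, value = line.rsplit(" ", 1)
        samples[name_and_labels] = float(value)
    return samples

# What a hypothetical target such as mythical-server:4000/metrics might serve.
exposition = """
# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="GET",group="mythical"} 42
http_requests_total{method="POST",group="mythical"} 7
"""

samples = parse_exposition(exposition)
print(samples['http_requests_total{method="GET",group="mythical"}'])  # 42.0
```

In the real system, Prometheus performs this scrape-and-parse cycle against each target at the configured scrape_interval.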

📚️References:

Exemplars storage | Prometheus Docs

  • OpenMetrics introduces the ability for scrape targets to attach exemplars to certain metrics. An exemplar is a reference to data outside of the metric set; a common use case is the ID of a program trace.
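In the OpenMetrics text format, an exemplar is appended to a metric line after a "#" marker. The sketch below builds such a line by hand to show the shape of the data; the metric, trace ID, and latency value are invented for illustration.

```python
# Sketch: attach an exemplar (here, a trace ID reference) to a histogram
# bucket sample in the OpenMetrics text format. The exemplar follows "# "
# on the same line: a label set and the observed value.
def with_exemplar(metric_line, trace_id, value):
    return f'{metric_line} # {{trace_id="{trace_id}"}} {value}'

line = with_exemplar(
    'http_request_duration_seconds_bucket{le="0.5"} 3',
    trace_id="2f6e9a8b1c3d4e5f",  # hypothetical trace ID
    value=0.42,                   # the latency observed for that trace
)
print(line)
# http_request_duration_seconds_bucket{le="0.5"} 3 # {trace_id="2f6e9a8b1c3d4e5f"} 0.42
```

This is what lets Grafana jump from a point on a latency panel straight to the trace in Tempo that produced it.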

Loki

Loki is a backend store for long-term log retention. More information can be found in the Loki documentation.

The Loki instance is described in the loki section of the docker-compose.yml manifest.

loki:
  image: grafana/loki
  ports:
    - "3100:3100"

This instance simply uses the latest loki image and exposes its interface on port 3100.

The microservices application sends its logs directly to the Loki instance in this environment via Loki's REST API.
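Loki's push endpoint (/loki/api/v1/push, the same path used in the agent configuration later in this article) accepts JSON containing labeled streams of nanosecond-timestamped log lines. A minimal sketch of building such a payload, with an illustrative label set and log line:

```python
import json
import time

# Sketch: build the JSON body for Loki's push API (POST /loki/api/v1/push).
# Each stream carries a label set plus ["<ns timestamp>", "<log line>"] pairs.
def loki_payload(labels, lines):
    now_ns = str(time.time_ns())  # Loki expects nanosecond epoch timestamps
    return {
        "streams": [
            {
                "stream": labels,
                "values": [[now_ns, line] for line in lines],
            }
        ]
    }

body = loki_payload({"job": "mythical-requester"}, ["GET /unicorn 200"])
print(json.dumps(body, indent=2))
# In this environment, the body would be POSTed to
# http://loki:3100/loki/api/v1/push with Content-Type: application/json.
```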

Tempo

Tempo is a backend store for long-term retention of traces. More information can be found in the Tempo documentation.

The Tempo instance is described in the tempo section of the docker-compose.yml manifest.

The Tempo service imports a configuration file (tempo/tempo.yaml) that initializes the service with reasonable defaults and allows traces to be received in a variety of formats.

tempo:
  image: grafana/tempo:1.2.1
  ports:
    - "3200:3200"
    - "4317:4317"
    - "55680:55680"
    - "55681:55681"
    - "14250:14250"
  command: ["-config.file=/etc/tempo.yaml"]
  volumes:
    - ./tempo/tempo.yaml:/etc/tempo.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:       # This configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:        # More configuration information can be obtained from the OpenTelemetry collector,
      protocols:   # here: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:
        grpc:      # For a production deployment, you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    otlp:
      protocols:
        http:
        grpc:

ingester:
  trace_idle_period: 10s     # How long to wait after a trace has received no spans before considering it complete and flushing it.
  max_block_bytes: 1_000_000 # Cut the head block when it reaches this size or ...
  max_block_duration: 5m     # ... after this much time.

compactor:
  compaction:
    compaction_window: 1h        # Blocks in this time window will be compacted together.
    max_block_bytes: 100_000_000 # Maximum size of a compacted block.
    block_retention: 1h
    compacted_block_retention: 10m

storage:
  trace:
    backend: local                     # Backend configuration to use.
    block:
      bloom_filter_false_positive: .05 # Lower values create larger filters but produce fewer false positives.
      index_downsample_bytes: 1000     # Number of bytes per index record.
      encoding: zstd                   # Block encoding/compression. Options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    wal:
      path: /tmp/tempo/wal             # Where to store the wal locally.
      encoding: none                   # WAL encoding/compression. Options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    local:
      path: /tmp/tempo/blocks
    pool:
      max_workers: 100                 # The worker pool determines the number of parallel requests to the object storage backend.
      queue_depth: 10000

search_enabled: true
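The ingester settings above amount to a simple rule: the head block is cut when it reaches max_block_bytes or has been open for max_block_duration, whichever comes first. A rough sketch of that decision logic (a simplification for illustration, not Tempo's actual implementation):

```python
# Rough sketch of the ingester head-block cutting rule described above:
# cut the block when it reaches max_block_bytes OR max_block_duration.
MAX_BLOCK_BYTES = 1_000_000  # from max_block_bytes
MAX_BLOCK_SECONDS = 5 * 60   # from max_block_duration: 5m

def should_cut_block(current_bytes, age_seconds):
    """Return True when the head block should be cut and flushed."""
    return current_bytes >= MAX_BLOCK_BYTES or age_seconds >= MAX_BLOCK_SECONDS

print(should_cut_block(999_999, 60))    # False: under both limits
print(should_cut_block(1_000_000, 60))  # True: size limit reached
print(should_cut_block(10, 301))        # True: duration limit reached
```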

Grafana Agent

Grafana Agent is a locally installed agent that acts as:

  • A Prometheus scraping service.
  • A receiver and trace-span processor for the Tempo backend service.
  • A Promtail (Loki log sink) instance.

Span metrics overview

The Grafana Agent has remote-write capabilities, allowing it to send metrics, logs, and trace data to backend stores such as Mimir, Loki, and Tempo. More information about the Grafana Agent can be found in the Grafana Agent documentation.

Its main role in this environment is to receive trace spans from the microservices application and process them to extract metrics and log information before shipping them to the final backend stores.

Its configuration file can be found at agent/config.yaml.

agent:
  image: grafana/agent:v0.24.0
  ports:
    - "12347:12345"
    - "12348:12348"
    - "6832:6832"
    - "55679:55679"
  volumes:
    - "${PWD}/agent/config.yaml:/etc/agent/agent.yaml"
  command: [
    "-config.file=/etc/agent/agent.yaml",
    "-server.http.address=0.0.0.0:12345",
  ]
server:
  log_level: debug

# Configure a log ingestion endpoint for the automatic logging feature.
logs:
  configs:
    - name: loki
      clients:
        - url: http://loki:3100/loki/api/v1/push
          external_labels:
            job: agent
  positions_directory: /tmp/positions

# Configure a Tempo instance to receive traces from the microservices.
traces:
  configs:
    - name: latencyEndpoint
      # Receive Jaeger-formatted traces on port 6832.
      receivers:
        jaeger:
          protocols:
            thrift_binary:
              endpoint: "0.0.0.0:6832"
      # Send batches of trace data to the Tempo instance.
      remote_write:
        - endpoint: tempo:55680
          insecure: true
      # Generate Prometheus metrics from incoming trace spans.
      spanmetrics:
        # Add the http.target and http.method span tags as labels on the metric data.
        dimensions:
          - name: http.method
          - name: http.target
        # Expose these metrics on port 12348.
        handler_endpoint: 0.0.0.0:12348
      # Automatically generate logs from incoming trace data.
      automatic_logging:
        # Use the logs instance defined at the start of the configuration file.
        backend: logs_instance
        logs_instance_name: loki
        # Log one line per root span (i.e. one line per trace).
        roots: true
        processes: false
        spans: false
        # Add the http.method, http.target and http.status_code span tags to the log line, if present.
        span_attributes:
          - http.method
          - http.target
          - http.status_code
        # Force the trace ID key to be `traceId`.
        overrides:
          trace_id_key: "traceId"
      # Enable service graphs.
      service_graphs:
        enabled: true
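Conceptually, the spanmetrics processor above groups incoming spans by the configured dimensions (http.method and http.target) and maintains per-group counters, which are then exposed on the metrics endpoint for Prometheus to scrape. The following simplified sketch illustrates that aggregation; the span data is invented and this is not the agent's actual implementation:

```python
from collections import Counter

# Simplified sketch of what the spanmetrics processor does: aggregate
# incoming spans into call counts keyed by the configured dimensions.
def span_call_counts(spans, dimensions=("http.method", "http.target")):
    counts = Counter()
    for span in spans:
        key = tuple(span.get(d, "unknown") for d in dimensions)
        counts[key] += 1
    return counts

# Hypothetical spans, as the agent might receive from the requester service.
spans = [
    {"http.method": "GET", "http.target": "/unicorn"},
    {"http.method": "GET", "http.target": "/unicorn"},
    {"http.method": "POST", "http.target": "/beholder"},
]

counts = span_call_counts(spans)
print(counts[("GET", "/unicorn")])    # 2
print(counts[("POST", "/beholder")])  # 1
```

The real processor also tracks latency histograms per group, which is what makes the exemplar-linked latency panels in the MLT dashboard possible.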

Glossary

English          Chinese          Notes
Exemplars        Exemplar
Derived fields   Derived field
Metrics          Metric
Logging          Log
Tracing          Trace
Observability    Observability
Span search      Span search      Tempo feature; requires Grafana Agent
Service graph    Service graph    Tempo feature; requires Grafana Agent
Scrape           Scrape           Prometheus terminology

Grafana series of articles
