Prometheus Performance Tuning - Horizontal Sharding

This article was last updated on: February 7, 2024

Brief introduction

In the two preceding articles, the author described some of Prometheus' performance tuning techniques, including solving high-cardinality problems and trimming Prometheus' metric count and storage footprint.

Today, we will introduce a new tuning idea: horizontal sharding.

Horizontal sharding

If your problem is not high cardinality caused by labels, but rather a rapidly expanding monitoring scale with a large number of instances to monitor, you can use Prometheus' hashmod relabel action to optimize performance. With it, each Prometheus only needs to scrape a subset of the instances, even when facing thousands of them.

📝Notes

Prometheus also supports vertical sharding, which is much simpler: to put it bluntly, it is enough to configure different jobs to monitor different components (see the sketch below). Horizontal sharding is the more technical of the two.
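
For comparison, here is a minimal sketch of vertical sharding, assuming two hypothetical component groups (node exporters and an application); each block would live in a separate Prometheus server's configuration:

# Prometheus server A: scrapes only the node exporters
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['node1:9100', 'node2:9100']

# Prometheus server B: scrapes only the application
scrape_configs:
  - job_name: app
    static_configs:
      - targets: ['app1:8080', 'app2:8080']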

Horizontal sharding configuration

The configuration is as follows, letting a single Prometheus scrape only part of the targets:

global:
  external_labels:
    env: prod
    scraper: 2
scrape_configs:
  - job_name: my_job
    ...
    relabel_configs:
      - source_labels: [__address__]
        modulus: 4
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: 2
        action: keep

The modulus is configured with 4 as the base, so each Prometheus scrapes only 1/4 of the targets. For example, the configuration above keeps only the targets whose hashmod result (__tmp_hash) is 2.
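
To run the full setup, all four Prometheus servers share the same configuration except for the shard number. Here is a sketch of shard 0, reusing my_job from above; shards 1-3 would differ only in the scraper external label and the keep regex:

global:
  external_labels:
    env: prod
    scraper: 0
scrape_configs:
  - job_name: my_job
    relabel_configs:
      # hash each target's address and take it modulo 4
      - source_labels: [__address__]
        modulus: 4
        target_label: __tmp_hash
        action: hashmod
      # shard 0 keeps only targets whose hash result is 0
      - source_labels: [__tmp_hash]
        regex: 0
        action: keep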

After scraping is complete, the data from these four Prometheus servers can then be aggregated via remote_write into solutions such as Thanos, Mimir, or VictoriaMetrics (VM).
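
As a sketch, each of the four servers would carry a remote_write block like the one below; the URL is a placeholder, not a real endpoint, and would point at your Thanos Receive, Mimir, or VictoriaMetrics instance:

remote_write:
  - url: http://mimir-gateway.example.com/api/v1/push

The external_labels configured above (env, scraper) travel with the written samples, so the series from the four shards remain distinguishable in the central store.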

🎉🎉🎉