How to streamline Prometheus' metrics and storage footprint
This article was last updated on: July 24, 2024 am
preface
As Prometheus monitors more and more components, numbers, and metrics, Prometheus will have higher requirements for computing performance and more storage usage.
In this case, to optimize Prometheus performance, optimize the storage footprint. The first thing that comes to mind may be a variety of Prometheus-compatible storage solutions, such as Thanos or VM, Mimir, etc. However, in fact, although centralized storage, long-term storage, storage downsampling and storage compression can solve related problems to a certain extent, they do not cure the symptoms.
The real cost is that the amount of indicators (series) is too large.
The cure for the root cause should be to reduce the amount of indicators. There are 2 ways to do this:
Depending on the actual usage, only the metrics used by Grafana Dashboards and Alerts (prometheus rules) are kept.
This article focuses on the second approach: How do I streamline Prometheus’ metrics and storage footprint based on actual usage?
Ideas
Analyze all metric names currently stored in Prometheus;
All metric names used in the analysis and display session, i.e. all metrics used by Grafana’s Dashboards;
Analyze all metric names used in the alarm process, that is, all indicators used in the Prometheus Rule configuration;
(Optional) analyze all metric names used in the diagnostic environment, i.e. metrics that are frequently queried on the Prometheus UI;
Pass relabel at metric_relabel_configs or write_relabel_configs only keep 2-4 indicators, thereby drastically reducing the amount of indicators that Prometheus needs to store.
To implement this idea, you can use Grafana Labs mimirtool Come and get it.
Here’s a before and after comparison of how amazing this is:
Before streamlining: 270336 activity series
Streamlined: 61055 active series
Streamlining effect: Nearly 5x the streamlining rate!
Grafana Mimirtool
Grafana Mimir is a Prometheus long-term storage solution based on object storage, evolved from Cortex. The official claims to support hundreds of millions of series write storage and query.
Grafana Mimirtool is a utility released by Mimir that can be used alone.
Grafana Dashboards in Grafana instances (via Grafana API)
Prometheus alerting and recording rules in Mimir instances
Grafana Dashboards JSON file
Prometheus notes YAML files for alerting and recording rules
Grafana Mimirtool can then compare these extracted metrics with the active series in a Prometheus or Cloud Prometheus instance and output one used Indicators and unused A list of metrics.
Prometheus streamlines indicators in action
hypothesis
Assume:
Install Prometheus via kube-prometheus-stack
Grafana is installed and used as a showcase terminal
The corresponding alarm rule is configured
Other than that, there are no other indicators that need to be preserved
precondition
Grafana Mimirtool Find the version of the platform corresponding to MimirTool from Releases and download it;
Same drop, no keep Instead, use drop It’s okay too. I won’t repeat it here.
🎉🎉🎉
summary
In this article, the requirements for streamlining the Prometheus indicator are presented, and then how to use it is explained mimirtool analyze command to determine the metrics used in Grafana Dashboards and Prometheus Rules. Then with analyze prometheus Analyzed in display and alarmused and unused The activity series, last configured with Prometheus only keep Metrics used.
Combined with this actual battle, the streamlining rate can reach about 5 times, and the effect is still very obvious. Recommended to give it a try. 👍️👍️👍️