How Thanos works: an introduction to its components
This article was last updated on July 24, 2024.
Introduction to Thanos
Thanos is an “open source, highly available Prometheus system with long-term storage capabilities.” It is a CNCF incubating project and is used by many well-known companies.
A key feature of Thanos is that it offers “unlimited” storage by using object storage such as S3. The object storage can be a service offered by a cloud provider or a self-hosted solution such as Ceph, Rook, or MinIO.
How it works
Thanos works side by side with Prometheus, and it is common to start with a plain Prometheus setup and then extend it with Thanos.
Thanos is split into components, each with a single responsibility (a typical cloud-native architecture), and the components communicate with each other via gRPC.
Thanos Sidecar
The Thanos Sidecar runs alongside Prometheus and uploads the metric blocks that Prometheus writes to disk to object storage, by default every 2 hours. This makes Prometheus close to stateless. However, Prometheus still holds the most recent metrics (up to 2 hours' worth) locally, so you may still lose up to 2 hours of metrics in the event of an outage. This should be handled by your Prometheus setup, using HA or sharding, not by Thanos.
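To make the "up to 2 hours" figure concrete, here is a minimal Python sketch under a simplified model: it assumes Prometheus cuts blocks on 2-hour boundaries aligned to the Unix epoch and that the Sidecar uploads a block only after its window has closed. Real Prometheus keeps somewhat more than 2 hours in its in-memory head block before writing a block, so treat the numbers as illustrative. The names BLOCK_RANGE, block_window, and not_yet_uploaded are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the default 2-hour Prometheus block duration.
BLOCK_RANGE = timedelta(hours=2)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def block_window(ts: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) 2-hour window, aligned to the Unix epoch, containing ts."""
    n = (ts - EPOCH) // BLOCK_RANGE  # number of whole windows since the epoch
    start = EPOCH + n * BLOCK_RANGE
    return start, start + BLOCK_RANGE


def not_yet_uploaded(now: datetime) -> timedelta:
    """Span of data that, in this simplified model, exists only on the Prometheus side."""
    start, _ = block_window(now)
    return now - start


now = datetime(2024, 7, 24, 9, 30, tzinfo=timezone.utc)
start, end = block_window(now)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))  # 08:00 10:00
print(not_yet_uploaded(now))  # 1:30:00
```

In this model the amount of data that is only on the Prometheus side grows from zero to just under 2 hours within each window, which is where the "up to 2 hours" bound comes from.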
The Thanos Sidecar can be deployed easily together with Prometheus using the Prometheus Operator or the kube-prometheus stack. The Sidecar also acts as a store for Thanos Query, serving the recent data that Prometheus still holds locally.
Thanos Store
Thanos Store (also called the Store Gateway) acts as a gateway that translates queries into reads from remote object storage. It can also cache some information, such as block index headers, on local disk. In short, this component lets you query the metrics held in object storage, and it acts as a store for Thanos Query.
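A core step in serving a query from object storage is deciding which blocks to read. The sketch below is a simplification, not the Store Gateway's actual code: it assumes each block is described by a hypothetical Block record carrying the minimum and maximum timestamps (in milliseconds) that the block's metadata records, and it keeps only the blocks whose time range overlaps the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a block's metadata; max_time is exclusive."""
    block_id: str
    min_time: int  # milliseconds since the epoch, inclusive
    max_time: int  # milliseconds since the epoch, exclusive


def blocks_for_query(blocks, start_ms, end_ms):
    """Return the blocks whose [min_time, max_time) range overlaps [start_ms, end_ms]."""
    return [b for b in blocks if b.min_time <= end_ms and b.max_time > start_ms]


HOUR = 3_600_000
blocks = [Block("a", 0, 2 * HOUR), Block("b", 2 * HOUR, 4 * HOUR), Block("c", 4 * HOUR, 6 * HOUR)]
print([b.block_id for b in blocks_for_query(blocks, 1 * HOUR, 3 * HOUR)])  # ['a', 'b']
```

Skipping non-overlapping blocks up front means only the relevant parts of the bucket need to be fetched, which is one reason local caching of block metadata pays off.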
Thanos Compactor
Thanos Compactor is a singleton (it cannot be scaled out, so only one instance should run against a given bucket). It is responsible for compacting and downsampling the metrics stored in object storage. Downsampling (data aging) reduces the resolution of metrics as they get older. For example, you might want to keep your metrics for 2 or 3 years, but you don’t need as many data points for them as for yesterday’s metrics. This is where the Compactor comes in: it creates lower-resolution (5-minute and 1-hour) versions of older blocks, and by keeping raw data for a shorter time than downsampled data, you save bytes on object storage and therefore costs.
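The idea behind downsampling can be sketched in a few lines of Python. This is an illustration, not Thanos's implementation: raw (timestamp_ms, value) samples are grouped into fixed windows (5 minutes here, matching one of the resolutions the Compactor produces), and each window is reduced to a few aggregates. Thanos keeps several aggregates per window, such as count, sum, min, and max, so that different PromQL functions can be answered from the aggregate that suits them. The function name downsample is hypothetical.

```python
from collections import defaultdict

FIVE_MINUTES_MS = 5 * 60 * 1000


def downsample(samples, resolution_ms=FIVE_MINUTES_MS):
    """Group (timestamp_ms, value) samples into aligned windows and aggregate each window."""
    windows = defaultdict(list)
    for ts, value in samples:
        windows[ts - ts % resolution_ms].append(value)  # key = window start
    return {
        start: {
            "count": len(vals),
            "sum": sum(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for start, vals in sorted(windows.items())
    }


# 10 minutes of raw samples scraped every 15 seconds: 40 samples -> 2 windows.
raw = [(i * 15_000, float(i)) for i in range(40)]
for start, agg in downsample(raw).items():
    print(start, agg)
```

Forty raw samples become two windows of four numbers each; over months of data this is what makes long-range queries cheaper.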
Thanos Query
Thanos Query is the main component of Thanos: it is the central point to which PromQL queries are sent. Thanos Query exposes a Prometheus-compatible endpoint and fans each query out to all of its “stores”. Keep in mind that a store can be any Thanos component that provides metrics; Thanos Query can even send queries to another Thanos Query, so queriers can be stacked. Stores include:
- Thanos Store
- Thanos Sidecar
- Thanos Query
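The fan-out and stacking described above can be modeled with a small Python sketch. Every store, and the querier itself, exposes the same hypothetical select(matcher) method standing in for the gRPC Store API; because a Querier has that method too, one querier can be registered as a store of another. The class and method names are illustrative assumptions, not Thanos's API.

```python
from concurrent.futures import ThreadPoolExecutor


class StaticStore:
    """Hypothetical in-memory stand-in for a Store API endpoint (Sidecar, Store Gateway, ...)."""

    def __init__(self, series):
        self.series = series  # list of {"labels": {...}, "samples": [(ts, value), ...]}

    def select(self, matcher):
        return [s for s in self.series if matcher(s["labels"])]


class Querier:
    """Sends the same selection to every store in parallel and concatenates the results."""

    def __init__(self, stores):
        self.stores = stores

    def select(self, matcher):
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda store: store.select(matcher), self.stores))
        return [series for result in results for series in result]


sidecar = StaticStore([{"labels": {"__name__": "up", "job": "api"}, "samples": [(120, 1.0)]}])
gateway = StaticStore([{"labels": {"__name__": "up", "job": "db"}, "samples": [(0, 1.0)]}])
# A querier stacked under another querier is treated like any other store.
top = Querier([sidecar, Querier([gateway])])
print([s["labels"]["job"] for s in top.select(lambda labels: labels["__name__"] == "up")])  # ['api', 'db']
```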
Thanos Query is also responsible for deduplicating the same metrics coming from different stores or Prometheus instances. For example, if a metric is still in Prometheus and has also been uploaded to object storage, Thanos Query can deduplicate its values. In Prometheus HA setups, deduplication relies on a replica label (set with the --query.replica-label flag) that distinguishes otherwise identical series scraped by different Prometheus replicas.
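Replica-label deduplication can be illustrated with a naive Python sketch: series whose labels are identical once the replica label is dropped are treated as one series, and their samples are merged by timestamp. This is deliberately simpler than the algorithm Thanos Query actually uses to choose between replicas, and the label name replica is an assumption (it has to match whatever external label your Prometheus replicas set).

```python
def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only by replica_label; keep the first value seen per timestamp."""
    merged = {}
    for series in series_list:
        # Identity of a series = its labels without the replica label.
        key = tuple(sorted((k, v) for k, v in series["labels"].items() if k != replica_label))
        samples = merged.setdefault(key, {})
        for ts, value in series["samples"]:
            samples.setdefault(ts, value)
    return [{"labels": dict(key), "samples": sorted(samples.items())} for key, samples in merged.items()]


replica_a = {"labels": {"__name__": "up", "job": "api", "replica": "A"}, "samples": [(0, 1.0), (15, 1.0)]}
replica_b = {"labels": {"__name__": "up", "job": "api", "replica": "B"}, "samples": [(15, 1.0), (30, 0.0)]}
# One series without the replica label, with samples at 0, 15 and 30.
print(deduplicate([replica_a, replica_b]))
```

Each replica alone has a gap; the merged series covers the union of both, which is the practical benefit of running Prometheus in HA pairs behind Thanos Query.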
Thanos Query Frontend
As its name suggests, Thanos Query Frontend sits in front of Thanos Query. Its goal is to split large queries into multiple smaller queries and to cache the query results (in memory or in Memcached).
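The split-and-cache behavior can be sketched as follows. The sketch assumes splitting range queries into day-aligned sub-ranges (Query Frontend splits by a configurable interval, 24 hours by default via --query-range.split-interval) and uses a plain dictionary in place of an in-memory or Memcached cache. split_by_interval, CachingFrontend, and the backend callable are hypothetical names.

```python
DAY_MS = 24 * 60 * 60 * 1000


def split_by_interval(start_ms, end_ms, interval_ms=DAY_MS):
    """Split [start_ms, end_ms) into consecutive sub-ranges that never cross an interval boundary."""
    parts = []
    current = start_ms
    while current < end_ms:
        boundary = (current // interval_ms + 1) * interval_ms  # next aligned boundary
        upper = min(boundary, end_ms)
        parts.append((current, upper))
        current = upper
    return parts


class CachingFrontend:
    """Splits range queries and caches each sub-result, keyed by (query, sub-range)."""

    def __init__(self, backend, interval_ms=DAY_MS):
        self.backend = backend  # callable(query, start_ms, end_ms) -> list of results
        self.interval_ms = interval_ms
        self.cache = {}

    def query_range(self, query, start_ms, end_ms):
        results = []
        for sub_start, sub_end in split_by_interval(start_ms, end_ms, self.interval_ms):
            key = (query, sub_start, sub_end)
            if key not in self.cache:
                self.cache[key] = self.backend(query, sub_start, sub_end)
            results.extend(self.cache[key])
        return results


calls = []


def fake_backend(query, start_ms, end_ms):
    calls.append((start_ms, end_ms))
    return [(query, start_ms, end_ms)]


frontend = CachingFrontend(fake_backend)
frontend.query_range("up", 0, DAY_MS + DAY_MS // 2)
frontend.query_range("up", 0, DAY_MS + DAY_MS // 2)
print(len(calls))  # 2: the second call is served entirely from the cache
```

Aligning sub-ranges to fixed boundaries is what makes the cache useful: overlapping dashboards that ask for slightly different ranges still hit the same cached day-sized pieces. A real cache must also avoid caching results for very recent data that can still change; this sketch ignores that.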
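There are other components as well, such as Thanos Receiver, which is used in remote-write setups to ingest data pushed by Prometheus, and Thanos Ruler, which evaluates recording and alerting rules.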
Thanos deployment architecture
Sidecar mode deployment:
Receiver mode deployment: