Grafana series (X): Why you should use Loki

This article was last updated on: February 7, 2024



We all know why Loki is a great help with log management. But here are all the reasons why your company’s accounting and operations teams would love Loki too.

Why should I use Loki?

—— Reduce costs, streamline operations, and build better teams

Beyond the technical rationale and its scalability, Loki’s organizational benefits are often underestimated or overlooked.

I want to talk about what Loki lets you do, or better yet, what it lets you avoid. I learned all this the hard way. These things start to matter when you are growing people, teams, or projects rather than datasets.

The benefits fall roughly into two camps: cost and process, where cost is monetary and process is organizational.

An introduction to Loki’s technical principles

First, a brief introduction to how Loki works, which should help with the rest of the article. Loki is a cost-effective, scalable, unopinionated log aggregator that is largely based on the Prometheus label paradigm and built on Cortex internals for scaling.

Loki ingests your logs and makes them searchable. You know, those text files that contain amorphous manifestations of technical debt. Your app’s flimsy, tentative storyline. Something that the brevity of a metric can never express. Debug logs look useless when everything is sunshine and rainbows, but they are invaluable during failures.

Essentially, Loki made two choices, and everything else follows from them.

  1. It indexes only a portion of the metadata, not the entire log line.
  2. It decouples its storage layer into a pair of pluggable backends: one for the index and one for the compressed logs.
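The two choices above can be pictured with a toy model. This is only a sketch under stated assumptions, not Loki’s actual implementation: the class name TinyLogStore, the in-memory dictionaries standing in for the two backends, and the sample labels are all hypothetical.

```python
import gzip
from collections import defaultdict


class TinyLogStore:
    """Toy model: a small label index plus compressed chunks (not Loki's real code)."""

    def __init__(self):
        # Index backend: (label, value) pair -> set of chunk ids.
        # The log text itself never goes in here.
        self.index = defaultdict(set)
        # Stand-in for object storage: chunk id -> gzip-compressed log lines.
        self.chunks = {}

    def ingest(self, labels, lines):
        """Compress the lines into one chunk and index only its labels."""
        chunk_id = len(self.chunks)
        self.chunks[chunk_id] = gzip.compress("\n".join(lines).encode("utf-8"))
        for pair in labels.items():
            self.index[pair].add(chunk_id)
        return chunk_id

    def query(self, selector, needle):
        """Find chunks via the label index, then brute-force scan their contents."""
        candidate_sets = [self.index.get(pair, set()) for pair in selector.items()]
        candidates = set.intersection(*candidate_sets) if candidate_sets else set(self.chunks)
        hits = []
        for chunk_id in sorted(candidates):
            text = gzip.decompress(self.chunks[chunk_id]).decode("utf-8")
            hits.extend(line for line in text.split("\n") if needle in line)
        return hits


store = TinyLogStore()
store.ingest({"app": "api", "environment": "prod"}, ["GET /health 200", "GET /orders 500 error"])
store.ingest({"app": "api", "environment": "dev"}, ["GET /orders 500 error"])
print(store.query({"app": "api", "environment": "prod"}, "error"))
```

The index only ever holds label pairs and chunk ids, so its size depends on how many distinct label combinations exist, not on how much text is logged.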

Why Loki only indexes metadata

So Loki indexes only metadata. How exactly does this make it cheaper to run, and by how much?

With full-text indexing, it is common for the indexes to end up larger than the data they index. Indexes are expensive to run because they require more expensive hardware (typically memory-intensive instances).

Loki does not index the contents of the log at all, only metadata about where it came from (labels such as app=api, environment=prod, and machine_id=instance-123-abc).

As a result, Loki does not need to maintain expensive clusters of instances to serve large full-text indexes, and only needs to worry about a fraction of the data. As a rule of thumb, the index is about 4 orders of magnitude smaller than the data (1 part in 10,000).
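To make the rule of thumb concrete, here is a back-of-the-envelope calculation. The 10 TB daily log volume is a made-up figure for illustration, and the function name is hypothetical; the only input taken from the text is the 1-in-10,000 ratio.

```python
def estimated_index_bytes(log_bytes, orders_of_magnitude=4):
    """Estimate index size from log volume using the 1-in-10,000 rule of thumb."""
    return log_bytes // 10**orders_of_magnitude


TB = 10**12
GB = 10**9

daily_logs = 10 * TB  # hypothetical workload
print(estimated_index_bytes(daily_logs) / GB, "GB of index per day")
```

Under these assumptions, 10 TB of logs would need roughly 1 GB of index, which fits comfortably on modest hardware.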

So, from the start, Loki minimizes what is typically the most expensive part of running an indexed log aggregator.

Why Loki uses object storage as log storage

We just covered the indexing decision Loki made; now let’s look at how decoupled storage helps reduce costs. After all, Loki also needs to store the logs themselves. It does this by sending them in compressed chunks to pluggable object storage like AWS S3.

Compared to the expensive, memory-hungry instances we talked about earlier, object storage is dirt cheap. The logs simply sit there until they are accessed. Essentially, the tiny label index is used to route requests to compressed logs in object storage, which are then decompressed and scanned in a highly parallel fashion on commodity hardware.
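The "decompress and scan in parallel" step can be sketched as follows. This is an illustration only: the chunks are built in memory rather than fetched from a bucket, the function names are hypothetical, and a thread pool stands in for what would really be many query workers on separate machines.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(compressed, needle):
    """Decompress one chunk and return the lines containing `needle`."""
    text = gzip.decompress(compressed).decode("utf-8")
    return [line for line in text.splitlines() if needle in line]


def parallel_grep(chunks, needle, workers=4):
    """Scan many compressed chunks concurrently, preserving chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(scan_chunk, chunks, [needle] * len(chunks))
        return [line for lines in per_chunk for line in lines]


# Hypothetical chunks, standing in for objects the label index pointed us to.
chunks = [
    gzip.compress(b"level=info msg=ok\nlevel=error msg=timeout"),
    gzip.compress(b"level=debug msg=cache_miss"),
    gzip.compress(b"level=error msg=refused"),
]
print(parallel_grep(chunks, "level=error"))
```

Because each chunk is scanned independently, adding workers is the main lever for speeding up a large query.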

As a transition to the more process-oriented benefits, I’d like to point out that when logging is cheap, the perverse incentive to log less disappears. Dropping debug logs because they are expensive to store and retrieve is an antipattern. When storage is cheap, we can avoid these tough decisions and make sure we have what we need to fight failures.

How Loki can reduce your operational headaches

Now that we’ve covered why our accountants love Loki, let’s take a look at the subtle reasons why our operations team loves Loki too.

Because Loki takes a non-indexed approach to log contents, it does not rely on structured logging to drive operational insight into log data. This means there is no need to coordinate schema definitions with preprocessing tools, or to fight battles when multiple applications or teams try to change those schemas.

Building ad hoc pipeline tools and backward-compatible migrations simply does not come up. Avoiding preprocessing does involve a trade-off, though: at query time, we must understand how to interact meaningfully with the data.

But what a difference this makes! Query-time technical debt can be managed in any way and over a long period of time, or not at all (this is also a major reason we use logfmt: it is readable and easy to grep at query time).

Ingest-time preprocessing, on the other hand, requires enormous upfront effort, is extremely fragile to change, and leads to organizational friction.
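As a rough illustration of deferring the parsing work to query time, the sketch below parses logfmt lines only when a question is asked. The parser, the sample lines, and the field names are assumptions made for this example; it leans on Python’s shlex module for quoting and does not cover every logfmt edge case (for instance, a stray apostrophe in a line would need extra handling).

```python
import shlex


def parse_logfmt(line):
    """Parse a simple logfmt line into a dict; bare words map to empty strings."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


# Stored lines are left exactly as the applications wrote them.
lines = [
    'level=info msg="request served" status=200 duration_ms=12',
    'level=error msg="upstream timeout" status=504 duration_ms=3000',
    "free-form text that was never structured",
]

# Structure is imposed only now, by whoever is asking the question.
slow_errors = [
    fields
    for fields in map(parse_logfmt, lines)
    if fields.get("level") == "error" and int(fields.get("duration_ms", 0)) > 1000
]
print(slow_errors)
```

The unstructured third line is still stored and still searchable as text; it just does not match this particular question. Nothing had to be rejected or migrated at ingest time.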

The underlying problem is always the same: use cases, formats, and expertise vary widely across internal groups. One of these logging approaches gives us flexibility in dealing with that, while the other does not.

Loki lacks a formal schema (202204 does), but that is not to say it can’t be used for analysis. It is tailored to developers and operators, and leans more toward incident response than historical analysis. That said, the next version of Loki will bring powerful analytical capabilities for ad hoc metrics.

It’s not just grep either. Its LogQL query language is modeled after Prometheus’ PromQL, enabling rapid testing of hypotheses and seamless switching between logs and metrics. For example, you can quickly derive error rates from log entries; it is as simple as that.

Generating error rates from logs
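A query of this kind might look like the sketch below, which assembles a LogQL expression as a string. The helper name, the selector labels, the 5-minute window, and the choice of the substring "error" as the error marker are all assumptions for illustration; the expression divides the rate of matching lines by the rate of all lines in the selected streams.

```python
def error_rate_query(selector, window="5m", needle="error"):
    """Build a LogQL expression: share of lines in `selector` containing `needle`."""
    errors = f'sum(rate({selector} |= "{needle}" [{window}]))'
    total = f"sum(rate({selector}[{window}]))"
    return f"{errors} / {total}"


# Hypothetical labels; substitute whatever your streams actually carry.
print(error_rate_query('{app="api", environment="prod"}'))
```

The resulting string can be pasted into Grafana’s Explore view or sent to Loki’s query API, and it is the same style of expression a PromQL user would write for metrics.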

As mentioned earlier, some of my favorite things about Loki are the things it allows us to avoid.

Remember our small indexes and schemaless data model? Loki allows us to avoid dealing with hot and cold indexes, lifecycle management, and one-off retrieval of archived data to reactivate old data when audit questions arise. Just ship your data to cheap object storage and stop worrying about managing successive performance-focused index tiers on expensive hardware.

Loki automatically creates, rotates, and expires its own tiny indexes, ensures they don’t grow too large, and lets users transparently query any data within the retention period you specify.

Loki also seamlessly handles upgrades to its internal storage versions. Want to take advantage of some new improvements? No problem. Loki keeps a record of the boundaries between versions, transparently splits queries across them, and stitches the results together. You don’t need to worry about dumping and reloading old data into a new schema version for compatibility.
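The idea of splitting a query at version boundaries can be sketched like this. It is a simplified illustration, not Loki’s code: the PERIODS table, its dates, and the version labels are hypothetical, and a real query would run each sub-range against the matching storage format before merging the results.

```python
from datetime import date

# Hypothetical schema periods: each applies from its start date until the next begins.
PERIODS = [
    (date(2020, 1, 1), "v9"),
    (date(2021, 6, 1), "v11"),
    (date(2023, 3, 1), "v12"),
]


def split_by_schema(start, end, periods=PERIODS):
    """Split the half-open range [start, end) into sub-ranges, one per schema period."""
    parts = []
    for i, (period_start, version) in enumerate(periods):
        period_end = periods[i + 1][0] if i + 1 < len(periods) else date.max
        lo, hi = max(start, period_start), min(end, period_end)
        if lo < hi:
            parts.append((lo, hi, version))
    return parts


for part in split_by_schema(date(2021, 5, 1), date(2023, 4, 1)):
    print(part)
```

A query spanning May 2021 to April 2023 is cut into three pieces here, one per period, and the caller never has to know that the boundaries exist.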

How Loki can improve your team

Next, I want to talk about dev and ops. Combining the two has become increasingly popular (and for good reason).

There’s a distinction here, though: don’t confuse understanding how and where software is deployed with running an observability system. Let your application developers log what they want without worrying about which logging schema they must follow to avoid breaking some preprocessing pipeline in their observability tooling.

As mentioned earlier, we at Grafana Labs prefer logfmt because its simple output enables grep-friendly filtering and manipulation at query time. The point is, some level of consistency is good, but not required. Let your developers and operators focus on what they need without worrying about the paradigms of your observability system.

Loki’s lack of user-defined schemas and its non-indexed nature remove cognitive burden from developers and operators, letting them focus on the core of their work and turn to Loki only when they need to query.

Let your operations team handle the secondary concerns of running and scaling Loki, including configuring promtail (or whatever agent you use). I recommend using labels to attach environment identifiers to your logs, for example application=api, env=prod, cluster=us-central, and so on. Users can then mix and match label filters to quickly narrow down where problems occur, and take advantage of the massively parallel nature of Loki’s read path to run arbitrary queries on potentially huge datasets at low cost.
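Mixing and matching label filters amounts to composing stream selectors. The helper below is a hypothetical convenience for illustration, not part of any Loki client; it renders keyword arguments as exact-match matchers and assumes the values contain no quotes or backslashes that would need escaping.

```python
def stream_selector(**labels):
    """Render label filters as a LogQL stream selector with exact-match matchers."""
    matchers = ", ".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return "{" + matchers + "}"


# Start broad, then narrow down by adding labels.
print(stream_selector(env="prod"))
print(stream_selector(env="prod", cluster="us-central"))
print(stream_selector(env="prod", cluster="us-central", application="api"))
```

Each added label shrinks the set of streams Loki has to scan, which is why a handful of well-chosen environment labels goes a long way.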

And don’t worry: Loki is open source, which keeps the barrier to learning it relatively low. You don’t have to feel limited to hiring from other large organizations, and you don’t need to worry about new engineers lacking experience with the tools of your choice.

Loki can run in single-binary mode on a single machine (like Prometheus) and then scale out as your needs grow, whether for scale, redundancy, or availability. We have many users running Loki on everything from a Raspberry Pi to large, horizontally scaled clusters.

Loki doesn’t do everything, but we think it makes good trade-offs for its use case: a fast, affordable, highly scalable log aggregator that integrates well with the Prometheus label model and switches effortlessly between metrics and logs.

Grafana series of articles
