Three scenarios for monitoring Kubernetes cluster certificate expiration times

This article was last updated on July 24, 2024.

Preface

Kubernetes uses many certificates: CA certificates, certificates for components such as kubelet, apiserver, proxy, and etcd, and the certificates embedded in kubeconfig files.

If a certificate expires, you may no longer be able to log in to the Kubernetes cluster, or the entire cluster may malfunction.

To address the problem of certificate expiration, there are generally the following approaches:

  1. Significantly extend the certificate validity period, anywhere from 10 years up to 100 years;
  2. Automatically rotate certificates that are about to expire; Rancher’s K3s and RKE2 use this approach;
  3. Add monitoring of certificate expiration so that expiration problems are detected early and can be handled manually.

This article focuses on monitoring certificate expiration in a Kubernetes cluster and presents three monitoring solutions:

  1. Use Blackbox Exporter to monitor Kubernetes apiserver certificate expiration through a Probe;
  2. Use kube-prometheus-stack to monitor the expiration time of the relevant certificates through the apiserver and kubelet components;
  3. Use Enix’s x509-certificate-exporter to monitor the certificates under /etc/kubernetes/pki and /var/lib/kubelet on all cluster nodes, as well as the kubeconfig files.

Solution one: Blackbox Exporter monitors Kubernetes apiserver certificate expiration time

Blackbox Exporter is used to probe endpoints over HTTP, HTTPS, TCP, DNS, ICMP and gRPC. After you define an endpoint, Blackbox Exporter generates metrics that can be visualized with tools such as Grafana. One of its most important features is measuring endpoint availability.

Blackbox Exporter can also collect certificate information when it probes HTTPS endpoints, and that is how we monitor the expiration time of the Kubernetes apiserver certificate here.

Configuration steps

  1. Adjust the Blackbox Exporter configuration and add insecure_skip_verify: true under tls_config, as follows:
    Adjusting the Blackbox Exporter configuration

  2. Restart Blackbox Exporter: kubectl rollout restart deploy ...

  3. Add monitoring of the Kubernetes APIServer internal endpoint https://kubernetes.default.svc.cluster.local/readyz.

    1. If you are not using the Prometheus Operator but native Prometheus, modify the ConfigMap or Secret holding the Prometheus configuration file and add a scrape config, as in the screenshot below (a configuration sketch is also included after the Probe example):

      Adding a scrape config to Prometheus

    2. If you are using the Prometheus Operator, you can add the following Probe CRD, and the Operator will automatically convert it and merge it into the Prometheus configuration.

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: kubernetes-apiserver
spec:
  interval: 60s
  module: http_2xx
  prober:
    path: /probe
    url: monitor-prometheus-blackbox-exporter.default.svc.cluster.local:9115
  targets:
    staticConfig:
      static:
        - https://kubernetes.default.svc.cluster.local/readyz
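
For reference, here is a minimal sketch of the two pieces of configuration mentioned above. The module name http_2xx and the exporter address monitor-prometheus-blackbox-exporter.default.svc.cluster.local:9115 are taken from the Probe example; the job name is an assumption and may differ in your environment.

Blackbox Exporter module with TLS verification skipped (blackbox.yml, typically stored in the exporter's ConfigMap):

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      tls_config:
        # Skip verification so the probe succeeds with the cluster's internal CA
        insecure_skip_verify: true

Equivalent scrape config for native Prometheus (prometheus.yml):

scrape_configs:
  - job_name: blackbox-kubernetes-apiserver
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://kubernetes.default.svc.cluster.local/readyz
    relabel_configs:
      # Send the target URL to the Blackbox Exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the Blackbox Exporter itself
      - target_label: __address__
        replacement: monitor-prometheus-blackbox-exporter.default.svc.cluster.local:9115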

Finally, you can add Prometheus alerting rules. Here we create a PrometheusRule CRD directly with the Prometheus Operator as an example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-blackbox-exporter
spec:
  groups:
    - name: prometheus-blackbox-exporter
      rules:
        - alert: BlackboxSslCertificateWillExpireSoon
          expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
          for: 0m
          labels:
            severity: warning
        - alert: BlackboxSslCertificateWillExpireSoon
          expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 14
          for: 0m
          labels:
            severity: critical
        - alert: BlackboxSslCertificateExpired
          annotations:
            description: |-
              SSL certificate has expired already
              VALUE = {{ $value }}
              LABELS = {{ $labels }}
            summary: SSL certificate expired (instance {{ $labels.instance }})
          expr: probe_ssl_earliest_cert_expiry - time() <= 0
          for: 0m
          labels:
            severity: emergency

Effect

Querying certificate expiration time via the Probe
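
For example, the number of days until the apiserver certificate expires can be checked with a query along these lines (a sketch; the exact label to filter on depends on how the Probe target is labelled in your setup):

(probe_ssl_earliest_cert_expiry{instance="https://kubernetes.default.svc.cluster.local/readyz"} - time()) / 86400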

Solution two: kube-prometheus-stack monitors certificate expiration through the apiserver and kubelet components

Here you can refer to my article Prometheus Operator and kube-prometheus bis - How to monitor a 1.23+ kubeadm cluster; once that installation is complete, this solution works out of the box.

The out-of-the-box functionality includes:

  1. Scraping apiserver and kubelet metrics (i.e. the ServiceMonitor resources);
  2. Alerting rules related to certificate expiration time (i.e. the PrometheusRule resources).

The metrics used here are listed below; an illustrative query follows the list:

  1. apiserver
    1. apiserver_client_certificate_expiration_seconds_count
    2. apiserver_client_certificate_expiration_seconds_bucket
  2. kubelet
    1. kubelet_certificate_manager_client_expiration_renew_errors
    2. kubelet_server_expiration_renew_errors
    3. kubelet_certificate_manager_client_ttl_seconds
    4. kubelet_certificate_manager_server_ttl_seconds
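
As an illustration only (these are not the exact rules shipped with kube-prometheus-stack; the thresholds and time windows are assumptions), alert expressions can be built on these metrics like so:

# kubelet client certificate expires within 7 days
kubelet_certificate_manager_client_ttl_seconds < 7 * 86400

# kubelet failed to renew its serving certificate in the last 15 minutes
increase(kubelet_server_expiration_renew_errors[15m]) > 0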

Monitoring effect

The corresponding Prometheus alerting rules are as follows:

PrometheusRule resources related to certificate expiration time

Solution three: Use Enix’s x509-certificate-exporter

Monitoring approach

This exporter obtains certificate information by watching certificate files under specified directories or paths on every node in the cluster, as well as kubeconfig files.

If your Kubernetes cluster was built with kubeadm, you can watch the following certificate and kubeconfig files:

watchFiles:
- /var/lib/kubelet/pki/kubelet-client-current.pem
- /etc/kubernetes/pki/apiserver.crt
- /etc/kubernetes/pki/apiserver-etcd-client.crt
- /etc/kubernetes/pki/apiserver-kubelet-client.crt
- /etc/kubernetes/pki/ca.crt
- /etc/kubernetes/pki/front-proxy-ca.crt
- /etc/kubernetes/pki/front-proxy-client.crt
- /etc/kubernetes/pki/etcd/ca.crt
- /etc/kubernetes/pki/etcd/healthcheck-client.crt
- /etc/kubernetes/pki/etcd/peer.crt
- /etc/kubernetes/pki/etcd/server.crt
watchKubeconfFiles:
- /etc/kubernetes/admin.conf
- /etc/kubernetes/controller-manager.conf
- /etc/kubernetes/scheduler.conf

Installation and configuration

Edit values.yaml:

kubeVersion: ''
extraLabels: {}
nameOverride: ''
fullnameOverride: ''
imagePullSecrets: []
image:
  registry: docker.io
  repository: enix/x509-certificate-exporter
  tag:
  pullPolicy: IfNotPresent
psp:
  create: false
rbac:
  create: true
  secretsExporter:
    serviceAccountName:
    serviceAccountAnnotations: {}
    clusterRoleAnnotations: {}
    clusterRoleBindingAnnotations: {}
  hostPathsExporter:
    serviceAccountName:
    serviceAccountAnnotations: {}
    clusterRoleAnnotations: {}
    clusterRoleBindingAnnotations: {}
podExtraLabels: {}
podAnnotations: {}
exposePerCertificateErrorMetrics: false
exposeRelativeMetrics: false
metricLabelsFilterList: null
secretsExporter:
  enabled: true
  debugMode: false
  replicas: 1
  restartPolicy: Always
  strategy: {}
  resources:
    limits:
      cpu: 200m
      memory: 150Mi
    requests:
      cpu: 20m
      memory: 20Mi
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podExtraLabels: {}
  podAnnotations: {}
  podSecurityContext: {}
  securityContext:
    runAsUser: 65534
    runAsGroup: 65534
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL
  secretTypes:
    - type: kubernetes.io/tls
      key: tls.crt
  includeNamespaces: []
  excludeNamespaces: []
  includeLabels: []
  excludeLabels: []
  cache:
    enabled: true
    maxDuration: 300
hostPathsExporter:
  debugMode: false
  restartPolicy: Always
  updateStrategy: {}
  resources:
    limits:
      cpu: 100m
      memory: 40Mi
    requests:
      cpu: 10m
      memory: 20Mi
  nodeSelector: {}
  tolerations: []
  affinity: {}
  podExtraLabels: {}
  podAnnotations: {}
  podSecurityContext: {}
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL
  watchDirectories: []
  watchFiles: []
  watchKubeconfFiles: []
  daemonSets:
    cp:
      nodeSelector:
        node-role.kubernetes.io/master: ''
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
      watchFiles:
        - /var/lib/kubelet/pki/kubelet-client-current.pem
        - /etc/kubernetes/pki/apiserver.crt
        - /etc/kubernetes/pki/apiserver-etcd-client.crt
        - /etc/kubernetes/pki/apiserver-kubelet-client.crt
        - /etc/kubernetes/pki/ca.crt
        - /etc/kubernetes/pki/front-proxy-ca.crt
        - /etc/kubernetes/pki/front-proxy-client.crt
        - /etc/kubernetes/pki/etcd/ca.crt
        - /etc/kubernetes/pki/etcd/healthcheck-client.crt
        - /etc/kubernetes/pki/etcd/peer.crt
        - /etc/kubernetes/pki/etcd/server.crt
      watchKubeconfFiles:
        - /etc/kubernetes/admin.conf
        - /etc/kubernetes/controller-manager.conf
        - /etc/kubernetes/scheduler.conf
    nodes:
      watchFiles:
        - /var/lib/kubelet/pki/kubelet-client-current.pem
        - /etc/kubernetes/pki/ca.crt
rbacProxy:
  enabled: false
podListenPort: 9793
hostNetwork: false
service:
  create: true
  port: 9793
  annotations: {}
  extraLabels: {}
prometheusServiceMonitor:
  create: true
  scrapeInterval: 60s
  scrapeTimeout: 30s
  extraLabels: {}
  relabelings: {}
prometheusPodMonitor:
  create: false
prometheusRules:
  create: true
  alertOnReadErrors: true
  readErrorsSeverity: warning
  alertOnCertificateErrors: true
  certificateErrorsSeverity: warning
  certificateRenewalsSeverity: warning
  certificateExpirationsSeverity: critical
  warningDaysLeft: 30
  criticalDaysLeft: 14
  extraLabels: {}
  alertExtraLabels: {}
  rulePrefix: ''
  disableBuiltinAlertGroup: false
  extraAlertGroups: []
extraDeploy: []

Install via Helm Chart:

helm repo add enix https://charts.enix.io
helm install x509-certificate-exporter enix/x509-certificate-exporter
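
If you customized values.yaml as above, pass it to the install command; the namespace here is just an example, and -f, --namespace and --create-namespace are standard Helm flags:

helm install x509-certificate-exporter enix/x509-certificate-exporter \
  --namespace monitoring --create-namespace \
  -f values.yaml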

This Helm Chart also automatically installs:

  • ServiceMonitor
  • PrometheusRule

The main monitoring metric it exposes is listed below (a sample query follows):

  • x509_cert_not_after
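
For example, the number of days remaining for each watched certificate can be queried like this (a sketch based on the metric above; add label filters to narrow it down to specific files or Secrets):

(x509_cert_not_after - time()) / 86400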

Monitoring effect

The exporter also provides a polished Grafana dashboard:

x509 Exporter Grafana Dashboard

Alert Rules are as follows:

x509 Exporter Prometheus Rule

Summary

To monitor certificate expiration times in a Kubernetes cluster, this article presented three solutions, each with its own advantages and disadvantages:

  1. Use Blackbox Exporter to monitor Kubernetes apiserver certificate expiration through a Probe;
    1. Advantages: simple to implement;
    2. Disadvantages: only certificates exposed over HTTPS can be monitored;
  2. Use kube-prometheus-stack to monitor the expiration time of the relevant certificates through the apiserver and kubelet components;
    1. Advantages: works out of the box; no additional exporter needs to be installed once kube-prometheus-stack is in place;
    2. Disadvantages: only the apiserver and kubelet certificates can be monitored;
  3. Use Enix’s x509-certificate-exporter to monitor /etc/kubernetes/pki, /var/lib/kubelet, and the kubeconfig files on all cluster nodes;
    1. Advantages: it can monitor all nodes, all kubeconfig files, and all certificates in Secrets of type kubernetes.io/tls, and it can monitor certificates outside the Kubernetes cluster in the same way; its coverage is wide and comprehensive;
    2. Disadvantages: requires installing an additional component, x509-certificate-exporter (one Deployment plus several DaemonSets), which consumes more resources in the Kubernetes cluster.

You can choose flexibly according to your actual situation.

🎉🎉🎉
