This article was last updated on: July 24, 2024
Background
Edge clusters (based on Raspberry Pi + K3S) need to implement basic alerting functions.
Edge cluster constraints
CPU, memory, and storage are tight. They cannot support a complete Prometheus-based monitoring stack, which needs at least 2 GB of memory and a large amount of storage (even when based on Prometheus Agent), so additional storage and compute consumption must be avoided;
Network conditions cannot support a conventional monitoring system either, because such a system typically transmits data every minute (or continuously) and the data volume is not small;
The uplink is a metered 5G network: each destination address must be explicitly opened, traffic is billed by volume, and the link's capacity is limited and unstable (it may be offline for periods of time).
Key requirements
To summarize, the key requirements are as follows:
Timely alerts: exceptions in the edge cluster must be reported promptly, so that abnormal conditions currently occurring in the cluster are known;
Network: conditions are poor, so traffic must stay small, only a few destination addresses can be opened, and an unstable network (offline for periods of time) must be tolerated;
Resources: avoid additional storage and compute consumption as much as possible.
Solution
Given these requirements, the following approach is adopted:
Alert notifications based on Kubernetes Events
Architecture diagram
Technical plan
Gather Events from various Kubernetes resources, such as:
pod
node
kubelet
crd
…
Collect the Kubernetes Events with the kubernetes-event-exporter component;
Alert only on Warning-level Events (the conditions can be refined later);
Send alerts through communication tools such as Feishu webhooks (more sending channels can be added later).
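To get a feel for which Events this plan would alert on, the Warning-level Events currently stored in the cluster can be listed with kubectl. This is only a rough preview under the assumption that kubectl is configured for the edge cluster; the function name is made up for illustration, and the exporter itself watches Events continuously rather than polling like this.

```bash
#!/usr/bin/env bash
# List only Warning-level Events across all namespaces, oldest first.
# list_warning_events is a hypothetical helper name, not part of any tool.
list_warning_events() {
  kubectl get events --all-namespaces \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}

if command -v kubectl >/dev/null 2>&1; then
  list_warning_events || echo "could not query the cluster" >&2
else
  echo "kubectl not found; run this on a machine with cluster access" >&2
fi
```

If the list is very noisy, that is a hint that the alert conditions may need further filtering before they are sent to a chat channel.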
Implementation steps
Manual deployment:
On the edge cluster, perform the following operations:
1. Create roles
kubectl apply the following YAML:
📝 Note:
You can use cat << _EOF_ | kubectl apply -f - to create the resources quickly.
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
```
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
```
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
```
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    # Must match the namespace of the ServiceAccount created above
    namespace: monitoring
    name: event-exporter
```
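Once the roles are applied, one way to sanity-check them is to ask the API server what the ServiceAccount is allowed to do. The sketch below assumes kubectl is configured with an account that may impersonate ServiceAccounts (cluster-admin can); the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Ask the API server whether the exporter's ServiceAccount may perform
# a verb on a resource. kubectl prints "yes" or "no".
# can_exporter is a hypothetical helper name.
can_exporter() {
  kubectl auth can-i "$1" "$2" --all-namespaces \
    --as=system:serviceaccount:monitoring:event-exporter
}

if command -v kubectl >/dev/null 2>&1; then
  echo "list events: $(can_exporter list events)"
  echo "watch nodes: $(can_exporter watch nodes)"
else
  echo "kubectl not found; run this on a machine with cluster access" >&2
fi
```

A "no" for nodes usually means the subject in the event-exporter-extra binding does not point at the ServiceAccount in the monitoring namespace.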
2. Create the kubernetes-event-exporter config
kubectl apply the following YAML:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        # Send every event to the stdout "dump" receiver
        - match:
            - receiver: "dump"
        # Drop Normal events; everything else goes to Feishu
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  # The title line was garbled in the source; plain_text is
                  # assumed here, following the Feishu card header format
                  tag: plain_text
                  content: XXX IoT K3S 集群告警
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
```
🐾 Note:
endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/...": change this to your own webhook endpoint as needed. ❌ Remember not to publish it!
content: XXX IoT K3S 集群告警: the card title ("XXX IoT K3S cluster alert"). Adjust the name as needed for quick identification, for example "Test K3S cluster alarm at home".
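Before deploying the exporter, it can help to confirm that the webhook is reachable from the edge network, since destination addresses must be opened explicitly. The following is a minimal sketch that posts a plain text message with curl. FEISHU_WEBHOOK_URL and build_test_payload are names chosen for this example, and the bot's security settings (keywords, signatures, IP allowlists) may cause Feishu to reject the request.

```bash
#!/usr/bin/env bash
# Build a minimal Feishu "text" message body.
# No JSON escaping is done, so the text must not contain double quotes
# or backslashes. build_test_payload is a hypothetical helper name.
build_test_payload() {
  printf '{"msg_type":"text","content":{"text":"%s"}}' "$1"
}

# FEISHU_WEBHOOK_URL is an assumed variable name; export it yourself
# instead of writing the webhook address into a script.
if [ -n "${FEISHU_WEBHOOK_URL:-}" ]; then
  curl -sS -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_test_payload 'event-exporter webhook test')" \
    "$FEISHU_WEBHOOK_URL"
else
  echo "FEISHU_WEBHOOK_URL is not set; skipping the request" >&2
fi
```

Reading the address from an environment variable keeps it out of shell history files and shared scripts.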
3. Create a deployment
```bash
cat << _EOF_ | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
_EOF_
```
📝 Notes:
The event-exporter-cfg volume loads the configuration file stored in the ConfigMap;
The localtime, zoneinfo, and TZ settings set the pod's time zone to Asia/Shanghai, so that the times shown in notifications are in CST;
The affinity and tolerations settings make the scheduler prefer the master node (adjust as needed). In an edge cluster the master often doubles as the gateway, has a higher hardware configuration, and stays online longer.
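After applying the Deployment, a quick check is to wait for the rollout and look at the pod's output. This sketch assumes kubectl access to the cluster; the function name is made up for illustration. Because the config routes every event to the stdout "dump" receiver, recent events should normally appear in the logs as JSON.

```bash
#!/usr/bin/env bash
# Wait for the Deployment to become ready, then show the latest output.
# check_event_exporter is a hypothetical helper name.
check_event_exporter() {
  kubectl -n monitoring rollout status deployment/event-exporter --timeout=120s &&
    kubectl -n monitoring logs deployment/event-exporter --tail=20
}

if command -v kubectl >/dev/null 2>&1; then
  check_event_exporter || echo "event-exporter is not ready yet" >&2
else
  echo "kubectl not found; run this on a machine with cluster access" >&2
fi
```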
Automated deployment
Effect: the exporter is deployed automatically when K3S is installed.
On the node where the K3S server runs, create event-exporter.yaml in the /var/lib/rancher/k3s/server/manifests/ directory (create the directory first if it doesn't exist):
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    # Must match the namespace of the ServiceAccount above
    namespace: monitoring
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          # Replace with your own webhook endpoint and keep it private
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  # The title line was garbled in the source; plain_text is
                  # assumed here, following the Feishu card header format
                  tag: plain_text
                  content: xxxK3S集群告警
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
```
After that, K3S deploys these manifests automatically whenever it starts.
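Placing the file can be scripted when provisioning many edge nodes. The helper below is a small sketch with a made-up function name; it takes the manifest path and, optionally, a target directory that defaults to the K3S manifests directory mentioned above. Writing to the default directory typically requires root.

```bash
#!/usr/bin/env bash
# Copy a manifest into the K3S auto-deploy directory, creating the
# directory first if it does not exist.
# Usage: install_manifest <file> [manifests_dir]
# install_manifest is a hypothetical helper name.
install_manifest() {
  local src="$1"
  local dir="${2:-/var/lib/rancher/k3s/server/manifests}"
  mkdir -p "$dir" && cp "$src" "$dir/"
}
```

For example, running `install_manifest event-exporter.yaml` as root on the K3S server node puts the file where K3S picks it up.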
📚️ Reference:
Automatic deployment of manifests and Helm charts (Rancher documentation)
Final result
As shown in the following figure:
📚️ Reference documentation