WireGuard series (9): Build a unified K8S cluster across multiple clouds based on K3S+WireGuard+Kilo

This article was last updated on: February 7, 2024

Synopsis of the series:

  1. WireGuard Part 1: What Is a VPN?
  2. WireGuard Part 2: Introduction to WireGuard - a Fast, Modern, Secure VPN Tunnel
  3. WireGuard Part 3: WireGuard Installation
  4. WireGuard Part 4: WireGuard Quick Start
  5. WireGuard Part 5: Introduction to Netmaker - a Platform for Creating and Managing WireGuard Networks
  6. WireGuard Part 6: Netmaker Installation
  7. WireGuard Part 7: Creating Full Mesh Networks with WireGuard and Netmaker
  8. WireGuard Part 8: An Introduction to Kilo, the WireGuard-based K8S CNI

🛠️ Practical Session! Build a unified K8S cluster across multiple clouds based on K3S + WireGuard + Kilo. 💪💪💪

Steps

1. Premise

1.1 Multiple cloud hosts across clouds

Prepare at least 2 cloud hosts from different public clouds (1C1G each is enough to run). This article uses 6 hosts, each with a distinct hostname:

  1. Tianyi Cloud (CTYun): ty1 (K3S Server)
  2. Alibaba Cloud: ali1 (K3S Agent)
  3. HUAWEI CLOUD: hw1 (K3S Agent)
  4. Baidu Cloud: bd1 and bd2 (K3S Agent)
  5. Tencent Cloud: tx1 (K3S Agent)

1.2 Operating System

Operating system: Ubuntu 20.04 is recommended (as of 2022-01-22), because WireGuard is easy to install on it.

1.3 WireGuard installed

WireGuard must already be installed on every host; see the installation process 👉️ here.
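If it is not installed yet, on Ubuntu 20.04 the short version is simply (details are in the installation article above):

sudo apt update
sudo apt install -y wireguard
# verify the kernel module can be loaded
sudo modprobe wireguard && lsmod | grep wireguard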

1.4 Network

Protocol   Port        Source                       Description
TCP        6443        K3S agent nodes              Kubernetes API Server
UDP        51820       K3S server and agent nodes   Kilo WireGuard mesh communication
TCP        10250       K3S server and agent nodes   Kubelet metrics
TCP        2379-2380   K3S server nodes             Embedded etcd (used here because of K3S_CLUSTER_INIT=true)

Typically, all outbound traffic is allowed.
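On public clouds these ports are usually opened in the console's security-group rules; if a host firewall such as ufw is also in play, the equivalent rules look roughly like this (the source ranges are placeholders and should be narrowed to your node IPs):

sudo ufw allow proto tcp from {{ agent_node_ips }} to any port 6443        # Kubernetes API server
sudo ufw allow 51820/udp                                                   # Kilo / WireGuard
sudo ufw allow proto tcp from {{ cluster_node_ips }} to any port 10250     # kubelet metrics
sudo ufw allow proto tcp from {{ server_node_ips }} to any port 2379:2380  # embedded etcd (server nodes only)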

1.5 Make sure that the cloud host has a public IP address

Each location must have at least one node whose IP address is reachable from the other locations.

If the locations are in different clouds or different private networks, this must be a public IP address. If that IP address is not automatically configured on the node's Ethernet device, it can be specified manually with the kilo.squat.ai/force-endpoint annotation.

1.6 Cloud hosts enable IPv4 forwarding

Make sure /etc/sysctl.conf contains this line:

net.ipv4.ip_forward=1

Then make it take effect: sysctl -p
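A quick sanity check that forwarding is actually enabled on the running kernel (expected output: net.ipv4.ip_forward = 1):

sysctl net.ipv4.ip_forward
cat /proc/sys/net/ipv4/ip_forward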

1.7 (Optional) Configure an image registry mirror

Refer here 👉️: Private image registry configuration reference
This is mainly to speed up image pulls.
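For reference, K3S reads mirror settings from /etc/rancher/k3s/registries.yaml (the same path shows up in the agent logs later in this article); a minimal sketch, with the mirror endpoint left as a placeholder:

sudo mkdir -p /etc/rancher/k3s
sudo tee /etc/rancher/k3s/registries.yaml >/dev/null <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "https://{{ your_mirror_endpoint }}"
EOF
# restart k3s / k3s-agent afterwards so the file is picked up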

2. Install K3S Server

remark

A K3S Server is roughly the equivalent of a K8S master (control-plane) node.

2.1 One-click installation of K3S Server

Install directly with cnrancher's one-click installation script:

curl -sfL http://rancher-mirror.cnrancher.com/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn K3S_CLUSTER_INIT=true INSTALL_K3S_EXEC="--tls-san {{ server_public_ip }} --node-external-ip {{ server_public_ip }} --flannel-backend none --kube-proxy-arg metrics-bind-address=0.0.0.0 --kube-apiserver-arg feature-gates=EphemeralContainers=true" sh -s -

Brief description:

  1. K3S_CLUSTER_INIT=true: cluster mode, which uses the built-in etcd instead of sqlite3;
  2. --tls-san {{ server_public_ip }}: replace {{ server_public_ip }} with the public IP of your K3S Server. --tls-san adds an additional hostname or IP as a Subject Alternative Name on the TLS certificate; it can be specified multiple times if you want to access the API by both IP and hostname;
  3. --node-external-ip {{ server_public_ip }}: specifies the node's public IP address;
  4. --flannel-backend none: K3S ships flannel as its default network plugin; this disables flannel because Kilo is installed separately;
  5. --kube-proxy-arg metrics-bind-address=0.0.0.0: kube-proxy parameter so that metrics are reachable from outside localhost;
  6. (Optional) --kube-apiserver-arg feature-gates=EphemeralContainers=true: enables the EphemeralContainers feature gate, which makes it easy to attach a debugging sidecar to a pod without restarting it.

⚠️ Common mistakes

  1. If --tls-san is not specified, kubectl may be unable to access the cluster via {{ server_public_ip }}.
  2. If --node-external-ip is not specified, K3S Agents located in other clouds may fail to connect to the K3S Server's API.
  3. If --kube-proxy-arg metrics-bind-address=0.0.0.0 is not specified, metrics may not be collectable.

2.2 View installation results

The details are as follows:

❯ systemctl status k3s.service
● k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-01-22 16:27:14 CST; 4h 5min ago
Docs: https://k3s.io
Process: 5757 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 5758 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 5759 (k3s-server)
Tasks: 49
Memory: 926.0M
CGroup: /system.slice/k3s.service
├─ 5759 /usr/local/bin/k3s server
├─ 5774 containerd
├─18561 /var/lib/rancher/k3s/data/2e877cf4762c3c7df37cc556de3e08890fbf450914bb3ec042ad4f36b5a2413a/bin/containerd-shim-runc-v2 -namespace k>
├─18579 /pause
└─18745 /opt/bin/kg --kubeconfig=/etc/kubernetes/kubeconfig --hostname=ty1.k3s

Jan 22 17:28:41 ty1.k3s k3s[5759]: I0122 17:28:41.435542 5759 reconciler.go:319] "Volume detached for volume...
...
❯ journalctl -f -b -u k3s.service
-- Logs begin at Fri 2021-03-26 09:47:06 CST, end at Sat 2022-01-22 20:35:12 CST. --
Jan 22 16:20:21 ty1.k3s systemd[1]: Starting Lightweight Kubernetes...
Jan 22 16:20:22 ty1.k3s k3s[1660]: time="2022-01-22T16:20:22+08:00" level=info msg="Starting k3s v1.22.5+k3s1 (405bf79d)"
Jan 22 16:20:22 ty1.k3s k3s[1660]: time="2022-01-22T16:20:22+08:00" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, max>
Jan 22 16:20:22 ty1.k3s k3s[1660]: time="2022-01-22T16:20:22+08:00" level=info msg="Configuring database table schema and indexes, this may take a momen>
Jan 22 16:20:22 ty1.k3s k3s[1660]: time="2022-01-22T16:20:22+08:00" level=info msg="Database tables and indexes are up to date"
Jan 22 16:20:22 ty1.k3s k3s[1660]: time="2022-01-22T16:20:22+08:00" level=info msg="Kine available at unix://kine.sock"
Jan 22 16:20:22 ty1.k3s k3s[1660]: time="2022-01-22T16:20:22+08:00" level=info msg="Reconciling bootstrap data between datastore and disk"
Jan 22 16:20:22 ty1.k3s k3s[1660]: time="2022-01-22T16:20:22+08:00" level=info msg="Running kube-apiserver --advertise-address=x.x.x.x --adverti>
Jan 22 16:20:22 ty1.k3s k3s[1660]: Flag --insecure-port has been deprecated, This flag has no effect now and will be removed in v1.24.
Jan 22 16:20:22 ty1.k3s k3s[1660]: I0122 16:20:22.307328 1660 server.go:581] external host was not specified, using x.x.x.x
Jan 22 16:20:22 ty1.k3s k3s[1660]: I0122 16:20:22.308688 1660 server.go:175] Version: v1.22.5+k3s1
❯ k3s kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
❯ k3s kubectl get node
NAME STATUS ROLES AGE VERSION
ty1.k3s NotReady control-plane,master 4h10m v1.22.5+k3s1

⚠️ note

The status of the K3S Server at this point is NotReady, which is normal because the CNI network plug-in has not been installed.

3. Install Kilo

3.1 Specify the K3S Server topology

My K3S Server is on Tianyi Cloud (CTYun), so annotate its topology:

k3s kubectl annotate node ty1.k3s kilo.squat.ai/location="ctyun"
k3s kubectl annotate node ty1.k3s kilo.squat.ai/force-endpoint=x.x.x.x:51820
k3s kubectl annotate node ty1.k3s kilo.squat.ai/persistent-keepalive=20

Explained here 👉️: Kilo Annotations - location

3.2 Install Kilo❗️

Install Kilo by deploying a DaemonSet in the cluster.

kubectl apply -f https://gitee.com/mirrors/squat/raw/main/manifests/crds.yaml
kubectl apply -f https://gitee.com/mirrors/squat/raw/main/manifests/kilo-k3s.yaml

ℹ️ remark

The address above is Kilo’s Gitee mirror repository.

Detailed description:

  1. crds.yaml installs the peer.kilo.squat.ai CRD, so WireGuard peers can be configured through custom resources.
  2. kilo-k3s.yaml installs the Kilo CNI, including:
    1. ConfigMap
      1. cni-conf.json
      2. kilo-scripts
    2. ServiceAccount
    3. ClusterRole
    4. ClusterRoleBinding
    5. DaemonSet: runs on all nodes
  3. On each node, the WireGuard configuration generated by Kilo lives at:
    1. the INI-formatted WireGuard configuration: /var/lib/kilo/conf
    2. the WireGuard key: /var/lib/kilo/key
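After the manifests are applied, the generated WireGuard state can also be inspected directly on a node, using the paths above and the standard wg tool (the interface name kilo0 appears later in this article):

sudo cat /var/lib/kilo/conf   # INI-formatted WireGuard configuration generated by Kilo
sudo wg show kilo0            # live interface state: peers, endpoints, handshakes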

3.3 Verification

  1. In the kube-system namespace, the kilo DaemonSet is created and all of its pods are in the Running state.
  2. The K3S Server node status changes from NotReady to Ready.
  3. On the K3S Server node, kilo-related annotations are now present, as shown below.
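The annotations can be dumped straight from the node object (the node name is from this article's setup):

k3s kubectl get node ty1.k3s -o yaml

The relevant part of the output looks like this: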
apiVersion: v1
kind: Node
metadata:
  name: ty1.k3s
  labels:
    beta.kubernetes.io/arch: amd64
    ...
  annotations:
    k3s.io/external-ip: x.x.x.x
    k3s.io/hostname: ty1.k3s
    k3s.io/internal-ip: 192.168.1.226
    k3s.io/node-args: >-
      ["server","--tls-san","x.x.x.x","--node-external-ip","x.x.x.x","--flannel-backend","none","--kube-proxy-arg","metrics-bind-address=0.0.0.0","--kube-apiserver-arg","feature-gates=EphemeralContainers=true"]
    kilo.squat.ai/endpoint: x.x.x.x:51820
    kilo.squat.ai/force-endpoint: x.x.x.x:51820
    kilo.squat.ai/granularity: location
    kilo.squat.ai/internal-ip: 192.168.1.226/24
    kilo.squat.ai/key: zCiXXXXXXXXXXXXXXXXXXXXXXXXXXXXQTL9CEc=
    kilo.squat.ai/last-seen: '1642856638'
    kilo.squat.ai/location: ctyun
    kilo.squat.ai/persistent-keepalive: '20'
    kilo.squat.ai/wireguard-ip: 10.4.0.3/16
  ...

4. Install the K3S Agent

4.1 One-click installation of K3S Agent

curl -sfL http://rancher-mirror.cnrancher.com/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn K3S_TOKEN={{ token }} K3S_URL=https://{{ server_public_ip }}:6443 sh -s - --node-external-ip {{ node_public_ip }} --kube-proxy-arg "metrics-bind-address=0.0.0.0"
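Here {{ token }} is the cluster join token and {{ server_public_ip }} / {{ node_public_ip }} are the server's and this node's public IPs. The token can be read on the K3S Server from its standard location:

# run on the K3S Server (ty1)
sudo cat /var/lib/rancher/k3s/server/node-token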

4.2 Wait for the K3S Agent to join the cluster

Wait for the K3S Agent to join the cluster; it is fine if its status is still NotReady:

❯ systemctl status k3s-agent.service
● k3s-agent.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-01-22 16:27:35 CST; 4h 44min ago
Docs: https://k3s.io
Process: 4079 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 4080 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 4081 (k3s-agent)
Tasks: 63
Memory: 126.9M
CGroup: /system.slice/k3s-agent.service
├─4081 /usr/local/bin/k3s agent
├─4106 containerd
├─5285 /var/lib/rancher/k3s/data/
...

-- Logs begin at Sat 2021-11-06 14:00:29 CST, end at Sat 2022-01-22 21:10:33 CST. --
Jan 22 16:27:35 ali1.k3s systemd[1]: Starting Lightweight Kubernetes...
Jan 22 16:27:35 ali1.k3s systemd[1]: Started Lightweight Kubernetes.
Jan 22 16:27:35 ali1.k3s k3s[4081]: time="2022-01-22T16:27:35+08:00" level=info msg="Starting k3s agent v1.22.5+k3s1 (405bf79d)"
Jan 22 16:27:35 ali1.k3s k3s[4081]: time="2022-01-22T16:27:35+08:00" level=info msg="Running load balancer 127.0.0.1:6444 -> [192.168.1.226:6443 140.246.255.203:6443]"
Jan 22 16:27:55 ali1.k3s k3s[4081]: time="2022-01-22T16:27:55+08:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Jan 22 16:28:01 ali1.k3s k3s[4081]: time="2022-01-22T16:28:01+08:00" level=info msg="Module overlay was already loaded"
Jan 22 16:28:01 ali1.k3s k3s[4081]: time="2022-01-22T16:28:01+08:00" level=info msg="Module nf_conntrack was already loaded"
Jan 22 16:28:01 ali1.k3s k3s[4081]: time="2022-01-22T16:28:01+08:00" level=info msg="Module br_netfilter was already loaded"
Jan 22 16:28:01 ali1.k3s k3s[4081]: time="2022-01-22T16:28:01+08:00" level=info msg="Module iptable_nat was already loaded"
Jan 22 16:28:01 ali1.k3s k3s[4081]: time="2022-01-22T16:28:01+08:00" level=info msg="Using private registry config file at /etc/rancher/k3s/registries.yaml"
Jan 22 16:28:01 ali1.k3s k3s[4081]: time="2022-01-22T16:28:01+08:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Jan 22 16:28:01 ali1.k3s k3s[4081]: time="2022-01-22T16:28:01+08:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
Jan 22 16:28:02 ali1.k3s k3s[4081]: time="2022-01-22T16:28:02+08:00" level=info msg="Containerd is now running"
Jan 22 16:28:02 ali1.k3s k3s[4081]: time="2022-01-22T16:28:02+08:00" level=info msg="Updating load balancer server addresses -> [140.246.255.203:6443]"
Jan 22 16:28:02 ali1.k3s k3s[4081]: time="2022-01-22T16:28:02+08:00" level=info msg="Connecting to proxy" url="wss://140.246.255.203:6443/v1-k3s/connect"
Jan 22 16:28:02 ali1.k3s k3s[4081]: time="2022-01-22T16:28:02+08:00" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/var/lib/rancher/k3s/agent/cli>
Jan 22 16:28:02 ali1.k3s k3s[4081]: Flag --cloud-provider has been deprecated, will be removed in 1.23, in favor of removing cloud provider code from Kubelet.
Jan 22 16:28:02 ali1.k3s k3s[4081]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.742554 4081 server.go:436] "Kubelet version" kubeletVersion="v1.22.5+k3s1"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.798379 4081 dynamic_cafile_content.go:155] "Starting controller" name="client-ca-bundle::/var/lib/rancher/k3s/agent/client-ca.crt"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.891854 4081 server.go:687] "--cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.892197 4081 container_manager_linux.go:280] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.892310 4081 container_manager_linux.go:285] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot>
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.892345 4081 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.892357 4081 container_manager_linux.go:320] "Creating device plugin manager" devicePluginEnabled=true
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.892403 4081 state_mem.go:36] "Initialized new in-memory state store"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.893507 4081 kubelet.go:418] "Attempting to sync node with API server"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.893533 4081 kubelet.go:279] "Adding static pod path" path="/var/lib/rancher/k3s/agent/pod-manifests"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.893577 4081 kubelet.go:290] "Adding apiserver pod source"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.893601 4081 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.897574 4081 kuberuntime_manager.go:245] "Container runtime initialized" containerRuntime="containerd" version="v1.5.8-k3s1" apiVersion="v1alpha2"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.898404 4081 server.go:1213] "Started kubelet"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.901202 4081 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.903670 4081 server.go:149] "Starting to listen" address="0.0.0.0" port=10250
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.904727 4081 server.go:409] "Adding debug handlers to kubelet server"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.909003 4081 volume_manager.go:291] "Starting Kubelet Volume Manager"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.909628 4081 desired_state_of_world_populator.go:146] "Desired state populator starts to run"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.949667 4081 kubelet_network_linux.go:56] "Initialized protocol iptables rules." protocol=IPv4
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.971847 4081 kubelet_network_linux.go:56] "Initialized protocol iptables rules." protocol=IPv6
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.972256 4081 status_manager.go:158] "Starting to sync pod status with apiserver"
Jan 22 16:28:02 ali1.k3s k3s[4081]: I0122 16:28:02.972423 4081 kubelet.go:1967] "Starting kubelet main sync loop"
...
Jan 22 16:28:03 ali1.k3s k3s[4081]: I0122 16:28:03.331465 4081 kubelet_node_status.go:74] "Successfully registered node" node="ali1.k3s"
❯ k3s kubectl get node
NAME STATUS ROLES AGE VERSION
ali1.k3s NotReady worker 4h41m v1.22.5+k3s1
bd2.k3s NotReady worker 4h40m v1.22.5+k3s1
ty1.k3s Ready control-plane,master 4h42m v1.22.5+k3s1
tx1.k3s NotReady worker 4h41m v1.22.5+k3s1
hw1.k3s NotReady worker 4h41m v1.22.5+k3s1
bd1.k3s NotReady worker 4h40m v1.22.5+k3s1

4.3 Specify the K3S Agent topology

# ali1
k3s kubectl annotate node ali1.k3s kilo.squat.ai/location="aliyun"
k3s kubectl annotate node ali1.k3s kilo.squat.ai/force-endpoint={{ ali1_public_ip }}:51820
k3s kubectl annotate node ali1.k3s kilo.squat.ai/persistent-keepalive=20

# hw1
k3s kubectl annotate node hw1.k3s kilo.squat.ai/location="huaweicloud"
k3s kubectl annotate node hw1.k3s kilo.squat.ai/force-endpoint={{ hw1_public_ip }}:51820
k3s kubectl annotate node hw1.k3s kilo.squat.ai/persistent-keepalive=20

# bd1
k3s kubectl annotate node bd1.k3s kilo.squat.ai/location="baidu"
k3s kubectl annotate node bd1.k3s kilo.squat.ai/force-endpoint={{ bd1_public_ip }}:51820
k3s kubectl annotate node bd1.k3s kilo.squat.ai/persistent-keepalive=20

# bd2
k3s kubectl annotate node bd2.k3s kilo.squat.ai/location="baidu"
k3s kubectl annotate node bd2.k3s kilo.squat.ai/force-endpoint={{ bd2_public_ip }}:51820
k3s kubectl annotate node bd2.k3s kilo.squat.ai/persistent-keepalive=20

# tx1
k3s kubectl annotate node tx1.k3s kilo.squat.ai/location="tencentcloud"
k3s kubectl annotate node tx1.k3s kilo.squat.ai/force-endpoint={{ tx1_public_ip }}:51820
k3s kubectl annotate node tx1.k3s kilo.squat.ai/persistent-keepalive=20

Wait until all nodes are Ready.
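You can watch them flip to Ready from the server, for example:

k3s kubectl get node -o wide   # shows external IPs alongside status
k3s kubectl get node -w        # watch until every node reports Ready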

5. Verification

5.1 Verify network connectivity

Deploy a busybox DaemonSet so that one pod lands on each node, then verify the traffic paths.
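A minimal sketch of such a DaemonSet (the name busybox-ping, the labels and the image tag are my own choices, not from any Kilo manifest):

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: busybox-ping
spec:
  selector:
    matchLabels:
      app: busybox-ping
  template:
    metadata:
      labels:
        app: busybox-ping
    spec:
      containers:
      - name: busybox
        image: busybox:1.35
        command: ["sh", "-c", "while true; do sleep 3600; done"]
EOF

# list the pods and their IPs (one per node), then open a shell in one of them
kubectl get pod -l app=busybox-ping -o wide
kubectl exec -it {{ one-of-the-busybox-pods }} -- sh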

Exec into one of the pods and ping the others; they can all reach each other. For example:

/ # ping 10.42.2.7
PING 10.42.2.7 (10.42.2.7): 56 data bytes
64 bytes from 10.42.2.7: seq=0 ttl=62 time=6.604 ms
64 bytes from 10.42.2.7: seq=1 ttl=62 time=6.520 ms
64 bytes from 10.42.2.7: seq=2 ttl=62 time=6.412 ms
64 bytes from 10.42.2.7: seq=3 ttl=62 time=6.430 ms
64 bytes from 10.42.2.7: seq=4 ttl=62 time=6.487 ms
^C
--- 10.42.2.7 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 6.412/6.490/6.604 ms
/ # ping 10.42.1.3
PING 10.42.1.3 (10.42.1.3): 56 data bytes
64 bytes from 10.42.1.3: seq=0 ttl=62 time=7.426 ms
64 bytes from 10.42.1.3: seq=1 ttl=62 time=7.123 ms
64 bytes from 10.42.1.3: seq=2 ttl=62 time=7.109 ms
64 bytes from 10.42.1.3: seq=3 ttl=62 time=7.129 ms
^C
--- 10.42.1.3 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 7.109/7.196/7.426 m

Pinging the WireGuard VPN IPs of other nodes also works, as follows:

/ # ping 10.4.0.1
PING 10.4.0.1 (10.4.0.1): 56 data bytes
64 bytes from 10.4.0.1: seq=0 ttl=64 time=0.077 ms
64 bytes from 10.4.0.1: seq=1 ttl=64 time=0.099 ms
^C
--- 10.4.0.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.077/0.088/0.099 ms
/ # ping 10.4.0.2
PING 10.4.0.2 (10.4.0.2): 56 data bytes
64 bytes from 10.4.0.2: seq=0 ttl=63 time=29.000 ms
64 bytes from 10.4.0.2: seq=1 ttl=63 time=28.939 ms
^C
--- 10.4.0.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 28.939/28.969/29.000 ms

Pinging the cloud-provider private IPs of other nodes also works, as follows:

/ # ping 172.17.0.3
PING 172.17.0.3 (172.17.0.3): 56 data bytes
64 bytes from 172.17.0.3: seq=0 ttl=63 time=6.327 ms
64 bytes from 172.17.0.3: seq=1 ttl=63 time=6.350 ms
^C
--- 172.17.0.3 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 6.327/6.338/6.350 ms
/ # ping 192.168.64.4
PING 192.168.64.4 (192.168.64.4): 56 data bytes
64 bytes from 192.168.64.4: seq=0 ttl=63 time=29.261 ms
64 bytes from 192.168.64.4: seq=1 ttl=63 time=29.015 ms
^C
--- 192.168.64.4 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 29.015/29.138/29.261 ms

This proves the network is fully connected (pod-to-pod and pod-to-node).

5.2 ℹ️ Detailed Description

5.2.1 Network Card

Kilo works with 3 network interfaces:

  1. kilo0: the WireGuard VPN interface, which forms the VPN between nodes. (bd2 has no kilo0 because it is in the same VPC as bd1; by default, traffic inside the same VPC is not encrypted by WireGuard, only traffic leaving the VPC is.)
  2. kube-bridge: the bridge that connects each pod's interface to the host's interface, so pods can communicate over the WireGuard VPN network.
  3. tunl0: in bridge mode, cross-host pod traffic needs either extra host routes or an overlay network; Kilo configures this automatically. The typical structure in the overlay case:

      Typical bridge overlay network structure (figure)

Among them, the kube-bridge configuration (from cni-conf.json) is as follows:

{
  "cniVersion": "0.3.1",
  "name": "kilo",
  "plugins": [
    {
      "name": "kubernetes",
      "type": "bridge",
      "bridge": "kube-bridge",
      "isDefaultGateway": true,
      "forceAddress": true,
      "mtu": 1420,
      "ipam": {
        "type": "host-local"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

5.2.2 CIDR

As follows:

Cloud host   Cloud-provider private IP   WireGuard VPN IP   Pod CIDR
ty1          192.168.1.226               10.4.0.3/16        10.42.0.0/24
ali1         172.21.143.136              10.4.0.1/16        10.42.3.0/24
hw1          192.168.7.226               10.4.0.4/16        10.42.1.0/24
bd1          192.168.64.4                10.4.0.2/16        10.42.4.0/24
bd2          192.168.64.5                None               10.42.5.0/24
tx1          172.17.0.3                  10.4.0.5/16        10.42.2.0/24

ℹ️ remark

The private IP addresses of the cloud hosts are assigned automatically by each public cloud; no special setup is needed.
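These addresses can be cross-checked on each host with iproute2 (interface names are the ones shown in this section; bd2 has no kilo0):

ip -br addr show kilo0         # WireGuard VPN IP
ip -br addr show kube-bridge   # local pod network bridge
ip -br addr show eth0          # cloud-provider private IP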

5.2.3 Routing Tables

Look at the routing tables, taking ty1 as an example:

❯ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.1 0.0.0.0 UG 100 0 0 eth0
10.4.0.0 0.0.0.0 255.255.0.0 U 0 0 0 kilo0
10.42.0.0 0.0.0.0 255.255.255.0 U 0 0 0 kube-bridge
10.42.1.0 10.4.0.4 255.255.255.0 UG 0 0 0 kilo0
10.42.2.0 10.4.0.5 255.255.255.0 UG 0 0 0 kilo0
10.42.3.0 10.4.0.1 255.255.255.0 UG 0 0 0 kilo0
10.42.4.0 10.4.0.2 255.255.255.0 UG 0 0 0 kilo0
10.42.5.0 10.4.0.2 255.255.255.0 UG 0 0 0 kilo0
172.17.0.3 10.4.0.5 255.255.255.255 UGH 0 0 0 kilo0
172.21.143.136 10.4.0.1 255.255.255.255 UGH 0 0 0 kilo0
192.168.7.226 10.4.0.4 255.255.255.255 UGH 0 0 0 kilo0
192.168.64.4 10.4.0.2 255.255.255.255 UGH 0 0 0 kilo0
192.168.64.5 10.4.0.2 255.255.255.255 UGH 0 0 0 kilo0

bd1 is as follows:

❯ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
...
10.42.5.0 192.168.64.5 255.255.255.0 UG 0 0 0 tunl0
...

bd2 is as follows:

❯ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.64.1 0.0.0.0 UG 100 0 0 eth0
10.4.0.1 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
10.4.0.2 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
10.4.0.3 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
10.4.0.4 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
10.4.0.5 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
10.42.0.0 192.168.64.4 255.255.255.0 UG 0 0 0 tunl0
10.42.1.0 192.168.64.4 255.255.255.0 UG 0 0 0 tunl0
10.42.2.0 192.168.64.4 255.255.255.0 UG 0 0 0 tunl0
10.42.3.0 192.168.64.4 255.255.255.0 UG 0 0 0 tunl0
10.42.4.0 192.168.64.4 255.255.255.0 UG 0 0 0 tunl0
10.42.5.0 0.0.0.0 255.255.255.0 U 0 0 0 kube-bridge
169.254.169.254 192.168.64.2 255.255.255.255 UGH 100 0 0 eth0
172.17.0.3 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
172.21.143.136 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
192.168.1.226 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
192.168.7.226 192.168.64.4 255.255.255.255 UGH 0 0 0 tunl0
192.168.64.0 0.0.0.0 255.255.240.0 U 0 0 0 eth0

5.2.4 Summary

  1. Traffic to 10.4.0.0/16 (the WireGuard segment) goes out through kilo0, i.e. the WireGuard interface:
    1. cloud hosts reach each other via their WireGuard VPN IPs;
  2. Pods on ty1 reach each other (the 10.42.0.0/24 segment) over the kube-bridge bridge;
  3. ty1 reaches pods on other nodes through kilo0:
    1. 10.42.1.0/24 (pods on hw1) is routed via hw1's WireGuard VPN IP 10.4.0.4;
    2. 10.42.2.0/24 (pods on tx1) is routed via tx1's WireGuard VPN IP 10.4.0.5;
    3. 10.42.3.0/24 (pods on ali1) is routed via ali1's WireGuard VPN IP 10.4.0.1;
    4. 10.42.4.0/24 (pods on bd1) and 10.42.5.0/24 (pods on bd2) are both routed via bd1's WireGuard VPN IP 10.4.0.2 (bd1 and bd2 share a VPC, and only bd1 has a WireGuard VPN IP);
  4. ty1 reaches the cloud-provider private IPs of other nodes through kilo0 as well, routed via the corresponding WireGuard VPN IP;
  5. bd1 reaches 10.42.5.0/24 (pods on bd2) through tunl0;
  6. bd2 only has kube-bridge and tunl0:
    1. pods on the same machine reach each other over kube-bridge;
    2. pods on other machines are reached over tunl0: traffic is forwarded to bd1 via its VPC private IP (192.168.64.4), and bd1 forwards it onward.

🗒️ In short

  1. Same node: traffic goes over the bridge;
  2. Same location: traffic goes over the cloud-provider private network (VPC);
  3. Different locations: traffic goes over the WireGuard VPN.

kgctl

Kilo provides kgctl, a command-line tool for inspecting and interacting with clusters.

This tool can be used to understand the topology of a mesh, get the WireGuard configuration of peers, or plot a cluster diagram.

kgctl needs a Kubernetes configuration file, supplied either through the KUBECONFIG environment variable or the --kubeconfig flag.

Installation

kgctl binaries are built automatically for Linux, macOS, and Windows, and can be downloaded for every Kilo release from the GitHub release page.
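A typical manual install, assuming the binary matching your OS/arch has already been downloaded from the release page (asset names vary per release, so no exact URL is given here):

# after downloading the kgctl binary for your platform
chmod +x kgctl
sudo mv kgctl /usr/local/bin/kgctl
kgctl --help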

Commands

Command    Syntax                                       Description
graph      kgctl graph [flags]                          Generates a graph representing the cluster topology in GraphViz format.
showconf   kgctl showconf (node | peer) NAME [flags]    Shows the WireGuard configuration of a node or peer in the mesh.

graph

⚠️ note

Rendering the topology graph requires a GraphViz layout engine such as circo (it ships with the graphviz package); without it the topology map cannot be generated.

Installation command: sudo apt install graphviz -y

The graph command generates a graph representing the Kilo mesh in GraphViz format. The diagram is useful for understanding or debugging the network topology.

Example:

kgctl graph

This produces output in the DOT graph description language, such as:

digraph kilo {
label="10.2.4.0/24";
labelloc=t;
outputorder=nodesfirst;
overlap=false;
"ip-10-0-6-7"->"ip-10-0-6-146"[dir=both];
"ip-10-1-13-74"->"ip-10-1-20-76"[dir=both];
"ip-10-0-6-7"->"ip-10-1-13-74"[dir=both];
"ip-10-0-6-7"->"squat"[dir=both, style=dashed];
"ip-10-1-13-74"->"squat"[dir=both, style=dashed];

# ...

}

To render the graph, use one of the GraphViz layout tools, for example circo:

kgctl graph | circo -Tsvg > cluster.svg

showconf

The showconf command outputs the WireGuard configuration of a node or peer in the cluster, that is, the configuration the node or peer would need to apply to its local WireGuard interface in order to participate in the mesh. Example:

NODE=master # the name of a node
kgctl showconf node $NODE

This produces output in INI format, for example:

[Interface]
ListenPort = 51820

[Peer]
AllowedIPs = 10.2.0.0/24, 10.1.13.74/32, 10.2.4.0/24, 10.1.20.76/32, 10.4.0.2/32
Endpoint = 3.120.246.76:51820
PersistentKeepalive = 0
PublicKey = IgDTEvasUvxisSAmfBKh8ngFmc2leZBvkRwYBhkybUg=

Summary

WireGuard configuration

The WireGuard configuration that Kilo auto-generates from the detected topology is as follows:

❯ kgctl showconf node ty1.k3s
[Interface]
ListenPort = 51820

[Peer]
AllowedIPs = 10.42.3.0/24, 172.21.143.136/32, 10.4.0.1/32
Endpoint = [{{ ali1_public_ip }}]:51820
PersistentKeepalive = 20
PublicKey = tscPxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=

[Peer]
AllowedIPs = 10.42.4.0/24, 192.168.64.4/32, 10.42.5.0/24, 192.168.64.5/32, 10.4.0.2/32
Endpoint = [{{ bd1_public_ip }}]:51820
PersistentKeepalive = 20
PublicKey = 29khxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxrz8=

[Peer]
AllowedIPs = 10.42.1.0/24, 192.168.7.226/32, 10.4.0.4/32
Endpoint = [{{ hw1_public_ip }}]:51820
PersistentKeepalive = 20
PublicKey = B9JZe6X8+xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=

[Peer]
AllowedIPs = 10.42.2.0/24, 172.17.0.3/32, 10.4.0.5/32
Endpoint = [{{ tx1_public_ip }}]:51820
PersistentKeepalive = 20
PublicKey = mn1rUiD+Zb3/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxU=

Network topology

As shown in the following figure:

Network topology of the multi-cloud unified K8S cluster based on K3S + WireGuard + Kilo (figure)

Network flow

  1. Same node: traffic goes over the bridge;
  2. Same location: traffic goes over the cloud-provider private network (VPC);
  3. Different locations: traffic goes over the WireGuard VPN.

🎉🎉🎉