K8S Performance Optimization - OS sysctl tuning

This article was last updated on February 7, 2024.

Preface

This is the first article in the K8S performance optimization series: best practices for OS-level sysctl tuning parameters.

List of parameters

Sysctl tuning parameters at a glance

# Kubernetes Settings
vm.max_map_count = 262144
kernel.softlockup_panic = 1
kernel.softlockup_all_cpu_backtrace = 1
net.ipv4.ip_local_reserved_ports = 30000-32767

# Increase the number of connections
net.core.somaxconn = 32768

# Maximum Socket Receive Buffer
net.core.rmem_max = 16777216

# Maximum Socket Send Buffer
net.core.wmem_max = 16777216

# Increase the maximum total buffer-space allocatable
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216

# Increase the number of outstanding syn requests allowed
net.ipv4.tcp_max_syn_backlog = 8096


# For persistent HTTP connections
net.ipv4.tcp_slow_start_after_idle = 0

# Allow to reuse TIME_WAIT sockets for new connections
# when it is safe from protocol viewpoint
net.ipv4.tcp_tw_reuse = 1

# Max number of packets that can be queued on interface input
# If kernel is receiving packets faster than can be processed
# this queue increases
net.core.netdev_max_backlog = 16384

# Increase size of file handles and inode cache
fs.file-max = 2097152

# Max number of inotify instances and watches for a user
# Since dockerd runs as a single user, the default instances value of 128 per user is too low
# e.g. uses of inotify: nginx ingress controller, kubectl logs -f
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288

# Additional sysctl flags that kubelet expects
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1

# Prevent docker from changing iptables: https://github.com/kubernetes/kubernetes/issues/40182
net.ipv4.ip_forward=1

On AWS, additionally add the following:

# AWS settings
# Issue #23395
net.ipv4.neigh.default.gc_thresh1=0

If IPv6 is enabled, add the following:

# Enable IPv6 forwarding for network plugins that don't do it themselves
net.ipv6.conf.all.forwarding=1
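
A convenient way to apply the whole list is to persist it in a drop-in file under /etc/sysctl.d/ and reload. The sketch below uses a hypothetical file name (90-kubernetes.conf) and only repeats two of the parameters as placeholders; in practice the full list above goes into the file.

# Persist the tuning parameters in a drop-in file (file name is arbitrary)
cat <<'SYSCTL' | sudo tee /etc/sysctl.d/90-kubernetes.conf
vm.max_map_count = 262144
net.core.somaxconn = 32768
# ... remaining parameters from the list above ...
SYSCTL
# Reload all /etc/sysctl.d/*.conf files (and /etc/sysctl.conf)
sudo sysctl --system
# Spot-check a single value
sysctl vm.max_map_count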

Parameter interpretation

Each entry below gives the category, the kernel parameter, a short description, and the reference link.

Kubernetes - vm.max_map_count = 262144
Limits the number of VMAs (virtual memory areas) a process may have. A larger value is useful for Elasticsearch, MongoDB, and other heavy users of mmap.
Reference: ES Configuration

Kubernetes - kernel.softlockup_panic = 1
Used to work around Kubernetes-related kernel soft lockup bugs.
Reference: root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com)

Kubernetes - kernel.softlockup_all_cpu_backtrace = 1
Used together with the previous setting to diagnose kernel soft lockups; it dumps backtraces for all CPUs when a soft lockup is detected.
Reference: root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com)

Kubernetes - net.ipv4.ip_local_reserved_ports = 30000-32767
Reserves the default Kubernetes NodePort range (service-node-port-range) so the kernel does not hand those ports out as ephemeral ports.
Reference: service-node-port-range and ip_local_port_range collision · Issue #6342 · kubernetes/kops (github.com)
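
Whether the reservation actually lines up with the node's ephemeral port range is easy to verify; a minimal read-only check with a standard sysctl:

# Compare the ephemeral port range with the reserved NodePort range
sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.ip_local_reserved_ports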

Network - net.core.somaxconn = 32768
Upper limit for the listen backlog of a socket. The backlog is the queue of connections that the kernel has accepted on behalf of a listening socket but that the application has not yet picked up; raising the limit allows more connections to wait in that queue.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
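
To see the backlog a listening socket actually ended up with (the application's listen() value capped by somaxconn), ss from iproute2 can be used; for LISTEN sockets the Send-Q column shows the backlog limit and Recv-Q the current accept-queue length. A quick sketch:

# For LISTEN sockets: Recv-Q = current accept queue, Send-Q = backlog limit
ss -ltn
# Current ceiling applied by the kernel
sysctl net.core.somaxconn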

Network - net.core.rmem_max = 16777216
Maximum receive socket buffer size, in bytes.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

Network - net.core.wmem_max = 16777216
Maximum send socket buffer size, in bytes.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

Network - net.ipv4.tcp_wmem = 4096 87380 16777216 and net.ipv4.tcp_rmem = 4096 87380 16777216
The three values are the minimum, default, and maximum buffer size per TCP socket, in bytes; raising the maximum increases the total buffer space that can be allocated.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

Network - net.ipv4.tcp_max_syn_backlog = 8096
Length of the queue of connection requests (received SYNs) that have not yet been acknowledged by the connecting client; the default is 1024. Raising it allows more outstanding SYN requests.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
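
Whether these queues ever overflow in practice can be read from the kernel's TCP statistics; the exact wording of the counters varies between kernel versions, so treat this as a rough check:

# Look for listen-queue overflows and SYNs dropped at LISTEN sockets
netstat -s | grep -i listen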

Network - net.ipv4.tcp_slow_start_after_idle = 0
Disables the slow-start restart that normally happens after a TCP connection has been idle, which benefits long-lived (persistent) HTTP connections.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

Network - net.ipv4.tcp_tw_reuse = 1
Allows sockets in the TIME_WAIT state to be reused for new TCP connections when it is safe from the protocol's point of view; the default is 0 (disabled).
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
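
How many sockets are sitting in TIME_WAIT on a node is easy to measure with ss; a rough sketch:

# Count sockets currently in TIME_WAIT (subtract one for the header line)
ss -tan state time-wait | wc -l
# Per-state summary of all sockets
ss -s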

Network - net.core.netdev_max_backlog = 16384
Maximum number of packets queued on the input side of an interface. When the NIC delivers packets faster than the kernel can process them, they pile up in this queue, and raising the limit gives the kernel more headroom.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
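
Drops caused by this queue filling up show up in /proc/net/softnet_stat: one line per CPU, and the second (hexadecimal) column is the drop counter. A quick look:

# One line per CPU; the 2nd column (hex) counts packets dropped
# because the input backlog was full
cat /proc/net/softnet_stat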

File system - fs.file-max = 2097152
System-wide limit on the number of file handles, i.e. how many files may be open at once across the whole Linux system. Increases the file handle and inode cache size.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
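
Current usage against that ceiling can be read straight from procfs; the three numbers are allocated handles, allocated-but-unused handles, and the maximum:

# allocated  unused  maximum
cat /proc/sys/fs/file-nr
sysctl fs.file-max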

File system - fs.inotify.max_user_instances = 8192 and fs.inotify.max_user_watches = 524288
Maximum number of inotify instances and watches per user. Because dockerd runs everything as a single user, the default of 128 instances per user is too low. Typical inotify users include the NGINX ingress controller and kubectl logs -f.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
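
When these limits are hit, inotify_init() typically fails with "too many open files" and inotify_add_watch() with "no space left on device", which is how the problem usually surfaces in kubectl logs -f or the ingress controller. Checking the current values is a one-liner:

# Current per-user inotify limits
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches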

kubelet - vm.overcommit_memory = 1
Memory allocation strategy: a value of 1 tells the kernel to always allow memory allocations regardless of the current memory state.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

kubelet - kernel.panic = 10
Automatically reboot 10 seconds after a kernel panic.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

kubelet - kernel.panic_on_oops = 1
Call panic() when the kernel hits an Oops.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
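
Together with vm.overcommit_memory above, these are among the settings kubelet validates when started with --protect-kernel-defaults, so it is worth verifying them in one go; a minimal check:

# Values kubelet expects to find on the node
sysctl vm.overcommit_memory kernel.panic kernel.panic_on_oops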

Network - net.ipv4.ip_forward = 1
Enables IP forwarding. Also related to Docker's iptables changes breaking outbound container traffic (see the linked issue).
Reference: Upgrading docker 1.13 on nodes causes outbound container traffic to stop working · Issue #40182 · kubernetes/kubernetes (github.com)

Network - net.ipv4.neigh.default.gc_thresh1 = 0
Fixes the "arp_cache: neighbor table overflow!" error seen on AWS.
Reference: arp_cache: neighbor table overflow! · Issue #4533 · kubernetes/kops (github.com)
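
How close a node actually gets to the neighbor-table limits can be checked by comparing the number of current entries with the three garbage-collection thresholds; a rough sketch:

# Current ARP/neighbor entries vs. the gc thresholds
ip neigh show | wc -l
sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3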
