K8S Performance Optimization - OS sysctl tuning

This article was last updated on February 7, 2024.

Preface

This is the first article in the K8S performance optimization series: best practices for OS-level sysctl tuning parameters.

List of parameters

Sysctl tuning parameters at a glance

# Kubernetes Settings
vm.max_map_count = 262144
kernel.softlockup_panic = 1
kernel.softlockup_all_cpu_backtrace = 1
net.ipv4.ip_local_reserved_ports = 30000-32767

# Increase the number of connections
net.core.somaxconn = 32768

# Maximum Socket Receive Buffer
net.core.rmem_max = 16777216

# Maximum Socket Send Buffer
net.core.wmem_max = 16777216

# Increase the maximum total buffer-space allocatable
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216

# Increase the number of outstanding syn requests allowed
net.ipv4.tcp_max_syn_backlog = 8096


# For persistent HTTP connections
net.ipv4.tcp_slow_start_after_idle = 0

# Allow to reuse TIME_WAIT sockets for new connections
# when it is safe from protocol viewpoint
net.ipv4.tcp_tw_reuse = 1

# Max number of packets that can be queued on interface input
# If kernel is receiving packets faster than can be processed
# this queue increases
net.core.netdev_max_backlog = 16384

# Increase size of file handles and inode cache
fs.file-max = 2097152

# Max number of inotify instances and watches for a user
# Since dockerd runs as a single user, the default instances value of 128 per user is too low
# e.g. uses of inotify: nginx ingress controller, kubectl logs -f
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288

# Additional sysctl flags that kubelet expects
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1

# Prevent docker from changing iptables: https://github.com/kubernetes/kubernetes/issues/40182
net.ipv4.ip_forward=1

On AWS, additionally add the following:

# AWS settings
# Issue #23395
net.ipv4.neigh.default.gc_thresh1=0

If IPv6 is enabled, add the following:

# Enable IPv6 forwarding for network plugins that don't do it themselves
net.ipv6.conf.all.forwarding=1
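
A convenient way to apply the whole list is to persist it in a drop-in file under /etc/sysctl.d/ and reload. The sketch below uses a hypothetical file name (90-kubernetes.conf) and only repeats two of the parameters as placeholders; in practice the full list above goes into the file.

# Persist the tuning parameters in a drop-in file (file name is arbitrary)
cat <<'SYSCTL' | sudo tee /etc/sysctl.d/90-kubernetes.conf
vm.max_map_count = 262144
net.core.somaxconn = 32768
# ... remaining parameters from the list above ...
SYSCTL
# Reload all /etc/sysctl.d/*.conf files (and /etc/sysctl.conf)
sudo sysctl --system
# Spot-check a single value
sysctl vm.max_map_count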

Parameter interpretation

Each entry below gives the category, the kernel parameter, a short description, and the reference link.

Kubernetes - vm.max_map_count = 262144
Limits the number of VMAs (virtual memory areas) a process may have. A larger value is useful for Elasticsearch, MongoDB, and other heavy users of mmap.
Reference: ES Configuration

Kubernetes - kernel.softlockup_panic = 1
Used to work around Kubernetes-related kernel soft lockup bugs.
Reference: root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com)

Kubernetes - kernel.softlockup_all_cpu_backtrace = 1
Used together with the previous setting to diagnose kernel soft lockups; it dumps backtraces for all CPUs when a soft lockup is detected.
Reference: root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com)

Kubernetes - net.ipv4.ip_local_reserved_ports = 30000-32767
Reserves the default Kubernetes NodePort range (service-node-port-range) so the kernel does not hand those ports out as ephemeral ports.
Reference: service-node-port-range and ip_local_port_range collision · Issue #6342 · kubernetes/kops (github.com)
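
Whether the reservation actually lines up with the node's ephemeral port range is easy to verify; a minimal read-only check with a standard sysctl:

# Compare the ephemeral port range with the reserved NodePort range
sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.ip_local_reserved_ports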

Network - net.core.somaxconn = 32768
Upper limit for the listen backlog of a socket. The backlog is the queue of connections that the kernel has accepted on behalf of a listening socket but that the application has not yet picked up; raising the limit allows more connections to wait in that queue.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
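
To see the backlog a listening socket actually ended up with (the application's listen() value capped by somaxconn), ss from iproute2 can be used; for LISTEN sockets the Send-Q column shows the backlog limit and Recv-Q the current accept-queue length. A quick sketch:

# For LISTEN sockets: Recv-Q = current accept queue, Send-Q = backlog limit
ss -ltn
# Current ceiling applied by the kernel
sysctl net.core.somaxconn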

Network - net.core.rmem_max = 16777216
Maximum receive socket buffer size, in bytes.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

Network - net.core.wmem_max = 16777216
Maximum send socket buffer size, in bytes.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

Network - net.ipv4.tcp_wmem = 4096 87380 16777216 and net.ipv4.tcp_rmem = 4096 87380 16777216
The three values are the minimum, default, and maximum buffer size per TCP socket, in bytes; raising the maximum increases the total buffer space that can be allocated.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

Network - net.ipv4.tcp_max_syn_backlog = 8096
Length of the queue of connection requests (received SYNs) that have not yet been acknowledged by the connecting client; the default is 1024. Raising it allows more outstanding SYN requests.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
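
Whether these queues ever overflow in practice can be read from the kernel's TCP statistics; the exact wording of the counters varies between kernel versions, so treat this as a rough check:

# Look for listen-queue overflows and SYNs dropped at LISTEN sockets
netstat -s | grep -i listen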

Network - net.ipv4.tcp_slow_start_after_idle = 0
Disables the slow-start restart that normally happens after a TCP connection has been idle, which benefits long-lived (persistent) HTTP connections.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

Network - net.ipv4.tcp_tw_reuse = 1
Allows sockets in the TIME_WAIT state to be reused for new TCP connections when it is safe from the protocol's point of view; the default is 0 (disabled).
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
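
How many sockets are sitting in TIME_WAIT on a node is easy to measure with ss; a rough sketch:

# Count sockets currently in TIME_WAIT (subtract one for the header line)
ss -tan state time-wait | wc -l
# Per-state summary of all sockets
ss -s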

Network - net.core.netdev_max_backlog = 16384
Maximum number of packets queued on the input side of an interface. When the NIC delivers packets faster than the kernel can process them, they pile up in this queue, and raising the limit gives the kernel more headroom.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
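
Drops caused by this queue filling up show up in /proc/net/softnet_stat: one line per CPU, and the second (hexadecimal) column is the drop counter. A quick look:

# One line per CPU; the 2nd column (hex) counts packets dropped
# because the input backlog was full
cat /proc/net/softnet_stat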

File system - fs.file-max = 2097152
System-wide limit on the number of file handles, i.e. how many files may be open at once across the whole Linux system. Increases the file handle and inode cache size.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
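
Current usage against that ceiling can be read straight from procfs; the three numbers are allocated handles, allocated-but-unused handles, and the maximum:

# allocated  unused  maximum
cat /proc/sys/fs/file-nr
sysctl fs.file-max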

File system - fs.inotify.max_user_instances = 8192 and fs.inotify.max_user_watches = 524288
Maximum number of inotify instances and watches per user. Because dockerd runs everything as a single user, the default of 128 instances per user is too low. Typical inotify users include the NGINX ingress controller and kubectl logs -f.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
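
When these limits are hit, inotify_init() typically fails with "too many open files" and inotify_add_watch() with "no space left on device", which is how the problem usually surfaces in kubectl logs -f or the ingress controller. Checking the current values is a one-liner:

# Current per-user inotify limits
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches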

kubelet - vm.overcommit_memory = 1
Memory allocation strategy: a value of 1 tells the kernel to always allow memory allocations regardless of the current memory state.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

kubelet - kernel.panic = 10
Automatically reboot 10 seconds after a kernel panic.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)

kubelet - kernel.panic_on_oops = 1
Call panic() when the kernel hits an Oops.
Reference: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
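
Together with vm.overcommit_memory above, these are among the settings kubelet validates when started with --protect-kernel-defaults, so it is worth verifying them in one go; a minimal check:

# Values kubelet expects to find on the node
sysctl vm.overcommit_memory kernel.panic kernel.panic_on_oops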

Network - net.ipv4.ip_forward = 1
Enables IP forwarding. Also related to Docker's iptables changes breaking outbound container traffic (see the linked issue).
Reference: Upgrading docker 1.13 on nodes causes outbound container traffic to stop working · Issue #40182 · kubernetes/kubernetes (github.com)

Network - net.ipv4.neigh.default.gc_thresh1 = 0
Fixes the "arp_cache: neighbor table overflow!" error seen on AWS.
Reference: arp_cache: neighbor table overflow! · Issue #4533 · kubernetes/kops (github.com)
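
How close a node actually gets to the neighbor-table limits can be checked by comparing the number of current entries with the three garbage-collection thresholds; a rough sketch:

# Current ARP/neighbor entries vs. the gc thresholds
ip neigh show | wc -l
sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3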
