Best practices for making container images
This article was last updated on July 24, 2024.
Overview
Ports exposed with the `EXPOSE` instruction are displayed under `docker ps`. In addition:
- The exposed ports are also shown in the image metadata returned by `docker inspect`.
- When you link one container to another, the exposed ports are linked.
Set environment variables
👍️ It is good practice to set environment variables with the `ENV` instruction. One example is setting the version of a project, which makes it easy to find the version without looking at the `Dockerfile`. Another example is advertising a path that can be used by another process, for example `JAVA_HOME`.
Avoid default passwords
❗ It is best to avoid setting a default password. Many people extend the base image but forget to remove or change the default password. If a user in production is assigned a well-known password, this can lead to security issues.
👍️ The password should be configured using environment variables, secrets, or other K8s encryption schemes.
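A minimal Python sketch of an entrypoint that takes the password from an environment variable and, only if it has to fall back to a default, prints a warning explaining how to change it. The variable name `APP_ADMIN_PASSWORD` and the default value are hypothetical placeholders, not part of any real image.

```python
import os
import sys
from typing import Mapping, Optional, Tuple

# Hypothetical names, for illustration only.
PASSWORD_VAR = "APP_ADMIN_PASSWORD"
DEFAULT_PASSWORD = "changeme"


def resolve_password(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (password, warning); warning is None when the variable is set."""
    value = env.get(PASSWORD_VAR)
    if value:
        return value, None
    # Fallback path: tell the user the default value and how to override it.
    warning = (
        f"WARNING: {PASSWORD_VAR} is not set, using the default password "
        f"'{DEFAULT_PASSWORD}'. Set {PASSWORD_VAR} (for example from a "
        "Kubernetes Secret) to change it."
    )
    return DEFAULT_PASSWORD, warning


if __name__ == "__main__":
    password, warning = resolve_password(os.environ)
    if warning:
        print(warning, file=sys.stderr)
```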
If you do choose to set a default password, make sure an appropriate warning message is displayed when the container starts. The message should tell the user the value of the default password and explain how to change it, for example which environment variable to set.

Disable SSHD
❗ Do not run sshd in the image. You can use the `docker exec` command to access containers running on the local host. Alternatively, you can use the `kubectl exec` command to access containers running on the K8s or TKE container platform. Installing and running sshd in an image opens up additional attack vectors and requires additional security patching.
Use VOLUMES for persistent data
Images should use a volume to store persistent data. This way, Kubernetes or TKE mounts network storage (such as NAS) to the node running the container, and if the container moves to a new node, the storage is reattached to that node. By using volumes for all persistent storage needs, content is preserved even if the container is restarted or moved. If the image writes data to arbitrary locations inside the container, that data may be lost.
In addition, explicitly defining volumes in the `Dockerfile` makes it easy for consumers of the image to understand which volumes they must define when running it.
For more information on how to use volumes on the K8s or TKE container platform, see the Kubernetes documentation.
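If an application depends on a volume being mounted, it can check at startup instead of silently writing into the container's writable layer. A minimal sketch, assuming a hypothetical `DATA_DIR` variable that defaults to `/data`; note that `os.path.ismount` cannot reliably detect bind mounts from the same filesystem, so treat a `False` result as a warning sign rather than proof.

```python
import os
import sys
from typing import Mapping


def data_dir(env: Mapping[str, str] = os.environ) -> str:
    """Where persistent data goes (hypothetical DATA_DIR variable, default /data)."""
    return env.get("DATA_DIR", "/data")


def is_mounted_volume(path: str) -> bool:
    """True if `path` is a mount point, e.g. a Docker volume or Kubernetes PV."""
    return os.path.ismount(path)


if __name__ == "__main__":
    path = data_dir()
    if not is_mounted_volume(path):
        print(f"WARNING: {path} is not a mount point; data written there "
              "lives in the container layer and may be lost", file=sys.stderr)
```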
Run the container process as a non-root user
By default, Docker runs container processes as root inside the container. This is insecure: if an attacker manages to break out of the container, they can gain root access to the Docker host.
❗ Note:
If the process runs as root in the container, a container escape gives the attacker root on the host.
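As a defensive sketch (not a replacement for setting `USER` in the Dockerfile), an application can refuse to start when it finds itself running as root. `is_root` and `check_not_root` are hypothetical helpers; `runAsNonRoot` and `runAsUser` are standard fields of a Kubernetes pod or container `securityContext`.

```python
import os


def is_root(euid: int) -> bool:
    """Return True for the root user (UID 0)."""
    return euid == 0


def check_not_root(euid: int) -> None:
    """Fail fast instead of running the application as root (hypothetical helper)."""
    if is_root(euid):
        raise PermissionError(
            "refusing to run as root: set a non-root USER in the Dockerfile "
            "or runAsNonRoot/runAsUser in the Kubernetes securityContext"
        )


if __name__ == "__main__" and hasattr(os, "geteuid"):
    # os.geteuid() only exists on Unix-like systems.
    try:
        check_not_root(os.geteuid())
        print("OK: running as UID", os.geteuid())
    except PermissionError as exc:
        print(exc)
```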
Use multi-stage builds
Use a multi-stage build to create a temporary image for building artifacts that are then copied into the production image. The temporary build image is discarded along with the original files, folders, and dependencies associated with it. This results in a lean, production-ready image.
One use case is to use a non-Alpine base image to install dependencies that need to be compiled, and then copy the resulting wheel files into the final image.
In a Python example, the image was 705MB before and 103MB after using a multi-stage build.

**❗ Do not store confidential information in containers**
Storing confidential information in containers is prohibited, including:
- Sensitive information
- Database credentials
- SSH keys
- User names and passwords
- API tokens, etc.
This information can instead be supplied at runtime by:
- Passing it in through environment variables
- Volume mounts
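A minimal sketch of how an application might read such values at runtime instead of baking them into the image: it first looks for a file in a mounted secrets directory, then falls back to an environment variable. The `/run/secrets` default (the location Docker uses for Compose/Swarm secrets) and the lower-cased file naming are assumptions; with Kubernetes you choose the `mountPath` of the Secret volume yourself.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_secret(name: str,
                env: Mapping[str, str] = os.environ,
                secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Return a secret from a mounted file, else from an environment variable.

    Both sources are supplied at runtime, so nothing sensitive lives in the image.
    """
    secret_file = Path(secrets_dir) / name.lower()  # e.g. DB_PASSWORD -> db_password
    if secret_file.is_file():
        # Strip the trailing newline that `echo` and editors often add.
        return secret_file.read_text().rstrip("\n")
    return env.get(name)
```

For example, `read_secret("DB_PASSWORD")` (a hypothetical secret name) returns the contents of `/run/secrets/db_password` if that file is mounted, otherwise the `DB_PASSWORD` environment variable, otherwise `None`.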
Avoid putting files in `/tmp`
Some applications (e.g. Python's Gunicorn) write cache information or heartbeat information to `/tmp`, which places high demands on `/tmp` read/write performance. If `/tmp` is backed by an ordinary disk, this can cause serious performance problems.
In some Linux distributions, `/tmp` is stored in memory via the `tmpfs` file system. However, Docker containers do not mount `/tmp` as `tmpfs` by default. In a container, `/tmp` uses the standard Docker overlay file system: it is backed by the ordinary block device or hard drive the computer is using, which can cause performance problems.
For such applications, a common solution is to store their temporary files elsewhere. In particular, `/dev/shm` uses the `shm` file system, i.e. shared memory, an in-memory file system.
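For Gunicorn specifically, the `--worker-tmp-dir /dev/shm` option moves its worker heartbeat files into memory. For other code, a minimal sketch is to prefer `/dev/shm` and fall back to `/tmp`; `pick_tmp_dir` is a hypothetical helper. Keep in mind that Docker gives `/dev/shm` a 64 MB default size (adjustable with `docker run --shm-size`), so this suits small, frequently written files rather than bulk data.

```python
import os
import tempfile
from typing import Iterable, Optional


def pick_tmp_dir(candidates: Iterable[str] = ("/dev/shm", "/tmp")) -> Optional[str]:
    """Return the first candidate directory that exists and is writable."""
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None


if __name__ == "__main__":
    tmp_dir = pick_tmp_dir()
    # dir=None makes tempfile fall back to its usual default location.
    with tempfile.NamedTemporaryFile(dir=tmp_dir, prefix="heartbeat-") as f:
        print("scratch file:", f.name)
```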
So all you need to do is use `/dev/shm` instead of `/tmp`.

Use Alpine Linux base images (adopt with caution)
Use base images built on Alpine Linux. Because Alpine only provides the necessary packages, the resulting image is smaller. The benefits are:
- Lower hosting costs, because less disk space is used
- Faster build, download, and run times
- Better security, because there are fewer packages and libraries
- Faster deployment
In one example, the image was 702MB before and 102MB after switching to Alpine.
❗ Note:
Use Alpine with caution. I've run into a whole bunch of problems with Alpine Linux because it is built on musl libc instead of the GNU libc (glibc) used by most Linux distributions, for example errors in datetime formatting and crashes caused by smaller stack sizes.

Exclude irrelevant files
To exclude files that are not relevant to the build, use a `.dockerignore` file. This file supports exclusion patterns similar to those of `.gitignore` files. For details, see the .dockerignore file documentation.
Do not install unnecessary packages
To reduce complexity, dependencies, file size, and build time, avoid installing extra or unnecessary packages. For example, you do not need to include a text editor in a database image.
Decouple applications
Each container should run only one process. Decoupling applications into multiple containers makes it easier to scale horizontally and reuse containers. For example, an LNMP web application stack might consist of separate containers, each with its own image, to manage the web server, the application, the cache, and the database in a decoupled manner.
Limiting each container to one process is a good rule of thumb, but it is not a hard and fast rule. For example, containers can be started with an init process, and some programs (e.g. nginx) spawn additional child processes on their own.
Use your best judgment to keep containers as concise and modular as possible. If containers depend on each other, you can use container networking or K8s sidecars to ensure that they can communicate.
Sort multi-line arguments
Whenever possible, sort multi-line arguments alphabetically to make later changes easier. This helps avoid duplicate packages and makes the list easier to update. It also makes PRs easier to read and review. Adding a space before the backslash (`\`) also helps.
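The sorting rule is easy to automate. The sketch below (the helper `format_apt_install` is hypothetical) renders a `RUN apt-get install` instruction with one package per line, de-duplicated and sorted, with a space before each backslash, and cleans up the apt lists in the same layer. Python's default string sort is by code point, which matches alphabetical order for lowercase package names.

```python
from typing import Iterable


def format_apt_install(packages: Iterable[str]) -> str:
    """Render a RUN instruction with one package per line, sorted and de-duplicated."""
    unique_sorted = sorted(set(packages))
    lines = ["RUN apt-get update && apt-get install -y \\"]
    # A space before each backslash keeps the continuation lines readable.
    lines += [f"    {pkg} \\" for pkg in unique_sorted]
    lines.append("    && rm -rf /var/lib/apt/lists/*")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_apt_install(["git", "curl", "bzr", "git", "cvs"]))
```

For that input it prints `bzr`, `curl`, `cvs`, and `git` on separate lines, with `git` appearing only once.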
Java container image best practices
IDE plugin recommendations
- IntelliJ IDEA - go to "Preferences", "Plugins", "Install JetBrains plugin…", search for "Docker", and click "Install".
- Eclipse
📓 Note: see the Docker documentation pages Docker and IntelliJ IDEA and Docker and Eclipse.

Set parameters related to memory limits
📓 Note: Specifying `-Xmx1g` tells the JVM to allocate a 1 GB heap, but it does not tell the JVM to limit its entire memory usage to 1 GB. In addition to the heap, there are card tables, code caches, and various other off-heap data structures. The parameter used to specify total memory usage is `-XX:MaxRAM`. Note that with `-XX:MaxRAM=500m`, the heap will be approximately 250 MB.
Historically, the JVM looked in `/proc` to determine how much memory was available and then set its heap size based on that value. Unfortunately, containers such as Docker do not provide container-specific information in `/proc`. A patch from around 2017 provides the `-XX:+UseCGroupMemoryLimitForHeap` command-line argument, which tells the JVM to look in `/sys/fs/cgroup/memory/memory.limit_in_bytes` to determine how much memory is available. If this patch is not available in the version of OpenJDK you are running, you can set the value explicitly with `-XX:MaxRAM=n`.
In summary, set the parameters related to memory limits as follows:
- On newer versions of OpenJDK 8, add `-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap` (the latter is an experimental option).
- If that parameter is not available, set `-XX:MaxRAM=n`.
- Set the JVM heap to approximately 50%-80% of the memory limit.
- Set the JVM `MaxRAM` close to the memory limit of the K8s pod.

Set the GC policy
There is a patch in OpenJDK 8 that uses the information available from cgroups to compute an appropriate number of parallel GC threads. However, if this patch is not available in your version of OpenJDK and your container host has 8 CPUs while the container's CPU limit is 2 CPUs, you may end up with 8 parallel GC threads. The workaround is to explicitly specify the number of parallel GC threads: `-XX:ParallelGCThreads=2`.
If the CPU limit of your container is set to only one CPU, it is strongly recommended to run with `-XX:+UseSerialGC` to avoid parallel GC altogether.
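The memory and GC rules above can be turned into a small flag generator. This is a sketch under stated assumptions: the helper names are hypothetical, the 0.7 default heap ratio is just one point inside the recommended 50%-80% range, and the CPU limit is passed in rather than detected. `read_memory_limit` reads the cgroup v1 file mentioned above and, as an assumption about newer hosts, the cgroup v2 `memory.max` file.

```python
import math
from pathlib import Path
from typing import List, Optional

CGROUP_MEMORY_FILES = (
    "/sys/fs/cgroup/memory.max",                    # cgroup v2 ("max" = unlimited)
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
)


def read_memory_limit() -> Optional[int]:
    """Return the container memory limit in bytes, or None if none is found."""
    for path in CGROUP_MEMORY_FILES:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue
        # cgroup v1 reports a huge number instead of "max" when unlimited.
        return None if raw == "max" else int(raw)
    return None


def jvm_container_flags(mem_limit_bytes: int, cpu_limit: float,
                        heap_ratio: float = 0.7) -> List[str]:
    """Build JVM flags: MaxRAM near the limit, heap at 50%-80%, GC sized to CPUs."""
    if not 0.5 <= heap_ratio <= 0.8:
        raise ValueError("heap_ratio should be between 0.5 and 0.8")
    mem_mb = mem_limit_bytes // (1024 * 1024)
    flags = [
        f"-XX:MaxRAM={mem_mb}m",             # total memory close to the pod limit
        f"-Xmx{int(mem_mb * heap_ratio)}m",  # heap: a fraction of the limit
    ]
    gc_threads = max(1, math.ceil(cpu_limit))
    if gc_threads == 1:
        flags.append("-XX:+UseSerialGC")     # one CPU: skip parallel GC entirely
    else:
        flags.append(f"-XX:ParallelGCThreads={gc_threads}")
    return flags
```

For a 1 GiB limit and 2 CPUs, `jvm_container_flags(1024**3, 2)` returns `['-XX:MaxRAM=1024m', '-Xmx716m', '-XX:ParallelGCThreads=2']`.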
Java startup phase tuning
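A Java program has a startup phase that needs a lot of heap, after which it may enter a quiet steady-state loop in which it does not need much heap.
For the serial GC policy, you can make heap resizing more aggressive with settings such as:
- `-XX:MinHeapFreeRatio=20` (the heap grows when heap occupancy exceeds 80%)
- `-XX:MaxHeapFreeRatio=40` (the heap shrinks when heap occupancy falls below 60%)
For the parallel GC policy, the following configuration is recommended:
`-XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90`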
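To make the two free-ratio thresholds concrete, here is a deliberately simplified model of the resize decision after a GC. It is not HotSpot's actual code: the real JVM also clamps the result to `-Xms`/`-Xmx`, aligns sizes, and may shrink in smaller steps. `resized_heap` is a hypothetical helper, and its 20/40 defaults are the values suggested above, not the JVM's own defaults.

```python
def resized_heap(used_mb: float, committed_mb: float,
                 min_free_ratio: int = 20, max_free_ratio: int = 40) -> float:
    """Simplified model of how MinHeapFreeRatio/MaxHeapFreeRatio steer heap size."""
    free_pct = 100 * (committed_mb - used_mb) / committed_mb
    if free_pct < min_free_ratio:
        # With 20/40: occupancy above 80%, so grow until 20% of the heap is free.
        return used_mb / (1 - min_free_ratio / 100)
    if free_pct > max_free_ratio:
        # With 20/40: occupancy below 60%, so shrink until only 40% is free.
        return used_mb / (1 - max_free_ratio / 100)
    return committed_mb
```

With 90 MB used in a 100 MB heap the model grows the heap to about 112.5 MB; with 50 MB used it shrinks it to about 83.3 MB; with 70 MB used it leaves it at 100 MB.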
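Java containers: global recommendations for resource requests and limits
Java programs have a startup phase that also consumes a lot of CPU: the more CPU used, the shorter the startup phase.
The following summarizes the startup time of a Spring Boot sample application under different CPU limits (CPU in millicores):
- 500m - 80 seconds
- 1000m - 35 seconds
- 1500m - 22 seconds
- 2500m - 17 seconds
- 3000m - 12 seconds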
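A rough way to read the table is to multiply each limit by its startup time. Assuming (as an upper bound) that the container saturates its CPU limit throughout startup, the total CPU work stays in roughly the same range at every limit, which suggests extra CPU mostly shortens wall-clock time rather than adding work. This is an interpretation of the numbers above, not a measurement.

```python
from typing import Dict

# Startup times from the table above: CPU limit (millicores) -> seconds.
STARTUP_SECONDS: Dict[int, int] = {500: 80, 1000: 35, 1500: 22, 2500: 17, 3000: 12}


def cpu_seconds(millicores: int, seconds: float) -> float:
    """CPU time used if the container saturates its CPU limit for the whole startup."""
    return millicores / 1000 * seconds


if __name__ == "__main__":
    for millicores, seconds in STARTUP_SECONDS.items():
        print(f"{millicores}m: {seconds}s, ~{cpu_seconds(millicores, seconds):g} CPU-seconds")
```

This prints about 40, 35, 33, 42.5, and 36 CPU-seconds for the five limits.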
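Based on this, K8s or TKE container platform administrators can consider applying the following restrictions to Java containers:
- Use CPU requests, and do not set a CPU limit
- Use a memory limit, and set it equal to the memory request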
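A sketch of what those two rules mean for a container's `resources` block, written as a Python dict that mirrors the Kubernetes YAML. The helper names and the `500m`/`1Gi` values are illustrative assumptions; `follows_java_policy` only checks the two rules stated above and compares quantity strings literally (so `1Gi` and `1024Mi` would not match).

```python
import json
from typing import Dict

Resources = Dict[str, Dict[str, str]]


def java_container_resources(cpu_request: str, memory: str) -> Resources:
    """CPU request only (no CPU limit); memory limit equal to memory request."""
    return {
        "requests": {"cpu": cpu_request, "memory": memory},
        # No "cpu" key here, so the JVM can burst during startup if the node has spare CPU.
        "limits": {"memory": memory},
    }


def follows_java_policy(resources: Resources) -> bool:
    """Check: no CPU limit, and memory limit present and equal to the request."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    no_cpu_limit = "cpu" not in limits
    memory_pinned = "memory" in limits and limits["memory"] == requests.get("memory")
    return no_cpu_limit and memory_pinned


if __name__ == "__main__":
    print(json.dumps({"resources": java_container_resources("500m", "1Gi")}, indent=2))
```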
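Use ExitOnOutOfMemoryError instead of HeapDumpOnOutOfMemoryError (evaluate with caution)
For Java instances deployed on traditional virtual machines, we usually add `-XX:+HeapDumpOnOutOfMemoryError` to make problems easier to analyze. With this parameter, the JVM automatically generates a heap dump when it runs out of memory, and we can later use this heap dump to analyze the problem more precisely.
However, container technology brings some differences. On a container platform, we tend to prefer to:
- Fail fast when a fault occurs
- Recover quickly from faults
- Keep faults as imperceptible to users as possible
Therefore, Java application containers should also be tuned to meet these needs. Taking `OutOfMemoryError` as an example:
- Fail fast when a fault occurs, i.e. "exit quickly, terminate quickly"
`-XX:+ExitOnOutOfMemoryError` meets exactly this need: when this parameter is passed, the JVM exits immediately when an `OutOfMemoryError` is thrown. You can pass this parameter if you want to terminate a faulty application as soon as possible.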
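NGINX container image best practices
If you run NGINX directly on bare-metal hardware or a virtual machine, you usually want one NGINX instance to use all available CPUs. Since NGINX uses a multi-process model, you typically start multiple worker processes, each a separate process, in order to utilize all CPUs.
However, when running in a container, if you set `worker_processes` to `auto`, NGINX starts a number of processes based on the CPU core count of the host the container runs on. For example, I once ran an NGINX container with `auto` on a physical machine: although the CPU limit was set to 2, NGINX started 64 processes (the number of physical CPUs).
Therefore, 👍️ it is recommended to configure `worker_processes` in `nginx.conf` according to actual needs or the CPU limit setting.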
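One way to derive the value from the CPU limit is to read it at container start and render the directive into `nginx.conf` (for example from an entrypoint script). The sketch below assumes cgroup v2, where `/sys/fs/cgroup/cpu.max` holds `<quota> <period>` or `max <period>`; rounding up and falling back to one worker when no limit is set are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math
from pathlib import Path
from typing import Optional

CPU_MAX = Path("/sys/fs/cgroup/cpu.max")  # cgroup v2 only


def cpu_limit_from_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max: "<quota> <period>", or "max <period>" when unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def worker_processes_directive(cpu_limit: Optional[float], fallback: int = 1) -> str:
    """Render the nginx.conf directive, rounding a fractional CPU limit up."""
    workers = fallback if cpu_limit is None else max(1, math.ceil(cpu_limit))
    return f"worker_processes {workers};"


if __name__ == "__main__":
    limit = cpu_limit_from_cpu_max(CPU_MAX.read_text()) if CPU_MAX.exists() else None
    print(worker_processes_directive(limit))
```

With `cpu.max` containing `200000 100000` (a limit of 2 CPUs), this renders `worker_processes 2;`.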
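Python container image best practices
🐾 Warning:
As time has passed and practice has deepened, best practices have changed as well, and some of the content below no longer represents best practice for Python container images.
The latest Python container image best practices can be found in this article: https://e-whisper.com/posts/25776/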
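IDE plugin recommendations
- PyCharm - same as IDEA
- VSCode - the Visual Studio Code Remote - Containers plugin
Recommended environment variables:
- `PYTHONDONTWRITEBYTECODE`: prevents Python from writing .pyc files to disk
- `PYTHONUNBUFFERED`: prevents Python from buffering stdout and stderr
- `DEBUG`: makes it easy to turn debug mode on or off depending on the environment type (test/production)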
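A small sketch of how an application might consume these variables. Treating `1`, `true`, `yes`, and `on` as enabling `DEBUG` is an assumed convention, not a Python standard; `sys.flags.dont_write_bytecode` is the interpreter's own view of `PYTHONDONTWRITEBYTECODE`.

```python
import os
import sys
from typing import Mapping

# Assumed convention for "enabled" values of DEBUG.
TRUTHY = {"1", "true", "yes", "on"}


def debug_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Interpret DEBUG case-insensitively, ignoring surrounding whitespace."""
    return env.get("DEBUG", "").strip().lower() in TRUTHY


if __name__ == "__main__":
    # sys.flags.dont_write_bytecode is set when PYTHONDONTWRITEBYTECODE is non-empty.
    print("dont_write_bytecode:", bool(sys.flags.dont_write_bytecode))
    print("debug:", debug_enabled())
```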
Installing database driver packages
Taking the PostgreSQL driver psycopg2 as an example, you may need to install additional base components.
Reference links
- Docker documentation - Best practices for writing Dockerfiles
- “Docker and PID 1 zombie reaping problem”
- “Demystifying the init system (PID 1)”
- Blog article - Resource management in Docker
- Docker documentation - Runtime Metrics
- Blog article - Memory inside Linux containers
- Docker documentation - Docker basics
- Docker documentation - Dockerfile reference
- Docker documentation - Custom metadata
- testdriven.io - Deploying Django to Heroku With Docker
- testdriven.io - Dockerizing Django with Postgres, Gunicorn, and Nginx
- dockercon-2018 - Docker for Python Developers
- Docker documentation - Multi-stage builds
- Red Hat Developer - OpenJDK and Containers
- Java Application Optimization on Kubernetes on the Example of a Spring Boot Microservice
- Python Speed - Faster Docker builds with pipenv, poetry, or pip-tools
- Python Speed - Configuring Gunicorn for Docker
- Docker documentation - Docker and Eclipse
- Docker documentation - Docker and IntelliJ IDEA
- Developing inside a Container
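2 questions
- Do you have best practices for building images for other languages?
- Have you tried building native executable Java images with GraalVM? How was the experience?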