Best practices for making container images

This article was last updated on: February 7, 2024

Overview

Declaring ports with EXPOSE in the Dockerfile has the following effects:

  1. The exposed ports are displayed under docker ps.
  2. The exposed ports are also shown in the image metadata returned by docker inspect.
  3. When you link one container to another, the exposed ports are linked.

Set environment variables

👍️ It is good practice to set environment variables with the ENV instruction. One example is setting the version of a project, which makes it easy for people to find the version without looking at the Dockerfile. Another example is advertising a path that can be used by another process, for example JAVA_HOME.

Set the image maintainer with LABEL maintainer:

LABEL maintainer="[email protected]"

Avoid default passwords

👍️ It is best to avoid setting a default password. Many people extend the base image but forget to remove or change the default password. If a user in production is assigned a well-known password, this can lead to security issues. 👍️ Passwords should be configured using environment variables, secrets, or other K8s encryption schemes.

If you do choose to set a default password, make sure that an appropriate warning message is displayed when the container starts. The message should inform the user of the value of the default password and explain how to change it, for example which environment variable to set.

Disable SSHD

Do not run SSHD in the image. You can use the docker exec command to access containers running on the local host, or the kubectl exec command to access containers running on the K8s or TKE container platform. Installing and running sshd in an image exposes it to additional attacks and requires additional security patching.
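The start-up warning recommended in the default-password section could be sketched as an entrypoint helper like the one below. It is only an illustration: the variable names APP_PASSWORD and DEFAULT_PASSWORD and the value "changeme" are assumptions, not names used by any particular image.

```shell
#!/bin/sh
# Hypothetical names: APP_PASSWORD and DEFAULT_PASSWORD are illustrative only.
DEFAULT_PASSWORD="changeme"

# Print the password to use on stdout.
# When the default is in effect, warn on stderr and explain how to change it.
resolve_password() {
    if [ -n "${APP_PASSWORD:-}" ]; then
        printf '%s\n' "$APP_PASSWORD"
    else
        echo "WARNING: using default password '$DEFAULT_PASSWORD'." >&2
        echo "WARNING: set the APP_PASSWORD environment variable to change it." >&2
        printf '%s\n' "$DEFAULT_PASSWORD"
    fi
}
```

An entrypoint script would call resolve_password once before starting the main process, so the warning shows up in the container log.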

Use VOLUMES for persistent data

Images should use volumes to store persistent data. This way, Kubernetes or TKE mounts the NAS to the node running the container, and if the container moves to a new node, the storage is reconnected to that node. By using volumes for all persistent storage needs, persistent content is preserved even if the container is restarted or moved. If the image writes data anywhere else inside the container, that data may be lost.

In addition, explicitly defining volumes in the Dockerfile makes it easy for consumers of the image to understand which volumes must be defined when running it.

For more information on how to use volumes in K8s or TKE container platforms, see the Kubernetes documentation.

Run the container process with a non-root user

By default, Docker runs container processes as root inside the container. This is an insecure practice because an attacker who manages to break out of the container can gain root access to the Docker host.

Note: If the process is root inside the container, it is also root on the host after an escape.

Use multi-stage builds

Use multi-stage builds to create a temporary image for building artifacts that are then copied to the production image. The temporary build image is discarded along with the original files, folders, and dependencies associated with it. This results in a lean, production-ready image.

One use case is to use a non-Alpine base image to install dependencies that need to be compiled. You can then copy the wheel files to the final image.

Image size in a Python example:

Size before use: 705MB, size after use: 103MB

Do not store confidential information in containers

Storing confidential information in containers is prohibited, including:

  • Sensitive information
  • Database credentials
  • SSH keys
  • User names and passwords
  • API tokens, etc.

The above information can be passed in through:

  • The environment variable ENV
  • VOLUME mounts

Clean up temporary files

Remove package caches in the same RUN instruction that creates them, so that they never end up in an image layer:

RUN apt-get update && apt-get install -y \
    curl \
    s3cmd=1.1.* \
    && rm -rf /var/lib/apt/lists/*

If the installation is split across several RUN instructions, each instruction creates its own layer, and a cleanup in a later instruction does not shrink the layers before it:

RUN apt-get install curl -y
RUN apt-get install s3cmd -y && rm -rf /var/lib/apt/lists/*

The same single-instruction pattern with yum:

RUN yum -y install curl && yum -y install s3cmd && yum clean all -y

Avoid putting files in /tmp

Some applications (e.g. Python's Gunicorn) write cache or heartbeat information to /tmp and need high read and write performance there. If /tmp sits on a normal disk, this can cause serious performance problems.

In some Linux distributions, /tmp is stored in memory through the tmpfs file system. However, Docker containers do not enable tmpfs for /tmp by default. In the output of docker run --rm -it ubuntu:18.04 df, /tmp has no mount of its own, so it uses the standard Docker overlay file system, which is backed by a normal block device or the hard drive the computer is using. This can cause performance issues.

For such applications, a common solution is to store their temporary files elsewhere. In the same df output, /dev/shm is mounted with the shm file system, that is, shared memory, an in-memory file system. So all you need to do is use /dev/shm instead of /tmp.
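As a rough sketch of the "/dev/shm instead of /tmp" advice, an entrypoint could pick the in-memory directory when it is usable and fall back to /tmp otherwise. The helper name pick_tmp_dir and the module name app:app are hypothetical; --worker-tmp-dir is Gunicorn's option for the heartbeat file directory.

```shell
#!/bin/sh
# pick_tmp_dir PREFERRED FALLBACK
# Print PREFERRED if it is a writable directory, otherwise print FALLBACK.
pick_tmp_dir() {
    preferred="$1"
    fallback="$2"
    if [ -d "$preferred" ] && [ -w "$preferred" ]; then
        printf '%s\n' "$preferred"
    else
        printf '%s\n' "$fallback"
    fi
}

# Typical use in an entrypoint (not executed here, app:app is a placeholder):
#   exec gunicorn --worker-tmp-dir "$(pick_tmp_dir /dev/shm /tmp)" app:app
```

Keeping the choice in a small function makes the fallback explicit for runtimes where /dev/shm is missing or read-only.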
Use Alpine Linux base images (adopt with caution)

Use images based on Alpine Linux. Because it only provides the necessary packages, the resulting image is smaller.

The benefits are:

  • Reduced hosting costs because less disk space is used
  • Faster build, download, and run times
  • More secure (because there are fewer packages and libraries)
  • Faster deployment

Image size in one example: size before use: 702MB, size after use: 102MB

Note: Use Alpine with caution. I've seen a whole bunch of problems with Alpine Linux because it is built on top of musl libc instead of the GNU libc (glibc) used by most Linux distributions. Problems include errors in datetime formatting, crashes due to smaller stacks, etc.

Exclude irrelevant files

To exclude files that are not relevant to the build, use a .dockerignore file. This file supports exclusion patterns similar to those of .gitignore files. See the .dockerignore file documentation for details.

Place instructions in the proper order

docker build reads the Dockerfile from top to bottom and reuses cached layers until an instruction or its input changes, so instructions that rarely change should come first. When the package installation comes before ADD, editing app only rebuilds the ADD layer:

FROM alpine:3.11
# Alpine uses apk rather than apt-get; --no-cache leaves no package index behind
RUN apk add --no-cache curl
ADD app /app

When ADD comes first, every change to app invalidates the cache of the RUN instruction after it, and the package installation runs again:

FROM alpine:3.11
ADD app /app
RUN apk add --no-cache curl

Do not install unnecessary packages

To reduce complexity, dependencies, file size, and build time, avoid installing extra or unnecessary packages. For example, you do not need to include a text editor in a database image.

Decouple applications

There should be only one process per container. Separating applications into multiple containers makes it easier to scale horizontally and reuse containers. For example, an LNMP web application stack might consist of separate containers, each with its own unique image, to manage the web server, application, cache database, and database in a decoupled manner.

Limiting each container to one process is a good rule of thumb, but it is not a hard and fast rule. For example, containers can be built with an init process, and some programs (e.g. nginx) may spawn child processes on their own.

Judge from your own experience and keep containers as concise and modular as possible. If containers depend on each other, you can use container networking or K8s sidecars to ensure that the containers can communicate.

Sort multi-line arguments

It is recommended that you sort multi-line arguments alphabetically to facilitate later changes. This helps avoid duplicate packages and makes lists easier to update. It also makes PRs easier to read and review. Adding a space before the backslash (\) also helps. The Dockerfile of the openjdk image follows this convention.

Java container image best practices

IDE plugin recommendation

  • IDEA - Go to "Preferences", "Plugins", "Install JetBrains plugin…", search for "Docker" and click "Install"
  • Eclipse

📓 Note: See "Docker and IntelliJ IDEA" and "Docker and Eclipse".

Set parameters related to memory limits

📓 Note: Specifying -Xmx1g tells the JVM to allocate a 1 GB heap, but it does not tell the JVM to limit its entire memory usage to 1 GB. In addition to the heap, there will be card tables, code caches, and various other off-heap data structures. The parameter used to specify the total memory usage is -XX:MaxRAM. Please note that with -XX:MaxRAM=500m, the heap will be approximately 250 MB.

Historically, the JVM looks in /proc to determine how much memory is available, and then sets its heap size based on that value. Unfortunately, containers like Docker do not provide container-specific information in /proc. In 2017 a patch added the command-line argument -XX:+UseCGroupMemoryLimitForHeap (an experimental option, so it is enabled together with -XX:+UnlockExperimentalVMOptions), which tells the JVM to look at /sys/fs/cgroup/memory/memory.limit_in_bytes to determine how much memory is available. If this patch is not available in the running version of OpenJDK, -XX:MaxRAM=n can be set explicitly instead.

In summary, set the parameters related to memory limits:

  • Newer versions of OpenJDK 8: add -XX:+UseCGroupMemoryLimitForHeap
  • If that parameter is not available: set -XX:MaxRAM=n
  • It is recommended to set the JVM heap to approximately 50% - 80% of the memory limit
  • It is recommended to set the JVM MaxRAM close to the memory limit of the K8s pod

Set the GC policy

There is a patch in OpenJDK 8 that uses the information available from cgroups to compute the appropriate number of parallel GC threads. However, if this patch is not available in your version of OpenJDK and your container host has, say, 8 CPUs while the CPU limit in the container is 2 CPUs, you may end up with 8 parallel GC threads. The workaround is to explicitly specify the number of parallel GC threads: -XX:ParallelGCThreads=2

If the CPU limit in your container is only one CPU, it is strongly recommended to run with -XX:+UseSerialGC to avoid parallel GC altogether.
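The heap and GC recommendations above can be combined in a small entrypoint helper. This is a minimal sketch under stated assumptions: the memory limit (in MiB) and the CPU limit are passed in by hand, the 75% default is just one value inside the 50% - 80% range, and the function names heap_flag and gc_flag are made up for this example.

```shell
#!/bin/sh
# heap_flag LIMIT_MB [PERCENT]
# Print an -Xmx flag sized to PERCENT of the memory limit (default 75,
# an assumed value inside the recommended 50% - 80% range).
heap_flag() {
    limit_mb="$1"
    percent="${2:-75}"
    echo "-Xmx$(( limit_mb * percent / 100 ))m"
}

# gc_flag CPUS
# One CPU: use the serial collector. More: cap the parallel GC threads.
gc_flag() {
    cpus="$1"
    if [ "$cpus" -le 1 ]; then
        echo "-XX:+UseSerialGC"
    else
        echo "-XX:ParallelGCThreads=$cpus"
    fi
}

# Example: a pod limited to 1024 MiB of memory and 2 CPUs.
JAVA_OPTS="$(heap_flag 1024) $(gc_flag 2)"
echo "$JAVA_OPTS"   # prints: -Xmx768m -XX:ParallelGCThreads=2
```

In a real entrypoint the two inputs would come from the pod specification or from the cgroup files, which this sketch deliberately leaves out.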

Multi-stage build example for Python (see "Use multi-stage builds"):

FROM python:3.6 as base
COPY requirements.txt /
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /wheels -r requirements.txt

FROM python:3.6-alpine
COPY --from=base /wheels /wheels
COPY --from=base requirements.txt .
RUN pip install --no-cache /wheels/* # flask, gunicorn, pycrypto
WORKDIR /app
COPY . /app


Output of df inside a container, showing that /tmp has no tmpfs mount while /dev/shm is backed by shm (see "Avoid putting files in /tmp"):

$ docker run --rm -it ubuntu:18.04 df
Filesystem       1K-blocks     Used Available Use% Mounted on
overlay           31263648 25656756   3995732  87% /
tmpfs                65536        0     65536   0% /dev
tmpfs              4026608        0   4026608   0% /sys/fs/cgroup
/dev/mapper/root  31263648 25656756   3995732  87% /etc/hosts
shm                  65536        0     65536   0% /dev/shm


Alpine-based image example (see "Use Alpine Linux base images"):

FROM python:3.6-alpine
WORKDIR /app
COPY requirements.txt /
RUN pip install -r /requirements.txt # flask and gunicorn
COPY . /app

IDE plugin recommendation for Python development

  • PyCharm - Same as IDEA
  • VSCode - the "Visual Studio Code Remote - Containers" plugin (see "Developing inside a Container")

Sorted multi-line arguments, an example from the openjdk image (see "Sort multi-line arguments"):

...
apt-get update; \
apt-get install -y --no-install-recommends \
    dirmngr \
    gnupg \
    wget \
; \
rm -rf /var/lib/apt/lists/*; \
...

Reference links

  • Docker documentation - Best practices for writing Dockerfiles
  • Blog article - Docker and the PID 1 zombie reaping problem
  • Blog article - Demystifying the init system (PID 1)
  • Blog article - Resource management in Docker
  • Docker documentation - Runtime Metrics
  • Blog article - Memory inside Linux containers
  • Docker documentation - Docker basics
  • Docker documentation - Dockerfile reference
  • Docker documentation - Custom metadata
  • Docker documentation - Multi-stage build
  • testdriven.io - Deploying Django to Heroku With Docker
  • testdriven.io - Dockerizing Django with Postgres, Gunicorn, and Nginx
  • dockercon-2018 - Docker for Python Developers
  • Red Hat Developer - OpenJDK and Containers
  • Java Application Optimization on Kubernetes on the Example of a Spring Boot Microservice
  • Python Speed - Faster Docker builds with pipenv, poetry, or pip-tools
  • Python Speed - Configuring Gunicorn for Docker
  • Docker and Eclipse
  • Docker and IntelliJ IDEA
  • Developing inside a Container


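Java startup phase tuning

Every Java program has a startup phase that requires a lot of heap, after which it may enter a quiet loop phase in which it does not need much heap.

For the serial GC policy, you can make it more aggressive through configuration, for example: -XX:MinHeapFreeRatio=20 (the heap grows when heap occupancy is greater than 80%) and -XX:MaxHeapFreeRatio=40 (the heap shrinks when heap occupancy is less than 60%).

For the parallel GC policy, the following configuration is recommended:

-XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90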

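Global recommendations for Java containers: resource requests and resource limits

Java programs have a startup phase, and the startup phase also consumes a lot of CPU. The more CPU is available, the shorter the startup phase.
The list below summarizes the startup time of a sample Spring Boot app under different CPU limits (CPU in millicores):

  • 500m - 80 seconds
  • 1000m - 35 seconds
  • 1500m - 22 seconds
  • 2500m - 17 seconds
  • 3000m - 12 seconds

Based on the above, K8s or TKE container platform administrators can consider the following constraints for Java containers:

  • Use CPU requests and do not set a CPU limit
  • Use a memory limit that is equal to the memory request

Examples are as follows:

resources:
  requests:
    memory: "1024Mi"
    cpu: "500m"
  limits:
    memory: "1024Mi"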

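Use ExitOnOutOfMemoryError instead of HeapDumpOnOutOfMemoryError (evaluate carefully)

As we all know, Java instances deployed on traditional virtual machines are usually started with the -XX:+HeapDumpOnOutOfMemoryError parameter to make problems easier to analyze. With this parameter, the JVM automatically generates a heap dump when it runs out of memory, and the dump can later be used to analyze the problem more precisely.

However, container technology brings some differences. On a container platform, we are more inclined to:

  1. Fail fast when a failure occurs
  2. Recover quickly from failures
  3. Keep failures as imperceptible to users as possible

Java application containers should therefore be tuned to meet these needs. Taking an OutOfMemoryError failure as an example:

  1. Fail fast when a failure occurs, that is, exit and terminate as quickly as possible

-XX:+ExitOnOutOfMemoryError meets exactly this need:

When this parameter is passed, the JVM exits immediately when an OutOfMemoryError is thrown. Pass it if you want to terminate a misbehaving application as soon as possible.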
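Because the choice between the two flags depends on the environment, it can be useful to make it switchable at start-up. The sketch below shows one way to do that; the function name oom_flag, the OOM_MODE variable, and app.jar are hypothetical.

```shell
#!/bin/sh
# oom_flag [MODE]
# "dump" keeps the heap dump behaviour; anything else (the default) fails fast.
oom_flag() {
    case "${1:-exit}" in
        dump) echo "-XX:+HeapDumpOnOutOfMemoryError" ;;
        *)    echo "-XX:+ExitOnOutOfMemoryError" ;;
    esac
}

# Typical use in an entrypoint (not executed here; OOM_MODE and app.jar are placeholders):
#   exec java $(oom_flag "$OOM_MODE") -jar app.jar
```

Defaulting to the fail-fast flag matches the container-platform preference described above, while a test environment can still opt into heap dumps.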

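NGINX container image best practices

If you run NGINX directly on bare hardware or a virtual machine, you usually want one NGINX instance to use all available CPUs. Because NGINX uses a multi-process model, you typically start multiple worker processes, each a separate process, in order to utilize all CPUs.

However, when running in a container, setting worker_processes to auto starts as many processes as the host of the container has CPU cores. For example, I once ran an NGINX container on a physical machine with the auto setting, and although the CPU limit was set to 2, NGINX started 64 processes (the number of CPUs of the physical machine).

Therefore, 👍️ it is recommended to configure nginx.conf according to actual needs or the CPU limit, as follows:

worker_processes  2;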
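If the value should follow the CPU limit automatically, an entrypoint can derive it from the container's CPU quota. The sketch below is one possible approach, not part of NGINX itself: the function name workers_from_quota is made up, and it assumes the quota and period have already been read, for example from cpu.cfs_quota_us and cpu.cfs_period_us on cgroup v1 or from cpu.max on cgroup v2.

```shell
#!/bin/sh
# workers_from_quota QUOTA_US PERIOD_US
# Round the CPU limit (quota / period) up to a whole number of workers.
# A quota of -1 (cgroup v1) or "max" (cgroup v2) means no limit is set,
# in which case "auto" is printed.
workers_from_quota() {
    quota="$1"
    period="$2"
    case "$quota" in
        -1|max|"") echo "auto"; return ;;
    esac
    workers=$(( (quota + period - 1) / period ))
    if [ "$workers" -lt 1 ]; then
        workers=1
    fi
    echo "$workers"
}

# Example: a CPU limit of 2 is a quota of 200000 us per 100000 us period.
workers_from_quota 200000 100000   # prints: 2
```

The printed value can then be written into the worker_processes line of nginx.conf before NGINX starts.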

Python container image best practices

🐾 Warning:
As time has passed and practice has deepened, best practices have changed, and parts of the following content can no longer be regarded as best practices for Python container images.
The latest Python container image best practices can be found in this article: https://e-whisper.com/posts/25776/

Examples are as follows:

# Based on the official base image
FROM python:3.7-alpine

# Set the working directory
WORKDIR /app

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV DEBUG=0

# install psycopg2
RUN apk update \
    && apk add --virtual build-deps gcc musl-dev python3-dev \
    && apk add postgresql-dev \
    && pip install psycopg2 \
    && apk del build-deps

# install dependencies
COPY ./requirements.txt .
RUN pip install -r requirements.txt

# copy project
COPY . .

# Switch to a non-root user
RUN adduser -D myuser
USER myuser

# run gunicorn
CMD gunicorn hello_django.wsgi:application --bind 0.0.0.0:$PORT

△ Example Dockerfile

Recommended environment variables

# set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV DEBUG=0

  1. PYTHONDONTWRITEBYTECODE: prevents Python from writing pyc files to disk
  2. PYTHONUNBUFFERED: prevents Python from buffering stdout and stderr
  3. DEBUG: makes it easy to turn debug on or off depending on the environment type (test / production)

How to install database driver packages

Taking psycopg2, the PostgreSQL driver, as an example, you may need to install additional base components:

# install psycopg2
RUN apk update \
    && apk add --virtual build-deps gcc musl-dev python3-dev \
    && apk add postgresql-dev \
    && pip install psycopg2 \
    && apk del build-deps

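2 questions

  1. Do you have best practices for building images for other languages?
  2. Have you tried building native executable Java images with GraalVM? How was the experience?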

Best practices for making container images
https://e-whisper.com/posts/8023/
Author
east4ming
Posted on
July 25, 2019
Licensed under