What are containers anyway?
The official Docker resources site says:
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
You can imagine containers as some sort of virtualization. But what’s the difference?
When talking about virtualization, you may think of hypervisors. However, there are multiple ways of implementing virtualization. Given the topic of this post, I’m going to focus on “hypervisors” and “containers”:
- Hardware-based virtualization: the virtualization of computers as complete hardware platforms, certain logical abstractions of their components, or only the functionality required to run various operating systems.
- Operating system-based virtualization: an operating system paradigm in which the kernel allows the existence of multiple isolated user-space instances. Such instances go by different names: containers (LXC, Docker), zones (Solaris containers), virtual private servers (OpenVZ), partitions, virtual environments (VEs), virtual kernels (DragonFly BSD), or jails (FreeBSD jail or chroot jail).
Note that virtualization can be classified in multiple ways, but this one best illustrates what containers are.
Now that we understand the differences: hypervisors make sense when we want to virtualize a full computer, with all the advantages and the resource overhead that implies. Containers, on the other hand, are much more lightweight: they store little more than a file structure, and shared layers are stored only once. Furthermore, since containers share the kernel with your OS, you only have one kernel to manage. Finally, don’t forget the speed difference between running an “isolated process” and a fully virtualized computer.
Since containers allow applications to be deployed, patched, and scaled more rapidly, they are increasingly used to accelerate development, test, and production cycles.
Something to keep in mind is that container images need to be built for a specific target; for instance, you can’t run Linux containers on Windows.
Actually, there is Docker Desktop, which uses WSL2 or Hyper-V to create a VM and connect its kernel to Docker, but if it weren’t for that VM it wouldn’t work.
How to create and download images
An image is a read-only template that contains a set of instructions for creating a container. For now, let’s say that each instruction generates a new layer with its modifications.
Images are created using Dockerfiles. Here is an example of a simple one:
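The original snippet is not reproduced here, but a minimal Dockerfile reconstructed from the build output shown further below would look like this:

```dockerfile
# Base image: the small Alpine Linux distribution
FROM alpine
# Each instruction creates a new layer; this one adds an empty file
RUN ["touch", "/new"]
# Default command executed when a container starts from this image
CMD ["echo", "Hey!"]
```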
We are going to dive deeper into container images later.
You can build your image using `docker build .`; you can also specify a different folder.
Here is the result:

```
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM alpine
latest: Pulling from library/alpine
Digest: sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
Status: Downloaded newer image for alpine:latest
 ---> c059bfaa849c
Step 2/3 : RUN ["touch", "/new"]
 ---> Using cache
 ---> 6b28fc1e2272
Step 3/3 : CMD ["echo", "Hey!"]
 ---> Using cache
 ---> cd86016de0a5
Successfully built cd86016de0a5
```
You can see how it generates a layer in each step. There are two interesting details to note in the output:
- If the given image is not available locally (which was the case), it is downloaded from the registry. For now, you can imagine the registry as a giant database that contains images with all their releases.
- It didn’t download the `alpine` image, it used `alpine:latest`. The container image format may contain optional information such as the registry or the tag; if you don’t specify the release, it defaults to `latest`.
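As an illustration of the reference format (the daemon must be running for these commands; the tags are just examples), a fully qualified image reference spells out registry, repository and tag:

```shell
# <registry>/<repository>:<tag>
docker pull docker.io/library/alpine:latest

# Equivalent short form: registry and tag are optional, "latest" is assumed
docker pull alpine
```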
If for some reason you want to download an image from the registry without using it immediately, you can use `docker pull`. Note that other commands, like `docker build` or `docker run`, also pull images when they are not found locally.
Creating a container is actually simple: just run `docker run <image>`. Depending on the image, that command by itself may not do much.
Here are some useful options:
- `-p <host port>:<container port>`: Maps a container port to the host so it is accessible from the outside.
- `-it`: Actually a combination of `--interactive` and `--tty`. It gives you access to an interactive shell.
- `-d`: Runs the container in detached mode.
- `--rm`: Removes the container automatically after it exits.
- `-v /path/host:/path/container`: Bind mounts a local folder inside the container.
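Putting a few of these options together (a sketch; the image, port and paths are just examples, and a running Docker daemon is assumed):

```shell
# Interactive throwaway Alpine shell, with a port mapping and a bind mount
docker run -it --rm -p 8080:80 -v /tmp/data:/data alpine sh
```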
By default, Docker uses a built-in bridge network; containers within that network can only communicate with each other using IP addresses.
If you want containers to communicate using names, you have to create a bridge network and attach the containers to it:
```shell
docker network create --driver bridge alpine-net
docker run -dit --name alpine1 --network alpine-net alpine ash
docker run -dit --name alpine2 --network alpine-net alpine ash
```
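On a user-defined bridge network, containers resolve each other by name through Docker’s embedded DNS. For example, assuming the two containers above are running:

```shell
# alpine1 can reach alpine2 by its container name
docker exec alpine1 ping -c 2 alpine2
```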
You can also use the `--link` option, but it’s deprecated, so I do not recommend it.
Here is a diagram of how the bridge driver works:
There are other networking drivers available:
- `host`: The container shares the network stack with the host.
- `overlay`: Allows multi-host networks (used in Docker Swarm environments).
- `macvlan`: Allows you to assign a MAC address to a container, making it appear as a physical device on your network. The Docker daemon routes traffic to containers by their MAC addresses.
- `none`: Disables all networking.
This may vary between container implementations; for instance, Podman supports
How to handle storage
Containers are meant to be stateless, so state shouldn’t live inside the container. Otherwise, when the container gets removed, all that data gets erased and there is no way to get it back. There are 3 ways of managing storage:
- Volumes
  - Managed by Docker (stored under a directory controlled by Docker on the host).
  - Other processes shouldn’t modify data in that folder.
  - Volumes may have a name.
  - The best solution to persist data for containers.
- Bind mounts
  - Any folder on the host can be mounted anywhere in the container.
  - Other processes could modify files inside the mount.
  - Containers could modify files on the host filesystem.
- tmpfs mounts
  - Don’t write to disk; data is stored in memory.
  - Provide higher performance.
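A tmpfs mount can be added like this (a sketch; the target path is an example, and a Docker daemon on Linux is assumed):

```shell
# /cache lives in memory only and disappears when the container stops
docker run --rm --mount type=tmpfs,target=/cache alpine df -h /cache
```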
The lifecycle of a volume is completely independent from the container’s. You can add a volume to a container (which also creates it if it doesn’t exist):
```shell
docker run --mount type=volume,source=<volume name>,target=<container path> ...
```
Volumes can also be managed with the `docker volume` subcommands (`create`, `ls`, `inspect`, `rm`).
Multiple nodes with Docker-Compose
We already know how to connect containers. Nevertheless, there is a better approach: using `docker-compose`. With Compose, you use a YAML file to configure multi-container services, including volumes, networks and containers. Then, with a single command, you create and start all the services from your configuration. Additionally, these configuration files are almost equivalent to Docker Swarm stack files.
`docker compose` may not come installed with older Docker versions; check the documentation on how to install it.
The most important top-level objects are services, volumes and networks. By default, Compose creates a network for the stack, so you can access services by name.
Almost all Docker parameters have an equivalent with similar naming. Here is an example setting up WordPress:
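The original file is not reproduced here; a minimal sketch of such a Compose file, loosely based on the official WordPress quickstart (image tags, passwords and volume names are placeholders), might look like this:

```yaml
services:
  db:
    image: mariadb
    environment:
      MYSQL_ROOT_PASSWORD: example      # placeholder, use a secret in production
      MYSQL_DATABASE: wordpress
    volumes:
      - db_data:/var/lib/mysql          # named volume so data survives the container

  wordpress:
    image: wordpress
    ports:
      - "8080:80"                       # equivalent of docker run -p 8080:80
    environment:
      WORDPRESS_DB_HOST: db             # the service name resolves via the stack network
      WORDPRESS_DB_USER: root
      WORDPRESS_DB_PASSWORD: example
      WORDPRESS_DB_NAME: wordpress
    depends_on:
      - db

volumes:
  db_data:
```

Note how the database is reached simply as `db`: that is the per-stack network described above doing name resolution.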
Here are some of the best practices when creating Docker images:
Create ephemeral containers: a container should be able to be stopped and destroyed, then rebuilt and replaced with an absolute minimum of setup and configuration. For instance, any data that needs to persist must be stored in a stateful backing service, typically a database.
Switch to a non-root user: if a service can run without privileges, use `USER` to change to a non-root user. Start by creating the user and group in the Dockerfile.
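A sketch of what that might look like in a Dockerfile (the user and group names are examples; the `addgroup`/`adduser` flags are the BusyBox-style ones found in Alpine images):

```dockerfile
FROM alpine
# Create an unprivileged system group and user (names are examples)
RUN addgroup -S app && adduser -S -G app app
# Drop privileges for every instruction and process that follows
USER app
CMD ["sh"]
```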
Do not leak sensitive info into Docker images: even if it’s only in an intermediate layer, it can be recovered.
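For example, with BuildKit you can mount a secret during a single build step so it is never written to any layer (a sketch; the secret id and file are examples):

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine
# The secret is only visible during this RUN step, under /run/secrets/<id>,
# and leaves no trace in the resulting image layers
RUN --mount=type=secret,id=api_token \
    sh -c 'TOKEN=$(cat /run/secrets/api_token) && echo "token loaded"'
```

You would then build with something like `docker build --secret id=api_token,src=token.txt .`.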