8. Introduction to Docker#
In this chapter, you will learn about containerization and Docker as one of the platforms for containerization.
Attribution: The content of this lecture is modified from three excellent sources: Docker for beginners; Introduction to Docker Containers by Microsoft; and Reproducible Computational Environments Using Containers: Introduction to Docker from Software Carpentry.
8.1. Containerization#
In software development and deployment teams, containerization has become a regular practice that allows developers to package applications and their dependencies into lightweight, portable units known as containers. Docker is one of the most popular containerization platforms.
Containers are self-sufficient units that package an application and all its required dependencies, including libraries and configuration files, into a single image. These containers can run consistently across different environments, ensuring that an application behaves the same way whether it's on a developer's laptop or a production server. Containers do not carry the high overhead of a virtual machine (VM), and hence enable more efficient use of the underlying system and its resources.
8.2. Why Containers?#
Containerization has several benefits:
Portability: Containers encapsulate everything an application needs to run, making it easy to move between different environments without modification.
Consistency: Containers ensure that the application runs the same way everywhere, eliminating the “it works on my machine” problem.
Isolation: Containers provide process and resource isolation, ensuring that one container cannot interfere with another, improving security and stability.
Scalability and Speed: Containers can be easily scaled up or down to meet changing workloads, and they can be deployed much faster than VMs.
Resource Efficiency: Containers are lightweight and share the host OS kernel, making them more resource-efficient than traditional virtualization. While containers run on top of a host machine and use its resources, they virtualize the host OS, unlike VMs, which virtualize the underlying hardware. This means containers don't need their own OS, making them much more lightweight than VMs and consequently quicker to spin up.
8.3. What is Docker?#
Docker is a leading containerization platform that has played a major role in popularizing containers. Docker is open source and provides a set of tools and services for creating, deploying, and managing containers.
8.4. Docker Components#
Docker Engine: The core component of Docker that acts as a client-server application. It includes:
the Docker daemon (`dockerd`), which acts as the server and responds to requests from the client;
the Docker REST API for communication; and
the Docker client, which has two alternatives: the command line interface (CLI) named `docker` and the graphical user interface (GUI) application named Docker Desktop.
Images: A snapshot of a file system with the application code and all dependencies needed to run it. Images are used to create containers.
Containers: An instance of a Docker image that can run a specific application. Containers are isolated from each other and share the host OS kernel.
Docker Hub: A public registry that hosts Docker images (https://hub.docker.com).
8.5. Run Your First Docker Container#
There is a simple Docker image that you can run as a container to verify that your Docker installation is working. To do this, execute the following command:
$ docker run hello-world
8.6. What is a Dockerfile?#
A Dockerfile is a text file that contains the instructions used to build a Docker image. It defines the following aspects of the image:
The base or parent image we use to create the new image
Commands to update the base OS and install additional software
Build artifacts to include, such as a developed application
Services to expose, such as storage and network configuration
Command to run when the container is launched
Note: A base image is an image that uses the Docker `scratch` image. The `scratch` image is an empty image that doesn't create a filesystem layer; it assumes that the application you're going to run can directly use the host OS kernel. A parent image is an image from which you create your own images. For example, instead of creating an image from `scratch` and then installing Ubuntu, we'd rather use an image already based on Ubuntu.
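The aspects listed above map directly onto Dockerfile instructions. Here is a minimal sketch (the parent image, package, file names, and port are illustrative assumptions, not part of this lecture's examples):

```
# Parent image to derive from
FROM ubuntu:22.04

# Update the base OS and install additional software
RUN apt-get update && apt-get install -y python3

# Include build artifacts, such as a developed application (hypothetical app.py)
COPY app.py /app/app.py

# Expose a network port used by the application
EXPOSE 8080

# Command to run when the container is launched
CMD ["python3", "/app/app.py"]
```

Each instruction adds a layer to the image; section 8.8 walks through a complete, buildable example.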
8.7. Useful Docker CLI Commands#
Build an Image
You can use the `docker build` command to build a Docker image from a Dockerfile as follows:

$ docker build -t <NAME:TAG> .

This command assumes a file named Dockerfile exists in the current directory (the trailing `.` is the build context). If your Dockerfile is elsewhere, or its name is not Dockerfile, additionally pass `-f <PATH TO DOCKERFILE>` to point at it.

List Images
You can list all images on your machine using the `docker images` command:

$ docker images
The output will be a table similar to the following (this is from Hamed’s computer!):
REPOSITORY                  TAG     IMAGE ID      CREATED        SIZE
giswqs/segment-geospatial   latest  cd7db75a587c  3 weeks ago    5.34GB
gfm-gap                     latest  93ca820ec782  2 months ago   2.73GB
cdl                         latest  9f1e0f6b1273  2 months ago   2.73GB
hls                         latest  1d3452e331df  2 months ago   9.73GB
lc-td                       latest  221b2866cb63  3 months ago   1.82GB
Remove an Image
You can remove an image using the `docker rmi` command. This can be used to free up space on your computer. Specify the name or ID of the image as follows (including the tag is optional):

$ docker rmi <IMAGE ID or NAME:TAG>
Run a Container
You can run a container using the `docker run` command. You only need to specify the Docker image name or ID to launch a container from that image.

$ docker run <IMAGE ID or NAME:TAG>
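If the image ships a shell, you can also run the container interactively and override its default command. A sketch, using the public `ubuntu` image as an illustration (this requires Docker to be installed and able to pull images):

```
$ docker run -it ubuntu:22.04 /bin/bash
```

The `-i` flag keeps STDIN open and `-t` allocates a pseudo-terminal, giving you an interactive shell inside the container; type `exit` to leave.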
List Available Containers
You can use the `docker ps` command to list containers in the running state. To see containers in all states, pass the `-a` flag:

$ docker ps -a
To learn more about Docker container states, check out this page.
Interrupt a Container
You can stop or restart a container using one of the following commands:
$ docker stop <CONTAINER ID or NAME>
$ docker restart <CONTAINER ID or NAME>
Remove a Container
You can remove a container using the following command. Note that this deletes all data stored in the container.

$ docker rm <CONTAINER ID or NAME>
8.8. Create Your Own Dockerfile#
You can create your own Dockerfile with specific software and packages installed. This is a nice way to create a reproducible and portable runtime environment for your projects. Here is an example of a Dockerfile:
FROM continuumio/miniconda3:24.7.1-0
# Set the working directory to /home/workdir
RUN mkdir /home/workdir
WORKDIR /home/workdir
# Create a Conda env named 'myenv' with numpy installed in it
RUN conda create -n myenv numpy=2.0.1
CMD ["/bin/bash"]
Let's look at what each of these instructions means:
FROM
Use the FROM instruction to specify the parent image that you want your image to derive from. Here, we're using the `continuumio/miniconda3:24.7.1-0` image.
RUN
Use RUN to execute any shell command while the image is being built. Note that this is different from the command you want to execute when running the container.
WORKDIR
Sets the current working directory inside the container (like a `cd` command in a shell). All subsequent instructions in the Dockerfile run inside this directory.
CMD
This is the command instruction; it specifies what to run when the container is started. Here we're simply setting the container to run `bash`.
Other useful commands inside a Dockerfile are:
COPY
The COPY instruction has the following format: `COPY <source> <destination>`. It copies files or directories from `<source>` on the host (relative to the build context) into `<destination>` inside the image.
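For example, a project's requirements file could be copied into the image like this (the file name and destination are illustrative):

```
COPY requirements.txt /home/workdir/requirements.txt
```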
8.9. Managing Storage in Docker#
All files created inside a container are stored on a writable container layer by default. This means that:
The data won’t exist after the container is removed.
It would be difficult to access the data outside the container.
You can't easily move the data to the host machine.
To address these challenges, Docker has a mechanism for containers to store files on the host machine. This means the files can be easily accessed by other processes outside the container, and they will persist after the container is removed.
The easiest way to do this is to mount a directory on the host machine to the container using the following command:
$ docker run -v $(pwd):/home/workdir <IMAGE NAME>
In this example, the current directory on the host machine (`pwd`) is mounted to `/home/workdir/` inside the container. This means that any file or directory inside the current directory on the host will be accessible at `/home/workdir/` inside the container. If you change these files or directories on either the host or the container, the change is reflected on the other side (practically, these are the same files stored on the host, accessible from two separate places). It's best to change the files from only one side, e.g. inside the container, to make sure your changes don't conflict with each other.
You can also mount a directory that doesn’t exist on the host to a directory inside the container. For example running the following:
$ docker run -v /doesnt/exist:/home/workdir <IMAGE NAME>
will automatically create `/doesnt/exist` on the host before starting the container.
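Besides bind mounts, Docker also supports named volumes, which Docker manages itself instead of mapping a host path. A sketch (the volume name `mydata` is an illustrative assumption):

```
$ docker run -v mydata:/home/workdir <IMAGE NAME>
```

The volume persists after the container is removed, and you can list existing volumes with `docker volume ls`.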
8.10. Activating Conda Environments in Docker#
You can use conda inside Docker to manage packages and environments. To do that, you can use conda in the Dockerfile to create a new environment and install packages. However, if you want the environment to be active when the container starts, you need to take some extra steps.
Try building an image using the following Dockerfile:
FROM continuumio/miniconda3:24.7.1-0
# Set the working directory to /home/workdir
RUN mkdir /home/workdir
WORKDIR /home/workdir
# Create a Conda env named 'myenv' with numpy installed in it
RUN conda create -n myenv numpy=2.0.1
# Activate the conda environment
RUN conda activate myenv
CMD ["/bin/bash"]
As you noticed, the Docker build in this case fails. This is because Docker runs each RUN instruction in a new, non-interactive shell, and `conda activate` only works in a shell where conda's activation scripts have been initialized.
There are multiple ways to resolve this issue. One of them, which we recommend, is to add the `conda activate` command to your `.bashrc` file. `.bashrc` is a script that is executed whenever an interactive bash shell starts. In this case, any command included in `.bashrc` will be executed when the container starts bash. Try building an image from the following Dockerfile and then run it as a container:
FROM continuumio/miniconda3:24.7.1-0
# Set the working directory to /home/workdir
RUN mkdir /home/workdir
WORKDIR /home/workdir
# Create a Conda env named 'myenv' with numpy installed in it
RUN conda create -n myenv numpy=2.0.1
# Activate the conda environment
RUN echo "conda activate myenv" >> ~/.bashrc
ENV PATH="$PATH:/opt/conda/envs/myenv/bin"
CMD ["/bin/bash"]
8.11. Running Jupyter Notebooks Inside a Container#
You can install and run a Jupyter server inside the container the same way you would do inside a conda environment on your machine. There are a couple of steps you need to follow to make it accessible outside of the container though.
First, you need to create a new user inside the container and switch to that user. This is generally good practice, as you don't want to run the container as the root user.
Second, you need to expose port 8888, which is used by the Jupyter server, from the container. This allows any process outside of the container to communicate with processes inside the container through port 8888.

Third, you need to include the Jupyter Lab command at the end of your Dockerfile. For this, you need to pass an extra argument to set the IP of the server to 0.0.0.0 so it accepts connections from outside the container.
The following sample Dockerfile implements these three changes, and runs Jupyter Lab when the container is launched.
FROM continuumio/miniconda3:24.7.1-0
# Create a Conda environment with JupyterLab installed
RUN conda create -n myenv numpy=1.25.0 jupyterlab=3.6.3
# Activate the Conda environment
RUN echo "conda activate myenv" >> ~/.bashrc
ENV PATH="$PATH:/opt/conda/envs/myenv/bin"
# Create a non-root user and switch to that user
RUN useradd -m jupyteruser
USER jupyteruser
# Set the working directory to /home/jupyteruser
WORKDIR /home/jupyteruser
# Expose the JupyterLab port
EXPOSE 8888
# Start JupyterLab
CMD ["jupyter", "lab", "--ip=0.0.0.0"]
Note: The new user you create inside the container, in this case `jupyteruser`, can only access their home directory `/home/jupyteruser`. Therefore, you need to launch Jupyter from this working directory, and when you mount directories at launch time you should mount them to `jupyteruser`'s home directory or a sub-directory of it.
Finally, to run the container you should publish the container's port 8888 to a port on the host (it can be the same 8888 if it's not otherwise in use):
$ docker run -it -p 8888:8888 <IMAGE NAME>
You can then copy the URL of the Jupyter server from the container's output and paste it into your browser to access Jupyter Lab.
8.12. Working with Docker Hub#
So far you have learned how to build Docker images, run Docker containers, and use these to create reproducible work environments. Let's say you now want to share your Docker image with a colleague, or share it publicly for others to access. For this purpose, you can use a registry. Docker Hub is one of the main registries for sharing Docker images. In this section, you will learn to push images to your Docker Hub account.
Here are the steps to follow:
Create a Docker account. You can do this by selecting Sign In at the top-right corner of Docker Desktop Dashboard.
Create a new repository on your Docker Hub account. Open Docker Hub and select Create repository. Enter a Name and Description, and set the visibility to Public.
Now that you have a repository, you can build and push an image to this repository as following:
Build your image using the following command, swapping out `DOCKER_USERNAME` with your username and `IMAGE_NAME` with the name of the image/repository:

$ docker build -t DOCKER_USERNAME/IMAGE_NAME .

Verify that the image has been built locally by running the `docker images` or `docker image ls` command.

To push the image, use the `docker push` command (similarly replacing `DOCKER_USERNAME` with your username and `IMAGE_NAME` with the name of the image/repository):

$ docker push DOCKER_USERNAME/IMAGE_NAME
Now that you have published a Docker image on Docker Hub, you can use the `docker pull` command to download that image onto any machine that doesn't have it. For this purpose, you need to run:
$ docker pull DOCKER_USERNAME/IMAGE_NAME
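If you built the image under a different local name, you can retag it before pushing instead of rebuilding. A sketch (all names are placeholders):

```
$ docker tag IMAGE_NAME DOCKER_USERNAME/IMAGE_NAME:latest
$ docker push DOCKER_USERNAME/IMAGE_NAME:latest
```

`docker tag` creates an additional name pointing at the same image ID, so no data is duplicated locally.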
Cleanup Commands
It's good practice to remove unwanted Docker images and containers to clean up your disk space. You can use the `prune` subcommands to do this as follows:

`docker container prune` removes all stopped containers.
`docker image prune` removes all dangling images (images that do not have a tag); add the `-a` flag to also remove images not used by any container.
`docker system prune` removes all stopped containers, dangling images, and dangling build caches.
Tip
You can consult this Docker CLI cheatsheet for a quick reference of its most used commands.