Using NVIDIA GPU within Docker Containers

Jonathan Petitcolas, March 21, 2018

Diving into machine learning requires some computation power, mainly brought by GPUs. But I'm reluctant to install new software stacks on my laptop - I prefer installing them in Docker containers, to avoid polluting other programs, and to be able to share the results with my coworkers. That means I have to configure Docker to use my GPU. This is the story of how I managed to do it, in about half a day.

Introduction to NVIDIA Docker

I'm used to using Docker for all my projects at marmelab. It makes it easy to set up even the most complex infrastructures, without polluting the local system. However, as image processing generally requires a GPU for better performance, the first question is: can Docker handle GPUs?

Looking for an answer to this question leads me to the nvidia-docker repository, described in a concise and effective way as:

Build and run Docker containers leveraging NVIDIA GPUs

Fortunately, I have an NVIDIA graphics card on my laptop. NVIDIA engineers found a way to share the host's GPU drivers with containers, without having to install them in each container individually.

Enable NVIDIA GPU within Docker containers

The GPUs available inside a container are the host's own. Looks promising. Let's give it a try!

Installing CUDA on Host

CUDA is a parallel computing platform that allows using the GPU for general-purpose processing. It is strongly recommended when dealing with machine learning, a resource-intensive task.

Does our Graphics Card Support CUDA?

The first step is to identify precisely the model of my graphics card. This is easily done on Linux using the lspci utility:

lspci | grep VGA

00:02.0 VGA compatible controller: Intel Corporation [...] (rev 09)
01:00.0 VGA compatible controller: NVIDIA [...] [GeForce GT 650M] (rev a1)

So, I have a GT 650M. According to NVIDIA's website, this card has CUDA support. Great so far!
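If I need more details about this card (for instance, which kernel driver currently handles it), lspci can also be queried with the bus ID found above:

# Show detailed information for the NVIDIA card at bus ID 01:00.0
lspci -v -s 01:00.0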

Installing CUDA for NVIDIA Graphics Cards

Let's follow the NVIDIA installation guide to install the latest version of CUDA. I am using Linux Mint 18.2, based on Ubuntu Xenial (16.04).

# Install NVIDIA repo metadata
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo dpkg --install cuda-repo-ubuntu1604_9.1.85-1_amd64.deb

# Install CUDA GPG key
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub

This last step caused an issue because of the https protocol:

gpgkeys: protocol `https' not supported

Fixing it is as easy as installing the missing package:

sudo apt install gnupg-curl

Now I can install the cuda package the standard way:

sudo apt-get update
sudo apt install cuda

After downloading and installing almost 3GB of data, I still need to perform some post-installation tasks.

Updating environment variables

First, I need to update some environment variables. I edit my ~/.bashrc file, and add the following lines:

# CUDA
export PATH="/usr/local/cuda-9.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-9.1/lib64:$LD_LIBRARY_PATH"

I then reload the environment variables using:

source ~/.bashrc
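A quick sanity check ensures the toolkit is correctly wired up: the CUDA compiler should now be on the PATH and report the freshly installed release (9.1 in my case):

# nvcc should live under /usr/local/cuda-9.1/bin and report release 9.1
which nvcc
nvcc --version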

NVIDIA Persistence Daemon

Then, I need to start the NVIDIA Persistence Daemon as the first NVIDIA software during the boot process. The persistence daemon aims to keep the GPU initialized even when no client is connected to it, and to keep its state across CUDA jobs. I have no precise idea why, but it is required.

I create a new /usr/lib/systemd/system/nvidia-persistenced.service with the following content:

[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
Restart=always
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target

Then, I enable this newly created service:

sudo systemctl enable nvidia-persistenced
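There is no need to wait for the next boot to check that the unit file is correct: it can be started and inspected right away with the usual systemd commands:

# Start the daemon now and check that it is enabled and running
sudo systemctl start nvidia-persistenced
systemctl status nvidia-persistenced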

Disabling a udev Rule

A udev rule (an interface between physical devices and the system) prevents the NVIDIA driver from working properly. Hence, I need to edit the /lib/udev/rules.d/40-vm-hotadd.rules file, and comment out the memory subsystem rule:

# SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}="online"

After a machine reboot, it is time to test my CUDA installation by compiling some of the provided examples:

cuda-install-samples-9.1.sh ~
cd ~/NVIDIA_CUDA-9.1_Samples
make

# take a book, go for a walk, or any other activity that takes time...

./bin/x86_64/linux/release/deviceQuery | tail -n 1

Getting Result = PASS as the last line of the output shows that my CUDA installation is fully operational!
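For extra confidence, the installation guide also suggests running the bandwidthTest sample, which checks that the system can actually exchange data with the GPU. Assuming it was compiled along with the other samples, it also reports a Result = PASS line when everything is fine:

# Measures host <-> device transfer rates, and reports Result = PASS on success
./bin/x86_64/linux/release/bandwidthTest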

Installing NVIDIA Docker

Installing nvidia-docker is far easier than installing CUDA. First, I need to add the nvidia-docker package repository and its GPG key:

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update

Before installing the nvidia-docker2 utility, I need to ensure that I use docker-ce, the latest official Docker release. Based on the official documentation, here is the process to follow:

# remove all previous Docker versions
sudo apt-get remove docker docker-engine docker.io

# add Docker official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Add Docker repository (for Ubuntu Xenial)
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"

sudo apt-get update
sudo apt install docker-ce

Now that I have the latest version of Docker, I can install nvidia-docker:

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
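To confirm that the new nvidia runtime is properly registered, docker info should now list it alongside the default runc runtime (the exact output format may vary between Docker versions):

docker info | grep -i runtime

Runtimes: nvidia runc
Default Runtime: runc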

I now have access to a Docker nvidia runtime, which exposes my GPU inside containers. I can use it with any Docker container.
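This runtime is declared in /etc/docker/daemon.json, which nvidia-docker2 updates during installation. If typing --runtime=nvidia on every run becomes tedious, it can even be set as the default runtime. Here is roughly what that file looks like once the default-runtime key is added (check your own file before editing it, and restart the Docker daemon afterwards):

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}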

Let's ensure everything works as expected, by running nvidia-smi (an NVIDIA utility to monitor and manage GPUs) inside the nvidia/cuda Docker image:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Launching the previous command should display an output similar to the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12                 Driver Version: 390.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 650M     Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   72C    P0    N/A /  N/A |    554MiB /  4037MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

I now have access to the GPU from my Docker containers! \o/

Benchmarking Between GPU and CPU

It is time to benchmark the difference between GPU and CPU processing. I decided to look for a TensorFlow sample, as it can run either on GPU or on CPU. After some googling, I found a benchmark script on learningtensorflow.com:

import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

device_name = sys.argv[1]  # Choose device from cmd line. Options: gpu or cpu
shape = (int(sys.argv[2]), int(sys.argv[2]))
if device_name == "gpu":
    device_name = "/gpu:0"
else:
    device_name = "/cpu:0"

with tf.device(device_name):
    random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)

# It can be hard to see the results on the terminal with lots of output -- add some newlines to improve readability.
print("\n" * 5)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", str(datetime.now() - startTime))

This script takes two arguments: cpu or gpu, and a matrix size. It performs some matrix operations, and returns the time spent on the task.

I now want to call this script using Docker and the nvidia runtime. I settled on the tensorflow/tensorflow:latest-gpu Docker image, which provides a fully working TensorFlow environment:

docker run \
    --runtime=nvidia \
    --rm \
    -ti \
    -v "${PWD}:/app" \
    tensorflow/tensorflow:latest-gpu \
    python /app/benchmark.py cpu 10000
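And, for comparison, the GPU run is the same command with only the script argument changed:

docker run \
    --runtime=nvidia \
    --rm \
    -ti \
    -v "${PWD}:/app" \
    tensorflow/tensorflow:latest-gpu \
    python /app/benchmark.py gpu 10000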

Executing these two commands gave me the following results:

  • With GPU: 8.36 seconds
  • With CPU: 25.83 seconds

That means running this task on the GPU inside Docker takes about 68% less time than on the CPU, roughly three times faster. Whew! Impressive numbers for such a simple script. It is very likely that this gap will widen on concrete use cases, such as image recognition. But we'll see that in another post. Stay tuned!

Conclusion

Linking the GPU to my Docker containers has been a long (about half a day) and somewhat stressful process. The official documentation is straightforward, yet it doesn't explain the purpose of each command. For instance, I had no idea what the NVIDIA Persistence Daemon was when I installed it. I had to dig deeper into other parts of the documentation to get more information. And even now, I don't fully understand its purpose.

The most stressful moment was probably the required reboot, just after tweaking some /lib/* and /usr/* system files with unfamiliar lines of text. I was modifying my host system after all, not a disposable Docker container. But everything worked without any hassle.

As a final word, I highly recommend enabling the --runtime=nvidia option. The performance boost is undeniable, even for a simple script. And I'm sure it will feel like a must-have once I start playing with image classification.
