Plex GPU transcoding in Docker on LXC on Proxmox (updated 04/2024)

Nvidia passthrough to Proxmox > LXC > Docker

Originally from Jocke (here).

I’ll assume you’ve got Proxmox and LXC set up and ready to go, running Debian 11 (Bullseye). In my example I’ll be running an LXC container named docker1 (ID 101) on my Proxmox host. Everything will be headless (i.e. no X involved). The LXC container will be privileged, with fuse=1,nesting=1 set as features. I’ll use an Nvidia RTX A2000 as the GPU. All commands will be run as root. Note that there might be other steps that need to be done if you attempt to run this in a rootless/unprivileged LXC container (see here for more information).
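For reference, those container features can be set from the Proxmox host with pct; a minimal sketch, assuming container ID 101:

# enable the fuse/nesting features on container 101 (adjust the ID to yours)
pct set 101 --features fuse=1,nesting=1
# equivalent to having this line in /etc/pve/lxc/101.conf:
# features: fuse=1,nesting=1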

Proxmox host

The first step is to install the drivers on the host. Nvidia has an official Debian repo that we could use. However, that introduces a potential problem: we later need to install the drivers in the LXC container without kernel modules. I could not find a way to do this using the packages in the official Debian repo, and therefore had to install the drivers manually within the LXC container. The other aspect is that the host and the LXC container need to run the same driver version (or else it won’t work). If we install from the official Debian repo on the host and do a manual driver install in the LXC container, we could easily end up with different versions (whenever you run an apt upgrade on the host). To keep this as consistent as possible, we’ll install the driver manually on both the host and within the LXC container.
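A quick way to verify that the versions actually match, once the drivers are installed on both sides, is to query the driver directly and compare the output from the host and the container:

# print only the driver version; it must be identical on host and container
nvidia-smi --query-gpu=driver_version --format=csv,noheader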

# we need to disable the Nouveau kernel module before we can install NVIDIA drivers
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot
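After the reboot, you can verify that Nouveau is no longer loaded; the command below should produce no output:

# should print nothing if the blacklist took effect
lsmod | grep nouveau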

# install packages required to build NVIDIA kernel drivers (only needed on host)
apt install build-essential

# install pve headers matching your current kernel
# older Proxmox versions might need to use "pve-headers-*" rather than "proxmox-headers-*"
apt install proxmox-headers-$(uname -r)

# download + install nvidia driver
# 550.54.14 was the latest at the time of this writing
wget -O NVIDIA-Linux-x86_64-550.54.14.run  https://us.download.nvidia.com/XFree86/Linux-x86_64/550.54.14/NVIDIA-Linux-x86_64-550.54.14.run

chmod +x NVIDIA-Linux-x86_64-550.54.14.run
./NVIDIA-Linux-x86_64-550.54.14.run --check
# answer "no" when it asks if you want to install 32bit compability drivers
# answer "no" when it asks if it should update X config
./NVIDIA-Linux-x86_64-550.54.14.run

With the drivers installed, we need to add some udev rules. This is to make sure the proper kernel modules are loaded, and that all the relevant device files are created upon boot.

# add kernel modules
echo -e '\n# load nvidia modules\nnvidia-drm\nnvidia-uvm' >> /etc/modules-load.d/modules.conf

# add the following to /etc/udev/rules.d/70-nvidia.rules
# will create relevant device files within /dev/ during boot
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
SUBSYSTEM=="module", ACTION=="add", DEVPATH=="/module/nvidia", RUN+="/usr/bin/nvidia-modprobe -m"

To prevent the driver/kernel module from being unloaded whenever the GPU is not in use, we should run the Nvidia-provided persistence service. It’s made available to us after the driver install.

# copy and extract
cp /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 .
bunzip2 nvidia-persistenced-init.tar.bz2
tar -xf nvidia-persistenced-init.tar

# remove old, if any (to avoid masked service)
rm /etc/systemd/system/nvidia-persistenced.service

# install
chmod +x nvidia-persistenced-init/install.sh
./nvidia-persistenced-init/install.sh

# check that it's ok
systemctl status nvidia-persistenced.service
rm -rf nvidia-persistenced-init*

If you’ve gotten this far without any errors, you’re ready to reboot the Proxmox host. After the reboot, you should see the following outputs (GPU type/info will of course vary depending on your GPU):

root@foobar:~# nvidia-smi
Sun Apr 14 20:31:05 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14    Driver Version: 550.54.14    CUDA Version: 12.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A2000    On   | 00000000:82:00.0 Off |                  Off |
| 30%   36C    P2    4W /  70W |       1MiB /  6138MiB |     0%       Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

root@foobar:~# systemctl status nvidia-persistenced.service
● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-02-23 00:18:04 CET; 1h 16min ago
    Process: 9300 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced (code=exited, status=0/SUCCESS)
   Main PID: 9306 (nvidia-persiste)
      Tasks: 1 (limit: 154511)
     Memory: 512.0K
        CPU: 1.309s
     CGroup: /system.slice/nvidia-persistenced.service
             └─9306 /usr/bin/nvidia-persistenced --user nvidia-persistenced

Feb 23 00:18:03 foobar systemd[1]: Starting NVIDIA Persistence Daemon...
Feb 23 00:18:03 foobar nvidia-persistenced[9306]: Started (9306)
Feb 23 00:18:04 foobar systemd[1]: Started NVIDIA Persistence Daemon.

root@foobar:~# ls -alh /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Jan  5 11:56 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jan  5 11:56 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Jan  5 11:56 /dev/nvidia-modeset
crw-rw-rw- 1 root root 237,   0 Jan  5 11:56 /dev/nvidia-uvm
crw-rw-rw- 1 root root 237,   1 Jan  5 11:56 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
drw-rw-rw-  2 root root     80 Jan  5 11:56 .
drwxr-xr-x 19 root root   5.0K Jan  5 11:56 ..
cr--------  1 root root 240, 1 Jan  5 11:56 nvidia-cap1
cr--r--r--  1 root root 240, 2 Jan  5 11:56 nvidia-cap2

# the below are not needed for transcoding, but for other things like rendering
# or display applications like VirtualGL
root@foobar:~# ls -alh /dev/dri
total 0
drwxr-xr-x  3 root root        120 Jan  5 11:56 .
drwxr-xr-x 19 root root       5.0K Jan  5 11:56 ..
drwxr-xr-x  2 root root        100 Jan  5 11:56 by-path
crw-rw----  1 root video  226,   0 Jan  5 11:56 card0
crw-rw----  1 root video  226,   1 Jan  5 11:56 card1
crw-rw----  1 root render 226, 128 Jan  5 11:56 renderD128

If the correct GPU shows up in nvidia-smi, the persistence service runs fine, and at least five files are available under /dev/nvidia*, we’re ready to proceed to the LXC container.

The number of files depends on your setup; if you don’t have a /dev/nvidia-caps folder, you should be fine adding only the five files listed above. If you do have the /dev/nvidia-caps folder, you should add the two (or more) files within it as well. See here for more info.

Note that the files under /dev/dri are not strictly needed for transcoding, but would be needed for other things like rendering or display applications such as VirtualGL.

LXC container

We need to add the relevant LXC configuration to our container. Shut down the LXC container, and make the following changes to the LXC configuration file:

# edit /etc/pve/lxc/101.conf and add the following
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 237:* rwm
lxc.cgroup2.devices.allow: c 240:* rwm

# if you want to use the card for other things than transcoding
# add /dev/dri cgroup values as well
lxc.cgroup2.devices.allow: c 226:* rwm

# mount nvidia devices into LXC container
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap1 dev/nvidia-caps/nvidia-cap1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap2 dev/nvidia-caps/nvidia-cap2 none bind,optional,create=file

# if you want to use the card for other things than transcoding
# mount entries for files in /dev/dri should probably also be added
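# for example, binding the whole directory (a sketch; adjust to your host's devices):
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir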

The numbers on the cgroup2 lines are from the fifth column in the device lists above. Using the examples above, we would add 195, 237 and 240 as the cgroup values. Also, in my setup the major number of the two nvidia-uvm files changes randomly between two values, while the three others remain static. I don’t know why they alternate between the different values (if you know how to make them static, please let me know), but LXC does not complain if you configure numbers that don’t exist (i.e. we can add all of them to make sure it works).
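If you’re unsure which major numbers the Nvidia modules use on your system, they can also be read from /proc/devices (a quick check; the nvidia-uvm major is assigned dynamically, which is likely why it alternates):

# list the character-device majors registered by the nvidia modules
grep -i nvidia /proc/devices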

We can now turn on the LXC container, and we’ll be ready to install the Nvidia driver. This time we’re going to install it without the kernel modules, and there is no need to install the kernel headers.

wget -O NVIDIA-Linux-x86_64-550.54.14.run https://us.download.nvidia.com/XFree86/Linux-x86_64/550.54.14/NVIDIA-Linux-x86_64-550.54.14.run
chmod +x NVIDIA-Linux-x86_64-550.54.14.run
./NVIDIA-Linux-x86_64-550.54.14.run --check
# answer "no" when it asks if you want to install 32-bit compatibility drivers
# answer "no" when it asks if it should update X config
./NVIDIA-Linux-x86_64-550.54.14.run --no-kernel-module

At this point you should be able to reboot your LXC container. Verify that the files and the driver work as expected before moving on to the Docker setup.

root@docker1:~# ls -alh /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Jan  5 11:56 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jan  5 11:56 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Jan  5 11:56 /dev/nvidia-modeset
crw-rw-rw- 1 root root 237,   0 Jan  5 11:56 /dev/nvidia-uvm
crw-rw-rw- 1 root root 237,   1 Jan  5 11:56 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root     80 Jan  5 15:22 .
drwxr-xr-x 8 root root    640 Jan  5 15:22 ..
cr-------- 1 root root 240, 1 Jan  5 15:22 nvidia-cap1
cr--r--r-- 1 root root 240, 2 Jan  5 15:22 nvidia-cap2

root@docker1:~# nvidia-smi
Sun Apr 14 20:31:05 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14    Driver Version: 550.54.14    CUDA Version: 12.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A2000    Off  | 00000000:82:00.0 Off |                  Off |
| 30%   34C    P8    10W /  70W |      3MiB /  6138MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Docker container

Now we can move on to getting Docker working. We’ll be using docker-compose, and we’ll make sure to have the latest version by removing the Debian-provided docker and docker-compose packages. We’ll also install the Nvidia-provided Docker runtime. Both of these are relevant for making the GPU available within Docker.

# remove debian-provided packages
apt remove docker-compose docker docker.io containerd runc
# install docker from official repository
apt update
apt install ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install docker-ce docker-ce-cli containerd.io
# install docker-compose
curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
# install docker-compose bash completion
curl \
    -L https://raw.githubusercontent.com/docker/cli/master/contrib/completion/bash/docker \
    -o /etc/bash_completion.d/docker-compose
# install NVIDIA Container Toolkit
apt install -y curl
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update
apt install nvidia-container-toolkit
# restart systemd + docker (if you don't reload systemd, it might not work)
systemctl daemon-reload
systemctl restart docker
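Note that on recent versions of the Nvidia Container Toolkit you may also need to register the Nvidia runtime with Docker; the toolkit ships the nvidia-ctk helper for this. A sketch, only needed if the GPU test below fails:

# write the nvidia runtime into /etc/docker/daemon.json, then restart docker
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker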

We should now be able to run Docker containers with GPU support. Let’s test it.

root@docker1:~# docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
Sun Apr 14 20:31:05 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14    Driver Version: 550.54.14    CUDA Version: 12.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A2000    Off  | 00000000:82:00.0 Off |                  Off |
| 30%   29C    P8     4W /  70W |      1MiB /  6138MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

root@docker1:~# cat docker-compose.yml
version: '3.7'
services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

root@docker1:~# docker-compose up
Starting test_test_1 ... done
Attaching to test_test_1
test_1  | 2022-02-22 22:49:00.691229: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
test_1  | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
test_1  | 2022-02-22 22:49:02.119628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 4141 MB memory:  -> device: 0, name: NVIDIA RTX A2000, pci bus id: 0000:82:00.0, compute capability: 8.6
test_test_1 exited with code 0

Yay! It’s working!

Keep in mind that I’ve experienced issues where tensorflow complains about the “kernel version not matching the DSO version” (see here for more information). If this happens to you, try a different tensorflow tag and/or a different driver version (so that the kernel and DSO versions match).

Let’s add the final pieces together for a fully working Plex docker-compose.yml.

version: '3.7'

services:
  plex:
    container_name: plex
    hostname: plex
    image: linuxserver/plex:latest
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    environment:
      TZ: Europe/Paris
      PUID: 0
      PGID: 0
      VERSION: latest
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,video,utility
    network_mode: host
    volumes:
      - /srv/config/plex:/config
      - /storage/media:/data/media
      - /storage/temp/plex/transcode:/transcode
      - /storage/temp/plex/tmp:/tmp

And it’s working! Woho!
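To confirm that Plex actually uses the GPU, start a playback that forces a transcode, then run nvidia-smi inside the LXC container; a “Plex Transcoder” process should appear in the process list (the exact process name/path may vary with the image used):

# while a hardware transcode is running, expect a Plex Transcoder process here
nvidia-smi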

If you have a consumer-grade GPU, you might also want to have a look at nvidia-patch, a toolkit that removes the restriction on the maximum number of simultaneous NVENC video encoding sessions imposed by Nvidia. Essentially, this could unlock more parallel transcodes in Plex.

Upgrading

Whenever you upgrade the kernel, you need to re-install the driver on the Proxmox host. If you want to keep the same Nvidia driver version, the process is simple; just re-run the original driver install. There should be no need to do anything in the LXC container (the version stays the same, and no kernel modules are involved).

# answer "no" when it asks if you want to install 32bit compatibility drivers
# answer "no" when it asks if it should update X config
./NVIDIA-Linux-x86_64-550.54.14.run
reboot

If you want to upgrade the Nvidia driver, there are a few extra steps. If you already have a working Nvidia driver (i.e. you did not just upgrade the kernel), you have to uninstall the old driver first (otherwise the installer will complain that the kernel module is loaded, and the module will instantly be loaded again if you attempt to unload it).

# uninstall old driver to avoid kernel modules being loaded
# this step can be skipped if driver is broken after kernel update
./NVIDIA-Linux-x86_64-550.54.14.run --uninstall
reboot

# if you upgraded kernel, we need to download new headers
# older Proxmox versions might need to use "pve-headers-*" rather than "proxmox-headers-*"
apt install proxmox-headers-$(uname -r)

# install new version, 550.54.14 is the latest as of writing this
# (the installer will offer to uninstall the old version if you skipped the manual uninstall)
wget -O NVIDIA-Linux-x86_64-550.54.14.run https://us.download.nvidia.com/XFree86/Linux-x86_64/550.54.14/NVIDIA-Linux-x86_64-550.54.14.run

chmod +x NVIDIA-Linux-x86_64-550.54.14.run
./NVIDIA-Linux-x86_64-550.54.14.run --check
# answer "no" when it asks if you want to install 32bit compatibility drivers
# answer "no" when it asks if it should update X config
./NVIDIA-Linux-x86_64-550.54.14.run
reboot

# new driver should now be installed and working
root@foobar:~# nvidia-smi 
Sun Apr 14 20:31:05 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14    Driver Version: 550.54.14    CUDA Version: 12.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A2000    On   | 00000000:82:00.0 Off |                  Off |
| 30%   32C    P8     4W /  70W |      1MiB /  6138MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Please also check whether the cgroup numbers have changed. I’ve experienced that they can change between distro upgrades (especially major versions, i.e. going from Debian 11 to Debian 12). If they have changed, update the LXC configuration file accordingly (see the installation section of this guide).

We must now upgrade the driver in the LXC container as well, since host and container need to run the same version:

# download new version
wget -O NVIDIA-Linux-x86_64-550.54.14.run https://us.download.nvidia.com/XFree86/Linux-x86_64/550.54.14/NVIDIA-Linux-x86_64-550.54.14.run

chmod +x NVIDIA-Linux-x86_64-550.54.14.run
./NVIDIA-Linux-x86_64-550.54.14.run --check
# answer "no" when it asks if you want to install 32bit compability drivers
# answer "no" when it asks if it should update X config
./NVIDIA-Linux-x86_64-550.54.14.run --no-kernel-module

root@docker1:~# nvidia-smi
Sun Apr 14 20:31:05 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14    Driver Version: 550.54.14    CUDA Version: 12.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A2000    Off  | 00000000:82:00.0 Off |                  Off |
| 30%   30C    P8     4W /  70W |      1MiB /  6138MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# update nvidia container toolkit repo + update
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt update
apt install nvidia-container-toolkit
apt upgrade

Reboot the LXC container, and things should work with the new driver.
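As a final check, you can re-run the CUDA test container from earlier and confirm that it reports the new driver version:

# should print the same driver version as the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi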

Author: n0t49a1n
