title: The Redemption of Slirp and Snapshotter
date: 2022-11-29T15:15:15+11:00
draft: false
showSummary: true
summary: Part two in our journey of getting rootless containers working on Alpine. Join us as we fix slirp and our snapshotter.
series: Rootless Containers on Alpine Linux
series_order: 2

Part Two: Fixing Things

The Story So Far

(Ashe) So where were we?

(Tammy) We're illiterate, and so forth?

(Ashe) Right, yes.

Last time, we did a bunch of prep-work for rootless containers on Alpine, but got stuck with slirp4netns not working with BusyBox's ip program, and rootlesskit not supporting the devmapper snapshotter. So where are we now?

Fixing Slirp4netns

On the former, you'll hopefully recall that we raised an issue with the slirp devs.

They suggested we install the full iproute2 instead of relying on BusyBox's ip applet (Alpine provides it in the iproute2 package), and this neatly solved the issue! rootlesskit will now happily start with --net=slirp4netns. Well, that was remarkably painless. We're kinda annoyed we didn't think of it ourselves.
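
If you want to confirm which ip you actually ended up with, here's a minimal sketch. It assumes a BusyBox applet shows up as a symlink that resolves to a binary named busybox, which is how Alpine normally ships it. The function name ip_provider is ours, not part of any tool.

```shell
# Report whether a command path resolves to BusyBox or to a standalone binary.
# Assumption: BusyBox applets are symlinks to a binary called "busybox".
ip_provider() {
  cmd_path=$1
  resolved=$(readlink -f "$cmd_path") || return 1
  case "${resolved##*/}" in
    busybox) echo "busybox" ;;
    *)       echo "standalone" ;;
  esac
}
```

On the host, that would be something like doas apk add iproute2 followed by ip_provider "$(command -v ip)".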

Picking a supported snapshotter

With that done, let's look at some snapshotters. The list of supported snapshotters gives us a few options.

Of the presented options, overlayfs is the default, and will happily run on our Alpine instance (which is currently running kernel 5.15).

To get that working, we just need to remove our devmapper config, and containerd will fall back to overlayfs. Let's go ahead and disable the devmapper snapshotter plugin while we're at it, since we know it won't work in a rootless context.

For those following along, our config.toml currently looks like:

version = 2
root = "/home/tammy/.local/share/containerd"
state = "/tmp/1000-runtime-dir/containerd"

disabled_plugins = ["io.containerd.grpc.v1.cri", "io.containerd.snapshotter.v1.devmapper"]

[plugins]

[grpc]
  address = "/tmp/1000-runtime-dir/containerd/containerd.sock"
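
The paths in that config are specific to our user, so here's a rough sketch of generating the same file for a different data root and runtime directory. render_containerd_config and its two arguments are hypothetical names of ours; the keys and plugin IDs are copied from the config above.

```shell
# Print a rootless containerd config for a given data root and runtime dir.
# Both arguments are placeholders; in this post they would be
# /home/tammy/.local/share/containerd and /tmp/1000-runtime-dir.
render_containerd_config() {
  data_root=$1
  runtime_dir=$2
  cat <<EOF
version = 2
root = "${data_root}"
state = "${runtime_dir}/containerd"

disabled_plugins = ["io.containerd.grpc.v1.cri", "io.containerd.snapshotter.v1.devmapper"]

[grpc]
  address = "${runtime_dir}/containerd/containerd.sock"
EOF
}
```

Redirect the output to wherever your config.toml lives.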

With that done, we finally have containerd running. Our final command is:

rootlesskit --net=slirp4netns --copy-up=/etc --copy-up=/run \
--state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
sh -c "rm -f /run/containerd; exec containerd -c config.toml"
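
If you're scripting around that command, it can help to wait until containerd has actually created its socket before poking at it. A minimal sketch, assuming the socket path from the [grpc] section of our config; wait_for_socket is a helper name we made up.

```shell
# Poll until a Unix socket appears at the given path, or give up.
# Returns 0 once the socket exists, 1 after roughly `timeout` seconds.
wait_for_socket() {
  sock=$1
  timeout=${2:-10}
  waited=0
  while [ ! -S "$sock" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For our setup that would be wait_for_socket /tmp/1000-runtime-dir/containerd/containerd.sock 15.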

Cleaning up devmapper

(Doll) Oh! Miss! What about the devmapper partition we created?

(Ashe) Good catch, Doll. Let's fix that up, too.

Out with the old, in with the new

[If you recall]({{< ref "rootless-containers-alpine/#creating-our-nerdctl-thin-pool" >}}), we created an {{<hover "Logical Volume" >}}LV{{< /hover >}} called scratch for our devmapper setup. Since we're no longer using that, we can safely delete that LV. Let's do that with doas lvremove /dev/data/scratch.

We'll still need somewhere to put container images and non-persistent data, so let's recreate scratch as a normal partition. We can do this with doas lvcreate -n scratch -l 100%FREE data, since we know we've consumed the rest of the Volume Group. If you wanted a particular size, you could use --size instead of -l.
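
Since lvremove is destructive, here's a cautious sketch of the two steps that only prints the commands unless told otherwise. The run helper, the DRY_RUN variable, and recreate_lv are our own conventions; the lvremove and lvcreate invocations are the ones described above.

```shell
# Run a command, or just print it when DRY_RUN=1 (the default in this sketch).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Recreate an LV as a plain volume. With no size argument it takes all
# remaining free space in the volume group; otherwise pass e.g. 10G.
recreate_lv() {
  vg=$1
  lv=$2
  size=${3:-}
  run doas lvremove "/dev/${vg}/${lv}"
  if [ -n "$size" ]; then
    run doas lvcreate -n "$lv" --size "$size" "$vg"
  else
    run doas lvcreate -n "$lv" -l 100%FREE "$vg"
  fi
}
```

recreate_lv data scratch previews our exact commands; set DRY_RUN=0 once the output looks right.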

Formatting the new LV

Next up, let's format the LV as ext4 with doas mkfs.ext4 /dev/data/scratch:

mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 7863296 4k blocks and 1966080 inodes
Filesystem UUID: b6686e80-0eae-4316-b1ac-e8544be2cd87
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Looks good.

Getting rootless containerd to start automatically

As we noted last time, Alpine Linux uses OpenRC, which uses plain old shell scripts for service automation. So getting containerd running rootless should be relatively painless. The Service Script Guide tells us what we need to do. First, we'll need to define command, command_args, and pidfile. Sure. Easy enough.

command="rootlesskit"
command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
  --state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
  sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
pidfile="/run/${RC_SVCNAME}.pid"

(Tammy) Huh. Interesting config path.

(Ashe) Mm. OpenRC will automatically source any file of matching name in /etc/conf.d/, but we don't want that here, since the config file isn't a shell script. Instead, we'll create a folder of matching name, and stash the config file there.

(Octavia) I'm still deeply annoyed that containerd uses toml instead of YAML, or something not written by assholes.

(Ashe) Even pedigree aside, it's way clunkier than YAML too. Sucks on all fronts.

(Selene) This may be one of those little things we could work on.

Ashe sighs.

(Ashe) Perhaps. The backlog continues to grow.

We also need to tell {{< hover "Start-Stop Daemon" >}}ssd{{< /hover >}} what our service depends on:

depend () {
  use net dns
  need cgroups sysctl
}

We'll also need to tell the start-stop daemon that we want the process started as a non-root user, and that rootlesskit doesn't automatically background itself.

command_user="tammy:tammy"
command_background=true

Finally, let's add some documentation and niceness:

name="rootless containerd $SVCNAME"
describe () {
	echo "This service auto-starts rootless containerd as tammy (UID 1000) when the system starts."
}

Our (semi-)final script looks like:

#!/sbin/openrc-run
depend() {
  use net dns
  need cgroups sysctl
}


describe () {
  echo "This service auto-starts rootless containerd as tammy (UID 1000) when the system starts."
}

name="rootless containerd $SVCNAME"
command="rootlesskit"
command_user="tammy:tammy"
command_background=true
command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
  --state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
  sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
pidfile="/run/${RC_SVCNAME}.pid"
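
Before installing a script like this, a quick sanity check can catch quoting mistakes in command_args. This is only a sketch: it checks the shebang and asks sh to parse the file without running it, which won't catch OpenRC-specific problems. check_service_script is a name we made up.

```shell
# Sanity-check an OpenRC service script before installing it.
# Checks the shebang, then has the shell parse (not execute) the file.
check_service_script() {
  script=$1
  first_line=$(head -n 1 "$script") || return 1
  if [ "$first_line" != "#!/sbin/openrc-run" ]; then
    echo "unexpected shebang: $first_line" >&2
    return 1
  fi
  sh -n "$script"
}
```

Something like check_service_script rootless-containerd && echo ok does the job.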

Our final step is to place the script somewhere where OpenRC can find it.

doas chown root:root rootless-containerd
doas chmod 0755 rootless-containerd
doas mv rootless-containerd /etc/init.d/

We can now doas rc-service rootless-containerd start and... it starts!
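
Starting the service by hand doesn't register it for boot. Assuming the goal is a boot-time start in the default runlevel (we haven't shown that step here), OpenRC's rc-update is the usual tool. The sketch below only prints the command unless DRY_RUN=0; run, DRY_RUN, and enable_at_boot are our own names.

```shell
# Run a command, or just print it when DRY_RUN=1 (the default in this sketch).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add a service to a runlevel so OpenRC starts it at boot.
enable_at_boot() {
  svc=$1
  runlevel=${2:-default}
  run doas rc-update add "$svc" "$runlevel"
}
```

enable_at_boot rootless-containerd previews the command for our service.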

Finally Testing nerdctl

Okay. Finally here. Let's try something simple like nerdctl run -it --rm alpine:

FATA[0000] rootless containerd not running? (hint: use `containerd-rootless-setuptool.sh install` to start rootless containerd): stat /tmp/1000-runtime-dir/containerd-rootless: no such file or directory

Wat.

OH.

~  ls /tmp/1000-runtime-dir
containerd              rootlesskit-containerd

nerdctl expects the state directory to be called containerd-rootless, not rootlesskit-containerd. That's fine. Let's stop the rootless containerd service with doas rc-service rootless-containerd stop, and then we can just modify the command_args of our service script:

command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
  --state-dir=/tmp/1000-runtime-dir/containerd-rootless --disable-host-loopback \
  sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""

And then doas rc-service rootless-containerd start.
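
To avoid tripping over this again, here's a small sketch that checks the state directory is where nerdctl will look. It assumes nerdctl derives the path as $XDG_RUNTIME_DIR/containerd-rootless, which matches the path in the error message above, and that rootlesskit writes a child_pid file there. rootless_state_dir_ok is our own helper name.

```shell
# Check that rootlesskit's state dir is where nerdctl will look for it.
# Assumption: the expected location is <runtime dir>/containerd-rootless
# and a running rootlesskit has written a non-empty child_pid file there.
rootless_state_dir_ok() {
  runtime_dir=${1:-${XDG_RUNTIME_DIR:-}}
  state_dir="${runtime_dir}/containerd-rootless"
  if [ ! -s "${state_dir}/child_pid" ]; then
    echo "no child_pid under ${state_dir}" >&2
    return 1
  fi
  echo "$state_dir"
}
```

For us, rootless_state_dir_ok /tmp/1000-runtime-dir should print the directory once the service is up.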

Let's try that nerdctl run -it --rm alpine again.

~  nerdctl run -it --rm alpine
:applet not found

I. What. What applet.

Okay let's try this the manual way:

~  nsenter -U --preserve-credentials -m -n -t $(cat /tmp/1000-runtime-dir/containerd-rootless/child_pid)      with tammy@tammy at 13:15:41
 /   export CONTAINERD_ADDRESS=/tmp/1000-runtime-dir/containerd/containerd.sock                                with root@tammy at 13:16:16
 /  export CONTAINERD_SNAPSHOTTER=overlayfs                                                                   with root@tammy at 13:16:43
 /  ctr images pull docker.io/library/alpine:latest                                                           with root@tammy at 13:16:48
docker.io/library/alpine:latest:                                                  resolved       |++++++++++++++++++++++++++++++++++++++| 
index-sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4:    done           |++++++++++++++++++++++++++++++++++++++| 
manifest-sha256:c0d488a800e4127c334ad20d61d7bc21b4097540327217dfab52262adc02380c: done           |++++++++++++++++++++++++++++++++++++++| 
layer-sha256:c158987b05517b6f2c5913f3acef1f2182a32345a304fe357e3ace5fadcad715:    done           |++++++++++++++++++++++++++++++++++++++| 
config-sha256:49176f190c7e9cdb51ac85ab6c6d5e4512352218190cd69b08e6fd803ffbf3da:   done           |++++++++++++++++++++++++++++++++++++++| 
elapsed: 4.9 s                                                                    total:  3.2 Mi (671.5 KiB/s)                                     
unpacking linux/amd64 sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4...
done: 177.071479ms	
 /  ctr run -t --rm --fifo-dir /tmp/foo-fifo --cgroup "" docker.io/library/alpine:latest foo                  with root@tammy at 13:17:58
ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process:
unable to apply cgroup configuration: rootless needs no limits + no cgrouppath when no permission is granted for cgroups
mkdir /sys/fs/cgroup/foo: permission denied: unknown
 /                                                                                                            with root@tammy at 13:18:02

Huh. Alright. We recently found out that cgroups for rootless containers might only work with systemd, and it seems that's the case.
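
The mkdir failure suggests our user has no writable cgroup subtree. Here's a minimal sketch for seeing what the current shell can do with the cgroup mount. It relies on cgroup.controllers being a cgroup v2 interface file that v1 hierarchies don't have; the output format and the name cgroup_report are ours.

```shell
# Describe the cgroup setup visible at a mount point (default /sys/fs/cgroup).
# cgroup.controllers is a cgroup v2 interface file; v1 hierarchies lack it.
cgroup_report() {
  root=${1:-/sys/fs/cgroup}
  if [ -f "${root}/cgroup.controllers" ]; then
    echo "mode=v2"
  else
    echo "mode=not-v2"
  fi
  # A rootless runtime needs to create directories here to apply limits.
  if [ -w "$root" ]; then
    echo "writable=yes"
  else
    echo "writable=no"
  fi
}
```

Running cgroup_report inside the nsenter session should show whether runc has any chance of creating its cgroup directory.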

Some Bad News

A few hours of pain later, here's what I have.

Attempting to run containerd as our user, and then manually using nsenter does not work:

~  nsenter -U --preserve-credentials -m -n -t $(cat /tmp/1000-runtime-dir/containerd-rootless/child_pid)
nsenter: setns(): can't reassociate to namespace 'net': Operation not permitted

If we run it via rc-service, it works. Likely something to do with the exact CLI options we're using. Won't think too hard about this one.

From there, nerdctl breaks with an applet error, and manually nsenter-ing the daemon and attempting to ctr run breaks because creating the cgroup directory gets a permission denied. This occurs even with cgroups disabled.

I'm truly out of my depth here. I knew going in that this was an unsupported configuration, since most rootless implementations rely on systemd, but I wanted to try it anyway.

In my digging, I did find a guide on rootless Docker. We really don't want to use Docker, but we did try using containerd-rootless.sh, and it resulted in the same errors.

For now, we're going to have to put this aside, since we really need to get to actually migrating our workloads. We'll keep digging in the background and see if we can discover anything, and ask some questions on the #containerd-dev Slack channel.

More Tea Please

With that, I heave a sigh while Doll is having us make some tea to calm down. Hope we'll have more for you all soon.

Excuse us while we turn into a jellyfish and swim away.