From cb647c981acbc25054362fe9ba3f891851a1bf47 Mon Sep 17 00:00:00 2001
From: Rin
Date: Tue, 29 Nov 2022 16:44:11 +1100
Subject: [PATCH] Add second rootless post

---
 content/posts/rootless-containers-2.md | 279 +++++++++++++++++++++++++
 1 file changed, 279 insertions(+)
 create mode 100644 content/posts/rootless-containers-2.md

diff --git a/content/posts/rootless-containers-2.md b/content/posts/rootless-containers-2.md
new file mode 100644
index 0000000..613aa69
--- /dev/null
+++ b/content/posts/rootless-containers-2.md
@@ -0,0 +1,279 @@
+---
+title: "The Redemption of Slirp and Snapshotter"
+date: 2022-11-29T15:15:15+11:00
+draft: false
+showSummary: true
+summary: "Part two in our journey of getting rootless containers working on Alpine. Join us as we fix slirp and our \
+snapshotter."
+series:
+  - "Rootless Containers on Alpine Linux"
+series_order: 2
+---
+
+
+# Part Two: Fixing Things
+
+## The Story So Far
+> **(Ashe)** So where were we?
+>
+> **(Tammy)** We're illiterate, and so forth?
+>
+> **(Ashe)** Right, yes.
+
+Last time, we did a bunch of prep-work for rootless containers on Alpine, but got stuck with `slirp4netns` not working
+with BusyBox's `ip` program, and `rootlesskit` not supporting the devmapper snapshotter. So where are we now?
+
+## Fixing Slirp4netns
+On the former, you'll hopefully recall that we
+[raised an issue](https://github.com/rootless-containers/slirp4netns/issues/304) with the slirp devs.
+
+They suggested we install a non-busybox version of iproute2 (which Alpine provides in the `iproute2` package), and
+this neatly solved the issue! `rootlesskit` will now happily start with `--net=slirp4netns`. Well that was remarkably
+painless. We're kinda annoyed we didn't think of it ourselves.
+
+## Picking a supported snapshotter
+
+With that done, let's look at some snapshotters. The
+[list of supported snapshotters](https://github.com/containerd/nerdctl/blob/main/docs/rootless.md#snapshotters) gives
+us a few options.
+
+Of the presented options, overlayfs is the default, and will happily run on our Alpine instance (which is currently
+running kernel 5.15).
+
+To get that working, we just need to remove our devmapper config, as overlayfs is the default. Let's go ahead and
+disable the devmapper snapshotter while we're at it, since we know it won't work in a rootless context.
+
+For those following along, our `config.toml` currently looks like:
+```toml
+version = 2
+root = "/home/tammy/.local/share/containerd"
+state = "/tmp/1000-runtime-dir/containerd"
+
+disabled_plugins = ["io.containerd.grpc.v1.cri", "io.containerd.snapshotter.v1.devmapper"]
+
+[plugins]
+
+[grpc]
+  address = "/tmp/1000-runtime-dir/containerd/containerd.sock"
+```
+
+With that done, we finally have containerd running. Our final command is:
+```sh
+rootlesskit --net=slirp4netns --copy-up=/etc --copy-up=/run \
+--state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
+sh -c "rm -f /run/containerd; exec containerd -c config.toml"
+```
+
+### Cleaning up devmapper
+
+> **(Doll)** Oh! Miss! What about the devmapper partition we created?
+>
+> **(Ashe)** Good catch, Doll. Let's fix that up, too.
+
+
+#### Out with the old, in with the new
+
+[If you recall]({{< ref "rootless-containers-alpine/#creating-our-nerdctl-thin-pool" >}}), we created an
+{{< hover "Logical Volume" >}}LV{{< /hover >}} called `scratch` for our devmapper setup.
+Since we're no longer using that, we can safely delete that LV. Let's do that with `doas lvremove /dev/data/scratch`.
+
+We'll still need somewhere to put container images and non-persistent data, so let's recreate scratch as
+a normal partition. We can do this with `doas lvcreate -n scratch -l 100%FREE data`, since we know we've consumed the
+rest of the Volume Group. If you wanted a particular size, you could use `--size` instead of `-l`.
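For anyone who'd rather not hand the whole Volume Group to `scratch`, here's a minimal sketch of the two `lvcreate` variants side by side. The `scratch_lv_cmd` helper and the `20G` figure are made up for illustration and aren't part of our actual setup. It only prints the command, so you can eyeball it before running it yourself with `doas`.

```shell
#!/bin/sh
# Hypothetical helper: print the lvcreate command for the scratch LV.
# No argument: take all remaining space in the "data" Volume Group.
# One argument (e.g. 20G): request that fixed size instead.
scratch_lv_cmd() {
    size="${1:-}"
    if [ -n "$size" ]; then
        echo "lvcreate -n scratch --size $size data"
    else
        echo "lvcreate -n scratch -l 100%FREE data"
    fi
}

scratch_lv_cmd        # the variant we used
scratch_lv_cmd 20G    # fixed-size variant; 20G is just an example value
```

Printing rather than executing is deliberate here: LVM changes need root, and it's nicer to read the command once more before committing to it.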
+
+#### Formatting the new LV
+
+Next up, let's format the LV as ext4 with `doas mkfs.ext4 /dev/data/scratch`:
+```sh
+mke2fs 1.46.5 (30-Dec-2021)
+Discarding device blocks: done
+Creating filesystem with 7863296 4k blocks and 1966080 inodes
+Filesystem UUID: b6686e80-0eae-4316-b1ac-e8544be2cd87
+Superblock backups stored on blocks:
+    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
+    4096000
+
+Allocating group tables: done
+Writing inode tables: done
+Creating journal (32768 blocks): done
+Writing superblocks and filesystem accounting information: done
+```
+
+Looks good.
+
+## Getting rootless containerd to start automatically
+
+As we noted last time, Alpine Linux uses OpenRC, which uses plain old shell scripts
+for service automation. So getting containerd running rootless should be relatively painless.
+The [Service Script Guide](https://github.com/OpenRC/openrc/blob/master/service-script-guide.md) tells us what
+we need to do. First, we'll need to define `command`, `command_args`, and `pidfile`. Sure. Easy enough.
+```sh
+command="rootlesskit"
+command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
+    --state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
+    sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
+pidfile="/run/${RC_SVCNAME}.pid"
+```
+> **(Tammy)** Huh. Interesting config path.
+>
+> **(Ashe)** Mm. OpenRC will automatically source any file of matching name in `/etc/conf.d/`, but we don't want that
+> here, since the config file isn't a shell script. Instead, we'll create a folder of matching name, and stash the
+> config file there.
+>
+> **(Octavia)** I'm still deeply annoyed that containerd uses toml instead of YAML, or something not written by assholes.
+>
+> **(Ashe)** Even pedigree aside, it's way clunkier than YAML too. Sucks on all fronts.
+>
+> **(Selene)** This may be one of those little things we could work on.
+>
+> Ashe sighs.
+>
+> **(Ashe)** Perhaps. The backlog continues to grow.
+
+We also need to tell {{< hover "Start-Stop Daemon" >}}ssd{{< /hover >}} what our service depends on:
+```sh
+depend () {
+    use net dns
+    need cgroups sysctl
+}
+```
+
+We'll also need to tell the start-stop daemon that we want the process started as a non-root user,
+and that rootlesskit doesn't automatically background itself.
+
+```sh
+command_user="tammy:tammy"
+command_background=true
+```
+
+Finally, let's add some documentation and niceness:
+```sh
+name="rootless containerd $SVCNAME"
+describe () {
+    echo "This service auto-starts rootless containerd as tammy (UID 1000) when the system starts."
+}
+```
+
+Our (semi-)final script looks like:
+```sh
+#!/sbin/openrc-run
+depend() {
+    use net dns
+    need cgroups sysctl
+}
+
+
+describe () {
+    echo "This service auto-starts rootless containerd as tammy (UID 1000) when the system starts."
+}
+
+name="rootless containerd $SVCNAME"
+command="rootlesskit"
+command_user="tammy:tammy"
+command_background=true
+command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
+    --state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
+    sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
+pidfile="/run/${RC_SVCNAME}.pid"
+```
+
+Our final step is to place the script somewhere where OpenRC can find it.
+```sh
+doas chown root:root rootless-containerd
+doas chmod 0755 rootless-containerd
+doas mv rootless-containerd /etc/init.d/
+```
+
+We can now `doas rc-service rootless-containerd start` and... it starts!
+
+## Finally Testing nerdctl
+
+Okay. Finally here. Let's try something simple like `nerdctl run -it --rm alpine`:
+
+```sh
+FATA[0000] rootless containerd not running? (hint: use `containerd-rootless-setuptool.sh install` to start rootless containerd): stat /tmp/1000-runtime-dir/containerd-rootless: no such file or directory
+```
+
+Wat.
+
+OH.
+
+```sh
+~ ❯ ls /tmp/1000-runtime-dir
+containerd rootlesskit-containerd
+```
+
+`nerdctl` expects the folder to be called something different. That's fine. Let's stop the rootless containerd service
+with `doas rc-service rootless-containerd stop`, and then we can just modify the `command_args` of our service script:
+```sh
+command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
+    --state-dir=/tmp/1000-runtime-dir/containerd-rootless --disable-host-loopback \
+    sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
+```
+
+And then `doas rc-service rootless-containerd start`.
+
+Let's try that `nerdctl run -it --rm alpine` again.
+
+```sh
+~ ❯ nerdctl run -it --rm alpine
+:applet not found
+```
+
+I. What. What applet.
+
+Okay let's try this the manual way:
+
+```sh
+~ ❯ nsenter -U --preserve-credentials -m -n -t $(cat /tmp/1000-runtime-dir/containerd-rootless/child_pid)    with tammy@tammy at 13:15:41
+ / ❯ export CONTAINERD_ADDRESS=/tmp/1000-runtime-dir/containerd/containerd.sock    with root@tammy at 13:16:16
+ / ❯ export CONTAINERD_SNAPSHOTTER=overlayfs    with root@tammy at 13:16:43
+ / ❯ ctr images pull docker.io/library/alpine:latest    with root@tammy at 13:16:48
+docker.io/library/alpine:latest: resolved |++++++++++++++++++++++++++++++++++++++|
+index-sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4: done |++++++++++++++++++++++++++++++++++++++|
+manifest-sha256:c0d488a800e4127c334ad20d61d7bc21b4097540327217dfab52262adc02380c: done |++++++++++++++++++++++++++++++++++++++|
+layer-sha256:c158987b05517b6f2c5913f3acef1f2182a32345a304fe357e3ace5fadcad715: done |++++++++++++++++++++++++++++++++++++++|
+config-sha256:49176f190c7e9cdb51ac85ab6c6d5e4512352218190cd69b08e6fd803ffbf3da: done |++++++++++++++++++++++++++++++++++++++|
+elapsed: 4.9 s total: 3.2 Mi (671.5 KiB/s)
+unpacking linux/amd64 sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4...
+done: 177.071479ms
+ / ❯ ctr run -t --rm --fifo-dir /tmp/foo-fifo --cgroup "" docker.io/library/alpine:latest foo    with root@tammy at 13:17:58
+ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process:
+unable to apply cgroup configuration: rootless needs no limits + no cgrouppath when no permission is granted for cgroups
+mkdir /sys/fs/cgroup/foo: permission denied: unknown
+ / ❯    with root@tammy at 13:18:02
+```
+
+Huh. Alright. We recently found out that cgroups might only work with systemd, and it seems that's the case.
+
+### Some Bad News
+
+A few hours of pain later, here's what I have.
+
+Attempting to run containerd as our user, and then manually using `nsenter` does not work:
+```sh
+~ ❯ nsenter -U --preserve-credentials -m -n -t $(cat /tmp/1000-runtime-dir/containerd-rootless/child_pid)
+nsenter: setns(): can't reassociate to namespace 'net': Operation not permitted
+```
+If we run it via `rc-service`, it works. Likely something to do with the exact CLI options we're using. Won't think too
+hard about this one.
+
+From there, nerdctl breaks with an applet error, and manually `nsenter`-ing the daemon and attempting to `ctr run` breaks
+because writing the cgroup file gets a permission denied. This occurs even with cgroups disabled.
+
+I'm truly out of my depth here. I knew going in that this was an unsupported configuration, since most rootless
+implementations rely on systemd, but I wanted to try it anyway.
+
+In my digging, I did find a guide on [rootless Docker](https://virtualzone.de/posts/alpine-docker-rootless/), and while
+we really don't want to use Docker, we did try using `containerd-rootless.sh`, but this results in the same errors.
+
+For now, we're going to have to put this aside, since we really need to get to actually migrating our workloads. We'll
+keep digging in the background and see if we can discover anything, and ask some questions on the #containerd-dev Slack
+channel.
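Since the `mkdir /sys/fs/cgroup/foo: permission denied` error is the thing we'll be poking at next, here's a rough diagnostic sketch for it. This is our own assumption about where to start looking, not a fix and not anything the containerd folks have told us. It leans on the fact that a cgroup v2 root exposes a `cgroup.controllers` file, and then simply asks whether the current user could create a child directory there. The `check_cgroup_root` name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical diagnostic: inspect a cgroup mount point (default /sys/fs/cgroup).
# Line 1: whether it looks like a cgroup v2 root (cgroup.controllers present).
# Line 2: whether the current user may create child cgroups directly under it.
check_cgroup_root() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo "hierarchy: cgroup v2 (unified)"
    else
        echo "hierarchy: not a cgroup v2 root"
    fi
    if [ -w "$root" ]; then
        echo "writable: yes"
    else
        echo "writable: no"
    fi
}

check_cgroup_root
```

If this reports `writable: no` when run as our unprivileged user, that would at least line up with the `permission denied` from `runc`, and would be something concrete to bring along when we ask for help.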
+
+## More Tea Please
+
+With that, I heave a sigh while Doll is having us make some tea to calm down. Hope we'll have more for you all soon.
+
+Excuse us while we turn into a jellyfish and swim away.
+