---
title: "The Redemption of Slirp and Snapshotter"
date: 2022-11-29T15:15:15+11:00
draft: false
showSummary: true
summary: "Part two in our journey of getting rootless containers working on Alpine. Join us as we fix slirp and our \
snapshotter."
series:
- "Rootless Containers on Alpine Linux"
series_order: 2
---
# Part Two: Fixing Things
## The Story So Far
> **(Ashe)** So where were we?
>
> **(Tammy)** We're illiterate, and so forth?
>
> **(Ashe)** Right, yes.

Last time, we did a bunch of prep-work for rootless containers on Alpine, but got stuck with `slirp4netns` not working
with BusyBox's `ip` program, and `rootlesskit` not supporting the devmapper snapshotter. So where are we now?
## Fixing Slirp4netns
On the former, you'll hopefully recall that we
[raised an issue](https://github.com/rootless-containers/slirp4netns/issues/304) with the slirp devs.
They suggested we install the full iproute2 suite instead of relying on BusyBox's `ip` applet (Alpine provides it in
the `iproute2` package), and this neatly solved the issue! `rootlesskit` will now happily start with
`--net=slirp4netns`. Well, that was remarkably painless. We're kinda annoyed we didn't think of it ourselves.
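If you want to check which `ip` you're actually running, here's a minimal sketch. It assumes BusyBox applets are installed as symlinks to a binary named `busybox` (which is how Alpine ships them); `ip_flavour` is our own helper name, not something from slirp4netns or Alpine.

```sh
# ip_flavour [PATH]: report whether an ip binary is a BusyBox applet.
# PATH defaults to whatever `ip` resolves to on $PATH.
ip_flavour() {
    ip_path=${1:-$(command -v ip)}
    if [ -z "$ip_path" ]; then
        echo missing
        return 0
    fi
    # Resolve symlinks: BusyBox applets point back at the busybox binary
    case "$(readlink -f "$ip_path")" in
        */busybox) echo busybox ;;
        *)         echo other ;;
    esac
}

ip_flavour
```

On a stock Alpine install we'd expect this to print `busybox`, and `other` once `doas apk add iproute2` has replaced the applet.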
## Picking a supported snapshotter
With that done, let's look at some snapshotters. The
[list of supported snapshotters](https://github.com/containerd/nerdctl/blob/main/docs/rootless.md#snapshotters) gives
us a few options.
Of the presented options, overlayfs is the default, and will happily run on our Alpine instance (which is currently
running kernel 5.15).
To get that working, we just need to remove our devmapper config and let containerd fall back to that default. Let's
go ahead and disable the devmapper snapshotter plugin while we're at it, since we know it won't work in a rootless
context.
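Before relying on overlayfs, it's worth confirming the running kernel actually has it available. A small sketch, assuming a Linux `/proc/filesystems`; the `has_fs` helper name is ours. Note that `/proc/filesystems` only lists filesystems that are built in or whose module is already loaded, so a miss may just mean the `overlay` module hasn't been loaded yet.

```sh
# has_fs NAME [TABLE]: succeed if NAME is listed in the filesystems table.
# TABLE defaults to /proc/filesystems; the parameter only exists to make testing easy.
has_fs() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' "${2:-/proc/filesystems}"
}

if has_fs overlay 2>/dev/null; then
    echo "overlayfs is available"
else
    echo "overlayfs not listed; try: doas modprobe overlay"
fi
```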
For those following along, our `config.toml` currently looks like:
```toml
version = 2
root = "/home/tammy/.local/share/containerd"
state = "/tmp/1000-runtime-dir/containerd"
disabled_plugins = ["io.containerd.grpc.v1.cri", "io.containerd.snapshotter.v1.devmapper"]
[plugins]
[grpc]
address = "/tmp/1000-runtime-dir/containerd/containerd.sock"
```
With that done, we finally have containerd running. Our final command is:
```sh
rootlesskit --net=slirp4netns --copy-up=/etc --copy-up=/run \
--state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
sh -c "rm -f /run/containerd; exec containerd -c config.toml"
```
### Cleaning up devmapper
> **(Doll)** Oh! Miss! What about the devmapper partition we created?
>
> **(Ashe)** Good catch, Doll. Let's fix that up, too.
#### Out with the old, in with the new
[If you recall]({{< ref "rootless-containers-alpine/#creating-our-nerdctl-thin-pool" >}}), we created an
{{<hover "Logical Volume" >}}LV{{< /hover >}} called `scratch` for our devmapper setup.
Since we're no longer using that, we can safely delete that LV. Let's do that with `doas lvremove /dev/data/scratch`.
We'll still need somewhere to put container images and non-persistent data, so let's recreate `scratch` as
a plain LV. We can do this with `doas lvcreate -n scratch -l 100%FREE data`: the rest of the Volume Group is already
allocated, so `100%FREE` gives the new LV exactly the space the old one just freed. If you wanted a particular size,
you could use `--size` instead of `-l`.
#### Formatting the new LV
Next up, let's format the LV as ext4 with `doas mkfs.ext4 /dev/data/scratch`:
```sh
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 7863296 4k blocks and 1966080 inodes
Filesystem UUID: b6686e80-0eae-4316-b1ac-e8544be2cd87
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
```
Looks good.
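As a quick sanity check on that output, the block count and block size tell us how big the new filesystem is. The numbers below come straight from the `mke2fs` output above; the variable names are just for illustration.

```sh
blocks=7863296    # "7863296 4k blocks" from the mke2fs output
block_kib=4       # each block is 4 KiB
size_mib=$((blocks * block_kib / 1024))
echo "${size_mib} MiB"    # prints "30716 MiB", just under 30 GiB
```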
## Getting rootless containerd to start automatically
As we noted last time, Alpine Linux uses OpenRC, which uses plain old shell scripts
for service automation. So getting containerd running rootless should be relatively painless.
The [Service Script Guide](https://github.com/OpenRC/openrc/blob/master/service-script-guide.md) tells us what
we need to do. First, we'll need to define `command`, `command_args`, and `pidfile`. Sure. Easy enough.
```sh
command="rootlesskit"
command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
--state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
pidfile="/run/${RC_SVCNAME}.pid"
```
> **(Tammy)** Huh. Interesting config path.
>
> **(Ashe)** Mm. OpenRC will automatically source any file of matching name in `/etc/conf.d/`, but we don't want that here,
> since the config file isn't a shell script. Instead, we'll create a folder of matching name, and stash the config file
> there.
>
> **(Octavia)** I'm still deeply annoyed that containerd uses toml instead of YAML, or something not written by assholes.
>
> **(Ashe)** Even pedigree aside, it's way clunkier than YAML too. Sucks on all fronts.
>
> **(Selene)** This may be one of those little things we could work on.
>
> Ashe sighs.
>
> **(Ashe)** Perhaps. The backlog continues to grow.

We also need to tell {{< hover "Start-Stop Daemon" >}}ssd{{< /hover >}} what our service depends on:
```sh
depend () {
    use net dns
    need cgroups sysctl
}
```
We'll also need to tell the start-stop daemon that we want the process started as a non-root user,
and that `rootlesskit` doesn't automatically background itself.
```sh
command_user="tammy:tammy"
command_background=true
```
Finally, let's add some documentation and niceness:
```sh
name="rootless containerd $SVCNAME"
describe () {
    echo "This service auto-starts rootless containerd as tammy (UID 1000) when the system starts."
}
```
Our (semi-)final script looks like:
```sh
#!/sbin/openrc-run
depend() {
    use net dns
    need cgroups sysctl
}
describe () {
    echo "This service auto-starts rootless containerd as tammy (UID 1000) when the system starts."
}
name="rootless containerd $SVCNAME"
command="rootlesskit"
command_user="tammy:tammy"
command_background=true
command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
--state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
pidfile="/run/${RC_SVCNAME}.pid"
```
Our final step is to place the script somewhere where OpenRC can find it.
```sh
doas chown root:root rootless-containerd
doas chmod 0755 rootless-containerd
doas mv rootless-containerd /etc/init.d/
```
We can now `doas rc-service rootless-containerd start` and... it starts!
## Finally Testing nerdctl
Okay. Finally here. Let's try something simple like `nerdctl run -it --rm alpine`:
```sh
FATA[0000] rootless containerd not running? (hint: use `containerd-rootless-setuptool.sh install` to start rootless containerd): stat /tmp/1000-runtime-dir/containerd-rootless: no such file or directory
```
Wat.
OH.
```sh
~ ls /tmp/1000-runtime-dir
containerd rootlesskit-containerd
```
`nerdctl` expects the rootlesskit state directory to be called `containerd-rootless`, not `rootlesskit-containerd`.
That's fine. Let's stop the rootless containerd service with `doas rc-service rootless-containerd stop`, and then we
can just modify the `command_args` of our service script:
```sh
command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
--state-dir=/tmp/1000-runtime-dir/containerd-rootless --disable-host-loopback \
sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
```
And then `doas rc-service rootless-containerd start`.
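To catch this kind of mismatch earlier, we can check for the state directory that `nerdctl` complained about. A sketch under a couple of assumptions: that `nerdctl` looks in `$XDG_RUNTIME_DIR/containerd-rootless` (which is what the error suggests on our box, where the runtime dir is `/tmp/1000-runtime-dir`), and that rootlesskit writes a `child_pid` file into its state directory. `rootless_state_ok` is our own helper name.

```sh
# rootless_state_ok [RUNTIME_DIR]: succeed if rootlesskit's child_pid file
# exists where nerdctl appears to look for it.
# RUNTIME_DIR defaults to $XDG_RUNTIME_DIR.
rootless_state_ok() {
    runtime_dir=${1:-${XDG_RUNTIME_DIR:-}}
    state_dir="$runtime_dir/containerd-rootless"
    if [ -f "$state_dir/child_pid" ]; then
        echo "ok: $state_dir"
    else
        echo "missing: $state_dir/child_pid"
        return 1
    fi
}

if ! rootless_state_ok /tmp/1000-runtime-dir; then
    echo "check the --state-dir passed to rootlesskit"
fi
```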
Let's try that `nerdctl run -it --rm alpine` again.
```sh
~ nerdctl run -it --rm alpine
:applet not found
```
I. What. What applet.
Okay let's try this the manual way:
```sh
~ nsenter -U --preserve-credentials -m -n -t $(cat /tmp/1000-runtime-dir/containerd-rootless/child_pid) with tammy@tammy at 13:15:41
 / export CONTAINERD_ADDRESS=/tmp/1000-runtime-dir/containerd/containerd.sock with root@tammy at 13:16:16
 / export CONTAINERD_SNAPSHOTTER=overlayfs with root@tammy at 13:16:43
 / ctr images pull docker.io/library/alpine:latest with root@tammy at 13:16:48
docker.io/library/alpine:latest: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:c0d488a800e4127c334ad20d61d7bc21b4097540327217dfab52262adc02380c: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c158987b05517b6f2c5913f3acef1f2182a32345a304fe357e3ace5fadcad715: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:49176f190c7e9cdb51ac85ab6c6d5e4512352218190cd69b08e6fd803ffbf3da: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 4.9 s total: 3.2 Mi (671.5 KiB/s)
unpacking linux/amd64 sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4...
done: 177.071479ms
 / ctr run -t --rm --fifo-dir /tmp/foo-fifo --cgroup "" docker.io/library/alpine:latest foo with root@tammy at 13:17:58
ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process:
unable to apply cgroup configuration: rootless needs no limits + no cgrouppath when no permission is granted for cgroups
mkdir /sys/fs/cgroup/foo: permission denied: unknown
 / with root@tammy at 13:18:02
```
Huh. Alright. We recently found out that cgroups for rootless containers might only work with systemd, and it seems that's the case.
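A quick way to see what we're up against is to look at the cgroup mount itself. This is a minimal diagnostic sketch: it assumes the cgroup hierarchy is mounted at `/sys/fs/cgroup` (as in the error above) and uses the presence of `cgroup.controllers` at the root as the sign of a cgroup v2 unified hierarchy. `cgroup_report` is a hypothetical helper, not part of any of the tools here.

```sh
# cgroup_report [ROOT]: print the cgroup mode and whether the current user
# can write to the cgroup root. ROOT defaults to /sys/fs/cgroup.
cgroup_report() {
    root=${1:-/sys/fs/cgroup}
    if [ -f "$root/cgroup.controllers" ]; then
        echo "mode: v2"
    else
        echo "mode: v1-or-none"
    fi
    if [ -w "$root" ]; then
        echo "writable: yes"
    else
        echo "writable: no"
    fi
}

cgroup_report
```

If `writable` comes back `no` from inside the `nsenter` shell, `runc` won't be able to create `/sys/fs/cgroup/foo` either, which matches the error we hit.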
### Some Bad News
A few hours of pain later, here's what I have.
Attempting to run containerd as our user, and then manually using `nsenter` does not work:
```sh
~ nsenter -U --preserve-credentials -m -n -t $(cat /tmp/1000-runtime-dir/containerd-rootless/child_pid)
nsenter: setns(): can't reassociate to namespace 'net': Operation not permitted
```
If we start containerd via `rc-service` instead, the same `nsenter` command works. That's likely something to do with
the exact CLI options we're using. We won't think too hard about this one.
From there, `nerdctl` breaks with an applet error, and manually `nsenter`-ing the daemon and attempting to `ctr run`
breaks because creating the cgroup directory gets a permission denied. This occurs even with cgroups disabled.
I'm truly out of my depth here. I knew going in that this was an unsupported configuration, since most rootless
implementations rely on systemd, but I wanted to try it anyway.
In my digging, I did find a guide on [rootless Docker](https://virtualzone.de/posts/alpine-docker-rootless/), and while
we really don't want to use Docker, we did try using `containerd-rootless.sh`, but this results in the same errors.
For now, we're going to have to put this aside, since we really need to get to actually migrating our workloads. We'll
keep digging in the background and see if we can discover anything, and ask some questions on the #containerd-dev Slack
channel.
## More Tea Please
With that, I heave a sigh while Doll is having us make some tea to calm down. Hope we'll have more for you all soon.
Excuse us while we turn into a jellyfish and swim away.