---
title: "The Redemption of Slirp and Snapshotter"
date: 2022-11-29T15:15:15+11:00
draft: false
showSummary: true
summary: "Part two in our journey of getting rootless containers working on Alpine. Join us as we fix slirp and our \
snapshotter."
series:
- "Rootless Containers on Alpine Linux"
series_order: 2
---

# Part Two: Fixing Things

## The Story So Far

> **(Ashe)** So where were we?
>
> **(Tammy)** We're illiterate, and so forth?
>
> **(Ashe)** Right, yes.

Last time, we did a bunch of prep-work for rootless containers on Alpine, but got stuck with `slirp4netns` not working
with BusyBox's `ip` program, and `rootlesskit` not supporting the devmapper snapshotter. So where are we now?

## Fixing Slirp4netns

On the former, you'll hopefully recall that we
[raised an issue](https://github.com/rootless-containers/slirp4netns/issues/304) with the slirp devs.

They suggested we install a non-BusyBox version of iproute2 (which Alpine provides in the `iproute2` package), and
this neatly solved the issue! `rootlesskit` will now happily start with `--net=slirp4netns`. Well, that was remarkably
painless. We're kinda annoyed we didn't think of it ourselves.
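
For anyone wanting to double-check which `ip` they're actually getting, here's a small sketch. It assumes BusyBox
applets on Alpine are symlinks to a file named `busybox` (which is how we understand Alpine sets them up), and
`ip_flavour` is just a name we made up for this example, not a real tool.

```sh
#!/bin/sh
# Sketch: report whether an `ip` binary is BusyBox's applet or something else
# (presumably the real iproute2). Assumes BusyBox applets are symlinks to a
# file named `busybox`. `ip_flavour` is a hypothetical helper.
ip_flavour() {
	target=$(readlink -f "$1") || return 1
	case "${target##*/}" in
		busybox) echo busybox ;;
		*) echo iproute2 ;;
	esac
}

# On the Alpine host, something like:
#   doas apk add iproute2
#   ip_flavour "$(command -v ip)"   # we'd expect "iproute2" after the install
```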

## Picking a supported snapshotter

With that done, let's look at some snapshotters. The
[list of supported snapshotters](https://github.com/containerd/nerdctl/blob/main/docs/rootless.md#snapshotters) gives
us a few options.

Of the presented options, overlayfs is the default, and will happily run on our Alpine instance (which is currently
running kernel 5.15).
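
If you're following along on a different kernel, it's worth a quick check. As we read nerdctl's rootless docs, rootless
overlayfs wants kernel 5.11 or newer on a non-SELinux distro like Alpine; treat that threshold as our reading of the
docs rather than gospel. `kernel_at_least` is a hypothetical helper:

```sh
#!/bin/sh
# Sketch: succeed if a kernel release string (e.g. from `uname -r`) is at
# least MAJOR.MINOR. The 5.11 threshold below is our reading of nerdctl's
# rootless docs, an assumption rather than a guarantee.
kernel_at_least() {
	release=$1 want_major=$2 want_minor=$3
	major=${release%%.*}
	rest=${release#*.}
	minor=${rest%%[!0-9]*}   # strip anything after the minor number
	[ "$major" -gt "$want_major" ] ||
		{ [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 5 11; then
	echo "kernel $(uname -r) should be new enough for rootless overlayfs"
else
	echo "kernel $(uname -r) is older than 5.11; rootless overlayfs may not work"
fi
```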

To get that working, we just need to remove our devmapper config, as overlayfs is the default. Let's go ahead and
disable the devmapper snapshotter while we're at it, since we know it won't work in a rootless context.

For those following along, our `config.toml` currently looks like:

```toml
version = 2
root = "/home/tammy/.local/share/containerd"
state = "/tmp/1000-runtime-dir/containerd"

disabled_plugins = ["io.containerd.grpc.v1.cri", "io.containerd.snapshotter.v1.devmapper"]

[plugins]

[grpc]
  address = "/tmp/1000-runtime-dir/containerd/containerd.sock"
```

With that done, we finally have containerd running. Our final command is:

```sh
rootlesskit --net=slirp4netns --copy-up=/etc --copy-up=/run \
    --state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
    sh -c "rm -f /run/containerd; exec containerd -c config.toml"
```
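
Since that one command leans on several separate binaries, a quick PATH check can save some head-scratching. This is a
minimal sketch; `missing_tools` is a hypothetical helper, and the tool list in the usage comment is just what the
command above (plus the `ip` fix from earlier) relies on.

```sh
#!/bin/sh
# Sketch: print any of the named commands that aren't on PATH, and fail if
# there are some. `missing_tools` is a hypothetical helper.
missing_tools() {
	missing=
	for tool in "$@"; do
		command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
	done
	if [ -n "$missing" ]; then
		echo "missing:$missing"
		return 1
	fi
}

# Usage on the host:
#   missing_tools rootlesskit slirp4netns containerd ip && echo "all present"
```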

### Cleaning up devmapper

> **(Doll)** Oh! Miss! What about the devmapper partition we created?
>
> **(Ashe)** Good catch, Doll. Let's fix that up, too.

#### Out with the old, in with the new

[If you recall]({{< ref "rootless-containers-alpine/#creating-our-nerdctl-thin-pool" >}}), we created an
{{<hover "Logical Volume" >}}LV{{< /hover >}} called `scratch` for our devmapper setup.
Since we're no longer using it, we can safely delete it with `doas lvremove /dev/data/scratch`.

We'll still need somewhere to put container images and non-persistent data, so let's recreate `scratch` as a plain
(non-thin) LV. We can do this with `doas lvcreate -n scratch -l 100%FREE data`, since the old `scratch` had taken up
the rest of the Volume Group anyway. If you wanted a particular size, you could use `--size` (`-L`) instead of `-l`.
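
Put together, the swap looks roughly like the sketch below. It assumes the `data` VG and `scratch` LV names from this
series; `recreate_scratch` and the `SUDO` variable are our own hypothetical knobs, with `SUDO=echo` printing the commands
instead of running them. When run for real, `lvremove` will likely still ask for confirmation before removing an active LV.

```sh
#!/bin/sh
# Sketch: replace the old devmapper `scratch` LV with a plain one.
# VG/LV names come from this series; SUDO is a hypothetical knob
# (defaults to doas; SUDO=echo prints the commands instead of running them).
recreate_scratch() {
	vg=data
	lv=scratch
	${SUDO-doas} lvremove "/dev/$vg/$lv" || return 1
	if [ -n "${1:-}" ]; then
		# A fixed size, e.g. `recreate_scratch 20G`.
		${SUDO-doas} lvcreate -n "$lv" --size "$1" "$vg" || return 1
	else
		# Everything that's left in the VG.
		${SUDO-doas} lvcreate -n "$lv" -l 100%FREE "$vg" || return 1
	fi
	# Show the result.
	${SUDO-doas} lvs "$vg"
}
```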

#### Formatting the new LV

Next up, let's format the LV as ext4 with `doas mkfs.ext4 /dev/data/scratch`:

```sh
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 7863296 4k blocks and 1966080 inodes
Filesystem UUID: b6686e80-0eae-4316-b1ac-e8544be2cd87
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
```

Looks good.
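
As a quick sanity check on the size, here's how the block count `mke2fs` reports converts to something more readable
(`blocks_to_mib` is just a name for this example):

```sh
#!/bin/sh
# Convert an mke2fs block count to MiB. Dividing the block size down to KiB
# first keeps the intermediate numbers small.
blocks_to_mib() {
	blocks=$1
	block_size=$2
	echo $((blocks * (block_size / 1024) / 1024))
}

blocks_to_mib 7863296 4096   # prints 30716, i.e. just under 30 GiB
```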

## Getting rootless containerd to start automatically

As we noted last time, Alpine Linux uses OpenRC, which uses plain old shell scripts for service automation, so getting
rootless containerd to start on boot should be relatively painless.
The [Service Script Guide](https://github.com/OpenRC/openrc/blob/master/service-script-guide.md) tells us what
we need to do. First, we'll need to define `command`, `command_args`, and `pidfile`. Sure. Easy enough.

```sh
command="rootlesskit"
command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
    --state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
    sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
pidfile="/run/${RC_SVCNAME}.pid"
```

> **(Tammy)** Huh. Interesting config path.
>
> **(Ashe)** Mm. OpenRC will automatically source a file with a matching name in `/etc/conf.d/`, but we don't want
> that here, since the config file isn't a shell script. Instead, we'll create a folder with the matching name, and
> stash the config file there.
>
> **(Octavia)** I'm still deeply annoyed that containerd uses TOML instead of YAML, or something not written by assholes.
>
> **(Ashe)** Even pedigree aside, it's way clunkier than YAML too. Sucks on all fronts.
>
> **(Selene)** This may be one of those little things we could work on.
>
> Ashe sighs.
>
> **(Ashe)** Perhaps. The backlog continues to grow.
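
Stashing the config there is only a couple of commands. Here's a sketch; `install_containerd_config`, `CONF_ROOT` and
`SUDO` are hypothetical knobs of ours (defaulting to `/etc/conf.d` and `doas`), there mostly so it can be tried out
somewhere harmless first.

```sh
#!/bin/sh
# Sketch: copy containerd's config to /etc/conf.d/rootless-containerd/config.toml.
# CONF_ROOT and SUDO are hypothetical knobs: CONF_ROOT defaults to /etc/conf.d,
# SUDO defaults to doas (set SUDO= to run the commands directly).
install_containerd_config() {
	src=$1
	dest_dir="${CONF_ROOT:-/etc/conf.d}/rootless-containerd"
	${SUDO-doas} mkdir -p "$dest_dir" &&
		${SUDO-doas} cp "$src" "$dest_dir/config.toml"
}

# On the host: install_containerd_config ./config.toml
```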

We also need to tell OpenRC what our service depends on:

```sh
depend() {
	use net dns
	need cgroups sysctl
}
```

We'll also need to tell {{< hover "start-stop-daemon" >}}ssd{{< /hover >}} that we want the process started as a
non-root user, and that `rootlesskit` doesn't automatically background itself.

```sh
command_user="tammy:tammy"
command_background=true
```

Finally, let's add some documentation and niceness:

```sh
name="rootless containerd $SVCNAME"
describe() {
	echo "This service auto-starts rootless containerd as tammy (UID 1000) when the system starts."
}
```

Our (semi-)final script looks like:

```sh
#!/sbin/openrc-run

depend() {
	use net dns
	need cgroups sysctl
}

describe() {
	echo "This service auto-starts rootless containerd as tammy (UID 1000) when the system starts."
}

name="rootless containerd $SVCNAME"
command="rootlesskit"
command_user="tammy:tammy"
command_background=true
command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
    --state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
    sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
pidfile="/run/${RC_SVCNAME}.pid"
```
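
Before installing it, one cheap check is to have the shell parse the script without running anything (`sh -n`). It
only catches shell syntax errors, not OpenRC-specific mistakes. `check_script_syntax` is a hypothetical wrapper:

```sh
#!/bin/sh
# Sketch: parse a service script without executing it. `sh -n` reads the
# commands and reports syntax errors, but runs nothing.
check_script_syntax() {
	if sh -n "$1"; then
		echo "$1: syntax OK"
	else
		echo "$1: syntax error" >&2
		return 1
	fi
}

# On the host, before moving it into /etc/init.d:
#   check_script_syntax rootless-containerd
```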

Our final step is to place the script where OpenRC can find it.

```sh
doas chown root:root rootless-containerd
doas chmod 0755 rootless-containerd
doas mv rootless-containerd /etc/init.d/
```
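
One thing these steps don't do on their own is add the service to a runlevel, and OpenRC only starts services at boot
if they're in one. A minimal sketch, assuming the `default` runlevel is the one you want; `enable_at_boot` and `SUDO`
are hypothetical helpers of ours (with `SUDO=echo` for a dry run):

```sh
#!/bin/sh
# Sketch: add a service to the default runlevel so OpenRC starts it at boot,
# then list what's in that runlevel. SUDO is a hypothetical knob
# (defaults to doas; SUDO=echo prints the commands instead).
enable_at_boot() {
	${SUDO-doas} rc-update add "$1" default &&
		${SUDO-doas} rc-update show default
}

# On the host: enable_at_boot rootless-containerd
```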

We can now `doas rc-service rootless-containerd start` and... it starts!

## Finally Testing nerdctl

Okay. Finally here. Let's try something simple like `nerdctl run -it --rm alpine`:

```sh
FATA[0000] rootless containerd not running? (hint: use `containerd-rootless-setuptool.sh install` to start rootless containerd): stat /tmp/1000-runtime-dir/containerd-rootless: no such file or directory
```

Wat.

OH.

```sh
~ ❯ ls /tmp/1000-runtime-dir
containerd rootlesskit-containerd
```

`nerdctl` expects rootlesskit's state directory to be called `containerd-rootless`, not `rootlesskit-containerd`. That's
fine. Let's stop the rootless containerd service with `doas rc-service rootless-containerd stop`, and then we can just
modify the `command_args` of our service script:

```sh
command_args="--net=slirp4netns --copy-up=/etc --copy-up=/run \
    --state-dir=/tmp/1000-runtime-dir/containerd-rootless --disable-host-loopback \
    sh -c \"rm -f /run/containerd; exec containerd -c /etc/conf.d/rootless-containerd/config.toml\""
```

And then `doas rc-service rootless-containerd start`.
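
Before poking nerdctl again, it's easy to check that the state directory is now where nerdctl looks. A sketch, assuming
`XDG_RUNTIME_DIR` points at `/tmp/1000-runtime-dir` (our guess from nerdctl's error message above); `check_rootless_state`
is a hypothetical helper that looks for the same `child_pid` file we read with `cat` later on.

```sh
#!/bin/sh
# Sketch: check for rootlesskit's state dir where nerdctl appears to look.
# Assumption: XDG_RUNTIME_DIR is /tmp/1000-runtime-dir if it isn't set.
check_rootless_state() {
	state_dir="${XDG_RUNTIME_DIR:-/tmp/1000-runtime-dir}/containerd-rootless"
	if [ -r "$state_dir/child_pid" ]; then
		echo "ok: rootlesskit child PID $(cat "$state_dir/child_pid")"
	else
		echo "missing: $state_dir/child_pid (is --state-dir set to $state_dir?)"
		return 1
	fi
}

# On the host, as tammy: check_rootless_state
```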

Let's try that `nerdctl run -it --rm alpine` again.

```sh
~ ❯ nerdctl run -it --rm alpine
:applet not found
```

I. What. What applet.

Okay, let's try this the manual way:

```sh
~ ❯ nsenter -U --preserve-credentials -m -n -t $(cat /tmp/1000-runtime-dir/containerd-rootless/child_pid)
/ ❯ export CONTAINERD_ADDRESS=/tmp/1000-runtime-dir/containerd/containerd.sock
/ ❯ export CONTAINERD_SNAPSHOTTER=overlayfs
/ ❯ ctr images pull docker.io/library/alpine:latest
docker.io/library/alpine:latest: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:c0d488a800e4127c334ad20d61d7bc21b4097540327217dfab52262adc02380c: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c158987b05517b6f2c5913f3acef1f2182a32345a304fe357e3ace5fadcad715: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:49176f190c7e9cdb51ac85ab6c6d5e4512352218190cd69b08e6fd803ffbf3da: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 4.9 s total: 3.2 Mi (671.5 KiB/s)
unpacking linux/amd64 sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4...
done: 177.071479ms
/ ❯ ctr run -t --rm --fifo-dir /tmp/foo-fifo --cgroup "" docker.io/library/alpine:latest foo
ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process:
unable to apply cgroup configuration: rootless needs no limits + no cgrouppath when no permission is granted for cgroups
mkdir /sys/fs/cgroup/foo: permission denied: unknown
```

Huh. Alright. We recently found out that rootless cgroup management might only work with systemd, and that seems to
be the case here.
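
To see what runc is up against, it can help to look at the cgroup tree directly. As far as we understand it, on
systemd hosts part of the cgroup v2 tree gets delegated to each user, while here nothing is, so an unprivileged
`mkdir /sys/fs/cgroup/foo` is denied. A small diagnostic sketch (`cgroup_report` is a hypothetical helper; run it as
the unprivileged user):

```sh
#!/bin/sh
# Sketch: report whether a cgroup v2 hierarchy is mounted at the given path
# (default /sys/fs/cgroup), which controllers it offers, and whether the
# current user may write to it (i.e. create child cgroups there).
cgroup_report() {
	root=${1:-/sys/fs/cgroup}
	if [ -f "$root/cgroup.controllers" ]; then
		echo "cgroup v2 at $root, controllers: $(cat "$root/cgroup.controllers")"
	else
		echo "no cgroup v2 hierarchy at $root"
	fi
	if [ -w "$root" ]; then
		echo "writable by $(id -un)"
	else
		echo "not writable by $(id -un)"
	fi
}

# On the host, as tammy: cgroup_report
```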

### Some Bad News

A few hours of pain later, here's what I have.

Running containerd directly as our user and then `nsenter`-ing into it manually does not work:

```sh
~ ❯ nsenter -U --preserve-credentials -m -n -t $(cat /tmp/1000-runtime-dir/containerd-rootless/child_pid)
nsenter: setns(): can't reassociate to namespace 'net': Operation not permitted
```

If we start it via `rc-service` instead, it works; this is likely something to do with the exact CLI options we're
using. We won't think too hard about this one.

From there, nerdctl breaks with an applet error, and manually `nsenter`-ing the daemon and attempting to `ctr run` breaks
because creating the container's cgroup directory gets a permission denied. This occurs even with cgroups disabled.

I'm truly out of my depth here. I knew going in that this was an unsupported configuration, since most rootless
implementations rely on systemd, but I wanted to try it anyway.

In my digging, I did find a guide on [rootless Docker](https://virtualzone.de/posts/alpine-docker-rootless/), and while
we really don't want to use Docker, we did try using `containerd-rootless.sh`, but that produced the same errors.

For now, we're going to have to put this aside, since we really need to get to actually migrating our workloads. We'll
keep digging in the background to see if we can discover anything, and ask some questions on the #containerd-dev Slack
channel.

## More Tea Please

With that, I heave a sigh while Doll is having us make some tea to calm down. Hope we'll have more for you all soon.

Excuse us while we turn into a jellyfish and swim away.