publish part1-rootless
parent
f16f9c2468
commit
cc76809638
|
@ -1,12 +1,16 @@
|
|||
---
|
||||
title: "Rootless Containers on Alpine"
|
||||
date: 2022-10-12T22:17:15+11:00
|
||||
draft: true
|
||||
date: 2022-11-08T19:30:15+11:00
|
||||
draft: false
|
||||
showSummary: true
|
||||
|
||||
summary: "We recently murdered a server's terminal via `do_distro_upgrade`, and thought it'd be a good time to learn
|
||||
more about containers and Alpine."
|
||||
series:
|
||||
- "Rootless Containers on Alpine Linux"
|
||||
series_order: 1
|
||||
---
|
||||
|
||||
# Rootless Containers on Alpine Linux
|
||||
# Part One: Prep Work
|
||||
|
||||
## Background
|
||||
**(Ashe)**
|
||||
|
@ -17,7 +21,7 @@ So. We recently murdered a server's terminal via `do_distro_upgrade`.
|
|||
**(Ashe)** Yes.
|
||||
|
||||
```
|
||||
|
||||
% man 7z
|
||||
WARNING: terminal is not fully functional
|
||||
- (press RETURN)%
|
||||
```
|
||||
|
@ -26,20 +30,26 @@ It was in fact *that bad*. So we figured, well, we can spend a few hours, days,
|
|||
**(Tammy)** Or we could just build a new server!
|
||||
|
||||
**(Ashe)** Right.
|
||||
|
||||
So, after asking some friends about their opinions, we settled on Alpine Linux. And why not also migrate all of our
|
||||
pm2 workloads to containers while we're at it? We've been meaning to learn more about containers for a while now.
|
||||
|
||||
So off we go!
|
||||
|
||||
## Prep Work
|
||||
|
||||
|
||||
We need a few things before we actually set up rootless containers. We'll be following along with the
|
||||
[Official Rootless Containers Tutorial](https://rootlesscontaine.rs/getting-started/common/),
|
||||
making adjustments as necessary.
|
||||
|
||||
### Login Information
|
||||
|
||||
|
||||
Most Rootless Container implementations use `$XDG_RUNTIME_DIR` to find the user's ID and where their runtime lives
|
||||
(usually some subdir of `/run/user/`).
|
||||
Systemd-based Linux distros will handle this automatically, but Alpine uses
|
||||
[OpenRC](https://wiki.alpinelinux.org/wiki/OpenRC), which does not do this automatically.
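To make that lookup concrete, here's a minimal sketch of how a tool might resolve the runtime directory. The function name `resolve_runtime_dir` and the `/tmp/<uid>-runtime-dir` fallback are our own assumptions for illustration (the fallback mirrors the path that shows up in our `config.toml` later in this post); real implementations may well behave differently.

```python
import os


def resolve_runtime_dir(env=None, uid=None):
    """Return the per-user runtime directory a rootless tool could use.

    Illustrative only: `env` and `uid` are injectable so the logic can be
    exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    uid = os.getuid() if uid is None else uid

    # Prefer whatever the login environment exported.
    xdg = env.get("XDG_RUNTIME_DIR")
    if xdg:
        return xdg

    # Hypothetical fallback for systems where nothing sets the variable.
    return f"/tmp/{uid}-runtime-dir"


if __name__ == "__main__":
    print(resolve_runtime_dir())
```

On a systemd distro this would typically print something under `/run/user/`; on a fresh Alpine install with nothing exporting the variable, it falls through to the made-up default.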
|
||||
|
||||
|
||||
While Alpine doesn't provide a tutorial for Rootless Containers, we can adapt some of the prep work done for
|
||||
[Wayland](https://wiki.alpinelinux.org/wiki/Wayland) to get OpenRC to set `$XDG_RUNTIME_DIR` for us.
|
||||
|
||||
We just create `/etc/profile.d/xdg_runtime_dir.sh` like so:
|
||||
```sh
|
||||
|
@ -65,7 +75,7 @@ With that done, we can move onto our next steps.
|
|||
There's some sysctl config required for older distros, but this isn't required for Alpine.
|
||||
|
||||
### User Namespace Configuration
|
||||
|
||||
Rootless Containers use User Namespaces, subUIDs, and subGIDs, so we'll need to have those working.
|
||||
The apk package `shadow-subids` provides that functionality for us.
|
||||
```
|
||||
~ ❯ apk info shadow-subids
|
||||
|
@ -82,7 +92,8 @@ shadow-subids-4.10-r3 installed size:
|
|||
### Sub-ID Counts
|
||||
Rootless Containers generally expect `/etc/subuid` and `/etc/subgid` to contain at least 65,536 sub-IDs for each user.
|
||||
`shadow-subids` does create these files for us, but leaves them empty by default, so let's go ahead and populate them.
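For anyone wanting to check an existing file against that 65,536 expectation, here's a rough sketch. It assumes the usual `name:start:count` line format of `/etc/subuid` and `/etc/subgid`; the helper names and the sample users are ours, purely for illustration.

```python
MIN_SUBIDS = 65536  # the minimum most rootless tooling expects per user


def subid_count(subid_text, user):
    """Sum the sub-ID counts granted to `user` in subuid/subgid-style text."""
    total = 0
    for line in subid_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _start, count = line.split(":")
        if name == user:
            total += int(count)
    return total


def has_enough_subids(subid_text, user, minimum=MIN_SUBIDS):
    """True if `user` has at least `minimum` sub-IDs in total."""
    return subid_count(subid_text, user) >= minimum


if __name__ == "__main__":
    sample = "tammy:100000:65536\nashe:165536:1000\n"  # hypothetical entries
    for user in ("tammy", "ashe"):
        print(user, has_enough_subids(sample, user))
```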
|
||||
|
||||
The [page on subIDs](https://rootlesscontaine.rs/getting-started/common/subuid/) provides a handy Python script
|
||||
to do that for us, which we'll edit slightly so it's not writing directly to system files:
|
||||
```python
|
||||
f = open("subuid", "w")
|
||||
for uid in range(1000, 65536):
|
||||
|
@ -102,7 +113,8 @@ This is probably overkill for our use-case, but that's also fine.
|
|||
With that done, we can move onto the last prep step.
|
||||
|
||||
### CGroups V2
|
||||
|
||||
To limit resources that a container can use, we need to enable CGroups V2.
|
||||
In OpenRC, this can be done by changing some options in `/etc/rc.conf`.
|
||||
|
||||
To enable CGroups in general, we need to set `rc_controller_cgroups` to `YES`
|
||||
```sh
|
||||
|
@ -121,10 +133,10 @@ From here, we can enable CGroups V2 by setting `rc_cgroup_mode` to `unified`
|
|||
rc_cgroup_mode="unified"
|
||||
```
|
||||
|
||||
|
||||
**(Doll)** Doll confused.
|
||||
|
||||
|
||||
**(Ashe)** So was I, for a bit. Despite what `rc.conf` says, cgroups V2 does *not* seem to be enabled on Alpine
|
||||
unless `rc_cgroup_mode` is set to `unified`. The [Alpine Wiki](https://wiki.alpinelinux.org/wiki/OpenRC#cgroups\_v2)
|
||||
seems to agree here, but isn't super clear. We'll find out if this is sufficient.
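One way to check which mode actually ended up active is to look at how the cgroup filesystem is laid out. The sketch below relies on the kernel exposing a `cgroup.controllers` file at the root of a cgroup v2 hierarchy; the `unified` subdirectory check for hybrid mode and the function name are our assumptions about how OpenRC arranges things, so treat it as a hint rather than a definitive test.

```python
from pathlib import Path


def cgroup_mode(root="/sys/fs/cgroup"):
    """Guess the cgroup layout from the files present under `root`."""
    root = Path(root)
    # A cgroup v2 hierarchy has cgroup.controllers at its top level.
    if (root / "cgroup.controllers").is_file():
        return "unified"
    # Assumption: hybrid setups mount cgroup v2 under a "unified" subdir.
    if (root / "unified" / "cgroup.controllers").is_file():
        return "hybrid"
    return "legacy-or-disabled"


if __name__ == "__main__":
    print(cgroup_mode())
```

If this prints anything other than `unified` after setting `rc_cgroup_mode`, the setting probably hasn't taken effect yet.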
|
||||
|
||||
|
||||
|
@ -144,8 +156,436 @@ From here, we can reboot, and continue on. If you don't want to reboot, you can
|
|||
rc-service cgroups start
|
||||
```
|
||||
|
||||
## Creating a group for our container users
|
||||
|
||||
We'll quickly create a group for all users who'll be using rootless containers here. In Alpine, this is as simple as
|
||||
`doas addgroup ctr`. We'll make use of this later.
|
||||
|
||||
## Installing containerd and friends
|
||||
First up we'll need to install `containerd` (to host our containers) and
|
||||
`slirp4netns` (to provide user-mode networking for the container's network namespace, with lower overhead than VPNKit), so we just:
|
||||
```sh
|
||||
doas apk add containerd
|
||||
doas apk add slirp4netns
|
||||
```
|
||||
|
||||
Next, we need to install `nerdctl` and `rootlesskit`. Both of these are currently only found inside
|
||||
the `testing` repo for Alpine. We can pull them in without subscribing to the entire testing repo like so:
|
||||
```sh
|
||||
doas apk add -X https://dl-cdn.alpinelinux.org/alpine/edge/testing/ nerdctl
|
||||
doas apk add -X https://dl-cdn.alpinelinux.org/alpine/edge/testing/ rootlesskit
|
||||
```
|
||||
|
||||
## Configuring the Rootless containerd service
|
||||
|
||||
We'll be using nerdctl as our containerd controller of choice. It comes with a rootless containerd.service,
|
||||
but since Alpine doesn't use systemd, we'll have to adapt this into an rc service.
|
||||
|
||||
We spent some time trying to adapt the [install script](https://github.com/containerd/nerdctl/blob/48f189a53a24c12838433f5bb5dd57f536816a8a/extras/rootless/containerd-rootless-setuptool.sh)
|
||||
nerdctl provides to our purposes. However, it's a bit excessive for what we need,
|
||||
so we'll just do it the "[hard way](https://github.com/containerd/containerd/blob/main/docs/rootless.md)".
|
||||
|
||||
**(Tammy)** Wait, this isn't the "hard way", is it?
|
||||
|
||||
**(Ashe)** Nope. Adapting a 500-line script would be hard and annoying. We're better served by just doing it manually,
|
||||
and providing instructions for anyone following along. So in that vein:
|
||||
|
||||
### Getting containerd running in rootlesskit
|
||||
First, let's get containerd running at the CLI, and then we can make it into an OpenRC Script.
|
||||
We'll need a `config.toml`, but it can be pretty minimal:
|
||||
```toml
|
||||
version = 2
|
||||
root = "/home/tammy/.local/share/containerd"
|
||||
state = "/tmp/1000-runtime-dir/containerd"
|
||||
|
||||
[grpc]
|
||||
address = "/tmp/1000-runtime-dir/containerd/containerd.sock"
|
||||
```
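The paths above are hard-coded for one user. If you're following along with a different account, a small sketch like this can render the same file from a home directory and runtime directory. The template and the `render_config` name are ours; the keys are just the ones from the config above.

```python
CONFIG_TEMPLATE = """\
version = 2
root = "{home}/.local/share/containerd"
state = "{runtime_dir}/containerd"

[grpc]
  address = "{runtime_dir}/containerd/containerd.sock"
"""


def render_config(home, runtime_dir):
    """Fill the minimal containerd config with per-user paths."""
    return CONFIG_TEMPLATE.format(home=home, runtime_dir=runtime_dir)


if __name__ == "__main__":
    # Values taken from the config shown above.
    print(render_config("/home/tammy", "/tmp/1000-runtime-dir"))
```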
|
||||
First try:
|
||||
```sh
|
||||
~ ❯ rootlesskit --net=slirp4netns --copy-up=/etc --copy-up=/run \
|
||||
--state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
|
||||
sh -c "rm -f /run/containerd; exec containerd -c config.toml"
|
||||
|
||||
BusyBox v1.35.0 (2022-08-01 15:14:44 UTC) multi-call binary.
|
||||
|
||||
Usage: ip [OPTIONS] address|route|link|tunnel|neigh|rule [ARGS]
|
||||
|
||||
OPTIONS := -f[amily] inet|inet6|link | -o[neline]
|
||||
|
||||
ip addr add|del IFADDR dev IFACE | show|flush [dev IFACE] [to PREFIX]
|
||||
ip route list|flush|add|del|change|append|replace|test ROUTE
|
||||
ip link set IFACE [up|down] [arp on|off] [multicast on|off]
|
||||
[promisc on|off] [mtu NUM] [name NAME] [qlen NUM] [address MAC]
|
||||
[master IFACE | nomaster] [netns PID]
|
||||
ip tunnel add|change|del|show [NAME]
|
||||
[mode ipip|gre|sit] [remote ADDR] [local ADDR] [ttl TTL]
|
||||
ip neigh show|flush [to PREFIX] [dev DEV] [nud STATE]
|
||||
ip rule [list] | add|del SELECTOR ACTION
|
||||
[rootlesskit:parent] error: failed to setup network &{logWriter:0xc00014aa00 binary:slirp4netns mtu:65520 ipnet:<nil> disableHostLoopback:true apiSocketPath: enableSandbox:false enableSeccomp:false enableIPv6:false ifname:tap0 infoMu:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} info:<nil>}: setting up tap tap0: executing [[nsenter -t 28611 -n -m -U --preserve-credentials ip tuntap add name tap0 mode tap] [nsenter -t 28611 -n -m -U --preserve-credentials ip link set tap0 up]]: exit status 1
|
||||
[rootlesskit:child ] error: parsing message from fd 3: EOF
|
||||
```
|
||||
|
||||
**(Doll)** That looks like it broke, Miss.
|
||||
|
||||
**(Ashe)** *sigh*, yeah, that's broken alright. That output looks like ip didn't like the command supplied to it, so let's find out what that was.
|
||||
|
||||
Some troubleshooting later, it looks like this is down to BusyBox's implementation of `ip`, which doesn't offer the `tuntap` subcommand that rootlesskit calls. We've raised
|
||||
[an issue](https://github.com/rootless-containers/slirp4netns/issues/304), and we'll see how that goes.
|
||||
In the meantime, we'll just have to use native networking. This means we can't apply firewall rules per-container, which
|
||||
is moderately annoying, but won't actually hinder deployment. Just makes securing the deployment more annoying.
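The clue is right there in the usage text BusyBox printed. As a quick sketch, we can pull the supported objects out of that line and see that `tuntap` (which the failing `ip tuntap add` command needs) isn't among them. The regular expression is written for BusyBox's usage format specifically, and `ip_objects` is just our name for the helper.

```python
import re

# Usage line copied from the BusyBox output above.
BUSYBOX_USAGE = "Usage: ip [OPTIONS] address|route|link|tunnel|neigh|rule [ARGS]"


def ip_objects(usage_text):
    """Extract the object names from a BusyBox-style `ip` usage line."""
    match = re.search(r"Usage: ip \[OPTIONS\] (\S+)", usage_text)
    return match.group(1).split("|") if match else []


if __name__ == "__main__":
    objects = ip_objects(BUSYBOX_USAGE)
    print(objects)
    print("tuntap supported:", "tuntap" in objects)
```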
|
||||
|
||||
So let's try without the `--net=slirp4netns` (omitting anything that's INFO):
|
||||
```sh
|
||||
~ ❯ rootlesskit --copy-up=/etc --copy-up=/run \
|
||||
--state-dir=/tmp/1000-runtime-dir/rootlesskit-containerd --disable-host-loopback \
|
||||
sh -c "rm -f /run/containerd; exec containerd -c config.toml"
|
||||
WARN[2022-11-03T11:32:53.207241941+11:00] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
|
||||
WARN[2022-11-03T11:32:53.227691744+11:00] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
|
||||
WARN[2022-11-03T11:32:53.233006449+11:00] failed to load plugin io.containerd.internal.v1.opt error="mkdir /opt/containerd: permission denied"
|
||||
ERRO[2022-11-03T11:32:53.235151641+11:00] failed to load cni during init, please check CRI plugin status before setting up network for pods error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
|
||||
```
|
||||
|
||||
A few things of note here:
|
||||
```sh
|
||||
WARN[2022-11-03T11:32:53.233006449+11:00] failed to load plugin io.containerd.internal.v1.opt error="mkdir /opt/containerd: permission denied"
|
||||
[...]
|
||||
ERRO[2022-11-03T11:32:53.235151641+11:00] failed to load cni during init, please check CRI plugin status before setting up network for pods error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
|
||||
```
|
||||
|
||||
The warning tells us that it tried to create /opt/containerd, but was unable to. This is easy enough to fix:
|
||||
```sh
|
||||
~ ❯ doas mkdir /opt/containerd
|
||||
~ ❯ doas chmod 2770 /opt/containerd
|
||||
~ ❯ doas chown root:ctr /opt/containerd #Replace the username and group here as necessary
|
||||
```
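If `2770` looks opaque, Python's standard `stat` module can spell it out. This is only a decoding aid for the mode used above: the leading `2` is the setgid bit, which on a directory means new files inherit the directory's group (`ctr` in our case). The helper name is ours.

```python
import stat


def describe_dir_mode(mode):
    """Render a directory's permission bits the way `ls -l` would."""
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    mode = 0o2770
    print(describe_dir_mode(mode))                 # owner rwx, group rws, others none
    print("setgid set:", bool(mode & stat.S_ISGID))
```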
|
||||
|
||||
The error is more interesting. CRI here stands for [Container Runtime Interface](https://github.com/containerd/cri), and
|
||||
it seems to be used for Kubernetes. Since we won't be using Kubernetes here, we can just disable it by adding
|
||||
`disabled_plugins = ["io.containerd.grpc.v1.cri"]` to our `config.toml`.
|
||||
|
||||
**(Tammy)** If you *are* interested in Kubernetes, make sure to check out our [Home Server Build-Out]({{< ref "home-server-build-out" >}}) series. We're planning on setting up an entire cloud environment there.
|
||||
|
||||
Let's try that again (cutting out any info stuff):
|
||||
```sh
|
||||
[...]
|
||||
WARN[2022-11-03T16:18:35.425339343+11:00] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
|
||||
WARN[2022-11-03T16:18:35.427868986+11:00] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
|
||||
ERRO[2022-11-03T16:18:35.430061527+11:00] failed to initialize a tracing processor "otlp" error="no OpenTelemetry endpoint: skip plugin"
|
||||
containerd successfully booted in 0.024502s
|
||||
[...]
|
||||
```
|
||||
That's cleaned up those issues, but we still have two warnings about `devmapper`,
|
||||
and `containerd` couldn't find an OpenTelemetry endpoint.
|
||||
|
||||
We'll be skipping OpenTelemetry for now, but that sounds like a fun topic for a second blog post alongside setting up
|
||||
Grafana.
|
||||
|
||||
**(Doll)** Doll will remember! Will remind Miss' to make a post about this!
|
||||
|
||||
### Setting up devmapper
|
||||
|
||||
`devmapper` is one of a few [snapshotters](https://github.com/containerd/containerd/tree/main/docs/snapshotters)
|
||||
that `containerd` can use. It's not the most performant (that honour goes to `overlayfs`), but it is one of
|
||||
the most robust, and least likely to break. This is more important to us than pure performance.
|
||||
If you're following along at home, you'll have to decide which storage driver is best for your use-case.
|
||||
|
||||
Following the [setup guide](https://github.com/containerd/containerd/blob/main/docs/snapshotters/devmapper.md),
|
||||
we'll need `dmsetup` installed. Under Alpine, this is provided by the `device-mapper` package,
|
||||
which we already have installed.
|
||||
|
||||
We've also got a 100GB block device attached to this VPS, so let's get that provisioned too.
|
||||
|
||||
#### Mounting and Formatting our block device
|
||||
|
||||
We can use `fdisk` to partition our block device. `fdisk -l` lists all devices and partitions.
|
||||
|
||||
```
|
||||
~ ❯ doas fdisk -l
|
||||
Disk /dev/vda: 25 GB, 26843545600 bytes, 52428800 sectors
|
||||
52012 cylinders, 16 heads, 63 sectors/track
|
||||
Units: sectors of 1 * 512 = 512 bytes
|
||||
|
||||
Device Boot StartCHS EndCHS StartLBA EndLBA Sectors Size Id Type
|
||||
/dev/vda1 * 2,0,33 205,3,19 2048 206847 204800 100M 83 Linux
|
||||
/dev/vda2 205,3,20 1023,15,63 206848 52428799 52221952 24.9G 8e Linux LVM
|
||||
Disk /dev/vdb: 100 GB, 107374182400 bytes, 209715200 sectors
|
||||
208050 cylinders, 16 heads, 63 sectors/track
|
||||
Units: sectors of 1 * 512 = 512 bytes
|
||||
|
||||
Disk /dev/vdb doesn't contain a valid partition table
|
||||
Disk /dev/dm-0: 1968 MB, 2063597568 bytes, 4030464 sectors
|
||||
250 cylinders, 255 heads, 63 sectors/track
|
||||
Units: sectors of 1 * 512 = 512 bytes
|
||||
|
||||
Disk /dev/dm-0 doesn't contain a valid partition table
|
||||
Disk /dev/dm-1: 23 GB, 24670896128 bytes, 48185344 sectors
|
||||
2999 cylinders, 255 heads, 63 sectors/track
|
||||
Units: sectors of 1 * 512 = 512 bytes
|
||||
|
||||
Disk /dev/dm-1 doesn't contain a valid partition table
|
||||
```
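The sizes in that listing can be cross-checked from the sector counts. A tiny sketch, using the 512-byte sector size and the sector totals `fdisk` reported above (the dictionary and function names are ours):

```python
SECTOR_BYTES = 512  # "Units: sectors of 1 * 512 = 512 bytes"

# Sector counts copied from the fdisk listing.
DISK_SECTORS = {
    "/dev/vda": 52428800,
    "/dev/vdb": 209715200,
}


def size_gib(sectors):
    """Convert a sector count to GiB."""
    return sectors * SECTOR_BYTES / 2**30


if __name__ == "__main__":
    for device, sectors in DISK_SECTORS.items():
        print(f"{device}: {size_gib(sectors):.0f} GiB")
```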
|
||||
We know that our VPS has a 25GB disk, so `/dev/vdb` is our 100GB block device. We can partition it with
|
||||
`doas fdisk /dev/vdb`. Let's see how we do that:
|
||||
|
||||
```sh
|
||||
~ ❯ doas fdisk /dev/vdb
|
||||
Device contains neither a valid DOS partition table, nor Sun, SGI, OSF or GPT disklabel
|
||||
Building a new DOS disklabel. Changes will remain in memory only,
|
||||
until you decide to write them. After that the previous content
|
||||
won't be recoverable.
|
||||
|
||||
|
||||
The number of cylinders for this disk is set to 208050.
|
||||
There is nothing wrong with that, but this is larger than 1024,
|
||||
and could in certain setups cause problems with:
|
||||
1) software that runs at boot time (e.g., old versions of LILO)
|
||||
2) booting and partitioning software from other OSs
|
||||
(e.g., DOS FDISK, OS/2 FDISK)
|
||||
|
||||
Command (m for help): n
|
||||
Partition type
|
||||
p primary partition (1-4)
|
||||
e extended
|
||||
p
|
||||
Partition number (1-4): 1
|
||||
First sector (63-209715199, default 63):
|
||||
Using default value 63
|
||||
Last sector or +size{,K,M,G,T} (63-209715199, default 209715199):
|
||||
Using default value 209715199
|
||||
|
||||
Command (m for help): w
|
||||
The partition table has been altered.
|
||||
Calling ioctl() to re-read partition table
|
||||
|
||||
```
|
||||
|
||||
Running `fdisk -l` again:
|
||||
```sh
|
||||
[...]
|
||||
Device Boot StartCHS EndCHS StartLBA EndLBA Sectors Size Id Type
|
||||
/dev/vdb1 0,1,1 1023,15,63 63 209715199 209715137 99.9G 83 Linux
|
||||
Disk /dev/dm-0: 1968 MB, 2063597568 bytes, 4030464 sectors
|
||||
250 cylinders, 255 heads, 63 sectors/track
|
||||
Units: sectors of 1 * 512 = 512 bytes
|
||||
[...]
|
||||
```
|
||||
|
||||
Looks like that worked.
|
||||
|
||||
#### Adding the formatted block device into LVM
|
||||
|
||||
Let's get this added into LVM. First, we need to create a physical volume with the `pvcreate`
|
||||
command:
|
||||
```sh
|
||||
~ ❯ doas pvcreate /dev/vdb1
|
||||
Physical volume "/dev/vdb1" successfully created.
|
||||
```
|
||||
|
||||
Let's create a new Volume Group for our workload data. There are two reasons for this:
|
||||
1. This will make it easier to extend in the future; and
|
||||
2. Our block device is spinning rust, and we don't necessarily want to mix SSDs with spinning rust.
|
||||
|
||||
With that in mind, we'll leave the existing VG, `vg0`, as the volume group for programs and container images, and create a new one called `data`:
|
||||
```sh
|
||||
~ ❯ doas vgcreate data /dev/vdb1
|
||||
Volume group "data" successfully created
|
||||
~ ❯ doas vgdisplay data
|
||||
--- Volume group ---
|
||||
VG Name data
|
||||
System ID
|
||||
Format lvm2
|
||||
Metadata Areas 1
|
||||
Metadata Sequence No 1
|
||||
VG Access read/write
|
||||
VG Status resizable
|
||||
MAX LV 0
|
||||
Cur LV 0
|
||||
Open LV 0
|
||||
Max PV 0
|
||||
Cur PV 1
|
||||
Act PV 1
|
||||
VG Size <100.00 GiB
|
||||
PE Size 4.00 MiB
|
||||
Total PE 25599
|
||||
Alloc PE / Size 0 / 0
|
||||
Free PE / Size 25599 / <100.00 GiB
|
||||
VG UUID 679FIe-aF9e-yBRy-bRH6-wRlY-KPgz-yUpXL9
|
||||
|
||||
```
|
||||
> **(Doll)** Is it working Miss? Doll wants to see websites in treasure chests go zoom!
|
||||
>
|
||||
> **(Ashe)** **Containers**, dear Doll. And yes, yes it is.
|
||||
> Only a few more steps and we'll be ready to start bringing things online, don't worry.
|
||||
|
||||
Speaking of, next we need to create our logical volumes. We'll create two. One for our container scratch storage, and
|
||||
one for persistent storage. We'll size scratch at 30GiB, and persistent at 70GiB. Let's get that done:
|
||||
|
||||
```sh
|
||||
~ ❯ doas lvcreate -n persist --size 70G data
|
||||
Logical volume "persist" created.
|
||||
~ ❯ doas lvcreate -n scratch --size 30G data
|
||||
Volume group "data" has insufficient free space (7679 extents): 7680 required.
|
||||
```
|
||||
|
||||
> **(Selene)** Oh interesting. What happened there?
|
||||
>
|
||||
> **(Ashe)** Our theoretically 100GiB device has one extent less than 100GiB, so we couldn't divide it into exactly 30/70.
|
||||
>
|
||||
> **(Tammy)** Wait is that why `fdisk` said the device was 99.9G?
|
||||
>
|
||||
> **(Ashe)** Good catch. Yeah. 100GiB doesn't divide evenly into 960KiB cylinders, so we end up with one cylinder
|
||||
> too few, and therefore—
|
||||
>
|
||||
> **(Tammy)** One extent too few! Sneaky!
|
||||
>
|
||||
> **(Ashe)** Yup. Actually, now that I look at it again, I forgot to make space for the metadata, so this works out
|
||||
> nicely.
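The extent arithmetic behind that error message is easy to reproduce. This sketch uses only the numbers `vgdisplay` gave us (4 MiB extents, 25599 of them); the names are ours.

```python
EXTENT_MIB = 4         # "PE Size" from vgdisplay
TOTAL_EXTENTS = 25599  # "Total PE" from vgdisplay


def extents_needed(size_gib):
    """Number of extents an LV of `size_gib` GiB occupies."""
    return size_gib * 1024 // EXTENT_MIB


persist = extents_needed(70)
scratch = extents_needed(30)
free_after_persist = TOTAL_EXTENTS - persist

if __name__ == "__main__":
    print("persist needs:", persist)
    print("free after persist:", free_after_persist)
    print("scratch needs:", scratch)
    print("short by:", scratch - free_after_persist)
```

The last two figures line up with what `lvcreate` complained about: 7680 extents required, 7679 available.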
|
||||
|
||||
#### Creating our nerdctl thin pool
|
||||
|
||||
Docker and nerdctl can control a block device directly to use as a storage driver via device-mapper, so we'll be letting
|
||||
nerdctl do that for its mainline storage, and using our "persistent" pool for nerdctl volumes (which are persistent).
|
||||
|
||||
For this we'll need `device-mapper`, `lvm2-dmeventd`, and `thin-provisioning-tools`, so we'll `apk add` those in.
|
||||
|
||||
**(Ashe)** I'm going to skip showing the terminal output for installing packages from here on in to save space. I'm sure
|
||||
you've gotten the idea by now.
|
||||
|
||||
First up is creating a thin pool, which we'll do as follows:
|
||||
```sh
|
||||
~ ❯ doas lvcreate --wipesignatures y -n scratch data -l 95%FREE
|
||||
Logical volume "scratch" created.
|
||||
~ ❯ doas lvcreate --wipesignatures y -n scratchmeta data -l 10%FREE
|
||||
Logical volume "scratchmeta" created.
|
||||
~ ❯ doas lvconvert -y --zero n -c 512K --thinpool data/scratch --poolmetadata data/scratchmeta
|
||||
Thin pool volume with chunk size 512.00 KiB can address at most 126.50 TiB of data.
|
||||
WARNING: Converting data/scratch and data/scratchmeta to thin pool's data and metadata volumes with metadata wiping.
|
||||
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
|
||||
Converted data/scratch and data/scratchmeta to thin pool.
|
||||
~ ❯
|
||||
```
|
||||
So what did we do here?
|
||||
|
||||
> **(Doll)** Ooh! Ooh! Doll knows! Miss created one LV, umm, Logical Volume, taking up 95% of the free space, and one
|
||||
> taking up 10% of the free space... remaining free space? So ummm, ummm, 152 MiB?
|
||||
>
|
||||
> **(Ashe)** That's right! What next?
|
||||
>
|
||||
> **(Doll)** We umm. Combine the two into one? This one is confuse.
|
||||
>
|
||||
> **(Ashe)** Okay, I'll try to keep it simple. A normal (thick) pool allocates all of its data when we create it. So
|
||||
> all the space is reserved ahead of time. You can write to whatever bit of it you want, whenever you want.
|
||||
> Imagine something like a notebook you bought. A thin pool isn't like that. It initialises a small area
|
||||
> with zeroes, but otherwise leaves the rest of the device alone. Like you have a page, and you ask the store for
|
||||
> another blank page every time you get close to filling up your page.
|
||||
> So, what would happen if I wrote a 100M file that was all zeroes?
|
||||
>
|
||||
> **(Selene)** Let's see if I understand. Well, you'd write the file metadata, and allocate some space... Wait who's
|
||||
> keeping track of the size of the volume?
|
||||
>
|
||||
> **(Ashe)** Precisely, Selene. You need a metadata volume that contains information about the assigned blocks in
|
||||
> the thin pool, since it wasn't allocated all at once. So we create a pool for that, and then combine the two into our
|
||||
> final thin pool.
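To put the notebook analogy into code, here's a toy model of allocate-on-write. It is emphatically *not* how device-mapper stores its metadata; the `ThinPool` class and its methods are made up for illustration. The point is just that the mapping table plays the role of the metadata volume, and physical chunks are only handed out the first time something is written.

```python
class ThinPool:
    """Toy thin pool: physical chunks are assigned on first write."""

    def __init__(self, data_chunks):
        self.data_chunks = data_chunks  # physical chunks in the data volume
        self.mapping = {}               # "metadata": (volume, virtual chunk) -> physical chunk
        self.next_free = 0

    def write(self, volume, virtual_chunk):
        """Return the physical chunk backing this write, allocating if needed."""
        key = (volume, virtual_chunk)
        if key not in self.mapping:
            if self.next_free >= self.data_chunks:
                raise RuntimeError("thin pool is full")
            self.mapping[key] = self.next_free
            self.next_free += 1
        return self.mapping[key]

    def used(self):
        """How many physical chunks have been handed out so far."""
        return self.next_free


if __name__ == "__main__":
    pool = ThinPool(data_chunks=4)
    pool.write("image-a", 1000)  # far into the virtual volume, still chunk 0
    pool.write("image-a", 1000)  # rewriting the same chunk allocates nothing
    pool.write("image-b", 0)
    print("chunks used:", pool.used())
```

Without the mapping there'd be no way to tell which physical chunk belongs to which volume, which is why the pool needs its metadata LV.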
|
||||
|
||||
That done, we can configure autoextension by creating `/etc/lvm/profile/data-scratch.profile`:
|
||||
```sh
|
||||
activation {
|
||||
thin_pool_autoextend_threshold=80
|
||||
thin_pool_autoextend_percent=10
|
||||
}
|
||||
```
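As we understand these two settings, once the pool's usage goes above the threshold percentage, LVM tries to grow it by the given percentage of its current size. A rough sketch of a single such step, with sizes in extents; the function name is ours, and the rounding and "strictly above" comparison are assumptions rather than LVM's documented behaviour:

```python
def autoextend_once(size_extents, used_extents, threshold=80, percent=10):
    """Return the pool size after one (assumed) autoextend policy check."""
    # Grow only when usage is strictly above the threshold percentage.
    if used_extents * 100 > size_extents * threshold:
        return size_extents + size_extents * percent // 100
    return size_extents


if __name__ == "__main__":
    print(autoextend_once(1000, 800))  # at the threshold: unchanged
    print(autoextend_once(1000, 810))  # above the threshold: grows by 10%
```

Of course, the extension can only succeed if the volume group still has free extents to give.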
|
||||
Apply said profile with `doas lvchange --metadataprofile data-scratch data/scratch`, and check if the thin pool is being
|
||||
monitored:
|
||||
```sh
|
||||
~ ❯ doas lvs -o+seg_monitor
|
||||
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Monitor
|
||||
persist data -wi-a----- 70.00g
|
||||
scratch data twi---t--- <28.50g
|
||||
lv_root vg0 -wi-ao---- <22.98g
|
||||
lv_swap vg0 -wi-ao---- 1.92g
|
||||
```
|
||||
Looks good. Were the LV not monitored, we would see `not monitored` at the end of the `scratch data` line. Were that the
|
||||
case, we could fix that with `doas lvchange --monitor y data/scratch`.
|
||||
|
||||
#### Formatting the new Logical Volume
|
||||
|
||||
Our final step is to format the LV we'll be using for persistent volumes.
|
||||
We'll be using plain-old ext4 for this as I neither need nor want to get fancy here.
|
||||
|
||||
```sh
|
||||
~ ❯ doas mkfs.ext4 /dev/data/persist
|
||||
mke2fs 1.46.5 (30-Dec-2021)
|
||||
Discarding device blocks: done
|
||||
Creating filesystem with 18349056 4k blocks and 4587520 inodes
|
||||
Filesystem UUID: c0a59a7b-1969-4476-9d2c-11af32628337
|
||||
Superblock backups stored on blocks:
|
||||
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
|
||||
4096000, 7962624, 11239424
|
||||
|
||||
Allocating group tables: done
|
||||
Writing inode tables: done
|
||||
Creating journal (131072 blocks): done
|
||||
Writing superblocks and filesystem accounting information: done
|
||||
```
|
||||
|
||||
#### Mounting our new logical drives and setting up automount
|
||||
|
||||
Final step. Mounting the drive is relatively simple:
|
||||
```sh
|
||||
~ ❯ doas mkdir /data
|
||||
~ ❯ doas chmod 2770 /data
|
||||
~ ❯ doas mount /dev/data/persist /data
|
||||
~ ❯ doas chown root:ctr /data -R
|
||||
```
|
||||
|
||||
From here, we can configure `/etc/fstab` so it's mounted automatically at boot.
|
||||
|
||||
To achieve that, we'll add the following line to `/etc/fstab`:
|
||||
```fstab
|
||||
/dev/data/persist /data ext4 rw,relatime 0 0
|
||||
```
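If the six columns of an fstab entry are unfamiliar, this small sketch splits the line above into named fields (device, mount point, filesystem type, mount options, dump flag, fsck pass). `FstabEntry` and `parse_fstab_line` are our own names.

```python
from collections import namedtuple

FstabEntry = namedtuple(
    "FstabEntry", "device mountpoint fstype options dump passno"
)


def parse_fstab_line(line):
    """Split one fstab line into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    device, mountpoint, fstype, options, dump, passno = fields
    return FstabEntry(
        device, mountpoint, fstype, options.split(","), int(dump), int(passno)
    )


if __name__ == "__main__":
    print(parse_fstab_line("/dev/data/persist /data ext4 rw,relatime 0 0"))
```

The two trailing zeroes mean the filesystem is skipped by `dump` and not checked by `fsck` at boot.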
|
||||
|
||||
We don't need to mount the scratch LV (Logical Volume) as containerd will be controlling that directly.
|
||||
|
||||
And we should be good to go.
|
||||
|
||||
Last thing to do is add a minimal devmapper config to our `config.toml`:
|
||||
```toml
|
||||
[...]
|
||||
[plugins]
|
||||
[plugins."io.containerd.snapshotter.v1.devmapper"]
|
||||
root_path = "/opt/containerd/devmapper"
|
||||
pool_name = "data-scratch"
|
||||
base_image_size = "1024MB"
|
||||
[...]
|
||||
```
|
||||
|
||||
Let's see what happens when we launch `containerd` again:
|
||||
```sh
|
||||
WARN[2022-11-07T00:33:26.218437232+11:00] failed to load plugin io.containerd.snapshotter.v1.devmapper error="dmsetup version
|
||||
error: Library version: 1.02.170 (2020-03-24)
|
||||
/dev/mapper/control: open failed: Permission denied
|
||||
Failure to communicate with kernel device-mapper driver.
|
||||
Check that device-mapper is available in the kernel.
|
||||
Incompatible libdevmapper 1.02.170 (2020-03-24) and kernel driver (unknown version).
|
||||
Command failed.
|
||||
|
||||
: exit status 1"
|
||||
```
|
||||
|
||||
> **(Tammy)** That doesn't look great.
|
||||
>
|
||||
> **(Ashe)** No. It does not. Hmm. Let's investigate.
|
||||
>
|
||||
> **(Ashe)** Ah. Found it. Looks like devmapper isn't supported in rootless configs. Now we know.
|
||||
|
||||
{{< alert >}}
|
||||
**(Ashe)** Rootless containerd [does **not** support the devmapper snapshotter](https://github.com/containerd/containerd/tree/main/docs/snapshotters).
|
||||
{{< /alert >}}
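The `Permission denied` on `/dev/mapper/control` in the log above is the giveaway: our unprivileged user can't open the device-mapper control node. A small check along these lines would have flagged it before we did all that LVM work. It only tests file permissions and is our own helper, not something containerd does.

```python
import os


def can_open_control(path="/dev/mapper/control"):
    """True if the current user may read and write the given device node."""
    return os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if can_open_control():
        print("device-mapper control node is accessible")
    else:
        print("no access to /dev/mapper/control; devmapper won't work for this user")
```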
|
||||
|
||||
> **(Octavia)** And on that bombshell, I think it's about time we wrapped this up. Looks like we'll have to make this
|
||||
> into a series.
|
||||
>
|
||||
> **(Tammy)** Hopefully we'll have better luck next time.
|
||||
|
||||
|
||||
|
|