---
title: "spelunking through the apk-tools dependency solver"
date: "2021-10-31"
---
In our previous episode, I [wrote a high level overview](https://ariadne.space/2021/04/25/why-apk-tools-is-different-than-other-package-managers/) of apk's differences versus traditional package managers, which many have cited as a helpful resource for understanding the behavior of apk when it does something different than a traditional package manager would. But that article didn't go into enough detail to explain how it all actually works. This one hopefully will.
## A high level view of the moving parts
Our adventure begins at the `/etc/apk/world` file. This file contains the basic set of constraints imposed on the system: every constraint listed here must be solvable in order for the system to be considered correct, and no transaction may be committed that is incorrect. In other words, the package management system can be proven to be in a correct state every time a constraint is added or removed with the apk add/del commands.
Note I used the word transaction there: at its core, apk is a transactional package manager, though we have not fully exploited the transactional capabilities yet. A transaction is created by copying the current constraint list (`db->world`), manipulating it with `apk_deps_add` and then committing it with `apk_solver_commit`. The commitment phase does pre-flight checks directly and returns an error if the transaction fails to pass.
This means that removing packages works the same way: you copy the current constraint set, remove the desired constraint, and then commit the result, which either errors out or updates the installed constraint set after the transaction is committed.
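The copy, modify, commit cycle described above can be modeled in a few lines. This is a minimal sketch in Python rather than apk's actual C code: `add`, `delete`, `commit`, `UnsolvableError`, `AVAILABLE` and `toy_is_solvable` are all hypothetical names invented for illustration, and the solvability check is reduced to a simple set lookup.

```python
from typing import FrozenSet, Iterable

World = FrozenSet[str]

# Hypothetical package index; a real solver consults the repository indexes.
AVAILABLE = frozenset({"alpine-base", "busybox", "curl", "openssh"})


class UnsolvableError(Exception):
    """The proposed constraint set cannot be satisfied."""


def toy_is_solvable(world: World) -> bool:
    # Stand-in for the real solver: every constraint must name a known package.
    return world <= AVAILABLE


def commit(proposed: World) -> World:
    # Pre-flight check: refuse to commit a world that cannot be solved.
    if not toy_is_solvable(proposed):
        raise UnsolvableError(f"cannot satisfy: {sorted(proposed - AVAILABLE)}")
    return proposed


def add(world: World, names: Iterable[str]) -> World:
    proposed = set(world)          # copy the current constraint set
    proposed.update(names)         # manipulate the copy
    return commit(frozenset(proposed))


def delete(world: World, names: Iterable[str]) -> World:
    proposed = set(world)          # same pattern as add: copy first
    proposed.difference_update(names)
    return commit(frozenset(proposed))
```

Because `add` and `delete` only ever touch a copy, a failed commit leaves the original world untouched, which is the property that makes the transactional model useful.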
## A deeper look into the solver itself
As noted above, the primary entry point into the solver is the `apk_solver_commit` function, which, at the time of writing, is located in the [apk-tools source code at src/commit.c:679](https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/master/src/commit.c#L679). This function does a few pre-flight checks and then calls into the solver itself, using `apk_solver_solve`, which generates the actual transaction to be committed. If there are errors, the generated transaction is discarded and a report is printed instead; otherwise, the generated transaction is committed using `apk_solver_commit_changeset`.
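The control flow of that middle layer can be sketched as follows. This is an illustrative Python model under my own assumptions, not a translation of src/commit.c: the `Changeset` type, the `solver_commit` signature, the `toy_solve` stub, and the error strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class Changeset:
    changes: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


def solver_commit(
    world: Iterable[str],
    solve: Callable[[List[str]], Changeset],
    apply_changeset: Callable[[Changeset], None],
) -> Tuple[bool, List[str]]:
    changeset = solve(sorted(world))
    if changeset.errors:
        # The generated transaction is discarded; only the report survives.
        return False, list(changeset.errors)
    apply_changeset(changeset)
    return True, []


def toy_solve(world: List[str]) -> Changeset:
    # Hypothetical stand-in for the core solver.
    available = {"busybox", "curl"}
    changeset = Changeset()
    for name in world:
        if name in available:
            changeset.changes.append("install " + name)
        else:
            changeset.errors.append(name + ": no such package")
    return changeset
```

Passing `solve` and `apply_changeset` in as callables is only a convenience for the sketch; it keeps the "solve, then either report or commit" decision visible on its own.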
In essence, the code in src/commit.c can be thought of as the middle layer between the applets and the core solver. The core solver itself lives in src/solver.c and as previously noted, the main entry point is `apk_solver_solve`, which generates a proposed transaction to satisfy the requested constraints. This function [lives at src/solver.c:1021](https://gitlab.alpinelinux.org/alpine/apk-tools/-/blob/master/src/solver.c#L1021), and is the only entry point into the solver itself.
The first thing the solver does is alphabetically sort the constraint set. If you've noticed that `/etc/apk/world` is always in alphabetical order, this is a side effect of that sorting.
Once the world constraints (the ones in `/etc/apk/world`) are alphabetically ordered, the next step is to figure out what package, if any, presently satisfies the constraint. This is handled by the `discover_name` function, which is called recursively on every constraint applicable to the system, starting with the world constraint.
The next step is to generate a fuzzy solution. This is done by walking the dependency graph again, calling the `apply_constraint` function. This step does basic dependency resolution, removing possible solutions which explicitly conflict. Reverse dependencies (`install_if`) are partially evaluated in this phase, but complex constraints (such as those involving a version constraint or multiple solutions) are not evaluated yet.
Once basic constraints are applied to the proposed updated world, the next step is to walk the dependency graph again, reconsidering the fuzzy solution generated in the step above. This step is done by the `reconsider_name` function, which walks over parts of the dependency graph that are still ambiguous. Finally, packages are selected to resolve these ambiguities using the `select_package` function. Afterwards, the final changeset is emitted by the `generate_changeset` function.
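To make the order of the phases concrete, here is a deliberately tiny model of the pipeline in Python. It is a sketch under heavy assumptions, not how src/solver.c is written: the function names only echo the phase names above, versions are plain integers, the only conflicts understood are `!name` entries in the world, and the reconsideration step is collapsed into "pick the highest version".

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Pkg:
    name: str
    version: int                    # toy: one integer instead of a version string
    depends: Tuple[str, ...] = ()   # plain names only in this model


Index = Dict[str, List[Pkg]]


def discover(world: List[str], index: Index) -> Set[str]:
    """Phase 1: collect every name reachable from the world constraints."""
    seen: Set[str] = set()
    stack = [d for d in world if not d.startswith("!")]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for pkg in index.get(name, []):
            stack.extend(pkg.depends)
    return seen


def apply_constraints(world: List[str], names: Set[str], index: Index) -> Dict[str, List[Pkg]]:
    """Phase 2: the fuzzy solution; drop candidates that explicitly conflict."""
    candidates = {n: list(index.get(n, [])) for n in names}
    for dep in world:
        if dep.startswith("!") and dep[1:] in candidates:
            candidates[dep[1:]] = []
    return candidates


def select(candidates: Dict[str, List[Pkg]]) -> Dict[str, Pkg]:
    """Phases 3 and 4, collapsed: resolve ambiguity by taking the highest version."""
    return {n: max(pkgs, key=lambda p: p.version) for n, pkgs in candidates.items() if pkgs}


def generate_changeset(world: List[str], chosen: Dict[str, Pkg], installed: Dict[str, int]) -> List[str]:
    """Phase 5: walk the chosen packages and diff against what is installed."""
    wanted: Dict[str, Pkg] = {}
    stack = [d for d in world if not d.startswith("!")]
    while stack:
        name = stack.pop()
        if name in wanted:
            continue
        if name not in chosen:
            raise ValueError(f"unsatisfiable constraint: {name}")
        wanted[name] = chosen[name]
        stack.extend(chosen[name].depends)
    changes = [f"install {n}-{wanted[n].version}" for n in sorted(wanted)
               if installed.get(n) != wanted[n].version]
    changes += [f"remove {n}" for n in sorted(set(installed) - set(wanted))]
    return changes


def solve(world: Iterable[str], index: Index, installed: Dict[str, int]) -> List[str]:
    ordered = sorted(world)                     # the solver sorts the world first
    names = discover(ordered, index)
    candidates = apply_constraints(ordered, names, index)
    chosen = select(candidates)
    return generate_changeset(ordered, chosen, installed)
```

With an index holding two versions of a package `app` that depends on `lib`, and an installed set of `app` 1, `lib` 1 and an unrelated `old` 1, calling `solve(["app"], ...)` yields an upgrade of `app` and a removal of `old`, while `solve(["app", "!lib"], ...)` raises because the conflict leaves `lib` with no candidates.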
### A deep dive into `reconsider_name` and `select_package`
As should hopefully be obvious by now, the really complicated cases are handled by the `reconsider_name` function. These cases include scenarios such as virtual providers, situations where more than one package satisfies the constraint set, and so on. For these scenarios, it is the responsibility of the `reconsider_name` function to select the best package. Similarly, it is the responsibility of the `select_package` function to check the work done by `reconsider_name` and finalize the package selection if appropriate by removing the constraint from the ambiguous list.
The primary purpose of the `reconsider_name` function is to use `discover_name` and `apply_constraint` to move more specific constraints upwards and downwards through the dependency graph, narrowing the possible set of packages which can satisfy a given constraint, ideally to one package or fewer. These simplified dependency nodes are then fed into `select_package` to deduce the best package selection to make.
The `select_package` function checks each constraint, and the list of remaining candidate packages, and then picks the best package for each constraint. This is done by calling `compare_providers` on the candidate packages until the best one is found. The heuristics checked by `compare_providers` are, in order:
1. The packages are checked to see if they are `NULL` or not. The one that isn't `NULL` wins. This is mostly as a safety check.
2. We check to see if the user is using `--latest` or not. If they are, then the behavior changes a little bit. The details aren't so important, you can read the source if you really want to know. Basically, in this step, we determine how fresh a package is, in alignment with what the user's likely opinion on freshness would be.
3. The provider versions are compared, if applicable. Highest version wins.
4. The package versions themselves are compared. Highest version wins.
5. The already installed package is preferred if the version is the same (this is helpful in upgrade transactions to make them less noisy).
6. The `provider_priority` field is compared. Highest priority wins. This means that `provider_priority` is **only** checked for unversioned providers.
7. Finally, the earliest repository in `/etc/apk/repositories` is preferred if all else is the same.
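
The ordering of the list above matters more than any single rule, so here it is as a comparator chain. This is a sketch in Python based only on the list, not on the C source: the `Provider` fields and `best_provider` are hypothetical, versions are toy integers, step 2 is left out because its details are not described here, and comparing provider versions only when both candidates carry one is my assumption about what "if applicable" means.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import List, Optional


@dataclass
class Provider:
    pkg_name: str
    pkg_version: int                         # toy integer, not a real version string
    provider_version: Optional[int] = None   # None models an unversioned provider
    installed: bool = False
    provider_priority: int = 0
    repo_index: int = 0                      # position in /etc/apk/repositories


def compare_providers(a: Optional[Provider], b: Optional[Provider]) -> int:
    """Return > 0 if a is preferred, < 0 if b is preferred, 0 on a tie."""
    # 1. Safety check: whichever side is not None wins.
    if a is None or b is None:
        return int(a is not None) - int(b is not None)
    # 2. (--latest handling omitted: not described in enough detail to model.)
    # 3. Provider versions, assumed to apply only when both are versioned.
    if a.provider_version is not None and b.provider_version is not None:
        if a.provider_version != b.provider_version:
            return a.provider_version - b.provider_version
    # 4. Package versions: highest wins.
    if a.pkg_version != b.pkg_version:
        return a.pkg_version - b.pkg_version
    # 5. Prefer the already installed package.
    if a.installed != b.installed:
        return int(a.installed) - int(b.installed)
    # 6. provider_priority: highest wins.
    if a.provider_priority != b.provider_priority:
        return a.provider_priority - b.provider_priority
    # 7. Earliest repository wins, so the lower index is preferred.
    return b.repo_index - a.repo_index


def best_provider(candidates: List[Provider]) -> Provider:
    return max(candidates, key=cmp_to_key(compare_providers))
```

In this model, a candidate with `provider_version=100` and `provider_priority=0` beats one with `provider_version=50` and `provider_priority=10`, because the comparison is settled at step 3 and never reaches step 6.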
Hopefully, this demystifies some of the common misconceptions around how the solver works, especially how `provider_priority` works. Personally, I think in retrospect, despite working on the spec and implementing it in apk-tools, that `provider_priority` was a mistake, and the preferred solution should be to always use versioned providers (e.g. `provides="foo=100"`) instead. The fact that we have moved to versioning `cmd:` providers in this way demonstrates that `provider_priority` isn't really a good design.
Next time: what is the maximum number of entries allowed in `/etc/apk/repositories` and why is it so low?