title: pkgconf, CVE-2023-24056 and disinformation
date: 2023-01-24

Readers will have noticed that two maintenance releases of pkgconf were cut over the weekend, 1.9.4 and 1.8.1 respectively, to address CVE-2023-24056, a pkg-config specific variation of the now-classic "billion laughs attack". While fixing software defects is important, a lot went wrong with how this CVE was reported and the motivations behind its disclosure, and for my own catharsis, I want to talk about this.

The origin of pkgconf

To hopefully explain why I am so bothered by all of this, let's first understand the history of pkgconf: a project I began noodling on in March 2011.

2011 was a particularly rough year for me. In January, my father was diagnosed with pancreatic cancer, and declined to disclose this to anyone. When I came back to Oklahoma to visit my parents in early March, I walked into my dad's house and found him jaundiced. I drove him to the emergency room, and was informed that he only had a few months to live due to the pancreatic cancer he allowed to progress to stage 4. This was shocking to me, especially considering I was 23 at the time. The stress of it led to me breaking up with my boyfriend at the time.

I did the only thing I could do given the situation: spent as much time with him as possible. The hospital had installed Wi-Fi earlier that year, so I was able to take my computer and work on my projects while I spent time with him. This worked out well, because it gave us a common ground of subjects to talk about: my dad was the person who originally pushed me into getting involved with software engineering as a profession in the first place. While he himself never worked as a software engineer, he developed a number of small utilities and demo programs for MS-DOS. Later, he became heavily interested in BSD, and then Slackware.

During this time period, pkg-config 0.26 was released, which required either a complicated bootstrap procedure to satisfy the glib2 requirements by hand, or a pre-existing copy of pkg-config to exist. Alpine was impacted by this bootstrap problem, and we ultimately decided to hold back pkg-config on the 0.25 version because the bootstrapping problem was too complex to solve for the pending release.

At the same time, I was looking for something, anything to work on that would serve as a distraction and conversation piece. This created an opportunity: I could work on a replacement pkg-config implementation that did not have the bootstrap requirement that the freedesktop implementation required. I began working on pkgconf, specifically the .pc file parsing and dependency graph walking code, while my dad was in the hospital. He found talking about it fascinating, and so we discussed the various aspects of implementing a parser, and walking dependency graphs in C. In a limited way, it was a project we collaborated on, in that I would write code, tell him about it, and he'd point out ways my assumptions probably didn't hold true.

Sadly, my father passed away in early April, so he didn't get to see the first viable release, or to see pkgconf integrated into Linux distributions. After he passed away, I quit working on it for a while, until a few friends of mine decided to pick it up and experiment with it in Gentoo and FreeBSD.

Maintaining a production-quality build tool at scale

These days, pkgconf is basically everywhere. It is the default pkg-config implementation in every mainstream Linux distribution except Ubuntu. It is used heavily in embedded Linux development and in plenty of other scenarios. My distfiles server, distfiles.dereferenced.org, logs dozens of pkgconf downloads every second of the day.

The success of pkgconf is not without its problems though. There are aspects of the software which, given what I know today, I would probably implement substantially differently. The technical debt is real. I've been working, however, as time permits, to address these problems in the pkgconf-1.9.x release series.

But when pkgconf does something which is unexpected, and breaks a user's build... those interactions are rarely fun. Many times, the user with the issue shows up on the issue tracker, or worse, in my personal inbox, in a bad mood, which results in a triage experience that is suboptimal for everyone involved. Thankfully, this doesn't happen so much anymore, as we have worked hard to balance compatibility and developer-friendly output from the tool.

But as smooth as things are these days, maintaining a production build tool imposes a lot of burden that you cannot begin to expect until you've done it before. It is not enough to simply tell a user that the framework he is using is doing things wrong, for example, underspecifying its dependencies. You must consider "self-service" features: ones which allow the user to diagnose the issues in his build and correct them himself. By doing so, you provide the user with a good experience, and keep support requests from annoyed users much lower. All of this has to be designed and implemented in production build tools.

The appearance of "competition"

The past weekend has been a wild ride for me. I recently moved to Seattle, and have been getting settled in. A few people brought "u-config: a new, lean pkg-config clone" to my attention. At first, I shrugged it off, and mostly would have continued to do so. An implementation of pkg-config on Windows would be good for me, personally, as I do not develop pkgconf on Windows, and different people who contribute to the maintenance of pkgconf's Windows support have different goals. This has led to some significant fragmentation of pkgconf on the Windows side, with different tools bundling it supporting specific aspects of the pkg-config format in different ways.

I have a number of social and technical observations about u-config. Some good, some not so good. To start off with the social aspects: I don't particularly appreciate the level of aggression directed toward pkgconf. While that alone would not normally be a turn-off for me (one has to have a reasonably thick skin when being a FOSS maintainer), casually dropping the "billion laughs" 0day with a snide comment about how we should use ASan (we do) when developing pkgconf was too much, and the bug itself (a mistake in accounting for available buffer space during variable expansion) was overstated.
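For readers unfamiliar with this class of bug, the following is a minimal sketch of what a "billion laughs" input looks like for pkg-config-style variables, and of one way to bound the expansion. It is not pkgconf's code or its actual fix; the names `expand` and `laughs` and the limits `max_len` and `max_depth` are assumptions made up for illustration.

```python
import re

VAR_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


class ExpansionTooLarge(Exception):
    """Raised when expansion would exceed the configured limits."""


def expand(text, variables, max_len=65535, max_depth=64, _depth=0):
    """Substitute ${name} references recursively, refusing oversized results."""
    if _depth > max_depth:
        raise ExpansionTooLarge("variable references nested too deeply")
    out, size, pos = [], 0, 0
    for ref in VAR_REF.finditer(text):
        inner = variables.get(ref.group(1), "")  # unknown names expand to ""
        piece = text[pos:ref.start()] + expand(
            inner, variables, max_len, max_depth, _depth + 1
        )
        size += len(piece)
        if size > max_len:  # stop before the result grows any further
            raise ExpansionTooLarge("expanded value exceeds %d characters" % max_len)
        out.append(piece)
        pos = ref.end()
    tail = text[pos:]
    if size + len(tail) > max_len:
        raise ExpansionTooLarge("expanded value exceeds %d characters" % max_len)
    return "".join(out) + tail


def laughs(levels, fanout=10):
    """Build variables where each level references the previous one `fanout` times."""
    variables = {"v0": "lol"}
    for i in range(1, levels + 1):
        variables["v%d" % i] = ("${v%d}" % (i - 1)) * fanout
    return variables
```

With nine levels and a fanout of ten, a file of under a kilobyte asks for roughly three billion characters of output, which is why the size has to be checked while expanding rather than afterwards.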

There are a lot of good things about u-config. By focusing on only the minimally required functionality, the author was able to write an excellent tool which has the potential to someday be a replacement for pkgconf. I am open to talking about such a deprecation, even.

However, after the initial blogpost (which contained disinformation about both freedesktop pkg-config and pkgconf), there was additional disinformation from another person who is enthusiastic about the u-config project. Notably, he submitted a patch which, amongst other things, could be misinterpreted by readers to conclude that pkgconf does not consider /usr/include as a system include path. When configured correctly, it definitely does. For example, on Alpine Linux:

pestilence:~$ pkgconf --dump-personality
Triplet: default
DefaultSearchPaths: /usr/lib/pkgconfig /usr/share/pkgconfig
SystemIncludePaths: /usr/include
SystemLibraryPaths: /usr/lib 

But this particular disinformation was merged by the author of the software without checking the comment for accuracy, despite how absurd it would be if it were true.

Update (28 January 2023): Since the initial publication of this blog, the comment introduced in the above patch has been corrected to reflect a specific edge case relating to -I/usr/include versus -I /usr/include. I believe the discrepancy in the handling of both fragments to be a bug, one which was not reported to me, but rather discussed only in the source code comment. The contributor of the patch in question to u-config, in particular, has pointed out the fact that they later changed the source code comment to clarify the issue, as part of an attempt to deflect from the point of this blog: discussing how the u-config author and contributors have chosen to engage in bad faith with other pkg-config implementations (especially pkgconf) from the beginning of their project. While I plan to fix the non-reported discrepancy in the next pkgconf release, I will note that the u-config authors have so far chosen to not handle this edge case.

pkg-config implementations do specific things for a reason

In the UNIX environment, the behavior of the system toolchain is static and must be well-defined. Tools which act adjacently to the system C toolchain must behave in ways which are aware of how the C toolchain is configured to behave. This is why pkgconf checks several different environment variables to learn about how the system toolchain has been configured, and what deviations, if any, have been configured via the environment.
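As a rough illustration of that environment probing, here is a minimal sketch of how a tool might assemble its list of system include directories. The variable names are my assumption of the usual suspects (a pkg-config-specific override plus the variables GCC honours), and the rule that the override replaces the built-in default is likewise an assumption; `system_include_dirs` is a hypothetical helper, not a pkgconf function.

```python
import os
import posixpath

# Assumed variable names; the authoritative list lives in pkgconf's source.
OVERRIDE_VAR = "PKG_CONFIG_SYSTEM_INCLUDE_PATH"
TOOLCHAIN_VARS = ("CPATH", "C_INCLUDE_PATH", "CPLUS_INCLUDE_PATH", "OBJC_INCLUDE_PATH")


def split_paths(value, sep):
    """Split a PATH-style string, dropping empty entries and trailing slashes."""
    return [posixpath.normpath(p) for p in value.split(sep) if p]


def system_include_dirs(environ=None, default=("/usr/include",), sep=":"):
    """Return the directories treated as system include paths, without duplicates."""
    environ = os.environ if environ is None else environ
    if OVERRIDE_VAR in environ:
        # An explicit override replaces the built-in default (assumed behavior).
        dirs = split_paths(environ[OVERRIDE_VAR], sep)
    else:
        dirs = list(default)
    for name in TOOLCHAIN_VARS:
        # Directories the compiler was told about count as system paths too.
        dirs.extend(split_paths(environ.get(name, ""), sep))
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(dirs))
```

Passing the environment in as a dictionary keeps the sketch easy to exercise without touching the real process environment.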

A frequent pattern in UNIX pkg-config files is to write things like:

prefix=/usr
includedir=${prefix}/include
libdir=${prefix}/lib
Name: whatever
Description: Example library
Version: 0
Cflags: -I${includedir}
Libs: -L${libdir} -lwhatever

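On Windows, where packages are frequently relocated after they are built and a hard-coded ${prefix} like this is often wrong, pkg-config implementations have --define-prefix, which overrides the ${prefix} variable based on the location of the .pc file.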
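The idea behind that override can be sketched as follows. This is a simplified illustration under stated assumptions, not the logic of any particular implementation: it assumes the conventional `<prefix>/<libdir>/pkgconfig/<name>.pc` layout, and `guess_prefix` and `read_variables` are hypothetical names.

```python
import re
from pathlib import PurePosixPath

# Matches "name=value" lines; keyword lines such as "Cflags: ..." do not match.
VARIABLE_LINE = re.compile(r"^([A-Za-z0-9_.]+)\s*=\s*(.*)$")


def guess_prefix(pc_file):
    """Guess the install prefix from <prefix>/<libdir>/pkgconfig/<name>.pc."""
    pc_dir = PurePosixPath(pc_file).parent
    if pc_dir.name != "pkgconfig":
        return None  # unusual layout: keep whatever the file declares
    return str(pc_dir.parent.parent)


def read_variables(pc_text, pc_file, define_prefix=False):
    """Collect the variable lines of a .pc file, optionally overriding prefix."""
    variables = {}
    for line in pc_text.splitlines():
        match = VARIABLE_LINE.match(line.strip())
        if not match:
            continue
        name, value = match.groups()
        if define_prefix and name == "prefix":
            value = guess_prefix(pc_file) or value
        # Substitute variables defined earlier in the file (single pass).
        for known, known_value in variables.items():
            value = value.replace("${" + known + "}", known_value)
        variables[name] = value
    return variables
```

For the file above installed at /opt/sdk/lib/pkgconfig/whatever.pc, the override turns includedir into /opt/sdk/include instead of /usr/include, so everything derived from ${prefix} follows the package to wherever it was unpacked.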

If pkg-config is not aware of /usr/include being a system include path, then a disaster can happen when querying for multiple dependencies at the same time. Consider this other pkg-config file:

prefix=/usr
includedir=${prefix}/include/OtherLib
libdir=${prefix}/lib
Name: OtherLib
Description: Example library that wraps math.h
Version: 0
Cflags: -I${includedir}
Libs: -L${libdir} -lother

Now let's say that OtherLib has a /usr/include/OtherLib/math.h file which uses #include_next to enhance the math.h header. A real-world example of a library which does this is libbsd. Well, if you query pkg-config with pkg-config --cflags --libs whatever OtherLib, then you will get:

pestilence:~$ pkgconf --with-path=examples/ --personality=examples/broken.personality whatever OtherLib
-I/usr/include -I/usr/include/OtherLib -lwhatever -lother

This means that /usr/include/math.h will be preferred over /usr/include/OtherLib/math.h, and your build will fail.
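The filtering that avoids this can be sketched in a few lines. This is a simplified illustration rather than pkgconf's implementation; `filter_system_includes` is a hypothetical helper, and the default of /usr/include is taken from the Alpine output shown earlier. It handles both the `-I/usr/include` and the `-I /usr/include` spellings of a fragment.

```python
import posixpath


def filter_system_includes(fragments, system_dirs=("/usr/include",)):
    """Drop -I fragments that name a system include directory."""
    system = {posixpath.normpath(d) for d in system_dirs}
    kept = []
    i = 0
    while i < len(fragments):
        frag = fragments[i]
        if frag == "-I" and i + 1 < len(fragments):
            # Two-token spelling: "-I" followed by the directory.
            if posixpath.normpath(fragments[i + 1]) not in system:
                kept.extend([frag, fragments[i + 1]])
            i += 2
            continue
        if frag.startswith("-I") and len(frag) > 2:
            # Single-token spelling: "-I/usr/include".
            if posixpath.normpath(frag[2:]) in system:
                i += 1
                continue
        kept.append(frag)
        i += 1
    return kept
```

Applied to the four fragments in the broken output above, this keeps `-I/usr/include/OtherLib -lwhatever -lother`, leaving /usr/include to be searched in its normal position as a system directory.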

So this type of filtering, and the other types of filtering that pkgconf does, is very important in the UNIX environment. The author of u-config will unfortunately have to learn these things one by one as users come to him with bug reports.

There is probably an alternate reality where u-config and pkgconf work together to deprecate pkgconf, and someday I hope that will be the reality here. But until the disinformation and putdowns are addressed, it will unfortunately be impossible to collaborate.

Anyway, if you got through all of this, thanks for reading, I guess.