---
title: "the tragedy of gethostbyname"
date: "2022-03-27"
---
A frequent complaint expressed on a certain website about Alpine is related to the deficiencies regarding the musl DNS resolver when querying large zones. In response, it is usually mentioned that applications which are expecting reliable DNS lookups should be using a dedicated DNS library for this task, not the `getaddrinfo` or `gethostbyname` APIs, but this is usually rebuffed by comments saying that these APIs are fine to use because they are allegedly reliable on GNU/Linux.
For a number of reasons, the assertion that DNS resolution via these APIs under glibc is more reliable is false, but to understand why, we must look at the history of why a `libc` is responsible for shipping these functions to begin with, and how these APIs evolved over the years. For instance, did you know that `gethostbyname` originally didn't do DNS queries at all? And, the big question: why are these APIs blocking, when DNS is inherently an asynchronous protocol?
Before we get into this, it is important to again restate that if you are an application developer, and your application depends on reliable DNS performance, you must absolutely use a dedicated DNS resolver library designed for this task. There are many libraries available that are good for this purpose, such as [c-ares](https://c-ares.org/), [GNU adns](https://www.gnu.org/software/adns/), [s6-dns](https://skarnet.org/software/s6-dns/) and [OpenBSD's libasr](https://github.com/OpenSMTPD/libasr). As should hopefully become obvious at the end of this article, the DNS clients included with `libc` are designed to provide basic functionality only, and there is no guarantee of portable behavior across client implementations.
## the introduction of `gethostbyname`
Where did `gethostbyname` come from, anyway? Most people believe this function came from BIND, the reference DNS implementation developed by the Berkeley CSRG. In reality, it was introduced to BSD in 1982, alongside the `sethostent` and `gethostent` APIs. I happen to have a copy of the 4.2BSD source code, so here is the implementation from 4.2BSD, which was released in early 1983:
```c
struct hostent *
gethostbyname(name)
	register char *name;
{
	register struct hostent *p;
	register char **cp;

	sethostent(0);
	while (p = gethostent()) {
		if (strcmp(p->h_name, name) == 0)
			break;
		for (cp = p->h_aliases; *cp != 0; cp++)
			if (strcmp(*cp, name) == 0)
				goto found;
	}
found:
	endhostent();
	return (p);
}
```
As you can see, the 4.2BSD implementation only checks the `/etc/hosts` file and nothing else. This answers the question about why `gethostbyname` and its successor, `getaddrinfo`, do DNS queries in a blocking way: the API was designed around a synchronous scan of a local file, and when DNS support was added later, the BSD developers did not want to introduce a replacement API for `gethostbyname` that was asynchronous.
## the introduction of DNS to `gethostbyname`
DNS resolution was first added to `gethostbyname` in 1984, when DNS support was introduced to BSD. [This version, which is too long to include here](https://github.com/dank101/4.3BSD-Reno/blob/00328b5a67ffe35e67baeba8f7ab75af79f7ae64/lib/libc/net/gethostnamadr.c#L213), also translated dotted-quad IPv4 addresses into a `struct hostent`. In essence, the 4.3BSD implementation does the following:

1. If the requested hostname begins with a number, try to parse it as a dotted quad. If this fails, set `h_errno` to `HOST_NOT_FOUND` and bail. Yes, this means 4.3BSD would fail to resolve hostnames like `12-34-56-78.static.example.com`.
2. Attempt to do a DNS query using `res_search`. If the query was successful, return the first IP address found as the `struct hostent`.
3. If the DNS query failed, fall back to the original `/etc/hosts` searching algorithm above, now called `_gethtbyname` and using `strcasecmp` instead of `strcmp` (for consistency with DNS).

A fixed version of this algorithm was also included with BIND's `libresolv` as `res_gethostbyname`, and the `res_search` and related functions were imported into BSD libc from BIND.
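
To make the lookup order described above concrete, here is a minimal sketch in modern C. It follows the three steps as summarized in this article, not the original source line by line. Everything named here (`lookup_43bsd_order`, `enum lookup_result`, `backend_fn`, and the two stub backends) is hypothetical and exists only for illustration. The sketch also uses `inet_pton`, which is stricter than the parser 4.3BSD used, and it reports which step answered instead of building a `struct hostent`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch; not part of any libc. */
enum lookup_result {
    LOOKUP_NUMERIC,   /* step 1: the name parsed as a dotted quad */
    LOOKUP_DNS,       /* step 2: the DNS backend answered */
    LOOKUP_HOSTS,     /* step 3: the /etc/hosts fallback answered */
    LOOKUP_NOT_FOUND  /* nothing matched (HOST_NOT_FOUND in the text) */
};

/* Hypothetical backend: returns nonzero if it can resolve `name`. */
typedef int (*backend_fn)(const char *name);

/* Stub backends standing in for res_search and the hosts file scan. */
int stub_hit(const char *name)  { (void)name; return 1; }
int stub_miss(const char *name) { (void)name; return 0; }

enum lookup_result
lookup_43bsd_order(const char *name, backend_fn dns, backend_fn hosts)
{
    struct in_addr addr;

    assert(name != NULL);

    /* Step 1: a leading digit means the name must be a dotted quad. */
    if (isdigit((unsigned char)name[0])) {
        if (inet_pton(AF_INET, name, &addr) == 1)
            return LOOKUP_NUMERIC;
        return LOOKUP_NOT_FOUND; /* bail without ever asking DNS */
    }

    /* Step 2: try DNS first. */
    if (dns != NULL && dns(name))
        return LOOKUP_DNS;

    /* Step 3: only if DNS failed, fall back to the hosts file. */
    if (hosts != NULL && hosts(name))
        return LOOKUP_HOSTS;

    return LOOKUP_NOT_FOUND;
}
```

Calling `lookup_43bsd_order("12-34-56-78.static.example.com", stub_hit, stub_hit)` returns `LOOKUP_NOT_FOUND` even though both backends would have answered, which is the failure mode described in step 1.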
## standardization of `gethostbyname` in POSIX
The `gethostbyname` and `getaddrinfo` APIs were first standardized in the X/Open Networking Services Issue 4 (commonly referred to as XNS4) specification, which itself was part of the X/Open Single Unix Specification version 3 (commonly referred to as SUSv3), released in 1995. Of note, X/Open tried to deprecate `gethostbyname` in favor of `getaddrinfo` as part of the XNS5 specification, [removing it entirely except for a mention in their specification for `netdb.h`](https://pubs.opengroup.org/onlinepubs/009619199/netdbh.htm#tagcjh_06_02).
Later, it returned [as part of POSIX issue 6, released in 2004](https://pubs.opengroup.org/onlinepubs/009696799/functions/gethostbyaddr.html). That version says:
> **Note:** In many cases it is implemented by the Domain Name System, as documented in RFC 1034, RFC 1035, and RFC 1886.
>
> POSIX issue 6, IEEE 1003.1:2004.
Oh no, what is this about, and do application developers need to care about it? Very simply, it is about the [Name Service Switch](https://en.wikipedia.org/wiki/Name_Service_Switch), frequently referred to as NSS, which allows the `gethostbyname` function to have hotpluggable implementations. The Name Service Switch was a feature introduced to Solaris, which was implemented to allow support for Sun's NIS+ directory service.
As developers of other operating systems wanted to support software like Kerberos and LDAP, it was quickly reimplemented in other systems as well, such as GNU/Linux. These days, systems running systemd frequently use this feature in combination with a custom NSS module named `nss-resolve` to force use of `systemd-resolved` as the DNS resolver, which has different behavior than the original DNS client derived from BIND that ships in most `libc` implementations.
An administrator can disable support for DNS lookups entirely, simply by editing the `/etc/nsswitch.conf` file and removing the `dns` module. Application developers depending on reliable DNS service need to care a lot about this: on systems with NSS, your application cannot depend on `gethostbyname` to actually support DNS at all.
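
As a rough illustration of how little it takes to turn DNS off, the sketch below scans `nsswitch.conf`-style text and reports whether the `hosts:` line lists the `dns` module. The function name `hosts_line_lists_dns` is hypothetical, and the parsing is deliberately simplified: it only understands whitespace-separated tokens and `#` comments, and it does not model whatever default a given `libc` applies when no `hosts:` line is present.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the first `hosts:` line in the
 * given nsswitch.conf-style text lists the `dns` module, 0 otherwise.
 */
int hosts_line_lists_dns(const char *conf)
{
    const char *line = conf;

    assert(conf != NULL);

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        const char *eol = nl ? nl : line + strlen(line);
        const char *stop = eol;
        const char *p;

        /* Ignore everything after a '#' comment marker. */
        for (p = line; p < eol; p++) {
            if (*p == '#') {
                stop = p;
                break;
            }
        }

        /* Skip leading whitespace, then look for the database name. */
        p = line;
        while (p < stop && isspace((unsigned char)*p))
            p++;

        if ((size_t)(stop - p) >= 6 && strncmp(p, "hosts:", 6) == 0) {
            p += 6;
            while (p < stop) {
                const char *tok;

                while (p < stop && isspace((unsigned char)*p))
                    p++;
                tok = p;
                while (p < stop && !isspace((unsigned char)*p))
                    p++;
                /* Match the token `dns` exactly, not e.g. `mdns4`. */
                if (p - tok == 3 && strncmp(tok, "dns", 3) == 0)
                    return 1;
            }
            return 0;
        }

        line = nl ? nl + 1 : eol;
    }
    return 0;
}
```

With a line such as `hosts: files`, this returns 0, and on such a system `gethostbyname` would never issue a DNS query no matter what the application expects.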
## musl and DNS
Given the background above, it should be obvious by now that musl's DNS client was written under the assumption that applications that have specific requirements for DNS would be using a specialized library for this purpose, as `gethostbyname` and `getaddrinfo` are not really suitable APIs, since their behavior is entirely implementation-defined and largely focused around blocking queries to a directory service.
Because of this, the DNS client was written to behave as simply as possible, but the use of DNS for bulk data distribution, such as in DNSSEC, DKIM and other applications, has led to a desire to implement support for DNS over TCP as an extension to the musl DNS client.
In practice, this will fix the remaining complaints about the musl DNS client once it lands in a musl release, but application authors depending on reliable DNS performance should really use a dedicated DNS client library for that purpose: using APIs that were designed to simply parse `/etc/hosts` and had DNS support shoehorned into them will always deliver unreliable results.