title | date |
---|---|
the tragedy of gethostbyname | 2022-03-27 |
A frequent complaint expressed on a certain website about Alpine is related to the deficiencies regarding the musl DNS resolver when querying large zones. In response, it is usually mentioned that applications which are expecting reliable DNS lookups should be using a dedicated DNS library for this task, not the `getaddrinfo` or `gethostbyname` APIs, but this is usually rebuffed by comments saying that these APIs are fine to use because they are allegedly reliable on GNU/Linux.
For a number of reasons, the assertion that DNS resolution via these APIs under glibc is more reliable is false, but to understand why, we must look at the history of why a `libc` is responsible for shipping these functions to begin with, and how these APIs evolved over the years. For instance, did you know that `gethostbyname` originally didn't do DNS queries at all? And, the big question: why are these APIs blocking, when DNS is inherently an asynchronous protocol?
Before we get into this, it is important to again restate that if you are an application developer, and your application depends on reliable DNS performance, you must absolutely use a dedicated DNS resolver library designed for this task. There are many libraries available that are good for this purpose, such as c-ares, GNU adns, s6-dns and OpenBSD's libasr. As should hopefully become obvious at the end of this article, the DNS clients included with `libc` are designed to provide basic functionality only, and there is no guarantee of portable behavior across client implementations.
the introduction of gethostbyname
Where did `gethostbyname` come from, anyway? Most people believe this function came from BIND, the reference DNS implementation developed by the Berkeley CSRG. In reality, it was introduced to BSD in 1982, alongside the `sethostent` and `gethostent` APIs. I happen to have a copy of the 4.2BSD source code, so here is the implementation from 4.2BSD, which was released in early 1983:
    struct hostent *
    gethostbyname(name)
        register char *name;
    {
        register struct hostent *p;
        register char **cp;

        sethostent(0);
        while (p = gethostent()) {
            if (strcmp(p->h_name, name) == 0)
                break;
            for (cp = p->h_aliases; *cp != 0; cp++)
                if (strcmp(*cp, name) == 0)
                    goto found;
        }
    found:
        endhostent();
        return (p);
    }
As you can see, the 4.2BSD implementation only checks the `/etc/hosts` file and nothing else. This answers the question of why `gethostbyname` and its successor, `getaddrinfo`, do DNS queries in a blocking way: the API began as a synchronous scan of a local file, and when DNS support was added later, the developers did not want to introduce an asynchronous replacement for `gethostbyname`.
the introduction of DNS to gethostbyname
DNS resolution was first added to BSD's `gethostbyname` in 1984. This version, which is too long to include here, also translated dotted-quad IPv4 addresses into a `struct hostent`. In essence, the 4.3BSD implementation does the following:
- If the requested hostname begins with a number, try to parse it as a dotted quad. If this fails, set `h_errno` to `HOST_NOT_FOUND` and bail. Yes, this means 4.3BSD would fail to resolve hostnames like `12-34-56-78.static.example.com`.
- Attempt to do a DNS query using `res_search`. If the query was successful, return the first IP address found as the `struct hostent`.
- If the DNS query failed, fall back to the original `/etc/hosts` searching algorithm above, now called `_gethtbyname` and using `strcasecmp` instead of `strcmp` (for consistency with DNS).
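To make the first step concrete, here is a minimal sketch of the leading-digit check. It is not the 4.3BSD code: the `classify_name` function and the `step1_result` codes are hypothetical names made up for illustration, and `inet_pton` is an assumption standing in for the historical dotted-quad parser (it is stricter about what it accepts).

```c
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>

/* Hypothetical result codes for this sketch; not part of any libc. */
enum step1_result {
    STEP1_PARSED,    /* name was a dotted quad; *out holds the address */
    STEP1_NOT_FOUND, /* leading digit, but not a dotted quad: lookup fails */
    STEP1_TRY_DNS    /* no leading digit: move on to the DNS query */
};

/*
 * Sketch of the first step in the list above. A name that starts with a
 * digit is only ever treated as an address, so a hostname such as
 * 12-34-56-78.static.example.com never reaches the DNS query.
 * inet_pton is an assumption standing in for the historical parser.
 */
enum step1_result classify_name(const char *name, struct in_addr *out)
{
    if (!isdigit((unsigned char)name[0]))
        return STEP1_TRY_DNS;
    if (inet_pton(AF_INET, name, out) == 1)
        return STEP1_PARSED;
    return STEP1_NOT_FOUND;
}
```

With this sketch, `192.0.2.1` is parsed directly, `example.com` proceeds to the DNS step, and `12-34-56-78.static.example.com` is rejected before any query is sent.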
A fixed version of this algorithm was also included with BIND's `libresolv` as `res_gethostbyname`, and the `res_search` and related functions were imported into BSD libc from BIND.
standardization of gethostbyname in POSIX
The `gethostbyname` and `getaddrinfo` APIs were first standardized in the X/Open Networking Services Issue 4 (commonly referred to as XNS4) specification, which itself was part of the X/Open Single UNIX Specification released in 1995. Of note, X/Open tried to deprecate `gethostbyname` in favor of `getaddrinfo` as part of the XNS5 specification, removing it entirely except for a mention in their specification for `netdb.h`.
Later, it returned as part of POSIX issue 6, released in 2004. That version says:
Note: In many cases it is implemented by the Domain Name System, as documented in RFC 1034, RFC 1035, and RFC 1886.
POSIX issue 6, IEEE 1003.1:2004.
Oh no, what is this about, and do application developers need to care about it? Very simply, it is about the Name Service Switch, frequently referred to as NSS, which allows the `gethostbyname` function to have hotpluggable implementations. The Name Service Switch was a feature introduced to Solaris, which was implemented to allow support for Sun's NIS+ directory service.
As developers of other operating systems wanted to support software like Kerberos and LDAP, it was quickly reimplemented in other systems as well, such as GNU/Linux. These days, systems running systemd frequently use this feature in combination with a custom NSS module named `nss-resolve` to force use of `systemd-resolved` as the DNS resolver, which behaves differently from the original DNS client derived from BIND that ships in most `libc` implementations.
An administrator can disable support for DNS lookups entirely, simply by editing the `/etc/nsswitch.conf` file and removing the `dns` module, which means application developers depending on reliable DNS service need to care a lot about this: it means on systems with NSS, your application cannot depend on `gethostbyname` to actually support DNS at all.
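As a rough illustration of how fragile that dependency is, the sketch below checks whether the `hosts:` line of an `nsswitch.conf`-style text lists the `dns` source at all. The `hosts_line_has_dns` helper is hypothetical, not part of any libc. It matches only the literal `dns` service name, and it returns false when no `hosts:` line exists, which is a simplification: real implementations fall back to built-in defaults.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the first "hosts:" line in the
 * given nsswitch.conf text names the "dns" source. Comments start at
 * '#' and run to the end of the line.
 */
bool hosts_line_has_dns(const char *conf)
{
    const char *line = conf;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        const char *hash = memchr(line, '#', len);
        const char *stop = hash ? hash : line + len; /* end of usable text */
        const char *p = line + strspn(line, " \t");  /* skip indentation */

        if (p < stop && (size_t)(stop - p) >= 6 &&
            strncmp(p, "hosts:", 6) == 0) {
            p += 6;
            while (p < stop) {
                size_t tok = 0;

                p += strspn(p, " \t");
                /* Measure the next whitespace-delimited token. */
                while (p + tok < stop && p[tok] != ' ' && p[tok] != '\t')
                    tok++;
                if (tok == 3 && strncmp(p, "dns", 3) == 0)
                    return true;
                p += tok;
            }
            return false; /* hosts line present, but no dns source */
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return false; /* simplification: no hosts line found */
}
```

A line such as `hosts: files dns` passes the check, while `hosts: files` does not, and in the latter case no amount of care in the application will make `gethostbyname` consult DNS.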
musl and DNS
Given the background above, it should be obvious by now that musl's DNS client was written under the assumption that applications that have specific requirements for DNS would be using a specialized library for this purpose, as `gethostbyname` and `getaddrinfo` are not really suitable APIs, since their behavior is entirely implementation-defined and largely focused around blocking queries to a directory service.
Because of this, the DNS client was written to behave as simply as possible, but the use of DNS for bulk data distribution, such as in DNSSEC, DKIM and other applications, has led to a desire to implement support for DNS over TCP as an extension to the musl DNS client.
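For readers unfamiliar with why TCP matters here, the following sketch shows the two protocol details from RFC 1035 that a TCP fallback revolves around: a UDP response that did not fit is marked with the TC (truncated) flag, and messages sent over TCP are prefixed with a two-byte length. This is not musl's code; `dns_response_truncated` and `dns_frame_for_tcp` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if a DNS response has the TC (truncated) flag set. */
bool dns_response_truncated(const uint8_t *msg, size_t len)
{
    /* The fixed DNS header is 12 bytes; TC is bit 0x02 of the third byte. */
    return len >= 12 && (msg[2] & 0x02) != 0;
}

/*
 * Frames a query for TCP by prepending its length as two big-endian
 * bytes. Returns the framed size, or 0 if the query is too large or the
 * output buffer is too small.
 */
size_t dns_frame_for_tcp(const uint8_t *query, size_t qlen,
                         uint8_t *out, size_t outcap)
{
    if (qlen > 0xFFFF || outcap < qlen + 2)
        return 0;
    out[0] = (uint8_t)(qlen >> 8);
    out[1] = (uint8_t)(qlen & 0xFF);
    memcpy(out + 2, query, qlen);
    return qlen + 2;
}
```

A resolver without TCP support that sees the TC flag has nothing further to try, which is roughly the situation behind the complaints about large responses.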
In practice, this will fix the remaining complaints about the musl DNS client once it lands in a musl release, but application authors depending on reliable DNS performance should really use a dedicated DNS client library for that purpose: using APIs that were designed to simply parse `/etc/hosts` and had DNS support shoehorned into them will always deliver unreliable results.