Unfortunately libfetch operates on raw sockets and is sending
each HTTP request line using separate syscall which causes the
HTTP request to be sent as multiple packets over the wire in most
configurations. This is not good for performance, but can also
cause subtle breakage if there's DPI firewall that does not get
the Host header.
Incidentally, it seems that on BSDs libfetch already sets
TCP_NOPUSH optimize the packetization. This commit adds same
logic for using TCP_CORK if available. When using TCP_CORK
there is no requirement to set TCP_NODELAY as uncorking will
also cause immediate send. Keep TCP_NODELAY in the fallback
codepaths.
Long term, it might make sense to replace or rewrite libfetch
to use application level buffering.