Also, improve a comment and shorten another, change a 'for' to a 'while'
(as the end point is not known), and rename a parameter from a single
letter to a word.
And in the bargain get rid of some duplicate code.
This makes a binary without UTF-8 support slightly slower, but that's
not important -- it is more than fast enough anyway. What matters is
that the most used and longest code path, the UTF-8 case, becomes faster.
Note that 'is_cntrl_mbchar()' will fall back to 'is_cntrl_char()' for
a non-UTF-8 build, so the deleted piece of code really was equivalent
to the remaining piece for that case.
Again, if the most significant bit of a UTF-8 byte is zero, it means
the character is a single byte and we can skip the call to mblen(),
*and* if the character is one byte it also occupies just one column,
because all ASCII characters are single-column characters -- apart
from control codes.
This partially addresses https://savannah.gnu.org/bugs/?51491.
For UTF-8, if the most significant bit of a byte is zero, it means the
character is just a single byte and we can skip the call to mblen().
For files consisting of pure ASCII bytes (between 0x00 and 0x7F), this
change reduces the counting time of mbstrlen() by ninety-six percent.
This partially addresses https://savannah.gnu.org/bugs/?50406.
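A minimal sketch of the fast path described above, not nano's actual
mbstrlen(): the name count_chars() is hypothetical, and treating an
invalid sequence as a single byte is an assumption made for this
example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count the characters (not bytes) in a NUL-terminated string,
 * skipping mblen() for bytes whose most significant bit is zero.
 * Hypothetical helper, for illustration only. */
size_t count_chars(const char *text)
{
    size_t count = 0;

    while (*text != '\0') {
        if (((unsigned char)*text & 0x80) == 0) {
            /* High bit clear: a single-byte character, no mblen(). */
            text++;
        } else {
            int length = mblen(text, MB_CUR_MAX);

            /* Assumption: step over an invalid sequence one byte
             * at a time. */
            text += (length > 0) ? length : 1;
        }
        count++;
    }

    return count;
}
```

For a pure ASCII line the else branch is never taken, so no library
call is made at all.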
When mbtowc() is never called with anything less than MAXCHARLEN as
the length parameter, it will apparently not get confused and will
not need to be reset.
Each leading tab is converted to two tabs, and each run of four leading
spaces is converted to one tab. The intended tab size (for keeping most
lines within 80 columns) is now four.
Instead of always stepping back four bytes and then tentatively
moving forward again (which is wasteful when most codes are just
one or two bytes long), inspect the preceding bytes one by one
and begin the move forward at the first valid starter byte.
This reduces the backwards searching time by close to 40 percent.
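The byte-by-byte backward inspection could look roughly like this.
It is a simplified sketch: step_back() is a hypothetical name, the
buffer is assumed to hold UTF-8, and the sequence that is found is
not validated by moving forward over it again.

```c
#include <stddef.h>

/* Return the index of the character that precedes byte position pos
 * in buf.  Hypothetical helper; assumes UTF-8 and does no validation
 * of the sequence it lands on. */
size_t step_back(const char *buf, size_t pos)
{
    size_t steps = 0;

    if (pos == 0)
        return 0;

    /* Walk backward over at most three continuation bytes (10xxxxxx),
     * stopping at the first byte that can start a character. */
    do {
        pos--;
        steps++;
    } while (pos > 0 && steps < 4 &&
             ((unsigned char)buf[pos] & 0xC0) == 0x80);

    return pos;
}
```

For ASCII text the loop body runs once, so only a single preceding
byte is inspected.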
If the haystack is shorter than the needle, then the tail is also
shorter than the needle (because the pointer is always greater than
or equal to the haystack), so the pointer gets readjusted to one needle
length before the end of the haystack. That puts it /before/ the start
of the haystack, and thus the while loop will never run.
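The reasoning above can be traced in a small sketch. It uses signed
offsets instead of pointers so that "before the haystack" is simply a
negative number; reverse_find() and its signature are hypothetical,
and start is assumed not to exceed the length of the haystack.

```c
#include <stddef.h>
#include <string.h>

/* Search backward for needle in haystack, starting at byte offset
 * start.  Returns the offset of the match, or -1 when there is none.
 * Hypothetical helper, for illustration only. */
ptrdiff_t reverse_find(const char *haystack, const char *needle,
                       size_t start)
{
    ptrdiff_t hay_len = (ptrdiff_t)strlen(haystack);
    ptrdiff_t needle_len = (ptrdiff_t)strlen(needle);
    ptrdiff_t offset = (ptrdiff_t)start;

    /* When the tail is shorter than the needle, begin one needle
     * length before the end.  For a haystack shorter than the needle
     * this yields a negative offset. */
    if (hay_len - offset < needle_len)
        offset = hay_len - needle_len;

    /* A negative offset means the loop body never executes. */
    while (offset >= 0) {
        if (strncmp(haystack + offset, needle, (size_t)needle_len) == 0)
            return offset;
        offset--;
    }

    return -1;
}
```

With haystack "ab", needle "abc" and start 1, the offset is readjusted
to 2 - 3 = -1, so the loop is skipped and no match is reported.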
On average, this saves some 200 nanoseconds per line.
The interval 2013-2017 for the Free Software Foundation is valid
because in those years there were releases with changes by either
Chris or David, and the GNU maintainers guide advises mentioning
a new year in all files of a package, not just in the ones that
actually changed, and be done with it for the rest of the year.
The platform's default char type might be signed, which could cause
problems in 8-bit locales.
This addresses https://savannah.gnu.org/bugs/?50289.
Reported-by: Hans-Bernhard Broeker <HBBroeker@T-Online.de>
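The message does not say which code was affected, so the following
only illustrates the general hazard with a hypothetical helper: where
char is signed, a byte such as 0xE9 becomes negative and would pass a
"less than 0x20" test unless it is first converted to unsigned char.

```c
#include <stdbool.h>

/* Hypothetical helper: is this byte an ASCII control code?
 * Converting to unsigned char first keeps bytes from 0x80 to 0xFF
 * from comparing as negative on platforms where char is signed. */
bool is_ascii_control(char c)
{
    unsigned char byte = (unsigned char)c;

    return (byte < 0x20 || byte == 0x7F);
}
```

Without the conversion, every high byte of an 8-bit locale would be
classified as a control code on such a platform.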
In path names and file names, 0x0A means an embedded newline and
should be shown as ^J, but in anything related to the file's data,
0x0A is an encoded NUL and should be displayed as ^@.
So... switch mode at the two main entry points into the "file system"
(reading in a file, and writing out a file), and also when drawing the
titlebar. Switch back to the default mode in the main loop.
This fixes https://savannah.gnu.org/bugs/?49893.
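As a rough illustration of the two interpretations of 0x0A, the sketch
below picks the letter to show after the caret. The function name and
the in_filename parameter are hypothetical; the commit describes
switching a mode at the entry points, whereas this example passes the
mode as an argument to stay self-contained.

```c
#include <stdbool.h>

/* Return the letter shown after '^' for a control byte (a byte below
 * 0x20, or 0x7F).  Hypothetical helper; in_filename stands in for the
 * mode that the commit switches. */
char caret_letter(unsigned char byte, bool in_filename)
{
    /* In file data, 0x0A is an encoded NUL and is shown as ^@;
     * in path and file names it is a real newline, shown as ^J. */
    if (byte == 0x0A)
        return in_filename ? 'J' : '@';

    if (byte == 0x7F)
        return '?';

    /* Other control codes map to the character 64 positions up. */
    return (char)(byte + 64);
}
```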
The byte 0x0A means 0x00 *only* when it is found in nano's internal
representation of a file's data, not when it occurs in a file name.
This fixes the second part of https://savannah.gnu.org/bugs/?49867.