This makes the handling of plain ASCII a tiny bit slower, but it
affects only the users of --constantshow without --minibar, so...
All other uses of mbstrlen() and collect_char() are not in speed-
critical code paths.
Since the previous commit, mbwidth() is used only to determine whether
a character is either double width or zero width. There is no need to
return the actual width of the character; a simple yes or no is enough.
Transforming mbwidth() into is_doublewidth() also allows streamlining
it and is_zerowidth() a bit, so that they become slightly faster.
The number of bytes in the character was determined twice: first in
mbwidth() and then in char_length(). Do it just once, in mbtowide().
Also, avoid calling is_cntrl_char(), because it does unneeded checks
when we already know that the high bit is set.
This duplicates some code, but advance_over() is called a lot, so it
is important that it is as fast as possible.
This shouldn't slow down plain ASCII, as the extra checks (use_utf8
and *string < 0xA0) are done only for non-ASCII (apart from DEL).
The 'start_index' was an index in the given text, while 'index' is an
index in the displayable string. Having both of them using 'index' in
their name was somewhat confusing.
For a normal file (without overlong lines) the strlen() wasn't much
of a problem. But when there are very long lines, it wasted time
counting stuff that wouldn't be displayed on the current row anyway,
and reserved *far* too much memory for the displayable string.
The problem has existed since commit cf0eed6c from five years ago, which
traded a continuous comparison (of the used space with the reserved
space) for a one-time big reservation up front involving a strlen().
In retrospect that was not a good trade-off when softwrapping.
The extra check (charwidth == 0) is incurred only by characters that
have their high bit set, so the average file (with only ASCII) is not
affected by this -- it just loses an unneeded call of strlen().
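The idea of bounding the work by what fits on the row, instead of by the
length of the line, can be sketched as follows. This is a simplified
illustration and not the actual change: the name visible_part() is
hypothetical, and it assumes plain ASCII (one byte per column), so it
needs none of the width checks mentioned above.

```c
#include <stdlib.h>

/* Hypothetical, simplified helper (not nano's code): return a copy of at
 * most 'span' columns of the NUL-terminated 'text'. It assumes one byte
 * per column. Neither the time spent nor the memory reserved depends on
 * how long the rest of the line is: there is no strlen() of the whole. */
static char *visible_part(const char *text, size_t span)
{
	char *shown = malloc(span + 1);
	size_t i = 0;

	if (shown == NULL)
		return NULL;

	/* Stop at the end of the row or at the end of the line. */
	while (i < span && text[i] != '\0') {
		shown[i] = text[i];
		i++;
	}
	shown[i] = '\0';

	return shown;
}
```

The caller owns the returned string and must free() it.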
In UTF-8, valid multibyte characters are at most four bytes long,
and now that we no longer make use of mblen() and mbtowc() from
the underlying system, we won't get five- or six-byte sequences
mistakenly reported as valid (by glibc). So it is always enough
to reserve space for just four bytes per character.
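To illustrate why four bytes always suffice, here is a minimal sketch of
a UTF-8 encoder. The name encode_utf8() is hypothetical and is not a
function in nano. Because it refuses surrogates and anything above
U+10FFFF, it can never write more than four bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not part of nano): encode one code point as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 for codes that have
 * no valid UTF-8 encoding (surrogates and anything above U+10FFFF). */
static size_t encode_utf8(unsigned long code, unsigned char out[4])
{
	if (code < 0x80) {
		out[0] = (unsigned char)code;
		return 1;
	}
	if (code < 0x800) {
		out[0] = (unsigned char)(0xC0 | (code >> 6));
		out[1] = (unsigned char)(0x80 | (code & 0x3F));
		return 2;
	}
	if (code < 0x10000) {
		/* U+D800 to U+DFFF are surrogates, not characters. */
		if (code >= 0xD800 && code <= 0xDFFF)
			return 0;
		out[0] = (unsigned char)(0xE0 | (code >> 12));
		out[1] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (code & 0x3F));
		return 3;
	}
	if (code <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (code >> 18));
		out[1] = (unsigned char)(0x80 | ((code >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (code & 0x3F));
		return 4;
	}
	return 0;
}
```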
Calling wctomb() with NULL as the first parameter returns zero in a
UTF-8 locale, meaning that there is no state, so there is no point
in resetting it either.
The two calls of draw_row() are each immediately preceded by a call to
display_string(), which has already determined from which x position
and until which x position in the relevant line the current row will
be drawn -- doing this again in draw_row() is a waste of time. Even
though it is ugly, pass the two data points from one function to the
other via global variables.
For normal files (without overlong lines), this saves on average some
fifty calls of advance_over() per row. When softwrapping a file with
overlong lines, the savings for each softwrapped chunk are much higher.
(Well, it now checks that ^G is still the first shortcut that is bound
to 'do_help', but that is good enough: if the user did any rebinding,
they probably do not need any reminder about how to invoke 'Help'.)
This fixes https://savannah.gnu.org/bugs/?60315.
Reported-by: Robert Goulding <goulding.2@nd.edu>
This saves a function call, and the passing and checking of the
MAXCHARLEN parameter, and the checking whether wc is maybe NULL
(which for nano is never the case), and who knows what other
overheads mbtowc() has, and our workaround for glibc.
Code was written after looking at gnulib/lib/mbrtowc-impl-utf8.h.
Most implementations of mblen() do a call to mbtowc(), which is
a waste of time when all we want to know is the number of bytes
(and when we already know that we're using UTF-8 and that the
first byte is at least 0xC2).
(This also avoids burdening correct implementations with the
workaround that was needed only for glibc.)
Code was written after looking at gnulib/lib/mbrtowc-impl-utf8.h.
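Finding the length without computing a wide character can be sketched
as below. This is an illustration, not the code from nano or gnulib: the
name utf8_length() is hypothetical, the input is assumed to be
NUL-terminated, and an invalid sequence is reported as a single byte.

```c
/* Hypothetical helper (not nano's actual code): return the length in bytes
 * of the UTF-8 sequence at the start of the NUL-terminated string s, or 1
 * when the bytes do not form a valid sequence (so that a caller can treat
 * the first byte as invalid and move on). For an empty string it also
 * returns 1. No wide character is computed. */
static int utf8_length(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;

	/* ASCII, stray continuation bytes, and the overlong leads C0 and C1. */
	if (p[0] < 0xC2)
		return 1;

	/* Every multibyte sequence has a continuation byte in second place. */
	if ((p[1] & 0xC0) != 0x80)
		return 1;
	if (p[0] < 0xE0)
		return 2;

	if ((p[2] & 0xC0) != 0x80)
		return 1;
	if (p[0] < 0xF0) {
		/* Reject overlong forms (E0 80..9F) and surrogates (ED A0..BF). */
		if ((p[0] == 0xE0 && p[1] < 0xA0) || (p[0] == 0xED && p[1] >= 0xA0))
			return 1;
		return 3;
	}

	if ((p[3] & 0xC0) != 0x80)
		return 1;

	/* Reject leads above F4, overlong forms (F0 80..8F), and anything
	 * that would encode a value beyond U+10FFFF (F4 90 and higher). */
	if (p[0] > 0xF4 || (p[0] == 0xF0 && p[1] < 0x90) ||
	    (p[0] == 0xF4 && p[1] >= 0x90))
		return 1;

	return 4;
}
```

Each later byte is read only after the byte before it was found to be
non-NUL, so the function never reads past the terminating NUL.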
In glibc, the mblen() and mbtowc() functions will happily return 4 or 5
or 6 for byte sequences that start with 0xF4 0x90 or higher. But those
sequences encode U+110000 or higher, which are not valid Unicode code
points. The libcs of FreeBSD, OpenBSD, and Alpine correctly return -1
for such sequences. Make nano behave correctly also when linked
against glibc, so that invalid sequences are always presented as a
series of invalid bytes and never as a single invalid code.
This fixes https://savannah.gnu.org/bugs/?60262.
Bug existed since before version 2.0.0.
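The range check that matters here can be sketched with a small decoder
that computes the value first and validates it afterward. The name
decode_utf8() is hypothetical; this is an illustration under the
assumption of NUL-terminated input, not the code that nano uses.

```c
/* Hypothetical helper (not nano's code): decode the UTF-8 sequence at the
 * start of the NUL-terminated string s into *code. Returns the number of
 * bytes used, or -1 when the bytes are not a valid encoding of a Unicode
 * code point. An empty string decodes as U+0000 with length 1. */
static int decode_utf8(const char *s, unsigned long *code)
{
	const unsigned char *p = (const unsigned char *)s;
	unsigned long value, lowest;
	int length, i;

	if (p[0] < 0x80) {
		*code = p[0];
		return 1;
	} else if (p[0] >= 0xC2 && p[0] < 0xE0) {
		length = 2; value = p[0] & 0x1F; lowest = 0x80;
	} else if (p[0] >= 0xE0 && p[0] < 0xF0) {
		length = 3; value = p[0] & 0x0F; lowest = 0x800;
	} else if (p[0] >= 0xF0 && p[0] < 0xF8) {
		length = 4; value = p[0] & 0x07; lowest = 0x10000;
	} else
		return -1;

	/* Gather six bits from each continuation byte. A NUL byte is not
	 * a continuation byte, so this never reads past the terminator. */
	for (i = 1; i < length; i++) {
		if ((p[i] & 0xC0) != 0x80)
			return -1;
		value = (value << 6) | (p[i] & 0x3F);
	}

	/* Refuse overlong forms, surrogates, and U+110000 and beyond. */
	if (value < lowest || value > 0x10FFFF ||
	    (value >= 0xD800 && value <= 0xDFFF))
		return -1;

	*code = value;
	return length;
}
```

With this check, F4 8F BF BF decodes to U+10FFFF, while F4 90 80 80 is
refused, so a caller can show its bytes one by one as invalid.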
The call of this function in make_mbchar() does not add anything,
because wctomb() already returns -1 for codes U+D800 to U+DFFF,
and parse_verbatim_kbinput() already rejects anything that starts
with U+11.... or higher, so make_mbchar() is never called for codes
beyond U+10FFFF.
And the call in display_string() just needs to check for wc <= 0x10FFFF
because mbtowc() already returns -1 for codes U+D800 to U+DFFF.
That is, accept U+FDD0 to U+FDEF, and accept U+xxFFFE and U+xxFFFF
for xx from 00 to 10 hex, being the 66 reserved "non-characters".
It may not be wise of the user to input these "things" (by typing
their code after M-V), but the codes are valid Unicode code points
and should not be rejected.
See https://www.unicode.org/faq/private_use.html#nonchar8 et al.
This fixes https://savannah.gnu.org/bugs/?60263.
Bug existed since before version 2.0.0.
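The set of codes described above can be written down as a short
predicate. Both function names are hypothetical and only illustrate the
ranges; they are not taken from nano.

```c
#include <stdbool.h>

/* Return true when code is one of the 66 Unicode noncharacters:
 * U+FDD0 to U+FDEF (32 codes), plus the last two codes of each of
 * the 17 planes, U+xxFFFE and U+xxFFFF (34 codes). */
static bool is_noncharacter(unsigned long code)
{
	return (code >= 0xFDD0 && code <= 0xFDEF) ||
	       (code <= 0x10FFFF && (code & 0xFFFE) == 0xFFFE);
}

/* Hypothetical validity test in the spirit of the text above: every
 * code up to U+10FFFF is accepted except the surrogates, so the
 * noncharacters pass too. */
static bool is_acceptable(unsigned long code)
{
	return code <= 0x10FFFF && (code < 0xD800 || code > 0xDFFF);
}
```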
When inserting a file into the current buffer, the 'fmt' element will
already be set. When we avoid overwriting the current value of 'fmt'
(when it's other than UNSPECIFIED), we don't need to save and restore
the value when inserting a file.
When saving the buffer under a different name, it should by default
have the same format as the original file.
This fixes https://savannah.gnu.org/bugs/?60278.
Bug existed since version 2.6.0, commit 0293eac1.
This improves the fix for https://savannah.gnu.org/bugs/?60269,
by not dropping error messages that happen before a buffer is opened.
This basically reverts commit b63c90bf from a year ago, except that
it now always deletes the created buffer when the user does not want
to override the lock file, also when it is the only buffer.
Set the 'format' of a file only when it has been fully read in,
so that this field can be used to indicate that any later error
message cannot be meant for this buffer.
This fixes https://savannah.gnu.org/bugs/?60269.
Bug existed since commit 6bf52dcc from yesterday.
Make sure there is an 'openfile' record before trying to save an
error message in this record.
This fixes https://savannah.gnu.org/bugs/?60268.
Bug existed since commit ede64d7e from yesterday.
Labels may contain digits (after the first character).
And the colon after "default" should not be colored.
Inspired-by: Hussam al-Homsi <sawuare@gmail.com>
When opening multiple files and some of them had an error, only the
first message was shown and the others were lost -- indicated only
by three dots. Improve upon this by storing the first error message
for each buffer and showing this message when the buffer is first
switched to.
Requested-by: Mike Frysinger <vapier@gentoo.org>
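Keeping just the first message per buffer can be sketched like this.
The struct and both function names are hypothetical and much reduced;
they only illustrate the idea and are not nano's data structures.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, pared-down buffer record (not nano's real struct). */
struct buffer {
	char *errormessage;   /* first error seen for this buffer, or NULL */
};

/* Store a copy of message in buf, unless there is no buffer or the buffer
 * already holds an earlier message. Returns 1 when stored, 0 otherwise. */
static int remember_error(struct buffer *buf, const char *message)
{
	if (buf == NULL || buf->errormessage != NULL)
		return 0;

	buf->errormessage = malloc(strlen(message) + 1);
	if (buf->errormessage == NULL)
		return 0;

	strcpy(buf->errormessage, message);
	return 1;
}

/* Hand over the stored message (or NULL) when the buffer is switched to.
 * The buffer forgets the message, so it is shown only once, and the
 * caller is responsible for freeing it. */
static char *take_error(struct buffer *buf)
{
	char *message = buf->errormessage;

	buf->errormessage = NULL;
	return message;
}
```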
That is: reserve for the current line and current character the number
of positions needed for the total number of lines and characters, and
reserve two positions for both the current column and the total number
of columns. This will keep all nine numbers in the output in the same
place -- as long as there are no lines with more than 99 columns. In
this latter case there will still be some jitter, but all in all the
output is much more stable than it was.
Suggested-by: Mike Frysinger <vapier@gentoo.org>
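The width reservation can be sketched with the '*' field width of
printf(). The function names and the message layout below are
hypothetical, and the sketch leaves out the three percentages, so it
prints six of the nine numbers.

```c
#include <stdio.h>

/* Number of decimal digits needed to print a non-negative value. */
static int digits(long n)
{
	int count = 1;

	while (n >= 10) {
		n /= 10;
		count++;
	}
	return count;
}

/* Hypothetical formatter (not nano's code): pad the current line and the
 * current character to the width of their totals, and give the current
 * column and the total number of columns two positions each. Returns
 * the length of the formatted text, as snprintf() does. */
static int format_position(char *buf, size_t size, long line, long lines,
                           long col, long cols, long pos, long total)
{
	return snprintf(buf, size, "line %*ld/%ld, col %2ld/%2ld, char %*ld/%ld",
	                digits(lines), line, lines, col, cols,
	                digits(total), pos, total);
}
```

As long as the totals stay the same and no line has more than 99
columns, the formatted text has the same length wherever the cursor
is, so the numbers do not shift.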
The number of lines in a group is the difference in line numbers
between the last line and the first line *plus one*.
This fixes https://savannah.gnu.org/bugs/?60104.
Bug existed since version 2.9.0, when indenting and unindenting
became undoable; commit f722c532 was part of that change.
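As a tiny illustration of the off-by-one: a group that runs from line 5
to line 8 contains four lines, not three. The struct and the function
name here are hypothetical.

```c
/* Hypothetical, pared-down line record (not nano's real struct). */
struct line {
	long lineno;
};

/* Both the top and the bottom line belong to the group, hence the +1. */
static long lines_in_group(const struct line *top, const struct line *bottom)
{
	return bottom->lineno - top->lineno + 1;
}
```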
Allocating it again would leak the existing space.
This fixes https://savannah.gnu.org/bugs/?60172.
Reported-by: Mike Frysinger <vapier@gentoo.org>
Bug existed since version 5.6, commit 1fdd23d3.
When the version number is a trio, the version string occupies
ten bytes and the terminating NUL byte was not written (which
was not a problem, as byte 12 of the lock data is zero anyway).
But it's better to not have the compiler complain, so allow writing
the terminating NUL byte outside of the ten bytes reserved for the
version string.
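A sketch of the idea, under stated assumptions: the lock data is taken
to be a zeroed array of 1024 bytes, the version field is taken to start
at byte 2 (inferred from byte 12 being the first byte after it), and
the string is taken to have the form "nano X.Y.Z". The function name
set_lock_version() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define LOCKSIZE        1024   /* assumed size of the lock data */
#define VERSION_OFFSET  2      /* assumed start of the version field */
#define VERSION_ROOM    10     /* bytes reserved for the version string */

/* Hypothetical sketch (not nano's code): zero the lock data and fill in
 * the version field. */
static void set_lock_version(char lockdata[LOCKSIZE], const char *version)
{
	memset(lockdata, 0, LOCKSIZE);

	/* A bound of VERSION_ROOM + 1 lets snprintf() put its terminating
	 * NUL in the byte just past the field. With a bound of ten, a
	 * ten-character string would be cut down to nine characters. */
	snprintf(lockdata + VERSION_OFFSET, VERSION_ROOM + 1, "nano %s", version);
}
```

The extra NUL lands on a byte that was zero already, so the resulting
lock data is unchanged.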