297 Commits (bc36813349178c805b670c03130042caaae0a845)

Author SHA1 Message Date
Benno Schulenberg 6360e4170a copyright: update the years for the FSF 2021-01-11 14:22:51 +01:00
Benno Schulenberg 24e5f956d0 build: fix compilation when configured with --disable-utf8
This fixes https://savannah.gnu.org/bugs/?59842.
Reported-by: Ruben van Wyk <admin@knwip.com>

Bug existed since commit 5129e718 from two days ago.
2021-01-08 12:05:55 +01:00
Benno Schulenberg 10b99d8ac0 chars: short-circuit determining the width of characters under U+0300
The combining characters (that are zero-width) start at U+0300.
After that it's pretty much chaos, width-wise.

The mbwidth() function is not called for control characters (whose
representation takes up two columns), as they are handled separately.

The calls of mbwidth() that *can* happen with a control character as
argument are only to determine whether the character is zero-width,
and then it doesn't matter whether the exact width is 1 or 2.
2021-01-06 20:15:14 +01:00
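
The short-circuit described in commit 10b99d8ac0 above can be sketched roughly as follows. This is a minimal illustration, not nano's actual code: the names char_columns(), lookup_width() and slow_lookups are hypothetical, and lookup_width() merely stands in for a costly lookup such as wcwidth().

```c
#include <wchar.h>

/* Hypothetical counter, present only to show how often the slow path runs. */
static int slow_lookups = 0;

/* Hypothetical stand-in for a costly lookup such as wcwidth().  It knows
 * only the combining diacritics block (U+0300..U+036F) and treats every
 * other character as one column wide. */
static int lookup_width(wchar_t wc)
{
	slow_lookups++;
	return (wc >= 0x0300 && wc <= 0x036F) ? 0 : 1;
}

/* Return the number of columns that the given character occupies.
 * Control characters are assumed to be handled elsewhere by the caller.
 * Everything below U+0300 takes the fast path and skips the lookup. */
int char_columns(wchar_t wc)
{
	if (wc < 0x0300)
		return 1;

	return lookup_width(wc);
}
```

With this shape, char_columns(L'a') never reaches lookup_width(), while char_columns(0x0301) does.
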
Benno Schulenberg 5129e718d7 chars: speed up the handling of invalid UTF-8 starter bytes
The first byte of a multi-byte UTF-8 sequence must be in the range
0xC2...0xF4.  Any other byte cannot be a starter byte and can thus
immediately be treated as a single byte.
2021-01-06 12:41:49 +01:00
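
As an illustration of the starter-byte test from commit 5129e718d7 above, here is a small sketch. The function name expected_length() is hypothetical, and the code is an assumption about how such a check could look, not a copy of nano's code. It relies on the fact that in UTF-8 as defined by RFC 3629 a lead byte lies between 0xC2 and 0xF4.

```c
/* Return the number of bytes that a UTF-8 sequence beginning with the
 * given byte is expected to have.  A byte that cannot start a multi-byte
 * sequence (plain ASCII, a continuation byte 0x80..0xBF, the overlong
 * starters 0xC0 and 0xC1, or anything above 0xF4) is treated as a
 * single byte right away, without any further decoding. */
int expected_length(unsigned char first)
{
	if (first < 0xC2 || first > 0xF4)
		return 1;
	if (first < 0xE0)
		return 2;
	if (first < 0xF0)
		return 3;
	return 4;
}
```

A caller would still have to validate the continuation bytes. The point of the sketch is only that the single comparison at the top settles the common and the invalid cases cheaply.
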
Benno Schulenberg a4675acdba copyright: update to the current year for significantly changed files 2020-11-30 12:01:47 +01:00
Benno Schulenberg 687efd210c moving: skip combining characters and other zero-width characters
This makes the cursor move smoothly left and right -- instead of
"stuttering" when passing over a zero-width character.

Pressing <Delete> on a normal (spacing) character also deletes
any zero-width characters after it.  But pressing <Backspace>
while the cursor is placed after a zero-width character, just
deletes that zero-width character.  The latter behavior allows
deleting and retyping just the combining diacritic of a character
instead of the whole character.

This addresses https://savannah.gnu.org/bugs/?50773.
Requested-by: Mike Frysinger <vapier@gentoo.org>
2020-11-17 10:21:50 +01:00
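
The cursor movement described in commit 687efd210c above might be sketched like this. It is a simplified model that works on an array of code points instead of a UTF-8 buffer. The names step_right_over(), step_left_over() and is_zero_width() are hypothetical, and is_zero_width() recognizes only the combining diacritics block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test: treats only U+0300..U+036F as zero-width. */
static bool is_zero_width(unsigned long codepoint)
{
	return (codepoint >= 0x0300 && codepoint <= 0x036F);
}

/* Move one character to the right, then continue past any zero-width
 * characters, so that the cursor lands on the next spacing character
 * (or at the end of the text). */
size_t step_right_over(const unsigned long *text, size_t count, size_t position)
{
	if (position < count)
		position++;
	while (position < count && is_zero_width(text[position]))
		position++;
	return position;
}

/* Move one character to the left, then continue backward until the
 * cursor sits on a spacing character (or at the start of the text). */
size_t step_left_over(const unsigned long *text, size_t position)
{
	if (position > 0)
		position--;
	while (position > 0 && is_zero_width(text[position]))
		position--;
	return position;
}
```

For the text 'e', U+0301, 'x', stepping right from index 0 lands on index 2, and stepping left from index 2 lands back on index 0.
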
Benno Schulenberg 5a635db262 chars: reduce searching time by roughly 85 percent for plain ASCII
Make case-insensitive searching in a UTF-8 locale eight times faster
when the actual characters involved are plain ASCII.

This makes us faster than 'less', and as fast as Vim and Emacs.

The disadvantage of this change is that searching for a string that
begins with a multibyte character is nearly ten times slower than
searching for one that begins with an ASCII character.  This may be
unsettling when searching a huge file first for a simple ASCII string
and later for a UTF-8 one.  Doing this second search, the user might
get impatient: "Why is it taking so long?"

(This patch fell through the cracks four years ago, when I worked on
the searching code.  It sat in a branch on top of other changes that
I never applied because I made different improvements.  The speedup
at the time, on that machine, was only around sixty percent, though.
But measuring it now again on the same machine, it clocks in at an
82 percent reduction with -O0 and an 87 percent reduction with -O2.)
2020-09-01 19:35:34 +02:00
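
The kind of ASCII fast path that commit 5a635db262 above describes could look roughly like the sketch below. The name chars_equal_nocase() is hypothetical and the slow path is an assumption. It is not nano's search code, only an illustration of why plain ASCII can skip the multibyte decoding.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Compare the characters at a and b, ignoring case. */
bool chars_equal_nocase(const char *a, const char *b)
{
	wchar_t wa, wb;

	/* Fast path: both characters are plain seven-bit ASCII, so a
	 * byte-wise tolower() suffices and nothing needs to be decoded. */
	if ((signed char)*a >= 0 && (signed char)*b >= 0)
		return (tolower((unsigned char)*a) == tolower((unsigned char)*b));

	/* Slow path: decode both characters in the current locale.  When
	 * either of them cannot be decoded, fall back to comparing the
	 * raw bytes. */
	if (mbtowc(&wa, a, MB_CUR_MAX) <= 0 || mbtowc(&wb, b, MB_CUR_MAX) <= 0)
		return (*a == *b);

	return (towlower(wa) == towlower(wb));
}
```

The slow path pays for two mbtowc() calls and two towlower() calls per comparison, which is consistent with a search that begins with a multibyte character being noticeably slower.
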
Hussam al-Homsi c87bc1d55f tweaks: stop casting the return of malloc() and friends
Those casts are redundant, and sometimes ugly.  And as the types of
variables are extremely unlikely to change any more at this point,
the protection they offer against miscompilations is moot.

Signed-off-by: Hussam al-Homsi <sawuare@gmail.com>
2020-08-31 12:17:27 +02:00
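
For readers unfamiliar with the point made in commit c87bc1d55f above: in C, the void pointer that malloc() returns converts implicitly to any object pointer type, so the cast adds nothing. A minimal sketch, where the function name duplicate() is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of the given string, or NULL when
 * the allocation fails. */
char *duplicate(const char *string)
{
	size_t size = strlen(string) + 1;
	char *copy = malloc(size);    /* No (char *) cast is needed in C. */

	if (copy != NULL)
		memcpy(copy, string, size);

	return copy;
}
```

Note that this holds for C only. A C++ compiler would reject the assignment without a cast.
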
Benno Schulenberg 8249f3560f tweaks: normalize the indentation after the previous change 2020-07-20 19:46:27 +02:00
Benno Schulenberg dd1b16cd54 tweaks: trim an ASCII case, as the function is called only for UTF-8 2020-07-20 19:37:40 +02:00
Benno Schulenberg 90f6342fd1 tweaks: rename two header files, to be distinct and not an abbreviation 2020-06-20 12:09:31 +02:00
Benno Schulenberg 547de4a7bb counting: count words correctly also when --wordchars is used
It should give the same result as 'wc -w' as long as the content
of 'wordchars' does not affect the counting.

This fixes https://savannah.gnu.org/bugs/?58123.

Bug existed since version 2.6.2, when the --wordchars option was
introduced in commit 6f12992c.
2020-04-06 11:17:43 +02:00
Benno Schulenberg f528ced22b tweaks: use a symbol instead of a number, and drop two unneeded casts 2020-03-22 14:29:10 +01:00
Benno Schulenberg 4ce2e146ea tweaks: elide three unneeded #defines
Backspace and Tab and Carriage Return have standard backslash escapes.
2020-03-19 14:40:51 +01:00
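
To illustrate commit 4ce2e146ea above: the three control codes can be written as character constants, so no numeric #defines are needed. The function control_name() is hypothetical and exists only to show the escapes in use. The hexadecimal values in the comments assume an ASCII-based character set.

```c
#include <stddef.h>

/* Return a readable name for three common control codes, or NULL for
 * any other code. */
const char *control_name(int code)
{
	switch (code) {
		case '\b':
			return "Backspace";          /* 0x08 */
		case '\t':
			return "Tab";                /* 0x09 */
		case '\r':
			return "Carriage Return";    /* 0x0D */
		default:
			return NULL;
	}
}
```
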
Benno Schulenberg 9917a05f04 tweaks: exclude a function when compiled without spell-checking support 2020-03-13 11:59:08 +01:00
Benno Schulenberg fcda76f684 build: restore non-UTF8 fallbacks, to allow compiling with --disable-utf8
Commits b2c63c3d and 004af03e from yesterday mistakenly removed those
calls.
2020-03-13 11:43:31 +01:00
Benno Schulenberg 21ed79938e tweaks: normalize the indentation after the previous two changes 2020-03-12 15:54:19 +01:00
Benno Schulenberg 004af03ea5 tweaks: remove non-UTF-8 code from three more functions 2020-03-12 15:54:19 +01:00
Benno Schulenberg b2c63c3d3c chars: optimize a function for the most common blanks: space and tab
Also, do not bother to provide separate code for the non-UTF-8 case.
Instead, optimize for plain ASCII characters.
2020-03-12 15:54:19 +01:00
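
A sketch of the optimization that commit b2c63c3d3c above describes, under the assumption that space and tab are the only blanks among the ASCII characters. The name is_blank_at() is hypothetical, and the code is an illustration, not nano's function.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Return true when the character at c is a blank. */
bool is_blank_at(const char *c)
{
	wchar_t wc;

	/* The two most common blanks are recognized immediately. */
	if (*c == ' ' || *c == '\t')
		return true;

	/* Any other plain ASCII character is assumed not to be a blank. */
	if ((signed char)*c >= 0)
		return false;

	/* Only a multibyte character needs decoding and classifying. */
	if (mbtowc(&wc, c, MB_CUR_MAX) <= 0)
		return false;

	return (iswblank(wc) != 0);
}
```
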
Benno Schulenberg ae139021eb tweaks: rename four more functions, to get rid of an abbreviation
Also, improve their comments.
2020-03-12 15:54:19 +01:00
Benno Schulenberg f6dedf3598 tweaks: rename another function, to remove the obscuring abbreviation 2020-03-12 15:54:19 +01:00
Benno Schulenberg 8003842e5c tweaks: rename a function, to remove an obscuring abbreviation
The "mb" made the name harder to read.  Also, the function is
not only for multibyte characters but for any character.
2020-03-12 15:53:49 +01:00
Benno Schulenberg 1d4411a474 tweaks: elide a function call, by copying a byte directly
Now all remaining calls of measured_copy() have a "+ 1" in their
second argument, and can thus be simplified.  And each of those
calls is followed by terminating the string with a NUL byte, so
thát can be pulled into the function.
2020-02-20 16:38:14 +01:00
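
The simplification that commit 1d4411a474 above anticipates, pulling the "+ 1" and the NUL termination into the copying function, might look like the sketch below. The name copy_of_prefix() is hypothetical and plain malloc() is used. This shows the idea and is not the actual measured_copy().

```c
#include <stdlib.h>
#include <string.h>

/* Return an allocated copy of the first count bytes of the given
 * string, with a terminating NUL byte added.  Return NULL when the
 * allocation fails.  The caller passes the bare count, without "+ 1",
 * and no longer needs to terminate the result itself. */
char *copy_of_prefix(const char *string, size_t count)
{
	char *copy = malloc(count + 1);

	if (copy == NULL)
		return NULL;

	memcpy(copy, string, count);
	copy[count] = '\0';

	return copy;
}
```
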
Benno Schulenberg a9f7277b1b tweaks: remove a now-unused helper function 2020-02-16 12:33:29 +01:00
Benno Schulenberg 0a31a9aa38 tweaks: make two conditions more direct, and thus elide two functions
Using straightforward comparisons is clearer and faster and shorter.

Again, note that this does not filter out 0x7F (DEL).  But that is
okay, as that code will never be returned from get_kbinput().
2020-02-12 11:38:33 +01:00
Benno Schulenberg 2148e857e5 copyright: update the years for significantly changed files 2020-01-15 12:11:56 +01:00
Benno Schulenberg afa4c6b9fc copyright: update the years for the FSF 2020-01-15 11:42:38 +01:00
Benno Schulenberg 3c695664ec tweaks: elide a function call for the plain ASCII case
When dealing with a plain, seven-bit ASCII character, don't bother
calling is_cntrl_mbchar() but determine directly whether it is a
control character.  Also reshuffle things so that we don't compare
charlen == 1 when we already know it is 1.
2019-10-21 18:52:44 +02:00
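
The direct check for plain ASCII that commit 3c695664ec above mentions could be sketched as follows. The name is_control_at() is hypothetical. The multibyte branch assumes valid UTF-8 input, where the C1 control characters U+0080..U+009F are encoded as 0xC2 followed by 0x80..0x9F.

```c
#include <stdbool.h>

/* Return true when the character at c is a control character. */
bool is_control_at(const char *c)
{
	/* A seven-bit ASCII character is classified directly: the codes
	 * below 0x20 and the code 0x7F (DEL) are control characters. */
	if ((signed char)*c >= 0)
		return (*c < 0x20 || *c == 0x7F);

	/* Otherwise only the two-byte sequences for U+0080..U+009F count. */
	return ((unsigned char)c[0] == 0xC2 &&
			(unsigned char)c[1] >= 0x80 && (unsigned char)c[1] < 0xA0);
}
```
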
Benno Schulenberg 8a7634f070 tweaks: rename two parameters plus a variable, to match others
Also improve a comment and normalize an indentation.
2019-10-21 13:02:17 +02:00
Benno Schulenberg fa88fcc8f2 tweaks: rename a function, and elide a parameter that is always NULL
After the previous change, all remaining calls of parse_mbchar() have
NULL as their third parameter.  So, drop that parameter and remove the
chunk of code that handles it.  Also rename the function, as there are
already too many functions that start with "parse".
2019-10-21 12:35:14 +02:00
Benno Schulenberg c2d8641f01 chars: add a faster version of the character-parsing function
It elides a parameter that is always NULL, and elides two ifs
that always take the same path.
2019-10-21 12:24:23 +02:00
Benno Schulenberg 17c16a4bf5 tweaks: rename a function and elide its first parameter 2019-10-20 09:45:58 +02:00
Benno Schulenberg 31ff7ead73 tweaks: move a function to before its callers and next to its kind
Also, improve the indentation of two random lines.
2019-10-03 11:24:01 +02:00
Benno Schulenberg 5398d986ef tweaks: speed up determining the width of plain ASCII characters 2019-10-03 11:09:21 +02:00
Benno Schulenberg b02dccc51f tweaks: elide a function from a non-UTF8 build
In a non-UTF8 build, mbwidth() always returns 1, so it is pointless
to call that function and then compare its result to zero.

Also, don't bother special-casing the function for a non-UTF8 locale.
2019-10-03 10:48:10 +02:00
Benno Schulenberg 3158133edd tweaks: rename three variables, for contrast and more sense 2019-10-03 10:12:30 +02:00
Benno Schulenberg 0c63b50fdc tweaks: move a general function to a better place 2019-08-09 19:24:30 +02:00
Benno Schulenberg c57d040e99 tweaks: don't bother calling mblen() in a non-UTF-8 build
There is no need, because in non-UTF-8 encodings nano treats
each single byte as one character anyway.
2019-06-11 19:48:03 +02:00
Benno Schulenberg cd09482231 tweaks: elide a function that is an amalgam of three others
In addition, the function was used just once, had a weird return value,
and now some more code can be excluded from a non-UTF8 build.

Make use of the fact that any single-byte character always occupies
just one column, and call the costly mbtowc() and wcwidth() only for
characters that actually are multibyte.
2019-06-10 19:43:50 +02:00
Benno Schulenberg c5955d14ce chars: speed up the determination of length and width for plain ASCII 2019-06-10 17:22:41 +02:00
Benno Schulenberg 7d38379919 tweaks: rename two parameters, away from single letters 2019-06-10 12:36:16 +02:00
Benno Schulenberg 45bf18f8fe tweaks: rename three variables, to get rid of a suffix or an underscore
Also drop an unneeded cast.
2019-06-10 12:34:24 +02:00
Benno Schulenberg 787dca6724 tweaks: elide an unneeded variable 2019-06-10 12:06:12 +02:00
Benno Schulenberg 15e36956b5 tweaks: avoid parsing a character twice
Let mbtowc() do all the work, and thus also elide a variable.
2019-06-10 12:01:10 +02:00
Benno Schulenberg 967f581860 tweaks: adjust some whitespace and rewrap a few lines
And remove two unneeded casts.
2019-06-09 20:03:44 +02:00
Benno Schulenberg 1075de1222 tweaks: rename two functions, to get rid of the "mb" abbreviation
Also, for me "move" is about moving the cursor.  But these functions
are about moving an index in a text, which is more general.
2019-06-09 19:37:56 +02:00
Benno Schulenberg 710a600f22 chars: speed up case-insensitive searching by roughly one percent
It is less of a speedup than I was hoping for, though.
2019-06-09 19:13:25 +02:00
Benno Schulenberg 781c7a7a5f chars: create a dedicated function for getting the length of a character
Instead of calling parse_mbchar(pointer, NULL, NULL) in twenty places,
use a simpler and faster char_length(pointer).  This saves pushing two
unneeded parameters onto the stack, avoids two needless ifs, and elides
an intermediate variable.

Its main purpose will follow in a later commit: to speed up searching.
2019-06-09 18:38:46 +02:00
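
A dedicated length function in the spirit of commit 781c7a7a5f above might look like this. The name and signature char_length(pointer) come from the commit message, but the body is an assumption: it treats any undecodable byte as a character of length 1 and relies on the standard mblen().

```c
#include <stdlib.h>

/* Return the length in bytes of the character at pointer.  There are
 * no output parameters to fill in, so there is nothing to test for
 * NULL and no intermediate result to store. */
int char_length(const char *pointer)
{
	int length;

	/* A plain ASCII byte is always a character by itself. */
	if ((unsigned char)*pointer < 0x80)
		return 1;

	length = mblen(pointer, MB_CUR_MAX);

	/* Treat an invalid or incomplete sequence as a single byte. */
	return (length > 0) ? length : 1;
}
```
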
Benno Schulenberg aa205f58ca tweaks: rename a bunch of variables, to become identical to others 2019-06-09 17:07:02 +02:00
Benno Schulenberg 71236e145d tweaks: rename two variables, away from a single letter
And adjust the indentation after the previous change.
2019-06-09 11:08:34 +02:00