ariadne/smol - smol - Treehouse Gitea

Commit Graph

Author	SHA1	Message	Date
Benno Schulenberg	5512c63bdd	copyright: update to the current year for significantly changed files	2021-09-24 11:01:41 +02:00
Benno Schulenberg	3c35538e8b	tweaks: add Schiermonnikoog to the list of friendly islands (The commit message is a joke, of course. Instead, this commit just removes some unneeded comments and corrects one bit of whitespace.)	2021-06-16 11:19:23 +02:00
Benno Schulenberg	30bafc70cc	tweaks: prevent two more size_t subtractions from going negative This fully fixes https://savannah.gnu.org/bugs/?60658. Found by compiling with -fsanitize=undefined.	2021-05-24 11:00:29 +02:00
Benno Schulenberg	ceaae49b2d	tweaks: avoid the subtraction of two size_t variables becoming negative This fixes https://savannah.gnu.org/bugs/?60658. Found by compiling with -fsanitize=undefined.	2021-05-23 11:46:37 +02:00
Benno Schulenberg	bb81932422	chars: work around the wrong private-use-character widths on OpenBSD This fixes https://savannah.gnu.org/bugs/?60393.	2021-04-20 11:13:08 +02:00
Benno Schulenberg	48fa14acc0	tweaks: simplify two fragments of code This makes the handling of plain ASCII a tiny bit slower, but it affects only the users of --constantshow without --minibar, so... All other uses of mbstrlen() and collect_char() are not in speed-critical code paths.	2021-04-13 11:19:32 +02:00
Benno Schulenberg	b4a5aedc6c	tweaks: remove a misplaced (and nested) #ifdef It was accidentally introduced two weeks ago by commit `1c010d8e`.	2021-04-09 16:55:07 +02:00
Benno Schulenberg	d6ed174d09	tweaks: morph a function into what it is actually used for Since the previous commit, mbwidth() is used only to determine whether a character is either double width or zero width. There is no need to return the actual width of the character; a simple yes or no is enough. Transforming mbwidth() into is_doublewidth() also allows streamlining it and is_zerowidth() a bit, so that they become slightly faster.	2021-04-09 16:38:23 +02:00
Benno Schulenberg	78f92e044a	tweaks: avoid parsing a multibyte character twice The number of bytes in the character were determined twice: first in mbwidth() and then in char_length(). Do it just once, in mbtowide(). Also, avoid calling is_cntrl_char(), because it does unneeded checks when we already know that the high bit is set. This duplicates some code, but advance_over() is called a lot, so it is important that it is as fast as possible. This shouldn't slow down plain ASCII, as the extra checks (use_utf8 and *string < 0xA0) are done only for non-ASCII (apart from DEL).	2021-04-09 11:32:15 +02:00
Benno Schulenberg	c75a3839da	tweaks: elide a small function that is used just once	2021-04-07 17:08:05 +02:00
Benno Schulenberg	b6a32fbd5f	tweaks: elide an unneeded resetting NULL call to wctomb() Calling wctomb() with NULL as the first parameter returns zero in a UTF-8 locale, meaning that there is no state, so there is no point in resetting it either.	2021-04-07 16:11:40 +02:00
Benno Schulenberg	0dcac9188f	tweaks: simplify two fragments of code, eliding useless character copying	2021-03-29 20:06:05 +02:00
Benno Schulenberg	1c010d8ec9	chars: implement mbtowc() ourselves, for more efficiency This saves a function call, and the passing and checking of the MAXCHARLEN parameter, and the checking whether wc is maybe NULL (which for nano is never the case), and who knows what other overheads mbtowc() has, and our workaround for glibc. Code was written after looking at gnulib/lib/mbrtowc-impl-utf8.h.	2021-03-29 12:36:10 +02:00
Benno Schulenberg	b020937475	chars: implement mblen() ourselves, for efficiency Most implementations of mblen() do a call to mbtowc(), which is a waste of time when all we want to know is the number of bytes (and when we already know that we're using UTF-8 and that the first byte is at least 0xC2). (This also avoids burdening correct implementations with the workaround that was needed only for glibc.) Code was written after looking at gnulib/lib/mbrtowc-impl-utf8.h.	2021-03-27 14:38:28 +01:00
Benno Schulenberg	df7fe1280d	tweaks: drop unneeded braces and adjust indentation after previous change	2021-03-26 12:17:44 +01:00
Benno Schulenberg	929770191e	chars: work around a UTF-8 bug in glibc, to display invalid codes right The mblen() and mbtowc() functions will happily return 4 or 5 or 6 for byte sequences that start with 0xF4 0x90 or higher. But those sequences encode for U+110000 or higher, which are not valid Unicode code points. The libc of FreeBSD and OpenBSD and Alpine correctly return -1 for such sequences. Make nano behave correctly also when linked against glibc, so that invalid sequences are always presented as a series of invalid bytes and never as a single invalid code. This fixes https://savannah.gnu.org/bugs/?60262. Bug existed since before version 2.0.0.	2021-03-26 11:07:05 +01:00
Benno Schulenberg	66d9d6c6d2	tweaks: elide the pointless is_valid_unicode() function The call of this function in make_mbchar() does not add anything, because wctomb() already returns -1 for codes U+D800 to U+DFFF, and parse_verbatim_kbinput() already rejects anything that starts with U+11.... or higher, so make_mbchar() is never called for codes beyond U+10FFFF. And the call in display_string() just needs to check for wc <= 0x10FFFF because mbtowc() already returns -1 for codes U+D800 to U+DFFF.	2021-03-25 11:24:41 +01:00
Benno Schulenberg	de816840cb	input: accept Unicode codes for non-characters as valid, since they are That is, accept U+FDD0 to U+FDEF, and accept U+xxFFFE and U+xxFFFF for xx from 00 to 10 hex, being the 66 reserved "non-characters". It may not be wise of the user to input these "things" (by typing their code after M-V), but the codes are valid Unicode code points and should not be rejected. See https://www.unicode.org/faq/private_use.html#nonchar8 et al. This fixes https://savannah.gnu.org/bugs/?60263. Bug existed since before version 2.0.0.	2021-03-24 17:11:05 +01:00
Benno Schulenberg	6360e4170a	copyright: update the years for the FSF	2021-01-11 14:22:51 +01:00
Benno Schulenberg	24e5f956d0	build: fix compilation when configured with --disable-utf8 This fixes https://savannah.gnu.org/bugs/?59842. Reported-by: Ruben van Wyk <admin@knwip.com> Bug existed since commit `5129e718` from two days ago.	2021-01-08 12:05:55 +01:00
Benno Schulenberg	10b99d8ac0	chars: short-circuit determining the width of characters under U+0300 The combining characters (that are zero-width) start at U+0300. After that it's pretty much chaos, width-wise. The mbwidth() function is not called for control characters (whose representation takes up two columns), as they are handled separately. The calls of mbwidth() that can happen with a control character as argument are only to determine whether the character is zero-width, and then it doesn't matter whether the exact width is 1 or 2.	2021-01-06 20:15:14 +01:00
Benno Schulenberg	5129e718d7	chars: speed up the handling of invalid UTF-8 starter bytes The first byte of a multi-byte UTF-8 sequence must be in the range 0xC2...0xFF. Any other byte cannot be a starter byte and can thus immediately be treated as a single byte.	2021-01-06 12:41:49 +01:00
Benno Schulenberg	a4675acdba	copyright: update to the current year for significantly changed files	2020-11-30 12:01:47 +01:00
Benno Schulenberg	687efd210c	moving: skip combining characters and other zero-width characters This makes the cursor move smoothly left and right -- instead of "stuttering" when passing over a zero-width character. Pressing <Delete> on a normal (spacing) character also deletes any zero-width characters after it. But pressing <Backspace> while the cursor is placed after a zero-width character, just deletes that zero-width character. The latter behavior allows deleting and retyping just the combining diacritic of a character instead of the whole character. This addresses https://savannah.gnu.org/bugs/?50773. Requested-by: Mike Frysinger <vapier@gentoo.org>	2020-11-17 10:21:50 +01:00
Benno Schulenberg	5a635db262	chars: reduce searching time with roughly 85 percent for plain ASCII Make case-insensitive searching in a UTF-8 locale eight times faster when the actual characters involved are plain ASCII. This makes us faster than 'less', and as fast as Vim and Emacs. The disadvantage of this change is that searching for a string that begins with a multibyte character is nearly ten times slower than searching for one that begins with an ASCII character. This may be unsettling when searching a huge file first for a simple ASCII string and later for a UTF-8 one. Doing this second search, the user might get impatient: "Why is it taking so long?" (This patch fell through the cracks four years ago, when I worked on the searching code. It sat in a branch on top of other changes that I never applied because I made different improvements. The speedup at the time, on that machine, was only around sixty percent, though. But measuring it now again on the same machine, it clocks in at an 82 percent reduction with -O0 and an 87 percent reduction with -O2.)	2020-09-01 19:35:34 +02:00
Hussam al-Homsi	c87bc1d55f	tweaks: stop casting the return of malloc() and friends Those casts are redundant, and sometimes ugly. And as the types of variables are extremely unlikely to change any more at this point, the protection they offer against miscompilations is moot. Signed-off-by: Hussam al-Homsi <sawuare@gmail.com>	2020-08-31 12:17:27 +02:00
Benno Schulenberg	8249f3560f	tweaks: normalize the indentation after the previous change	2020-07-20 19:46:27 +02:00
Benno Schulenberg	dd1b16cd54	tweaks: trim an ASCII case, as the function is called only for UTF-8	2020-07-20 19:37:40 +02:00
Benno Schulenberg	90f6342fd1	tweaks: rename two header files, to be distinct and not an abbreviation	2020-06-20 12:09:31 +02:00
Benno Schulenberg	547de4a7bb	counting: count words correctly also when --wordchars is used It should give the same result as 'wc -w' as long as the content of 'wordchars' does not affect the counting. This fixes https://savannah.gnu.org/bugs/?58123. Bug existed since version 2.6.2, since the --wordchars option was introduced in commit `6f12992c`.	2020-04-06 11:17:43 +02:00
Benno Schulenberg	f528ced22b	tweaks: use a symbol instead of a number, and drop two unneeded casts	2020-03-22 14:29:10 +01:00
Benno Schulenberg	4ce2e146ea	tweaks: elide three unneeded #defines Backspace and Tab and Carriage Return have standard backslash escapes.	2020-03-19 14:40:51 +01:00
Benno Schulenberg	9917a05f04	tweaks: exclude a function when compiled without spell-checking support	2020-03-13 11:59:08 +01:00
Benno Schulenberg	fcda76f684	build: restore non-UTF8 fallbacks, to allow compiling with --disable-utf8 Commits `b2c63c3d` and `004af03e` from yesterday mistakenly removed those calls.	2020-03-13 11:43:31 +01:00
Benno Schulenberg	21ed79938e	tweaks: normalize the indentation after the previous two changes	2020-03-12 15:54:19 +01:00
Benno Schulenberg	004af03ea5	tweaks: remove non-UTF-8 code from three more functions	2020-03-12 15:54:19 +01:00
Benno Schulenberg	b2c63c3d3c	chars: optimize a function for the most common blanks: space and tab Also, do not bother to provide separate code for the non-UTF-8 case. Instead, optimize for plain ASCII characters.	2020-03-12 15:54:19 +01:00
Benno Schulenberg	ae139021eb	tweaks: rename four more functions, to get rid of an abbreviation Also, improve their comments.	2020-03-12 15:54:19 +01:00
Benno Schulenberg	f6dedf3598	tweaks: rename another function, to remove the obscuring abbreviation	2020-03-12 15:54:19 +01:00
Benno Schulenberg	8003842e5c	tweaks: rename a function, to remove an obscuring abbreviation The "mb" made the name harder to read. Also, the function is not only for multibyte characters but for any character.	2020-03-12 15:53:49 +01:00
Benno Schulenberg	1d4411a474	tweaks: elide a function call, by copying a byte directly Now all remaining calls of measured_copy() have a "+ 1" in their second argument, and can thus be simplified. And each of those calls is followed by terminating the string with a NUL byte, so thát can be pulled into the function.	2020-02-20 16:38:14 +01:00
Benno Schulenberg	a9f7277b1b	tweaks: remove a now-unused helper function	2020-02-16 12:33:29 +01:00
Benno Schulenberg	0a31a9aa38	tweaks: make two conditions more direct, and thus elide two functions Using straightforward comparisons is clearer and faster and shorter. Again, note that this does not filter out 0x7F (DEL). But that is okay, as that code will never be returned from get_kbinput().	2020-02-12 11:38:33 +01:00
Benno Schulenberg	2148e857e5	copyright: update the years for significantly changed files	2020-01-15 12:11:56 +01:00
Benno Schulenberg	afa4c6b9fc	copyright: update the years for the FSF	2020-01-15 11:42:38 +01:00
Benno Schulenberg	3c695664ec	tweaks: elide a function call for the plain ASCII case When dealing with a plain, seven-bit ASCII character, don't bother calling is_cntrl_mbchar() but determine directly whether it is a control character. Also reshuffle things so that we don't compare charlen == 1 when we already know it is 1.	2019-10-21 18:52:44 +02:00
Benno Schulenberg	8a7634f070	tweaks: rename two parameters plus a variable, to match others Also improve a comment and normalize an indentation.	2019-10-21 13:02:17 +02:00
Benno Schulenberg	fa88fcc8f2	tweaks: rename a function, and elide a parameter that is always NULL After the previous change, all remaining calls of parse_mbchar() have NULL as their third parameter. So, drop that parameter and remove the chunk of code that handles it. Also rename the function, as there are already too many functions that start with "parse".	2019-10-21 12:35:14 +02:00
Benno Schulenberg	c2d8641f01	chars: add a faster version of the character-parsing function It elides a parameter that is always NULL, and elides two ifs that always take the same path.	2019-10-21 12:24:23 +02:00
Benno Schulenberg	17c16a4bf5	tweaks: rename a function and elide its first parameter	2019-10-20 09:45:58 +02:00
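
Commits 5129e718d7, 929770191e, and b020937475 in the table above describe the UTF-8 rules behind the hand-written length check: a multibyte sequence cannot start with a byte below 0xC2, and sequences starting with 0xF4 0x90 or higher encode U+110000 and beyond, so they must be rejected. The block below is a minimal C sketch of such a check, not nano's actual code. The name `utf8_length` and its `avail` parameter are hypothetical, and the overlong and surrogate checks come from the UTF-8 definition rather than from the commit messages. The sketch is also stricter than the starter-byte range quoted in 5129e718d7: it rejects 0xF5 to 0xFF outright, since those bytes could only begin sequences beyond U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper (an illustration, not nano's implementation):
 * true when the byte is a UTF-8 continuation byte, 0x80..0xBF. */
static int is_continuation(unsigned char byte)
{
    return (byte ^ 0x80) <= 0x3F;
}

/* Return the length in bytes of the UTF-8 sequence that starts at s,
 * or -1 when the bytes do not form a valid sequence.  At most avail
 * bytes are examined. */
int utf8_length(const unsigned char *s, size_t avail)
{
    if (avail == 0)
        return -1;

    unsigned char lead = s[0];

    if (lead < 0x80)                /* plain ASCII */
        return 1;
    if (lead < 0xC2)                /* continuation byte or overlong starter */
        return -1;

    if (lead < 0xE0) {              /* two-byte sequence */
        if (avail < 2 || !is_continuation(s[1]))
            return -1;
        return 2;
    }

    if (lead < 0xF0) {              /* three-byte sequence */
        if (avail < 3 || !is_continuation(s[1]) || !is_continuation(s[2]))
            return -1;
        if (lead == 0xE0 && s[1] < 0xA0)    /* overlong encoding */
            return -1;
        if (lead == 0xED && s[1] >= 0xA0)   /* surrogates U+D800..U+DFFF */
            return -1;
        return 3;
    }

    if (lead < 0xF5) {              /* four-byte sequence */
        if (avail < 4 || !is_continuation(s[1]) ||
                !is_continuation(s[2]) || !is_continuation(s[3]))
            return -1;
        if (lead == 0xF0 && s[1] < 0x90)    /* overlong encoding */
            return -1;
        if (lead == 0xF4 && s[1] >= 0x90)   /* beyond U+10FFFF */
            return -1;
        return 4;
    }

    return -1;                      /* 0xF5..0xFF never start a valid sequence */
}
```

A caller that gets -1 back can treat the first byte as a single invalid byte and continue with the next one, which matches the "series of invalid bytes" presentation that commit 929770191e asks for.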

315 Commits (5c8de3e39f77cca37e75eea3b09550e3bad687bb)
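
Commit de816840cb in the table above names the 66 reserved "non-characters": U+FDD0 to U+FDEF, plus U+xxFFFE and U+xxFFFF for xx from 00 to 10 hex. A small C sketch can make that arithmetic concrete (32 codes in the first range, plus 2 in each of 17 planes). The function names `is_noncharacter` and `count_noncharacters` are hypothetical and are not taken from nano's source. Note that the commit itself makes nano accept these codes as valid input, so a predicate like this illustrates the set rather than a rejection rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (an illustration, not nano's code): report whether
 * the given code point is one of the Unicode "non-characters". */
bool is_noncharacter(uint32_t code)
{
    if (code > 0x10FFFF)            /* outside the Unicode code space */
        return false;
    if (code >= 0xFDD0 && code <= 0xFDEF)
        return true;
    /* The last two code points of every plane: U+xxFFFE and U+xxFFFF. */
    return (code & 0xFFFE) == 0xFFFE;
}

/* Count the non-characters by brute force over the whole code space. */
unsigned count_noncharacters(void)
{
    unsigned count = 0;

    for (uint32_t code = 0; code <= 0x10FFFF; code++)
        if (is_noncharacter(code))
            count++;

    return count;
}
```

Calling `count_noncharacters()` returns 66, in agreement with the figure in the commit message.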