315 Commits (5c8de3e39f77cca37e75eea3b09550e3bad687bb)

Author SHA1 Message Date
Benno Schulenberg 31ff7ead73 tweaks: move a function to before its callers and next to its kind
Also, improve the indentation of two random lines.
2019-10-03 11:24:01 +02:00
Benno Schulenberg 5398d986ef tweaks: speed up determining the width of plain ASCII characters 2019-10-03 11:09:21 +02:00
Benno Schulenberg b02dccc51f tweaks: elide a function from a non-UTF8 build
In a non-UTF8 build, mbwidth() always returns 1, so it is pointless
to call that function and compare its result to zero then.

Also, don't bother special-casing the function for a non-UTF8 locale.
2019-10-03 10:48:10 +02:00
Benno Schulenberg 3158133edd tweaks: rename three variables, for contrast and more sense 2019-10-03 10:12:30 +02:00
Benno Schulenberg 0c63b50fdc tweaks: move a general function to a better place 2019-08-09 19:24:30 +02:00
Benno Schulenberg c57d040e99 tweaks: don't bother calling mblen() in a non-UTF-8 build
There is no need, because in non-UTF-8 encodings nano treats
each single byte as one character anyway.
2019-06-11 19:48:03 +02:00
Benno Schulenberg cd09482231 tweaks: elide a function that is an amalgam of three others
In addition, the function was used just once, had a weird return value,
and now some more code can be excluded from a non-UTF8 build.

Make use of the fact that any single-byte character always occupies
just one column, and call the costly mbtowc() and wcwidth() only for
characters that actually are multibyte.
2019-06-10 19:43:50 +02:00
Benno Schulenberg c5955d14ce chars: speed up the determination of length and width for plain ASCII 2019-06-10 17:22:41 +02:00
Benno Schulenberg 7d38379919 tweaks: rename two parameters, away from single letters 2019-06-10 12:36:16 +02:00
Benno Schulenberg 45bf18f8fe tweaks: rename three variables, to get rid of a suffix or an underscore
Also drop an unneeded cast.
2019-06-10 12:34:24 +02:00
Benno Schulenberg 787dca6724 tweaks: elide an unneeded variable 2019-06-10 12:06:12 +02:00
Benno Schulenberg 15e36956b5 tweaks: avoid parsing a character twice
Let mbtowc() do all the work, and thus also elide a variable.
2019-06-10 12:01:10 +02:00
Benno Schulenberg 967f581860 tweaks: adjust some whitespace and rewrap a few lines
And remove two unneeded casts.
2019-06-09 20:03:44 +02:00
Benno Schulenberg 1075de1222 tweaks: rename two functions, to get rid of the "mb" abbreviation
Also, for me "move" is about moving the cursor.  But these functions
are about moving an index in a text, which is more general.
2019-06-09 19:37:56 +02:00
Benno Schulenberg 710a600f22 chars: speed up case-insensitive searching by roughly one percent
It is less of a speedup than I was hoping for, though.
2019-06-09 19:13:25 +02:00
Benno Schulenberg 781c7a7a5f chars: create a dedicated function for getting the length of a character
Instead of calling in twenty places parse_mbchar(pointer, NULL, NULL),
use a simpler and faster char_length(pointer).  This saves pushing two
unneeded parameters onto the stack, avoids two needless ifs, and elides
an intermediate variable.

Its main purpose will follow in a later commit: to speed up searching.
2019-06-09 18:38:46 +02:00
Benno Schulenberg aa205f58ca tweaks: rename a bunch of variables, to become identical to others 2019-06-09 17:07:02 +02:00
Benno Schulenberg 71236e145d tweaks: rename two variables, away from a single letter
And adjust the indentation after the previous change.
2019-06-09 11:08:34 +02:00
Benno Schulenberg 7be76af418 tweaks: speed up the counting of characters in mbstrlen()
This function is used in get_totsize(), so speed is important.

There is no reason why the length of the string must be limited to a
certain size -- that is just a leftover from the function merge in
commit ba2e6f43 from a year ago.
2019-06-09 11:04:52 +02:00
Benno Schulenberg fb17929fab tweaks: use FALSE for booleans instead of zero
Also adjust some indentation and reduce the scope of a variable.
2019-06-09 10:41:14 +02:00
Benno Schulenberg 55952c0984 tweaks: adjust the indentation after the previous change
Also, improve a comment and shorten another, change a 'for' to a 'while'
(as the end point is not known), and rename a parameter from a single
letter to a word.
2019-04-06 19:45:45 +02:00
Benno Schulenberg d9cfdd364b tweaks: don't bother special-casing non-UTF8 when seeking a character
This code is seldom used (only when searching for a matching bracket),
so speed is not a priority.
2019-04-06 19:34:09 +02:00
Benno Schulenberg e02c8aeb20 tweaks: adjust indentation after previous change, and rename a parameter 2019-03-24 09:42:57 +01:00
Benno Schulenberg 003ddc763e tweaks: don't bother special-casing non-UTF8 when checking for a blank
This code is almost never used; conciseness is the only consideration.
2019-03-24 09:36:15 +01:00
Benno Schulenberg 1c3953705c tweaks: avoid parsing the same character twice
Also, make the second loop similar in form to the first.
2019-03-24 09:34:21 +01:00
Benno Schulenberg 01e6d435fe tweaks: rename a function, to be simpler and more accurate 2019-03-21 17:42:34 +01:00
Benno Schulenberg 2b21d53857 tweaks: elide a function that is called just once 2019-03-21 17:36:46 +01:00
Benno Schulenberg a20340b5a8 copyright: update the years for significantly changed files 2019-03-10 17:03:42 +01:00
Benno Schulenberg 79ca3ceabf copyright: update the years for the FSF 2019-02-24 19:35:56 +01:00
Benno Schulenberg ce0ecf67a6 tweaks: elide another function that is used just once 2018-07-10 13:57:05 +02:00
Benno Schulenberg b6f7ff5c6f tweaks: normalize the indentation after the previous change
And remove two superfluous pairs of braces.
2018-07-10 13:55:51 +02:00
Benno Schulenberg 035b91cb15 chars: make the UTF-8 case ever so slightly faster by eliding an 'if'
And in the bargain get rid of some duplicate code.

This makes a binary without UTF-8 support slightly slower, but that's
not important -- it is more than fast enough anyway.  What matters is that
the most used and longest code path, the UTF-8 case, becomes faster.

Note that 'is_cntrl_mbchar()' will fall back to 'is_cntrl_char()' for
a non-UTF-8 build, so the deleted piece of code really was equivalent
to the remaining piece for that case.
2018-07-10 12:22:11 +02:00
Benno Schulenberg 7ae8f3bca4 tweaks: elide a function that is used just once 2018-07-08 10:40:22 +02:00
Benno Schulenberg 9a7ba5db79 chars: speed up the parsing of a character for the plain ASCII case
Again, if the most significant bit of a UTF-8 byte is zero, it means
the character is a single byte and we can skip the call of mblen(),
*and* if the character is one byte it also occupies just one column,
because all ASCII characters are single-column characters -- apart
from control codes.

This partially addresses https://savannah.gnu.org/bugs/?51491.
2018-06-04 14:07:43 +02:00
Benno Schulenberg cc2b19c8fd chars: speed up the counting of string length for the plain ASCII case
For UTF-8, if the most significant bit of a byte is zero, it means the
character is just a single byte and we can skip the call of mblen().

For files consisting of pure ASCII bytes (between 0x00 and 0x7F), this
change reduces the counting time of mbstrlen() by ninety-six percent.

This partially addresses https://savannah.gnu.org/bugs/?50406.
2018-06-03 20:07:04 +02:00
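The ASCII shortcut described in commit cc2b19c8fd can be sketched as follows. This is a minimal illustration, not nano's code: the name count_chars() is hypothetical, and where the commit falls back to mblen() for multibyte characters, this sketch simply skips UTF-8 continuation bytes so that it does not depend on the locale. It does not validate the sequences it skips.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters in a UTF-8 string.
 * A byte whose most significant bit is zero is a complete ASCII
 * character, so it is counted without any further parsing. */
size_t count_chars(const char *text)
{
    size_t count = 0;

    while (*text != '\0') {
        if ((unsigned char)*text < 0x80) {
            /* Fast path: plain ASCII, exactly one byte. */
            text++;
        } else {
            /* Slow path (assumption: stands in for mblen()):
             * step over the first byte and then over any
             * continuation bytes of the form 10xxxxxx. */
            text++;
            while (((unsigned char)*text & 0xC0) == 0x80)
                text++;
        }
        count++;
    }

    return count;
}
```

For text that is pure ASCII, the loop never leaves the fast path, which is where the reported saving comes from.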
Benno Schulenberg 4e667bd048 tweaks: reduce the counting of characters to just the needed function
Evade the indirect use of the general-purpose function parse_mbchar().
This reduces the counting time by roughly ten percent.
2018-06-03 17:58:05 +02:00
Benno Schulenberg ba2e6f43c2 tweaks: elide a subfunction that is used just once
(Forgot to say: the previous two commits addressed
https://savannah.gnu.org/bugs/?54044.)
2018-06-03 17:47:02 +02:00
Benno Schulenberg 6c555828c9 tweaks: remove redundant braces and conditions after the previous change 2018-06-03 14:13:33 +02:00
Benno Schulenberg 0ff068380c tweaks: remove the superfluous calls that reset the mbtowc() state
When mbtowc() is never called with anything less than MAXCHARLEN as
the length parameter, it will apparently not get confused and will
not need to be reset.
2018-06-03 14:06:59 +02:00
Benno Schulenberg f72fecee9b copyright: update the years for the FSF
And one for me, for the much changed keyboard stuff.
2018-01-24 10:14:43 +01:00
Benno Schulenberg 17429d7f38 tweaks: fix some whitespace errors, and convert alignment tabs to spaces 2017-12-29 21:35:14 +01:00
Benno Schulenberg 87206c0607 tweaks: convert the indentation to use only tabs
Each leading tab is converted to two tabs, and any leading four spaces
is converted to one tab.  The intended tab size (for keeping most lines
within 80 columns) is now four.
2017-12-29 20:06:50 +01:00
Benno Schulenberg 5198c1f139 tweaks: frob a couple of comments 2017-11-12 20:08:28 +01:00
Benno Schulenberg 5239e7c52b copyright: update some years, and standardize on the dashed format 2017-11-12 10:46:20 +01:00
Benno Schulenberg db0b849f9b tweaks: transform the token DISABLE_JUSTIFY to ENABLE_JUSTIFY 2017-10-31 17:40:44 +01:00
Benno Schulenberg 11072ed587 tweaks: sort the includes, so it's a little easier to see what is there 2017-08-06 19:40:30 +02:00
Benno Schulenberg 858e75e4cf tweaks: transform the token DISABLE_NANORC to ENABLE_NANORC
Also, allow rebinding the word and block jumping functions
in the tiny version when nanorc files are reenabled.
2017-05-09 11:59:34 +02:00
David Lawrence Ramsey 03c3e2b7c0 tweaks: fix several whitespace irregularities
Add missing spaces, remove excess spaces, and
replace groups of indentation spaces with tabs.
2017-05-07 18:20:01 +02:00
Benno Schulenberg f5155786e1 tweaks: adjust whitespace and comments after the preceding change 2017-05-05 21:54:23 +02:00
Benno Schulenberg a9abc3d95f chars: optimize moving a character left in the non-UTF-8 case
When not in a UTF-8 locale, each character is just a single byte.
2017-05-05 21:40:00 +02:00
Benno Schulenberg 09cabcad5d chars: probe for a valid UTF-8 starter byte, instead of overstepping
Instead of always stepping back four bytes and then tentatively
moving forward again (which is wasteful when most codes are just
one or two bytes long), inspect the preceding bytes one by one
and begin the move forward at the first valid starter byte.

This reduces the backwards searching time by close to 40 percent.
2017-05-05 21:36:45 +02:00
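The idea of commits f162a6a2ab and 09cabcad5d (look back at most four bytes, and inspect the preceding bytes one by one until a starter byte is found) can be sketched roughly like this. The names sequence_length() and step_left() are hypothetical, and the sketch is simplified: it only compares the length implied by the starter byte with the number of bytes stepped back, and does not validate the sequence further.

```c
#include <stddef.h>

/* Hypothetical helper: the sequence length that a UTF-8 lead byte
 * announces, or 0 when the byte cannot start a sequence. */
static int sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;
}

/* Hypothetical helper: return the index of the character that ends
 * just before buf[pos]. */
size_t step_left(const char *buf, size_t pos)
{
    size_t back;

    /* Valid UTF-8 codes are at most four bytes long. */
    for (back = 1; back <= 4 && back <= pos; back++) {
        unsigned char byte = (unsigned char)buf[pos - back];

        /* Stop at the first byte that is not a continuation byte. */
        if ((byte & 0xC0) != 0x80) {
            if (sequence_length(byte) == (int)back)
                return pos - back;
            break;
        }
    }

    /* No matching starter byte: treat the previous byte as a
     * single (invalid) character. */
    return (pos > 0) ? pos - 1 : 0;
}
```

When most characters are one or two bytes long, the loop ends after one or two probes instead of always stepping back four bytes first.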
Benno Schulenberg f162a6a2ab chars: valid UTF-8 codes are at most 4 bytes long, so look only that far
This reduces the backwards searching time by a good 20 percent.
2017-05-05 21:34:23 +02:00
Benno Schulenberg c42d6d378a tweaks: check for an empty needle in a central place
Searching for an empty string should be impossible and should never
happen, but it is a bit too hard to verify this at the moment.
2017-04-30 20:32:01 +02:00
Benno Schulenberg 1a79b3d514 tweaks: remove a superfluous strlen() call from the reverse searches
If the length of the haystack is smaller than the length of the needle,
this means that also the length of the tail will be smaller -- because
pointer will be bigger than or equal to haystack -- so the pointer gets
readjusted to be a needle length before the end of the haystack, which
means that it ends up /before/ the haystack: thus the while loop will
never run.

On average, this saves some 200 nanoseconds per line.
2017-04-30 19:51:36 +02:00
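The reasoning in commit 1a79b3d514 can be illustrated with a small reverse search. This is a sketch under assumptions, not nano's revstrstr(): the name reverse_find() is hypothetical, and it tracks a signed offset instead of a raw pointer so that no pointer before the start of the haystack is ever formed. The point it demonstrates is the same: when the haystack is shorter than the needle, the readjusted position lands before the haystack and the loop never runs, so no separate length check on the haystack is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: search backward for needle in haystack,
 * starting at pointer (which must point into haystack). */
const char *reverse_find(const char *haystack, const char *needle,
                         const char *pointer)
{
    size_t needle_len = strlen(needle);
    size_t tail_len = strlen(pointer);
    ptrdiff_t offset = pointer - haystack;

    /* A match cannot start closer to the end than one needle
     * length, so move the starting position back if needed. */
    if (tail_len < needle_len)
        offset -= (ptrdiff_t)(needle_len - tail_len);

    /* If the haystack is shorter than the needle, offset is now
     * negative and this loop is skipped entirely. */
    while (offset >= 0) {
        if (strncmp(haystack + offset, needle, needle_len) == 0)
            return haystack + offset;
        offset--;
    }

    return NULL;
}
```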
Benno Schulenberg 7d3d3dec9a tweaks: use the logic from revstrstr() also in mbrevstrcasestr()
Because it is slightly faster.
2017-04-30 19:51:14 +02:00
Benno Schulenberg 6240805c41 tweaks: rename one variable again
It is not an index, it is not an offset from anything,
it is a direct pointer.
2017-04-30 17:50:12 +02:00
Benno Schulenberg 6967fae35d tweaks: use the logic from revstrstr() also in revstrcasestr()
This elides a counter and a comparison from the central loop,
and thus makes the search a tiny bit faster.
2017-04-28 22:12:53 +02:00
Benno Schulenberg a37435141a tweaks: rename some more of these 'rev_start' variables 2017-04-28 22:09:38 +02:00
Benno Schulenberg 329021e24a tweaks: rename two variables, because this 'rev_start' is irksome
And one-letter variables I cannot "see" -- they are too small.
2017-04-28 21:02:24 +02:00
Benno Schulenberg b06407fbd7 tweaks: drop a bunch of asserts 2017-04-28 16:07:27 +02:00
Benno Schulenberg 754c62c5cc copyright: update the years, use ranges, and explain this usage
The interval 2013-2017 for the Free Software Foundation is valid
because in those years there were releases with changes by either
Chris or David, and the GNU maintainers guide advises to mention
a new year in all files of a package, not just in the ones that
actually changed, and be done with it for the rest of the year.
2017-04-09 12:09:23 +02:00
Benno Schulenberg aedc3ddd49 tweaks: replace a function call or a macro with a hard number
Verify at startup that the number is not too small.
2017-04-04 19:17:02 +02:00
Hans-Bernhard Broeker 636b7348a6 tweaks: make sure calls to <ctype.h> functions/macros use "unsigned char"
The platform's default char type might be signed, which could cause
problems in 8-bit locales.

This addresses https://savannah.gnu.org/bugs/?50289.
Reported-by: Hans-Bernhard Broeker <HBBroeker@T-Online.de>
2017-03-06 20:58:25 +01:00
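The problem addressed by commit 636b7348a6 is that the <ctype.h> functions take an int whose value must be representable as an unsigned char (or be EOF). Where plain char is signed, a byte such as 0xE9 becomes negative and passing it directly is undefined behavior. A minimal sketch of the fix, with the hypothetical wrapper name is_blank_byte():

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical wrapper: cast to unsigned char before calling a
 * <ctype.h> function, so that bytes above 0x7F are passed as
 * values in the range 128..255 instead of as negative numbers. */
bool is_blank_byte(char c)
{
    return isblank((unsigned char)c) != 0;
}
```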
Mike Frysinger e7c43521fc drop the wchar.h/wctype.h/stdarg.h checks
Since gnulib provides these now, we can assume them.
2017-03-06 12:01:21 +01:00
Mike Frysinger 63cae0c199 drop the isblank/iswblank fallback functions
Switch over to gnulib for these.
2017-03-06 12:01:05 +01:00
Mike Frysinger 28133e934d drop various str fallback functions
These are provided by gnulib now.
2017-03-06 12:00:57 +01:00
Benno Schulenberg eef7d1047a screen: display byte value 0x0A in the right places as ^@ or as ^J
In path names and file names, 0x0A means an embedded newline and
should be shown as ^J, but in anything related to the file's data,
0x0A is an encoded NUL and should be displayed as ^@.

So... switch mode at the two main entry points into the "file system"
(reading in a file, and writing out a file), and also when drawing the
titlebar.  Switch back to the default mode in the main loop.

This fixes https://savannah.gnu.org/bugs/?49893.
2016-12-23 11:00:55 +01:00
Benno Schulenberg 116d9e6f01 chars: use memory on the stack instead of calling malloc() and free() 2016-12-20 10:05:09 +01:00
Benno Schulenberg cd705a7c4c tweaks: elide a counter and a comparison
For clarity and a tiny bit more speed.  Also rename some variables.
2016-12-19 09:44:30 +01:00
Benno Schulenberg eafae5d417 screen: show an embedded newline in filenames as ^J instead of ^@
The byte 0x0A means 0x00 *only* when it is found in nano's internal
representation of a file's data, not when it occurs in a file name.

This fixes the second part of https://savannah.gnu.org/bugs/?49867.
2016-12-18 11:13:50 +01:00
Benno Schulenberg 0562d27b9c tweaks: delete a bunch of unneeded asserts
Nano would crash straight afterward if any of these asserts failed,
so they don't add anything.  A few others are simply superfluous.
2016-12-15 21:15:32 +01:00
Benno Schulenberg c5f49167ea tweaks: write two pieces of conditionalized code like all others
Also trim or improve a few comments.
2016-12-15 19:48:09 +01:00
Benno Schulenberg 9765c2faa0 tweaks: elide a function that is called just once 2016-12-15 19:28:43 +01:00
Benno Schulenberg 85ebe971e2 chars: optimize for the most common case
That is: elide a second test from the most travelled path: a valid
character.  This adds a second call of mblen() when parse_mbchar()
is called on a terminating zero, but that should never happen.
2016-12-15 17:44:18 +01:00
Benno Schulenberg fc101a6ded tweaks: rename a variable to be shorter and clearer 2016-12-15 15:50:07 +01:00
Benno Schulenberg 08cd197bf1 general: include word-jumping and block-jumping into the tiny version
And also case-sensitive searches, backward searches, and searching again.
2016-09-13 09:27:04 +02:00
Benno Schulenberg 514cd9a099 update the license text to the preferred version
Mentioning "GNU nano" instead of "This program" and referring to the
website instead of to a postal address.
2016-08-29 21:27:16 +02:00
Benno Schulenberg 406e5242a3 update the copyright notices 2016-08-29 21:27:05 +02:00
Benno Schulenberg 86a64b1bb5 tweaks: reduce two comparisons to a single one 2016-08-07 13:00:21 +02:00
Benno Schulenberg c8bc05b10e chars: make searching case-insensitively some ten percent faster
It is quicker to do a handful of superfluous compares at the end of
each line than it is to compute and keep track of and compare the
remaining line length the whole time.

The typical line is some sixty characters long, the typical search
string ten characters -- with a shorter search string the speedup is
even higher: some fifteen percent.  Only when the string is longer
than half the average line length does searching become slower with
this new method.

All this for a UTF-8 locale.  For a C locale it makes no difference.
2016-08-07 11:02:41 +02:00
Benno Schulenberg 370406bb41 tweaks: don't optimize for a special case -- it occurs far too seldom 2016-08-06 11:11:56 +02:00
Benno Schulenberg 85844ee6ef chars: remove superfluous afterchecks
Now that mbstrncasecmp() does the right thing, there is no need any
more to verify that only a valid multibyte sequence was matched.

(See https://savannah.gnu.org/bugs/?45579 for a test case.)

Also, this will make it possible to search for invalid sequences.

(Currently it isn't possible to enter a search string with invalid
characters, but... a user might edit the search history file.  And
if pasting at the prompt is implemented, it will be trivial to enter
invalid sequences if you have a file that contains them.)
2016-08-06 11:10:39 +02:00
Benno Schulenberg e38e2c634b chars: don't persist when only one of the compared sequences is invalid
Persisting might lead to count 'n' reaching zero, which would mean that
the needle has matched, which is wrong when one of the strings contains
an invalid or incomplete multibyte sequence.
2016-08-06 10:34:38 +02:00
Benno Schulenberg d80109dd5e chars: properly compare strings of different lengths
That is: don't run towlower() on the two differing bytes when having
reached the end of one of the strings.

This fixes https://savannah.gnu.org/bugs/?48700.

In the bargain, don't do the conversion to lowercase twice.

Furthermore, persist when encountering invalid byte sequences --
until finding bytes that differ.
2016-08-05 16:07:55 +02:00
Benno Schulenberg b305911cba chars: straighten out the flow of a loop, so it is easier to follow 2016-08-04 13:40:55 +02:00
Benno Schulenberg d60f95137e chars: remove a special case that never occurs
The needle is never part of the hay -- it is always a separate string.

(And even if needle and haystack were identical, the routine works fine,
the case does not need special treatment.)
2016-08-04 13:40:19 +02:00
Benno Schulenberg 20058a1b63 spelling: don't consider digits as word parts, because GNU spell doesn't
This fixes https://savannah.gnu.org/bugs/?48660.
2016-08-03 12:43:57 +02:00
Benno Schulenberg 90a90365a8 tweaks: rename three constants, for clarity, and hardcode two others 2016-08-01 12:56:05 +02:00
Benno Schulenberg 41ad376b70 chars: plug a gushing memory leak 2016-07-22 15:30:09 +02:00
Benno Schulenberg bf091be778 chars: don't try to see a character in an empty line
This fixes https://savannah.gnu.org/bugs/?48578.
2016-07-21 09:46:47 +02:00
Benno Schulenberg 6f12992cea new feature: add the option --wordchars, to set extra word characters
This allows the user to specify which other characters, besides the
default alphanumeric ones, should be considered as part of a word, so
that word operations like Ctrl+Left and Ctrl+Right will pass them by.

Using this option overrides the option --wordbounds.

This fulfills https://savannah.gnu.org/bugs/?47283.
2016-07-13 20:49:30 +02:00
Benno Schulenberg e33a0b6dbe screen: avoid converting each character twice from multibyte to wide 2016-07-12 19:41:13 +02:00
Benno Schulenberg 0894587305 screen: elide another intermediate buffer for every visible character 2016-07-12 19:30:50 +02:00
Benno Schulenberg b6efea266e chars: invalid sequences are not blank, nor text, nor punctuation
So, slightly speed up the functions that check for those.
2016-06-30 14:34:34 +02:00
Benno Schulenberg 8686cb3d3d chars: measure invalid sequences and unassigned codepoints more quickly
Invalid multibyte sequences get depicted with the Replacement Character,
and unassigned codepoints are shown as if they were a space.  Both have
a width of one.
2016-06-30 14:33:25 +02:00
Benno Schulenberg af53c56ec8 chars: speed up the determination whether something is a control character
Use knowledge of UTF-8 instead of converting to wide characters first.
2016-06-29 20:56:50 +02:00
Benno Schulenberg 019d7b34ca chars: delete a now-unused function 2016-06-29 20:56:50 +02:00
Benno Schulenberg 622995fb12 chars: the representation of a control character is always two bytes
Any control character is represented by a ^ plus an ASCII character.
2016-06-29 20:56:50 +02:00
Benno Schulenberg 03586c60da chars: represent the high-bit controls more intelligibly
Instead of showing the upper control codes like this:

   ^À ^Á ^Â ^Ã ^Ä ^Å ^Æ ^Ç ^È ^É ^Ê ^Ë ^Ì ^Í ^Î ^Ï
   ^Ð ^Ñ ^Ò ^Ó ^Ô ^Õ ^Ö ^× ^Ø ^Ù ^Ú ^Û ^Ü ^Ý ^Þ ^ß

show them like this:

   ^` ^a ^b ^c ^d ^e ^f ^g ^h ^i ^j ^k ^l ^m ^n ^o
   ^p ^q ^r ^s ^t ^u ^v ^w ^x ^y ^z ^{ ^| ^} ^~ ^=

The lower control codes continue to be shown like this:

   ^@ ^A ^B ^C ^D ^E ^F ^G ^H ^I ^J ^K ^L ^M ^N ^O
   ^P ^Q ^R ^S ^T ^U ^V ^W ^X ^Y ^Z ^[ ^\ ^] ^^ ^_

The representation of DEL (0x7F) continues as ^?.

Further, use knowledge of UTF-8 to avoid a roundtrip through
wide characters.
2016-06-29 20:56:50 +02:00
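The mapping listed in commit 03586c60da can be written down compactly. The following is a sketch derived only from the tables in that message, with the hypothetical name control_symbol(); it assumes that the argument is a control code (0x00 to 0x1F, 0x7F, or 0x80 to 0x9F) and returns the character that is shown after the caret.

```c
/* Hypothetical helper: the symbol shown after '^' for a control code.
 * Assumes code is in 0x00..0x1F, or is 0x7F, or is in 0x80..0x9F. */
char control_symbol(unsigned char code)
{
    if (code == 0x7F)
        return '?';                  /* DEL is shown as ^? */
    if (code == 0x9F)
        return '=';                  /* last high control: ^= */
    if (code >= 0x80)
        return (char)(code - 0x20);  /* 0x80..0x9E map to ` through ~ */
    return (char)(code + 0x40);      /* 0x00..0x1F map to @ through _ */
}
```

The special case for 0x9F is needed because the regular offset would yield 0x7F, which is DEL and not a printable character.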
Benno Schulenberg 6fda7a7057 chars: speed up two reverse-searching routines a bit
By removing from their main loops a condition that occurs just once.
2016-06-27 19:22:28 +02:00