12 Commits (b9b4db483cc588a2eb334b63fe6740c8dad9b57b)

Author SHA1 Message Date
Claire f1657e6d62
Clamp dates when serializing to Elasticsearch API () 2023-11-27 13:25:54 +00:00
Claire 6b58cfd8dd
Fix searching by username by reverting account verbatim tokenizer to `standard` () 2023-08-31 15:35:58 +02:00
Eugen Rochko 7bd5ebb0c5
Fix multiple issues with status index mappings () 2023-08-28 11:36:17 +02:00
jsgoldstein 30c191aaa0
Add new public status index ()
Co-authored-by: Eugen Rochko <eugen@zeonfederated.com>
Co-authored-by: Claire <claire.github-309c@sitedethib.com>
2023-08-24 16:40:04 +02:00
Claire f5778caa3a
Add `ES_PRESET` option to customize numbers of shards and replicas ()
Co-authored-by: Eugen Rochko <eugen@zeonfederated.com>
2023-08-14 17:46:16 +02:00
Eugen Rochko 72423bc8f6
Change account search tokenizer and queries () 2023-08-08 09:09:14 +02:00
jsgoldstein 4581a528f7
Change account search to match by text when opted-in ()
Co-authored-by: Eugen Rochko <eugen@zeonfederated.com>
2023-06-29 13:05:21 +02:00
Eugen Rochko a9b64b24d6
Change algorithm of `tootctl search deploy` to improve performance () 2022-05-22 22:16:43 +02:00
Eugen Rochko 679b7158e3
Change search indexing to use batches to minimize resource usage () 2022-05-18 23:29:14 +02:00
Takeshi Umeda 3419d3ec84
Bump chewy from 5.2.0 to 7.2.3 (supports Elasticsearch 7.x) ()
* Bump chewy from 5.2.0 to 7.2.2

* fix style (codeclimate)

* fix style

* fix style

* Bump chewy from 7.2.2 to 7.2.3
2021-11-18 22:02:08 +01:00
Eugen Rochko 70da6d6630
Fix accounts search by full/partial display name and others ()
- Restrict followers counts to local users to minimize local advantage
- Fix emoji shortcodes causing error in search
- Fix search syntax parse errors not being caught
2019-08-16 13:00:30 +02:00
Eugen Rochko 8fdff2748f
Add more accurate account search ()
* Add more accurate account search

When Elasticsearch is available, a more accurate search is implemented:

- Using edge n-gram index for acct and display name
- Using asciifolding and cjk width normalization on display names
- Using Gaussian decay on account activity for additional scoring (recency)
- Using followers/friends ratio for additional scoring (spamminess)
- Using followers number for additional scoring (size)

The exact match precedence only takes effect when the input conforms
to the username format and the username part of it is complete, i.e.
when the user started typing the domain part.

* Support single-letter usernames

* Fix tests

* Fix not picking up account updates

* Add weights and normalization for scores, skip zero terms queries

* Use local counts for accounts index, adjust search parameters

* Fix mistakes

* Using updated_at of accounts is inadequate for remote accounts
2019-08-16 01:24:03 +02:00
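
The oldest commit above (8fdff2748f) names several scoring ingredients: edge n-gram matching, Gaussian decay on account activity, a followers/friends ratio, and follower count. The following is a minimal Python sketch of those ideas, not the actual Mastodon or Chewy code. Every function name, parameter default (`max_gram=15`, `scale_days=30.0`, `decay=0.5`), and the way the factors are multiplied together are illustrative assumptions. Only the Gaussian formula follows the general shape of Elasticsearch's `gauss` decay function.

```python
import math


def edge_ngrams(text, min_gram=1, max_gram=15):
    """Return the lowercase prefixes of `text` between min_gram and max_gram
    characters long. The gram sizes are assumed defaults, not the index's."""
    text = text.lower()
    upper = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, upper + 1)]


def gauss_decay(days_since_activity, scale_days=30.0, decay=0.5):
    """Gaussian decay: 1.0 at zero days, exactly `decay` at `scale_days`."""
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    return math.exp(-(days_since_activity ** 2) / (2.0 * sigma_sq))


def follow_ratio(followers, following):
    """Share of an account's connections that are followers (0.0 if none).
    A low value loosely corresponds to the 'spamminess' signal."""
    total = followers + following
    return followers / total if total else 0.0


def score(query, username, days_since_activity, followers, following):
    """Hypothetical combined score: zero unless `query` is a prefix of
    `username`, otherwise recency * (1 + ratio) * log-scaled size."""
    if query.lower() not in edge_ngrams(username):
        return 0.0
    recency = gauss_decay(days_since_activity)
    ratio = follow_ratio(followers, following)
    size = math.log2(2 + followers)
    return recency * (1.0 + ratio) * size
```

Under these assumptions, `score("gar", "Gargron", 0, 100, 10)` is positive because "gar" is one of the indexed prefixes, while `score("ron", "Gargron", 0, 100, 10)` is zero because edge n-grams only cover the start of the name.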