Commit Graph

17 Commits (main)

Author SHA1 Message Date
Eugen Rochko e7aa2be828
Change how hashtags are normalized (#18795)
* Change how hashtags are normalized

* Fix tests
2022-07-13 15:03:28 +02:00
abcang 7f0c49c58a
Improve tag search query (#16104) 2021-04-25 06:33:28 +02:00
Eugen Rochko 0762258aec
Fix hashtags being split by ZWNJ character (#11821)
Fix #11761
2019-09-13 16:01:26 +02:00
Eugen Rochko cc0a55cf9a
Add more accurate hashtag search (#11579)
* Add more accurate hashtag search

Using ElasticSearch to index hashtags with edge n-grams and score
them by usage within the last 7 days since last activity. Only
hashtags that have been reviewed and are listable can appear in
searches, unless they match the query exactly

* Fix search analyzer dropping non-ascii characters
2019-08-18 03:45:51 +02:00
Eugen Rochko 92de439c04
Change hashtag search to only return results that have trended in the past (#11448)
* Change hashtag search to only return results that have trended in the past

A way to eliminate typos and other one-off "junk" results

* Fix excluding exact matches that don't have a score

* Fix tests
2019-07-30 20:29:50 +02:00
Eugen Rochko e136112ab7
Fix tag normalization and migration not removing duplicate tags (#11441)
Fix #11428
2019-07-29 20:40:21 +02:00
ThibG c37c1da41e Disallow numeric-only hashtags (#11363)
* Add spec covering numeric-only hashtags

* Fix hashtag regex
2019-07-19 23:22:35 +02:00
Eugen Rochko 84e988479e
Fix only one middle dot being recognized in hashtags (#11345)
Fix #10934
2019-07-18 03:02:56 +02:00
ysksn 7d7df877ef Add a test for Tag#to_param (#5705) 2017-11-15 16:04:41 +01:00
masarakki a49be27145 add validation to tag name (#4194) 2017-07-14 11:02:49 +02:00
unarist c2753fdfb4 Make tag search case insensitive again (#4184) 2017-07-13 19:31:33 +02:00
Eugen Rochko 3f59238207 Add important test for full-width hashtags (#3911) 2017-06-23 17:01:53 +02:00
unarist 004672aa6c Fix tag search order and not to use tsvector (#3611)
* Sort results by the name
* Switch search method to simple `LIKE` matching instead of tsvector/tsquery

Previously we used scores from ts_rank_cd() to sort results, but it didn't work
because the function returns same score for all results. It's not for calculate
similarity of single words. Sometimes this bug even push out exact matching tag
from results.

Additionally, PostgreSQL supports prefix searching with standard btree index.
Using it offers simpler code, but also less index size and some speed.
2017-06-06 16:07:06 +02:00
Matt Jankowski 388ec0d5b6 Search cleanup (#1333)
* Clean up SQL output in Tag and Account search methods

* Add basic coverage for Tag.search_for

* Add coverage for Account.search_for

* Add coverage for Account.advanced_search_for
2017-04-09 14:45:01 +02:00
Eugen Rochko 4fb95c91fb Fix wrongful matching of last period in extended usernames
Fix anchor tags in some wikipedia URLs being matches as a hashtag
2017-03-05 18:08:19 +01:00
Eugen Rochko 546c4718e7 Localizations for most server-side strings 2016-11-16 00:55:33 +01:00
Eugen Rochko 62292797ec Adding hashtag model 2016-11-04 19:12:59 +01:00