Commit 308fc71

Make title matching a bit smarter (#23)
Signed-off-by: Hofi <hofione@gmail.com>
2 parents a936b42 + cc3c564 commit 308fc71

2 files changed, +23 -20 lines changed

_plugins/generate_tooltips.rb

+11 -8
@@ -126,7 +126,7 @@ def process_markdown_parts(page, markdown)
 
 # Search for known link titles
 # NOTE: Multi-line matching will not help here either if the pattern itself is broken/spanned across multiple lines in the middle, so whitespace replacements are now used inside the pattern to handle this, see above!
-full_pattern = /(^|[\s.,;:&'"\-(])(#{pattern})([\s.,;:&'"\-)]|\z)(?![^<]*?<\/a>)/
+full_pattern = /(^|[\s.,;:&'"(])(#{pattern})([\s.,;:&'")]|\z)(?![^<]*?<\/a>)/
 markdown_part = process_markdown_part(page, markdown_part, page_links, full_pattern, id, url, needs_tooltip, true)
 else
 # Content inside of special Markdown blocks
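
The pattern change in this hunk drops the hyphen from both boundary character classes. A minimal Ruby sketch of the visible effect, assuming a single hypothetical title `macros` (in the plugin, `pattern` is built elsewhere from the known link titles, and the sample strings below are made up for illustration):

```ruby
# Hypothetical title; in the plugin `pattern` comes from the known link titles.
pattern = Regexp.escape("macros")

# Boundary classes before this commit (a hyphen counts as a title boundary).
old_pattern = /(^|[\s.,;:&'"\-(])(#{pattern})([\s.,;:&'"\-)]|\z)(?![^<]*?<\/a>)/
# Boundary classes after this commit (hyphen removed).
new_pattern = /(^|[\s.,;:&'"(])(#{pattern})([\s.,;:&'")]|\z)(?![^<]*?<\/a>)/

# Made-up sample texts, not taken from the site sources.
samples = {
  plain: "see the macros section",
  hyphenated: "see the user-macros section",
  linked: '<a href="#">soft macros here</a>'
}

# Record for each sample whether the old and the new pattern match.
results = samples.transform_values do |text|
  { old: old_pattern.match?(text), new: new_pattern.match?(text) }
end

results.each { |name, r| puts "#{name}: old=#{r[:old]} new=#{r[:new]}" }
```

With these samples only the hyphenated text behaves differently: the old pattern treats `-` as a boundary and matches, the new one does not. Text that is already inside a link is rejected by the negative lookahead in both versions.
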
@@ -232,7 +232,11 @@ def page_links_ids_sorted_by_title(page_links)
 end
 end
 
-sorted_arr.sort_by { |page| page["title"].downcase }.reverse
+# With this reversed length sort order we try to guarantee that
+# the autolink/tooltip title pattern matching finds titles like
+# 'Soft macros' before 'macros'
+# In most cases matching the longer titles first will eliminate such issues
+sorted_arr.sort_by { |page| page["title"].length }.reverse
 end
 
 def gen_page_link_data(links_dir, link_files_pattern)
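
The sort-order change can be illustrated with a small Ruby sketch. It assumes each entry carries a single string under `"title"`; the entries, the sample text, and the helper `first_matching_title` are hypothetical and only mimic "first title that matches wins".

```ruby
# Hypothetical entries; the real data comes from the link files.
page_links = [
  { "title" => "macros" },
  { "title" => "Soft macros" },
  { "title" => "Hard macros" }
]

# Order before this commit: reverse alphabetical, case-insensitive.
by_name = page_links.sort_by { |page| page["title"].downcase }.reverse
# Order after this commit: longest title first.
by_length = page_links.sort_by { |page| page["title"].length }.reverse

# Hypothetical stand-in for the matcher: the first title found in the text wins.
def first_matching_title(sorted_links, text)
  hit = sorted_links.find { |page| text.include?(page["title"]) }
  hit && hit["title"]
end

text = "Hard macros are expanded early"
puts first_matching_title(by_name, text)
puts first_matching_title(by_length, text)
```

In the reverse alphabetical order `macros` sorts ahead of `Hard macros`, so the shorter title is tried first; sorting by length puts every longer title ahead of `macros`. Titles of equal length have no guaranteed relative order, because `sort_by` is not documented as stable.
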
@@ -287,16 +291,15 @@ def gen_page_link_data(links_dir, link_files_pattern)
 puts "Unknown ID (#{alias_id}) in alias definition"
 exit 4
 end
-_, aliases = alias_data.first
-page_link_data["title"] = aliases.concat(page_link_data["title"])
-#puts "page_link_data: #{page_link_data}"
+page_link_data["title"].concat(alias_data["aliases"])
+# puts "page_link_data: #{page_link_data}"
 end
 
 # Just for debugging
 # pp page_links_dictionary
-page_links_ids_sorted_by_title(page_links_dictionary).each do |data|
-#puts data
-end
+# page_links_ids_sorted_by_title(page_links_dictionary).each do |data|
+# puts data
+# end
 
 #pp page_links_dictionary
 return page_links_dictionary

doc/site-internal/lunr_search_help.md

+12 -12
@@ -13,15 +13,15 @@ Please visit the original Lunr site for more information.
 The simplest way to start is to pass the text on which you want to search into the search field:
 
 ``` javascript
-'foo'
+foo
 ```
 
 The above will return details of all documents that match the term “foo”. Although it looks like a string, the search method parses the string into a search query. This supports special syntax for defining more complex queries.
 
 Searches for multiple terms are also supported. If a document matches at least one of the search terms, it will show in the results. The search terms are combined with OR.
 
 ``` javascript
-'foo bar'
+foo bar
 ```
 
 The above example will match documents that contain either “foo” or “bar”. Documents that contain both will score more highly and will be returned first.
@@ -37,21 +37,21 @@ For example, let’s say you’re indexing a collection of documents about JavaS
 Lunr supports wildcards when performing searches. A wildcard is represented as an asterisk (*) and can appear anywhere in a search term. For example, the following will match all documents with words beginning with “foo”:
 
 ``` javascript
-'foo*'
+foo*
 ```
 
 This will match all documents with words ending in “oo”:
 
 ``` javascript
-'*oo'
+*oo
 ```
 
 Leading wildcards, as in the above example, should be used sparingly. They can have a negative impact on the performance of a search, especially in large indexes.
 
 Finally, a wildcard can be in the middle of a term. The following will match any documents that contain a term that begins with “f” and ends in “o”:
 
 ``` javascript
-'f*o'
+f*o
 ```
 
 It is also worth noting that, when a search term contains a wildcard, no stemming is performed on the search term.
@@ -65,51 +65,51 @@ To indicate that a term must be present in matching documents the term should be
 The below example searches for documents that must contain “foo”, might contain “bar” and must not contain “baz”:
 
 ``` javascript
-'+foo bar -baz'
+foo bar -baz
 ```
 
 To simulate a logical AND search of “foo AND bar” mark both terms as required:
 
 ``` javascript
-'+foo +bar'
+foo +bar
 ```
 
 ## Fields
 
 By default, Lunr will search all fields in a document for the query term, and it is possible to restrict a term to a specific field. The following example searches for the term “foo” in the field title:
 
 ``` javascript
-'title:foo'
+title:foo
 ```
 
 The search term is prefixed with the name of the field, followed by a colon (:). The field must be one of the fields defined when building the index. Unrecognised fields will lead to an error.
 
 Field-based searches can be combined with all other term modifiers and wildcards, as well as other terms. For example, to search for words beginning with “foo” in the title or with “bar” in any field the following query can be used:
 
 ``` javascript
-'title:foo* bar'
+title:foo* bar
 ```
 
 ## Boosts
 
 In multi-term searches, a single term may be more important than others. For these cases Lunr supports term level boosts. Any document that matches a boosted term will get a higher relevance score, and appear higher up in the results. A boost is applied by appending a caret (^) and then a positive integer to a term.
 
 ``` javascript
-'foo^10 bar'
+foo^10 bar
 ```
 
 The above example weights the term “foo” 10 times higher than the term “bar”. The boost value can be any positive integer, and different terms can have different boosts:
 
 ``` javascript
-'foo^10 bar^5 baz'
+foo^10 bar^5 baz
 ```
 
 ## Fuzzy Matches
 
 Lunr supports fuzzy matching search terms in documents, which can be helpful if the spelling of a term is unclear, or to increase the number of search results that are returned. The amount of fuzziness to allow when searching can also be controlled. Fuzziness is applied by appending a tilde (~) and then a positive integer to a term. The following search matches all documents that have a word within 1 edit distance of “foo”:
 
 ``` javascript
-'foo~1'
+foo~1
 ```
 
 An edit distance of 1 allows words to match if either adding, removing, changing or transposing a character in the word would lead to a match. For example “boo” requires a single edit (replacing “f” with “b”) and would match, but “boot” would not as it also requires an additional “t” at the end.
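
The edit distance described in the last paragraph of this hunk can be illustrated with a small Ruby sketch. This is not Lunr's implementation; it is a plain dynamic-programming version of the optimal string alignment distance (add, remove, change, or transpose adjacent characters), and the function name `edit_distance` is hypothetical.

```ruby
# Optimal string alignment distance between two words.
# A sketch for illustration only, not how Lunr computes fuzzy matches.
def edit_distance(a, b)
  # rows[i][j] holds the distance between the first i chars of a and the first j chars of b.
  rows = Array.new(a.length + 1) { |i| [i] + [0] * b.length }
  (0..b.length).each { |j| rows[0][j] = j }

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      best = [
        rows[i - 1][j] + 1,        # remove a character
        rows[i][j - 1] + 1,        # add a character
        rows[i - 1][j - 1] + cost  # change a character (or keep it)
      ].min
      # Transpose two adjacent characters.
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        best = [best, rows[i - 2][j - 2] + 1].min
      end
      rows[i][j] = best
    end
  end

  rows[a.length][b.length]
end

%w[boo boot ofo].each do |word|
  puts "foo -> #{word}: #{edit_distance('foo', word)}"
end
```

Under this definition “boo” and “ofo” are within distance 1 of “foo”, while “boot” is at distance 2, which agrees with the example in the text.
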
