Original document: http://www.sai.msu.su/~megera/wiki/tsearch2UTF8Test
This is a completely rewritten parser for tsearch2 with full UTF8 support. The parser is built as a finite-state automaton and is expected to be flexible and compatible with the old tsearch2 parser (with some of its errors fixed).
A list of current issues in the parser (available from CVS HEAD):
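To make the finite-state idea concrete, here is a minimal Python sketch of a two-state tokenizer. It is not the tsearch2 implementation (which is written in C and has many more states and token types); the names `toy_parse`, `State` and `TRANSITIONS` are hypothetical, and the only things borrowed from this page are the token ids 1 (word) and 12 (space-like token) as they appear in the parse() listings.

```python
from enum import Enum, auto

# Token ids as they appear in the parse() listings on this page.
WORD, BLANK = 1, 12


class State(Enum):
    START = auto()
    IN_WORD = auto()
    IN_BLANK = auto()


def char_class(ch):
    # Assumption of this sketch: letters form words, everything else
    # (including '_') is space-like.
    return "letter" if ch.isalpha() else "other"


# Transition table: (current state, character class) -> next state.
TRANSITIONS = {
    (State.START, "letter"): State.IN_WORD,
    (State.START, "other"): State.IN_BLANK,
    (State.IN_WORD, "letter"): State.IN_WORD,
    (State.IN_WORD, "other"): State.IN_BLANK,
    (State.IN_BLANK, "letter"): State.IN_WORD,
    (State.IN_BLANK, "other"): State.IN_BLANK,
}

TOKEN_ID = {State.IN_WORD: WORD, State.IN_BLANK: BLANK}


def toy_parse(text):
    """Split text into (tokid, token) pairs with a two-state automaton."""
    tokens = []
    state = State.START
    start = 0
    for i, ch in enumerate(text):
        nxt = TRANSITIONS[(state, char_class(ch))]
        # A change of state closes the token collected so far.
        if state is not State.START and nxt is not state:
            tokens.append((TOKEN_ID[state], text[start:i]))
            start = i
        state = nxt
    # Flush the last token, if any input was consumed.
    if state is not State.START:
        tokens.append((TOKEN_ID[state], text[start:]))
    return tokens


if __name__ == "__main__":
    for tokid, token in toy_parse("a_b_c"):
        print(tokid, "|", token)
```

Unlike the real parser, this sketch merges a run of consecutive space-like characters into a single token and does not distinguish Latin from non-Latin words. Adding a token type means adding states and table entries rather than rewriting the loop, which is the flexibility the automaton approach is meant to give.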
Multiple consecutive slashes ('////'): broken
test=# select * from parse('~//downloads////qq');
tokid | token
-------+------------
12 | ~
12 | /
19 | /downloads
12 | /
12 | /
12 | /
19 | /qq
(7 rows)
'_' is treated as a space symbol:
test=# select * from parse('a_b_c');
tokid | token
-------+-------
1 | a
12 | _
1 | b
12 | _
1 | c
XHTML tag: broken (FIXED)
test=# select * from parse('<br/>');
tokid | token
-------+-------
12 | <
1 | br
12 | />
Word followed by '...': broken (FIXED)
test=# select * from parse('etc...');
tokid | token
-------+-------
19 | etc..
12 | .
~ in path: broken (FIXED)
test=# select * from parse('~/downloads/Harry_Potter.avi');
tokid | token
-------+-----------------------------
12 | ~
19 | /downloads/Harry_Potter.avi
version: broken (FIXED)
test=# select * from parse('-1.2.3');
tokid | token
-------+-------
20 | -1.2
12 | .
22 | 3
But compare the same version string with a word prefix:
test=# select * from parse('version-1.2.3');
tokid | token
-------+---------------
15 | version-1.2.3
11 | version
12 | -
8 | 1.2.3
Backslash (\) handling: broken (BRR)
select * from parse('a \ b ');
tokid | token
-------+-------
1 | a
12 |
1 | b
12 |