|
Документ взят из кэша поисковой машины. Адрес
оригинального документа
: http://www.sai.msu.su/~megera/oddmuse/index.cgi/2010-01-12
Дата изменения: Unknown Дата индексирования: Mon Apr 11 07:57:34 2016 Кодировка: Поисковые слова: molecular cloud |
The problem: Find what keywords are in text. We have papers and kw (keywords). kw.ts is a plainto_tsquery(kw.name).
Table "public.papers"
Column | Type | Modifiers
-------------------+----------+-----------
id | integer |
oai_id | text |
datestamp | date |
title | text |
authors | text |
description | text |
comment | text |
creation_date | date |
modification_date | date |
fts | tsvector |
Indexes:
"fts_idx" gin (fts)
"fts_types_idx" gin (document_token_types(comment))
"id_idx" btree (id)
Table "public.kw"
Column | Type | Modifiers
-----------+-----------------------+-----------
key_id | integer |
name | character varying(64) |
status_id | integer |
ts | tsquery |
arxiv=# select papers.title, array_agg(kw.name) from papers
join kw on papers.fts @@ kw.ts and papers.id=2
group by papers.title;
title | array_agg
------------------------------------------+---------------------------------
Sparsity-certifying Graph Decompositions | {color,Williams,trees,colorful}
(1 row)
In principle, it's possible to use this approach to find plagiarism.
Other way is to use ts_stat) function, which decomposes tsvector on words.
CREATE OR REPLACE FUNCTION ts_stat(tsvector, weights text, OUT word text, OUT ndoc
integer, OUT nentry integer)
RETURNS SETOF record AS
$$
SELECT ts_stat('SELECT ' || quote_literal( $1::text ) || '::tsvector', quote_literal( $2::text) );
$$ LANGUAGE SQL RETURNS NULL ON NULL INPUT IMMUTABLE;
select ARRAY ( select (ts_stat(fts,'*')).word from papers where id=2);