Relevance maximization for high-recall retrieval problem: finding all needles in a haystack

Citations: Web of Science 8, Scopus 7

Abstract

The high-recall retrieval problem aims to find the full set of relevant documents in a huge result set through effective mining techniques. It is particularly useful for patent information retrieval, legal document retrieval, medical document retrieval, market information retrieval, and literature review. Existing high-recall retrieval methods, however, remain far from satisfactory at retrieving all relevant documents. They must satisfy demanding recall and precision thresholds while also minimizing the number of documents that have to be reviewed. To address this gap, we generalize the problem to a novel high-recall retrieval model, which can be described as finding all needles in a giant haystack. To compute candidate groups consisting of k relevant documents efficiently, we propose dynamic diverse retrieval algorithms specialized for patent searching, in which effective dynamic interactive retrieval can be achieved. On various types of datasets, the dynamic ranking method shows considerable improvements in time and cost over conventional static ranking approaches.
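The abstract contrasts static ranking (score the collection once, then review in that order) with dynamic interactive ranking (re-rank after each reviewer judgement), with cost tied to the number of documents reviewed. The sketch below is a generic illustration of that contrast using simple additive relevance feedback. It is not the authors' dynamic diverse retrieval algorithm. The toy corpus, the function names `static_review_cost` and `dynamic_review_cost`, and the feedback weights `pos` and `neg` are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical representation: each document is a set of terms.
Docs = Dict[str, Set[str]]


def score(terms: Set[str], weights: Dict[str, float]) -> float:
    """Linear score: sum of the current weights of the document's terms."""
    return sum(weights.get(t, 0.0) for t in terms)


def static_review_cost(docs: Docs, query: Set[str], relevant: Set[str]) -> int:
    """Rank once by the query, then review top-down until every relevant doc is seen."""
    remaining = set(relevant)
    if not remaining:
        return 0
    weights = {t: 1.0 for t in query}
    # Ties are broken by document id so the order is deterministic.
    order = sorted(docs, key=lambda d: (-score(docs[d], weights), d))
    for reviewed, doc_id in enumerate(order, start=1):
        remaining.discard(doc_id)
        if not remaining:
            return reviewed
    return len(order)


def dynamic_review_cost(docs: Docs, query: Set[str], relevant: Set[str],
                        pos: float = 1.0, neg: float = 0.5) -> int:
    """Re-rank after every judgement using simple additive relevance feedback."""
    weights = {t: 1.0 for t in query}
    unreviewed = set(docs)
    remaining = set(relevant)
    reviewed = 0
    while unreviewed and remaining:
        # Review the highest-scoring unreviewed doc under the current weights.
        doc_id = min(unreviewed, key=lambda d: (-score(docs[d], weights), d))
        unreviewed.remove(doc_id)
        reviewed += 1
        # Feedback: boost terms of relevant docs, penalize terms of non-relevant ones.
        delta = pos if doc_id in relevant else -neg
        for t in docs[doc_id]:
            weights[t] = weights.get(t, 0.0) + delta
        remaining.discard(doc_id)
    return reviewed


# Hypothetical toy "haystack": three relevant patents, three distractors.
docs = {
    "p1": {"battery", "anode", "lithium", "nickel"},
    "p2": {"battery", "charger"},
    "p3": {"battery", "phone"},
    "p4": {"anode", "lithium", "nickel"},
    "p5": {"lithium", "nickel", "cathode"},
    "p6": {"charger", "cable"},
}
relevant = {"p1", "p4", "p5"}

print(static_review_cost(docs, {"battery"}, relevant))   # 5
print(dynamic_review_cost(docs, {"battery"}, relevant))  # 3
```

In this toy run the static ranking sees only the query term, so the two relevant documents that lack "battery" sink below the distractors. The dynamic loop learns "anode", "lithium" and "nickel" from the first relevant hit and pulls them forward, finding all three needles after three reviews instead of five.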

Keywords

High-recall retrieval problem; Patent retrieval; Diversity retrieval; INDEPENDENCE; PARAMETERS; DOMINATION
Title: Relevance maximization for high-recall retrieval problem: finding all needles in a haystack
Authors: Song, Justin JongSu; Lee, Wookey
DOI: 10.1007/s11227-016-1956-8
Publication date: 2020-10
Type: Article
Journal: Journal of Supercomputing, vol. 76, no. 10
Pages: 7734-7757