The Wayback Machine - https://web.archive.org/web/20200611060252/https://github.com/topics/information-retrieval

information-retrieval

Here are 1,034 public repositories matching this topic...

gensim
Benja1972
Benja1972 commented Apr 6, 2020

I would like to process a corpus of documents with the TFIDF model. My corpus is a single txt file in which each line is a document. That works fine as input for the other models in pke, but TFIDF needs a document frequency matrix. The pke utilities can generate one, but they accept only an input_dir in which each file is a document. It would be convenient to have an option to pass the documents as a single file, as with all the other models.
Th
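
The request above can be illustrated with a minimal sketch that counts document frequencies directly from an iterable of lines, treating each line as one document. It uses only the Python standard library and is not pke's API: the function names `document_frequencies` and `tfidf`, the regex tokenizer, and the unsmoothed `log(N / df)` weighting are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a simple word tokenizer; pke applies its own preprocessing.
TOKEN_RE = re.compile(r"\w+")


def document_frequencies(lines):
    """Count, for each term, how many non-empty lines (documents) contain it."""
    df = Counter()
    n_docs = 0
    for line in lines:
        terms = set(TOKEN_RE.findall(line.lower()))
        if not terms:
            continue  # skip blank lines instead of counting them as documents
        n_docs += 1
        df.update(terms)  # each term counts once per document
    return df, n_docs


def tfidf(line, df, n_docs):
    """Score the terms of one document with tf * log(N / df), no smoothing."""
    tf = Counter(TOKEN_RE.findall(line.lower()))
    # Terms never seen in the corpus have df == 0 and are left out.
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t]}
```

Because a file object iterates line by line, an open file (for example a hypothetical `corpus.txt` opened with `encoding="utf-8"`) can be passed straight to `document_frequencies` without splitting it into one file per document.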

twelveth
twelveth commented Apr 21, 2020

Hi.
This is not an issue, but maybe an enhancement request.
I'm trying to use anserini for a custom collection, and it's rather hard for me to figure out how to build a pipeline from scratch.

For example, although there are a lot of scripts for reproducing particular results, I couldn't find any information about the document format that anserini takes as input to index custom collect

pisa
JMMackenzie
JMMackenzie commented Mar 16, 2020

Given recent feedback from HN, we should look at improving how we explain PISA, and offer some benchmarks against common systems like Lucene and Tantivy (perhaps).

We also should document some things such as:

  • Use cases
  • Assumptions (in memory)
  • Target audience and why you would want to use it
  • Limitations
  • Algorithms implemented (in terms of the basics, i.e., top-k search, Boolean matching

word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding
  • Updated Jun 1, 2020
  • OpenEdge ABL
