# text-clustering
Here are 63 public repositories matching this topic...
Short-text clustering preprocessing module
Updated Dec 28, 2019 - Python
Updated Jan 4, 2018 - Jupyter Notebook
Library of state-of-the-art models (PyTorch) for NLP tasks
nlp
natural-language-processing
text-classification
machine-translation
pytorch
style-transfer
speech-recognition
text-summarization
nlp-library
text-clustering
punctuation-restoration
Updated Oct 5, 2020 - Python
Easy, fast clustering of texts
Updated Apr 14, 2017 - R
semantic-sh is a SimHash implementation that detects and groups similar texts by leveraging word vectors and transformer-based language models (BERT).
text-similarity
simhash
transformer
locality-sensitive-hashing
fasttext
bert
text-search
word-vectors
text-clustering
Updated Sep 19, 2020 - Python
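For readers unfamiliar with SimHash, the following is a minimal sketch of the classic token-hash variant, not the word-vector or BERT-based approach that semantic-sh describes. The function names `simhash` and `hamming_distance` are illustrative assumptions and are not taken from that repository.

```python
import hashlib
from collections import Counter


def simhash(text: str, bits: int = 64) -> int:
    """Return a classic token-based SimHash fingerprint of `text`.

    Assumption: tokens are whitespace-separated, lowercased words,
    weighted by their frequency in the text.
    """
    weights = Counter(text.lower().split())
    totals = [0] * bits
    for token, weight in weights.items():
        # Hash each token to a fixed-width integer (MD5 used only as a stable hash).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each bit of the token hash votes +weight or -weight for that position.
        for i in range(bits):
            totals[i] += weight if (h >> i) & 1 else -weight
    # A fingerprint bit is set when the positive votes win.
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Texts whose fingerprints lie within a small Hamming distance of each other can be treated as near-duplicates; the threshold is a tuning choice that depends on the data.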
Implementation of some algorithms for text clustering
Updated Sep 5, 2018 - Python
Sentence Clustering and visualization. Created Date: 25 Apr 2018
Updated Jan 15, 2020 - Python
Graph clustering and Node embeddings with word2vec
nlp
crawler
clustering
word2vec
word-embeddings
bachelor-thesis
random-walk
graph-clustering
text-clustering
graph-embedding
Updated Mar 2, 2019 - Python
2020 Açık Seminer - Turkish NLP workshop
nlp
natural-language-processing
news
spacy
dataset
named-entity-recognition
ner
turkish-language
k-means-clustering
text-clustering
text-preprocessing
workshop-seminar
Updated May 8, 2020 - Jupyter Notebook
Understanding hateful subreddits through text clustering
Updated Nov 26, 2018 - Python
Cross-lingual Language Model (XLM) pretraining and Model-Agnostic Meta-Learning (MAML) for fast adaptation of deep networks
machine-translation
languages
mlm
tlm
text-processing
pretrained-models
african-languages
bert
denoising-autoencoders
meta-model
clm
maml
text-clustering
xlm
back-translation
parallel-training
bpe-codes
bleu-scores
Updated Mar 26, 2021 - Jupyter Notebook
Using word embeddings, TFIDF and text-hashing to cluster and visualise text documents
clustering
dimensionality-reduction
text-processing
d3js
document-clustering
umap
computational-social-science
text-clustering
text-features
Updated Nov 7, 2019 - Python
The Domain Discovery Operations API formalizes the human domain discovery process by defining a set of operations that capture the essential tasks leading to domain discovery on the Web, as identified through interaction with Subject Matter Experts (SMEs).
information-retrieval
text-mining
text-classification
domain-discovery
topic-discovery
text-clustering
Updated Jul 2, 2021 - Python
Chapter 3: Text and Speech Basics
Updated Jul 23, 2019 - Jupyter Notebook
Heuristic matching of large databases by fuzzy criteria such as addresses
Updated Feb 18, 2021 - xBase
Python Program for Text Clustering using Bisecting k-means
Updated Dec 12, 2017 - Jupyter Notebook
Simple text clustering using the k-means algorithm
Updated Oct 30, 2018 - Python
DBSCAN algorithm implemented from scratch in Python to cluster text records.
Updated May 18, 2018 - Python
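As a rough illustration of what a from-scratch DBSCAN over text records might look like, here is a minimal sketch. It assumes Jaccard distance over whitespace-separated token sets as the distance measure; the listed repository may use a different representation and distance, and the names `jaccard_distance` and `dbscan` are hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets have distance 0."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def dbscan(texts, eps=0.5, min_samples=2):
    """Label each text with a cluster id (0, 1, ...) or -1 for noise.

    Assumption: a point counts as its own neighbor, so `min_samples`
    includes the point itself.
    """
    token_sets = [set(t.lower().split()) for t in texts]
    n = len(texts)

    def neighbors(i):
        # Brute-force region query: O(n) per call, fine for small inputs.
        return [j for j in range(n)
                if jaccard_distance(token_sets[i], token_sets[j]) <= eps]

    labels = [None] * n  # None = unvisited, -1 = noise
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels
```

Called on a handful of short records with `eps=0.5` and `min_samples=2`, records that share most of their tokens end up in the same cluster and isolated records are labeled `-1`.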
Generate a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in just under 30 minutes.
python
nlp
ocr
text-similarity
text-generation
pytorch
topic-modeling
summarization
research-tool
arxiv
research-data-management
scientific-publications
research-and-development
research-software-engineering
scientific-research
text-clustering
arxiv-api
pdf-document-processor
title-generation
Updated Jul 22, 2021 - Python
Topic Modeling and Text Cluster Analysis
Updated Dec 11, 2019 - Jupyter Notebook
This is a very different task: here I cluster 200 different texts related to games and sports into two or more clusters. A Zipf plot can also be used to determine how many useful clusters can be formed.
data-mining
data-visualization
data-analysis
elbow
pattern-recognition
kmeans
cluster-analysis
kmeans-clustering
zipf
text-clustering
Updated Jun 8, 2019 - Jupyter Notebook
TF-IDF is one of the most basic and simple topics in NLP, yet a lot can be done using TF-IDF alone. In this repo, I'll be adding the blog post, TF-IDF basics, and examples of what TF-IDF can achieve.
python
nlp
text-similarity
tfidf
text-clustering
textclassification
tfidf-vectorizer
tfidfvectorizer
Updated Jun 15, 2020 - Jupyter Notebook
Information Retrieval project implementation
Updated Oct 31, 2019 - Jupyter Notebook
This is the HAC (hierarchical agglomerative clustering) algorithm that I'm using to group newspaper articles by news story. You can adapt it to pretty much any type of text.
news
newspaper
kmeans
hac
clustering-algorithm
kmeans-clustering
hierarchical-clustering
kmeans-algorithm
text-clustering
silhouette-score
Updated Jul 22, 2020 - Python
Using an Autoencoder to encode features for k-Means Clustering on the AG News Dataset
python
nlp
machine-learning
natural-language-processing
deep-neural-networks
deep-learning
clustering
python3
pytorch
artificial-intelligence
autoencoder
artificial-neural-networks
unsupervised-learning
k-means-clustering
text-clustering
ag-news-dataset
Updated Jul 9, 2021 - Python
Text clustering in Spark with Scala using an LDA model on a TF-IDF matrix
java
scala
spark
apache-spark
clustering
tf-idf
idf
spark-sql
lda-model
tf
text-clustering
lda-topic-modeling
Updated Oct 13, 2020 - Scala
Analysis and Visualizations for COVID-19 Bing search engine queries + Classifier pipeline for predicting country based on search query.
machine-learning
natural-language-processing
deep-learning
text-classification
word2vec
word-embeddings
text-processing
pandemic
bing-search
world-health-organization
text-clustering
text-classifier
text-visualization
text-classification-python
coronavirus
covid-19
Updated Oct 24, 2020 - HTML
Buckshot++ is a new algorithm that finds highly stable clusters efficiently.
Updated Jan 24, 2019 - Jupyter Notebook


It would be great to have more friendly and funny doctest text content (instead of "Aha", "Text", ...). It's also nicer for users if the docstring examples are all similar.
One idea is to use famous lines spoken by movie superheroes. Here are a few examples: