
wordembedding

Here are 36 public repositories matching this topic...

A word embedding is a learned representation of text in which words with similar meanings have similar representations. Word embeddings are a class of techniques where individual words are represented as real-valued vectors in a predefined vector space: each word is mapped to one vector, and the vector values are learned much like the weights of a neural network (a minimal sketch follows below).
  • Updated Jun 25, 2020
  • Jupyter Notebook
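
As a rough illustration of the idea, here is a minimal sketch of learning such vectors with gensim's Word2Vec; the toy corpus and every hyperparameter value are illustrative assumptions, not taken from the repository above.

```python
# Minimal sketch: learn real-valued word vectors from a toy corpus.
# The corpus and hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sleeps", "on", "the", "mat"],
]

# A shallow network maps each word to a 50-dimensional vector; words that
# appear in similar contexts end up with similar vectors (sg=0 selects CBOW).
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=0)

print(model.wv["king"])                      # the learned 50-d vector for "king"
print(model.wv.similarity("king", "queen"))  # cosine similarity of two words
```

On a corpus this small the similarities are essentially noise; the point is only the shape of the technique: one vector per word, learned from context.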

The aim of this project is to discover OOV (out-of-vocabulary) words on Sina Weibo and to understand them using the Word2Vec model. First, word lists were generated from a crawled corpus of Sina Weibo news using mutual information and left/right entropy measures, and OOV words were extracted from those lists with the help of online dictionaries (a sketch of this scoring step follows below). Second, the Weibo corpus containing the OOV words was extracted. Third, a third-party tool was used to segment the corpus into words, and distributed representations were obtained with Word2Vec's CBOW (continuous bag of words) and Skip-Gram models. Fourth, the distributed representations were used to compute words similar to each OOV word in order to achieve a semantic understanding of it. The final model comprehends words with a high accuracy and understands most of the OOV words.
  • Updated Jul 7, 2021
  • Python
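
The first step is the most distinctive part of this pipeline, so here is a hedged sketch of how two-character candidates can be scored with pointwise mutual information (internal cohesion) and left/right neighbor entropy (freedom of context). The toy character stream, the candidate list, and the scoring details are assumptions for illustration, not the repository's actual code.

```python
# Sketch of new-word discovery scoring: a candidate is promising when its
# characters co-occur far more often than chance (high PMI) and it appears
# in many different contexts (high left/right neighbor entropy).
import math
from collections import Counter

corpus = "我爱微博今天微博很火微博上线了大家都很开心微博真好"  # toy character stream

chars = Counter(corpus)
bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
total = len(corpus)

def pmi(cand):
    """Internal cohesion: log p(ab) / (p(a) * p(b))."""
    a, b = cand
    return math.log((bigrams[cand] / total) /
                    ((chars[a] / total) * (chars[b] / total)))

def boundary_entropy(cand):
    """Freedom of use: the smaller of the left- and right-neighbor entropies."""
    left = Counter(corpus[i - 1] for i in range(1, len(corpus) - 1)
                   if corpus[i:i + 2] == cand)
    right = Counter(corpus[i + 2] for i in range(len(corpus) - 2)
                    if corpus[i:i + 2] == cand)
    def h(counts):
        n = sum(counts.values())
        return -sum(c / n * math.log(c / n) for c in counts.values()) if n else 0.0
    return min(h(left), h(right))

for cand in ["微博", "博很"]:
    print(cand, round(pmi(cand), 2), round(boundary_entropy(cand), 2))
# "微博" (Weibo) scores higher on both measures than the spurious "博很",
# so it survives as a candidate; real candidates would then be checked
# against online dictionaries before the Word2Vec stage.
```

In the full pipeline, the corpus around the surviving OOV words would then be segmented and fed to the CBOW and Skip-Gram models, and a lookup such as gensim's model.wv.most_similar would surface the semantically related words used to interpret each OOV term.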
