Word embedding
Embedding:
An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words.
Why use an embedding layer?
Vectors produced by one-hot encoding are very high-dimensional and sparse. Suppose we are working with a vocabulary of 2,000 words in natural language processing (NLP). With one-hot encoding, each word is represented by a vector of 2,000 integers, 1,999 of which are zeros. The larger the vocabulary, the less computationally efficient this representation becomes.
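As a concrete illustration, here is a minimal sketch of one-hot encoding with NumPy; the 2,000-word vocabulary comes from the example above, while the helper name and the sample word index are assumptions for illustration only.

```python
import numpy as np

VOCAB_SIZE = 2000  # vocabulary size from the example above

def one_hot(word_index: int, vocab_size: int = VOCAB_SIZE) -> np.ndarray:
    """Return a vector of `vocab_size` integers with a single 1 at `word_index`."""
    vec = np.zeros(vocab_size, dtype=np.int64)
    vec[word_index] = 1
    return vec

v = one_hot(42)      # hypothetical word index, purely illustrative
print(v.shape)       # (2000,)
print(int(v.sum()))  # 1 -> the remaining 1,999 entries are zeros
```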
Word embeddings can be thought of as an alternative to one-hot encoding combined with dimensionality reduction.
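As a rough sketch of what an embedding layer does, the snippet below uses Keras's Embedding layer to map integer word indices into a dense, low-dimensional space. The 2,000-word vocabulary comes from the example above; the 64-dimensional output and the sample indices are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# Embedding layer: maps each of 2000 possible word indices to a dense 64-dimensional vector.
embedding = tf.keras.layers.Embedding(input_dim=2000, output_dim=64)

word_indices = np.array([[12, 7, 1999]])  # a batch with one 3-word sequence (illustrative indices)
dense_vectors = embedding(word_indices)

print(dense_vectors.shape)  # (1, 3, 64): each sparse index becomes a dense 64-dim vector
```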
Word Embeddings
A word embedding is an approach for representing words and documents. A word embedding, or word vector, is a numeric vector that represents a word in a lower-dimensional space. It allows words with similar meanings to have similar representations, so the vectors can approximate meaning.
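To make "similar meaning, similar representation" concrete, here is a small sketch comparing word vectors with cosine similarity; the three-dimensional vectors are invented for illustration and do not come from any trained model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; values near 1 mean similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional embeddings (purely illustrative values).
king  = np.array([0.8, 0.65, 0.1])
queen = np.array([0.75, 0.7, 0.15])
apple = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(king, queen))  # high: related words point in similar directions
print(cosine_similarity(king, apple))  # lower: unrelated words point in different directions
```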
Goals of Word Embeddings
- To reduce dimensionality
- To use a word to predict the words around it (see the skip-gram sketch after this list)
- To capture inter-word semantics
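Using a word to predict the words around it is what the skip-gram variant of word2vec optimizes. Below is a minimal sketch using gensim's Word2Vec; the tiny corpus and all hyperparameters (vector size, window, epochs) are illustrative assumptions, not recommended settings.

```python
from gensim.models import Word2Vec

# A tiny, illustrative corpus: each sentence is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

# sg=1 selects the skip-gram objective: predict surrounding words from the center word.
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1, epochs=50)

print(model.wv["king"].shape)                  # (50,): a dense 50-dimensional word vector
print(model.wv.most_similar("king", topn=3))   # nearest neighbours in embedding space
```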
How are Word Embeddings used?
- They are used as input to machine learning models.
- Take the words -> convert them to their numeric representations -> use them in training or inference.
- To represent or visualize underlying patterns of usage in the corpus used to train them (see the PCA sketch after this list).
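One common way to visualize usage patterns is to project the vectors down to two dimensions. The sketch below does this with PCA and matplotlib; it assumes the gensim model from the earlier sketch (or any word-to-vector mapping) plus scikit-learn, and the chosen words are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# `model` is assumed to be the gensim Word2Vec model trained in the earlier sketch.
words = ["king", "queen", "kingdom", "cat", "mat"]
vectors = [model.wv[w] for w in words]

# Reduce the 50-dimensional embeddings to 2 dimensions for plotting.
points = PCA(n_components=2).fit_transform(vectors)

for (x, y), word in zip(points, words):
    plt.scatter(x, y)
    plt.annotate(word, (x, y))
plt.title("Word embeddings projected to 2D with PCA")
plt.show()
```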