Sunday, August 7, 2022

Colab Word2Vec Using Emoji2Vec dataset

.

Import the required libraries:

import gensim
from gensim.models import word2vec
from gensim.models import KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity

.

Upload the word2vec-format emoji2vec dataset (emoji2vec.bin) into the /content folder:
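If you prefer the Colab file picker for this step, a minimal sketch (assuming you are running inside Google Colab, where uploaded files land in /content) is:

from google.colab import files
uploaded = files.upload()    # choose emoji2vec.bin in the dialog; it is saved under /content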

.

Alternatively, get from Archive.org:

!wget 'https://archive.org/download/word-embeddings/emoji2vec.bin' -P '/content'
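To confirm the file landed where the loading step below expects it, a quick check (path taken from the wget command above):

import os
print(os.path.exists('/content/emoji2vec.bin'))    # should print True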

.

Load the Emoji2Vec model:

import gensim.models as gsm
e2v = gsm.KeyedVectors.load_word2vec_format('/content/emoji2vec.bin', binary=True)
happy_vector = e2v['😂']    # Produces an embedding vector of length 300
print(happy_vector.shape)

>>>>(300,)
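Since the file is loaded as an ordinary gensim KeyedVectors object, the usual query helpers work on it as well. A small sketch (assuming gensim 4.x; the emoji used here are just examples):

print(len(e2v.key_to_index))             # number of emoji in the vocabulary
print(e2v.most_similar('😂', topn=3))    # three nearest emoji by cosine similarity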

.

Find the cosine similarity between two emoji embeddings:

v_king = e2v["🤴"]
v_queen = e2v["👸"]
print(v_king.shape)
print(v_queen.shape)
cosine_similarity([v_king],[v_queen])

>>>>(300,)

(300,)

array([[0.48802766]], dtype=float32)
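As a cross-check, gensim's KeyedVectors also provides a similarity() helper that computes the same cosine similarity directly from the loaded model:

print(e2v.similarity("🤴", "👸"))    # should match the value above (≈ 0.488)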

.

Reference:

https://github.com/uclnlp/emoji2vec


