Evaluating Word Similarity Measure of Embeddings Through Binary Classification

October 30, 2022

DOI: https://doi.org/10.30564/jcsr.v1i3.1268

Abstract

We consider the following problem: given several neural language models (embeddings), each trained on an unknown data set, how can we determine which model would give better results when used for feature representation in a downstream task such as text classification or entity recognition? In this paper, we assess the word similarity measure by applying it to word embeddings learned from various datasets and comparing the results with how those embeddings perform in a simple classification task. All word representations were learned and assessed under the same conditions. To train the word vectors, we used the implementation of Continuous Bag of Words described in [1]. To assess the quality of the vectors, we applied the analogy questions test for word similarity described in the same paper. Further, to measure the retrieval rate of an embedding model, we introduced a new metric, Average Retrieval Error, which measures the percentage of words missing from the model. We observe that high accuracy on syntactic and semantic similarities between word pairs is not an indicator of better classification results. This observation can be explained by the fact that a domain-specific corpus contributes more to classification performance than a general-purpose corpus. For reproducibility, we release our experiment scripts and results.
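The abstract defines Average Retrieval Error only as the percentage of words missing from the model, so the following is a minimal sketch under stated assumptions rather than the paper's implementation. It assumes the error is computed per document as the share of tokens absent from the embedding vocabulary and then averaged over documents. The function names, the toy vocabulary, and the toy documents are hypothetical.

```python
from typing import Iterable, Sequence, Set


def retrieval_error(tokens: Sequence[str], vocabulary: Set[str]) -> float:
    """Percentage of tokens in one document that the vocabulary lacks."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return 100.0 * missing / len(tokens)


def average_retrieval_error(
    documents: Iterable[Sequence[str]], vocabulary: Set[str]
) -> float:
    """Mean of the per-document errors (assumed averaging scheme)."""
    errors = [retrieval_error(doc, vocabulary) for doc in documents]
    return sum(errors) / len(errors) if errors else 0.0


# Hypothetical embedding vocabulary and tokenized documents.
vocabulary = {"the", "movie", "was", "great"}
documents = [
    ["the", "movie", "was", "great"],   # 0 of 4 tokens missing
    ["the", "plot", "was", "tedious"],  # 2 of 4 tokens missing
]

print(average_retrieval_error(documents, vocabulary))  # 25.0
```

Under this reading, a model trained on a domain-specific corpus would tend to score a lower error on in-domain documents, because fewer of their words fall outside its vocabulary.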

Bilingual Publishing Group
