Word2Vec is a method for embedding words into a low-dimensional vector space, proposed by Tomas Mikolov and colleagues in 2013. Its goal is to use a shallow neural network to represent words as real-valued vectors so that words with similar meanings lie close to one another in the vector space. Word2Vec comes in two main model architectures: the continuous bag-of-words model (CBOW), which predicts a target word from its surrounding context words, and the skip-gram model, which predicts the surrounding context words from a target word.
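To make this concrete, here is a minimal sketch of training both architectures using the gensim library (a third-party choice, not something prescribed by the original papers); the toy corpus, hyperparameter values, and the probe word are illustrative assumptions.

```python
# A minimal sketch of training Word2Vec with gensim (assumes gensim 4.x;
# the toy corpus and hyperparameters below are illustrative only).
from gensim.models import Word2Vec

# Tiny tokenized corpus: each sentence is a list of word tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "animals"],
]

# sg=1 selects the skip-gram objective; sg=0 (the default) selects CBOW.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Each word in the vocabulary is now a dense real-valued vector.
vec = skipgram.wv["cat"]
print(vec.shape)  # (50,)

# Semantically similar words should end up close in the vector space
# (by cosine similarity); on a toy corpus the neighbors are noisy.
print(skipgram.wv.most_similar("cat", topn=3))
```

In practice, the choice between the two variants matters: skip-gram tends to work better with small corpora and rare words, while CBOW trains faster and does well on frequent words.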
1. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.
2. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems (NIPS).