
Word embeddings

EDS-NLP provides rule-based components only. However, that does not prevent you from using spaCy's machine learning capabilities! You can mix and match machine learning pipelines, trainable or not, with EDS-NLP rule-based components.

In this tutorial, we will explore how you can use static word vectors trained with Gensim within spaCy.

Training the word embeddings themselves, however, is outside the scope of this tutorial. Gensim's documentation offers very well designed resources on the subject.

Using Transformer models

spaCy v3 introduced support for Transformer models through its helper library, spacy-transformers, which interfaces with Hugging Face's transformers library.

Transformer models can significantly improve your model's performance, although they are much more computationally expensive to run than static word vectors.

Adding pre-trained word vectors

spaCy provides an init vectors CLI command that takes word vectors exported from Gensim in the word2vec text format and converts them into a spaCy pipeline.

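Using it is straightforward: run python -m spacy init vectors with three positional arguments, namely the language code, the path to the vectors file and the output directory.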

See the spaCy documentation of init vectors for the full list of options.
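spacy init vectors reads a word2vec-style text file: the first line holds the number of vectors and their dimension, and each following line holds a word and its space-separated components. With Gensim, such a file is usually produced by calling save_word2vec_format(path, binary=False) on a model's KeyedVectors (model.wv). The sketch below only makes that layout concrete. It writes and re-reads a tiny file using the standard library alone. The helper names write_word2vec_text and read_word2vec_text and the toy vectors are hypothetical, not part of spaCy, Gensim or EDS-NLP.

```python
import os
import tempfile


def write_word2vec_text(path, vectors):
    """Write {word: [floats]} in the word2vec text layout (hypothetical helper)."""
    dims = {len(v) for v in vectors.values()}
    if len(dims) != 1:
        raise ValueError("all vectors must have the same dimension")
    (dim,) = dims
    with open(path, "w", encoding="utf-8") as f:
        # Header line: number of words, then vector dimension
        f.write(f"{len(vectors)} {dim}\n")
        for word, vec in vectors.items():
            # One word per line, followed by its components
            f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


def read_word2vec_text(path):
    """Parse the same layout back, checking it against the header (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        n_words, dim = map(int, f.readline().split())
        vectors = {}
        for line in f:
            parts = line.split()
            if not parts:
                continue  # tolerate blank lines
            word, *values = parts
            if len(values) != dim:
                raise ValueError(f"wrong dimension for {word!r}")
            vectors[word] = [float(x) for x in values]
    if len(vectors) != n_words:
        raise ValueError("header does not match the number of rows")
    return dim, vectors


# Toy vectors, for illustration only
toy = {"patient": [0.1, -0.2, 0.3], "douleur": [0.0, 0.5, -0.1]}
path = os.path.join(tempfile.mkdtemp(), "vectors.txt")
write_word2vec_text(path, toy)
dim, loaded = read_word2vec_text(path)
print(dim, sorted(loaded))  # prints: 3 ['douleur', 'patient']
```

A file in this layout, for instance one exported by Gensim, is what you would pass as the vectors location to spacy init vectors. The output directory it creates can then be opened with spacy.load.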
