Word embeddings
EDS-NLP provides rule-based components exclusively. However, that does not prevent you from exploiting spaCy's machine learning capabilities! You can mix and match machine learning pipelines, trainable or not, with EDS-NLP rule-based components.
In this tutorial, we will explore how you can use static word vectors trained with Gensim within spaCy.
Training the word embeddings, however, is outside the scope of this tutorial. You'll find well-designed resources on the subject in Gensim's documentation.
Using Transformer models
spaCy v3 introduced support for Transformer models through its helper library spacy-transformers, which interfaces with HuggingFace's transformers library.
Using transformer models can significantly increase your model's performance.
Adding pre-trained word vectors
spaCy provides an init vectors CLI utility that takes Gensim-trained word vectors and converts them into a spaCy-readable pipeline.
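Using it is straightforward: run spacy init vectors with the language code, the path to the vectors file, and the output directory for the pipeline.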
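As a minimal sketch, the snippet below writes a toy vectors file and assembles the corresponding command. The file names, the fr language code, and the two toy vectors are placeholders chosen for illustration, not values from this tutorial. It also assumes the vectors are stored in the word2vec text format (a header line with the number of words and dimensions, then one word per line), which Gensim can write with KeyedVectors.save_word2vec_format(path, binary=False). The conversion itself only runs if spaCy is installed.

```python
import importlib.util
import subprocess
import sys
from pathlib import Path

# Hypothetical paths and toy data, used for illustration only.
vectors_path = Path("vectors.txt")
output_dir = Path("pipeline")
toy_vectors = {
    "patient": [0.1, 0.2, 0.3],
    "fièvre": [0.4, 0.5, 0.6],
}

# word2vec text format: a "<n_words> <n_dims>" header, then one word per line.
lines = [f"{len(toy_vectors)} 3"]
lines += [f"{word} " + " ".join(map(str, vec)) for word, vec in toy_vectors.items()]
vectors_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

# Equivalent shell command:
#   python -m spacy init vectors fr vectors.txt pipeline
command = [
    sys.executable, "-m", "spacy", "init", "vectors",
    "fr", str(vectors_path), str(output_dir),
]

# Only attempt the conversion when spaCy is available in this environment.
if importlib.util.find_spec("spacy") is not None:
    result = subprocess.run(command)
    print("spacy init vectors exited with code", result.returncode)
else:
    print("spaCy is not installed; command not run:", " ".join(command))
```

If the command succeeds, the output directory can be loaded like any other pipeline with spacy.load, and EDS-NLP components can then be added to it.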
See the documentation for implementation details.