Trainable components overview
In addition to its rule-based pipeline components, EDS-NLP offers trainable pipeline components that fit and run machine learning models for classic biomedical information extraction tasks.
Available components:
Name | Description |
---|---|
eds.nested_ner | Recognize overlapping or nested entities (replaces spaCy's ner component) |
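To make "overlapping or nested" concrete, here is a minimal sketch that represents entities as labelled character spans and checks how two of them relate. The Entity class, the two helper functions, the example text and its labels are all hypothetical illustrations and are not part of EDS-NLP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A labelled character span (hypothetical helper, not an EDS-NLP class)."""

    start: int  # inclusive character offset
    end: int  # exclusive character offset
    label: str


def overlaps(a: Entity, b: Entity) -> bool:
    """Return True when the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def is_nested(inner: Entity, outer: Entity) -> bool:
    """Return True when `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


# Illustrative text and annotations, chosen only for this sketch
text = "metastatic lung cancer"
organ = Entity(text.index("lung"), text.index("lung") + len("lung"), "anatomy")
disease = Entity(text.index("lung"), len(text), "disease")

print(overlaps(organ, disease), is_nested(organ, disease))
```

A flat NER scheme that assigns one label per token cannot keep both annotations, since "lung" belongs to two entities at once; this is the case a nested NER component is meant to handle.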
Writing custom models
spaCy models can be written with Thinc (spaCy's deep learning library), TensorFlow or PyTorch. As PyTorch is predominant in the NLP research field, we recommend writing models in PyTorch to facilitate interactions with the NLP community. To this end, we have written some PyTorch wrapping utilities, such as wrap_pytorch_model, that allow the loss and the predictions to be computed directly in the PyTorch module.
Utils
Training
In addition to the spaCy train CLI, EDS-NLP offers a train function that can be called directly in Python with an existing spaCy pipeline.
Experimental
This training API is an experimental feature of edsnlp and could change at any time.
Usage
Let us define and train a full pipeline:
from pathlib import Path
import spacy
from edsnlp.connectors.brat import BratConnector
from edsnlp.utils.training import train, make_spacy_corpus_config
tmp_path = Path("/tmp/test-train")
nlp = spacy.blank("eds")
nlp.add_pipe("nested_ner")
# Train the model, with additional training configuration
nlp = train(
    nlp,
    output_path=tmp_path / "model",
    config=dict(
        **make_spacy_corpus_config(
            train_data="/path/to/the/training/set/brat/files",
            dev_data="/path/to/the/dev/set/brat/files",
            nlp=nlp,
            data_format="brat",
        ),
        training=dict(
            max_steps=4000,
        ),
    ),
)
# Finally, we can run the pipeline on a new document
doc = nlp("Arret du folfox si inefficace")
doc.spans["drug"]
# Out: [folfox]
doc.spans["criteria"]
# Out: [si inefficace]
# And export new predictions as Brat annotations
predicted_docs = BratConnector("/path/to/the/new/files", run_pipe=True).brat2docs(nlp)
BratConnector("/path/to/predictions").docs2brat(predicted_docs)
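For orientation, Brat stores annotations in standoff .ann files next to each .txt file, with one text-bound annotation per line: an ID, a tab, the label followed by the character offsets, a tab, and the covered text. The sketch below formats such lines by hand. The to_ann_lines function is a hypothetical helper written for illustration, not the code that BratConnector runs, and the span is an example value.

```python
from typing import List, Tuple


def to_ann_lines(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Format (start, end, label) character spans as Brat text-bound lines.

    Hypothetical helper for illustration; not part of EDS-NLP.
    """
    lines = []
    for i, (start, end, label) in enumerate(spans, start=1):
        # Brat text-bound annotation: "T<id>\t<label> <start> <end>\t<text>"
        lines.append(f"T{i}\t{label} {start} {end}\t{text[start:end]}")
    return lines


text = "Arret du folfox si inefficace"
start = text.index("folfox")
ann = to_ann_lines(text, [(start, start + len("folfox"), "drug")])
print("\n".join(ann))
```

Offsets are character positions in the .txt file, with the end offset exclusive, so the covered text can be checked against text[start:end].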