Trainable components overview

In addition to its rule-based pipeline components, EDS-NLP offers new trainable pipelines to fit and run machine learning models for classic biomedical information extraction tasks.

Available components:

| Name | Description |
|------|-------------|
| eds.nested_ner | Recognize overlapping or nested entities (replaces spaCy's ner component) |

Writing custom models

spaCy models can be written with Thinc (spaCy's deep learning library), TensorFlow or PyTorch. As PyTorch is predominant in NLP research, we recommend writing models with PyTorch to make it easier to share work with the NLP community. To this end, we have written some PyTorch wrapping utilities, such as wrap_pytorch_model, that allow the loss and the predictions to be computed directly in the PyTorch module.
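To illustrate what "computing the loss and the predictions directly in the PyTorch module" can look like, here is a minimal sketch in plain PyTorch. The class name TokenTagger, its arguments and the dictionary it returns are hypothetical and chosen for illustration only. The sketch deliberately does not call wrap_pytorch_model, whose exact signature is not described on this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenTagger(nn.Module):
    """Hypothetical tagger whose forward pass returns both loss and predictions."""

    def __init__(self, hidden_size: int, n_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, n_labels)

    def forward(self, embeddings, labels=None):
        # embeddings: (n_tokens, hidden_size); labels: (n_tokens,) or None
        logits = self.classifier(embeddings)
        predictions = logits.argmax(dim=-1)
        # The loss is only computed when gold labels are provided (training)
        loss = F.cross_entropy(logits, labels) if labels is not None else None
        return {"loss": loss, "predictions": predictions}


torch.manual_seed(0)
model = TokenTagger(hidden_size=8, n_labels=3)
embeddings = torch.randn(5, 8)  # stand-in for token embeddings
labels = torch.tensor([0, 1, 2, 1, 0])

# Training step: the module itself returns the loss to backpropagate
train_out = model(embeddings, labels)
train_out["loss"].backward()

# Inference: no labels, so only predictions are produced
with torch.no_grad():
    infer_out = model(embeddings)
```

Keeping the loss inside the module means the surrounding wrapper only has to forward inputs and read back the "loss" and "predictions" entries, which is the kind of interaction the wrapping utilities are meant to support.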

Utils

Training

In addition to the spaCy train CLI, EDS-NLP offers a train function that can be called in Python directly with an existing spaCy pipeline.

Experimental

This training API is an experimental feature of edsnlp and could change at any time.

Usage

Let us define and train a full pipeline:

from pathlib import Path

import spacy

from edsnlp.connectors.brat import BratConnector
from edsnlp.utils.training import train, make_spacy_corpus_config

tmp_path = Path("/tmp/test-train")

nlp = spacy.blank("eds")
nlp.add_pipe("nested_ner")  # (1)

# Train the model, with additional training configuration
nlp = train(
    nlp,
    output_path=tmp_path / "model",
    config=dict(
        **make_spacy_corpus_config(
            train_data="/path/to/the/training/set/brat/files",
            dev_data="/path/to/the/dev/set/brat/files",
            nlp=nlp,
            data_format="brat",
        ),
        training=dict(
            max_steps=4000,
        ),
    ),
)

# Finally, we can run the pipeline on a new document
doc = nlp("Arret du folfox si inefficace")
doc.spans["drug"]
# Out: [folfox]

doc.spans["criteria"]
# Out: [si inefficace]

# And export new predictions as Brat annotations
predicted_docs = BratConnector("/path/to/the/new/files", run_pipe=True).brat2docs(nlp)
BratConnector("/path/to/predictions").docs2brat(predicted_docs)
  1. You can configure the component through the config argument of add_pipe, as in add_pipe(..., config=...).
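As a sketch of the add_pipe(..., config=...) mechanism mentioned in the annotation, the snippet below configures spaCy's built-in sentencizer component. The sentencizer and its punct_chars setting are used only because their options are known. The configuration keys accepted by nested_ner are not listed on this page, so none are assumed here.

```python
import spacy

# A blank English pipeline is enough: no trained model is required
nlp = spacy.blank("en")

# Settings passed through `config` override the component's defaults
nlp.add_pipe("sentencizer", config={"punct_chars": ["."]})

# The resolved configuration can be inspected afterwards
sentencizer_config = nlp.get_pipe_config("sentencizer")

doc = nlp("Hello there. How are you? Fine")
sentences = [sent.text for sent in doc.sents]
```

With only "." declared as a sentence boundary, the question mark no longer ends a sentence, so the text is split into two sentences instead of three.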