
Getting started

EDS-NLP is a collaborative NLP framework that aims to extract information from French clinical notes. At its core, it is a collection of components, or pipes, which are either rule-based functions or deep learning modules. These components are organized into a novel, efficient and modular pipeline system, built for hybrid and multitask models. We use spaCy to represent documents and their annotations, and PyTorch as a deep-learning backend for trainable components.

EDS-NLP is versatile and can be used on any textual document. The rule-based components are fully compatible with spaCy's pipelines, and vice versa. This library is a product of collaborative effort, and we encourage further contributions to enhance its capabilities.

Check out our interactive demo!

Quick start

Installation

You can install EDS-NLP via pip. We recommend pinning the library version in your projects, or using a strict package manager like Poetry.

pip install edsnlp==0.14.0

or, if you want to use the trainable components (which rely on PyTorch):

pip install "edsnlp[ml]==0.14.0"
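To confirm that the environment actually matches the pinned version, a small check along the following lines can help. This is a minimal sketch using only the Python standard library; `check_pin` is a hypothetical helper name introduced here for illustration, not part of EDS-NLP.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pin(package, expected):
    """Return True if `package` is installed at exactly version `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed in the current environment
        return False


# Compare the installed edsnlp distribution with the version pinned above
if check_pin("edsnlp", "0.14.0"):
    print("edsnlp matches the pinned version")
else:
    print("edsnlp is missing or differs from the pinned version")
```

Running such a check at the start of a project script is one way to notice early that an environment has drifted from the pin.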

A first pipeline

Once you've installed the library, let's begin with a very simple example that extracts mentions of COVID19 in a text, and detects whether they are negated.

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")  # (1)

terms = dict(
    covid=["covid", "coronavirus"],  # (2)
)

# Sentencizer component, needed for negation detection
nlp.add_pipe(eds.sentences())  # (3)
# Matcher component
nlp.add_pipe(eds.matcher(terms=terms))  # (4)
# Negation detection
nlp.add_pipe(eds.negation())

# Process your text in one call!
doc = nlp("Le patient n'est pas atteint de covid")

doc.ents  # (5)
# Out: (covid,)

doc.ents[0]._.negation  # (6)
# Out: True
  1. 'eds' is the name of the language, which defines the tokenizer.
  2. This example terminology provides a very simple, and by no means exhaustive, list of synonyms for COVID19.
  3. Similarly to spaCy, pipes are added via the nlp.add_pipe method.
  4. See the matching tutorial for more details.
  5. spaCy stores extracted entities in the Doc.ents attribute.
  6. The eds.negation component adds a negation custom attribute.

This example is complete and should run as-is.
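When a text contains several mentions, it is often more convenient to collect every entity together with its negation status. The sketch below wraps the pipeline from the example above in a function and adds a small helper that does this. `build_covid_pipeline` and `summarize_entities` are hypothetical names chosen for illustration, not part of EDS-NLP; the pipeline calls themselves are the ones shown above. The import is placed inside the builder only so that the helper can be reused without loading the library.

```python
def build_covid_pipeline():
    """Rebuild the pipeline from the example above (requires edsnlp)."""
    import edsnlp
    import edsnlp.pipes as eds

    nlp = edsnlp.blank("eds")
    # Sentence boundaries are needed by the negation component
    nlp.add_pipe(eds.sentences())
    # Same toy terminology as above
    nlp.add_pipe(eds.matcher(terms=dict(covid=["covid", "coronavirus"])))
    nlp.add_pipe(eds.negation())
    return nlp


def summarize_entities(doc):
    """Return one (text, label, negated) tuple per entity in doc.ents."""
    return [(ent.text, ent.label_, ent._.negation) for ent in doc.ents]
```

Applied to the sentence used above, `summarize_entities(build_covid_pipeline()("Le patient n'est pas atteint de covid"))` should list a single covid entity flagged as negated, in line with the outputs shown earlier.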

Tutorials

To help you learn more about EDS-NLP, we have prepared a series of tutorials covering the main features of the library.

Available pipeline components

See the Core components overview for more information.

Component Description
eds.normalizer Non-destructive input text normalisation
eds.sentences Better sentence boundary detection
eds.matcher A simple yet powerful entity extractor
eds.terminology A simple yet powerful terminology matcher
eds.contextual_matcher A conditional entity extractor
eds.endlines An unsupervised model to classify each end of line

See the Qualifiers overview for more information.

Component Description
eds.negation Rule-based negation detection
eds.family Rule-based family context detection
eds.hypothesis Rule-based speculation detection
eds.reported_speech Rule-based reported speech detection
eds.history Rule-based medical history detection
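
Qualifiers can be stacked in a single pipeline, following the same pattern as the negation example. The sketch below assumes that each qualifier can be instantiated with its default arguments and that it stores its result in a custom attribute named after the component (as eds.negation does with `negation`); check the Qualifiers overview before relying on this. `build_qualifier_pipeline`, `qualifier_flags` and `QUALIFIERS` are hypothetical names used only for illustration.

```python
# Assumed attribute names, mirroring the component names listed above
QUALIFIERS = ("negation", "family", "hypothesis", "reported_speech", "history")


def build_qualifier_pipeline(terms):
    """Build a matcher pipeline followed by all five qualifiers (requires edsnlp)."""
    import edsnlp
    import edsnlp.pipes as eds

    nlp = edsnlp.blank("eds")
    nlp.add_pipe(eds.sentences())
    nlp.add_pipe(eds.matcher(terms=terms))
    nlp.add_pipe(eds.negation())
    nlp.add_pipe(eds.family())
    nlp.add_pipe(eds.hypothesis())
    nlp.add_pipe(eds.reported_speech())
    nlp.add_pipe(eds.history())
    return nlp


def qualifier_flags(ent, names=QUALIFIERS):
    """Map each qualifier name to its value on `ent` (None if it is not set)."""
    return {name: getattr(ent._, name, None) for name in names}
```

Calling `qualifier_flags(ent)` on each entity of a processed document gives a dictionary with one entry per qualifier, which is convenient when exporting results to a table.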

See the Miscellaneous components overview for more information.

Component Description
eds.dates Date extraction and normalisation
eds.consultation_dates Identify consultation dates
eds.quantities Quantity extraction and normalisation
eds.sections Section detection
eds.reason Rule-based hospitalisation reason detection
eds.tables Tables detection
eds.split Doc splitting

See the NER overview for more information.

Component Description
eds.covid A COVID mentions detector
eds.charlson A Charlson score extractor
eds.sofa A SOFA score extractor
eds.elston_ellis An Elston & Ellis code extractor
eds.emergency_priority A priority score extractor
eds.emergency_ccmu A CCMU score extractor
eds.emergency_gemsa A GEMSA score extractor
eds.tnm A TNM score extractor
eds.adicap An ADICAP code extractor
eds.drugs A drug mentions extractor
eds.cim10 A CIM10 terminology matcher
eds.umls A UMLS terminology matcher
eds.ckd CKD extractor
eds.copd COPD extractor
eds.cerebrovascular_accident Cerebrovascular accident extractor
eds.congestive_heart_failure Congestive heart failure extractor
eds.connective_tissue_disease Connective tissue disease extractor
eds.dementia Dementia extractor
eds.diabetes Diabetes extractor
eds.hemiplegia Hemiplegia extractor
eds.leukemia Leukemia extractor
eds.liver_disease Liver disease extractor
eds.lymphoma Lymphoma extractor
eds.myocardial_infarction Myocardial infarction extractor
eds.peptic_ulcer_disease Peptic ulcer disease extractor
eds.peripheral_vascular_disease Peripheral vascular disease extractor
eds.solid_tumor Solid tumor extractor
eds.alcohol Alcohol consumption extractor
eds.tobacco Tobacco consumption extractor

See the Trainable components overview for more information.

Name Description
eds.transformer Embed text with a transformer model
eds.text_cnn Contextualize embeddings with a CNN
eds.span_pooler A span embedding component that aggregates word embeddings
eds.ner_crf A trainable component to extract entities
eds.span_classifier A trainable component for multi-class multi-label span classification
eds.span_linker A trainable entity linker (i.e. to a list of concepts)

Disclaimer

The performance of an extraction pipeline may depend on the population and the documents considered.

Contributing to EDS-NLP

We welcome contributions! Fork the project and propose a pull request. Take a look at the dedicated page for details.

Citation

If you use EDS-NLP, please cite us as below.

@misc{edsnlp,
  author = {Wajsburt, Perceval and Petit-Jean, Thomas and Dura, Basile and Cohen, Ariel and Jean, Charline and Bey, Romain},
  doi    = {10.5281/zenodo.6424993},
  title  = {EDS-NLP: efficient information extraction from French clinical notes},
  url    = {https://aphp.github.io/edsnlp}
}