Pipes overview

EDS-NLP provides easy-to-use pipeline components (aka pipes).

Available components

See the Core components overview for more information.

Component Description
eds.normalizer Non-destructive input text normalisation
eds.sentences Better sentence boundary detection
eds.matcher A simple yet powerful entity extractor
eds.terminology A simple yet powerful terminology matcher
eds.contextual_matcher A conditional entity extractor
eds.endlines An unsupervised model to classify each end of line

See the Qualifiers overview for more information.

Component Description
eds.negation Rule-based negation detection
eds.family Rule-based family context detection
eds.hypothesis Rule-based speculation detection
eds.reported_speech Rule-based reported speech detection
eds.history Rule-based medical history detection

See the Miscellaneous components overview for more information.

Component Description
eds.dates Date extraction and normalisation
eds.consultation_dates Identify consultation dates
eds.measurements Measurement extraction and normalisation
eds.sections Section detection
eds.reason Rule-based hospitalisation reason detection
eds.tables Table detection

See the NER overview for more information.

Component Description
eds.covid A COVID mentions detector
eds.charlson A Charlson score extractor
eds.sofa A SOFA score extractor
eds.elston_ellis An Elston & Ellis code extractor
eds.emergency_priority A priority score extractor
eds.emergency_ccmu A CCMU score extractor
eds.emergency_gemsa A GEMSA score extractor
eds.tnm A TNM score extractor
eds.adicap An ADICAP code extractor
eds.drugs A drug mentions extractor
eds.cim10 A CIM10 terminology matcher
eds.umls A UMLS terminology matcher
eds.ckd CKD extractor
eds.copd COPD extractor
eds.cerebrovascular_accident Cerebrovascular accident extractor
eds.congestive_heart_failure Congestive heart failure extractor
eds.connective_tissue_disease Connective tissue disease extractor
eds.dementia Dementia extractor
eds.diabetes Diabetes extractor
eds.hemiplegia Hemiplegia extractor
eds.leukemia Leukemia extractor
eds.liver_disease Liver disease extractor
eds.lymphoma Lymphoma extractor
eds.myocardial_infarction Myocardial infarction extractor
eds.peptic_ulcer_disease Peptic ulcer disease extractor
eds.peripheral_vascular_disease Peripheral vascular disease extractor
eds.solid_tumor Solid tumor extractor
eds.alcohol Alcohol consumption extractor
eds.tobacco Tobacco consumption extractor

See the Trainable components overview for more information.

Name Description
eds.transformer Embed text with a transformer model
eds.text_cnn Contextualize embeddings with a CNN
eds.span_pooler A span embedding component that aggregates word embeddings
eds.ner_crf A trainable component to extract entities
eds.span_qualifier A trainable component for multi-class multi-label span qualification

You can add them to your pipeline by simply calling add_pipe, for instance:

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.tnm")

Basic architecture

Most components provided by EDS-NLP aim to qualify pre-extracted entities. A typical pipeline is therefore built in three steps:

  1. Add a normaliser (see eds.normalizer)
  2. Add an entity recognition component (e.g. the simple but powerful eds.matcher component)
  3. Add zero or more entity qualification components, such as eds.negation, eds.family or eds.hypothesis. These qualifiers typically help detect false positives.

Extraction components

Extraction components (matchers, the date detector or NER components, for instance) write their results directly to the doc.ents and doc.spans attributes.

By default, some components, such as the eds.sections matcher, do not write their output to doc.ents. The reason is that doc.ents cannot contain overlapping entities, so overlapping spans are filtered and only the largest one is kept. Since sections usually cover large spans of text, storing them in doc.ents would remove every entity they overlap.
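
To see why a section span would crowd out other entities, here is a standalone sketch of a "keep the largest span" filter. It is written in plain Python on `(start, end, label)` tuples for illustration only; the function name `keep_largest` and the sample offsets are hypothetical, and this is not the library's own filtering code.

```python
def keep_largest(spans):
    """Greedy filter: visit the longest spans first and drop any span
    that overlaps one already kept."""
    kept = []
    # Negative length sorts longest first; start offset breaks ties.
    for start, end, label in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


# Hypothetical token offsets: one large section and two entities inside it
spans = [
    (0, 40, "section"),
    (3, 5, "covid"),
    (12, 14, "diabetes"),
]

filtered = keep_largest(spans)
without_section = keep_largest(spans[1:])

print(filtered)
print(without_section)
```

With the section included, it is the only span that survives; without it, both entities are kept. Storing sections outside doc.ents avoids this loss.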

Entity tagging

Moreover, most components declare extensions on the Doc, Span and/or Token objects.

These extensions are especially useful for qualifier components, but can also be used by other components to persist relevant information. For instance, the eds.dates component declares a span._.date extension to store a normalised version of each detected date.