Pipes overview

EDS-NLP provides easy-to-use pipeline components (aka pipes).

Available components

Components are grouped into six categories: Core, Qualifiers, Miscellaneous, NER, Trainable and LLM-based.

See the Core components overview for more information.

Component	Description
`eds.normalizer`	Non-destructive input text normalisation
`eds.sentences`	Better sentence boundary detection
`eds.matcher`	A simple yet powerful entity extractor
`eds.terminology`	A simple yet powerful terminology matcher
`eds.contextual_matcher`	A conditional entity extractor
`eds.endlines`	An unsupervised model to classify each end of line

See the Qualifiers overview for more information.

Component	Description
`eds.negation`	Rule-based negation detection
`eds.family`	Rule-based family context detection
`eds.hypothesis`	Rule-based speculation detection
`eds.reported_speech`	Rule-based reported speech detection
`eds.history`	Rule-based medical history detection

See the Miscellaneous components overview for more information.

Component	Description
`eds.dates`	Date extraction and normalisation
`eds.consultation_dates`	Identify consultation dates
`eds.quantities`	Quantity extraction and normalisation
`eds.sections`	Section detection
`eds.reason`	Rule-based hospitalisation reason detection
`eds.tables`	Table detection
`eds.split`	Doc splitting
`eds.explode`	Explode entities between multiple copies of a document

See the NER overview for more information.

Component	Description
`eds.covid`	A COVID mentions detector
`eds.charlson`	A Charlson score extractor
`eds.sofa`	A SOFA score extractor
`eds.elston_ellis`	An Elston & Ellis code extractor
`eds.emergency_priority`	A priority score extractor
`eds.emergency_ccmu`	A CCMU score extractor
`eds.emergency_gemsa`	A GEMSA score extractor
`eds.tnm`	A TNM score extractor
`eds.adicap`	An ADICAP code extractor
`eds.drugs`	A drug mentions extractor
`eds.cim10`	A CIM10 terminology matcher
`eds.umls`	A UMLS terminology matcher
`eds.ckd`	CKD extractor
`eds.copd`	COPD extractor
`eds.cerebrovascular_accident`	Cerebrovascular accident extractor
`eds.congestive_heart_failure`	Congestive heart failure extractor
`eds.connective_tissue_disease`	Connective tissue disease extractor
`eds.dementia`	Dementia extractor
`eds.diabetes`	Diabetes extractor
`eds.hemiplegia`	Hemiplegia extractor
`eds.leukemia`	Leukemia extractor
`eds.liver_disease`	Liver disease extractor
`eds.lymphoma`	Lymphoma extractor
`eds.myocardial_infarction`	Myocardial infarction extractor
`eds.peptic_ulcer_disease`	Peptic ulcer disease extractor
`eds.peripheral_vascular_disease`	Peripheral vascular disease extractor
`eds.solid_tumor`	Solid tumor extractor
`eds.alcohol`	Alcohol consumption extractor
`eds.tobacco`	Tobacco consumption extractor

See the Trainable components overview for more information.

Component	Description
`eds.transformer`	Embed text with a transformer model
`eds.text_cnn`	Contextualize embeddings with a CNN
`eds.span_pooler`	A span embedding component that aggregates word embeddings
`eds.ner_crf`	A trainable component to extract entities
`eds.extractive_qa`	A trainable component for extractive question answering
`eds.span_classifier`	A trainable component for multi-class multi-label span classification
`eds.span_linker`	A trainable entity linker (i.e. to a list of concepts)
`eds.biaffine_dep_parser`	A trainable biaffine dependency parser

See the LLM-based components overview for more information.

Component	Description
`eds.llm_markup_extractor`	Extract structured information using LLMs through markup
`eds.llm_span_qualifier`	Predict attributes of spans using LLMs

You can add them to your pipeline by simply calling add_pipe, for instance:

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.tnm())

Basic architecture

Most components provided by EDS-NLP aim to qualify pre-extracted entities. A typical pipeline is therefore built in three steps:

1. Add a normaliser (see eds.normalizer).
2. Add an entity recognition component (e.g. the simple but powerful eds.matcher component).
3. Add zero or more entity qualification components, such as eds.negation, eds.family or eds.hypothesis. These qualifiers typically help detect false positives.

Extraction components

Extraction components (matchers, the date detector or NER components, for instance) write their results directly to the doc.ents and doc.spans attributes.

By default, some components, such as the eds.sections matcher, do not write their output to doc.ents. The reason is that doc.ents cannot contain overlapping entities: spans are filtered and, by default, only the largest one is kept. Since sections usually cover large spans of text, storing them in doc.ents would remove every other overlapping entity.
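To make the overlap problem concrete, here is a library-independent sketch of a "keep the largest span" filter. The function keep_largest and the (start, end, label) tuples are hypothetical names introduced for illustration; this is not the EDS-NLP implementation, only the general idea under the assumption that longer spans win.

```python
from typing import List, Tuple

# A span is (start, end, label), with end exclusive (hypothetical representation)
Span = Tuple[int, int, str]


def keep_largest(spans: List[Span]) -> List[Span]:
    """Greedily keep the longest spans and drop anything that overlaps them."""
    # Longest spans first; ties go to the span that starts earlier
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    taken = set()  # token positions already covered by a kept span
    for start, end, label in ordered:
        positions = range(start, end)
        if any(p in taken for p in positions):
            continue  # overlaps a longer span that was already kept
        kept.append((start, end, label))
        taken.update(positions)
    return sorted(kept)


# A section spanning 40 tokens swallows the two entities it contains
spans = [(0, 40, "section"), (3, 5, "disease"), (12, 14, "drug")]
print(keep_largest(spans))  # [(0, 40, 'section')]
```

With the section span present, both entities are discarded, which is why keeping sections out of doc.ents preserves them.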

Entity tagging

Moreover, most components declare extensions on the Doc, Span and/or Token objects.

These extensions are especially useful for qualifier components, but can also be used by other components to persist relevant information. For instance, the eds.dates component declares a span._.date extension to store a normalised version of each detected date.