Pipes overview
EDS-NLP provides easy-to-use pipeline components (aka pipes).
Available components
See the Core components overview for more information.
Component | Description |
---|---|
eds.normalizer | Non-destructive input text normalisation |
eds.sentences | Better sentence boundary detection |
eds.matcher | A simple yet powerful entity extractor |
eds.terminology | A simple yet powerful terminology matcher |
eds.contextual_matcher | A conditional entity extractor |
eds.endlines | An unsupervised model to classify each end line |
See the Qualifiers overview for more information.
Component | Description |
---|---|
eds.negation | Rule-based negation detection |
eds.family | Rule-based family context detection |
eds.hypothesis | Rule-based speculation detection |
eds.reported_speech | Rule-based reported speech detection |
eds.history | Rule-based medical history detection |
See the Miscellaneous components overview for more information.
Component | Description |
---|---|
eds.dates | Date extraction and normalisation |
eds.consultation_dates | Identify consultation dates |
eds.quantities | Quantity extraction and normalisation |
eds.sections | Section detection |
eds.reason | Rule-based hospitalisation reason detection |
eds.tables | Tables detection |
eds.split | Doc splitting |
See the NER overview for more information.
Component | Description |
---|---|
eds.covid | A COVID mentions detector |
eds.charlson | A Charlson score extractor |
eds.sofa | A SOFA score extractor |
eds.elston_ellis | An Elston & Ellis code extractor |
eds.emergency_priority | A priority score extractor |
eds.emergency_ccmu | A CCMU score extractor |
eds.emergency_gemsa | A GEMSA score extractor |
eds.tnm | A TNM score extractor |
eds.adicap | An ADICAP code extractor |
eds.drugs | A drug mentions extractor |
eds.cim10 | A CIM10 terminology matcher |
eds.umls | A UMLS terminology matcher |
eds.ckd | CKD extractor |
eds.copd | COPD extractor |
eds.cerebrovascular_accident | Cerebrovascular accident extractor |
eds.congestive_heart_failure | Congestive heart failure extractor |
eds.connective_tissue_disease | Connective tissue disease extractor |
eds.dementia | Dementia extractor |
eds.diabetes | Diabetes extractor |
eds.hemiplegia | Hemiplegia extractor |
eds.leukemia | Leukemia extractor |
eds.liver_disease | Liver disease extractor |
eds.lymphoma | Lymphoma extractor |
eds.myocardial_infarction | Myocardial infarction extractor |
eds.peptic_ulcer_disease | Peptic ulcer disease extractor |
eds.peripheral_vascular_disease | Peripheral vascular disease extractor |
eds.solid_tumor | Solid tumor extractor |
eds.alcohol | Alcohol consumption extractor |
eds.tobacco | Tobacco consumption extractor |
See the Trainable components overview for more information.
Name | Description |
---|---|
eds.transformer | Embed text with a transformer model |
eds.text_cnn | Contextualize embeddings with a CNN |
eds.span_pooler | A span embedding component that aggregates word embeddings |
eds.ner_crf | A trainable component to extract entities |
eds.span_classifier | A trainable component for multi-class multi-label span classification |
eds.span_linker | A trainable entity linker (i.e. to a list of concepts) |
You can add them to your pipeline by simply calling add_pipe, for instance:
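import edsnlp, edsnlp.pipes as eds

# Create a blank pipeline using the "eds" language
nlp = edsnlp.blank("eds")
# Components run in the order in which they are added
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.tnm())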
Basic architecture
Most components provided by EDS-NLP aim to qualify pre-extracted entities. A typical pipeline is therefore built in three steps:
- Add a normaliser (see eds.normalizer)
- Add an entity recognition component (e.g. the simple but powerful eds.matcher component)
- Add zero or more entity qualification components, such as eds.negation, eds.family or eds.hypothesis. These qualifiers typically help detect false positives.
Extraction components
Extraction components (matchers, the date detector or NER components, for instance) write their results directly to the doc.ents and doc.spans attributes.
By default, some components, such as the eds.sections matcher, do not write their output to doc.ents. The main reason is that doc.ents cannot contain overlapping entities, so spans are filtered and only the largest one is kept by default. Since sections usually cover large spans of text, storing them in doc.ents would remove every other entity that overlaps them.
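To see why a large section span would crowd out smaller entities, here is a pure-Python sketch of a "keep the largest span" filter. It is only an illustration of the idea, not the library's actual implementation: the `keep_largest` helper, the tuple representation and the offsets are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, label), with `end` exclusive
SpanTuple = Tuple[int, int, str]


def keep_largest(spans: List[SpanTuple]) -> List[SpanTuple]:
    """Greedy filter: prefer longer spans and drop any span overlapping a kept one."""
    kept: List[SpanTuple] = []
    # Visit the longest spans first; ties are broken by the earliest start
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        start, end, _ = span
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append(span)
    # Return the survivors in document order
    return sorted(kept, key=lambda s: s[0])


spans = [
    (0, 50, "section"),  # a section covering most of the text
    (5, 7, "drug"),      # inside the section
    (20, 23, "date"),    # inside the section
    (60, 62, "drug"),    # outside the section
]
filtered = keep_largest(spans)
print(filtered)
# → [(0, 50, 'section'), (60, 62, 'drug')]
```

The two entities located inside the section disappear, which is the behaviour that motivates keeping sections out of doc.ents.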
Entity tagging
Moreover, most components declare extensions on the Doc, Span and/or Token objects.
These extensions are especially useful for qualifier components, but can also be used by other components to persist relevant information. For instance, the eds.dates component declares a span._.date extension to store a normalised version of each detected date.