Negation
The eds.negation pipeline uses a simple rule-based algorithm to detect negated spans. It was designed at AP-HP's EDS, following the insights of the NegEx algorithm by Chapman et al. [1].
Usage
The following snippet matches a simple terminology, and checks the polarity of the extracted entities. It is complete and can be run as is.
import spacy
nlp = spacy.blank("fr")
nlp.add_pipe("eds.sentences")
# Dummy matcher
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(patient="patient", fracture="fracture")),
)
nlp.add_pipe("eds.negation")
text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Le scanner ne détecte aucune fracture."
)
doc = nlp(text)
doc.ents
# Out: [patient, fracture]
doc.ents[0]._.negation # (1)
# Out: False
doc.ents[1]._.negation
# Out: True
(1) The result of the pipeline is kept in the negation custom extension.
Configuration
The pipeline can be configured using the following parameters:
Parameter | Explanation | Default |
---|---|---|
attr | spaCy attribute to match on (e.g. NORM, TEXT, LOWER) | "NORM" |
pseudo | Pseudo-negation patterns | None (use pre-defined patterns) |
preceding | Preceding negation patterns | None (use pre-defined patterns) |
following | Following negation patterns | None (use pre-defined patterns) |
termination | Termination patterns (for syntagma/proposition extraction) | None (use pre-defined patterns) |
verbs | Patterns for verbs that imply a negation | None (use pre-defined patterns) |
on_ents_only | Whether to qualify pre-extracted entities only | True |
within_ents | Whether to look for negations within entities | False |
explain | Whether to keep track of the cues for each entity | False |
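To give an intuition for how the pseudo, preceding, following and termination pattern categories can interact, here is a minimal NegEx-style sketch in plain Python. It is not the eds.negation implementation: the pattern lists are tiny hypothetical examples, the helper names (is_negated, _has_cue) are invented for illustration, and the sketch only handles single-token entities inside a single sentence.

```python
import re

# Hypothetical, deliberately tiny pattern lists. The pipeline's pre-defined
# patterns are richer and are not reproduced here.
PRECEDING = ["ne", "aucun", "aucune", "pas de", "sans"]
FOLLOWING = ["absent", "absente", "exclu", "exclue"]
PSEUDO = ["sans doute", "aucun doute"]
TERMINATION = ["mais", "cependant", ","]


def _has_cue(cues, segment):
    """Return True if any cue occurs as a whole word (or phrase) in segment."""
    return any(re.search(rf"\b{re.escape(cue)}\b", segment) for cue in cues)


def is_negated(sentence, entity):
    """Toy negation check for a single-token entity within one sentence."""
    text = sentence.lower()
    # 1. Mask pseudo-negations so that they are not mistaken for real cues.
    for pattern in PSEUDO:
        text = text.replace(pattern, " ")
    # 2. Split into word and punctuation tokens and locate the entity.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    target = tokens.index(entity.lower())
    # 3. Keep only the proposition around the entity, delimited by the
    #    nearest termination tokens on each side.
    start = max(
        (i + 1 for i in range(target) if tokens[i] in TERMINATION), default=0
    )
    end = min(
        (i for i in range(target + 1, len(tokens)) if tokens[i] in TERMINATION),
        default=len(tokens),
    )
    before = " ".join(tokens[start:target])
    after = " ".join(tokens[target + 1:end])
    # 4. A preceding cue before the entity, or a following cue after it,
    #    marks the entity as negated.
    return _has_cue(PRECEDING, before) or _has_cue(FOLLOWING, after)


print(is_negated("Le scanner ne détecte aucune fracture.", "fracture"))  # True
print(is_negated("Pas de fièvre, mais une fracture du poignet.", "fracture"))  # False
print(is_negated("Il s'agit sans doute d'une fracture.", "fracture"))  # False
```

In the second call the termination tokens cut the sentence before "une fracture", so the cue "pas de" no longer reaches the entity. In the third call "sans doute" is masked as a pseudo-negation before "sans" can be read as a cue.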
Declared extensions
The eds.negation pipeline declares two spaCy extensions, on both Span and Token objects:
- The negation attribute is a boolean, set to True if the pipeline predicts that the span/token is negated.
- The negation_ property is a human-readable string, computed from the negation attribute. It implements a simple getter function that outputs AFF or NEG, depending on the value of negation.
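The relationship between the two extensions can be illustrated without spaCy. The sketch below uses a hypothetical QualifiedSpan dataclass as a stand-in for a Span or Token: the boolean is stored, while the string is derived from it on access. In spaCy itself, a computed value like this would typically be registered as an extension with a getter function; the class here is only an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QualifiedSpan:
    """Hypothetical stand-in for a spaCy Span carrying the negation flag."""

    text: str
    negation: bool = False  # stored value, set by the (imagined) pipeline

    @property
    def negation_(self) -> str:
        # Computed on every access, so it cannot drift out of sync
        # with the underlying boolean.
        return "NEG" if self.negation else "AFF"


span = QualifiedSpan("fracture", negation=True)
print(span.negation, span.negation_)  # True NEG
```

Deriving the string from the boolean, rather than storing both, means there is a single source of truth for the polarity of an entity.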
Performance
The pipeline's performance is measured on three datasets:
- The ESSAI [2] and CAS [3] datasets were developed at the CNRS. The two are concatenated.
- The NegParHyp corpus was specifically developed at AP-HP to test the pipeline on actual clinical notes, using pseudonymised notes from the AP-HP.
Dataset | Negation F1 |
---|---|
CAS/ESSAI | 71% |
NegParHyp | 88% |
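For readers unfamiliar with the metric, the sketch below computes an F1 score from boolean gold and predicted negation labels. It assumes the standard definition (harmonic mean of precision and recall on the negated class); the page does not detail how the reported figures were computed, and the labels below are made-up toy values, not data from the corpora.

```python
def f1_score(gold, predicted):
    """F1 on the positive (negated) class for two equal-length boolean lists."""
    tp = sum(g and p for g, p in zip(gold, predicted))        # negated, found
    fp = sum((not g) and p for g, p in zip(gold, predicted))  # spurious negation
    fn = sum(g and (not p) for g, p in zip(gold, predicted))  # missed negation
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for five entities (illustrative only).
gold = [True, True, False, False, True]
predicted = [True, False, False, True, True]
print(round(f1_score(gold, predicted), 3))  # 0.667
```

Here two negations are found, one is missed and one is spurious, so precision and recall are both 2/3 and so is their harmonic mean.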
NegParHyp corpus
The NegParHyp corpus was built by matching a subset of the MeSH terminology with around 300 documents from AP-HP's clinical data warehouse. Matched entities were then labelled for negation, speculation and family context.
Authors and citation
The eds.negation pipeline was developed by AP-HP's Data Science team.
[1] Wendy W. Chapman, Will Bridewell, Paul Hanbury, Gregory F. Cooper, and Bruce G. Buchanan. A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Journal of Biomedical Informatics, 34(5):301–310, October 2001. URL: https://linkinghub.elsevier.com/retrieve/pii/S1532046401910299 (visited on 2020-12-31), doi:10.1006/jbin.2001.1029.
[2] Clément Dalloux, Vincent Claveau, and Natalia Grabar. Détection de la négation : corpus français et apprentissage supervisé. In SIIM 2017 - Symposium sur l'Ingénierie de l'Information Médicale, 1–8. Toulouse, France, November 2017. URL: https://hal.archives-ouvertes.fr/hal-01659637.
[3] Natalia Grabar, Vincent Claveau, and Clément Dalloux. CAS: French Corpus with Clinical Cases. In LOUHI 2018 - The Ninth International Workshop on Health Text Mining and Information Analysis, Proceedings of the Workshop, 1–7. Brussels, Belgium, October 2018. URL: https://hal.archives-ouvertes.fr/hal-01937096.