Negation
The eds.negation component uses a simple rule-based algorithm to detect negated spans. It was designed at AP-HP's EDS, following the insights of the NegEx algorithm by Chapman et al., 2001.
The component looks for five kinds of expressions in the text:
- preceding negations, i.e., cues that precede a negated expression
- following negations, i.e., cues that follow a negated expression
- pseudo negations, i.e., expressions that contain a negation cue but are not negations (e.g., "pas de doute"/"no doubt")
- negation verbs, i.e., verbs that indicate a negation
- terminations, i.e., words that delimit propositions. The negation spans from the preceding cue to the termination.
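To make the interplay of these cue types concrete, here is a minimal NegEx-style sketch in plain Python. It is not the eds.negation implementation: the cue lists, the `find_cues` and `is_negated` helpers, and the choice to treat negation verbs like preceding cues are all assumptions made for illustration, and the matching works on raw characters rather than on tokens.

```python
import re

# Toy English cue lists (assumptions, not the lists shipped with eds.negation).
PSEUDO = ["no doubt"]
PRECEDING = ["no", "without", "absence of"]
VERBS = ["denies", "rules out"]  # simplification: handled like preceding cues
FOLLOWING = ["is ruled out", "was excluded"]
TERMINATION = ["but", "however"]


def find_cues(text, cues):
    """Return sorted (start, end) character offsets of every cue occurrence."""
    spans = []
    for cue in cues:
        pattern = r"(?<!\w)" + re.escape(cue) + r"(?!\w)"
        spans.extend(m.span() for m in re.finditer(pattern, text, flags=re.IGNORECASE))
    return sorted(spans)


def is_negated(sentence, ent_start, ent_end):
    """Decide whether the entity at [ent_start, ent_end) is negated."""
    pseudo = find_cues(sentence, PSEUDO)

    def inside_pseudo(span):
        # A cue covered by a pseudo negation ("no" in "no doubt") is ignored.
        return any(p[0] <= span[0] and span[1] <= p[1] for p in pseudo)

    preceding = [s for s in find_cues(sentence, PRECEDING + VERBS) if not inside_pseudo(s)]
    following = find_cues(sentence, FOLLOWING)
    terminations = find_cues(sentence, TERMINATION)

    # Terminations delimit the proposition that contains the entity.
    left = max([end for _, end in terminations if end <= ent_start], default=0)
    right = min([start for start, _ in terminations if start >= ent_end], default=len(sentence))

    has_preceding = any(left <= s and e <= ent_start for s, e in preceding)
    has_following = any(ent_end <= s and e <= right for s, e in following)
    return has_preceding or has_following


def check(sentence, term):
    """Convenience wrapper: locate `term` in `sentence` and test its polarity."""
    start = sentence.index(term)
    return is_negated(sentence, start, start + len(term))


print(check("The scan shows no fracture.", "fracture"))
print(check("There is no doubt about the fracture.", "fracture"))
print(check("No fever but a fracture is present.", "fracture"))
```

In the third sentence the termination "but" closes the scope opened by "No", so only "fever" falls inside the negated proposition.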
Examples
The following snippet matches a simple terminology, and checks the polarity of the extracted entities. It is complete and can be run as is.
import edsnlp, edsnlp.pipes as eds
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
# Dummy matcher
nlp.add_pipe(eds.matcher(terms=dict(patient="patient", fracture="fracture")))
nlp.add_pipe(eds.negation())
text = (
"Le patient est admis le 23 août 2021 pour une douleur au bras. "
"Le scanner ne détecte aucune fracture."
)
doc = nlp(text)
doc.ents
# Out: (patient, fracture)
doc.ents[0]._.negation # (1)
# Out: False
doc.ents[1]._.negation
# Out: True
(1) The result of the component is kept in the negation custom extension.
Extensions
The eds.negation component declares two extensions, on both Span and Token objects:
- The negation attribute is a boolean, set to True if the component predicts that the span/token is negated.
- The negation_ property is a human-readable string, computed from the negation attribute. It implements a simple getter function that outputs AFF or NEG, depending on the value of negation.
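The relationship between the two extensions can be pictured with a small stand-alone sketch. The `FakeSpan` class below is hypothetical and only mimics the described behaviour; the real component registers its extensions on spaCy Span and Token objects, which this example does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a span carrying the two extensions."""

    text: str
    negation: bool = False  # boolean attribute set by the component

    @property
    def negation_(self) -> str:
        # Human-readable view, derived from the boolean on each access.
        return "NEG" if self.negation else "AFF"


patient = FakeSpan("patient", negation=False)
fracture = FakeSpan("fracture", negation=True)
print(patient.negation_, fracture.negation_)
```

Because the string is computed by a getter, it cannot drift out of sync with the boolean it is derived from.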
Performance
The component's performance is measured on three datasets:
- The ESSAI (Dalloux et al., 2017) and CAS (Grabar et al., 2018) datasets were developed at the CNRS. The two are concatenated.
- The NegParHyp corpus was specifically developed at AP-HP to test the component on actual clinical notes, using pseudonymised notes from the AP-HP.
Dataset | Negation F1 |
---|---|
CAS/ESSAI | 71% |
NegParHyp | 88% |
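For readers unfamiliar with the metric, the sketch below shows how an F1 score can be computed from entity-level predictions. It assumes that "negated" is the positive class and uses made-up toy labels; the source does not describe the exact evaluation protocol behind the figures above, and the `f1_score` helper is hypothetical.

```python
def f1_score(gold, pred):
    """F1 of the positive (negated) class from two lists of booleans."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for five entities (illustrative only, not taken from any corpus).
gold = [True, True, False, False, True]
pred = [True, False, False, True, True]
score = f1_score(gold, pred)
print(f"{score:.2f}")
```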
NegParHyp corpus
The NegParHyp corpus was built by matching a subset of the MeSH terminology with around 300 documents from AP-HP's clinical data warehouse. Matched entities were then labelled for negation, speculation and family context.
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object. TYPE: |
name | The component name. TYPE: |
attr | spaCy's attribute to use. TYPE: |
pseudo | List of pseudo negation cues. TYPE: |
preceding | List of preceding negation cues. TYPE: |
preceding_regex | List of preceding negation cues, but as regexes. TYPE: |
following | List of following negation cues. TYPE: |
verbs | List of negation verbs. TYPE: |
termination | List of termination terms. TYPE: |
span_getter | Which entities should be classified. By default, TYPE: |
on_ents_only | Deprecated, use Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks. TYPE: |
within_ents | Whether to consider cues within entities. TYPE: |
explain | Whether to keep track of cues for each entity. TYPE: |
Authors and citation
The eds.negation component was developed by AP-HP's Data Science team.
Chapman W.W., Bridewell W., Hanbury P., Cooper G.F. and Buchanan B.G., 2001. A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Journal of Biomedical Informatics. 34, pp. 301-310. doi:10.1006/jbin.2001.1029
Dalloux C., Claveau V. and Grabar N., 2017. Détection de la négation : corpus français et apprentissage supervisé. https://hal.archives-ouvertes.fr/hal-01659637
Grabar N., Claveau V. and Dalloux C., 2018. CAS: French Corpus with Clinical Cases. https://hal.archives-ouvertes.fr/hal-01937096