Negation

The eds.negation pipeline uses a simple rule-based algorithm to detect negated spans. It was designed at AP-HP's EDS, following the insights of the NegEx algorithm by Chapman et al. [1].

Usage

The following snippet matches a simple terminology, and checks the polarity of the extracted entities. It is complete and can be run as is.

import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.sentences")
# Dummy matcher
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(patient="patient", fracture="fracture")),
)
nlp.add_pipe("eds.negation")

text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Le scanner ne détecte aucune fracture."
)

doc = nlp(text)

doc.ents
# Out: (patient, fracture)

# The result of the pipeline is kept in the `negation` custom extension.
doc.ents[0]._.negation
# Out: False

doc.ents[1]._.negation
# Out: True

Configuration

The pipeline can be configured using the following parameters:

| Parameter | Explanation | Default |
|---|---|---|
| attr | spaCy attribute to match on (e.g. NORM, TEXT, LOWER) | "NORM" |
| pseudo | Pseudo-negation patterns | None (use pre-defined patterns) |
| preceding | Preceding negation patterns | None (use pre-defined patterns) |
| following | Following negation patterns | None (use pre-defined patterns) |
| termination | Termination patterns (for syntagma/proposition extraction) | None (use pre-defined patterns) |
| verbs | Patterns for verbs that imply a negation | None (use pre-defined patterns) |
| on_ents_only | Whether to qualify pre-extracted entities only | True |
| within_ents | Whether to look for negations within entities | False |
| explain | Whether to keep track of the cues for each entity | False |
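
To make the pattern categories above more concrete, here is a minimal pure-Python sketch of a NegEx-style rule. It is not the eds.negation implementation: the cue lists and the `is_negated` helper are hypothetical, the pipeline's pre-defined patterns are not reproduced, and only `preceding`, `following` and `termination` cues are modelled (pseudo-negations and verbs are left out). It assumes a single sentence that is already lowercased and split on whitespace.

```python
# Hypothetical cue lists, for illustration only (not the pipeline's defaults).
PRECEDING = {"aucune", "aucun", "pas", "sans"}
FOLLOWING = {"exclu", "exclue"}
TERMINATION = {"mais", "cependant"}


def is_negated(tokens, index):
    """Return True if the token at `index` falls in the scope of a negation cue.

    The scope is bounded by the nearest termination token on each side.
    """
    # Left boundary: just after the closest termination token before the target.
    start = 0
    for i in range(index - 1, -1, -1):
        if tokens[i] in TERMINATION:
            start = i + 1
            break

    # Right boundary: the closest termination token after the target.
    end = len(tokens)
    for i in range(index + 1, len(tokens)):
        if tokens[i] in TERMINATION:
            end = i
            break

    # A preceding cue negates what follows it; a following cue negates what precedes it.
    before = tokens[start:index]
    after = tokens[index + 1:end]
    return any(t in PRECEDING for t in before) or any(t in FOLLOWING for t in after)


sentence = "le scanner ne détecte aucune fracture mais une luxation".split()
fracture_negated = is_negated(sentence, sentence.index("fracture"))
luxation_negated = is_negated(sentence, sentence.index("luxation"))
```

In this toy sentence, "mais" closes the scope opened by "aucune", so only "fracture" is flagged as negated.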

Declared extensions

The eds.negation pipeline declares two spaCy extensions, on both Span and Token objects:

  1. The negation attribute is a boolean, set to True if the pipeline predicts that the span/token is negated.
  2. The negation_ property is a human-readable string, computed from the negation attribute. It implements a simple getter function that outputs AFF or NEG, depending on the value of negation.
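
The relationship between the two extensions can be sketched in plain Python. The `QualifiedSpan` class below is a hypothetical stand-in for a spaCy Span or Token, used only to show the getter logic described above; it is not the pipeline's own code.

```python
from dataclasses import dataclass


@dataclass
class QualifiedSpan:
    """Hypothetical stand-in for a span carrying the `negation` attribute."""

    text: str
    negation: bool = False

    @property
    def negation_(self) -> str:
        # Computed on access from `negation`, as a getter would be.
        return "NEG" if self.negation else "AFF"


patient = QualifiedSpan("patient")
fracture = QualifiedSpan("fracture", negation=True)
labels = (patient.negation_, fracture.negation_)
```

Because the string is derived on access, it cannot drift out of sync with the boolean. In spaCy, a computed attribute of this kind is typically registered with `Span.set_extension(name, getter=...)`.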

Performance

The pipeline's performance is measured on three datasets:

  • The ESSAI [2] and CAS [3] datasets were developed at the CNRS. The two are concatenated and evaluated as a single dataset.
  • The NegParHyp corpus was developed at AP-HP specifically to test the pipeline on actual clinical notes. It uses pseudonymised notes from AP-HP.

| Dataset | Negation F1 |
|---|---|
| CAS/ESSAI | 71% |
| NegParHyp | 88% |
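
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from true-positive, false-positive and false-negative counts. The counts are made up for illustration and are unrelated to the figures reported above; the exact evaluation protocol used for those figures is not described here.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from counts of true positives, false positives and false negatives."""
    if tp == 0:
        # No correct prediction: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up counts, for illustration only.
score = f1_score(tp=8, fp=2, fn=2)
```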

NegParHyp corpus

The NegParHyp corpus was built by matching a subset of the MeSH terminology with around 300 documents from AP-HP's clinical data warehouse. Matched entities were then labelled for negation, speculation and family context.

Authors and citation

The eds.negation pipeline was developed by AP-HP's Data Science team.


  1. Wendy W. Chapman, Will Bridewell, Paul Hanbury, Gregory F. Cooper, and Bruce G. Buchanan. A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Journal of Biomedical Informatics, 34(5):301–310, October 2001. URL: https://linkinghub.elsevier.com/retrieve/pii/S1532046401910299 (visited on 2020-12-31), doi:10.1006/jbin.2001.1029

  2. Clément Dalloux, Vincent Claveau, and Natalia Grabar. Détection de la négation : corpus français et apprentissage supervisé. In SIIM 2017 - Symposium sur l'Ingénierie de l'Information Médicale, 1–8. Toulouse, France, November 2017. URL: https://hal.archives-ouvertes.fr/hal-01659637

  3. Natalia Grabar, Vincent Claveau, and Clément Dalloux. CAS: French Corpus with Clinical Cases. In LOUHI 2018 - The Ninth International Workshop on Health Text Mining and Information Analysis, 1–7. Brussels, Belgium, October 2018. URL: https://hal.archives-ouvertes.fr/hal-01937096