Negation

The eds.negation component uses a simple rule-based algorithm to detect negated spans. It was designed at AP-HP's EDS, following the insights of the NegEx algorithm by Chapman et al., 2001.

The component looks for five kinds of expressions in the text:

  • preceding negations, i.e., cues that precede a negated expression
  • following negations, i.e., cues that follow a negated expression
  • pseudo negations, i.e., expressions that contain a negation cue but are not negations (e.g. "pas de doute"/"no doubt")
  • negation verbs, i.e., verbs that indicate a negation
  • terminations, i.e., words that delimit clauses. The scope of a negation runs from the preceding cue to the next termination.
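
The interplay between these cue types can be sketched in plain Python. This is a minimal illustration in the spirit of NegEx, not the component's actual implementation: the cue lists are hypothetical and heavily reduced, matching is done on raw text with regular expressions, negation verbs are left out for brevity, and the names `Entity`, `locate`, `find_cues` and `is_negated` are made up for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, heavily reduced cue lists, for illustration only.
PSEUDO = ["pas de doute"]
PRECEDING = ["pas de", "aucune", "aucun", "sans"]
FOLLOWING = ["est exclue", "est exclu"]
TERMINATIONS = ["mais", "cependant"]

Span = Tuple[int, int]  # (start, end) character offsets


@dataclass
class Entity:
    text: str
    start: int
    end: int


def locate(sentence: str, term: str) -> Entity:
    """Build an Entity from the first occurrence of `term` in `sentence`."""
    start = sentence.index(term)
    return Entity(term, start, start + len(term))


def find_cues(sentence: str, cues: List[str]) -> List[Span]:
    """Return the offsets of every whole-word, case-insensitive cue match."""
    spans: List[Span] = []
    for cue in cues:
        pattern = r"(?<!\w)" + re.escape(cue) + r"(?!\w)"
        spans.extend(m.span() for m in re.finditer(pattern, sentence, re.IGNORECASE))
    return sorted(spans)


def is_negated(sentence: str, ent: Entity) -> bool:
    pseudo = find_cues(sentence, PSEUDO)

    def inside_pseudo(span: Span) -> bool:
        return any(p_start <= span[0] and span[1] <= p_end for p_start, p_end in pseudo)

    # Preceding cues that are part of a pseudo negation are discarded.
    preceding = [s for s in find_cues(sentence, PRECEDING) if not inside_pseudo(s)]
    following = find_cues(sentence, FOLLOWING)
    terminations = find_cues(sentence, TERMINATIONS)

    # Terminations bound the clause that contains the entity.
    left = max((end for _, end in terminations if end <= ent.start), default=0)
    right = min((start for start, _ in terminations if start >= ent.end), default=len(sentence))

    negated_before = any(left <= start and end <= ent.start for start, end in preceding)
    negated_after = any(ent.end <= start and end <= right for start, end in following)
    return negated_before or negated_after


sentence = "Le scanner ne détecte aucune fracture mais une entorse."
print(is_negated(sentence, locate(sentence, "fracture")))  # True
print(is_negated(sentence, locate(sentence, "entorse")))  # False
```

In the example, "mais" acts as a termination, so the scope opened by "aucune" stops before "entorse".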

Examples

The following snippet matches a simple terminology, and checks the polarity of the extracted entities. It is complete and can be run as is.

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
# Dummy matcher
nlp.add_pipe(eds.matcher(terms=dict(patient="patient", fracture="fracture")))
nlp.add_pipe(eds.negation())

text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Le scanner ne détecte aucune fracture."
)

doc = nlp(text)

doc.ents
# Out: (patient, fracture)

doc.ents[0]._.negation  # (1)
# Out: False

doc.ents[1]._.negation
# Out: True
  1. The result of the component is kept in the negation custom extension.

Extensions

The eds.negation component declares two extensions, on both Span and Token objects:

  1. The negation attribute is a boolean, set to True if the component predicts that the span/token is negated.
  2. The negation_ property is a human-readable string, computed from the negation attribute. It implements a simple getter function that outputs AFF or NEG, depending on the value of negation.
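
The relationship between the two extensions can be pictured with a plain Python property. This is only a sketch of the getter logic described above: the `Annotated` class is a hypothetical stand-in for a span or token and is not part of EDS-NLP or spaCy.

```python
from dataclasses import dataclass


@dataclass
class Annotated:
    """Hypothetical stand-in for a span or token carrying the negation flag."""

    negation: bool = False

    @property
    def negation_(self) -> str:
        # Human-readable label derived from the boolean attribute.
        return "NEG" if self.negation else "AFF"


print(Annotated(negation=True).negation_)  # NEG
print(Annotated(negation=False).negation_)  # AFF
```

In an actual pipeline, both values are read through the custom extension namespace, e.g. `doc.ents[1]._.negation` and `doc.ents[1]._.negation_`.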

Performance

The component's performance is measured on three datasets:

  • The ESSAI (Dalloux et al., 2017) and CAS (Grabar et al., 2018) datasets were developed at the CNRS. The two are concatenated.
  • The NegParHyp corpus was specifically developed at AP-HP to test the component on actual clinical notes, using pseudonymised notes from AP-HP.

Dataset      Negation F1
CAS/ESSAI    71%
NegParHyp    88%
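
For readers unfamiliar with the metric, here is a minimal sketch of an F1 computation on negation labels. It assumes a binary F1 on the "negated" class over aligned gold and predicted labels; the evaluation protocol actually used for the figures above is not described here, and the labels below are made-up toy data.

```python
from typing import Sequence


def negation_f1(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Binary F1 on the 'negated' class (assumed protocol, for illustration)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
gold = [True, True, False, False, True]
pred = [True, False, False, True, True]
score = negation_f1(gold, pred)  # precision = recall = 2/3, so F1 = 2/3
```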

NegParHyp corpus

The NegParHyp corpus was built by matching a subset of the MeSH terminology with around 300 documents from AP-HP's clinical data warehouse. Matched entities were then labelled for negation, speculation and family context.

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline object.

TYPE: PipelineProtocol

name

The component name.

TYPE: Optional[str]

attr

The spaCy attribute to use.

TYPE: str DEFAULT: NORM

pseudo

List of pseudo negation cues.

TYPE: Optional[List[str]] DEFAULT: None

preceding

List of preceding negation cues.

TYPE: Optional[List[str]] DEFAULT: None

preceding_regex

List of preceding negation cues, but as regexes.

TYPE: Optional[List[str]] DEFAULT: None

following

List of following negation cues.

TYPE: Optional[List[str]] DEFAULT: None

verbs

List of negation verbs.

TYPE: Optional[List[str]] DEFAULT: None

termination

List of termination terms.

TYPE: Optional[List[str]] DEFAULT: None

span_getter

Which entities should be classified. By default, doc.ents.

TYPE: SpanGetterArg DEFAULT: None

on_ents_only

Deprecated, use span_getter instead.

Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.

  • If True, only the entities in doc.ents are considered.
  • If an iterable of strings is passed, the spans in doc.spans[key] are also considered, for each key in the iterable.

TYPE: Union[bool, str, List[str], Set[str]] DEFAULT: None

within_ents

Whether to consider cues within entities.

TYPE: bool DEFAULT: False

explain

Whether to keep track of cues for each entity.

TYPE: bool DEFAULT: False
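
As a sketch of how these parameters might be combined, the snippet below builds a keyword-argument dictionary and passes it to the component. The parameter names come from the table above; the cue strings are hypothetical examples rather than the component's defaults, and `build_pipeline` is a helper made up for this illustration. Whether a user-supplied list replaces or extends the default cues is not stated on this page, so check that before relying on it.

```python
# Keyword arguments taken from the parameter table; the cue strings are
# hypothetical examples, not the component's defaults.
negation_config = dict(
    preceding=["absence de", "pas de"],
    termination=["mais", "cependant"],
    within_ents=False,
    explain=True,
)


def build_pipeline(config: dict):
    """Assemble a small pipeline with a customised eds.negation component."""
    # Imported lazily so the configuration can be inspected without EDS-NLP installed.
    import edsnlp, edsnlp.pipes as eds

    nlp = edsnlp.blank("eds")
    nlp.add_pipe(eds.sentences())
    # Dummy matcher, as in the example at the top of the page
    nlp.add_pipe(eds.matcher(terms=dict(fracture="fracture")))
    nlp.add_pipe(eds.negation(**config))
    return nlp
```

Calling `build_pipeline(negation_config)` requires EDS-NLP to be installed; the returned pipeline is then used like the one in the Examples section.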

Authors and citation

The eds.negation component was developed by AP-HP's Data Science team.


  1. Chapman W.W., Bridewell W., Hanbury P., Cooper G.F. and Buchanan B.G., 2001. A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Journal of Biomedical Informatics. 34, pp. 301-310. doi:10.1006/jbin.2001.1029

  2. Dalloux C., Claveau V. and Grabar N., 2017. Détection de la négation : corpus français et apprentissage supervisé. https://hal.archives-ouvertes.fr/hal-01659637

  3. Grabar N., Claveau V. and Dalloux C., 2018. CAS: French Corpus with Clinical Cases. https://hal.archives-ouvertes.fr/hal-01937096