Hypothesis

The eds.hypothesis pipeline uses a simple rule-based algorithm to detect spans that are speculations rather than certain statements.

The component looks for five kinds of expressions in the text:

preceding hypothesis: cues that precede a hypothetical expression
following hypothesis: cues that follow a hypothetical expression
pseudo hypothesis: expressions that contain a hypothesis cue but are not hypotheses (e.g. "pas de doute"/"no doubt")
hypothetical verbs: verbs indicating hypothesis (e.g. "douter"/"to doubt")
classic verbs conjugated in the conditional mood, thus indicating hypothesis
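To make the five cue types concrete, here is a deliberately simplified, hypothetical sketch of rule-based cue scoping in plain Python. It is not edsnlp's implementation. The cue lists are tiny made-up examples rather than the component's defaults, lowercasing stands in for the `NORM` attribute, sentences are split on "." tokens, and hypothetical verbs and conditional forms simply flag every entity in their sentence. Pseudo-hypothesis matches are found first and cancel any cue they overlap, which is how "pas de doute" avoids being read as a doubt.

```python
from typing import Dict, List, Sequence, Tuple

Cue = Tuple[str, ...]
Span = Tuple[int, int]  # (start, end) token offsets, end exclusive

# Tiny hypothetical cue lists for illustration only (NOT edsnlp's defaults).
PSEUDO: List[Cue] = [("pas", "de", "doute")]
PRECEDING: List[Cue] = [("possible",), ("suspicion", "de")]
FOLLOWING: List[Cue] = [("à", "confirmer")]
# One hypothetical-verb form and one conditional form, treated alike here.
VERBS: List[Cue] = [("doute",), ("pourrait",)]


def find_cues(tokens: Sequence[str], cues: List[Cue]) -> List[Span]:
    """Return the offsets of every occurrence of any cue in the token list."""
    hits = []
    for cue in cues:
        n = len(cue)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i : i + n]) == cue:
                hits.append((i, i + n))
    return hits


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def sentence_bounds(tokens: Sequence[str], i: int) -> Span:
    """Offsets of the sentence containing token i (sentences end with '.')."""
    start, end = i, i
    while start > 0 and tokens[start - 1] != ".":
        start -= 1
    while end < len(tokens) and tokens[end] != ".":
        end += 1
    return start, end


def classify(tokens: Sequence[str], entities: List[Span]) -> Dict[Span, bool]:
    norm = [t.lower() for t in tokens]  # stand-in for the NORM attribute
    pseudo = find_cues(norm, PSEUDO)

    def keep(hits: List[Span]) -> List[Span]:
        # Cues that are part of a pseudo-hypothesis expression are discarded.
        return [h for h in hits if not any(overlaps(h, p) for p in pseudo)]

    preceding = keep(find_cues(norm, PRECEDING))
    following = keep(find_cues(norm, FOLLOWING))
    verbs = keep(find_cues(norm, VERBS))

    result: Dict[Span, bool] = {}
    for ent in entities:
        s_start, s_end = sentence_bounds(norm, ent[0])

        def in_sentence(h: Span) -> bool:
            return s_start <= h[0] and h[1] <= s_end

        result[ent] = (
            any(in_sentence(h) and h[1] <= ent[0] for h in preceding)  # cue before entity
            or any(in_sentence(h) and h[0] >= ent[1] for h in following)  # cue after entity
            or any(in_sentence(h) for h in verbs)  # verb anywhere in the sentence
        )
    return result


tokens = "Possible fracture du radius . Pas de doute sur la douleur . Fièvre à confirmer .".split()
print(classify(tokens, [(1, 2), (10, 11), (12, 13)]))
# {(1, 2): True, (10, 11): False, (12, 13): True}
```

With these toy rules, "fracture" is flagged through the preceding cue "Possible" and "Fièvre" through the following cue "à confirmer", while "douleur" stays certain because the pseudo cue "Pas de doute" masks "doute".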

Examples

The following snippet matches a simple terminology and checks whether the extracted entities are part of a speculation. It is complete and can be run as is.

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
# Dummy matcher
nlp.add_pipe(eds.matcher(terms=dict(douleur="douleur", fracture="fracture")))
nlp.add_pipe(eds.hypothesis())

text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Possible fracture du radius."
)

doc = nlp(text)

doc.ents
# Out: (douleur, fracture)

doc.ents[0]._.hypothesis
# Out: False

doc.ents[1]._.hypothesis
# Out: True

Extensions

The eds.hypothesis component declares two extensions on both Span and Token objects:

The hypothesis attribute is a boolean, set to True if the component predicts that the span/token is a speculation.
The hypothesis_ property is a human-readable string, computed from the hypothesis attribute. It implements a simple getter function that outputs HYP or CERT, depending on the value of hypothesis.
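In spaCy, computed attributes such as `hypothesis_` are usually registered with `Span.set_extension(name, getter=...)` (and likewise on `Token`). The sketch below mimics that getter with a plain Python property on a hypothetical `EntityStub` class, so it runs without spaCy; only the HYP/CERT mapping comes from the description above.

```python
from dataclasses import dataclass


def hypothesis_getter(hypothesis: bool) -> str:
    """Turn the boolean `hypothesis` flag into its human-readable label."""
    return "HYP" if hypothesis else "CERT"


@dataclass
class EntityStub:
    """Hypothetical stand-in for a spaCy Span carrying both extensions."""

    text: str
    hypothesis: bool = False  # set by the component

    @property
    def hypothesis_(self) -> str:
        # Recomputed on every access, so it always reflects `hypothesis`.
        return hypothesis_getter(self.hypothesis)


ent = EntityStub("fracture", hypothesis=True)
print(ent.text, ent.hypothesis_)  # fracture HYP
```

Because the label is derived rather than stored, it can never disagree with the boolean attribute.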

Performance

The component's performance is measured on three datasets:

The ESSAI (Dalloux et al., 2017) and CAS (Grabar et al., 2018) datasets were developed at the CNRS. The two are concatenated.
The NegParHyp corpus was developed at AP-HP's clinical data warehouse (CDW) specifically to test the component on actual clinical notes; it uses pseudonymised notes from the CDW.

Dataset	Hypothesis F1
CAS/ESSAI	49%
NegParHyp	52%
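The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Below is a minimal sketch of that standard formula, computed from true-positive, false-positive and false-negative counts for the hypothesis label. It says nothing about how entities were matched against the annotations during the evaluation, which this page does not describe.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts, unrelated to the results reported above.
print(f1_score(tp=3, fp=1, fn=3))  # precision 0.75, recall 0.5, F1 0.6
```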

NegParHyp corpus

The NegParHyp corpus was built by matching a subset of the MeSH terminology with around 300 documents from AP-HP's clinical data warehouse. Matched entities were then labelled for negation, speculation and family context.

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline object. TYPE: `PipelineProtocol`
`name`	The component name. TYPE: `Optional[str]`
`pseudo`	List of pseudo hypothesis cues. TYPE: `Optional[List[str]]` DEFAULT: `None`
`preceding`	List of preceding hypothesis cues. TYPE: `Optional[List[str]]` DEFAULT: `None`
`following`	List of following hypothesis cues. TYPE: `Optional[List[str]]` DEFAULT: `None`
`verbs_hyp`	List of hypothetical verbs. TYPE: `Optional[List[str]]` DEFAULT: `None`
`verbs_eds`	List of mainstream verbs, which indicate hypothesis when conjugated in the conditional. TYPE: `Optional[List[str]]` DEFAULT: `None`
`termination`	List of termination terms. TYPE: `Optional[List[str]]` DEFAULT: `None`
`attr`	spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr' TYPE: `str` DEFAULT: `NORM`
`span_getter`	Which entities should be classified. By default, `doc.ents`. TYPE: `SpanGetterArg` DEFAULT: `None`
`on_ents_only`	Deprecated, use `span_getter` instead. Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks. If True, will look in all ents located in `doc.ents` only. If an iterable of strings is passed, will additionally look in `doc.spans[key]` for each key in the iterable. TYPE: `Union[bool, str, List[str], Set[str]]` DEFAULT: `None`
`within_ents`	Whether to consider cues within entities. TYPE: `bool` DEFAULT: `False`
`explain`	Whether to keep track of cues for each entity. TYPE: `bool` DEFAULT: `False`
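The `termination` and `within_ents` parameters are only described briefly in the table. Assuming they behave as in other NegEx-style qualifiers (an assumption, not something this page states), termination terms cut off the scope of a cue, and `within_ents=False` ignores cues that fall inside an entity. The hypothetical helpers `forward_scope` and `usable_cues` below sketch those two rules; the termination terms are example values, not edsnlp's defaults.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def forward_scope(tokens: Sequence[str], cue: Span, termination: List[str]) -> Span:
    """Tokens governed by a preceding cue: from its end up to the first termination term."""
    end = cue[1]
    while end < len(tokens) and tokens[end] not in termination:
        end += 1
    return cue[1], end


def usable_cues(cues: List[Span], entities: List[Span], within_ents: bool) -> List[Span]:
    """Keep all cues if within_ents is True, otherwise drop cues overlapping an entity."""
    if within_ents:
        return list(cues)
    return [c for c in cues if not any(c[0] < e[1] and e[0] < c[1] for e in entities)]


tokens = "possible fracture mais douleur certaine".split()
# Hypothetical termination terms: the scope of "possible" stops at "mais".
print(forward_scope(tokens, (0, 1), termination=["mais", "."]))  # (1, 2)

# If the matcher had captured "possible fracture" as one entity,
# the cue inside it would be ignored unless within_ents=True.
print(usable_cues([(0, 1)], [(0, 2)], within_ents=False))  # []
```

Under these assumptions, "fracture" falls inside the scope of "possible" while "douleur", after the termination term "mais", does not.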

Authors and citation

The eds.hypothesis pipeline was developed by AP-HP's Data Science team.

Dalloux C., Claveau V. and Grabar N., 2017. Détection de la négation : corpus français et apprentissage supervisé. https://hal.archives-ouvertes.fr/hal-01659637
Grabar N., Claveau V. and Dalloux C., 2018. CAS: French Corpus with Clinical Cases. https://hal.archives-ouvertes.fr/hal-01937096