Hypothesis
The `eds.hypothesis` pipeline uses a simple rule-based algorithm to detect spans that are speculations rather than certain statements.
The component looks for five kinds of expressions in the text (a configuration sketch follows the list below):
- preceding hypothesis, i.e. cues that precede a hypothetical expression
- following hypothesis, i.e. cues that follow a hypothetical expression
- pseudo hypothesis: expressions that contain a hypothesis cue but are not hypotheses (e.g. "pas de doute"/"no doubt")
- hypothetical verbs: verbs indicating hypothesis (e.g. "douter")
- classic verbs conjugated in the conditional mood, which also indicate hypothesis
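These cue categories map to the `pseudo`, `preceding`, `following`, `verbs_hyp` and `verbs_eds` parameters listed in the Parameters section. The sketch below shows how custom cues could be passed at pipe creation; it assumes these parameters accept plain lists of strings and that unspecified lists fall back to the built-in defaults. The cue words themselves are illustrative, not the component's actual defaults.

```python
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")

# Hedged sketch: override some cue lists with illustrative values.
# The parameter names come from the Parameters section below; the cue
# words are assumptions, not the lists shipped with the component.
nlp.add_pipe(
    "eds.hypothesis",
    config=dict(
        preceding=["suspicion de", "possible"],  # cues preceding the expression
        following=["est suspectée"],             # cues following the expression
        verbs_hyp=["douter", "suspecter"],       # hypothetical verbs
    ),
)
```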
Examples
The following snippet matches a simple terminology, and checks whether the extracted entities are part of a speculation. It is complete and can be run as is.
```python
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
# Dummy matcher
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(douleur="douleur", fracture="fracture")),
)
nlp.add_pipe("eds.hypothesis")

text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Possible fracture du radius."
)

doc = nlp(text)

doc.ents
# Out: (douleur, fracture)

doc.ents[0]._.hypothesis
# Out: False

doc.ents[1]._.hypothesis
# Out: True
```
Extensions
The `eds.hypothesis` component declares two extensions, on both `Span` and `Token` objects (see the short example after this list):
- The `hypothesis` attribute is a boolean, set to `True` if the component predicts that the span/token is a speculation.
- The `hypothesis_` property is a human-readable string, computed from the `hypothesis` attribute. It implements a simple getter function that outputs `HYP` or `CERT`, depending on the value of `hypothesis`.
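Continuing from the snippet in the Examples section, both extensions can be read directly on the matched entities; the expected outputs follow from the values shown above.

```python
# Boolean flag and its human-readable counterpart, for each matched entity
for ent in doc.ents:
    print(ent.text, ent._.hypothesis, ent._.hypothesis_)
# douleur False CERT
# fracture True HYP
```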
Performance
The component's performance is measured on three datasets:
- The ESSAI (Dalloux et al., 2017) and CAS (Grabar et al., 2018) datasets were developed at the CNRS. The two are concatenated.
- The NegParHyp corpus was specifically developed to test the component on actual clinical notes, using pseudonymised documents from AP-HP's clinical data warehouse (CDW).
| Dataset   | Hypothesis F1 |
|-----------|---------------|
| CAS/ESSAI | 49%           |
| NegParHyp | 52%           |
NegParHyp corpus
The NegParHyp corpus was built by matching a subset of the MeSH terminology against around 300 documents from AP-HP's clinical data warehouse. Matched entities were then labelled for negation, speculation and family context.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| nlp | The pipeline object. |
| name | The component name. |
| attr | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. |
| pseudo | List of pseudo hypothesis cues. |
| preceding | List of preceding hypothesis cues. |
| following | List of following hypothesis cues. |
| verbs_hyp | List of hypothetical verbs. |
| verbs_eds | List of mainstream verbs. |
| termination | List of termination terms. |
| span_getter | Which entities should be classified. |
| on_ents_only | Deprecated, use `span_getter` instead. Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks. |
| within_ents | Whether to consider cues within entities. |
| explain | Whether to keep track of cues for each entity. |
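As a final illustration, here is a hedged sketch of how some of these options could be set when adding the pipe; the values shown are illustrative choices, not the component's defaults.

```python
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(fracture="fracture")),
)

# Illustrative configuration of two boolean options from the table above
nlp.add_pipe(
    "eds.hypothesis",
    config=dict(
        within_ents=True,  # also consider cues inside the matched entities
        explain=True,      # keep track of the cues behind each prediction
    ),
)
```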
Authors and citation
The `eds.hypothesis` pipeline was developed by AP-HP's Data Science team.
Dalloux C., Claveau V. and Grabar N., 2017. Détection de la négation : corpus français et apprentissage supervisé.
Grabar N., Claveau V. and Dalloux C., 2018. CAS: French Corpus with Clinical Cases.