Medical History
The eds.history pipeline uses a simple rule-based algorithm to detect spans that describe medical history rather than the diagnosis of a given visit.
The very definition of a medical history is not straightforward. Hence, this component only tags entities that are explicitly described as part of the medical history, eg preceded by a synonym of "medical history".
This component may also use the output of:
- the eds.sections pipeline. In that case, the entire antécédent section is tagged as a medical history.
Sections
Be careful: the eds.sections component may overextend the antécédents section. It detects section titles
and tags all the text between one title and the next as a section. Hence, should a section title go undetected after
the antécédents title, some parts of the document will erroneously be tagged as a medical history.
To curb that possibility, using the output of the eds.sections component is deactivated by default.
- the eds.dates pipeline. In that case, it will take the dates into account to tag extracted entities as a medical history or not.
Dates
To make the most of the eds.dates component, you may add the note_datetime context (cf. Adding context). It allows the pipeline to compute the duration between an absolute date (eg le 28 août 2022/August 28, 2022) and the date of the note. The birth_datetime context allows the pipeline to exclude the birth date from the extracted dates.
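
To make the role of the two contexts concrete, here is a minimal sketch in plain Python. It is not the eds.dates or eds.history implementation: the helper names `days_before_note` and `usable_date` are hypothetical, and the note date is an assumed value chosen for illustration.

```python
from datetime import datetime
from typing import Optional


def days_before_note(event: datetime, note_datetime: datetime) -> int:
    """Whole days elapsed between an absolute date and the note date."""
    return (note_datetime - event).days


def usable_date(event: datetime, birth_datetime: Optional[datetime]) -> bool:
    """Assumption: a date equal to the birth date carries no history information."""
    return birth_datetime is None or event.date() != birth_datetime.date()


# Assumed note date; the absolute date is the one quoted in the text above.
note_datetime = datetime(2022, 9, 15)
event = datetime(2022, 8, 28)

print(days_before_note(event, note_datetime))  # 18
```

Without a note date, an absolute date such as August 28, 2022 cannot be placed relative to the visit, which is why providing note_datetime is recommended.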
Usage
The following snippet matches a simple terminology, and checks whether the extracted entities are history or not. It is complete and can be run as is.
import spacy
nlp = spacy.blank("fr")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sections")
nlp.add_pipe("eds.dates")
nlp.add_pipe(
    "eds.matcher",
    config=dict(terms=dict(douleur="douleur", malaise="malaises")),
)
nlp.add_pipe(
    "eds.history",
    config=dict(
        use_sections=True,
        use_dates=True,
    ),
)
text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Il a des antécédents de malaises."
    "ANTÉCÉDENTS : "
    "- le patient a déjà eu des malaises. "
    "- le patient a eu une douleur à la jambe il y a 10 jours"
)
doc = nlp(text)
doc.ents
# Out: (douleur, malaises, malaises, douleur)
doc.ents[0]._.history
# Out: False
doc.ents[1]._.history
# Out: True
doc.ents[2]._.history # (1)
# Out: True
doc.ents[3]._.history # (2)
# Out: False
1. The entity is in the section antécédent.
2. The entity is in the section antécédent, however the extracted relative_date refers to an event that took place within 14 days.
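
The behaviour described by these two annotations can be summarised by a small decision rule. The sketch below is plain Python written under stated assumptions, not the library's code: the function name `is_history` and its arguments are hypothetical, and the assumption that a date, when available, takes precedence over section membership is inferred only from the example output.

```python
from typing import Optional


def is_history(
    in_history_section: bool,
    days_ago: Optional[int],
    history_limit: int = 14,
) -> bool:
    """Toy decision rule combining section membership and a date cue."""
    if days_ago is not None:
        # Assumption: a date decides on its own. Treatment of the exact
        # boundary (days_ago == history_limit) is also an assumption.
        return days_ago > history_limit
    # No date available: fall back on the section information.
    return in_history_section


print(is_history(True, None))  # True, as for the third entity
print(is_history(True, 10))    # False, as for the fourth entity
```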
Configuration
The pipeline can be configured using the following parameters:
| Parameter | Explanation | Default |
|---|---|---|
| attr | spaCy attribute to match on (eg NORM, TEXT, LOWER) | "NORM" |
| history | History patterns | None (use pre-defined patterns) |
| termination | Termination patterns (for syntagma/proposition extraction) | None (use pre-defined patterns) |
| use_sections | Whether to use pre-annotated sections (requires the sections pipeline) | False |
| use_dates | Whether to use dates pipeline (requires the dates pipeline and note_datetime context is recommended) | False |
| history_limit | If use_dates = True. The number of days after which the event is considered as history. | 14 (2 weeks) |
| exclude_birthdate | If use_dates = True. Whether to exclude the birth date from history dates. | True |
| closest_dates_only | If use_dates = True. Whether to include the closest dates only. If False, it includes all dates in the sentence. | True |
| on_ents_only | Whether to qualify pre-extracted entities only | True |
| explain | Whether to keep track of the cues for each entity | False |
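
As an illustration, the parameters above can be gathered in a single configuration dictionary. The values below are hypothetical and chosen only to show the shape of the configuration. The final commented line assumes an existing spaCy pipeline named `nlp` that already contains the eds.dates component.

```python
# Hypothetical settings, chosen only to illustrate the parameters listed above.
history_config = dict(
    attr="NORM",              # match on the normalised text
    use_sections=False,       # keep the default: ignore pre-annotated sections
    use_dates=True,           # requires the dates pipeline
    history_limit=30,         # events older than 30 days count as history
    exclude_birthdate=True,
    closest_dates_only=True,
    on_ents_only=True,
    explain=True,             # keep track of the cues for each entity
)

# With an existing pipeline `nlp` (assumed), the component would be added as:
# nlp.add_pipe("eds.history", config=history_config)
```

Parameters left out of the dictionary keep the defaults listed in the table.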
Declared extensions
The eds.history pipeline declares two spaCy extensions, on both Span and Token objects:
- The history attribute is a boolean, set to True if the pipeline predicts that the span/token is a medical history.
- The history_ property is a human-readable string, computed from the history attribute. It implements a simple getter function that outputs CURRENT or ATCD, depending on the value of history.
Authors and citation
The eds.history pipeline was developed by AP-HP's Data Science team.