Medical History
The eds.history pipeline uses a simple rule-based algorithm to detect spans that describe medical history rather than the diagnosis of a given visit.
The very definition of a medical history is not straightforward. Hence, this component only tags entities that are explicitly described as part of the medical history, e.g., preceded by a synonym of "medical history".
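To make the "preceded by a synonym" rule concrete, here is a minimal pure-Python sketch. It is not the EDS-NLP implementation: the cue list, the termination list, and the `tag_history` helper are all hypothetical, and the real component works on spaCy tokens rather than on regex-split words.

```python
import re

# Hypothetical cue and termination terms, chosen for illustration only.
HISTORY_CUES = {"antécédent", "antécédents", "atcd"}
TERMINATION = {"mais", "puis"}


def tag_history(sentence, entities):
    """Flag an entity as history when a cue precedes it in the sentence
    and no termination term sits between the cue and the entity."""
    tokens = re.findall(r"\w+", sentence.lower())
    flags = {}
    for entity in entities:
        word = entity.lower()
        if word not in tokens:
            flags[entity] = False
            continue
        position = tokens.index(word)  # first occurrence only
        history = False
        for token in tokens[:position]:
            if token in HISTORY_CUES:
                history = True   # a cue opens a "history" scope
            elif token in TERMINATION:
                history = False  # a termination term closes it
        flags[entity] = history
    return flags


flags = tag_history("Antécédents de malaises mais douleur ce jour.", ["malaises", "douleur"])
# "malaises" follows the cue; "douleur" comes after the termination term "mais".
```

Under these assumptions, only entities that sit in the scope opened by an explicit cue are flagged, which mirrors the conservative behaviour described above.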
This component may also use the output of:
- the eds.sections component. In that case, the entire antécédent section is tagged as a medical history.
Sections
Be careful, the eds.sections component may oversize the antécédents section. Indeed, it detects section titles and tags the entire text between a title and the next as a section. Hence, should a section title go undetected after the antécédents title, some parts of the document will erroneously be tagged as a medical history.
To curb that possibility, using the output of the eds.sections component is deactivated by default.
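The oversizing risk described above can be reproduced with a few lines of plain Python. This is only a sketch of the "title to next title" rule: the `build_sections` helper and the "MOTIF" title are hypothetical and do not come from EDS-NLP.

```python
def build_sections(text, titles):
    """Split a text into sections, each running from a detected title
    to the next detected title (or to the end of the text)."""
    found = sorted((text.index(title), title) for title in titles if title in text)
    sections = {}
    for i, (start, title) in enumerate(found):
        end = found[i + 1][0] if i + 1 < len(found) else len(text)
        sections[title] = text[start + len(title):end].strip(" :\n")
    return sections


note = "ANTÉCÉDENTS : malaises. MOTIF : douleur au bras."

# Both titles are detected: the history section stops where "MOTIF" starts.
well_split = build_sections(note, ["ANTÉCÉDENTS", "MOTIF"])

# "MOTIF" goes undetected: the history section swallows the rest of the note.
oversized = build_sections(note, ["ANTÉCÉDENTS"])
```

In the second call, "douleur au bras" ends up inside the history section even though it describes the current visit.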
- the eds.dates component. In that case, it will take the dates into account to tag extracted entities as a medical history or not.
Dates
To make the most of the eds.dates component, you may add the note_datetime context (cf. [Adding context][using-eds-nlps-helper-functions]). It allows the component to compute the duration of absolute dates (e.g., le 28 août 2022/August 28, 2022) relative to the date of the note. The birth_datetime context allows the component to exclude the birthdate from the extracted dates.
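As a rough illustration of why both contexts matter, the sketch below turns an absolute date into a number of days before the note and ignores the birthdate. The `is_history_date` helper is hypothetical, and both the 14-day default and the strict comparison are assumptions rather than documented behaviour.

```python
from datetime import datetime
from typing import Optional


def is_history_date(
    event: datetime,
    note_datetime: datetime,
    birth_datetime: Optional[datetime] = None,
    history_limit: int = 14,  # assumed threshold, in days
) -> Optional[bool]:
    """Return True when the event is older than the limit, False when it is
    recent, and None when the date is ignored (it is the birthdate)."""
    if birth_datetime is not None and event.date() == birth_datetime.date():
        return None  # the birthdate carries no information about history
    elapsed_days = (note_datetime - event).days
    return elapsed_days > history_limit  # strict comparison is an assumption


note_date = datetime(2022, 9, 30)
old_event = is_history_date(datetime(2022, 8, 28), note_date)      # 33 days before the note
recent_event = is_history_date(datetime(2022, 9, 25), note_date)   # 5 days before the note
birthdate = is_history_date(
    datetime(1980, 1, 1), note_date, birth_datetime=datetime(1980, 1, 1)
)
```

Without a note date, an absolute date such as "28 août 2022" cannot be placed relative to the visit, which is why the context is worth providing.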
Examples
The following snippet matches a simple terminology, and checks whether the extracted entities are history or not. It is complete and can be run as is.
import edsnlp, edsnlp.pipes as eds
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sections())
nlp.add_pipe(eds.dates())
nlp.add_pipe(eds.matcher(terms=dict(douleur="douleur", malaise="malaises")))
nlp.add_pipe(
    eds.history(
        use_sections=True,
        use_dates=True,
    ),
)
text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Il a des antécédents de malaises."
    "ANTÉCÉDENTS : "
    "- le patient a déjà eu des malaises. "
    "- le patient a eu une douleur à la jambe il y a 10 jours"
)
doc = nlp(text)
doc.ents
# Out: (douleur, malaises, malaises, douleur)
doc.ents[0]._.history
# Out: False
doc.ents[1]._.history
# Out: True
doc.ents[2]._.history # (1)
# Out: True
doc.ents[3]._.history # (2)
# Out: False
1. The entity is in the antécédent section.
2. The entity is in the antécédent section; however, the extracted relative_date refers to an event that took place within 14 days.
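One way to read the four outputs above is that a recent date overrides the other signals. The sketch below encodes that reading in plain Python. It is an assumption that happens to reproduce the example, not the component's actual decision logic, and the `Evidence` container and `predict_history` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Signals gathered for one entity (hypothetical container)."""
    has_cue: bool = False             # preceded by a history synonym
    in_history_section: bool = False  # located in the antécédent section
    days_ago: Optional[int] = None    # age of the associated date, if known


def predict_history(evidence: Evidence, history_limit: int = 14) -> bool:
    """Assumed combination rule: a recent date wins over cue and section."""
    has_date = evidence.days_ago is not None
    if has_date and evidence.days_ago <= history_limit:
        return False
    old_date = has_date and evidence.days_ago > history_limit
    return evidence.has_cue or evidence.in_history_section or old_date


predictions = [
    predict_history(Evidence()),                                      # douleur (au bras)
    predict_history(Evidence(has_cue=True)),                          # malaises, after "antécédents de"
    predict_history(Evidence(in_history_section=True)),               # malaises, in the section
    predict_history(Evidence(in_history_section=True, days_ago=10)),  # douleur, "il y a 10 jours"
]
```

The first entity carries no usable duration here because the example does not set note_datetime, so the absolute date "23 août 2021" cannot be converted into a number of days.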
Extensions
The eds.history component declares two extensions, on both Span and Token objects:
- The history attribute is a boolean, set to True if the component predicts that the span/token is a medical history.
- The history_ property is a human-readable string, computed from the history attribute. It implements a simple getter function that outputs CURRENT or ATCD, depending on the value of history.
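The relationship between the two extensions can be pictured with a small stand-in class. The `Annotated` class below is hypothetical and replaces spaCy's Span and Token objects for the sake of a dependency-free sketch; only the boolean-to-label mapping is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class Annotated:
    """Stand-in for a span or token carrying the boolean flag."""
    text: str
    history: bool = False

    @property
    def history_(self) -> str:
        # Read-only label derived from the boolean attribute.
        return "ATCD" if self.history else "CURRENT"


past = Annotated("malaises", history=True)
present = Annotated("douleur")
labels = (past.history_, present.history_)
```

Because the label is computed on access, it always stays in sync with the boolean flag and never needs to be set separately.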
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object. |
name | The component name. |
history | List of terms indicating a reference to medical history. |
termination | List of terms that terminate a syntagm. |
use_sections | Whether to use the sections pipeline to detect the medical history section. |
use_dates | Whether to use the dates pipeline to detect whether the event occurred a long time before the document date. |
attr | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'; a key can also be added for each regex. |
history_limit | The number of days after which the event is considered as history. |
exclude_birthdate | Whether to exclude the birthdate from history dates. |
closest_dates_only | Whether to include the closest dates only. |
span_getter | Which entities should be classified. |
on_ents_only | Deprecated. Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks. |
explain | Whether to keep track of cues for each entity. |
tz | The timezone to use. Defaults to "Europe/Paris". |
Authors and citation
The eds.history component was developed by AP-HP's Data Science team.