Medical History

The eds.history pipeline uses a simple rule-based algorithm to detect spans that describe medical history rather than the diagnosis of a given visit.

Defining what counts as medical history is not straightforward. Hence, this component only tags entities that are explicitly described as part of the medical history, e.g., preceded by a synonym of "medical history".

This component may also use the output of:

  • the eds.sections component. In that case, the entire antécédents section is tagged as a medical history.

Sections

Be careful: the eds.sections component may oversize the antécédents section. It detects section titles and tags the entire text between one title and the next as a section. Hence, should a section title go undetected after the antécédents title, some parts of the document will erroneously be tagged as a medical history.

To limit that risk, the component does not use the output of eds.sections by default.

  • the eds.dates component. In that case, it will take the dates into account to tag extracted entities as a medical history or not.

Dates

To make the most of the eds.dates component, you may add the note_datetime context (see the "Adding context" section of the documentation). It allows the component to compute how long ago an absolute date occurred (e.g., le 28 août 2022/August 28, 2022). The birth_datetime context allows the component to exclude the birthdate from the extracted dates.
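As a rough illustration of why these two context values matter, the sketch below uses only the Python standard library and does not call EDS-NLP. The helper name elapsed_since and its arguments are hypothetical; they only mirror the idea that an absolute date needs a reference date to become a duration, and that the birthdate should be filtered out first.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def elapsed_since(
    dates: List[datetime],
    note_datetime: datetime,
    birth_datetime: Optional[datetime] = None,
) -> List[timedelta]:
    """Turn absolute dates into durations before the note (hypothetical helper)."""
    durations = []
    for date in dates:
        # Skip the birthdate: it says nothing about when an event occurred.
        if birth_datetime is not None and date.date() == birth_datetime.date():
            continue
        durations.append(note_datetime - date)
    return durations


# Illustrative values only, not taken from a real document.
note_datetime = datetime(2022, 9, 15)
birth_datetime = datetime(1980, 3, 2)
dates = [datetime(2022, 8, 28), datetime(1980, 3, 2)]

durations = elapsed_since(dates, note_datetime, birth_datetime)
```

Without note_datetime there is no reference point, so an absolute date such as August 28, 2022 cannot be turned into a duration at all.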

Examples

The following snippet matches a simple terminology, and checks whether the extracted entities are history or not. It is complete and can be run as is.

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sections())
nlp.add_pipe(eds.dates())
nlp.add_pipe(eds.matcher(terms=dict(douleur="douleur", malaise="malaises")))
nlp.add_pipe(
    eds.history(
        use_sections=True,
        use_dates=True,
    ),
)

text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Il a des antécédents de malaises."
    "ANTÉCÉDENTS : "
    "- le patient a déjà eu des malaises. "
    "- le patient a eu une douleur à la jambe il y a 10 jours"
)

doc = nlp(text)

doc.ents
# Out: (douleur, malaises, malaises, douleur)

doc.ents[0]._.history
# Out: False

doc.ents[1]._.history
# Out: True

doc.ents[2]._.history  # (1)
# Out: True

doc.ents[3]._.history  # (2)
# Out: False
  1. The entity is in the antécédents section.
  2. The entity is in the antécédents section; however, the extracted relative_date refers to an event that took place within 14 days.
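The decision described in annotation 2 can be approximated with a few lines of standard-library Python. This is a sketch of the thresholding idea only, not the component's actual code: is_history_by_date is a hypothetical helper, and treating an event at exactly history_limit days as not being history is an assumption.

```python
from datetime import timedelta
from typing import Union


def is_history_by_date(
    elapsed: timedelta,
    history_limit: Union[int, timedelta] = 14,
) -> bool:
    """Hypothetical helper: is an event old enough to count as history?"""
    # Accept either a number of days or a timedelta, like the parameter's type.
    if isinstance(history_limit, int):
        history_limit = timedelta(days=history_limit)
    # Assumption: the boundary is strict (exactly 14 days is not history).
    return elapsed > history_limit


# "il y a 10 jours" (10 days ago) falls within the default 14-day limit.
recent = is_history_by_date(timedelta(days=10))
# The same event would count as history with a 7-day limit.
older = is_history_by_date(timedelta(days=10), history_limit=timedelta(days=7))
```

With the default limit, recent is False, which matches the last entity in the example above.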

Extensions

The eds.history component declares two extensions, on both Span and Token objects:

  1. The history attribute is a boolean, set to True if the component predicts that the span/token is a medical history.
  2. The history_ property is a human-readable string, computed from the history attribute. It implements a simple getter function that outputs CURRENT or ATCD, depending on the value of history.
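The relationship between the two extensions can be pictured with a plain Python property. The class below is a hypothetical stand-in for a span or token and is not part of EDS-NLP or spaCy; it only shows a read-only label derived from a boolean flag.

```python
class Annotated:
    """Hypothetical stand-in for a span or token carrying a history flag."""

    def __init__(self, history: bool = False) -> None:
        self.history = history

    @property
    def history_(self) -> str:
        # Read-only label derived from the boolean attribute.
        return "ATCD" if self.history else "CURRENT"


current = Annotated(history=False)
past = Annotated(history=True)
labels = [current.history_, past.history_]
```

Because the label is computed on access, it cannot drift out of sync with the boolean value.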

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline object.

TYPE: PipelineProtocol

name

The component name.

TYPE: Optional[str]

history

List of terms indicating a reference to medical history.

TYPE: Optional[List[str]] DEFAULT: None

termination

List of terms that mark the end of a syntagma.

TYPE: Optional[List[str]] DEFAULT: None

use_sections

Whether to use the output of the eds.sections pipeline to detect the medical history section.

TYPE: bool DEFAULT: False

use_dates

Whether to use the output of the eds.dates pipeline to detect whether the event occurred long before the document date.

TYPE: bool DEFAULT: False

attr

spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. A key can also be added for each regex.

TYPE: str DEFAULT: NORM

history_limit

The number of days after which the event is considered as history.

TYPE: Union[int, timedelta] DEFAULT: 14

exclude_birthdate

Whether to exclude the birthdate from history dates.

TYPE: bool DEFAULT: True

closest_dates_only

Whether to include the closest dates only.

TYPE: bool DEFAULT: True

span_getter

Which entities should be classified. By default, doc.ents.

TYPE: SpanGetterArg DEFAULT: None

on_ents_only

Deprecated, use span_getter instead.

Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.

  • If True, will only look at the entities in doc.ents.
  • If an iterable of strings is passed, will additionally look in doc.spans[key] for each key in the iterable.

TYPE: Union[bool, str, List[str], Set[str]] DEFAULT: None

explain

Whether to keep track of cues for each entity.

TYPE: bool DEFAULT: False

tz

The timezone to use. Defaults to "Europe/Paris".

TYPE: Optional[Union[str, tzinfo]] DEFAULT: None

Authors and citation

The eds.history component was developed by AP-HP's Data Science team.