Medical History

The eds.history pipeline uses a simple rule-based algorithm to detect spans that describe medical history rather than the diagnosis of the current visit.

Defining medical history is itself not straightforward. Hence, this component only tags entities that are explicitly described as part of the medical history, e.g., entities preceded by a synonym of "medical history".
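Below is a minimal sketch of this cue-based rule in plain Python. It makes a few assumptions. A cue must precede the entity in the same sentence. A termination term between the cue and the entity closes the cue's scope. Matching is done on lowercased raw text, whereas the real component matches on a token attribute such as NORM. The cue and termination lists are hypothetical placeholders. They are not the component's actual lists, which are configurable through the `history` and `termination` parameters.

```python
import re

# Hypothetical, deliberately tiny lists; not the component's real defaults.
HISTORY_CUES = ["antécédents", "antecedents", "atcd", "histoire de"]
TERMINATION = ["mais", "pour", "actuellement"]


def _alternation(terms):
    """Build a whole-word regex matching any of the given terms."""
    return r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b"


def is_history(sentence: str, entity: str) -> bool:
    """True if `entity` follows a history cue in `sentence` and no
    termination term appears between the cue and the entity."""
    text = sentence.lower()
    ent_start = text.find(entity.lower())
    if ent_start == -1:
        return False
    prefix = text[:ent_start]
    # End offsets of every history cue located before the entity.
    cue_ends = [m.end() for m in re.finditer(_alternation(HISTORY_CUES), prefix)]
    if not cue_ends:
        return False
    # Only the text between the last cue and the entity matters.
    between = prefix[cue_ends[-1]:]
    return re.search(_alternation(TERMINATION), between) is None


print(is_history("Il a des antécédents de malaises.", "malaises"))  # True
print(is_history("Antécédents de diabète, mais actuellement une douleur.", "douleur"))  # False
```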

This component may also use the output of:

the eds.sections component. In that case, the entire antécédents section is tagged as medical history.

Sections

Be careful: the eds.sections component may oversize the antécédents section. It detects section titles and tags all the text between one title and the next as a section. Hence, if a section title following the antécédents title goes undetected, parts of the document will erroneously be tagged as medical history.

To limit this risk, using the output of the eds.sections component is deactivated by default (use_sections=False).

the eds.dates component. In that case, the component takes dates into account to decide whether extracted entities are medical history. This option is also deactivated by default (use_dates=False).

Dates

To make the most of the eds.dates component, you may add the note_datetime context (cf. Adding context). It allows the component to compute how long before the note an absolute date occurred (e.g., le 28 août 2022/August 28, 2022). The birth_datetime context allows the component to exclude the birth date from the extracted dates.
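The two optional signals described above can be sketched in plain Python under some simplifying assumptions. Sections are (label, start, end) character spans built from detected titles. Dates are `datetime.date` objects compared against the default `history_limit` of 14 days. All function names are hypothetical. Treating exactly 14 days as "recent" is also an assumption. The demo reproduces the oversizing pitfall: once the second title goes undetected, the antécédents section swallows the rest of the note.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Section = Tuple[str, int, int]  # (label, start_offset, end_offset)


def split_sections(text: str, titles: List[Tuple[int, str]]) -> List[Section]:
    """Turn detected (offset, label) titles into sections: each one runs from
    its title to the next *detected* title, or to the end of the text."""
    ordered = sorted(titles)
    sections = []
    for i, (start, label) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else len(text)
        sections.append((label, start, end))
    return sections


def in_history_section(offset: int, sections: List[Section]) -> bool:
    """True if a character offset falls inside an "antécédents" section."""
    return any(
        label == "antécédents" and start <= offset < end
        for label, start, end in sections
    )


def date_says_history(
    event: date,
    note_date: Optional[date],
    birth_date: Optional[date] = None,
    history_limit: timedelta = timedelta(days=14),
) -> Optional[bool]:
    """True: history, False: recent, None: the date cannot be used."""
    if birth_date is not None and event == birth_date:
        return None  # the birth date says nothing about the event
    if note_date is None:
        return None  # without note_datetime, the elapsed time is unknown
    # Assumption: strictly more than history_limit before the note is history.
    return note_date - event > history_limit


text = "ANTÉCÉDENTS : malaises.\nHISTOIRE DE LA MALADIE : douleur au bras."
pain = text.find("douleur")
titles = [(0, "antécédents"), (text.find("HISTOIRE"), "histoire de la maladie")]
print(in_history_section(pain, split_sections(text, titles)))      # False
# If the second title goes undetected, the first section swallows the rest.
print(in_history_section(pain, split_sections(text, titles[:1])))  # True

note = date(2022, 9, 20)
print(date_says_history(date(2022, 8, 28), note))  # True (23 days earlier)
print(date_says_history(date(2022, 9, 15), note))  # False (5 days earlier)
print(date_says_history(date(2022, 8, 28), None))  # None
```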

Examples

The following snippet matches a simple terminology and checks whether the extracted entities are medical history. It is complete and can be run as is.

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sections())
nlp.add_pipe(eds.dates())
nlp.add_pipe(eds.matcher(terms=dict(douleur="douleur", malaise="malaises")))
nlp.add_pipe(
    eds.history(
        use_sections=True,
        use_dates=True,
    ),
)

text = (
    "Le patient est admis le 23 août 2021 pour une douleur au bras. "
    "Il a des antécédents de malaises. "
    "ANTÉCÉDENTS : "
    "- le patient a déjà eu des malaises. "
    "- le patient a eu une douleur à la jambe il y a 10 jours"
)

doc = nlp(text)

doc.ents
# Out: (douleur, malaises, malaises, douleur)

doc.ents[0]._.history
# Out: False

doc.ents[1]._.history
# Out: True

doc.ents[2]._.history  # (1)
# Out: True

doc.ents[3]._.history  # (2)
# Out: False

(1) The entity is in the antécédents section.
(2) The entity is in the antécédents section; however, the extracted relative date (il y a 10 jours) refers to an event that took place within the last 14 days, i.e., within history_limit.
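The sketch below shows one way to reconcile the cue, section and date signals. It lets a date within `history_limit` veto the other signals, which is consistent with the four outputs of the example above. This precedence is an assumption for illustration, as are the `classify` and `relative_delta` helpers and the narrow "il y a N jours" pattern. The component's actual logic may differ, and eds.dates handles far more relative-date forms.

```python
import re
from datetime import timedelta
from typing import Optional

# Hypothetical, very narrow pattern for French relative dates ("il y a 10 jours").
RELATIVE_DAYS = re.compile(r"il y a (\d+) jours?")


def relative_delta(sentence: str) -> Optional[timedelta]:
    """Return the elapsed time stated by a relative date, if any."""
    match = RELATIVE_DAYS.search(sentence.lower())
    return timedelta(days=int(match.group(1))) if match else None


def classify(
    history_context: bool,
    delta: Optional[timedelta],
    history_limit: timedelta = timedelta(days=14),
) -> bool:
    """history_context: a history cue or an antécédents section applies."""
    # Assumed precedence: a date within history_limit vetoes the other signals.
    if delta is not None and delta <= history_limit:
        return False
    return history_context


# (sentence, history_context) for the four entities of the example above.
cases = [
    ("Le patient est admis le 23 août 2021 pour une douleur au bras.", False),
    ("Il a des antécédents de malaises.", True),
    ("- le patient a déjà eu des malaises.", True),
    ("- le patient a eu une douleur à la jambe il y a 10 jours", True),
]
print([classify(ctx, relative_delta(s)) for s, ctx in cases])
# [False, True, True, False]
```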

Extensions

The eds.history component declares two extensions, on both Span and Token objects:

The history attribute is a boolean, set to True if the component predicts that the span/token is a medical history.
The history_ property is a human-readable string computed from the history attribute. It implements a simple getter function that outputs ATCD (antécédents) when history is True and CURRENT otherwise.
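A minimal sketch of such a getter follows. In spaCy, a computed attribute like this would typically be registered with `Span.set_extension("history_", getter=...)` and likewise on `Token`. Here a hypothetical `SpanStub` dataclass replaces the spaCy object so that the sketch runs without spaCy. The True → ATCD mapping follows the reading of ATCD as the French abbreviation for antécédents.

```python
from dataclasses import dataclass


@dataclass
class SpanStub:
    """Hypothetical stand-in for a spaCy Span carrying the `history` flag."""
    text: str
    history: bool = False


def history_getter(span: SpanStub) -> str:
    """Human-readable label derived from the boolean `history` attribute."""
    return "ATCD" if span.history else "CURRENT"


print(history_getter(SpanStub("malaises", history=True)))  # ATCD
print(history_getter(SpanStub("douleur")))                 # CURRENT
```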

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline object. TYPE: `PipelineProtocol`
`name`	The component name. TYPE: `Optional[str]`
`history`	List of terms indicating medical history reference. TYPE: `Optional[List[str]]` DEFAULT: `None`
`termination`	List of termination terms that end the syntagm a history cue applies to. TYPE: `Optional[List[str]]` DEFAULT: `None`
`use_sections`	Whether to use the sections pipeline to detect the medical history section. TYPE: `bool` DEFAULT: `False`
`use_dates`	Whether to use the dates pipeline to detect whether the event occurred long before the document date. TYPE: `bool` DEFAULT: `False`
`attr`	spaCy attribute to match on: either the string "TEXT" or "NORM", or a dict with the key 'term_attr' and, optionally, one key per regex. TYPE: `str` DEFAULT: `NORM`
`history_limit`	Number of days after which an event is considered medical history. TYPE: `Union[int, timedelta]` DEFAULT: `14`
`exclude_birthdate`	Whether to exclude the birth date from the history dates. TYPE: `bool` DEFAULT: `True`
`closest_dates_only`	Whether to take only the closest dates into account. TYPE: `bool` DEFAULT: `True`
`span_getter`	Which entities should be classified. By default, `doc.ents`. TYPE: `SpanGetterArg` DEFAULT: `None`
`on_ents_only`	Deprecated; use `span_getter` instead. Whether to look for matches around detected entities only, which speeds up inference in downstream tasks. If True, looks only at entities in `doc.ents`. If an iterable of strings is passed, additionally looks in `doc.spans[key]` for each key in the iterable. TYPE: `Union[bool, str, List[str], Set[str]]` DEFAULT: `None`
`explain`	Whether to keep track of the cues for each entity. TYPE: `bool` DEFAULT: `False`
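`history_limit` accepts either an int or a `timedelta`. The sketch below reads an int as a number of days, as the parameter description says, and normalizes both forms to a `timedelta`. It also illustrates one possible reading of `closest_dates_only`, where closeness is the character distance between the entity and each date mention. Both helpers are hypothetical, and the distance measure in particular is an assumption rather than the component's documented behavior.

```python
from datetime import timedelta
from typing import List, Union


def normalize_limit(history_limit: Union[int, timedelta]) -> timedelta:
    """Accept both documented forms; an int counts days."""
    if isinstance(history_limit, timedelta):
        return history_limit
    return timedelta(days=history_limit)


def dates_to_use(
    entity_start: int,
    date_offsets: List[int],
    closest_dates_only: bool = True,
) -> List[int]:
    """Pick which date mentions (character offsets) inform an entity.
    Assumption: "closest" means smallest character distance; ties are kept."""
    if not date_offsets or not closest_dates_only:
        return list(date_offsets)
    best = min(abs(d - entity_start) for d in date_offsets)
    return [d for d in date_offsets if abs(d - entity_start) == best]


print(normalize_limit(14))                   # 14 days, 0:00:00
print(dates_to_use(50, [10, 45, 60]))        # [45]
print(dates_to_use(50, [10, 45, 60], False)) # [10, 45, 60]
```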

Authors and citation

The eds.history component was developed by AP-HP's Data Science team.