
Dates

The eds.dates pipeline detects and normalises dates within a medical document. It relies on simple regular expressions to extract date mentions.
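
To illustrate the regular-expression approach, here is a minimal sketch using only Python's standard re module. The pattern NUMERIC_DATE and the helper find_numeric_dates are hypothetical and far simpler than the patterns shipped with eds.dates; they only cover numeric day/month/year mentions.

```python
import re
from datetime import date

# Hypothetical pattern (not taken from eds.dates): numeric dates such as 23/08/2021.
NUMERIC_DATE = re.compile(
    r"\b(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{4})\b"
)


def find_numeric_dates(text):
    """Return (matched text, datetime.date) pairs for valid dd/mm/yyyy mentions."""
    results = []
    for match in NUMERIC_DATE.finditer(text):
        try:
            parsed = date(int(match["year"]), int(match["month"]), int(match["day"]))
        except ValueError:
            # Skip impossible dates such as 31/02/2021.
            continue
        results.append((match.group(0), parsed))
    return results


found = find_numeric_dates("Admis le 23/08/2021, sorti le 31/02/2021.")
print(found)  # [('23/08/2021', datetime.date(2021, 8, 23))]
```

Detection (the regular expression) and normalisation (the conversion to a date object) are kept as two separate steps, which mirrors the "detect and normalise" role described above.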

Scope

The eds.dates pipeline finds both absolute (e.g. 23/08/2021) and relative (e.g. hier, la semaine dernière) dates. It also handles mentions of duration.

| Type | Example |
| --- | --- |
| absolute | 3 mai, 03/05/2020 |
| relative | hier, la semaine dernière |
| duration | pendant quatre jours |

See the tutorial for a presentation of a full pipeline featuring the eds.dates component.

Usage

import spacy

import pendulum

nlp = spacy.blank("fr")
nlp.add_pipe("eds.dates")

text = (
    "Le patient est admis le 23 août 2021 pour une douleur à l'estomac. "
    "Il lui était arrivé la même chose il y a un an pendant une semaine. "
    "Il a été diagnostiqué en mai 1995."
)

doc = nlp(text)

dates = doc.spans["dates"]
dates
# Out: [23 août 2021, il y a un an, pendant une semaine, mai 1995]

# An absolute date converts directly to a datetime.
dates[0]._.date.to_datetime()
# Out: 2021-08-23T00:00:00+02:00

# Without a reference date, a relative date is returned as an offset.
dates[1]._.date.to_datetime()
# Out: -1 year

# Supplying the note's date resolves the relative date to an absolute one.
note_datetime = pendulum.datetime(2021, 8, 27, tz="Europe/Paris")

dates[1]._.date.to_datetime(note_datetime=note_datetime)
# Out: DateTime(2020, 8, 27, 0, 0, 0, tzinfo=Timezone('Europe/Paris'))

# "mai 1995" has no day, so default_day fills it in.
date_3_output = dates[3]._.date.to_datetime(
    note_datetime=note_datetime,
    infer_from_context=True,
    tz="Europe/Paris",
    default_day=15,
)
date_3_output
# Out: DateTime(1995, 5, 15, 0, 0, 0, tzinfo=Timezone('Europe/Paris'))
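
The resolution of a relative date against the note's date can be pictured with a small standard-library sketch. The helper shift_years is hypothetical and is not how eds.dates or pendulum implement it; it only shows the idea of applying a signed offset such as "-1 year" to a reference date.

```python
from datetime import date


def shift_years(reference, years):
    """Shift `reference` by a signed number of years, clamping 29 Feb to 28 Feb."""
    try:
        return reference.replace(year=reference.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return reference.replace(year=reference.year + years, day=28)


note_date = date(2021, 8, 27)  # same reference date as in the usage example
resolved = shift_years(note_date, -1)  # "il y a un an"
print(resolved)  # 2020-08-27
```

The result agrees with the usage example, where "il y a un an" resolves to 27 August 2020 once the note's date is supplied.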

Declared extensions

The eds.dates pipeline declares one spaCy extension on the Span object: the date attribute, which contains a parsed version of the date.

Configuration

The pipeline can be configured using the following parameters:

| Parameter | Explanation | Default |
| --- | --- | --- |
| absolute | Absolute date patterns, e.g. le 5 août 2020 | None (use pre-defined patterns) |
| relative | Relative date patterns, e.g. hier | None (use pre-defined patterns) |
| durations | Duration patterns, e.g. pendant trois mois | None (use pre-defined patterns) |
| false_positive | False positive patterns to exclude | None (use pre-defined patterns) |
| detect_periods | Whether to look for periods | False |
| detect_time | Whether to look for time around dates | True |
| on_ents_only | Whether to look for dates around entities only | False |
| as_ents | Whether to save detected dates as entities | False |
| attr | spaCy attribute to match on, e.g. NORM or TEXT | "NORM" |
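
As a rough sketch of how these parameters might be overridden, the snippet below builds a configuration dictionary and rejects names that do not appear in the table. The helper build_dates_config and the constant DATES_PARAMETERS are hypothetical and not part of eds.dates; the parameter names are transcribed from the table above.

```python
# Parameter names transcribed from the configuration table (not read from the library).
DATES_PARAMETERS = {
    "absolute",
    "relative",
    "durations",
    "false_positive",
    "detect_periods",
    "detect_time",
    "on_ents_only",
    "as_ents",
    "attr",
}


def build_dates_config(**overrides):
    """Return only the overridden parameters, rejecting names absent from the table."""
    unknown = set(overrides) - DATES_PARAMETERS
    if unknown:
        raise KeyError(f"Unknown eds.dates parameter(s): {sorted(unknown)}")
    return dict(overrides)


# Look for periods and store the detected dates as entities.
dates_config = build_dates_config(detect_periods=True, as_ents=True)
print(dates_config)  # {'detect_periods': True, 'as_ents': True}
```

Assuming the component accepts its parameters through spaCy's standard config argument, the resulting dictionary could then be passed as nlp.add_pipe("eds.dates", config=dates_config). Parameters left out keep the defaults listed in the table.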

Authors and citation

The eds.dates pipeline was developed by AP-HP's Data Science team.