Dates
The eds.dates matcher detects and normalizes dates within a medical document. We use simple regular expressions to extract date mentions.
Scope
The eds.dates pipeline finds both absolute (e.g. 23/08/2021) and relative (e.g. hier "yesterday", la semaine dernière "last week") dates. It also handles mentions of duration.
Type | Example |
---|---|
absolute | 3 mai, 03/05/2020 |
relative | hier, la semaine dernière |
duration | pendant quatre jours |
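To make the three mention types concrete, here is a minimal sketch using Python's standard re module. The patterns and the find_mentions helper are hypothetical and deliberately tiny; they are not the expressions shipped with eds.dates, which cover many more formats.

```python
import re

# Hypothetical, simplified patterns for illustration only.
ABSOLUTE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
RELATIVE = re.compile(r"\b(hier|demain|la semaine dernière)\b", re.IGNORECASE)
DURATION = re.compile(
    r"\bpendant\s+(\w+)\s+(jours?|semaines?|mois|ans?)\b", re.IGNORECASE
)


def find_mentions(text):
    """Return (type, matched text) pairs ordered by position in the text."""
    found = []
    for label, pattern in (
        ("absolute", ABSOLUTE),
        ("relative", RELATIVE),
        ("duration", DURATION),
    ):
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group(0)))
    return [(label, matched) for _, label, matched in sorted(found)]


mentions = find_mentions("Vu hier, puis le 03/05/2020, traité pendant quatre jours.")
for label, matched in mentions:
    print(label, "->", matched)
```

Keeping one pattern per type mirrors the absolute, relative and duration split in the table, and makes it easy to see which rule produced each match.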
See the tutorial for a presentation of a full pipeline featuring the eds.dates component.
Usage
import edsnlp
import pendulum
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.dates")
text = (
"Le patient est admis le 23 août 2021 pour une douleur à l'estomac. "
"Il lui était arrivé la même chose il y a un an pendant une semaine. "
"Il a été diagnostiqué en mai 1995."
)
doc = nlp(text)
dates = doc.spans["dates"]
dates
# Out: [23 août 2021, il y a un an, mai 1995]
# An absolute date converts directly.
dates[0]._.date.to_datetime()
# Out: 2021-08-23T00:00:00+02:00
# A relative date cannot be resolved without a reference date.
dates[1]._.date.to_datetime()
# Out: None
# Supplying the note's date anchors "il y a un an" (a year ago).
note_datetime = pendulum.datetime(2021, 8, 27, tz="Europe/Paris")
dates[1]._.date.to_datetime(note_datetime=note_datetime)
# Out: 2020-08-27T00:00:00+02:00
# "mai 1995" has no day; default_day fills it in.
date_2_output = dates[2]._.date.to_datetime(
    note_datetime=note_datetime,
    infer_from_context=True,
    tz="Europe/Paris",
    default_day=15,
)
date_2_output
# Out: 1995-05-15T00:00:00+02:00
doc.spans["durations"]
# Out: [pendant une semaine]
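The example above returns None for a relative date until a note_datetime is supplied. The sketch below illustrates that idea with the standard library only. The resolve_relative function is a hypothetical helper written for this illustration, works on plain dates without time zones, and does not reflect how eds.dates is implemented.

```python
from datetime import date, timedelta
from typing import Optional


def resolve_relative(
    note_date: Optional[date], years: int = 0, weeks: int = 0, days: int = 0
) -> Optional[date]:
    """Shift note_date into the past by the given offset.

    Returns None when no reference date is available, since a relative
    mention has no meaning on its own.
    """
    if note_date is None:
        return None
    shifted = note_date - timedelta(weeks=weeks, days=days)
    try:
        return shifted.replace(year=shifted.year - years)
    except ValueError:
        # 29 February mapped onto a non-leap year: fall back to the 28th.
        return shifted.replace(year=shifted.year - years, day=28)


# "il y a un an" (a year ago) without, then with, a reference date.
print(resolve_relative(None, years=1))
print(resolve_relative(date(2021, 8, 27), years=1))
```

With the note dated 27 August 2021, the second call yields 27 August 2020, matching the date shown in the usage example.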
Extensions
The eds.dates pipeline declares two extensions on the Span object:
- the span._.date attribute of a date contains a parsed version of the date.
- the span._.duration attribute of a duration contains a parsed version of the duration.
As with other components, you can use the span._.value attribute to get either the parsed date or the duration depending on the span.
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object TYPE: |
name | Name of the pipeline component TYPE: |
absolute | List of regular expressions for absolute dates. TYPE: |
relative | List of regular expressions for relative dates (eg TYPE: |
duration | List of regular expressions for durations (eg TYPE: |
false_positive | List of regular expressions for false positives (e.g. phone numbers). TYPE: |
span_getter | Where to look for dates in the doc. By default, look in the whole doc. You can combine this with the TYPE: |
merge_mode | How to merge matched dates with the spans from TYPE: |
on_ents_only | Deprecated, use TYPE: |
detect_periods | Whether to detect periods (experimental) TYPE: |
detect_time | Whether to detect time inside dates DEFAULT: |
period_proximity_threshold | Max number of words between two dates to extract a period. TYPE: |
as_ents | Deprecated, use span_setter instead. Whether to treat dates as entities TYPE: |
attr | spaCy attribute to use TYPE: |
date_label | Label to use for dates TYPE: |
duration_label | Label to use for durations TYPE: |
period_label | Label to use for periods TYPE: |
span_setter | How to set matches in the doc. TYPE: |
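The period_proximity_threshold parameter is described as the maximum number of words between two dates for a period to be extracted. The sketch below shows one way such a rule could work on token offsets. The pair_periods function and the (start, end) tuple representation are assumptions made for this illustration, not the component's actual logic.

```python
from typing import List, Tuple

# A date mention as (start, end) token offsets, end exclusive.
TokenSpan = Tuple[int, int]


def pair_periods(
    dates: List[TokenSpan], threshold: int
) -> List[Tuple[TokenSpan, TokenSpan]]:
    """Pair consecutive dates separated by at most `threshold` tokens."""
    ordered = sorted(dates)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right[0] - left[1]  # number of tokens strictly between the two
        if 0 <= gap <= threshold:
            pairs.append((left, right))
    return pairs


# Tokens: du 3 mai au 10 mai , puis le 2 juin
#         0  1 2   3  4  5   6 7    8  9 10
date_spans = [(1, 3), (4, 6), (9, 11)]
periods = pair_periods(date_spans, threshold=2)
print(periods)
```

Here only "3 mai" and "10 mai" are paired, since a single token separates them, while "2 juin" sits three tokens after "10 mai" and exceeds the threshold of 2.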
Authors and citation
The eds.dates pipeline was developed by AP-HP's Data Science team.