Dates
The `eds.dates` matcher detects and normalizes dates within a medical document. We use simple regular expressions to extract date mentions.
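To give an idea of what regex-based date extraction looks like, here is a minimal, self-contained sketch. The patterns and the `extract_absolute_dates` helper are hypothetical and much simpler than the ones shipped with `eds.dates`. They only cover numeric dates such as `23/08/2021` and French month names such as `3 mai 2020` or `mai 1995`.

```python
import re

# Hypothetical, simplified patterns: NOT the patterns used by eds.dates.
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "fevrier": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8, "aout": 8,
    "septembre": 9, "octobre": 10, "novembre": 11,
    "décembre": 12, "decembre": 12,
}
MONTH_NAMES = "|".join(sorted(FRENCH_MONTHS, key=len, reverse=True))

# Numeric dates in day/month/year order, e.g. "23/08/2021"
NUMERIC_DATE = re.compile(r"\b(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{4})\b")

# Optional day, French month name, optional year, e.g. "3 mai", "mai 1995"
TEXTUAL_DATE = re.compile(
    rf"\b(?:(?P<day>\d{{1,2}})\s+)?(?P<month>{MONTH_NAMES})\b(?:\s+(?P<year>\d{{4}}))?",
    re.IGNORECASE,
)


def extract_absolute_dates(text):
    """Return (matched text, fields) pairs in reading order; missing fields are None."""
    results = []
    for match in NUMERIC_DATE.finditer(text):
        fields = {"year": int(match["year"]), "month": int(match["month"]),
                  "day": int(match["day"])}
        results.append((match.start(), match.group(), fields))
    for match in TEXTUAL_DATE.finditer(text):
        fields = {
            "year": int(match["year"]) if match["year"] else None,
            "month": FRENCH_MONTHS[match["month"].lower()],
            "day": int(match["day"]) if match["day"] else None,
        }
        results.append((match.start(), match.group(), fields))
    # Sort on the start offset only, so the dicts are never compared
    return [(span, fields) for _, span, fields in sorted(results, key=lambda r: r[0])]


text = "Le patient est admis le 23 août 2021. Contrôle le 03/05/2020, diagnostic en mai 1995."
for span, fields in extract_absolute_dates(text):
    print(span, fields)
```

Leaving missing fields as `None` instead of guessing them keeps extraction separate from completion, which in the usage example below only happens at conversion time (e.g. through `default_day`).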
Scope
The `eds.dates` pipeline finds both absolute dates (e.g. `23/08/2021`) and relative dates (e.g. `hier` for "yesterday", `la semaine dernière` for "last week"). It also handles mentions of duration.
Type | Example |
---|---|
absolute | `3 mai`, `03/05/2020` |
relative | `hier`, `la semaine dernière` |
duration | `pendant quatre jours` |
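Durations such as `pendant quatre jours` combine a quantity and a unit. As an illustration only, the hypothetical `parse_duration` helper below turns such a mention into a `datetime.timedelta`. It knows a handful of French number words and units and is not how `eds.dates` represents durations internally.

```python
import re
from datetime import timedelta

# Hypothetical lookup tables covering only a few French words.
NUMBER_WORDS = {"un": 1, "une": 1, "deux": 2, "trois": 3, "quatre": 4, "cinq": 5,
                "six": 6, "sept": 7, "huit": 8, "neuf": 9, "dix": 10}
# Months and years are left out on purpose: they have no fixed length in days.
UNITS = {"heure": "hours", "jour": "days", "semaine": "weeks"}

DURATION = re.compile(
    r"\bpendant\s+(?P<qty>\d+|[a-z]+)\s+(?P<unit>heure|jour|semaine)s?\b",
    re.IGNORECASE,
)


def parse_duration(text):
    """Return a timedelta for the first 'pendant <quantity> <unit>' mention, else None."""
    match = DURATION.search(text)
    if match is None:
        return None
    qty = match["qty"].lower()
    value = int(qty) if qty.isdigit() else NUMBER_WORDS.get(qty)
    if value is None:
        return None  # unknown number word
    return timedelta(**{UNITS[match["unit"].lower()]: value})


print(parse_duration("pendant quatre jours"))  # 4 days, 0:00:00
```

Mapping to `timedelta` only works for fixed-length units, which is why a real implementation needs a richer representation than this sketch for mentions such as `pendant deux mois`.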
See the tutorial for a presentation of a full pipeline featuring the `eds.dates` component.
Usage
import edsnlp, edsnlp.pipes as eds
import datetime
import pytz
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.dates())
text = (
    "Le patient est admis le 23 août 2021 pour une douleur à l'estomac. "
    "Il lui était arrivé la même chose il y a un an pendant une semaine. "
    "Il a été diagnostiqué en mai 1995."
)
doc = nlp(text)
dates = doc.spans["dates"]
dates
# Out: [23 août 2021, il y a un an, mai 1995]
dates[0]._.date.to_datetime()
# Out: 2021-08-23T00:00:00+02:00
dates[1]._.date.to_datetime()
# Out: None
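# Attach pytz zones with localize(): passing tzinfo=pytz.timezone(...) would use the zone's LMT offset
note_datetime = pytz.timezone("Europe/Paris").localize(datetime.datetime(2021, 8, 27))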
doc._.note_datetime = note_datetime
dates[1]._.date.to_datetime()
# Out: 2020-08-27T00:00:00+02:00
date_2_output = dates[2]._.date.to_datetime(
    note_datetime=note_datetime,
    infer_from_context=True,
    tz="Europe/Paris",
    default_day=15,
)
date_2_output
# Out: 1995-05-15T00:00:00+02:00
doc.spans["durations"]
# Out: [pendant une semaine]
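The example above relies on two resolution steps: a relative date such as `il y a un an` only becomes a datetime once a reference (`note_datetime`) is known, and an incomplete date such as `mai 1995` needs a default day. The stdlib-only sketch below mimics both steps under simplifying assumptions (naive dates, no time zone, year offsets only). The helpers `shift_years` and `complete_date` are hypothetical and do not reproduce edsnlp's implementation.

```python
import calendar
from datetime import date


def shift_years(reference, years):
    """Move `reference` by `years` (negative = past), clamping 29 February if needed."""
    target_year = reference.year + years
    last_day = calendar.monthrange(target_year, reference.month)[1]
    return reference.replace(year=target_year, day=min(reference.day, last_day))


def complete_date(year, month, day=None, default_day=15):
    """Fill a missing day with `default_day`, mirroring the default_day argument above."""
    return date(year, month, day if day is not None else default_day)


note_date = date(2021, 8, 27)
print(shift_years(note_date, -1))  # "il y a un an" -> 2020-08-27
print(complete_date(1995, 5))      # "mai 1995"     -> 1995-05-15
```

Without a reference date the relative case has nothing to shift from, which is consistent with `to_datetime()` returning `None` before `doc._.note_datetime` is set.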
Example with a collection of documents stored in the OMOP schema:
import edsnlp, edsnlp.pipes as eds
# with cols "note_id", "note_text" and optionally "note_datetime"
my_omop_df = ...
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.dates(as_ents=True))
docs = edsnlp.data.from_pandas(my_omop_df)
docs = docs.map_pipeline(nlp)
docs = docs.to_pandas(
    converter="ents",
    span_attributes=["date.datetime"],
)
print(docs)
# note_id start end label lexical_variant span_type datetime
# ...
Extensions
The `eds.dates` pipeline declares two extensions on the `Span` object:
- The `span._.date` attribute of a date contains a parsed version of the date.
- The `span._.duration` attribute of a duration contains a parsed version of the duration.
As with other components, you can use the `span._.value` attribute to get either the parsed date or the parsed duration, depending on the span.
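The `span._.value` shortcut can be pictured as a property that returns whichever parsed object the span carries. The `FakeSpanExtensions` dataclass below is a hypothetical stand-in for a spaCy `Span` and its `span._` namespace, meant only to illustrate that dispatch; in edsnlp these are actual extensions declared on `Span`.

```python
import datetime
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class FakeSpanExtensions:
    """Hypothetical stand-in for `span._`: a date span sets `date`, a duration span sets `duration`."""
    date: Optional[datetime.date] = None
    duration: Optional[datetime.timedelta] = None

    @property
    def value(self) -> Optional[Union[datetime.date, datetime.timedelta]]:
        # Return the parsed date for date spans, the parsed duration otherwise
        return self.date if self.date is not None else self.duration


date_span = FakeSpanExtensions(date=datetime.date(2021, 8, 23))
duration_span = FakeSpanExtensions(duration=datetime.timedelta(weeks=1))
print(date_span.value)      # 2021-08-23
print(duration_span.value)  # 7 days, 0:00:00
```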
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object |
name | Name of the pipeline component |
absolute | List of regular expressions for absolute dates. |
relative | List of regular expressions for relative dates (e.g. `hier`, `la semaine dernière`). |
duration | List of regular expressions for durations (e.g. `pendant quatre jours`). |
false_positive | List of regular expressions for false positives (e.g. phone numbers). |
span_getter | Where to look for dates in the doc. By default, look in the whole doc. You can combine this with the `merge_mode` argument. |
merge_mode | How to merge matched dates with the spans from `span_getter`. |
on_ents_only | Deprecated, use `span_getter` instead. |
detect_periods | Whether to detect periods (experimental) |
detect_time | Whether to detect time inside dates |
period_proximity_threshold | Maximum number of words between two dates to extract a period. |
as_ents | Deprecated, use `span_setter` instead. Whether to treat dates as entities |
attr | spaCy attribute to use |
date_label | Label to use for dates |
duration_label | Label to use for durations |
period_label | Label to use for periods |
span_setter | How to set matches in the doc. |
explain | Whether to keep track of regex cues for each entity. |
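`detect_periods` and `period_proximity_threshold` work together: two dates that are close enough can be grouped into a period (e.g. `du 3 mai au 10 mai`). The sketch below shows one possible proximity rule, assuming dates are given as word-index spans and counting the words strictly between them. The function name, the input format and the threshold value are illustrative assumptions, not edsnlp's implementation.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive


def pair_dates_into_periods(date_spans: List[Span], threshold: int) -> List[Tuple[Span, Span]]:
    """Greedily pair consecutive dates separated by at most `threshold` words.

    Each date is used in at most one period.
    """
    periods = []
    spans = sorted(date_spans)
    i = 0
    while i < len(spans) - 1:
        first, second = spans[i], spans[i + 1]
        gap = second[0] - first[1]  # number of words strictly between the two dates
        if gap <= threshold:
            periods.append((first, second))
            i += 2  # both dates are consumed by this period
        else:
            i += 1
    return periods


words = "du 3 mai au 10 mai puis contrôle le 2 juin".split()
dates = [(1, 3), (4, 6), (9, 11)]  # "3 mai", "10 mai", "2 juin"
for first, second in pair_dates_into_periods(dates, threshold=3):
    print(" ".join(words[first[0]:second[1]]))  # 3 mai au 10 mai
```

A greedy left-to-right pass keeps the rule simple; a real implementation may also look at cue words such as `du ... au` rather than distance alone.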
Authors and citation
The `eds.dates` pipeline was developed by AP-HP's Data Science team.