Visit merging
Merging visits into stays
Presentation of the problem
To get a precise view of each patient's course of care, it can be useful to merge visit occurrences into stays.
A crude way of doing so is to use the preceding_visit_occurrence_id column of the visit_occurrence table. However, this column isn't always filled, and many visits would be missed by relying on it alone.
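To make this limitation concrete, here is a minimal sketch of the crude approach, written with plain pandas on made-up data. The toy table, the `chain_root` helper and the `crude_stay_id` column are illustrative assumptions, not part of eds-scikit. Each visit is mapped to the first visit of its chain by following preceding_visit_occurrence_id; a visit whose link is missing starts a new chain even when it actually continues an earlier one.

```python
import pandas as pd

# Toy visit_occurrence table (made-up data). Visit "C" really follows "B",
# but its preceding_visit_occurrence_id was never filled in.
visits = pd.DataFrame(
    {
        "visit_occurrence_id": ["A", "B", "C"],
        "preceding_visit_occurrence_id": [None, "A", None],
    }
)

# Keep only the links that are actually filled.
preceding = {
    visit_id: previous_id
    for visit_id, previous_id in zip(
        visits["visit_occurrence_id"], visits["preceding_visit_occurrence_id"]
    )
    if pd.notna(previous_id)
}


def chain_root(visit_id, preceding):
    """Follow preceding-visit links until a visit without predecessor is reached."""
    seen = set()  # guards against accidental cycles in the links
    while visit_id in preceding and visit_id not in seen:
        seen.add(visit_id)
        visit_id = preceding[visit_id]
    return visit_id


visits["crude_stay_id"] = [
    chain_root(visit_id, preceding) for visit_id in visits["visit_occurrence_id"]
]
print(visits)
```

In this toy table, "B" is attached to "A" through its link, while "C" ends up alone in its own stay because the link was not recorded.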
The method proposed here relies on how close two visits are in time to decide whether they belong to the same stay. This is the role of the merge_visits() function.
The figure below shows how visits are merged into stays.
The merge_visits() function
from eds_scikit.io import HiveData
from eds_scikit.period.stays import merge_visits

# DBNAME is the name of your database
data = HiveData(DBNAME)
visit_occurrence = merge_visits(data.visit_occurrence)
Warning
The snippet above should run as is. However, the merge_visits() function provides many parameters that you should check in order to use it properly. Those parameters are described below and in the corresponding code reference.
Merge "close" visit occurrences to consider them as a single stay by adding STAY_ID and CONTIGUOUS_STAY_ID columns to the DataFrame.
The value of these columns will be the visit_occurrence_id of the first (meaning the oldest) visit of the stay.
From a temporal point of view, we consider that two visits belong to the same stay if either:
- They intersect
- The time difference between the end of the oldest visit and the start of the most recent one is lower than max_timedelta (for STAY_ID) or 0 (for CONTIGUOUS_STAY_ID)
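The temporal rule above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions, not the library's implementation: the `sketch_stay_ids` helper and the `SKETCH_*` column names are hypothetical, every visit is assumed to have both a start and an end datetime, none of the other merging parameters are modelled, and the gap comparison is taken as inclusive (`<=`).

```python
from datetime import datetime, timedelta

import pandas as pd

# Made-up visits for a single patient, all with known start and end datetimes.
visits = pd.DataFrame(
    {
        "visit_occurrence_id": ["A", "B", "C", "D", "E", "F"],
        "person_id": [999] * 6,
        "visit_start_datetime": [
            datetime(2021, 1, 1), datetime(2021, 1, 4), datetime(2021, 1, 12),
            datetime(2021, 1, 13), datetime(2021, 1, 19), datetime(2021, 1, 25),
        ],
        "visit_end_datetime": [
            datetime(2021, 1, 5), datetime(2021, 1, 8), datetime(2021, 1, 18),
            datetime(2021, 1, 14), datetime(2021, 1, 21), datetime(2021, 1, 27),
        ],
    }
)


def sketch_stay_ids(vo, max_gap):
    """Return, for each visit, the id of the oldest visit of its stay (illustration only)."""
    vo = vo.sort_values(["person_id", "visit_start_datetime"])
    stay_ids = []
    person, stay_id, stay_end = None, None, None
    for row in vo.itertuples(index=False):
        # A negative gap means the visit intersects the stay built so far.
        if row.person_id == person and row.visit_start_datetime - stay_end <= max_gap:
            stay_end = max(stay_end, row.visit_end_datetime)
        else:
            # New patient or gap too large: this visit opens a new stay.
            person = row.person_id
            stay_id = row.visit_occurrence_id
            stay_end = row.visit_end_datetime
        stay_ids.append(stay_id)
    return pd.Series(stay_ids, index=vo.index)


visits["SKETCH_STAY_ID"] = sketch_stay_ids(visits, timedelta(hours=24))
visits["SKETCH_CONTIGUOUS_STAY_ID"] = sketch_stay_ids(visits, timedelta(0))
print(visits[["visit_occurrence_id", "SKETCH_STAY_ID", "SKETCH_CONTIGUOUS_STAY_ID"]])
```

With a 24-hour tolerance, visit "E" (starting one day after "C" ends) joins the stay of "C"; with a zero tolerance it opens its own stay, which is the difference between the two identifiers.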
Additionally, other parameters are available to further adjust the merging rules. See below.
PARAMETER | DESCRIPTION |
---|---|
vo |
The
TYPE:
|
remove_deleted_visits |
Whether to remove deleted visits from the merging procedure.
Deleted visits are extracted via the
TYPE:
|
long_stay_filtering |
Filtering method for long and/or non-closed visits. First of all, visits with no starting date
won't be merged with any other visit, and visits with no ending date will have a temporary
"theoretical" ending date set by
Long stays are determined by the
TYPE:
|
long_stay_threshold |
Minimum visit duration value to consider a visit as candidate for "long visits filtering"
TYPE:
|
open_stay_end_datetime |
Datetime to use in order to fill the
TYPE:
|
max_timedelta |
Maximum time difference between the end of a visit and the start of another to consider
them as belonging to the same stay. This duration is internally converted to seconds before
comparing. Thus, if you want e.g. to merge visits happening in two consecutive days, you should use
TYPE:
|
merge_different_hospitals |
Whether to allow visits occurring in different hospitals to be merged into the same stay
TYPE:
|
merge_different_source_values |
Whether to allow visits with different
Warning: You should avoid merging visits where
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
vo
|
Visit occurrence DataFrame with additional
TYPE:
|
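Regarding max_timedelta, the note about the conversion to seconds means that the gap is measured between exact datetimes, not between calendar days. The short example below illustrates this with standard-library arithmetic only; the two datetimes are made up, and the inclusive comparison is an assumption.

```python
from datetime import datetime, timedelta

# Made-up datetimes: the two visits happen on two consecutive calendar days.
end_of_first_visit = datetime(2021, 1, 1, 8, 0)      # ends on Jan 1 at 08:00
start_of_second_visit = datetime(2021, 1, 2, 20, 0)  # starts on Jan 2 at 20:00

gap_seconds = (start_of_second_visit - end_of_first_visit).total_seconds()
one_day_seconds = timedelta(days=1).total_seconds()

print(gap_seconds)                     # 129600.0, i.e. 36 hours
print(one_day_seconds)                 # 86400.0
print(gap_seconds <= one_day_seconds)  # False
```

Although the visits fall on consecutive days, the gap is 36 hours, so a one-day tolerance would not be enough to group them.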
Examples:
>>> import pandas as pd
>>> from datetime import datetime, timedelta
>>> from eds_scikit.period.stays import merge_visits
>>> data = {
...     1: ['A', 999, datetime(2021,1,1), datetime(2021,1,5), 'hospitalisés'],
...     2: ['B', 999, datetime(2021,1,4), datetime(2021,1,8), 'hospitalisés'],
...     3: ['C', 999, datetime(2021,1,12), datetime(2021,1,18), 'hospitalisés'],
...     4: ['D', 999, datetime(2021,1,13), datetime(2021,1,14), 'urgence'],
...     5: ['E', 999, datetime(2021,1,19), datetime(2021,1,21), 'hospitalisés'],
...     6: ['F', 999, datetime(2021,1,25), datetime(2021,1,27), 'hospitalisés'],
...     7: ['G', 999, datetime(2017,1,1), None, 'hospitalisés'],
... }
>>> vo = pd.DataFrame.from_dict(
...     data,
...     orient="index",
...     columns=[
...         "visit_occurrence_id",
...         "person_id",
...         "visit_start_datetime",
...         "visit_end_datetime",
...         "visit_source_value",
...     ],
... )
>>> vo
   visit_occurrence_id  person_id  visit_start_datetime  visit_end_datetime  visit_source_value
1                    A        999            2021-01-01          2021-01-05        hospitalisés
2                    B        999            2021-01-04          2021-01-08        hospitalisés
3                    C        999            2021-01-12          2021-01-18        hospitalisés
4                    D        999            2021-01-13          2021-01-14             urgence
5                    E        999            2021-01-19          2021-01-21        hospitalisés
6                    F        999            2021-01-25          2021-01-27        hospitalisés
7                    G        999            2017-01-01                 NaT        hospitalisés
>>> vo = merge_visits(
...     vo,
...     remove_deleted_visits=True,
...     long_stay_threshold=timedelta(days=365),
...     long_stay_filtering="all",
...     max_timedelta=timedelta(hours=24),
...     merge_different_hospitals=False,
...     merge_different_source_values=["hospitalisés", "urgence"],
... )
>>> vo
   visit_occurrence_id  person_id  visit_start_datetime  visit_end_datetime  visit_source_value  STAY_ID  CONTIGUOUS_STAY_ID
1                    A        999            2021-01-01          2021-01-05        hospitalisés        A                   A
2                    B        999            2021-01-04          2021-01-08        hospitalisés        A                   A
3                    C        999            2021-01-12          2021-01-18        hospitalisés        C                   C
4                    D        999            2021-01-13          2021-01-14             urgence        C                   C
5                    E        999            2021-01-19          2021-01-21        hospitalisés        C                   E
6                    F        999            2021-01-25          2021-01-27        hospitalisés        F                   F
7                    G        999            2017-01-01                 NaT        hospitalisés        G                   G
Computing stay duration
Presentation of the problem
Once visits are grouped into stays, you might want to compute the duration of each stay.
The get_stays_duration() function
from eds_scikit.period.stays import get_stays_duration
This function should be called after merge_visits(). It adds a STAY_DURATION column.
vo = get_stays_duration(
    vo,
    algo="visits_date_difference",
    missing_end_date_handling="fill",
)
There are two ways to compute stay durations. Pick the "algo" value that suits your needs.
Available algorithms (values for "algo")
The stay duration corresponds to the difference between the end datetime of the stay's last visit and the start datetime of the stay's first visit.
The stay duration corresponds to the sum of the durations of all visits of the stay, with overlaps between visits taken into account.
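The difference between the two definitions shows up as soon as a stay contains a gap between visits. The sketch below computes both on made-up data with plain pandas. It is not the implementation of get_stays_duration(): the helper names `date_difference` and `summed_duration` are hypothetical, durations are returned as pandas Timedelta objects (the unit of STAY_DURATION is not assumed here), and "handling overlapping" is interpreted as counting overlapping periods only once.

```python
from datetime import datetime

import pandas as pd

# Made-up stay "C" made of three visits: D lies inside C, E starts one day after C ends.
visits = pd.DataFrame(
    {
        "visit_occurrence_id": ["C", "D", "E"],
        "visit_start_datetime": [
            datetime(2021, 1, 12), datetime(2021, 1, 13), datetime(2021, 1, 19),
        ],
        "visit_end_datetime": [
            datetime(2021, 1, 18), datetime(2021, 1, 14), datetime(2021, 1, 21),
        ],
        "STAY_ID": ["C", "C", "C"],
    }
)


def date_difference(stay):
    """Latest end of the stay minus its earliest start (gaps are included)."""
    return stay["visit_end_datetime"].max() - stay["visit_start_datetime"].min()


def summed_duration(stay):
    """Sum of visit durations, counting overlapping periods only once."""
    total = pd.Timedelta(0)
    covered_until = None
    intervals = sorted(zip(stay["visit_start_datetime"], stay["visit_end_datetime"]))
    for start, end in intervals:
        if covered_until is not None:
            start = max(start, covered_until)  # skip the part already counted
        if end > start:
            total += end - start
        covered_until = end if covered_until is None else max(covered_until, end)
    return total


durations = {
    stay_id: (date_difference(stay), summed_duration(stay))
    for stay_id, stay in visits.groupby("STAY_ID")
}
print(durations["C"])
```

Here the first definition gives 9 days (January 12 to January 21), while the second gives 8 days, because the day between the end of "C" and the start of "E" is not covered by any visit and "D" is entirely contained in "C".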
Please check the documentation for additional parameters.