Visit merging

Merging visits into stays

Presentation of the problem

In order to have a precise view of each patient's course of care, it can be useful to merge together visit occurrences into stays.

A crude way of doing so is by using the preceding_visit_occurrence_id column in the visit_occurrence table. However, this column isn't always filled, and a lot of visits would be missed by using only this method.

The method proposed here puts two visits in the same stay when they are close enough in time. This is the role of the merge_visits() function.

The figure below shows how visits are merged into stays.

The `merge_visits()` function

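from eds_scikit.io import HiveData
from eds_scikit.period.stays import merge_visits

# DBNAME is a placeholder for the name of your database
data = HiveData(DBNAME)

# Assumption: the table is exposed as an attribute of `data`
visit_occurrence = data.visit_occurrence
visit_occurrence = merge_visits(visit_occurrence)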

Warning

The snippet above should run as is. However, merge_visits() takes many parameters that you should review in order to use it properly. They are described below and in the corresponding code reference.

Merge "close" visit occurrences to consider them as a single stay by adding STAY_ID and CONTIGUOUS_STAY_ID columns to the DataFrame.

The value of these columns will be the visit_occurrence_id of the first (meaning the oldest) visit of the stay.

From a temporal point of view, we consider that two visits belong to the same stay if either:

They intersect
The time difference between the end of the oldest and the start of the most recent is lower than max_timedelta (for STAY_ID) or 0 (for CONTIGUOUS_STAY_ID)
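
The temporal rule above can be sketched as a single predicate. This is only an illustration, not the library's implementation: `same_stay` is a hypothetical helper, and treating a gap exactly equal to the threshold as mergeable is an assumption.

```python
from datetime import datetime, timedelta


def same_stay(start_a, end_a, start_b, end_b, max_gap):
    """Hypothetical helper: True if two visits fall into the same stay."""
    # Order the two visits so that the first one is the one starting earlier.
    (_, first_end), (second_start, _) = sorted(
        [(start_a, end_a), (start_b, end_b)]
    )
    # Gap between the end of the earlier visit and the start of the later one.
    # It is zero or negative when the two visits intersect.
    gap = second_start - first_end
    # Assumption: a gap equal to the threshold still merges the visits.
    return gap <= max_gap


first = (datetime(2021, 1, 12), datetime(2021, 1, 18))
second = (datetime(2021, 1, 19), datetime(2021, 1, 21))

# STAY_ID rule with max_timedelta = 24 hours
print(same_stay(*first, *second, max_gap=timedelta(hours=24)))  # True
# CONTIGUOUS_STAY_ID rule: the threshold is 0
print(same_stay(*first, *second, max_gap=timedelta(0)))  # False
```

With a threshold of 0, only visits that intersect or touch end up in the same contiguous stay.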

Additionally, other parameters are available to further adjust the merging rules. See below.

PARAMETER	DESCRIPTION
`vo`	The `visit_occurrence` DataFrame, with at least the following columns: visit_occurrence_id; person_id; visit_start_datetime_calc (from preprocessing); visit_end_datetime (from preprocessing). Depending on the input parameters, additional columns may be required: care_site_id (if `merge_different_hospitals == True`); visit_source_value (if `merge_different_source_values != False`); row_status_source_value (if `remove_deleted_visits == True`). TYPE: `DataFrame`
`remove_deleted_visits`	Whether to remove deleted visits from the merging procedure. Deleted visits are identified via the `row_status_source_value` column. TYPE: `bool` DEFAULT: `True`
`long_stay_filtering`	Filtering method for long and/or non-closed visits. First of all, visits with no starting date won't be merged with any other visit, and visits with no ending date will have a temporary "theoretical" ending date set by `datetime.now()`. That being said, some visits are years long because they were never closed. If other visits occurred during this timespan, they could all be merged into the same stay. To avoid this issue, filtering can be done depending on the `long_stay_filtering` value. `all`: all long stays (closed and open) are removed from the merging procedure. `open`: only long open stays are removed from the merging procedure. `None`: no filtering is done on long visits. Long stays are determined by the `long_stay_threshold` value. TYPE: `Optional[str]` DEFAULT: `'all'`
`long_stay_threshold`	Minimum visit duration value to consider a visit as candidate for "long visits filtering" TYPE: `timedelta` DEFAULT: `timedelta(days=365)`
`open_stay_end_datetime`	Datetime to use in order to fill the `visit_end_datetime` of open visits. This is necessary in order to compute stay duration and to filter long stays. If not provided `datetime.now()` will be used. You might provide the extraction date of your data here. TYPE: `Optional[datetime]` DEFAULT: `None`
`max_timedelta`	Maximum time difference between the end of a visit and the start of another to consider them as belonging to the same stay. This duration is internally converted to seconds before comparing. Thus, if you want e.g. to merge visits happening on two consecutive days, you should use `timedelta(days=2)` and NOT `timedelta(days=1)`, in order to take into account extreme cases such as a first visit ending on Monday at 00:01 and another one starting on Tuesday at 23:59. TYPE: `timedelta` DEFAULT: `timedelta(days=2)`
`merge_different_hospitals`	Whether to allow visits occurring in different hospitals to be merged into the same stay. TYPE: `bool` DEFAULT: `False`
`merge_different_source_values`	Whether to allow visits with different `visit_source_value` to be merged into the same stay. Values can be: `True`: the `visit_source_value` isn't taken into account for the merging. `False`: only visits with the same `visit_source_value` can be merged into the same stay. `List[str]`: only visits whose `visit_source_value` is in the provided list can be merged together. Warning: You should avoid merging visits where `visit_source_value == "hospitalisation incomplète"`, because those stays are often never closed. TYPE: `Union[bool, List[str]]` DEFAULT: `['hospitalisés', 'urgence']`
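
The advice on `max_timedelta` can be checked with plain `datetime` arithmetic. The dates below are made up for the illustration (2021-01-04 is a Monday), and the `<=` comparison is an assumption about how the threshold is applied.

```python
from datetime import datetime, timedelta

# A first visit ends on Monday at 00:01, the next one starts on Tuesday at 23:59.
end_first = datetime(2021, 1, 4, 0, 1)
start_second = datetime(2021, 1, 5, 23, 59)

gap = start_second - end_first
print(gap)  # 1 day, 23:58:00

# The gap is almost two full days, so a one-day threshold is not enough.
print(gap <= timedelta(days=1))  # False
print(gap <= timedelta(days=2))  # True
```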

RETURNS	DESCRIPTION
`vo`	Visit occurrence DataFrame with additional `STAY_ID` and `CONTIGUOUS_STAY_ID` columns TYPE: `DataFrame`

Examples:

>>> import pandas as pd
>>> from datetime import datetime, timedelta
>>> data = {
    1 : ['A', 999, datetime(2021,1,1), datetime(2021,1,5), 'hospitalisés'],
    2 : ['B', 999, datetime(2021,1,4), datetime(2021,1,8), 'hospitalisés'],
    3 : ['C', 999, datetime(2021,1,12), datetime(2021,1,18), 'hospitalisés'],
    4 : ['D', 999, datetime(2021,1,13), datetime(2021,1,14), 'urgence'],
    5 : ['E', 999, datetime(2021,1,19), datetime(2021,1,21), 'hospitalisés'],
    6 : ['F', 999, datetime(2021,1,25), datetime(2021,1,27), 'hospitalisés'],
    7 : ['G', 999, datetime(2017,1,1), None, "hospitalisés"]
}
>>> vo = pd.DataFrame.from_dict(
    data,
    orient="index",
    columns=[
        "visit_occurrence_id",
        "person_id",
        "visit_start_datetime",
        "visit_end_datetime",
        "visit_source_value",
    ],
)
>>> vo
  visit_occurrence_id  person_id visit_start_datetime visit_end_datetime visit_source_value
1                   A        999           2021-01-01         2021-01-05       hospitalisés
2                   B        999           2021-01-04         2021-01-08       hospitalisés
3                   C        999           2021-01-12         2021-01-18       hospitalisés
4                   D        999           2021-01-13         2021-01-14            urgence
5                   E        999           2021-01-19         2021-01-21       hospitalisés
6                   F        999           2021-01-25         2021-01-27       hospitalisés
7                   G        999           2017-01-01                NaT       hospitalisés

>>> vo = merge_visits(
        vo,
        remove_deleted_visits=True,
        long_stay_threshold=timedelta(days=365),
        long_stay_filtering="all",
        max_timedelta=timedelta(hours=24),
        merge_different_hospitals=False,
        merge_different_source_values=["hospitalisés", "urgence"],
)
>>> vo
  visit_occurrence_id  person_id visit_start_datetime visit_end_datetime visit_source_value STAY_ID CONTIGUOUS_STAY_ID
1                   A        999           2021-01-01         2021-01-05       hospitalisés       A                  A
2                   B        999           2021-01-04         2021-01-08       hospitalisés       A                  A
3                   C        999           2021-01-12         2021-01-18       hospitalisés       C                  C
4                   D        999           2021-01-13         2021-01-14            urgence       C                  C
5                   E        999           2021-01-19         2021-01-21       hospitalisés       C                  E
6                   F        999           2021-01-25         2021-01-27       hospitalisés       F                  F
7                   G        999           2017-01-01                NaT       hospitalisés       G                  G
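
In the output above, visit G stays alone: it is open and started in 2017, so `long_stay_filtering="all"` keeps it out of the merging procedure. A minimal sketch of that filter is given below. `is_filtered_long_visit` is a hypothetical helper, not part of eds_scikit, and comparing with `>=` against the threshold is an assumption. The sketch also assumes that the start date is known.

```python
from datetime import datetime, timedelta


def is_filtered_long_visit(start, end, threshold, open_stay_end, filtering="all"):
    """Hypothetical helper: True if the visit is left out of the merging."""
    if filtering is None:
        return False  # no filtering on long visits
    is_open = end is None
    # Open visits get a temporary "theoretical" end date.
    effective_end = open_stay_end if is_open else end
    is_long = (effective_end - start) >= threshold
    if filtering == "open":
        return is_long and is_open
    return is_long  # filtering == "all"


extraction_date = datetime(2021, 2, 1)  # made-up value for open_stay_end_datetime
one_year = timedelta(days=365)

# Visit G: open since 2017, hence filtered out
print(is_filtered_long_visit(datetime(2017, 1, 1), None, one_year, extraction_date))  # True
# Visit A: closed after 4 days, hence kept
print(
    is_filtered_long_visit(
        datetime(2021, 1, 1), datetime(2021, 1, 5), one_year, extraction_date
    )
)  # False
```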

Computing stay duration

Presentation of the problem

Once visits are grouped into stays, you might want to compute the duration of each stay.

The `get_stays_duration()` function

from eds_scikit.period.stays import get_stays_duration

This function should be used after calling the merge_visits() function. It adds a STAY_DURATION column.

vo = get_stays_duration(
    vo,
    algo="visits_date_difference",
    missing_end_date_handling="fill",
)

There are two ways to compute stay durations. Pick the "algo" value that suits your needs.

Available algorithms (values for "algo")

'visits_date_difference': The stay duration corresponds to the difference between the end datetime of the stay's last visit and the start datetime of the stay's first visit.

'sum_of_visits_duration': The stay duration corresponds to the sum of the durations of all visits of the stay (overlapping visits are handled).
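
The difference between the two algorithms can be sketched in pure Python on a single stay. The two functions below are hypothetical stand-ins, not the code run by get_stays_duration(). They assume that the "last visit" is the one ending latest and that "handling overlapping" means overlapping time is counted only once.

```python
from datetime import datetime, timedelta


def visits_date_difference(visits):
    """End of the last visit minus start of the first visit."""
    return max(end for _, end in visits) - min(start for start, _ in visits)


def sum_of_visits_duration(visits):
    """Sum of visit durations, counting overlapping time only once."""
    total = timedelta(0)
    current_start, current_end = None, None
    for start, end in sorted(visits):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Three visits of one stay: the second overlaps the first, the third follows a one-day gap
stay = [
    (datetime(2021, 1, 12), datetime(2021, 1, 18)),
    (datetime(2021, 1, 13), datetime(2021, 1, 14)),
    (datetime(2021, 1, 19), datetime(2021, 1, 21)),
]
print(visits_date_difference(stay))  # 9 days, 0:00:00
print(sum_of_visits_duration(stay))  # 8 days, 0:00:00
```

The one-day gap between the visits counts toward the first result but not the second.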

Please check the documentation for additional parameters.
