Visit merging

Merging visits into stays

Presentation of the problem

In order to have a precise view of each patient's course of care, it can be useful to merge together visit occurrences into stays.

A crude way of doing so is by using the preceding_visit_occurrence_id column in the visit_occurrence table. However, this column isn't always filled, and a lot of visits would be missed by using only this method.

The method proposed here relies on how close two visits are in time to decide whether they belong to the same stay. This is the role of the merge_visits() function.

The figure below shows how visits are merged into stays.

[Figure: visit occurrences merged into stays]

The merge_visits() function

from eds_scikit.io import HiveData
from eds_scikit.period.stays import merge_visits

# DBNAME is the name of your database
data = HiveData(DBNAME)

visit_occurrence = merge_visits(data.visit_occurrence)

Warning

The snippet above should run as is. However, the merge_visits() function provides many parameters that you should check in order to use it properly. Those parameters are described below and in the corresponding code reference.

Merge "close" visit occurrences to consider them as a single stay by adding STAY_ID and CONTIGUOUS_STAY_ID columns to the DataFrame.

The value of these columns will be the visit_occurrence_id of the first (meaning the oldest) visit of the stay.

From a temporal point of view, we consider that two visits belong to the same stay if either:

  • They intersect
  • The time difference between the end of the older visit and the start of the more recent one is lower than max_timedelta (for STAY_ID) or 0 (for CONTIGUOUS_STAY_ID)
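
As a rough illustration of the temporal rule above, the sketch below groups the visits of a single patient by walking through them in start order. It is not the eds_scikit implementation: the helper name assign_stays is hypothetical, the comparison against the threshold is assumed to be inclusive, and the rules on hospitals, source values and deleted visits are ignored.

```python
from datetime import datetime, timedelta


def assign_stays(visits, max_timedelta):
    """Hypothetical helper: map each visit id to the id of the first visit of its stay.

    `visits` is a list of (visit_id, start, end) tuples for ONE patient.
    """
    stay_of = {}
    current_stay, current_end = None, None
    for visit_id, start, end in sorted(visits, key=lambda v: v[1]):
        # A negative gap means the visit intersects the running stay.
        # A new stay starts only when the gap exceeds the threshold
        # (inclusive comparison: an assumption of this sketch).
        if current_stay is None or start - current_end > max_timedelta:
            current_stay, current_end = visit_id, end
        else:
            current_end = max(current_end, end)
        stay_of[visit_id] = current_stay
    return stay_of


visits = [
    ("A", datetime(2021, 1, 1), datetime(2021, 1, 5)),
    ("B", datetime(2021, 1, 4), datetime(2021, 1, 8)),
    ("C", datetime(2021, 1, 12), datetime(2021, 1, 18)),
    ("D", datetime(2021, 1, 13), datetime(2021, 1, 14)),
    ("E", datetime(2021, 1, 19), datetime(2021, 1, 21)),
    ("F", datetime(2021, 1, 25), datetime(2021, 1, 27)),
]

stay_id = assign_stays(visits, timedelta(hours=24))
contiguous_stay_id = assign_stays(visits, timedelta(0))
```

With a 24-hour threshold, E joins the stay started by C, whereas with a threshold of 0 it starts its own contiguous stay.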

Additionally, other parameters are available to further adjust the merging rules. See below.

PARAMETER DESCRIPTION
vo

The visit_occurrence DataFrame, with at least the following columns:

  • visit_occurrence_id
  • person_id
  • visit_start_datetime_calc (from preprocessing)
  • visit_end_datetime (from preprocessing)

Depending on the input parameters, additional columns may be required:

  • care_site_id (if merge_different_hospitals == True)
  • visit_source_value (if merge_different_source_values != False)
  • row_status_source_value (if remove_deleted_visits == True)

TYPE: DataFrame

remove_deleted_visits

Whether to remove deleted visits from the merging procedure. Deleted visits are identified via the row_status_source_value column.

TYPE: bool DEFAULT: True

long_stay_filtering

Filtering method for long and/or non-closed visits. First of all, visits with no starting date won't be merged with any other visit, and visits with no ending date will have a temporary "theoretical" ending date set by datetime.now(). That being said, some visits are years long because they weren't closed in time. If other visits occurred during this timespan, they could all be merged into the same stay. To avoid this issue, filtering can be done depending on the long_stay_filtering value:

  • all: All long stays (closed and open) are removed from the merging procedure
  • open: Only long open stays are removed from the merging procedure
  • None: No filtering is done on long visits

Long stays are determined by the long_stay_threshold value.

TYPE: Optional[str] DEFAULT: 'all'

long_stay_threshold

Minimum duration for a visit to be considered a candidate for "long visits filtering"

TYPE: timedelta DEFAULT: timedelta(days=365)

open_stay_end_datetime

Datetime used to fill the visit_end_datetime of open visits. This is necessary in order to compute stay duration and to filter long stays. If not provided, datetime.now() will be used. You might provide the extraction date of your data here.

TYPE: Optional[datetime] DEFAULT: None
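
To make the interplay between long_stay_filtering, long_stay_threshold and open_stay_end_datetime more concrete, here is a minimal sketch of the decision for a single visit. The function name is_excluded_as_long is hypothetical and the logic is only one reading of the descriptions above, not the library's code.

```python
from datetime import datetime, timedelta


def is_excluded_as_long(
    start,
    end,
    long_stay_filtering="all",
    long_stay_threshold=timedelta(days=365),
    open_stay_end_datetime=None,
):
    """Hypothetical helper: should this visit be left out of the merging procedure?"""
    is_open = end is None
    # Open visits get a temporary "theoretical" end date.
    effective_end = (open_stay_end_datetime or datetime.now()) if is_open else end
    is_long = (effective_end - start) >= long_stay_threshold
    if long_stay_filtering == "all":
        return is_long
    if long_stay_filtering == "open":
        return is_long and is_open
    return False  # None: no filtering on long visits


extraction_date = datetime(2021, 2, 1)  # assumed extraction date, for illustration
short_closed = (datetime(2021, 1, 1), datetime(2021, 1, 5))
long_closed = (datetime(2019, 1, 1), datetime(2020, 6, 1))
long_open = (datetime(2017, 1, 1), None)

for mode in ("all", "open", None):
    flags = [
        is_excluded_as_long(s, e, mode, open_stay_end_datetime=extraction_date)
        for s, e in (short_closed, long_closed, long_open)
    ]
    print(mode, flags)
```

Passing a fixed open_stay_end_datetime, such as the extraction date, keeps the result reproducible, whereas falling back on datetime.now() makes it depend on when the code is run.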

max_timedelta

Maximum time difference between the end of a visit and the start of another to consider them as belonging to the same stay. This duration is internally converted to seconds before comparison. Thus, if you want, for example, to merge visits happening on two consecutive days, you should use timedelta(days=2) and NOT timedelta(days=1), in order to account for extreme cases such as a first visit ending on Monday at 00:01 and another one starting on Tuesday at 23:59.

TYPE: timedelta DEFAULT: timedelta(days=2)
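
The Monday/Tuesday case can be checked with plain datetime arithmetic. The dates below are arbitrary (4 January 2021 is a Monday) and only serve to illustrate why one day is not enough.

```python
from datetime import datetime, timedelta

first_visit_end = datetime(2021, 1, 4, 0, 1)       # Monday 00:01
second_visit_start = datetime(2021, 1, 5, 23, 59)  # Tuesday 23:59

gap = second_visit_start - first_visit_end
print(gap)                        # 1 day, 23:58:00
print(gap.total_seconds())        # 172680.0
print(gap <= timedelta(days=1))   # False: the visits would not be merged
print(gap <= timedelta(days=2))   # True: the visits would be merged
```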

merge_different_hospitals

Whether to allow visits occurring in different hospitals to be merged into the same stay

TYPE: bool DEFAULT: False

merge_different_source_values

Whether to allow visits with different visit_source_value to be merged into the same stay. Values can be:

  • True: the visit_source_value isn't taken into account for the merging
  • False: only visits with the same visit_source_value can be merged into the same stay
  • List[str]: only visits whose visit_source_value is in the provided list can be merged together.

Warning: You should avoid merging visits where visit_source_value == "hospitalisation incomplète", because those stays are often never closed.

TYPE: Union[bool, List[str]] DEFAULT: ['hospitalisés', 'urgence']
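
The three accepted forms can be read as a pairwise compatibility test between two visits. The sketch below spells out that reading; sources_allow_merge is a hypothetical helper written for illustration, not a function of eds_scikit.

```python
def sources_allow_merge(source_a, source_b, merge_different_source_values):
    """Hypothetical helper: may two visits be merged, given their visit_source_value?"""
    if merge_different_source_values is True:
        # The visit_source_value is ignored altogether.
        return True
    if merge_different_source_values is False:
        # Only identical source values are compatible.
        return source_a == source_b
    # Otherwise a list of values: both visits must belong to it.
    allowed = set(merge_different_source_values)
    return source_a in allowed and source_b in allowed


default = ["hospitalisés", "urgence"]
print(sources_allow_merge("hospitalisés", "urgence", default))
print(sources_allow_merge("hospitalisés", "hospitalisation incomplète", default))
print(sources_allow_merge("hospitalisés", "urgence", False))
```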

RETURNS DESCRIPTION
vo

Visit occurrence DataFrame with additional STAY_ID and CONTIGUOUS_STAY_ID columns

TYPE: DataFrame

Examples:

>>> import pandas as pd
>>> from datetime import datetime, timedelta
>>> data = {
...     1 : ['A', 999, datetime(2021,1,1), datetime(2021,1,5), 'hospitalisés'],
...     2 : ['B', 999, datetime(2021,1,4), datetime(2021,1,8), 'hospitalisés'],
...     3 : ['C', 999, datetime(2021,1,12), datetime(2021,1,18), 'hospitalisés'],
...     4 : ['D', 999, datetime(2021,1,13), datetime(2021,1,14), 'urgence'],
...     5 : ['E', 999, datetime(2021,1,19), datetime(2021,1,21), 'hospitalisés'],
...     6 : ['F', 999, datetime(2021,1,25), datetime(2021,1,27), 'hospitalisés'],
...     7 : ['G', 999, datetime(2017,1,1), None, 'hospitalisés']
... }
>>> vo = pd.DataFrame.from_dict(
...     data,
...     orient="index",
...     columns=[
...         "visit_occurrence_id",
...         "person_id",
...         "visit_start_datetime",
...         "visit_end_datetime",
...         "visit_source_value",
...     ],
... )
>>> vo
  visit_occurrence_id  person_id visit_start_datetime visit_end_datetime visit_source_value
1                   A        999           2021-01-01         2021-01-05       hospitalisés
2                   B        999           2021-01-04         2021-01-08       hospitalisés
3                   C        999           2021-01-12         2021-01-18       hospitalisés
4                   D        999           2021-01-13         2021-01-14            urgence
5                   E        999           2021-01-19         2021-01-21       hospitalisés
6                   F        999           2021-01-25         2021-01-27       hospitalisés
7                   G        999           2017-01-01                NaT       hospitalisés
>>> vo = merge_visits(
...     vo,
...     remove_deleted_visits=True,
...     long_stay_threshold=timedelta(days=365),
...     long_stay_filtering="all",
...     max_timedelta=timedelta(hours=24),
...     merge_different_hospitals=False,
...     merge_different_source_values=["hospitalisés", "urgence"],
... )
>>> vo
  visit_occurrence_id  person_id visit_start_datetime visit_end_datetime visit_source_value STAY_ID CONTIGUOUS_STAY_ID
1                   A        999           2021-01-01         2021-01-05       hospitalisés       A                  A
2                   B        999           2021-01-04         2021-01-08       hospitalisés       A                  A
3                   C        999           2021-01-12         2021-01-18       hospitalisés       C                  C
4                   D        999           2021-01-13         2021-01-14            urgence       C                  C
5                   E        999           2021-01-19         2021-01-21       hospitalisés       C                  E
6                   F        999           2021-01-25         2021-01-27       hospitalisés       F                  F
7                   G        999           2017-01-01                NaT       hospitalisés       G                  G

Computing stay duration

Presentation of the problem

Once visits are grouped into stays, you might want to compute the duration of each stay.

The get_stays_duration() function

from eds_scikit.period.stays import get_stays_duration

This function should be used after calling the merge_visits() function. It adds a STAY_DURATION column.

vo = get_stays_duration(
    vo,
    algo="visits_date_difference",
    missing_end_date_handling="fill",
)

There are two ways to compute stay durations. Pick the "algo" value that suits your needs.

Available algorithms (values for "algo")

With algo="visits_date_difference", as in the call above, the stay duration corresponds to the difference between the end datetime of the stay's last visit and the start datetime of the stay's first visit.

With the other algorithm, the stay duration corresponds to the sum of the durations of all visits of the stay, with overlapping visits handled.
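
The difference between the two approaches shows up as soon as a stay contains a gap between visits. The sketch below computes both durations for one stay made of three visits. Both function names are hypothetical, "handling overlapping" is assumed to mean that overlapping periods are counted once, and the end of the stay's last visit is taken as the latest end datetime. None of this is taken from the library's code.

```python
from datetime import datetime, timedelta


def date_difference(visits):
    """Latest end of the stay minus earliest start of the stay."""
    return max(end for _, end in visits) - min(start for start, _ in visits)


def sum_of_durations(visits):
    """Sum of visit durations, counting overlapping periods only once."""
    total = timedelta(0)
    covered_until = None
    for start, end in sorted(visits):
        if covered_until is None or start >= covered_until:
            # No overlap with what has been counted so far.
            total += end - start
            covered_until = end
        elif end > covered_until:
            # Partial overlap: only count the part not yet covered.
            total += end - covered_until
            covered_until = end
    return total


# One stay made of three visits: (start, end)
stay = [
    (datetime(2021, 1, 12), datetime(2021, 1, 18)),
    (datetime(2021, 1, 13), datetime(2021, 1, 14)),  # fully inside the first visit
    (datetime(2021, 1, 19), datetime(2021, 1, 21)),  # starts one day after the first ends
]

print(date_difference(stay))   # 9 days, 0:00:00
print(sum_of_durations(stay))  # 8 days, 0:00:00
```

The one-day gap between the first and third visits is included in the date difference but not in the sum of durations.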

Please check the documentation for additional parameters.