EDS-TeVa Documentation


Documentation: https://aphp.github.io/edsteva/latest/

Source Code: https://github.com/aphp/edsteva


Getting Started

EDS-TeVa provides a set of tools to characterize the temporal variability of data induced by the dynamics of the clinical IT system.

Context

Real-world data is subject to important temporal drifts that may be caused by a variety of factors [1]. In particular, data availability fluctuates with the deployment of clinical software and its clinical use. The dynamics of software deployment and adoption are not trivial, as they depend on the care site and on the category of data considered.

Installation

Requirements

EDS-TeVa stands on the shoulders of Spark 2.4, which runs on Java 8 and Python ~3.7.1, so both must be available in your environment.

You can install EDS-TeVa through pip:

pip install edsteva

We recommend pinning the library version in your projects, or using a strict package manager like Poetry.

pip install edsteva==0.2.8

Working example: administrative records relative to visits

Let's consider a basic category of data: administrative records relative to visits. A visit is characterized by a care site, a length of stay, a stay type (full hospitalisation, emergency, consultation, etc.) and other characteristics. In this example, the objective is to estimate the availability of visit records with respect to time, care site and stay type.

1. Load your data

As detailed in the dedicated section, EDS-TeVa expects to work with Pandas or Koalas DataFrames. We provide various connectors to facilitate data fetching, namely a Hive connector, a Postgres connector and a LocalData connector.

# Option 1: Hive connector
from edsteva.io import HiveData

db_name = "my_db"
tables_to_load = [
    "visit_occurrence",
    "visit_detail",
    "care_site",
    "fact_relationship",
]
data = HiveData(db_name, tables_to_load=tables_to_load)
data.visit_occurrence

# Option 2: Postgres connector
from edsteva.io import PostgresData

db_name = "my_db"
schema = "my_schema"
user = "my_username"
data = PostgresData(db_name, schema=schema, user=user)
data.visit_occurrence

# Option 3: LocalData connector
import os
from edsteva.io import LocalData

# MY_FOLDER_PATH is a placeholder for the path to your local data folder
folder = os.path.abspath(MY_FOLDER_PATH)

data = LocalData(folder)
data.visit_occurrence

2. Choose a Probe or create a new Probe

Probe

A Probe is a Python class designed to compute a completeness predictor $c(t)$ that characterizes data availability of a target variable over time $t$.

In this example, $c(t)$ predicts the availability of administrative records relative to visits. It is defined for each characteristic (care site, stay type, age range, length of stay, etc.) as the number of visits $n_{visit}(t)$ per month $t$, normalized by the maximum number of records per month $n_{max} = \max_{t}(n_{visit}(t))$ computed over the entire study period:

$$c(t) = \frac{n_{visit}(t)}{n_{max}}$$

If the maximum number of records per month $n_{max}$ is equal to 0, we consider that the completeness predictor $c(t)$ is also equal to 0.
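To make the definition concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the function name `completeness_predictor` and the monthly counts are hypothetical, and the input is assumed to be a mapping from month to number of visits.

```python
def completeness_predictor(n_visit):
    """Return c(t) = n_visit(t) / n_max for each month t.

    n_visit maps a month label to a number of visits (hypothetical input).
    """
    n_max = max(n_visit.values(), default=0)
    if n_max == 0:
        # Convention stated above: c(t) = 0 when n_max = 0
        return {month: 0.0 for month in n_visit}
    return {month: n / n_max for month, n in n_visit.items()}


# Hypothetical monthly counts for a single care site and stay type
counts = {"2021-01": 2, "2021-02": 4, "2021-03": 1}
c = completeness_predictor(counts)
print(c)
```

The month with the most records always gets $c(t) = 1$, so the predictor is relative to the busiest month of the study period rather than to an external reference.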

The VisitProbe is already available by default in the library.

2.1 Compute your Probe

The compute() method takes a Data object as input and stores the computed completeness predictor c(t) in the predictor attribute of a Probe:

from edsteva.probes import VisitProbe

probe_path = "my_path/visit.pkl"

visit = VisitProbe()
visit.compute(
    data,
    care_site_levels=["Hospital", "Pole", "UF"],
    stay_types={
        "All": ".*",
        "Urg_Hospit": "urgence|hospitalisés",
    },
    care_site_specialties=None,
    stay_sources=None,
    length_of_stays=None,
    provenance_sources=None,
    age_ranges=None,
)
visit.save(path=probe_path)
visit.predictor.head()

Saved to /my_path/visit.pkl

| care_site_level | care_site_id | care_site_short_name | stay_type | date | n_visit | c |
|---|---|---|---|---|---|---|
| Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg_Hospit' | 2019-05-01 | 233.0 | 0.841 |
| Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'All' | 2021-04-01 | 393.0 | 0.640 |
| Pôle/DMU | 8312027648 | Care site 2 | 'Urg_Hospit' | 2017-03-01 | 204.0 | 0.497 |
| Pôle/DMU | 8312027648 | Care site 2 | 'All' | 2018-08-01 | 22.0 | 0.274 |
| Hôpital | 8312022130 | Care site 3 | 'Urg_Hospit' | 2022-02-01 | 9746.0 | 0.769 |

2.2 Filter your Probe

In this example, we are interested in three hospitals. We consequently filter data before any further analysis.

from edsteva.probes import VisitProbe

care_site_short_name = ["Hôpital-1", "Hôpital-2", "Hôpital-3"]

filtered_visit = VisitProbe()
filtered_visit.load(path=probe_path)
filtered_visit.filter_care_site(care_site_short_names=care_site_short_name)

2.3 Visualize your Probe

Interactive dashboard

Interactive dashboards can be used to visualize the average completeness predictor c(t) of the selected care sites and stay types.

from edsteva.viz.dashboards import probe_dashboard

probe_dashboard(
    probe=filtered_visit,
)
Interactive dashboard is available here

Static plot

If you need a static plot for a report, a paper or anything else, you can use the probe_plot() function. It returns the top plot of the dashboard without the interactive filters. Consequently, you have to specify the filters in the inputs of the function.

from edsteva.viz.plots import probe_plot

plot_path = "my_path/visit.html"
stay_type = "All"

probe_plot(
    probe=filtered_visit,
    care_site_level="Hospital",
    stay_type=stay_type,
    save_path=plot_path,
)


3. Choose a Model or create a new Model

Model

A Model is a Python class designed to fit a function $f_\Theta(t)$ to each completeness predictor $c(t)$ of a Probe. The fit process estimates the coefficients $\Theta$ with metrics to characterize the temporal variability of data availability.

In this example, the model fits a step function $f_{t_0, c_0}(t)$ to the completeness predictor $c(t)$ with coefficients $\Theta = (t_0, c_0)$:

$$f_{t_0, c_0}(t) = c_0 \ \mathbb{1}_{t \geq t_0}(t)$$
  • the characteristic time $t_0$ estimates the time after which the data is available.
  • the characteristic value $c_0$ estimates the stabilized routine completeness.

It also computes the following error metric that estimates the stability of the data after $t_0$:

$$error = \frac{\sum_{t_0 \leq t \leq t_{max}} \epsilon(t)^2}{t_{max} - t_0}, \qquad \epsilon(t) = f_{t_0, c_0}(t) - c(t)$$
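The step function and its error metric can be illustrated with a short sketch. It is not the library's code: months are represented by integer indices, the values of $c(t)$ are made up, and returning 0 when $t_0 = t_{max}$ is an assumption made here only to avoid a division by zero.

```python
def step_function(t, t_0, c_0):
    """Value of the step function: c_0 from t_0 onwards, 0 before."""
    return c_0 if t >= t_0 else 0.0


def step_error(c, t_0, c_0):
    """Sum of squared residuals for t_0 <= t <= t_max, divided by (t_max - t_0)."""
    t_max = len(c) - 1
    if t_max == t_0:
        return 0.0  # assumption of this sketch: avoid dividing by zero
    squared = sum(
        (step_function(t, t_0, c_0) - c[t]) ** 2 for t in range(t_0, t_max + 1)
    )
    return squared / (t_max - t_0)


c = [0.0, 0.1, 0.8, 1.0, 0.9, 0.9]  # hypothetical c(t), one value per month
error = step_error(c, t_0=2, c_0=0.9)
print(round(error, 4))
```

A small error means that $c(t)$ stays close to $c_0$ once the data has become available.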

This step function Model is available in the library.

3.1 Fit your Model

The fit method takes a Probe as input. It estimates the coefficients, for example by minimizing a quadratic loss function, and computes the metrics. Finally, it stores the estimated coefficients and the computed metrics in the estimates attribute of the Model.

from edsteva.models.step_function import StepFunction

model_path = "my_path/fitted_visit.pkl"

step_function_model = StepFunction()
step_function_model.fit(probe=filtered_visit)
step_function_model.save(model_path)
step_function_model.estimates.head()

Saved to /my_path/fitted_visit.pkl

| care_site_level | care_site_id | stay_type | t_0 | c_0 | error |
|---|---|---|---|---|---|
| Pôle/DMU | 8312056386 | 'Urg_Hospit' | 2019-05-01 | 0.397 | 0.040 |
| Pôle/DMU | 8312056386 | 'All' | 2017-04-01 | 0.583 | 0.028 |
| Pôle/DMU | 8312027648 | 'Urg_Hospit' | 2021-03-01 | 0.677 | 0.022 |
| Pôle/DMU | 8312027648 | 'All' | 2018-08-01 | 0.764 | 0.014 |
| Pôle/DMU | 8312022130 | 'Urg_Hospit' | 2022-02-01 | 0.652 | 0.027 |

3.2 Visualize your fitted Probe

Interactive dashboard

Interactive dashboards can be used to visualize the average completeness predictor c(t) along with the fitted step function of the selected care sites and stay types.

from edsteva.viz.dashboards import probe_dashboard

probe_dashboard(
    probe=filtered_visit,
    fitted_model=step_function_model,
)
Interactive dashboard is available here.

Static plot

If you need a static plot for a report, a paper or anything else, you can use the probe_plot() function. It returns the top plot of the dashboard without the interactive filters. Consequently, you have to specify the filters in the inputs of the function.

from edsteva.viz.plots import probe_plot

plot_path = "my_path/fitted_visit.html"
stay_type = "All"

probe_plot(
    probe=filtered_visit,
    fitted_model=step_function_model,
    care_site_level="Hospital",
    stay_type=stay_type,
    save_path=plot_path,  # if specified, the plot is saved to this path
)


4. Set the thresholds to fix the deployment bias

Now that we have estimated $t_0$, $c_0$ and error for each care site and each stay type, one can set a threshold for each estimate in order to select only the care sites where the visits are available over the period of interest.

4.1 Visualize estimates distributions

Visualizing the density plots and the medians of the estimates can help you set the threshold values.

from edsteva.viz.plots import estimates_densities_plot

estimates_densities_plot(
    probe=filtered_visit,
    fitted_model=step_function_model,
)

4.2 Set the thresholds

The estimates dashboard provides a representation of the overall deviation from the Model on the top and interactive sliders on the bottom that allow you to vary the thresholds. The idea is to set the thresholds that keep the most care sites while having an acceptable overall deviation.

from edsteva.viz.dashboards import estimates_dashboard

estimates_dashboard(
    probe=filtered_visit,
    fitted_model=step_function_model,
)

The threshold dashboard is available here.

4.3 Fix the deployment bias

Once you set the thresholds, you can extract for each stay type the care sites for which data availability is estimated to be stable over the entire study period.

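t_0_max = "2020-01-01"  # latest acceptable characteristic time t_0
c_0_min = 0.6  # minimum acceptable characteristic value c_0
error_max = 0.05  # maximum acceptable error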

estimates = step_function_model.estimates
selected_care_site = estimates[
    (estimates["t_0"] <= t_0_max)
    & (estimates["c_0"] >= c_0_min)
    & (estimates["error"] <= error_max)
]
print(selected_care_site["care_site_id"].unique())
[8312056386, 8457691845, 8745619784, 8314578956, 8314548764, 8542137845]

In this example, the $c_0$ and error thresholds have been set around the median (cf. distribution). However, this method is arbitrary and you have to find the appropriate method for your study with the help of the estimates dashboard.
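As an illustration of a median-based choice, the following sketch derives the $c_0$ and error thresholds from the medians of a few estimates and applies them. The records are made up and the list-of-dicts layout is an assumption of this sketch, not the format of the estimates attribute.

```python
from statistics import median

# Hypothetical estimates, one record per care site (not real data)
estimates = [
    {"care_site_id": "A", "t_0": "2018-03-01", "c_0": 0.70, "error": 0.02},
    {"care_site_id": "B", "t_0": "2019-06-01", "c_0": 0.55, "error": 0.01},
    {"care_site_id": "C", "t_0": "2021-02-01", "c_0": 0.80, "error": 0.05},
    {"care_site_id": "D", "t_0": "2017-01-01", "c_0": 0.60, "error": 0.04},
    {"care_site_id": "E", "t_0": "2019-11-01", "c_0": 0.65, "error": 0.03},
]

t_0_max = "2020-01-01"  # chosen from the study period, not from the data
c_0_min = median(e["c_0"] for e in estimates)
error_max = median(e["error"] for e in estimates)

# ISO-formatted dates compare correctly as strings
selected = [
    e["care_site_id"]
    for e in estimates
    if e["t_0"] <= t_0_max and e["c_0"] >= c_0_min and e["error"] <= error_max
]
print(c_0_min, error_max, selected)
```

Median thresholds discard roughly half of the care sites on each criterion by construction, which is one reason to treat them as a starting point rather than a rule.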

Limitations

EDS-TeVa provides modelling tools to characterize the temporal variability of your data. It does not intend to provide direct methods to fix the deployment bias. As an open-source library, EDS-TeVa is also here to host a discussion in order to facilitate collective methodological convergence on flexible solutions. The default methods proposed in this example are intended to be reviewed and challenged by the user community.

Make it your own

The working example above describes the canonical usage workflow. However, you will probably need different Probes, Models, Visualizations and methods to set the thresholds for your projects. The components already available in the library are listed below, but if they do not meet your requirements, you are encouraged to create your own.

Contribution

If you have implemented your own component, or even if you have just thought of a new one, do not hesitate to share it with the community by following the contribution guidelines. Contributions are welcome and greatly appreciated! Every little bit helps, and credit will always be given.

Available components

The VisitProbe computes $c_{visit}(t)$, the availability of administrative stays:

$$c(t) = \frac{n_{visit}(t)}{n_{max}}$$

Where $n_{visit}(t)$ is the number of administrative stays, $t$ is the month and $n_{max} = \max_{t}(n_{visit}(t))$.

If the maximum number of records per month $n_{max}$ is equal to 0, we consider that the completeness predictor $c(t)$ is also equal to 0.

from edsteva.probes import VisitProbe

visit = VisitProbe()
visit.compute(
    data,
    stay_types={
        "Urg": "urgence",
        "Hospit": "hospitalisés",
        "Urg_Hospit": "urgence|hospitalisés",
    },
)
visit.predictor.head()
| care_site_level | care_site_id | care_site_short_name | stay_type | date | n_visit | c |
|---|---|---|---|---|---|---|
| Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg' | 2019-05-01 | 233.0 | 0.841 |
| Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg' | 2021-04-01 | 393.0 | 0.640 |
| Pôle/DMU | 8312027648 | Care site 2 | 'Hospit' | 2017-03-01 | 204.0 | 0.497 |
| Pôle/DMU | 8312027648 | Care site 2 | 'Urg' | 2018-08-01 | 22.0 | 0.274 |
| Hôpital | 8312022130 | Care site 3 | 'Urg_Hospit' | 2022-02-01 | 9746.0 | 0.769 |

The NoteProbe computes $c_{note}(t)$, the availability of clinical documents.

The per_visit_default algorithm computes $c(t)$, the availability of clinical documents linked to patients' administrative stays:

$$c(t) = \frac{n_{with\,doc}(t)}{n_{visit}(t)}$$

Where $n_{visit}(t)$ is the number of administrative stays, $n_{with\,doc}(t)$ is the number of visits having at least one document and $t$ is the month.

If the number of visits $n_{visit}(t)$ is equal to 0, we consider that the completeness predictor $c(t)$ is also equal to 0.
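The per-visit ratio can be sketched in plain Python as follows. This is only an illustration under assumed inputs: the function name, the `(visit_id, month)` tuples and the list of note-bearing visit identifiers are hypothetical and do not reflect the library's internal tables.

```python
from collections import Counter


def per_visit_completeness(visits, note_visit_ids, months):
    """Return c(t) = n_with_doc(t) / n_visit(t) for each month in `months`.

    visits: list of (visit_id, month) tuples.
    note_visit_ids: visit_id of every recorded note (duplicates allowed).
    """
    with_doc = set(note_visit_ids)  # a visit counts once, however many notes it has
    n_visit = Counter(month for _, month in visits)
    n_with_doc = Counter(month for visit_id, month in visits if visit_id in with_doc)
    return {
        # Convention stated above: c(t) = 0 when there is no visit in month t
        month: (n_with_doc[month] / n_visit[month] if n_visit[month] else 0.0)
        for month in months
    }


visits = [
    (1, "2021-01"), (2, "2021-01"),
    (3, "2021-02"), (4, "2021-02"), (5, "2021-02"), (6, "2021-02"),
]
note_visit_ids = [1, 1, 3, 4, 5]  # visit 1 has two notes
c = per_visit_completeness(visits, note_visit_ids, ["2021-01", "2021-02", "2021-03"])
print(c)
```

Unlike the per-record predictors, this ratio is bounded by the visit count of the same month, so it does not depend on a maximum taken over the whole study period.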

from edsteva.probes import NoteProbe

note = NoteProbe(completeness_predictor="per_visit_default")
note.compute(
    data,
    stay_types={
        "Urg": "urgence",
        "Hospit": "hospitalisés",
        "Urg_Hospit": "urgence|hospitalisés",
    },
    note_types={
        "All": ".*",
        "CRH": "crh",
        "Ordonnance": "ordo",
        "CR Passage Urgences": "urge",
    },
)
note.predictor.head()
| care_site_level | care_site_id | care_site_short_name | stay_type | note_type | date | n_visit | n_visit_with_note | c |
|---|---|---|---|---|---|---|---|---|
| Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg' | 'All' | 2019-05-01 | 233.0 | 196.0 | 0.841 |
| Unité Fonctionnelle (UF) | 8653815660 | Care site 1 | 'Hospit' | 'CRH' | 2017-04-01 | 393.0 | 252.0 | 0.640 |
| Pôle/DMU | 8312027648 | Care site 2 | 'Hospit' | 'CRH' | 2021-03-01 | 204.0 | 101.0 | 0.497 |
| Pôle/DMU | 8312056379 | Care site 2 | 'Urg' | 'Ordonnance' | 2018-08-01 | 22.0 | 6.0 | 0.274 |
| Hôpital | 8312022130 | Care site 3 | 'Urg_Hospit' | 'CR Passage Urgences' | 2022-02-01 | 9746.0 | 7495.0 | 0.769 |

The per_note_default algorithm computes $c(t)$, the availability of clinical documents, as follows:

$$c(t) = \frac{n_{note}(t)}{n_{max}}$$

Where $n_{note}(t)$ is the number of clinical documents, $t$ is the month and $n_{max} = \max_{t}(n_{note}(t))$.

If the maximum number of recorded notes per month $n_{max}$ is equal to 0, we consider that the completeness predictor $c(t)$ is also equal to 0.

from edsteva.probes import NoteProbe

note = NoteProbe(completeness_predictor="per_note_default")
note.compute(
    data,
    stay_types={
        "Urg": "urgence",
        "Hospit": "hospitalisés",
        "Urg_Hospit": "urgence|hospitalisés",
    },
    note_types={
        "All": ".*",
        "CRH": "crh",
        "Ordonnance": "ordo",
        "CR Passage Urgences": "urge",
    },
)
note.predictor.head()
| care_site_level | care_site_id | care_site_short_name | stay_type | note_type | date | n_note | c |
|---|---|---|---|---|---|---|---|
| Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg' | 'All' | 2019-05-01 | 233.0 | 0.841 |
| Unité Fonctionnelle (UF) | 8653815660 | Care site 1 | 'Hospit' | 'CRH' | 2017-04-01 | 393.0 | 0.640 |
| Pôle/DMU | 8312027648 | Care site 2 | 'Hospit' | 'CRH' | 2021-03-01 | 204.0 | 0.497 |
| Pôle/DMU | 8312056379 | Care site 2 | 'Urg' | 'Ordonnance' | 2018-08-01 | 22.0 | 0.274 |
| Hôpital | 8312022130 | Care site 3 | 'Urg_Hospit' | 'CR Passage Urgences' | 2022-02-01 | 9746.0 | 0.769 |

The ConditionProbe computes $c_{condition}(t)$, the availability of claim data.

The per_visit_default algorithm computes $c(t)$, the availability of claim data linked to patients' administrative stays:

$$c(t) = \frac{n_{with\,condition}(t)}{n_{visit}(t)}$$

Where $n_{visit}(t)$ is the number of administrative stays, $n_{with\,condition}(t)$ is the number of stays having at least one claim code (e.g. ICD-10) recorded and $t$ is the month.

If the number of visits $n_{visit}(t)$ is equal to 0, we consider that the completeness predictor $c(t)$ is also equal to 0.

Care site level

AREM claim data are only available at hospital level.

from edsteva.probes import ConditionProbe

condition = ConditionProbe(completeness_predictor="per_visit_default")
condition.compute(
    data,
    stay_types={
        "Hospit": "hospitalisés",
    },
    diag_types={
        "All": ".*",
        "DP/DR": "DP|DR",
    },
    condition_types={
        "All": ".*",
        "Pulmonary_embolism": "I26",
    },
    source_systems=["AREM", "ORBIS"],
)
condition.predictor.head()
| care_site_level | care_site_id | care_site_short_name | stay_type | diag_type | condition_type | source_systems | date | n_visit | n_visit_with_condition | c |
|---|---|---|---|---|---|---|---|---|---|---|
| Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'All' | 'Pulmonary_embolism' | AREM | 2019-05-01 | 233.0 | 196.0 | 0.841 |
| Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'DP/DR' | 'Pulmonary_embolism' | AREM | 2021-04-01 | 393.0 | 252.0 | 0.640 |
| Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'All' | 'Pulmonary_embolism' | AREM | 2017-03-01 | 204.0 | 101.0 | 0.497 |
| Unité Fonctionnelle (UF) | 8312027648 | Care site 2 | 'Hospit' | 'All' | 'All' | ORBIS | 2018-08-01 | 22.0 | 6.0 | 0.274 |
| Pôle/DMU | 8312022130 | Care site 3 | 'Hospit' | 'DP/DR' | 'Pulmonary_embolism' | ORBIS | 2022-02-01 | 9746.0 | 7495.0 | 0.769 |

The per_condition_default algorithm computes $c(t)$, the availability of claim data, as follows:

$$c(t) = \frac{n_{condition}(t)}{n_{max}}$$

Where $n_{condition}(t)$ is the number of claim codes (e.g. ICD-10) recorded, $t$ is the month and $n_{max} = \max_{t}(n_{condition}(t))$.

If the maximum number of recorded diagnoses per month $n_{max}$ is equal to 0, we consider that the completeness predictor $c(t)$ is also equal to 0.

from edsteva.probes import ConditionProbe

condition = ConditionProbe(completeness_predictor="per_condition_default")
condition.compute(
    data,
    stay_types={
        "All": ".*",
        "Hospit": "hospitalisés",
    },
    diag_types={
        "All": ".*",
        "DP/DR": "DP|DR",
    },
    condition_types={
        "All": ".*",
        "Pulmonary_embolism": "I26",
    },
    source_systems=["AREM", "ORBIS"],
)
condition.predictor.head()
| care_site_level | care_site_id | care_site_short_name | stay_type | diag_type | condition_type | source_systems | date | n_condition | c |
|---|---|---|---|---|---|---|---|---|---|
| Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'All' | 'Pulmonary_embolism' | AREM | 2019-05-01 | 233.0 | 0.841 |
| Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'DP/DR' | 'Pulmonary_embolism' | AREM | 2021-04-01 | 393.0 | 0.640 |
| Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'All' | 'Pulmonary_embolism' | AREM | 2017-03-01 | 204.0 | 0.497 |
| Unité Fonctionnelle (UF) | 8312027648 | Care site 2 | 'Hospit' | 'All' | 'All' | ORBIS | 2018-08-01 | 22.0 | 0.274 |
| Pôle/DMU | 8312022130 | Care site 3 | 'Hospit' | 'DP/DR' | 'Pulmonary_embolism' | ORBIS | 2022-02-01 | 9746.0 | 0.769 |

The BiologyProbe computes $c(t)$, the availability of laboratory data.

The per_visit_default algorithm computes $c(t)$, the availability of laboratory data linked to patients' administrative stays:

$$c(t) = \frac{n_{with\,biology}(t)}{n_{visit}(t)}$$

Where $n_{visit}(t)$ is the number of administrative stays, $n_{with\,biology}(t)$ is the number of stays having at least one biological measurement recorded and $t$ is the month.

If the number of visits $n_{visit}(t)$ is equal to 0, we consider that the completeness predictor $c(t)$ is also equal to 0.

Care site level

Laboratory data are only available at hospital level.

from edsteva.probes import BiologyProbe

biology = BiologyProbe(completeness_predictor="per_visit_default")
biology.compute(
    data,
    stay_types={
        "Hospit": "hospitalisés",
    },
    concepts_sets={
        "Créatinine": "E3180|G1974|J1002|A7813|A0094|G1975|J1172|G7834|F9409|F9410|C0697|H4038|F2621",
        "Leucocytes": "A0174|K3232|H6740|E4358|C9784|C8824|E6953",
    },
)
biology.predictor.head()
| care_site_level | care_site_id | care_site_short_name | stay_type | concepts_sets | date | n_visit | n_visit_with_measurement | c |
|---|---|---|---|---|---|---|---|---|
| Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'Créatinine' | 2019-05-01 | 233.0 | 196.0 | 0.841 |
| Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'Leucocytes' | 2021-04-01 | 393.0 | 252.0 | 0.640 |
| Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'Créatinine' | 2017-03-01 | 204.0 | 101.0 | 0.497 |
| Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'Leucocytes' | 2018-08-01 | 22.0 | 6.0 | 0.274 |
| Hôpital | 8312022130 | Care site 3 | 'Hospit' | 'Leucocytes' | 2022-02-01 | 9746.0 | 7495.0 | 0.769 |

The per_measurement_default algorithm computes $c(t)$, the availability of biological measurements:

$$c(t) = \frac{n_{biology}(t)}{n_{max}}$$

Where $n_{biology}(t)$ is the number of biological measurements, $t$ is the month and $n_{max} = \max_{t}(n_{biology}(t))$.

If the maximum number of recorded biological measurements per month $n_{max}$ is equal to 0, we consider that the completeness predictor $c(t)$ is also equal to 0.

Care site level

Laboratory data are only available at hospital level.

from edsteva.probes import BiologyProbe

biology = BiologyProbe(completeness_predictor="per_measurement_default")
biology.compute(
    data,
    stay_types={
        "Hospit": "hospitalisés",
    },
    concepts_sets={
        "Créatinine": "E3180|G1974|J1002|A7813|A0094|G1975|J1172|G7834|F9409|F9410|C0697|H4038|F2621",
        "Leucocytes": "A0174|K3232|H6740|E4358|C9784|C8824|E6953",
    },
)
biology.predictor.head()
| care_site_level | care_site_id | care_site_short_name | stay_type | concepts_sets | date | n_measurement | c |
|---|---|---|---|---|---|---|---|
| Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'Créatinine' | 2019-05-01 | 233.0 | 0.841 |
| Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'Leucocytes' | 2021-04-01 | 393.0 | 0.640 |
| Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'Créatinine' | 2017-03-01 | 204.0 | 0.497 |
| Unité Fonctionnelle (UF) | 8312027648 | Care site 2 | 'Hospit' | 'Leucocytes' | 2018-08-01 | 22.0 | 0.274 |
| Pôle/DMU | 8312022130 | Care site 3 | 'Hospit' | 'Leucocytes' | 2022-02-01 | 9746.0 | 0.769 |

The StepFunction fits a step function $f_{t_0, c_0}(t)$ with coefficients $\Theta = (t_0, c_0)$ on a completeness predictor $c(t)$:

$$f_{t_0, c_0}(t) = c_0 \ \mathbb{1}_{t \geq t_0}(t), \qquad c(t) = f_{t_0, c_0}(t) + \epsilon(t)$$
  • the characteristic time $t_0$ estimates the time after which the data is available.
  • the characteristic value $c_0$ estimates the stabilized routine completeness.

The default metric computed is the mean squared error after $t_0$:

$$error = \frac{\sum_{t_0 \leq t \leq t_{max}} \epsilon(t)^2}{t_{max} - t_0}$$
  • error estimates the stability of the data after $t_0$.

Custom metric

You can define your own metric if this one doesn't meet your requirements.

The available algorithms used to fit the step function are listed below:

Custom algo

You can define your own algorithm if these do not meet your requirements.

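This algorithm computes the estimated coefficients $\hat{t_0}$ and $\hat{c_0}$ by minimizing the loss function $L(t_0, c_0)$:

$$L(t_0, c_0) = \frac{\sum_{t = t_{min}}^{t_{max}} l(c(t), f_{t_0, c_0}(t))}{t_{max} - t_{min}}, \qquad (\hat{t_0}, \hat{c_0}) = \underset{t_0, c_0}{\mathrm{argmin}}\ L(t_0, c_0)$$

Default loss function $l$

The loss function is $l_2$ by default: $l(c(t), f_{t_0, c_0}(t)) = |c(t) - f_{t_0, c_0}(t)|^2$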

Optimal estimates

For complexity purposes, this algorithm has been implemented with a dependency relation between $c_0$ and $t_0$ derived from the optimal estimates using the $l_2$ loss function. For more information, have a look at the source code.
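The loss minimization can be sketched with a brute-force search. This is not the library's implementation: months are integer indices, the data is made up, and the sketch relies on the standard result that, for a fixed $t_0$, the $l_2$-optimal $c_0$ is the mean of $c(t)$ over $t \geq t_0$ (assumed here to be the kind of dependency relation mentioned above).

```python
def fit_step_function(c):
    """Return (t_0, c_0, loss) minimizing the mean l2 loss of a step function.

    c is a list of completeness values, one per month index 0..t_max.
    """
    n = len(c)
    best = None
    for t_0 in range(n):
        after = c[t_0:]
        c_0 = sum(after) / len(after)  # l2-optimal c_0 for this t_0
        # Before t_0 the step function is 0, after t_0 it is c_0
        loss = sum(v ** 2 for v in c[:t_0]) + sum((v - c_0) ** 2 for v in after)
        loss /= max(n - 1, 1)  # t_max - t_min, guarded for a single point
        if best is None or loss < best[2]:
            best = (t_0, c_0, loss)
    return best


c = [0.0, 0.0, 0.1, 0.9, 1.0, 0.8, 0.9]  # hypothetical c(t)
t_0_hat, c_0_hat, loss = fit_step_function(c)
print(t_0_hat, round(c_0_hat, 3), round(loss, 4))
```

Tying $c_0$ to $t_0$ reduces the search to a single loop over candidate times instead of a two-dimensional optimization.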

In this algorithm, $\hat{c_0}$ is directly estimated as the $x^{th}$ quantile of the completeness predictor $c(t)$, where $x$ is a number between 0 and 1. Then, $\hat{t_0}$ is the first time $c(t)$ reaches $\hat{c_0}$.

$$\hat{c_0} = x^{th} \text{ quantile of } c(t), \qquad \hat{t_0} = \underset{t}{\mathrm{argmin}}\,(c(t) \geq \hat{c_0})$$

Default quantile $x$

The default quantile is $x = 0.8$.
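A minimal sketch of the quantile approach follows. It is an illustration only: the quantile uses linear interpolation between order statistics, which is an assumption of this sketch (the library may use another convention), and the values of $c(t)$ are made up.

```python
def quantile(values, x):
    """x-th quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = x * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def fit_quantile(c, x=0.8):
    """Return (t_0, c_0): c_0 is the x-th quantile, t_0 the first t with c(t) >= c_0."""
    c_0 = quantile(c, x)
    # The maximum of c is never below a quantile of c, so a match always exists
    t_0 = next(t for t, value in enumerate(c) if value >= c_0)
    return t_0, c_0


c = [0.0, 0.1, 0.5, 0.9, 1.0, 0.8]  # hypothetical c(t), one value per month
t_0_hat, c_0_hat = fit_quantile(c)
print(t_0_hat, c_0_hat)
```

This approach needs no optimization loop, at the cost of depending on the choice of $x$.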

from edsteva.models.step_function import StepFunction

step_function_model = StepFunction()
step_function_model.fit(probe)
step_function_model.estimates.head()
| care_site_level | care_site_id | stay_type | t_0 | c_0 | error |
|---|---|---|---|---|---|
| Unité Fonctionnelle (UF) | 8312056386 | 'Urg' | 2019-05-01 | 0.397 | 0.040 |
| Unité Fonctionnelle (UF) | 8312056386 | 'All' | 2017-04-01 | 0.583 | 0.028 |
| Pôle/DMU | 8312027648 | 'Hospit' | 2021-03-01 | 0.677 | 0.022 |
| Pôle/DMU | 8312027648 | 'All' | 2018-08-01 | 0.764 | 0.014 |
| Hôpital | 8312022130 | 'Hospit' | 2022-02-01 | 0.652 | 0.027 |

The RectangleFunction fits a rectangle function $f_{t_0, c_0, t_1}(t)$ with coefficients $\Theta = (t_0, c_0, t_1)$ on a completeness predictor $c(t)$:

$$f_{t_0, c_0, t_1}(t) = c_0 \ \mathbb{1}_{t_0 \leq t \leq t_1}(t), \qquad c(t) = f_{t_0, c_0, t_1}(t) + \epsilon(t)$$
  • the characteristic time $t_0$ estimates the time after which the data is available.
  • the characteristic time $t_1$ estimates the time after which the data is not available anymore.
  • the characteristic value $c_0$ estimates the completeness between $t_0$ and $t_1$.

The default metric computed is the mean squared error between $t_0$ and $t_1$:

$$error = \frac{\sum_{t_0 \leq t \leq t_1} \epsilon(t)^2}{t_1 - t_0}$$
  • error estimates the stability of the data between $t_0$ and $t_1$.

Custom metric

You can define your own metric if this one doesn't meet your requirements.

The available algorithms used to fit the rectangle function are listed below:

Custom algo

You can define your own algorithm if these do not meet your requirements.

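This algorithm computes the estimated coefficients $\hat{t_0}$, $\hat{c_0}$ and $\hat{t_1}$ by minimizing the loss function $L(t_0, c_0, t_1)$:

$$L(t_0, c_0, t_1) = \frac{\sum_{t = t_{min}}^{t_{max}} l(c(t), f_{t_0, c_0, t_1}(t))}{t_{max} - t_{min}}, \qquad (\hat{t_0}, \hat{t_1}, \hat{c_0}) = \underset{t_0, c_0, t_1}{\mathrm{argmin}}\ L(t_0, c_0, t_1)$$

Default loss function $l$

The loss function is $l_2$ by default: $l(c(t), f_{t_0, c_0, t_1}(t)) = |c(t) - f_{t_0, c_0, t_1}(t)|^2$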

Optimal estimates

For complexity purposes, this algorithm has been implemented with a dependency relation between $c_0$ and $t_0$ derived from the optimal estimates using the $l_2$ loss function. For more information, have a look at the source code.
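For intuition, the rectangle fit can be sketched as an exhaustive search over $(t_0, t_1)$. This is not the library's algorithm: months are integer indices, the data is made up, and for each window the sketch uses the mean of $c(t)$ inside the window as $c_0$, which is the $l_2$-optimal value for fixed $t_0$ and $t_1$.

```python
def fit_rectangle_function(c):
    """Return (t_0, c_0, t_1, loss) minimizing the mean l2 loss of a rectangle."""
    n = len(c)
    best = None
    for t_0 in range(n):
        for t_1 in range(t_0, n):
            inside = c[t_0 : t_1 + 1]
            c_0 = sum(inside) / len(inside)  # l2-optimal plateau for this window
            # Outside the window the rectangle function is 0
            loss = (
                sum(v ** 2 for v in c[:t_0])
                + sum((v - c_0) ** 2 for v in inside)
                + sum(v ** 2 for v in c[t_1 + 1 :])
            )
            loss /= max(n - 1, 1)  # t_max - t_min, guarded for a single point
            if best is None or loss < best[3]:
                best = (t_0, c_0, t_1, loss)
    return best


c = [0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0]  # hypothetical c(t)
t_0_hat, c_0_hat, t_1_hat, loss = fit_rectangle_function(c)
print(t_0_hat, round(c_0_hat, 3), t_1_hat)
```

The exhaustive search is cubic in the number of months, which is acceptable for a sketch on monthly data but is the kind of cost an optimized implementation would want to avoid.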

from edsteva.models.rectangle_function import RectangleFunction

rectangle_function_model = RectangleFunction()
rectangle_function_model.fit(probe)
rectangle_function_model.estimates.head()
| care_site_level | care_site_id | stay_type | t_0 | c_0 | t_1 | error |
|---|---|---|---|---|---|---|
| Unité Fonctionnelle (UF) | 8312056386 | 'Urg' | 2019-05-01 | 0.397 | 2020-05-01 | 0.040 |
| Unité Fonctionnelle (UF) | 8312056386 | 'All' | 2017-04-01 | 0.583 | 2013-04-01 | 0.028 |
| Pôle/DMU | 8312027648 | 'Hospit' | 2021-03-01 | 0.677 | 2022-03-01 | 0.022 |
| Pôle/DMU | 8312027648 | 'All' | 2018-08-01 | 0.764 | 2019-08-01 | 0.014 |
| Hôpital | 8312022130 | 'Hospit' | 2022-02-01 | 0.652 | 2022-08-01 | 0.027 |

The library provides interactive dashboards that let you set any combination of care sites, stay types and other columns if included in the Probe. You can only export a dashboard in HTML format.

The probe_dashboard() returns:

  • On the top, the aggregated variable is the average completeness predictor $c(t)$ over time $t$ with the prediction $\hat{c}(t)$ if the fitted Model is specified.
  • On the bottom, the interactive filters are all the columns included in the Probe (such as time, care site, number of visits, etc.).

from edsteva.viz.dashboards import probe_dashboard

probe_dashboard(
    probe=probe,
    fitted_model=step_function_model,
    care_site_level=care_site_level,
)
An example is available here.

The normalized_probe_dashboard() returns a representation of the overall deviation from the Model:

  • On the top, the aggregated variable is a normalized completeness predictor $\frac{c(t)}{c_0}$ over normalized time $t - t_0$.
  • On the bottom, the interactive filters are all the columns included in the Probe (such as time, care site, number of visits, etc.) with all the Model coefficients and metrics included in the Model.

from edsteva.viz.dashboards import normalized_probe_dashboard

normalized_probe_dashboard(
    probe=probe,
    fitted_model=step_function_model,
    care_site_level=care_site_level,
)

An example is available here.
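The normalization shown by this dashboard can be sketched as follows. It is an illustration under assumptions: months are integer indices, the function name and data are hypothetical, and returning an empty list when $c_0 = 0$ is a choice made here to avoid a division by zero.

```python
def normalize_probe(c, t_0, c_0):
    """Return (t - t_0, c(t) / c_0) pairs for one care site."""
    if c_0 == 0:
        return []  # assumption of this sketch: nothing to normalize
    return [(t - t_0, value / c_0) for t, value in enumerate(c)]


c = [0.0, 0.4, 0.8, 0.8]  # hypothetical c(t), one value per month
normalized = normalize_probe(c, t_0=2, c_0=0.8)
print(normalized)
```

After this change of variables, care sites deployed at different dates and with different plateau levels can be overlaid: a well-fitted site is close to 0 for negative normalized times and close to 1 afterwards.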

The library provides static plots that you can export in png or svg. Because they are not interactive, you have to specify the filters in the inputs of the functions.

The probe_plot() returns the top plot of the probe_dashboard(): the average completeness predictor $c(t)$ over time $t$, with the prediction $\hat{c}(t)$ if the fitted Model is specified.

from edsteva.viz.plots import probe_plot

probe_plot(
    probe=probe,
    fitted_model=step_function_model,
    care_site_level=care_site_level,
    stay_type=stay_type,
    save_path=plot_path,
)


The normalized_probe_plot() returns the top plot of the normalized_probe_dashboard(). Consequently, you have to specify the filters in the inputs of the function.

from edsteva.viz.plots import normalized_probe_plot

normalized_probe_plot(
    probe=probe,
    fitted_model=step_function_model,
    t_min=-15,
    t_max=15,
    save_path=plot_path,
)

The estimates_densities_plot() returns the density plot and the median of each estimate. It can help you to set the thresholds.

from edsteva.viz.plots import estimates_densities_plot

estimates_densities_plot(
    fitted_model=step_function_model,
)


  1. Samuel G Finlayson, Adarsh Subbaswamy, Karandeep Singh, John Bowers, Annabel Kupke, Jonathan Zittrain, Isaac S Kohane, and Suchi Saria. The clinician and dataset shift in artificial intelligence. The New England journal of medicine, 385(3):283, 2021.