Model

Choosing or customizing a Model is the third step in the EDS-TeVa usage workflow.

Definition

A Model is a Python class designed to characterize the temporal variability of data availability. It estimates the coefficients $Θ$ and the metrics from a Probe.

Input

The Model class expects a Probe object in order to estimate the Model coefficients $Θ$ and, if desired, some metrics.

Attributes

estimates is a pandas.DataFrame computed by the fit() method. It contains the estimated coefficients $Θ$ and metrics for each column given by the Probe._index (e.g. care site, stay type, etc.).
_coefs is the list of the Model coefficients $Θ$ that are estimated by the fit() method.

Methods

fit() method calls the fit_process() method to compute the estimated coefficients $Θ$ and metrics and store them in the estimates attribute.
fit_process() method computes the estimated coefficients $Θ$ and metrics from a Probe.predictor DataFrame.
predict() method applies the predict_process() on a Probe.predictor DataFrame and returns a pandas.DataFrame of the estimated prediction $\hat{c}(t)$ for each column given by Probe._index.
predict_process() method computes the estimated completeness predictor $\hat{c} (t)$ for each column given by Probe._index.
save() method saves the Model in the desired path. By default it is saved in the cache directory (~/.cache/edsteva/models).
load() method loads the Model from the desired path. By default it is loaded from the cache directory (~/.cache/edsteva/models).

Prediction

predict() method must be called on a fitted Model.

Estimates schema

Data stored in the estimates attribute follows a specific schema:

Indexes

The estimates are computed for each column given by the Probe._index. For example, if you fit your Model on the VisitProbe, the estimates will be computed for each:

care_site_level: care site hierarchical level (uf, pole, hospital).
care_site_id: care site unique identifier.
stay_type: type of stay (hospitalisation, emergency, incomplete hospitalisation, outpatient consultation).

Model coefficients

The coefficients depend on the Model used. For instance, the step function Model has two coefficients:

$t_{0}$ the characteristic time that estimates the time after which the data is available.
$c_{0}$ the characteristic completeness that estimates the stabilized routine completeness after $t_{0}$.

Metrics

It depends on the metrics you specify in the fit() method. For instance, you can specify an $\text{error}$ metric:

$$
\text{error} = \frac{\sum_{t_{0} \leq t \leq t_{max}} \epsilon(t)^{2}}{t_{max} - t_{0}}
$$

$\text{error}$ estimates the stability of the data after $t_{0}$.
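As a rough illustration of the formula above, the sketch below computes the error on a toy series. It assumes integer time steps and takes $\epsilon(t) = c(t) - c_{0}$ for $t \geq t_{0}$, as in the step function Model. The function name `error_after_t0` is hypothetical and is not part of the library.

```python
def error_after_t0(c, t0, c0):
    """Sum of squared residuals for t0 <= t <= t_max, divided by (t_max - t0).

    c  : list of completeness values c(t), indexed by integer time steps
    t0 : index of the characteristic time
    c0 : characteristic completeness
    """
    t_max = len(c) - 1
    if t_max <= t0:
        raise ValueError("t0 must be strictly before t_max")
    # Residuals epsilon(t) = c(t) - c0 on the interval [t0, t_max]
    residuals = [c[t] - c0 for t in range(t0, t_max + 1)]
    return sum(r ** 2 for r in residuals) / (t_max - t0)


# Toy completeness series: no data before t = 2, then values around 0.6
c = [0.0, 0.0, 0.5, 0.7, 0.6, 0.6]
err = error_after_t0(c, t0=2, c0=0.6)
```

A small value indicates that the completeness stays close to $c_{0}$ once the data is available.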

Example

When considering the StepFunction.estimates fitted on a VisitProbe, it may for instance look like this:

care_site_level	care_site_id	stay_type	t_0	c_0	error
Unité Fonctionnelle (UF)	8312056386	'Urg'	2019-05-01	0.397	0.040
Pôle/DMU	8653815660	'All'	2011-04-01	0.583	0.028
Unité Fonctionnelle (UF)	8312027648	'Hospit'	2021-03-01	0.677	0.022
Unité Fonctionnelle (UF)	8312056379	'All'	2018-08-01	0.764	0.014
Hôpital	8312022130	'Hospit'	2022-02-01	0.652	0.027

Saving and loading a fitted Model

In order to ease the future loading of a Model that has been fitted with the fit() method, one can pickle it using the save() method. This enables a rapid loading of the Model from local disk using the load() method.

from edsteva.models import StepFunction

model = StepFunction()

model.fit(probe)  # Computation of the estimates (long).

model.save()  # Saving of the fitted Model on the local disk.


model_2 = StepFunction()
model_2.load()  # Rapid loading of the fitted Model from the local disk.
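The save/load round trip can be pictured with a minimal, library-independent sketch. `ToyModel` is a hypothetical stand-in for a fitted Model; the way it pickles its state is an assumption made for illustration and may differ from what save() and load() actually do in the library.

```python
import pickle
import tempfile
from pathlib import Path


class ToyModel:
    """Hypothetical stand-in for a fitted Model (not an edsteva class)."""

    def __init__(self):
        self.estimates = None

    def fit(self, values):
        # Stands in for the long computation of the estimates.
        self.estimates = {"c_0": sum(values) / len(values)}

    def save(self, path):
        path = Path(path)
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(self.__dict__, f)  # Persist the fitted state

    def load(self, path):
        with open(path, "rb") as f:
            self.__dict__.update(pickle.load(f))  # Restore the fitted state


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models" / "toy.pickle"

    model = ToyModel()
    model.fit([0.5, 0.7, 0.6])
    model.save(path)

    model_2 = ToyModel()
    model_2.load(path)  # No refit needed

restored = model_2.estimates
```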

Defining a custom Model

If none of the available Models meets your requirements, you may want to create your own. To define a custom Model class CustomModel that inherits from the abstract class BaseModel, you have to implement the fit_process() and predict_process() methods. They are called respectively by the fit() and predict() methods inherited from BaseModel. You also have to define the _coefs attribute, which is the list of the Model coefficients.

from edsteva.models import BaseModel
from edsteva.probes import BaseProbe


# Definition of a new Model class
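class CustomModel(BaseModel):
    # Names of the coefficients estimated by fit_process()
    _coefs = ["my_model_coefficient_1", "my_model_coefficient_2"]

    def fit_process(self, probe: BaseProbe):
        # Fit process: compute the coefficients (and metrics) from the Probe
        custom_estimates = ...  # placeholder for your own computation
        return custom_estimates

    def predict_process(self, probe: BaseProbe):
        # Predict process: compute the prediction from the Probe and the estimates
        custom_prediction = ...  # placeholder for your own computation
        return custom_prediction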

The fit_process() and predict_process() methods take a Probe as the first argument. All other parameters must be keyword arguments. For a detailed example of the implementation of a Model, please have a look at the implemented StepFunction Model.
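The division of labour between fit()/predict() and fit_process()/predict_process() can be sketched without the library. In the sketch below, `ToyBaseModel` and `MeanModel` are hypothetical classes, a plain list of values stands in for the Probe, and the guard in predict() is an assumption about how "predict() must be called on a fitted Model" could be enforced.

```python
from abc import ABC, abstractmethod


class ToyBaseModel(ABC):
    """Hypothetical, simplified stand-in for BaseModel."""

    _coefs = []

    def __init__(self):
        self.estimates = None

    def fit(self, probe, **kwargs):
        # fit() delegates to fit_process() and stores the result
        self.estimates = self.fit_process(probe, **kwargs)

    def predict(self, probe, **kwargs):
        if self.estimates is None:
            raise RuntimeError("call fit() before predict()")
        return self.predict_process(probe, **kwargs)

    @abstractmethod
    def fit_process(self, probe, **kwargs): ...

    @abstractmethod
    def predict_process(self, probe, **kwargs): ...


class MeanModel(ToyBaseModel):
    """Toy Model with a single coefficient: the mean completeness."""

    _coefs = ["c_0"]

    def fit_process(self, probe, *, ndigits=3):
        # Probe first, every other parameter keyword-only
        return {"c_0": round(sum(probe) / len(probe), ndigits)}

    def predict_process(self, probe):
        return [self.estimates["c_0"]] * len(probe)


model = MeanModel()
model.fit([0.2, 0.4], ndigits=2)
prediction = model.predict([0.2, 0.4])
```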

Contributions

If you have created your own Model, do not hesitate to share it with the community by following the contribution guidelines. Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

Available Models

We detail hereafter the Models that have already been implemented in the library.

StepFunction


The StepFunction fits a step function $f_{t_{0}, c_{0}} (t)$ with coefficients $Θ = (t_{0}, c_{0})$ on a completeness predictor $c (t)$ :

$$
\begin{aligned}
f_{t_{0}, c_{0}}(t) & = c_{0} \, \mathbb{1}_{t \geq t_{0}}(t) \\
c(t) & = f_{t_{0}, c_{0}}(t) + \epsilon(t)
\end{aligned}
$$

the characteristic time $t_{0}$ estimates the time after which the data is available.
the characteristic value $c_{0}$ estimates the stabilized routine completeness.

The default metric computed is the mean squared error after $t_{0}$ :

$$
\text{error} = \frac{\sum_{t_{0} \leq t \leq t_{max}} \epsilon(t)^{2}}{t_{max} - t_{0}}
$$

$\text{error}$ estimates the stability of the data after $t_{0}$.

Custom metric

You can define your own metric if this one doesn't meet your requirements.

The available algorithms used to fit the step function are listed below:

Custom algo

You can define your own algorithm if none of these meets your requirements.

Loss minimization

This algorithm computes the estimated coefficients $\hat{t_{0}}$ and $\hat{c_{0}}$ by minimizing the loss function $L (t_{0}, c_{0})$ :

$$
\begin{aligned}
L(t_{0}, c_{0}) & = \frac{\sum_{t = t_{min}}^{t_{max}} l(c(t), f_{t_{0}, c_{0}}(t))}{t_{max} - t_{min}} \\
(\hat{t_{0}}, \hat{c_{0}}) & = \underset{t_{0}, c_{0}}{\operatorname{argmin}} \, L(t_{0}, c_{0})
\end{aligned}
$$

Default loss function $l$

The loss function is $l_{2}$ by default: $l (c (t), f_{t_{0}, c_{0}} (t)) = | c (t) - f_{t_{0}, c_{0}} (t) |^{2}$

Optimal estimates

To limit computational complexity, this algorithm has been implemented to compute the optimal estimates only with the $l_{2}$ loss function. For more information, you can have a look at the source code.
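For intuition, the loss minimization can be sketched as a brute-force search over $t_{0}$ on a toy series. The sketch relies on the fact that, for the $l_{2}$ loss and a fixed $t_{0}$, the optimal $c_{0}$ is the mean of $c(t)$ for $t \geq t_{0}$. The function name `fit_step_l2` and the integer time steps are assumptions; this is not the library's implementation.

```python
def fit_step_l2(c):
    """Return (t0, c0, loss) minimizing the mean l2 loss of a step function.

    c is a list of completeness values indexed by integer time steps,
    so t_min = 0 and t_max = len(c) - 1.
    """
    n = len(c)
    if n < 2:
        raise ValueError("need at least two time steps")
    best = None
    for t0 in range(n):
        after = c[t0:]
        # For a fixed t0, the l2-optimal c0 is the mean of c(t) for t >= t0
        c0 = sum(after) / len(after)
        # Before t0 the step function is 0, after t0 it is c0
        squared = sum(v ** 2 for v in c[:t0]) + sum((v - c0) ** 2 for v in after)
        loss = squared / (n - 1)  # divide by t_max - t_min
        if best is None or loss < best[2]:
            best = (t0, c0, loss)
    return best


c = [0.0, 0.0, 0.1, 0.6, 0.7, 0.65, 0.65]
t0_hat, c0_hat, loss = fit_step_l2(c)
```

On this series the search selects the time step where the completeness jumps, with $\hat{c_{0}}$ close to the plateau value.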

Quantile

In this algorithm, $\hat{c_{0}}$ is directly estimated as the $x^{th}$ quantile of the completeness predictor $c(t)$, where $x$ is a number between 0 and 1. Then, $\hat{t_{0}}$ is the first time $c(t)$ reaches $\hat{c_{0}}$.

$$
\begin{aligned}
\hat{c_{0}} & = x^{th} \text{ quantile of } c(t) \\
\hat{t_{0}} & = \min \{ t : c(t) \geq \hat{c_{0}} \}
\end{aligned}
$$

Default quantile $x$

The default quantile is $x = 0.8$ .
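The quantile algorithm can be sketched in a few lines on a toy series. The helper `quantile` below uses linear interpolation between order statistics, which is an assumption: the library may use a different quantile definition. The name `fit_step_quantile` is hypothetical.

```python
def quantile(values, x):
    """x-th quantile (0 <= x <= 1), linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = x * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    frac = pos - low
    return ordered[low] + frac * (ordered[high] - ordered[low])


def fit_step_quantile(c, x=0.8):
    """Return (t0, c0): c0 is the x-th quantile, t0 the first t with c(t) >= c0."""
    c0 = quantile(c, x)
    t0 = next(t for t, v in enumerate(c) if v >= c0)
    return t0, c0


c = [0.0, 0.1, 0.5, 0.8, 0.7, 0.9, 0.8, 0.8, 0.7, 0.8, 0.9]
t0_hat, c0_hat = fit_step_quantile(c)  # default quantile x = 0.8
```

Compared with loss minimization, this approach needs no search over $t_{0}$, at the cost of depending on the choice of $x$.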

from edsteva.models.step_function import StepFunction

step_function_model = StepFunction()
step_function_model.fit(probe)
step_function_model.estimates.head()

care_site_level	care_site_id	stay_type	t_0	c_0	error
Unité Fonctionnelle (UF)	8312056386	'Urg'	2019-05-01	0.397	0.040
Unité Fonctionnelle (UF)	8312056386	'All'	2011-04-01	0.583	0.028
Pôle/DMU	8312027648	'Hospit'	2021-03-01	0.677	0.022
Pôle/DMU	8312027648	'All'	2018-08-01	0.764	0.014
Hôpital	8312022130	'Hospit'	2022-02-01	0.652	0.027

RectangleFunction

The RectangleFunction fits a rectangle function $f_{t_{0}, c_{0}, t_{1}}(t)$ with coefficients $Θ = (t_{0}, c_{0}, t_{1})$ on a completeness predictor $c(t)$:

$$
\begin{aligned}
f_{t_{0}, c_{0}, t_{1}}(t) & = c_{0} \, \mathbb{1}_{t_{0} \leq t \leq t_{1}}(t) \\
c(t) & = f_{t_{0}, c_{0}, t_{1}}(t) + \epsilon(t)
\end{aligned}
$$

the characteristic time $t_{0}$ estimates the time after which the data is available.
the characteristic time $t_{1}$ estimates the time after which the data is not available anymore.
the characteristic value $c_{0}$ estimates the completeness between $t_{0}$ and $t_{1}$ .

The default metric computed is the mean squared error between $t_{0}$ and $t_{1}$ :

$$
\text{error} = \frac{\sum_{t_{0} \leq t \leq t_{1}} \epsilon(t)^{2}}{t_{1} - t_{0}}
$$

$\text{error}$ estimates the stability of the data between $t_{0}$ and $t_{1}$.

Custom metric

You can define your own metric if this one doesn't meet your requirements.

The available algorithms used to fit the rectangle function are listed below:

Custom algo

You can define your own algorithm if none of these meets your requirements.

Loss minimization

This algorithm computes the estimated coefficients $\hat{t_{0}}$ , $\hat{c_{0}}$ and $\hat{t_{1}}$ by minimizing the loss function $L (t_{0}, c_{0}, t_{1})$ :

$$
\begin{aligned}
L(t_{0}, c_{0}, t_{1}) & = \frac{\sum_{t = t_{min}}^{t_{max}} l(c(t), f_{t_{0}, c_{0}, t_{1}}(t))}{t_{max} - t_{min}} \\
(\hat{t_{0}}, \hat{c_{0}}, \hat{t_{1}}) & = \underset{t_{0}, c_{0}, t_{1}}{\operatorname{argmin}} \, L(t_{0}, c_{0}, t_{1})
\end{aligned}
$$

Default loss function $l$

The loss function is $l_{2}$ by default: $l (c (t), f_{t_{0}, c_{0}, t_{1}} (t)) = | c (t) - f_{t_{0}, c_{0}, t_{1}} (t) |^{2}$

Optimal estimates

To limit computational complexity, this algorithm has been implemented with a dependency relation between $c_{0}$ and $t_{0}$ derived from the optimal estimates using the $l_{2}$ loss function. For more information, you can have a look at the source code.
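A naive way to picture this minimization is an exhaustive search over every pair $(t_{0}, t_{1})$, taking $c_{0}$ as the mean of $c(t)$ inside the rectangle, which is the $l_{2}$-optimal value for a fixed pair. The sketch below is only illustrative: `fit_rectangle_l2` is a hypothetical name, time steps are integers, and the library's own implementation is more efficient and may differ.

```python
def fit_rectangle_l2(c):
    """Return (t0, c0, t1, loss) minimizing the mean l2 loss of a rectangle function."""
    n = len(c)
    if n < 2:
        raise ValueError("need at least two time steps")
    best = None
    for t0 in range(n):
        for t1 in range(t0, n):
            inside = c[t0:t1 + 1]
            # For fixed (t0, t1), the l2-optimal c0 is the mean inside the rectangle
            c0 = sum(inside) / len(inside)
            # The rectangle function is c0 on [t0, t1] and 0 elsewhere
            squared = sum(
                (c[t] - (c0 if t0 <= t <= t1 else 0.0)) ** 2 for t in range(n)
            )
            loss = squared / (n - 1)  # divide by t_max - t_min
            if best is None or loss < best[3]:
                best = (t0, c0, t1, loss)
    return best


# Data available between t = 2 and t = 4 only
c = [0.0, 0.1, 0.6, 0.7, 0.65, 0.05, 0.0]
t0_hat, c0_hat, t1_hat, loss = fit_rectangle_l2(c)
```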

from edsteva.models.rectangle_function import RectangleFunction

rectangle_function_model = RectangleFunction()
rectangle_function_model.fit(probe)
rectangle_function_model.estimates.head()

care_site_level	care_site_id	stay_type	t_0	c_0	t_1	error
Unité Fonctionnelle (UF)	8312056386	'Urg'	2019-05-01	0.397	2020-05-01	0.040
Unité Fonctionnelle (UF)	8312056386	'All'	2011-04-01	0.583	2013-04-01	0.028
Pôle/DMU	8312027648	'Hospit'	2021-03-01	0.677	2022-03-01	0.022
Pôle/DMU	8312027648	'All'	2018-08-01	0.764	2019-08-01	0.014
Hôpital	8312022130	'Hospit'	2022-02-01	0.652	2022-08-01	0.027