edsteva.metrics.error_between_t0_t1

error_between_t0_t1

error_between_t0_t1(
    predictor: pd.DataFrame,
    estimates: pd.DataFrame,
    index: List[str],
    loss_function: Callable = loss_functions.l2_loss,
    y: str = "c",
    y_0: str = "c_0",
    t_0: str = "t_0",
    t_1: str = "t_1",
    x: str = "date",
    name: str = "error",
)

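Compute the error between the predictor \(c(t)\) and the prediction \(\hat{c}(t)\) between \(t_0\) and \(t_1\) as follows:

\[ error = \frac{\sum_{t_0 \leq t \leq t_1} \mathcal{l}(c(t), \hat{c}(t))}{t_1 - t_0} \]

where the loss function \(\mathcal{l}\) can be the L1 distance or the L2 distance (L2 by default). In the source code, the normalization is computed as the mean loss over the observations with \(t_0 \leq t \leq t_1\), and groups with a missing \(t_0\) or \(t_1\) are dropped.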
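As a rough illustration of the formula, the sketch below evaluates the metric for a single time series with plain pandas. The two loss functions are assumptions (absolute and squared residuals); they stand in for `loss_functions.l1_loss` and `loss_functions.l2_loss`, whose exact definitions are not shown on this page. The toy values and dates are made up for the example.

```python
import pandas as pd


def l1_loss(residual: pd.Series) -> pd.Series:
    # Assumed L1 loss: absolute value of the residual.
    return residual.abs()


def l2_loss(residual: pd.Series) -> pd.Series:
    # Assumed L2 loss: squared residual.
    return residual**2


# Toy completeness predictor c(t), observed monthly.
dates = pd.date_range("2020-01-01", periods=6, freq="MS")
c = pd.Series([0.0, 0.1, 0.9, 1.1, 1.0, 0.2], index=dates)

# Toy model estimates: a constant prediction between t_0 and t_1.
c_hat = 1.0
t_0 = pd.Timestamp("2020-03-01")
t_1 = pd.Timestamp("2020-05-01")

# Keep only the observations with t_0 <= t <= t_1.
window = c[(c.index >= t_0) & (c.index <= t_1)]
residual = window - c_hat

# Average the loss over the observations in the window.
error_l1 = l1_loss(residual).mean()
error_l2 = l2_loss(residual).mean()
```

Here three observations fall in the window, with residuals of about -0.1, 0.1 and 0.0, so the L1 error is about 0.2 / 3 and the L2 error about 0.02 / 3.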

PARAMETER DESCRIPTION
predictor

\(c(t)\) computed in the Probe

TYPE: pd.DataFrame

estimates

\(\hat{c}(t)\) computed in the Model

TYPE: pd.DataFrame

index

Column names by which the data are grouped

TYPE: List[str]

loss_function

The loss function \(\mathcal{l}\)

TYPE: Callable DEFAULT: loss_functions.l2_loss

y

Column name for the completeness variable \(c(t)\)

TYPE: str DEFAULT: 'c'

y_0

Column name for the predicted completeness variable \(\hat{c}(t)\)

TYPE: str DEFAULT: 'c_0'

t_0

Column name for the predicted threshold \(t_0\)

TYPE: str DEFAULT: 't_0'

t_1

Column name for the predicted threshold \(t_1\)

TYPE: str DEFAULT: 't_1'

x

Column name for the time variable \(t\)

TYPE: str DEFAULT: 'date'

name

Column name for the metric output

TYPE: str DEFAULT: 'error'
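To make the column-name parameters concrete, here is a minimal sketch of the two input tables with the default names. The required columns mirror the two `check_columns` calls in the source listing (`[*index, x, y]` for `predictor`, `[*index, y_0, t_0, t_1]` for `estimates`). The helper `missing_columns` is hypothetical and is not edsteva's `check_columns`; the index columns and values are toy data.

```python
import pandas as pd

# Grouping columns, passed as `index`.
index = ["care_site_id", "stay_type"]

# `predictor`: one row per group and date, holding c(t) in column y="c".
predictor = pd.DataFrame(
    {
        "care_site_id": ["A", "A"],
        "stay_type": ["All", "All"],
        "date": pd.to_datetime(["2020-01-01", "2020-02-01"]),
        "c": [0.2, 0.9],
    }
)

# `estimates`: one row per group, holding c_0, t_0 and t_1.
estimates = pd.DataFrame(
    {
        "care_site_id": ["A"],
        "stay_type": ["All"],
        "c_0": [1.0],
        "t_0": pd.to_datetime(["2020-02-01"]),
        "t_1": pd.to_datetime(["2020-02-01"]),
    }
)


def missing_columns(df: pd.DataFrame, required: list) -> list:
    # Hypothetical helper: list the required columns absent from df.
    return [col for col in required if col not in df.columns]


predictor_missing = missing_columns(predictor, [*index, "date", "c"])
estimates_missing = missing_columns(estimates, [*index, "c_0", "t_0", "t_1"])
```

Both lists are empty for these tables. If the columns are named differently in your data, the `y`, `y_0`, `t_0`, `t_1` and `x` parameters let you pass the actual names instead of renaming the columns.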

Example
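| care_site_level          | care_site_id | stay_type | error |
| :----------------------- | :----------- | :-------- | :---- |
| Unité Fonctionnelle (UF) | 8312056386   | 'Urg'     | 0.040 |
| Unité Fonctionnelle (UF) | 8312056386   | 'All'     | 0.028 |
| Pôle/DMU                 | 8312027648   | 'Urg'     | 0.022 |
| Pôle/DMU                 | 8312027648   | 'All'     | 0.014 |
| Hôpital                  | 8312022130   | 'Urg'     | 0.027 |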
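The output has one row per index group. The sketch below reproduces that shape on toy data by following the same steps as the function's source (merge, drop groups without thresholds, compute the loss, filter on the window, group and average). It is a standalone approximation, not a call to edsteva; the squared residual is an assumed stand-in for `loss_functions.l2_loss`, and all values are invented.

```python
import pandas as pd

index = ["care_site_id", "stay_type"]
dates = pd.date_range("2020-01-01", periods=4, freq="MS")

# Two groups (A and B), four monthly observations each.
predictor = pd.DataFrame(
    {
        "care_site_id": ["A"] * 4 + ["B"] * 4,
        "stay_type": ["All"] * 8,
        "date": list(dates) * 2,
        "c": [0.0, 0.8, 1.2, 1.0, 0.0, 0.5, 0.5, 0.5],
    }
)

# Group B has no estimated thresholds (NaT), so it will be dropped.
estimates = pd.DataFrame(
    {
        "care_site_id": ["A", "B"],
        "stay_type": ["All", "All"],
        "c_0": [1.0, 0.5],
        "t_0": [pd.Timestamp("2020-02-01"), pd.NaT],
        "t_1": [pd.Timestamp("2020-03-01"), pd.NaT],
    }
)

# Attach the estimates to every observation of the same group.
fitted = predictor.merge(estimates, on=index)
# Drop groups whose thresholds are missing.
fitted = fitted.dropna(subset=["t_0", "t_1"])
# Assumed L2 loss: squared residual between c(t) and c_0.
fitted = fitted.assign(loss=(fitted["c"] - fitted["c_0"]) ** 2)
# Keep observations with t_0 <= date <= t_1.
in_window = (fitted["date"] >= fitted["t_0"]) & (fitted["date"] <= fitted["t_1"])
# Mean loss per group, returned as a flat table.
error = (
    fitted.loc[in_window].groupby(index)["loss"].mean().rename("error").reset_index()
)
```

Only group A survives: its two in-window residuals are about -0.2 and 0.2, so its error is about 0.04. Group B is absent from the result because its thresholds are missing.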
Source code in edsteva/metrics/error_between_t0_t1.py
def error_between_t0_t1(
    predictor: pd.DataFrame,
    estimates: pd.DataFrame,
    index: List[str],
    loss_function: Callable = loss_functions.l2_loss,
    y: str = "c",
    y_0: str = "c_0",
    t_0: str = "t_0",
    t_1: str = "t_1",
    x: str = "date",
    name: str = "error",
):
    r"""Compute the error between the predictor $c(t)$ and the prediction $\hat{c}(t)$ between $t_0$ and $t_1$ as follows:

    $$
    error = \frac{\sum_{t_0 \leq t \leq t_1} \mathcal{l}(c(t), \hat{c}(t))}{t_1 - t_0}
    $$

    where the loss function $\mathcal{l}$ can be the L1 distance or the L2 distance (L2 by default).

    Parameters
    ----------
    predictor : pd.DataFrame
        $c(t)$ computed in the Probe
    estimates : pd.DataFrame
        $\hat{c}(t)$ computed in the Model
    index : List[str]
        Column names by which the data are grouped
    loss_function : Callable, optional
        The loss function $\mathcal{l}$
    y : str, optional
        Column name for the completeness variable $c(t)$
    y_0 : str, optional
        Column name for the predicted completeness variable $\hat{c}(t)$
    t_0 : str, optional
        Column name for the predicted threshold $t_0$
    t_1 : str, optional
        Column name for the predicted threshold $t_1$
    x : str, optional
        Column name for the time variable $t$
    name : str, optional
        Column name for the metric output

    Example
    -------

    | care_site_level          | care_site_id | stay_type | error |
    | :----------------------- | :----------- | :---------| :---- |
    | Unité Fonctionnelle (UF) | 8312056386   | 'Urg'     | 0.040 |
    | Unité Fonctionnelle (UF) | 8312056386   | 'All'     | 0.028 |
    | Pôle/DMU                 | 8312027648   | 'Urg'     | 0.022 |
    | Pôle/DMU                 | 8312027648   | 'All'     | 0.014 |
    | Hôpital                  | 8312022130   | 'Urg'     | 0.027 |
    """
    check_columns(df=estimates, required_columns=[*index, y_0, t_0, t_1])
    check_columns(df=predictor, required_columns=[*index, x, y])

    fitted_predictor = predictor.merge(estimates, on=index)

    fitted_predictor = fitted_predictor.dropna(subset=[t_0, t_1])

    fitted_predictor["loss"] = loss_function(
        fitted_predictor[y] - fitted_predictor[y_0]
    )

    mask_between_t0_t1 = (fitted_predictor[x] >= fitted_predictor[t_0]) & (
        fitted_predictor[x] <= fitted_predictor[t_1]
    )
    fitted_predictor = fitted_predictor.loc[mask_between_t0_t1]

    error = fitted_predictor.groupby(index)["loss"].mean().rename(name)

    return error.reset_index()