
edsteva.metrics.error

error

error(
    predictor: pd.DataFrame,
    estimates: pd.DataFrame,
    index: List[str],
    loss_function: Callable = loss_functions.l2_loss,
    y: str = "c",
    y_0: str = "c_0",
    x: str = "date",
    name: str = "error",
)

Compute the error between the predictor \(c(t)\) and the prediction \(\hat{c}(t)\) as follows:

\[ error = \frac{\sum_{t_{min} \leq t \leq t_{max}} \mathcal{l}(c(t), \hat{c}(t))}{t_{max} - t_{min}} \]

where the loss function \(\mathcal{l}\) can be the L1 or the L2 distance.
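The formula can be transcribed literally for a single group. This is a minimal pure-Python sketch, not edsteva's implementation. It assumes integer time steps (e.g. month indices) with an observation at every \(t\). It also assumes that the L1 and L2 losses are the absolute and squared differences, applied to \(c(t) - \hat{c}(t)\), which matches how the listed source passes `y - y_0` to `loss_function`. The names `l1_loss`, `l2_loss` and `formula_error` are hypothetical.

```python
from typing import Callable, Dict


def l1_loss(diff: float) -> float:
    """Hypothetical L1 loss: absolute difference."""
    return abs(diff)


def l2_loss(diff: float) -> float:
    """Hypothetical L2 loss: squared difference."""
    return diff**2


def formula_error(
    c: Dict[int, float],
    c_hat: Dict[int, float],
    loss: Callable[[float], float] = l2_loss,
) -> float:
    """Literal transcription of the formula for one group.

    Assumes integer time steps with an observation at every t in [t_min, t_max].
    """
    t_min, t_max = min(c), max(c)
    if t_max == t_min:
        raise ValueError("the formula needs at least two distinct time steps")
    # Sum the loss over t_min <= t <= t_max, then divide by t_max - t_min
    total = sum(loss(c[t] - c_hat[t]) for t in range(t_min, t_max + 1))
    return total / (t_max - t_min)


# c(t) for three consecutive months, compared with a flat prediction of 1.0
c = {0: 0.5, 1: 0.75, 2: 1.0}
c_hat = {0: 1.0, 1: 1.0, 2: 1.0}

print(formula_error(c, c_hat, l2_loss))  # 0.15625 = (0.25 + 0.0625 + 0) / 2
print(formula_error(c, c_hat, l1_loss))  # 0.375 = (0.5 + 0.25 + 0) / 2
```

The listed source averages the per-row losses with `.mean()`, which divides by the number of rows in a group (3 here) rather than by \(t_{max} - t_{min}\) (2 here). The two normalizations therefore give different values on the same data.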

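| Parameter       | Description                                                                                | Type           | Default                  |
| :-------------- | :----------------------------------------------------------------------------------------- | :------------- | :----------------------- |
| `predictor`     | \(c(t)\) as computed by the Probe                                                          | `pd.DataFrame` | (required)               |
| `estimates`     | \(\hat{c}(t)\) as estimated by the Model                                                   | `pd.DataFrame` | (required)               |
| `index`         | Columns used to group the data; also the keys on which `predictor` and `estimates` are merged | `List[str]`    | (required)               |
| `loss_function` | The loss function \(\mathcal{l}\)                                                          | `Callable`     | `loss_functions.l2_loss` |
| `y`             | Name of the column holding \(c(t)\)                                                        | `str`          | `'c'`                    |
| `y_0`           | Name of the column holding \(\hat{c}(t)\)                                                  | `str`          | `'c_0'`                  |
| `x`             | Name of the column holding \(t\)                                                           | `str`          | `'date'`                 |
| `name`          | Name of the output column holding the error                                                | `str`          | `'error'`                |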
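The parameters imply two input layouts. `predictor` holds the `index` columns plus `x` and `y`, with one row per group and time step. `estimates` holds the `index` columns plus `y_0` (one row per group in this sketch). The following minimal sketch builds matching toy frames. `require_columns` is a hypothetical stand-in for the `check_columns` calls in the listed source, and it is assumed here to raise when a column is missing.

```python
from typing import List

import pandas as pd


def require_columns(df: pd.DataFrame, required_columns: List[str]) -> None:
    """Hypothetical stand-in for check_columns: raise if any column is missing."""
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")


index = ["care_site_id", "stay_type"]  # grouping and merge keys

# predictor: one row per group and date, c(t) stored in column y="c"
predictor = pd.DataFrame(
    {
        "care_site_id": [1, 1, 1],
        "stay_type": ["All", "All", "All"],
        "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
        "c": [0.5, 0.75, 1.0],
    }
)

# estimates: one row per group, model output stored in column y_0="c_0"
estimates = pd.DataFrame({"care_site_id": [1], "stay_type": ["All"], "c_0": [1.0]})

# Same column requirements as the listed source, with the default names
require_columns(predictor, [*index, "date", "c"])
require_columns(estimates, [*index, "c_0"])
```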

Example
| care_site_level          | care_site_id | stay_type    | error |
| :----------------------- | :----------- | :----------- | :---- |
| Unité Fonctionnelle (UF) | 8312056386   | 'Urg_Hospit' | 0.040 |
| Unité Fonctionnelle (UF) | 8312056386   | 'All'        | 0.028 |
| Pôle/DMU                 | 8312027648   | 'Urg_Hospit' | 0.022 |
| Pôle/DMU                 | 8312027648   | 'All'        | 0.014 |
| Hôpital                  | 8312022130   | 'Urg_Hospit' | 0.027 |
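The example output has one error per combination of the `index` columns. The following minimal pandas sketch mirrors the steps of the listed source on toy data with two care sites. It assumes the default L2 loss is the element-wise squared difference. `error_sketch` and `l2_loss` are illustrative names and are not imported from edsteva.

```python
from typing import Callable, List

import pandas as pd


def l2_loss(diff: pd.Series) -> pd.Series:
    """Hypothetical L2 loss: element-wise squared difference."""
    return diff**2


def error_sketch(
    predictor: pd.DataFrame,
    estimates: pd.DataFrame,
    index: List[str],
    loss_function: Callable[[pd.Series], pd.Series] = l2_loss,
    y: str = "c",
    y_0: str = "c_0",
    name: str = "error",
) -> pd.DataFrame:
    """Merge on `index`, apply the loss to y - y_0, then average per group."""
    fitted = predictor.merge(estimates, on=index)
    fitted["loss"] = loss_function(fitted[y] - fitted[y_0])
    return fitted.groupby(index)["loss"].mean().rename(name).reset_index()


index = ["care_site_id", "stay_type"]

predictor = pd.DataFrame(
    {
        "care_site_id": [1, 1, 1, 1, 2, 2],
        "stay_type": ["All"] * 6,
        "date": pd.to_datetime(
            ["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01",
             "2020-01-01", "2020-02-01"]
        ),
        "c": [0.5, 0.75, 1.0, 1.0, 1.0, 1.0],
    }
)
estimates = pd.DataFrame(
    {"care_site_id": [1, 2], "stay_type": ["All", "All"], "c_0": [1.0, 0.5]}
)

result = error_sketch(predictor, estimates, index)
print(result)
```

Site 1 gets (0.25 + 0.0625 + 0 + 0) / 4 = 0.078125, and site 2 gets 0.25. The sketch drops the `x` column because the listed source uses it only in the column check, not in the computation.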
Source code in edsteva/metrics/error.py
def error(
    predictor: pd.DataFrame,
    estimates: pd.DataFrame,
    index: List[str],
    loss_function: Callable = loss_functions.l2_loss,
    y: str = "c",
    y_0: str = "c_0",
    x: str = "date",
    name: str = "error",
):
    r"""Compute the error between the predictor $c(t)$ and the prediction $\hat{c}(t)$ as follows:

    $$
    error = \frac{\sum_{t_{min} \leq  t \leq t_{max}} \mathcal{l}(c(t), \hat{c}(t))}{t_{max} - t_{min}}
    $$

    where the loss function $\mathcal{l}$ can be the L1 or the L2 distance.

    Parameters
    ----------
    predictor : pd.DataFrame
        $c(t)$ computed in the Probe
    estimates : pd.DataFrame
        $\hat{c}(t)$ computed in the Model
    index : List[str]
        Columns used to group the data and to merge the predictor with the estimates
    loss_function : Callable, optional
        The loss function $\mathcal{l}$
    y : str, optional
        Target column name of $c(t)$
    y_0 : str, optional
        Target column name of $\hat{c}(t)$
    x : str, optional
        Target column name of $t$
    name : str, optional
        Column name of the output

    Example
    -------

    | care_site_level          | care_site_id | stay_type    | error |
    | :----------------------- | :----------- | :----------- | :---- |
    | Unité Fonctionnelle (UF) | 8312056386   | 'Urg_Hospit' | 0.040 |
    | Unité Fonctionnelle (UF) | 8312056386   | 'All'        | 0.028 |
    | Pôle/DMU                 | 8312027648   | 'Urg_Hospit' | 0.022 |
    | Pôle/DMU                 | 8312027648   | 'All'        | 0.014 |
    | Hôpital                  | 8312022130   | 'Urg_Hospit' | 0.027 |
    """
    check_columns(df=estimates, required_columns=[*index, y_0])
    check_columns(df=predictor, required_columns=[*index, x, y])

    fitted_predictor = predictor.merge(estimates, on=index)

    fitted_predictor["loss"] = loss_function(
        fitted_predictor[y] - fitted_predictor[y_0]
    )

    error = fitted_predictor.groupby(index)["loss"].mean().rename(name)

    return error.reset_index()
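In the listed source, `loss_function` is called once on the whole Series `fitted_predictor[y] - fitted_predictor[y_0]`. A custom loss must therefore accept a pandas Series and return a Series of the same length. The following sketch shows such callables. All three are hypothetical and are not the contents of edsteva's `loss_functions` module. Extra parameters can be bound with `functools.partial`.

```python
from functools import partial

import numpy as np
import pandas as pd


def l1_loss(diff: pd.Series) -> pd.Series:
    """Hypothetical vectorized L1 loss: element-wise absolute value."""
    return diff.abs()


def l2_loss(diff: pd.Series) -> pd.Series:
    """Hypothetical vectorized L2 loss: element-wise square."""
    return diff**2


def clipped_l2_loss(diff: pd.Series, cap: float = 0.25) -> pd.Series:
    """Hypothetical custom loss: squared difference capped at `cap`,
    so a single outlying time step cannot dominate the group average."""
    return np.minimum(diff**2, cap)


# Stand-in for fitted_predictor[y] - fitted_predictor[y_0] within one group
diff = pd.Series([-0.5, -0.25, 0.0, 1.0])

# The listed source then averages the losses per group with .mean()
print(l1_loss(diff).mean())          # 0.4375
print(l2_loss(diff).mean())          # 0.328125
print(clipped_l2_loss(diff).mean())  # 0.140625

# Bind extra parameters before passing the callable as loss_function
tight_loss = partial(clipped_l2_loss, cap=0.1)
print(tight_loss(diff).tolist())     # [0.1, 0.0625, 0.0, 0.1]
```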