`edsteva.models.step_function.algos.loss_minimization`

loss_minimization

loss_minimization(
    predictor: pd.DataFrame,
    index: List[str],
    x_col: str = "date",
    y_col: str = "c",
    loss_function: Callable = l2_loss,
) -> pd.DataFrame

Computes the threshold $t_{0}$ of a predictor $c (t)$ by minimizing the following loss function:

$$
\begin{aligned}
L(t_{0}) & = \frac{\sum_{t = t_{min}}^{t_{max}} l(c(t), f_{t_{0}}(t))}{t_{max} - t_{min}} \\
\hat{t_{0}} & = \underset{t_{0}}{\mathrm{argmin}} (L(t_{0}))
\end{aligned}
$$

Where the loss function $l$ is by default the L2 distance and the estimated completeness $c_{0}$ is the mean completeness after $t_{0}$.

$$
\begin{aligned}
l(c(t), f_{t_{0}}(t)) & = |c(t) - f_{t_{0}}(t)|^{2} \\
c_{0} & = \frac{\sum_{t = t_{0}}^{t_{max}} c(t)}{t_{max} - t_{0}}
\end{aligned}
$$
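
As an illustration of these definitions, the sketch below fits a threshold on a single series by brute force. It assumes that $f_{t_{0}}(t)$ is a step function equal to $0$ before $t_{0}$ and to $c_{0}$ from $t_{0}$ onwards, and it averages the loss over the number of observations rather than dividing by $t_{max} - t_{min}$ (a constant factor, so the argmin is unchanged). The helper `fit_step_threshold` and the toy data are hypothetical and written for this example only; they are not the edsteva implementation.

```python
import numpy as np


def fit_step_threshold(t, c):
    """Brute-force search for the step threshold on one series.

    Hypothetical helper for illustration; not the edsteva implementation.
    """
    t = np.asarray(t)
    c = np.asarray(c, dtype=float)
    # Scan the candidates in chronological order.
    order = np.argsort(t)
    t, c = t[order], c[order]

    best_t0, best_c0, best_loss = None, None, np.inf
    for i in range(len(t)):
        # Estimated completeness: mean of the values from the candidate onwards.
        c0 = c[i:].mean()
        # Assumed step model: 0 before the candidate threshold, c0 afterwards.
        fitted = np.concatenate([np.zeros(i), np.full(len(t) - i, c0)])
        # Mean L2 loss between the observed and the fitted completeness.
        loss = np.mean((c - fitted) ** 2)
        if loss < best_loss:
            best_t0, best_c0, best_loss = t[i], c0, loss
    return best_t0, best_c0, best_loss


# Toy series: completeness is near 0 until t = 3, then jumps to about 0.9.
t = np.arange(8)
c = np.array([0.0, 0.1, 0.0, 0.9, 1.0, 0.8, 0.9, 0.9])
t0, c0, loss = fit_step_threshold(t, c)
```

On this toy series the search selects the first observation after the jump as $\hat{t_{0}}$, with $c_{0}$ equal to the mean of the remaining values.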

| PARAMETER | DESCRIPTION | TYPE | DEFAULT |
|---|---|---|---|
| `predictor` | $c(t)$ computed in the Probe | `pd.DataFrame` | required |
| `index` | Columns by which the data is grouped. EXAMPLE: `["care_site_level", "stay_type", "note_type", "care_site_id"]` | `List[str]` | required |
| `x_col` | Column name for the time variable $t$ | `str` | `'date'` |
| `y_col` | Column name for the completeness variable $c(t)$ | `str` | `'c'` |
| `loss_function` | The pointwise loss function $l$ | `Callable` | `l2_loss` |

Source code in `edsteva/models/step_function/algos/loss_minimization.py`

def loss_minimization(
    predictor: pd.DataFrame,
    index: List[str],
    x_col: str = "date",
    y_col: str = "c",
    loss_function: Callable = l2_loss,
) -> pd.DataFrame:
    r"""Computes the threshold $t_0$ of a predictor $c(t)$ by minimizing the following loss function:

    $$
    \begin{aligned}
    \mathcal{L}(t_0) & = \frac{\sum_{t = t_{min}}^{t_{max}} \mathcal{l}(c(t), f_{t_0}(t))}{t_{max} - t_{min}} \\
    \hat{t_0} & = \underset{t_0}{\mathrm{argmin}}(\mathcal{L}(t_0))
    \end{aligned}
    $$

    Where the loss function $\mathcal{l}$ is by default the L2 distance and the estimated completeness $c_0$ is the mean completeness after $t_0$.

    $$
    \begin{aligned}
    \mathcal{l}(c(t), f_{t_0}(t)) & = |c(t) - f_{t_0}(t)|^2 \\
    c_0 & = \frac{\sum_{t = t_0}^{t_{max}} c(t)}{t_{max} - t_0}
    \end{aligned}
    $$


    Parameters
    ----------
    predictor : pd.DataFrame
        $c(t)$ computed in the Probe
    index : List[str]
        Columns by which the data is grouped

        **EXAMPLE**: `["care_site_level", "stay_type", "note_type", "care_site_id"]`
    x_col : str, optional
        Column name for the time variable $t$
    y_col : str, optional
        Column name for the completeness variable $c(t)$
    loss_function : Callable, optional
        The pointwise loss function $\mathcal{l}$
    """
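    # Fail early if a grouping, time or completeness column is missing.
    check_columns(df=predictor, required_columns=[*index, x_col, y_col])
    # Sort chronologically so that each group is scanned in time order.
    predictor = predictor.sort_values(x_col)
    cols = [*index, x_col, y_col]
    groups = predictor[cols].groupby(index)
    results = []
    for partition, group in tqdm.tqdm(groups):
        # A single grouping key may be yielded as a scalar: normalize to a tuple.
        if not isinstance(partition, tuple):
            partition = (partition,)
        row = dict(zip(index, partition))
        # Fit the threshold t_0 and the completeness c_0 for this group.
        t_0, c_0 = _compute_one_threshold(
            group,
            x_col,
            y_col,
            loss_function,
        )
        row["t_0"] = t_0
        row["c_0"] = c_0
        results.append(row)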

    return pd.DataFrame(results)
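
The function body relies on `check_columns`, `l2_loss` and `_compute_one_threshold`, none of which are shown here. To illustrate the grouping logic and the shape of the returned table, the sketch below replays the same loop on a toy predictor, using a hypothetical stand-in, `toy_threshold`, in place of `_compute_one_threshold`. The stand-in, the toy data and the assumed step model (0 before $t_0$, $c_0$ afterwards) are assumptions made for this example, not edsteva code.

```python
import numpy as np
import pandas as pd


def toy_threshold(group, x_col, y_col):
    """Hypothetical stand-in for `_compute_one_threshold` (not edsteva code)."""
    x = group[x_col].to_numpy()
    y = group[y_col].to_numpy(dtype=float)
    best = (None, None, np.inf)
    for i in range(len(x)):
        c_0 = y[i:].mean()
        # Squared error against 0 before the candidate and c_0 from it onwards.
        loss = (y[:i] ** 2).sum() + ((y[i:] - c_0) ** 2).sum()
        if loss < best[2]:
            best = (x[i], c_0, loss)
    return best[0], best[1]


# Toy predictor in long format: one row per (care site, month).
predictor = pd.DataFrame(
    {
        "care_site_id": ["A"] * 4 + ["B"] * 4,
        "date": list(pd.date_range("2020-01-01", periods=4, freq="MS")) * 2,
        "c": [0.0, 0.0, 1.0, 1.0, 0.0, 0.8, 0.8, 0.8],
    }
)
index = ["care_site_id"]

rows = []
for partition, group in predictor.sort_values("date").groupby(index):
    # Depending on the pandas version, a one-column grouping yields a scalar
    # or a 1-tuple key, so normalize to a tuple before zipping with `index`.
    if not isinstance(partition, tuple):
        partition = (partition,)
    row = dict(zip(index, partition))
    row["t_0"], row["c_0"] = toy_threshold(group, "date", "c")
    rows.append(row)

# One row per group: the grouping columns followed by t_0 and c_0.
result = pd.DataFrame(rows)
```

The resulting frame has one row per distinct combination of the `index` columns, which matches the structure built by `loss_minimization`: the grouping columns, then `t_0` and `c_0`.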