Probe
Choosing or customizing a Probe is the second step in the EDS-TeVa usage workflow.
Definition
A Probe is a Python class designed to characterize the data availability of a target variable over time \(t\). It aggregates the loaded data to obtain a completeness predictor \(c(t)\).
Input
As detailed in the dedicated section, the Probe class expects a Data object containing Pandas or Koalas DataFrames. We provide several connectors to facilitate data fetching, namely a Hive connector, a Postgres connector and a LocalData connector.
Attributes
- predictor is a Pandas.DataFrame computed by the compute() method. It contains the desired completeness predictor \(c(t)\) for each column in the _index attribute (care site, stay type and any other needed column).
- _index is the list of columns that are used to aggregate the data in the compute() method.
Methods
- compute() calls the compute_process() method to compute the completeness predictors \(c(t)\) and stores them in the predictor attribute.
- compute_process() aggregates the input data to compute the completeness predictors \(c(t)\).
- filter_care_site() filters the predictor attribute on the selected care sites, including their upper- and lower-level care sites.
- save() saves the Probe to the desired path. By default, it is saved in the cache directory (~/.cache/edsteva/probes).
- load() loads the Probe from the desired path. By default, it is loaded from the cache directory (~/.cache/edsteva/probes).
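To make the idea behind filter_care_site() more concrete, here is a minimal sketch of selecting a care site together with its upper and lower levels. It is plain Python, not the edsteva implementation: the PARENT mapping, the care site names and both helper functions are hypothetical, and the real method operates on the predictor DataFrame.

```python
from typing import Dict, List, Optional, Set

# Hypothetical hierarchy: care site id -> parent care site id (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "hospital_A": None,
    "pole_1": "hospital_A",
    "pole_2": "hospital_A",
    "uf_11": "pole_1",
    "uf_12": "pole_1",
    "uf_21": "pole_2",
}


def related_care_sites(selected: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the selected care site with all its ancestors and descendants."""
    related = {selected}
    # Walk up the hierarchy to collect the upper levels.
    current = parent.get(selected)
    while current is not None:
        related.add(current)
        current = parent.get(current)
    # Walk down the hierarchy to collect the lower levels.
    frontier = [selected]
    while frontier:
        node = frontier.pop()
        children = [child for child, p in parent.items() if p == node]
        related.update(children)
        frontier.extend(children)
    return related


def filter_rows(rows: List[dict], selected: str) -> List[dict]:
    """Keep only the predictor rows whose care site is related to the selection."""
    keep = related_care_sites(selected, PARENT)
    return [row for row in rows if row["care_site_id"] in keep]


rows = [{"care_site_id": care_site_id, "c": 0.5} for care_site_id in PARENT]
kept = filter_rows(rows, "pole_1")
print(sorted(row["care_site_id"] for row in kept))
# ['hospital_A', 'pole_1', 'uf_11', 'uf_12']
```

Selecting pole_1 keeps its hospital and its two units, but not the sibling pole_2 or its unit.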
Predictor schema
The data stored in the predictor attribute follows a specific schema:
Predictors
It must include a completeness predictor \(c(t)\):
- c: value of the completeness predictor \(c(t)\).
It can then include any other extra predictor you find useful, such as:
- n_visit: the number of visits.
Extra predictor
The extra predictors must be additive to be aggregated properly in the dashboards. For instance, the number of visits is additive but the \(99^{th}\) percentile is not.
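The following sketch illustrates why additivity matters, assuming an aggregation that sums rows across care sites. The two units and their figures are invented for the example, and the median stands in for a percentile to keep the arithmetic short.

```python
from statistics import median

# Hypothetical monthly figures for two units of the same hospital.
unit_a = {"n_visit": 200, "n_visit_with_note": 180, "stay_days": [1, 1, 2]}
unit_b = {"n_visit": 50, "n_visit_with_note": 10, "stay_days": [10, 12, 30, 40]}

# Counts are additive: the hospital value is the sum of the unit values,
# so a ratio can be recomputed correctly from the summed counts.
n_visit = unit_a["n_visit"] + unit_b["n_visit"]
n_with_note = unit_a["n_visit_with_note"] + unit_b["n_visit_with_note"]
c_hospital = n_with_note / n_visit

# The ratio itself is not additive: averaging the unit ratios gives another value.
mean_of_c = (180 / 200 + 10 / 50) / 2

# Order statistics are not additive either: the median of the pooled stays
# cannot be obtained by combining the two unit medians.
median_of_pooled = median(unit_a["stay_days"] + unit_b["stay_days"])
sum_of_medians = median(unit_a["stay_days"]) + median(unit_b["stay_days"])

print(n_visit, round(c_hospital, 2), round(mean_of_c, 2))
print(median_of_pooled, sum_of_medians)
```

Here the ratio recomputed from the summed counts is 0.76, while the mean of the two unit ratios is 0.55.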
Indexes
It must include one and only one time-related column:
- date: date of the event associated with the target variable (by default, the dates are truncated to the month in which the event occurs).
It can then include any other string-type column, such as:
- care_site_level: care site hierarchical level (uf, pole, hospital).
- care_site_id: care site unique identifier.
- stay_type: type of stay (hospitalisés, urgence, hospitalisation incomplète, consultation externe).
- note_type: type of note (CRH, Ordonnance, CR Passage Urgences).
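The schema above can be checked with a few lines of code. The sketch below is only an illustration in plain Python: truncate_to_month, schema_problems and the index list are hypothetical names, and a single row is represented as a dict rather than as a DataFrame.

```python
from datetime import date
from typing import Any, Dict, List


def truncate_to_month(day: date) -> date:
    """Map any date to the first day of its month."""
    return day.replace(day=1)


def schema_problems(row: Dict[str, Any], index: List[str]) -> List[str]:
    """List the ways a predictor row departs from the schema described above."""
    problems = []
    if "c" not in row:
        problems.append("missing completeness predictor 'c'")
    if not isinstance(row.get("date"), date):
        problems.append("missing or non-date 'date' column")
    for column in index:
        if column != "date" and not isinstance(row.get(column), str):
            problems.append(f"index column '{column}' is not a string")
    return problems


# Hypothetical index and row, loosely modelled on the columns listed above.
index = ["care_site_level", "care_site_id", "stay_type", "date"]
row = {
    "care_site_level": "uf",
    "care_site_id": "8312056386",
    "stay_type": "urgence",
    "date": truncate_to_month(date(2019, 5, 17)),
    "n_visit": 233,
    "c": 0.841,
}

print(row["date"])  # 2019-05-01
print(schema_problems(row, index))  # []
```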
Example
When considering the availability of clinical notes, a NoteProbe.predictor may, for instance, look like this:
care_site_level | care_site_id | care_site_short_name | stay_type | note_type | date | n_visit | c |
---|---|---|---|---|---|---|---|
Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg_Hospit' | 'All' | 2019-05-01 | 233.0 | 0.841 |
Unité Fonctionnelle (UF) | 8653815660 | Care site 1 | 'All' | 'CRH' | 2011-04-01 | 393.0 | 0.640 |
Pôle/DMU | 8312027648 | Care site 2 | 'Urg_Hospit' | 'CRH' | 2021-03-01 | 204.0 | 0.497 |
Pôle/DMU | 8312056379 | Care site 2 | 'All' | 'Ordonnance' | 2018-08-01 | 22.0 | 0.274 |
Hôpital | 8312022130 | Care site 3 | 'Urg_Hospit' | 'CR Passage Urgences' | 2022-02-01 | 9746.0 | 0.769 |
Saving and loading a computed Probe
To ease the future loading of a Probe that has been computed with the compute() method, you can pickle it using the save() method. The Probe can then be loaded quickly from the local disk using the load() method.
from edsteva.probes import NoteProbe
note = NoteProbe()
note.compute(data) # (1)
note.save() # (2)
note_2 = NoteProbe()
note_2.load() # (3)
- Computation of the Probe querying the database (long).
- Saving of the Probe on the local disk.
- Rapid loading of the Probe from the local disk.
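For readers unfamiliar with pickling, the round trip can be sketched with the standard library alone. TinyProbe is a hypothetical stand-in, not the edsteva class, and a temporary directory replaces the default cache directory so that the example leaves no files behind.

```python
import pickle
import tempfile
from pathlib import Path


class TinyProbe:
    """Hypothetical stand-in for a computed Probe (not the edsteva class)."""

    def __init__(self):
        self.predictor = None

    def save(self, path: Path) -> None:
        # Create the target directory if needed, then pickle the object's state.
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(self.__dict__, f)

    def load(self, path: Path) -> None:
        # Restore the pickled state into this instance.
        with open(path, "rb") as f:
            self.__dict__.update(pickle.load(f))


probe = TinyProbe()
probe.predictor = [{"date": "2019-05-01", "n_visit": 233, "c": 0.841}]

with tempfile.TemporaryDirectory() as cache_dir:
    path = Path(cache_dir) / "probes" / "tinyprobe.pickle"
    probe.save(path)
    restored = TinyProbe()
    restored.load(path)

print(restored.predictor == probe.predictor)  # True
```

As with any pickle file, only load files that come from a trusted source.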
Defining a custom Probe
If none of the available Probes meets your requirements, you may want to create your own. To define a custom Probe class CustomProbe that inherits from the abstract class BaseProbe, you have to implement the compute_process() method (this method is called by the compute() method inherited from the BaseProbe class). You also have to define the _index attribute, which is the list of columns used to aggregate the data in the compute_process() method.
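from edsteva.probes import BaseProbe
from edsteva.utils.typing import Data  # assumed location of the Data type alias


# Definition of a new Probe class
class CustomProbe(BaseProbe):
    def __init__(
        self,
    ):
        # Columns used to aggregate the data in compute_process()
        self._index = ["my_custom_column_1", "my_custom_column_2"]
        super().__init__(
            index=self._index,
        )

    def compute_process(
        self,
        data: Data,
        **kwargs,
    ):
        # Query the data using the Pandas API and build custom_predictor,
        # a DataFrame that follows the predictor schema
        return custom_predictor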
compute_process() can take as many arguments as you need, but it must include a data argument and must return a Pandas.DataFrame that contains at least the columns of the standard predictor schema. For a detailed example of a Probe implementation, please have a look at the implemented Probes such as VisitProbe or NoteProbe.
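As a rough illustration of what the body of a compute_process() method might do, the sketch below aggregates a toy visit table into a predictor with pandas. It is standalone code, not the edsteva implementation: INDEX, toy_compute_process and the columns visit_id and visit_start are hypothetical, and normalising by the maximum within each index group is an assumption.

```python
import pandas as pd

INDEX = ["care_site_id", "stay_type"]  # hypothetical _index columns


def toy_compute_process(visit: pd.DataFrame) -> pd.DataFrame:
    """Count visits per index group and month, then normalise by the group maximum."""
    # Truncate each visit date to the first day of its month.
    visit = visit.assign(
        date=visit["visit_start"].dt.to_period("M").dt.to_timestamp()
    )
    # Additive extra predictor: number of visits per group and month.
    predictor = visit.groupby(INDEX + ["date"], as_index=False).agg(
        n_visit=("visit_id", "count")
    )
    # Completeness predictor: monthly count divided by the group's maximum count.
    n_max = predictor.groupby(INDEX)["n_visit"].transform("max")
    predictor["c"] = (predictor["n_visit"] / n_max).where(n_max > 0, 0.0)
    return predictor


visit = pd.DataFrame(
    {
        "visit_id": [1, 2, 3, 4],
        "care_site_id": ["A", "A", "A", "B"],
        "stay_type": ["urgence"] * 4,
        "visit_start": pd.to_datetime(
            ["2021-01-03", "2021-01-20", "2021-02-11", "2021-01-05"]
        ),
    }
)
result = toy_compute_process(visit)
print(result)
```

The returned DataFrame contains the index columns, a single date column, the additive n_visit count and the completeness predictor c, which matches the schema described earlier.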
Contributions
If you managed to create your own Probe, do not hesitate to share it with the community by following the contribution guidelines. Contributions are welcome and greatly appreciated! Every little bit helps, and credit will always be given.
Available Probes
We list hereafter the Probes that have already been implemented in the library.
The VisitProbe computes \(c_{visit}(t)\), the availability of administrative stays:
\[ c_{visit}(t) = \frac{n_{visit}(t)}{n_{max}} \]
where \(n_{visit}(t)\) is the number of administrative stays, \(t\) is the month and \(n_{max} = \max_{t}(n_{visit}(t))\).
If the maximum number of records per month \(n_{max}\) is equal to 0, we consider that the completeness predictor \(c(t)\) is also equal to 0.
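The definition above, including the zero case, can be written as a short function. This is a sketch on a plain dictionary of monthly counts; visit_completeness and the sample figures are hypothetical.

```python
from typing import Dict


def visit_completeness(n_visit: Dict[str, int]) -> Dict[str, float]:
    """Divide each monthly count by the maximum monthly count (0 if the maximum is 0)."""
    n_max = max(n_visit.values(), default=0)
    if n_max == 0:
        return {month: 0.0 for month in n_visit}
    return {month: n / n_max for month, n in n_visit.items()}


monthly = {"2021-01": 50, "2021-02": 100, "2021-03": 75}
print(visit_completeness(monthly))
# {'2021-01': 0.5, '2021-02': 1.0, '2021-03': 0.75}
print(visit_completeness({"2021-01": 0, "2021-02": 0}))
# {'2021-01': 0.0, '2021-02': 0.0}
```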
from edsteva.probes import VisitProbe
visit = VisitProbe()
visit.compute(
data,
stay_types={
"Urg": "urgence",
"Hospit": "hospitalisés",
"Urg_Hospit": "urgence|hospitalisés",
},
)
visit.predictor.head()
care_site_level | care_site_id | care_site_short_name | stay_type | date | n_visit | c |
---|---|---|---|---|---|---|
Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg' | 2019-05-01 | 233.0 | 0.841 |
Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg' | 2021-04-01 | 393.0 | 0.640 |
Pôle/DMU | 8312027648 | Care site 2 | 'Hospit' | 2011-03-01 | 204.0 | 0.497 |
Pôle/DMU | 8312027648 | Care site 2 | 'Urg' | 2018-08-01 | 22.0 | 0.274 |
Hôpital | 8312022130 | Care site 3 | 'Urg_Hospit' | 2022-02-01 | 9746.0 | 0.769 |
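The stay_types argument used above maps a label to a pattern. Assuming each pattern is applied as a regular-expression search on the raw stay type (which the value "urgence|hospitalisés" suggests, but which is not stated here), the mapping could behave as in the sketch below. labels_for is a hypothetical helper.

```python
import re
from typing import Dict, List

stay_types = {
    "Urg": "urgence",
    "Hospit": "hospitalisés",
    "Urg_Hospit": "urgence|hospitalisés",
}


def labels_for(raw_value: str, patterns: Dict[str, str]) -> List[str]:
    """Return every label whose pattern is found in the raw value."""
    return [
        label for label, pattern in patterns.items() if re.search(pattern, raw_value)
    ]


for raw in ["urgence", "hospitalisés", "consultation externe"]:
    print(raw, "->", labels_for(raw, stay_types))
# urgence -> ['Urg', 'Urg_Hospit']
# hospitalisés -> ['Hospit', 'Urg_Hospit']
# consultation externe -> []
```

Under this assumption a single raw stay type can contribute to several labels, which is why 'Urg_Hospit' appears alongside 'Urg' and 'Hospit' in the predictor.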
The NoteProbe computes \(c_{note}(t)\), the availability of clinical documents, using one of two algorithms.
The per_visit_default algorithm computes \(c(t)\), the availability of clinical documents linked to patients' administrative stays:
\[ c(t) = \frac{n_{with\,doc}(t)}{n_{visit}(t)} \]
where \(n_{visit}(t)\) is the number of administrative stays, \(n_{with\,doc}(t)\) the number of visits having at least one document and \(t\) is the month.
If the number of visits \(n_{visit}(t)\) is equal to 0, we consider that the completeness predictor \(c(t)\) is also equal to 0.
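The per-visit ratio can be sketched on toy data as follows. The tuples, the month labels and per_visit_completeness are hypothetical. The point of the example is that a visit with several documents is counted once, and that a month without visits yields 0.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data: (visit_id, month) pairs and (note_id, visit_id) pairs.
visits: List[Tuple[int, str]] = [
    (1, "2021-01"),
    (2, "2021-01"),
    (3, "2021-01"),
    (4, "2021-02"),
]
notes: List[Tuple[int, int]] = [(10, 1), (11, 1), (12, 3)]


def per_visit_completeness(
    visits: List[Tuple[int, str]],
    notes: List[Tuple[int, int]],
    months: List[str],
) -> Dict[str, float]:
    """Share of visits having at least one document, per month (0 if no visit)."""
    visits_with_note = {visit_id for _, visit_id in notes}
    n_visit = Counter(month for _, month in visits)
    n_with_doc = Counter(
        month for visit_id, month in visits if visit_id in visits_with_note
    )
    return {
        month: (n_with_doc[month] / n_visit[month] if n_visit[month] else 0.0)
        for month in months
    }


c = per_visit_completeness(visits, notes, ["2021-01", "2021-02", "2021-03"])
print(c)
```

In January, two of the three visits have at least one document, so the predictor is 2/3. February has one visit without any document and March has no visit, so both are 0.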
from edsteva.probes import NoteProbe
note = NoteProbe(completeness_predictor="per_visit_default")
note.compute(
data,
stay_types={
"Urg": "urgence",
"Hospit": "hospitalisés",
"Urg_Hospit": "urgence|hospitalisés",
},
note_types={
"All": ".*",
"CRH": "crh",
"Ordonnance": "ordo",
"CR Passage Urgences": "urge",
},
)
note.predictor.head()
care_site_level | care_site_id | care_site_short_name | stay_type | note_type | date | n_visit | n_visit_with_note | c |
---|---|---|---|---|---|---|---|---|
Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg' | 'All' | 2019-05-01 | 233.0 | 196.0 | 0.841 |
Unité Fonctionnelle (UF) | 8653815660 | Care site 1 | 'Hospit' | 'CRH' | 2011-04-01 | 393.0 | 252.0 | 0.640 |
Pôle/DMU | 8312027648 | Care site 2 | 'Hospit' | 'CRH' | 2021-03-01 | 204.0 | 101.0 | 0.497 |
Pôle/DMU | 8312056379 | Care site 2 | 'Urg' | 'Ordonnance' | 2018-08-01 | 22.0 | 6.0 | 0.274 |
Hôpital | 8312022130 | Care site 3 | 'Urg_Hospit' | 'CR Passage Urgences' | 2022-02-01 | 9746.0 | 7495.0 | 0.769 |
The per_note_default algorithm computes \(c(t)\), the availability of clinical documents, as follows:
\[ c(t) = \frac{n_{note}(t)}{n_{max}} \]
where \(n_{note}(t)\) is the number of clinical documents, \(t\) is the month and \(n_{max} = \max_{t}(n_{note}(t))\).
If the maximum number of recorded notes per month \(n_{max}\) is equal to 0, we consider that the completeness predictor \(c(t)\) is also equal to 0.
from edsteva.probes import NoteProbe
note = NoteProbe(completeness_predictor="per_note_default")
note.compute(
data,
stay_types={
"Urg": "urgence",
"Hospit": "hospitalisés",
"Urg_Hospit": "urgence|hospitalisés",
},
note_types={
"All": ".*",
"CRH": "crh",
"Ordonnance": "ordo",
"CR Passage Urgences": "urge",
},
)
note.predictor.head()
care_site_level | care_site_id | care_site_short_name | stay_type | note_type | date | n_note | c |
---|---|---|---|---|---|---|---|
Unité Fonctionnelle (UF) | 8312056386 | Care site 1 | 'Urg' | 'All' | 2019-05-01 | 233.0 | 0.841 |
Unité Fonctionnelle (UF) | 8653815660 | Care site 1 | 'Hospit' | 'CRH' | 2011-04-01 | 393.0 | 0.640 |
Pôle/DMU | 8312027648 | Care site 2 | 'Hospit' | 'CRH' | 2021-03-01 | 204.0 | 0.497 |
Pôle/DMU | 8312056379 | Care site 2 | 'Urg' | 'Ordonnance' | 2018-08-01 | 22.0 | 0.274 |
Hôpital | 8312022130 | Care site 3 | 'Urg_Hospit' | 'CR Passage Urgences' | 2022-02-01 | 9746.0 | 0.769 |
The ConditionProbe computes \(c_{condition}(t)\), the availability of claim data, using one of two algorithms.
The per_visit_default algorithm computes \(c(t)\), the availability of claim data linked to patients' administrative stays:
\[ c(t) = \frac{n_{with\,condition}(t)}{n_{visit}(t)} \]
where \(n_{visit}(t)\) is the number of administrative stays, \(n_{with\,condition}(t)\) the number of stays having at least one claim code (e.g. ICD-10) recorded and \(t\) is the month.
If the number of visits \(n_{visit}(t)\) is equal to 0, we consider that the completeness predictor \(c(t)\) is also equal to 0.
Care site level
AREM claim data are only available at hospital level.
from edsteva.probes import ConditionProbe
condition = ConditionProbe(completeness_predictor="per_visit_default")
condition.compute(
data,
stay_types={
"Hospit": "hospitalisés",
},
diag_types={
"All": ".*",
"DP/DR": "DP|DR",
},
condition_types={
"All": ".*",
"Pulmonary_embolism": "I26",
},
source_systems=["AREM", "ORBIS"],
)
condition.predictor.head()
care_site_level | care_site_id | care_site_short_name | stay_type | diag_type | condition_type | source_systems | date | n_visit | n_visit_with_condition | c |
---|---|---|---|---|---|---|---|---|---|---|
Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'All' | 'Pulmonary_embolism' | AREM | 2019-05-01 | 233.0 | 196.0 | 0.841 |
Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'DP/DR' | 'Pulmonary_embolism' | AREM | 2021-04-01 | 393.0 | 252.0 | 0.640 |
Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'All' | 'Pulmonary_embolism' | AREM | 2011-03-01 | 204.0 | 101.0 | 0.497 |
Unité Fonctionnelle (UF) | 8312027648 | Care site 2 | 'Hospit' | 'All' | 'All' | ORBIS | 2018-08-01 | 22.0 | 6.0 | 0.274 |
Pôle/DMU | 8312022130 | Care site 3 | 'Hospit' | 'DP/DR' | 'Pulmonary_embolism' | ORBIS | 2022-02-01 | 9746.0 | 7495.0 | 0.769 |
The per_condition_default algorithm computes \(c(t)\), the availability of claim data, as follows:
\[ c(t) = \frac{n_{condition}(t)}{n_{max}} \]
where \(n_{condition}(t)\) is the number of claim codes (e.g. ICD-10) recorded, \(t\) is the month and \(n_{max} = \max_{t}(n_{condition}(t))\).
If the maximum number of recorded diagnoses per month \(n_{max}\) is equal to 0, we consider that the completeness predictor \(c(t)\) is also equal to 0.
from edsteva.probes import ConditionProbe
condition = ConditionProbe(completeness_predictor="per_condition_default")
condition.compute(
data,
stay_types={
"All": ".*",
"Hospit": "hospitalisés",
},
diag_types={
"All": ".*",
"DP/DR": "DP|DR",
},
condition_types={
"All": ".*",
"Pulmonary_embolism": "I26",
},
source_systems=["AREM", "ORBIS"],
)
condition.predictor.head()
care_site_level | care_site_id | care_site_short_name | stay_type | diag_type | condition_type | source_systems | date | n_condition | c |
---|---|---|---|---|---|---|---|---|---|
Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'All' | 'Pulmonary_embolism' | AREM | 2019-05-01 | 233.0 | 0.841 |
Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'DP/DR' | 'Pulmonary_embolism' | AREM | 2021-04-01 | 393.0 | 0.640 |
Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'All' | 'Pulmonary_embolism' | AREM | 2011-03-01 | 204.0 | 0.497 |
Unité Fonctionnelle (UF) | 8312027648 | Care site 2 | 'Hospit' | 'All' | 'All' | ORBIS | 2018-08-01 | 22.0 | 0.274 |
Pôle/DMU | 8312022130 | Care site 3 | 'Hospit' | 'DP/DR' | 'Pulmonary_embolism' | ORBIS | 2022-02-01 | 9746.0 | 0.769 |
The BiologyProbe computes \(c_{biology}(t)\), the availability of laboratory data, using one of two algorithms.
The per_visit_default algorithm computes \(c(t)\), the availability of laboratory data linked to patients' administrative stays:
\[ c(t) = \frac{n_{with\,biology}(t)}{n_{visit}(t)} \]
where \(n_{visit}(t)\) is the number of administrative stays, \(n_{with\,biology}(t)\) the number of stays having at least one biological measurement recorded and \(t\) is the month.
If the number of visits \(n_{visit}(t)\) is equal to 0, we consider that the completeness predictor \(c(t)\) is also equal to 0.
Care site level
Laboratory data are only available at hospital level.
from edsteva.probes import BiologyProbe
biology = BiologyProbe(completeness_predictor="per_visit_default")
biology.compute(
data,
stay_types={
"Hospit": "hospitalisés",
},
concepts_sets={
"Créatinine": "E3180|G1974|J1002|A7813|A0094|G1975|J1172|G7834|F9409|F9410|C0697|H4038|F2621",
"Leucocytes": "A0174|K3232|H6740|E4358|C9784|C8824|E6953",
},
)
biology.predictor.head()
care_site_level | care_site_id | care_site_short_name | stay_type | concepts_sets | date | n_visit | n_visit_with_measurement | c |
---|---|---|---|---|---|---|---|---|
Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'Créatinine' | 2019-05-01 | 233.0 | 196.0 | 0.841 |
Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'Leucocytes' | 2021-04-01 | 393.0 | 252.0 | 0.640 |
Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'Créatinine' | 2011-03-01 | 204.0 | 101.0 | 0.497 |
Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'Leucocytes' | 2018-08-01 | 22.0 | 6.0 | 0.274 |
Hôpital | 8312022130 | Care site 3 | 'Hospit' | 'Leucocytes' | 2022-02-01 | 9746.0 | 7495.0 | 0.769 |
The per_measurement_default algorithm computes \(c(t)\), the availability of biological measurements, as follows:
\[ c(t) = \frac{n_{biology}(t)}{n_{max}} \]
where \(n_{biology}(t)\) is the number of biological measurements, \(t\) is the month and \(n_{max} = \max_{t}(n_{biology}(t))\).
If the maximum number of recorded biological measurements per month \(n_{max}\) is equal to 0, we consider that the completeness predictor \(c(t)\) is also equal to 0.
Care site level
Laboratory data are only available at hospital level.
from edsteva.probes import BiologyProbe
biology = BiologyProbe(completeness_predictor="per_measurement_default")
biology.compute(
data,
stay_types={
"Hospit": "hospitalisés",
},
concepts_sets={
"Créatinine": "E3180|G1974|J1002|A7813|A0094|G1975|J1172|G7834|F9409|F9410|C0697|H4038|F2621",
"Leucocytes": "A0174|K3232|H6740|E4358|C9784|C8824|E6953",
},
)
biology.predictor.head()
care_site_level | care_site_id | care_site_short_name | stay_type | concepts_sets | date | n_measurement | c |
---|---|---|---|---|---|---|---|
Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'Créatinine' | 2019-05-01 | 233.0 | 0.841 |
Hôpital | 8312057527 | Care site 1 | 'Hospit' | 'Leucocytes' | 2021-04-01 | 393.0 | 0.640 |
Hôpital | 8312027648 | Care site 2 | 'Hospit' | 'Créatinine' | 2011-03-01 | 204.0 | 0.497 |
Unité Fonctionnelle (UF) | 8312027648 | Care site 2 | 'Hospit' | 'Leucocytes' | 2018-08-01 | 22.0 | 0.274 |
Pôle/DMU | 8312022130 | Care site 3 | 'Hospit' | 'Leucocytes' | 2022-02-01 | 9746.0 | 0.769 |