
eds_scikit.biology.utils.prepare_measurement

prepare_measurement_table

prepare_measurement_table(data: Data, start_date: datetime = None, end_date: datetime = None, concept_sets: List[ConceptsSet] = None, get_all_terminologies = True, convert_units = False, compute_table = False) -> DataFrame

Returns the measurement table filtered by validity, date and concept_sets.

The output has the same format as data.measurement, with the following additional columns:
- range_high_anomaly, range_low_anomaly
- {terminology}_code, based on the concept_sets terminologies
- concept_sets
- normalized_units and normalized_values, if convert_units==True
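As a rough illustration of the two anomaly columns, here is a plain-Python sketch of per-row flags. It assumes, which this page does not state, that a value is flagged when the OMOP value_as_number falls below range_low or above range_high, and that a missing value or bound never raises a flag. tag_range_anomaly is a hypothetical helper, not the library's tag_measurement_anomaly.

```python
from typing import Optional


def tag_range_anomaly(
    value: Optional[float], range_low: Optional[float], range_high: Optional[float]
) -> dict:
    """Hypothetical per-row anomaly flags (an assumption, not the library code)."""
    return {
        # Below the normal range; missing value or bound -> not flagged
        "range_low_anomaly": value is not None and range_low is not None and value < range_low,
        # Above the normal range; missing value or bound -> not flagged
        "range_high_anomaly": value is not None and range_high is not None and value > range_high,
    }


row = {"value_as_number": 12.5, "range_low": 4.0, "range_high": 10.0}
flags = tag_range_anomaly(row["value_as_number"], row["range_low"], row["range_high"])
print(flags)  # {'range_low_anomaly': False, 'range_high_anomaly': True}
```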

PARAMETER DESCRIPTION
data

An instantiated HiveData, PostgresData or PandasData object

TYPE: Data

start_date

Start of the date range used to filter measurements.

EXAMPLE: "2019-05-01"

TYPE: datetime, optional DEFAULT: None

end_date

End of the date range used to filter measurements.

EXAMPLE: "2022-05-01"

TYPE: datetime, optional DEFAULT: None
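A minimal plain-Python sketch of date filtering on a list of rows. It assumes that rows are filtered on the OMOP measurement_datetime column, that start_date is inclusive and end_date exclusive, and that None means "no bound"; these are assumptions, not documented behaviour of filter_measurement_by_date. filter_by_date is a hypothetical helper. ISO strings such as the examples above are parsed with datetime.fromisoformat.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Union

DateLike = Union[str, datetime, None]


def _to_datetime(value: DateLike) -> Optional[datetime]:
    # Accept ISO strings such as "2019-05-01" as well as datetime objects
    if value is None or isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


def filter_by_date(
    rows: Iterable[dict], start_date: DateLike = None, end_date: DateLike = None
) -> List[dict]:
    """Keep rows with start_date <= measurement_datetime < end_date (assumed bounds)."""
    start, end = _to_datetime(start_date), _to_datetime(end_date)
    kept = []
    for row in rows:
        ts = row["measurement_datetime"]
        if start is not None and ts < start:
            continue  # before the window
        if end is not None and ts >= end:
            continue  # at or after the (exclusive) end
        kept.append(row)
    return kept


rows = [
    {"measurement_id": 1, "measurement_datetime": datetime(2018, 12, 31)},
    {"measurement_id": 2, "measurement_datetime": datetime(2020, 3, 15)},
    {"measurement_id": 3, "measurement_datetime": datetime(2022, 5, 1)},
]
print([r["measurement_id"] for r in filter_by_date(rows, "2019-05-01", "2022-05-01")])  # [2]
```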

concept_sets

List of concept sets to select

TYPE: List[ConceptsSet], optional DEFAULT: None

get_all_terminologies

If True, all terminologies listed in the settings are added, by default True

TYPE: bool, optional DEFAULT: True

convert_units

If True, convert units based on the Units object of each ConceptsSet (this forces eager execution), by default False

TYPE: bool, optional DEFAULT: False
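Unit conversion can be pictured as looking up a multiplicative factor for each source unit. The sketch below is plain Python and entirely hypothetical: CONVERSIONS and convert_value are illustrative names, and the real Units object of a ConceptsSet is not necessarily structured this way. Unknown units are left unconverted rather than guessed.

```python
from typing import Dict, Optional, Tuple

# Hypothetical conversion table: source unit -> (normalized unit, multiplicative factor).
# The Units object of a real ConceptsSet may be structured differently.
CONVERSIONS: Dict[str, Tuple[str, float]] = {
    "mg/dl": ("g/l", 0.01),  # 1 mg/dL = 10 mg/L = 0.01 g/L
    "mg/l": ("g/l", 0.001),
    "g/l": ("g/l", 1.0),
}


def convert_value(value: float, unit: str) -> Tuple[Optional[float], Optional[str]]:
    """Return (normalized_value, normalized_unit), or (None, None) for an unknown unit."""
    key = unit.strip().lower()  # compare units case-insensitively
    if key not in CONVERSIONS:
        return None, None
    target_unit, factor = CONVERSIONS[key]
    return value * factor, target_unit


normalized_value, normalized_unit = convert_value(150.0, "mg/dL")
print(normalized_value, normalized_unit)  # approximately 1.5 g/l
```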

compute_table

If True, compute the table and then cache it. This can help avoid Spark issues, especially when running in notebooks.

TYPE: bool, optional DEFAULT: False

RETURNS DESCRIPTION
DataFrame

Preprocessed measurement dataframe

Source code in eds_scikit/biology/utils/prepare_measurement.py
def prepare_measurement_table(
    data: Data,
    start_date: datetime = None,
    end_date: datetime = None,
    concept_sets: List[ConceptsSet] = None,
    get_all_terminologies=True,
    convert_units=False,
    compute_table=False,
) -> DataFrame:
    """Returns filtered measurement table based on validity, date and concept_sets.

    The output has the same format as data.measurement, with the following additional columns:
    - range_high_anomaly, range_low_anomaly
    - {terminology}_code based on concept_sets terminologies
    - concept_sets
    - normalized_units and normalized_values if convert_units==True

    Parameters
    ----------
    data : Data
        Instantiated [``HiveData``][eds_scikit.io.hive.HiveData], [``PostgresData``][eds_scikit.io.postgres.PostgresData] or [``PandasData``][eds_scikit.io.files.PandasData]
    start_date : datetime, optional
        **EXAMPLE**: `"2019-05-01"`
    end_date : datetime, optional
        **EXAMPLE**: `"2022-05-01"`
    concept_sets : List[ConceptsSet], optional
        List of concept sets to select
    get_all_terminologies : bool, optional
        If True, all terminologies listed in the settings are added, by default True
    convert_units : bool, optional
        If True, convert units based on the Units object of each ConceptsSet (this forces eager execution), by default False
    compute_table : bool, optional
        If True, compute the table and then cache it. This can help avoid Spark issues, especially when running in notebooks.

    Returns
    -------
    DataFrame
        Preprocessed measurement dataframe
    """

    measurement, _, _ = check_data_and_select_columns_measurement(data)

    # measurement preprocessing
    measurement = filter_measurement_valid(measurement)
    measurement = filter_measurement_by_date(measurement, start_date, end_date)
    measurement = normalize_unit(measurement)
    measurement = tag_measurement_anomaly(measurement)

    # measurement codes mapping
    biology_relationship_table = prepare_biology_relationship_table(
        data, concept_sets, get_all_terminologies
    )
    measurement = measurement.merge(
        biology_relationship_table,
        left_on="measurement_source_concept_id",
        right_on=f"{mapping[0][0]}_concept_id",  # `mapping` is defined at module scope (not shown here)
    )

    if convert_units:
        logger.info(
            "Lazy preparation not available if convert_units=True. Table will be computed then cached."
        )
        measurement = convert_measurement_units(measurement, concept_sets)

    measurement = cache(measurement)
    if compute_table or convert_units:
        # Accessing .shape forces evaluation of a lazy (e.g. koalas) table
        measurement.shape

    if is_koalas(measurement):
        logger.info("Done. Once computed, measurement will be cached.")

    return measurement
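The merge step in the code above joins each measurement to the biology relationship table on measurement_source_concept_id. Both pandas and koalas `DataFrame.merge` default to `how="inner"`, so measurements whose source concept has no match in the relationship table are dropped from the result. The stdlib-only sketch below reproduces that inner-join behaviour on lists of rows; inner_merge, the source_concept_id column and the "Leukocytes" concept-set name are hypothetical stand-ins (the actual right-hand key is f"{mapping[0][0]}_concept_id").

```python
from typing import Dict, List


def inner_merge(left: List[dict], right: List[dict], left_on: str, right_on: str) -> List[dict]:
    """Inner join of two lists of rows, mirroring the default how="inner" of DataFrame.merge."""
    # Index the right table by its join key (one key may map to several rows)
    index: Dict[object, List[dict]] = {}
    for row in right:
        index.setdefault(row[right_on], []).append(row)
    merged = []
    for row in left:
        # No match -> the left row is dropped; several matches -> one output row each.
        # Unlike pandas, overlapping column names are overwritten, not suffixed.
        for match in index.get(row[left_on], []):
            merged.append({**row, **match})
    return merged


measurement = [
    {"measurement_id": 1, "measurement_source_concept_id": 101},
    {"measurement_id": 2, "measurement_source_concept_id": 999},  # unmapped code
]
# Hypothetical relationship table keyed on a stand-in column name
relationship = [{"source_concept_id": 101, "concept_set": "Leukocytes"}]

result = inner_merge(measurement, relationship, "measurement_source_concept_id", "source_concept_id")
print([(r["measurement_id"], r["concept_set"]) for r in result])  # [(1, 'Leukocytes')]
```

A source concept that matches several rows of the relationship table yields one output row per match, which is also how `DataFrame.merge` behaves.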