Skip to content

Detailed use

This tutorial demonstrates the workflow to prepare the measurement table.

Big volume

Measurement table can be large. Do not forget to set proper spark config before loading data.

Mapping measurement table to ANABIO codes

Defining Concept-Set

Here we work with the Glucose pre-defined concept set. See quick-use for an example on how to create a custom concept set.

from eds_scikit.biology import prepare_measurement_table, ConceptsSet

glucose_blood = ConceptsSet("Glucose_Blood")

Preparing measurement table

First, we prepare measurements with convert_units = False (as we do not yet know which units are contained in the table).

from eds_scikit.biology import measurement_values_summary

measurement = prepare_measurement_table(
    data,
    start_date="2022-01-01",
    end_date="2022-05-01",
    concept_sets=[glucose_blood],
    convert_units=False,
    get_all_terminologies=False,
)

Statistical summary

A statistical summary by codes allows us to gain insight into value distributions and detect possible heterogeneous units.

from eds_scikit.biology import measurement_values_summary

stats_summary = measurement_values_summary(
    measurement,
    category_cols=["concept_set", "GLIMS_ANABIO_concept_code"],
    value_column="value_as_number",
    unit_column="unit_source_value",
)

stats_summary
concept_set ANABIO_concept_code no_units unit_source_value range_low_anomaly_count range_high_anomaly_count measurement_count value_as_number_count value_as_number_mean value_as_number_std value_as_number_min value_as_number_25% value_as_number_50% value_as_number_75% value_as_number_max
Glucose_Blood XXXXX 100 mmol/l 15 5 1000 1000 5 2 0 2 5 8 9
Glucose_Blood YYYYY 50 mg/ml 20 10 5000 5000 25 10 0 20 25 37 45
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

Units correction

To map all units to a common unit base we can use add_conversion and add_target_unit from ConceptSet class.

glucose_blood.add_conversion("mol", "g", 180)
glucose_blood.add_target_unit("mmol/l")

We can check the new summary table after units conversion.

stats_summary = measurement_values_summary(
    measurement,
    category_cols=["concept_set", "GLIMS_ANABIO_concept_code"],
    value_column="value_as_number_normalized",
    unit_column="unit_source_value_normalized",
)

stats_summary
concept_set ANABIO_concept_code no_units unit_source_value range_low_anomaly_count range_high_anomaly_count measurement_count value_as_number_count value_as_number_mean value_as_number_std value_as_number_min value_as_number_25% value_as_number_50% value_as_number_75% value_as_number_max
Glucose_Blood XXXXX 100 mmol/l 15 5 1000 1000 5 2 0 2 5 8 9
Glucose_Blood YYYYY 50 mmol/l 20 10 5000 5000 5 2 0 4 5 7 9
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

Plot summary

Once all units are homogeneous, we can generate more detailed dashboard for biology investigation.

from eds_scikit.biology import plot_biology_summary

measurement = measurement.merge(
    data.visit_occurrence[["care_site_id", "visit_occurrence_id"]],
    on="visit_occurrence_id",
)
measurement = measurement.merge(
    data.care_site[["care_site_id", "care_site_short_name"]], on="care_site_id"
)

plot_biology_summary(measurement, value_column="value_as_number_normalized")

Volumetry dashboard Distribution dashboard

Back to top