Presentation

This dataset is the default configuration used by the bioclean() function. Each row corresponds to one biological concept paired with one unit, and the columns hold information about that concept-unit pair.

Configuration

This default configuration is based on statistical summaries of AP-HP's biological measurements.

It can be generated with the create_config_from_stats function.

To list all available configurations, use list_all_configs().

Structure and usage

Internally, the dataset is returned by calling the function get_biology_config():

from eds_scikit.resources import registry
df = registry.get("data", function_name="get_biology_config")()
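As a minimal sketch, the call above can be wrapped in a small helper and its result inspected. This assumes the returned object is a pandas-like DataFrame exposing `shape` and `columns`. The helper names `load_default_biology_config` and `summarize_config` are hypothetical and not part of eds_scikit.

```python
def load_default_biology_config():
    """Load the default configuration using the registry call shown above."""
    # Imported lazily so this sketch can be imported without eds_scikit installed.
    from eds_scikit.resources import registry

    return registry.get("data", function_name="get_biology_config")()


def summarize_config(df):
    """Return basic facts about a configuration table.

    Assumption: `df` is pandas-like, i.e. it has `.shape` and `.columns`.
    """
    n_rows, n_columns = df.shape
    return {
        "n_rows": n_rows,  # one row per (concept, unit) pair
        "n_columns": n_columns,
        "columns": list(df.columns),
    }


# Typical use (requires eds_scikit):
#   summary = summarize_config(load_default_biology_config())
#   print(summary["columns"])
```

Listing the column names this way is a quick check of what the configuration contains before passing it to other functions.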

Use your own data

The simplest way to generate your own configuration file is to use the create_config_from_stats function. Simply provide a name via the config_name parameter:

from eds_scikit.biology.utils.config import create_config_from_stats
...
create_config_from_stats(..., config_name="my_custom_config", ...)

You can now pass this config_name to any function that accepts it, in particular bioclean().
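A minimal sketch of passing a custom configuration name to bioclean(). It assumes that bioclean can be imported from eds_scikit.biology and takes the data object as its first argument. `clean_with_config` and its `bioclean_fn` parameter are hypothetical names introduced only for illustration.

```python
def clean_with_config(data, config_name, bioclean_fn=None):
    """Run bioclean on `data` with the configuration registered as `config_name`.

    `bioclean_fn` is a hypothetical hook that lets a stand-in be substituted
    for the real function, for example in tests.
    """
    if bioclean_fn is None:
        # Assumed import path; imported lazily so the sketch can be read
        # and exercised without eds_scikit installed.
        from eds_scikit.biology import bioclean as bioclean_fn

    # config_name is the parameter described in the text above.
    return bioclean_fn(data, config_name=config_name)
```

Calling `clean_with_config(data, "my_custom_config")` would then apply the configuration created with create_config_from_stats, provided the same name was used when generating it.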