Custom Teva
The OMOP-Teva module can also be applied to any DataFrame. To do so, use reduce_table and visualize_table from eds_scikit.plot.table_viz.
Make sure that every column you declare as categorical has fewer than 50 distinct values.
Use the function eds_scikit.plot.table_viz.map_column to reduce the number of distinct values in a column.
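Before calling reduce_table, it can help to check which candidate columns exceed the limit. The snippet below is a minimal sketch in plain pandas. The helper high_cardinality_columns and the constant MAX_CATEGORIES are hypothetical names introduced for illustration and are not part of eds_scikit.

```python
import pandas as pd

# Threshold taken from the guideline above (assumed to mean "strictly fewer than 50").
MAX_CATEGORIES = 50


def high_cardinality_columns(df, columns, max_values=MAX_CATEGORIES):
    """Hypothetical helper: return {column: n_unique} for columns
    that have max_values or more distinct values."""
    counts = df[columns].nunique()
    return {col: int(n) for col, n in counts.items() if n >= max_values}


# Small illustrative frame: "location" has 2 distinct values, "code" has 100.
example = pd.DataFrame(
    {
        "location": ["location 1", "location 2"] * 50,
        "code": [str(i) for i in range(100)],
    }
)

too_wide = high_cardinality_columns(example, ["location", "code"])
print(too_wide)
```

Any column reported here is a candidate for grouping its values before it is passed as a categorical column.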
Creating a synthetic dataset
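import numpy as np
import pandas as pd

data = pd.DataFrame(
    {
        # One string identifier per row ("1" to "1000")
        "id": np.arange(1, 1001).astype(str),
        # Low-cardinality category: 3 distinct values
        "category_1": np.random.choice(["A", "B", "C"], size=1000, p=[0.4, 0.3, 0.3]),
        # High-cardinality category: 500 distinct values, each repeated twice
        "category_2": np.array([str(i) for i in range(500)] * 2),
        "location": np.random.choice(
            ["location 1", "location 2"], size=1000, p=[0.6, 0.4]
        ),
        # Random dates drawn between 2021-01-01 and 2022-01-01
        "date": pd.to_datetime(
            np.random.choice(
                pd.date_range(start="2021-01-01", end="2022-01-01"), size=1000
            )
        ),
    }
)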
from eds_scikit.plot import reduce_table, visualize_table
data_reduced = reduce_table(
    data,
    category_columns=["location", "category_1", "category_2"],
    date_column="date",
    start_date="2021-01-01",
    end_date="2021-12-01",
    mapper={"category_2": {"even": r"[02468]$", "odd": r"[13579]$"}},
)
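To see how the two regular expressions in mapper partition category_2, the sketch below applies them with plain pandas and the standard re module. The helper map_by_regex is hypothetical and is not the eds_scikit implementation. It assumes each value takes the label of the first pattern that matches it with re.search, which may differ from how reduce_table matches values internally.

```python
import re

import pandas as pd

# Same patterns as the mapper passed to reduce_table for "category_2".
patterns = {"even": r"[02468]$", "odd": r"[13579]$"}


def map_by_regex(series, patterns, default="other"):
    """Hypothetical helper: replace each value with the first label
    whose pattern matches it, or with `default` if none matches."""

    def label(value):
        for name, pattern in patterns.items():
            if re.search(pattern, value):
                return name
        return default

    return series.map(label)


# Same construction as "category_2" in the synthetic dataset: 500 distinct strings.
category_2 = pd.Series([str(i) for i in range(500)] * 2)

mapped = map_by_regex(category_2, patterns)
counts = mapped.value_counts().to_dict()
print(category_2.nunique(), "distinct values before,", mapped.nunique(), "after")
```

Under these assumptions, the 500 distinct values collapse into the two labels even and odd, which brings the column under the 50-value limit.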
chart = visualize_table(
    data_reduced, title="synthetic dataframe table", description=True
)