
eds_scikit.structures.attributes

ATTRIBUTE_REGEX_PATTERNS module-attribute

ATTRIBUTE_REGEX_PATTERNS = [{'attribute': 'IS_EMERGENCY', 'pattern': '\\bURG|\\bSAU\\b|\\bUHCD\\b|\\bZHTCD\\b', 'true_examples': ['URG', 'URGENCES', 'SAU'], 'false_examples': ['CHIRURGIE']}, {'attribute': 'IS_ICU', 'pattern': '\\bUSI|\\bREA[N\\s]|\\bREA\\b|\\bUSC\\b|SOINS.*INTENSIF|SURV.{0,15}CONT|\\bSI\\b|\\bSC\\b', 'true_examples': ['REA', 'REA NEURO', 'REANIMATION'], 'false_examples': ['CARREAU']}]

Default value of the attribute_regex_patterns argument of eds_scikit.structures.attributes.add_care_site_attributes.


Examples:

::

ATTRIBUTE_REGEX_PATTERNS = [
    {
        # required elements: name of attribute and pattern of regular expression
        "attribute": "IS_EMERGENCY",
        "pattern": r"\bURG|\bSAU\b|\bUHCD\b|\bZHTCD\b",

        # optional elements: list of test strings to validate the regular expression
        "true_examples": ["URG", "URGENCES", "SAU"],
        "false_examples": ["CHIRURGIE"],
    },
    ...
]
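The optional true_examples and false_examples make each pattern easy to check. The following minimal sketch uses a hypothetical helper, failing_examples, which is assumed to mirror what validate_attribute_regex_patterns verifies (its code is not shown on this page). It also shows why the default patterns use word boundaries: without \b, "URG" matches inside "CHIRURGIE".

```python
import re


def failing_examples(spec):
    """Return (missed true examples, wrongly matched false examples).

    Hypothetical helper: every true example should match spec["pattern"],
    and no false example should.
    """
    pattern = re.compile(spec["pattern"])
    # re.search matches anywhere in the string, like Series.str.contains(regex=True)
    missed = [s for s in spec.get("true_examples", []) if not pattern.search(s)]
    wrongly_matched = [s for s in spec.get("false_examples", []) if pattern.search(s)]
    return missed, wrongly_matched


naive = {
    "attribute": "IS_EMERGENCY",
    "pattern": r"URG|SAU|UHCD|ZHTCD",
    "true_examples": ["URG", "URGENCES", "SAU"],
    "false_examples": ["CHIRURGIE"],
}
# Same spec, but with the word boundaries used by ATTRIBUTE_REGEX_PATTERNS
bounded = dict(naive, pattern=r"\bURG|\bSAU\b|\bUHCD\b|\bZHTCD\b")

print(failing_examples(naive))    # ([], ['CHIRURGIE'])
print(failing_examples(bounded))  # ([], [])
```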

add_care_site_attributes

add_care_site_attributes(care_site: DataFrame, only_attributes: Optional[List[str]] = None, attribute_regex_patterns: Optional[List[str]] = None) -> DataFrame

Add boolean attributes as columns to care_site dataframe.

This function applies simple regular expressions to care_site_name in order to compute boolean attributes of each care site. The implemented attributes are:

  • IS_EMERGENCY
  • IS_ICU

To make attribute detection more robust, care_site_name is first transformed into a cleaner DESCRIPTION column (unless a DESCRIPTION column is already present). This is done by eds_scikit.structures.description.add_care_site_description.
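The tagging step itself can be sketched in pure Python, assuming the names have already been cleaned into descriptions (the strings below are illustrative). The library applies pandas str.contains with regex=True, which, like re.search here, matches anywhere in the string. The helper tag is hypothetical; the patterns are copied from ATTRIBUTE_REGEX_PATTERNS.

```python
import re

# Default patterns, copied from ATTRIBUTE_REGEX_PATTERNS
PATTERNS = {
    "IS_EMERGENCY": r"\bURG|\bSAU\b|\bUHCD\b|\bZHTCD\b",
    "IS_ICU": r"\bUSI|\bREA[N\s]|\bREA\b|\bUSC\b|SOINS.*INTENSIF|SURV.{0,15}CONT|\bSI\b|\bSC\b",
}


def tag(description):
    """Compute one boolean per attribute for a cleaned description (hypothetical helper)."""
    return {name: bool(re.search(p, description)) for name, p in PATTERNS.items()}


for d in ["ACCUEIL URG PED", "CHIRURGIE DIGESTIVE", "PEDIATRIE GEN ET SAU", "REA NEURO"]:
    print(d, tag(d))
```

Note that "CHIRURGIE DIGESTIVE" is not tagged as an emergency unit because \bURG requires a word boundary before "URG".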

PARAMETER DESCRIPTION
care_site

TYPE: DataFrame

only_attributes

Names of the attributes to compute, if only a subset of the available attributes is needed.

TYPE: list of str DEFAULT: None

attribute_regex_patterns

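List of attribute specifications, each a dict with the keys "attribute" and "pattern" and, optionally, "true_examples" and "false_examples". If None, the default value is eds_scikit.structures.attributes.ATTRIBUTE_REGEX_PATTERNS.

TYPE: list of dict DEFAULT: None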

RETURNS DESCRIPTION
care_site

Same as input, with one additional boolean column per computed attribute. The column DESCRIPTION, a cleaner version of care_site_name, is also added; it is dropped before returning when only_attributes is given.

TYPE: DataFrame

Examples:

>>> care_site.head(3)
care_site_id, care_site_name
21, HOSP ACCUEIL URG PED (UF)
22, HOSP CHIRURGIE DIGESTIVE
23, HOSP PEDIATRIE GEN ET SAU
>>> care_site = add_care_site_attributes(care_site, only_attributes=["IS_EMERGENCY"])
>>> care_site.head(3)
care_site_id, care_site_name, IS_EMERGENCY
21, HOSP ACCUEIL URG PED (UF), True
22, HOSP CHIRURGIE DIGESTIVE, False
23, HOSP PEDIATRIE GEN ET SAU, True

Custom regular expressions can also be specified. Providing true and false examples for each attribute is recommended: they are tested against the provided regular expressions.

>>> my_attributes = [
    {
        "attribute": "IS_EMERGENCY",
        "pattern": r"\bURG|\bSAU\b|\bUHCD\b|\bZHTCD\b",
        "true_examples": ["URG", "URGENCES", "SAU"],
        "false_examples": ["CHIRURGIE"],
    },
    {
        "attribute": "IS_ICU",
        "pattern": r"\bREA\b|\bREANI",
        "true_examples": ["REA", "REA NEURO", "REANIMATION"],
        "false_examples": ["CARREAU"],
    },
]
>>> care_site = add_care_site_attributes(care_site, attribute_regex_patterns=my_attributes)
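The only_attributes filter keeps the requested specifications and rejects unknown names. Below is a sketch with a hypothetical select_patterns helper. It checks names against the supplied pattern list, which is an assumption chosen so that custom attributes can be selected; this may differ from the listing below, where names are compared against a module-level possible_concepts that is not shown on this page.

```python
def select_patterns(patterns, only_attributes=None):
    """Keep the specs named in only_attributes (hypothetical helper)."""
    if not only_attributes:
        return list(patterns)
    # Assumption: valid names are those defined in the supplied specs
    known = {spec["attribute"] for spec in patterns}
    unknown = set(only_attributes) - known
    if unknown:
        raise ValueError(f"Unknown attributes: {unknown}")
    # Keep the order of the pattern list, not the order of only_attributes
    return [spec for spec in patterns if spec["attribute"] in only_attributes]


my_attributes = [
    {"attribute": "IS_EMERGENCY", "pattern": r"\bURG|\bSAU\b|\bUHCD\b|\bZHTCD\b"},
    {"attribute": "IS_ICU", "pattern": r"\bREA\b|\bREANI"},
]
print([s["attribute"] for s in select_patterns(my_attributes, ["IS_ICU"])])
try:
    select_patterns(my_attributes, ["IS_PEDIATRIC"])  # hypothetical unknown name
except ValueError as err:
    print(err)
```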
Source code in eds_scikit/structures/attributes.py
def add_care_site_attributes(
    care_site: DataFrame,
    only_attributes: Optional[List[str]] = None,
    attribute_regex_patterns: Optional[List[str]] = None,
) -> DataFrame:
    """Add boolean attributes as columns to care_site dataframe.

    This function applies simple regular expressions to ``care_site_name``
    in order to compute boolean attributes of the care site.
    Implemented attributes are:

    - ``IS_EMERGENCY``
    - ``IS_ICU``

    In order to make the detection of attributes more robust, the
    column ``care_site_name`` is first transformed to a ``DESCRIPTION``.
    This is done by :py:func:`~eds_scikit.structures.description.add_care_site_description`.

    Parameters
    ----------
    care_site : DataFrame
    only_attributes : list of str
        Names of the attributes to compute, if only a subset of the available attributes is needed.
    attribute_regex_patterns : list of dict
        If ``None``, the default value is :py:data:`~eds_scikit.structures.attributes.ATTRIBUTE_REGEX_PATTERNS`

    Returns
    -------
    care_site: DataFrame
        Same as input, with one additional boolean column per computed attribute.
        The column ``DESCRIPTION``, a cleaner version of ``care_site_name``, is also added;
        it is dropped before returning when ``only_attributes`` is given.


    Examples
    --------
    >>> care_site.head(3)
    care_site_id, care_site_name
    21, HOSP ACCUEIL URG PED (UF)
    22, HOSP CHIRURGIE DIGESTIVE
    23, HOSP PEDIATRIE GEN ET SAU
    >>> care_site = add_care_site_attributes(care_site, only_attributes=["IS_EMERGENCY"])
    >>> care_site.head(3)
    care_site_id, care_site_name, IS_EMERGENCY
    21, HOSP ACCUEIL URG PED (UF), True
    22, HOSP CHIRURGIE DIGESTIVE, False
    23, HOSP PEDIATRIE GEN ET SAU, True

    Custom regular expressions can also be specified. Providing true and false examples for each
    attribute is recommended: they are tested against the provided regular expressions.

    >>> my_attributes = [
        {
            "attribute": "IS_EMERGENCY",
            "pattern": r"\bURG|\bSAU\b|\bUHCD\b|\bZHTCD\b",
            "true_examples": ["URG", "URGENCES", "SAU"],
            "false_examples": ["CHIRURGIE"],
        },
        {
            "attribute": "IS_ICU",
            "pattern": r"\bREA\b|\bREANI",
            "true_examples": ["REA", "REA NEURO", "REANIMATION"],
            "false_examples": ["CARREAU"],
        },
    ]
    >>> care_site = add_care_site_attributes(care_site, attribute_regex_patterns=my_attributes)

    """
    # validate arguments
    if attribute_regex_patterns is None:
        attribute_regex_patterns = ATTRIBUTE_REGEX_PATTERNS

    if only_attributes:
        impossible = set(only_attributes) - set(possible_concepts)
        if impossible:
            raise ValueError(f"Unknown concepts: {impossible}")
        attribute_regex_patterns = [
            item
            for item in attribute_regex_patterns
            if item["attribute"] in only_attributes
        ]

    validate_attribute_regex_patterns(attribute_regex_patterns)

    if "DESCRIPTION" not in care_site.columns:
        care_site = description.add_care_site_description(care_site)

    # apply algo
    for item in attribute_regex_patterns:
        new_column = {
            item["attribute"]: care_site["DESCRIPTION"].str.contains(
                item["pattern"], regex=True
            )
        }
        care_site = care_site.assign(**new_column)

    if only_attributes:
        care_site = care_site.drop(["DESCRIPTION"], axis="columns")

    return care_site

get_parent_attributes

get_parent_attributes(care_site: DataFrame, only_attributes: Optional[List[str]] = None, version: Optional[str] = None, parent_type: str = 'Unité Fonctionnelle (UF)') -> DataFrame

Get all known attributes from parent care sites and propagate them to each child care site.
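Conceptually, the propagation is a left join from each child to its parent's attributes, followed by filling missing values with False. The following stdlib-only sketch uses dicts in place of the hierarchy and parent-attribute DataFrames; all ids and the helper propagate are hypothetical.

```python
# Hypothetical toy data: child care_site_id -> parent UF id (None = no parent)
hierarchy = {101: 10, 102: 10, 103: 20, 104: None}
# Attributes already computed on the parent UFs (STEP 1 in the listing)
parent_attributes = {10: {"IS_EMERGENCY": True}, 20: {"IS_EMERGENCY": False}}


def propagate(hierarchy, parent_attributes, columns):
    """Left-join each child to its parent's attributes, defaulting to False."""
    result = {}
    for child, parent in hierarchy.items():
        # A child without a known parent behaves like an unmatched left join
        attrs = parent_attributes.get(parent, {})
        # Missing values become False, as with fillna(value=False)
        result[child] = {col: attrs.get(col, False) for col in columns}
    return result


print(propagate(hierarchy, parent_attributes, ["IS_EMERGENCY"]))
```

Children 101 and 102 inherit True from UF 10. Child 103 inherits False from UF 20. Child 104 has no parent and falls back to the False default.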

PARAMETER DESCRIPTION
care_site

required columns: ["care_site_id", "care_site_type_source_value", "care_site_name"]

TYPE: DataFrame

only_attributes

Same as in eds_scikit.structures.attributes.add_care_site_attributes.

TYPE: list of str DEFAULT: None

version

Optional version of the care site hierarchy to load from the data registry.

TYPE: Optional[str] DEFAULT: None

parent_type

Type of care site to consider as parent, matched against the care_site_type_source_value column. It must also be a column of the care site hierarchy. Defaults to "Unité Fonctionnelle (UF)".

TYPE: str DEFAULT: 'Unité Fonctionnelle (UF)'
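The version argument only changes which hierarchy loader is fetched from the registry. The name construction is reproduced here from the source listing as a standalone function. The function name hierarchy_function_name is hypothetical, and "v2" is a made-up version string.

```python
from typing import Optional


def hierarchy_function_name(version: Optional[str] = None) -> str:
    """Build the registry key used to load the care site hierarchy."""
    name = "get_care_site_hierarchy"
    # A version, when given, is appended as a dotted suffix
    if version is not None:
        name += f".{version}"
    return name


print(hierarchy_function_name())      # get_care_site_hierarchy
print(hierarchy_function_name("v2"))  # get_care_site_hierarchy.v2
```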

RETURNS DESCRIPTION
care_site_attributes

The input care_site with one additional boolean column per computed attribute (e.g. IS_EMERGENCY). Care sites with no matching parent get False.

TYPE: DataFrame

Warnings

This algorithm requires the care_site dataframe to contain the parent care sites as well as the care sites that you want to tag.
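The following sketch shows what goes wrong when a parent is missing. Only rows of parent_type are tagged, so a child whose parent UF is absent from care_site silently gets False. The ids, names, child type, and hierarchy are hypothetical. For simplicity, the sketch matches on care_site_name directly, whereas the real function first cleans names into DESCRIPTION.

```python
import re

PARENT_TYPE = "Unité Fonctionnelle (UF)"
EMERGENCY = r"\bURG|\bSAU\b|\bUHCD\b|\bZHTCD\b"

# Hypothetical care_site rows: UF 10 is present, UF 20 is not
care_site = [
    {"care_site_id": 10, "care_site_type_source_value": PARENT_TYPE,
     "care_site_name": "ACCUEIL URG PED"},
    {"care_site_id": 101, "care_site_type_source_value": "CHILD TYPE (hypothetical)",
     "care_site_name": "LITS 1"},
]
hierarchy = {101: 10, 102: 20}  # child care_site_id -> parent UF id

# STEP 1: only rows of the parent type are tagged
parent_flags = {
    row["care_site_id"]: bool(re.search(EMERGENCY, row["care_site_name"]))
    for row in care_site
    if row["care_site_type_source_value"] == PARENT_TYPE
}
# Children of a parent absent from care_site default to False
child_flags = {child: parent_flags.get(parent, False) for child, parent in hierarchy.items()}
print(child_flags)
```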

Examples:

>>> attributes = get_parent_attributes(care_site,
                                       only_attributes=["IS_EMERGENCY"],
                                       parent_type="Unité Fonctionnelle (UF)")
>>> attributes.head()
care_site_id, care_site_name, care_site_type_source_value, IS_EMERGENCY
92829, ..., False
29820, ..., True
Source code in eds_scikit/structures/attributes.py
def get_parent_attributes(
    care_site: DataFrame,
    only_attributes: Optional[List[str]] = None,
    version: Optional[str] = None,
    parent_type: str = "Unité Fonctionnelle (UF)",
) -> DataFrame:
    """Get all known attributes from parent care sites and propagate them to each child care site

    Parameters
    ----------
    care_site: DataFrame
        required columns: ``["care_site_id", "care_site_type_source_value", "care_site_name"]``
    only_attributes : list of str
        same as :py:func:`~eds_scikit.structures.attributes.add_care_site_attributes`
    version: Optional[str]
        Optional version string for the care site hierarchy
    parent_type: str
        Type of care site to consider as parent, by default "Unité Fonctionnelle (UF)".
        Corresponds to the `"care_site_type_source_value"` column

    Returns
    -------
    care_site_attributes: DataFrame
        The input ``care_site`` with one additional boolean column per computed attribute
        (e.g. ``IS_EMERGENCY``). Care sites with no matching parent get ``False``.


    Warnings
    --------
    This algorithm requires that the `care_site` dataframe contains
    the parent care sites as well as the care sites
    that you want to tag.

    Examples
    --------
    >>> attributes = get_parent_attributes(care_site,
                                           only_attributes=["IS_EMERGENCY"],
                                           parent_type="Unité Fonctionnelle (UF)")
    >>> attributes.head()
        care_site_id, care_site_name, care_site_type_source_value, IS_EMERGENCY
        92829  , ... ,     False
        29820  , ... ,     True

    """

    function_name = "get_care_site_hierarchy"
    if version is not None:
        function_name += f".{version}"
    hierarchy = registry.get("data", function_name=function_name)()

    fw = framework.get_framework(care_site)
    hierarchy = framework.to(fw, hierarchy)

    # STEP 1: get attributes of parent
    parent_attributes = care_site.loc[
        care_site["care_site_type_source_value"] == parent_type,
        ["care_site_id", "care_site_name"],
    ]
    parent_attributes = add_care_site_attributes(
        parent_attributes, only_attributes=only_attributes
    )
    boolean_columns = [
        # Series.iteritems was removed in pandas 2.0; items() is the supported equivalent
        col for (col, dtype) in parent_attributes.dtypes.items() if dtype == "bool"
    ]

    parent_attributes = parent_attributes.drop(
        ["care_site_name"], axis="columns"
    ).rename(columns={"care_site_id": "parent_id"})

    # STEP 2: propagate attributes from parent to all children
    hierarchy = hierarchy.loc[:, ["care_site_id", parent_type]].rename(
        columns={parent_type: "parent_id"}
    )
    children_attributes = hierarchy.merge(
        parent_attributes, how="left", on="parent_id"
    ).drop(["parent_id"], axis="columns")

    # STEP 3 : merge to input dataframe
    old_columns = care_site.columns
    care_site = care_site.merge(children_attributes, how="left", on="care_site_id")
    for col in care_site.columns:
        if col in boolean_columns and col not in old_columns:
            care_site[col] = care_site[col].fillna(value=False)

    return care_site

    # NOTE: this is how to return a single column that contains
    # EXACTLY the same index as the input dataframe.
    # For instance koalas requires the index name to be the same
    # for this operation to be valid:
    # >>> df["new_column"] = compute_column(df)

    # attributes = (
    #     care_site.loc[:, ["care_site_id"]]
    #     .reset_index()
    #     .merge(
    #         # drop_duplicates to ensure we keep same size as input
    #         children_attributes.drop_duplicates(subset=["care_site_id"]),
    #         how="left",
    #         on="care_site_id",
    #     )
    #     .fillna(value=False)
    #     # a merge "forgets" the index, we want to output the same as input
    #     .set_index("index")
    # )
    # attributes.index.name = care_site.index.name