
Diabetes

The eds.diabetes pipeline component extracts mentions of diabetes.

Details of the patterns used
# fmt: off
COMPLICATIONS = [
    r"nephropat",
    r"neuropat",
    r"retinopat",
    r"glomerulopathi",
    r"glomeruloscleros",
    r"angiopathi",
    r"origine",
]

main_pattern = dict(
    source="main",
    regex=[
        r"\bds?n?id\b",
        r"\bdiabet[^o]",
        r"\bdb\b",
        r"\bdt.?(i|ii|1|2)\b",
    ],
    exclude=dict(
        regex=[
            "insipide",
            "nephrogenique",
            "aigu",
            r"\bdr\b",  # Dr. ...
            "endocrino",  # Section title
            "soins aux pieds",  # Section title
            "nutrition",  # Section title
            r"\s?:\n+\W+(?!oui|non|\W)",  # General pattern for section title
        ],
        window=(-5, 5),
    ),
    regex_attr="NORM",
    assign=[
        dict(
            name="complicated_before",
            regex=r"(" + r"|".join(COMPLICATIONS + ["origine"]) + r")",
            window=-3,
        ),
        dict(
            name="complicated_after",
            regex=r"("
            + r"|".join([r"(?<!sans )compli", r"(?<!a)symptomatique"] + COMPLICATIONS)
            + r")",
            window=12,
        ),
        dict(
            name="type",
            regex=r"type.(i|ii|1|2)",
            window=6,
        ),
        dict(
            name="insulin",
            regex=r"insulino.?(dep|req)",
            window=6,
        ),
        dict(
            name="corticoid",
            regex=r"(\bctc\b|cortico(?:.?induit)?)",
            window=6,
        ),
    ],
)

complicated_pattern = dict(
    source="complicated",
    regex=[
        r"(mal|maux).perforants?(.plantaire)?",
        r"pieds? diabeti",
    ],
    exclude=dict(
        regex="soins aux",  # Section title
        window=-2,
    ),
    regex_attr="NORM",
)

default_patterns = [
    main_pattern,
    complicated_pattern,
]
# fmt: on
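To get a feel for what the main regexes accept, the sketch below runs them with Python's standard re module on strings that are assumed to be already normalized (lowercase, no accents), approximating the NORM attribute. It ignores tokenization, the exclusion windows and the assign rules, so it is not how eds.diabetes itself matches; first_main_match is a hypothetical helper.

```python
import re

# Regexes copied from main_pattern["regex"] above
MAIN_REGEX = [
    r"\bds?n?id\b",
    r"\bdiabet[^o]",
    r"\bdb\b",
    r"\bdt.?(i|ii|1|2)\b",
]


def first_main_match(norm_text):
    """Return the first substring matched by any main regex, or None.

    Assumes `norm_text` is already lowercased and accent-free,
    roughly what the NORM attribute provides.
    """
    for pattern in MAIN_REGEX:
        found = re.search(pattern, norm_text)
        if found:
            return found.group(0)
    return None


print(first_main_match("presence d'un dt2"))    # dt2
print(first_main_match("presence d'un dnid"))   # dnid
print(first_main_match("patient diabetique"))   # diabeti
print(first_main_match("avis diabetologique"))  # None
```

The [^o] after diabet lets diabétique through while rejecting words such as diabétologique or diabétologue. The raw regex only covers a prefix (diabeti); the examples further down show the component reporting the whole token.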

Extensions

On each matched span, the following attributes are available:

  • span._.detailed_status: set to either
    • "WITH_COMPLICATION" if the diabetes is complicated (e.g., via organ damage)
    • "WITHOUT_COMPLICATION" otherwise
  • span._.assigned: dictionary with the following keys, when relevant:
    • type: type of diabetes (I or II)
    • insulin: whether the diabetes is insulin-dependent
    • corticoid: whether the diabetes is corticoid-induced
    • complicated_before / complicated_after: complication keywords found just before or after the mention
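Which keys end up in span._.assigned follows from the assign entries of main_pattern. The sketch below is a hypothetical, simplified reconstruction in plain Python: it applies the type, insulin and corticoid regexes to an already-normalized context string (ignoring the token windows), stores the full matched substrings as plain strings where the component stores spans (and may keep only a capture group), and derives the status with one plausible rule inferred from the patterns and examples: a match from the complicated pattern, or any complicated_before/complicated_after keyword, means WITH_COMPLICATION. Neither assigned_from nor detailed_status is part of the edsnlp API.

```python
import re
from typing import Dict, List

# Regexes copied from the `assign` entries of main_pattern above
ASSIGN_REGEX = {
    "type": r"type.(i|ii|1|2)",
    "insulin": r"insulino.?(dep|req)",
    "corticoid": r"(\bctc\b|cortico(?:.?induit)?)",
}


def assigned_from(context: str) -> Dict[str, List[str]]:
    """Hypothetical: collect the substrings each assign regex finds in a normalized context."""
    result: Dict[str, List[str]] = {}
    for name, pattern in ASSIGN_REGEX.items():
        hits = [m.group(0) for m in re.finditer(pattern, context)]
        if hits:
            result[name] = hits
    return result


def detailed_status(source: str, assigned: Dict[str, List[str]]) -> str:
    """Hypothetical rule inferred from the docs, not the component's code."""
    if source == "complicated":
        return "WITH_COMPLICATION"
    if assigned.get("complicated_before") or assigned.get("complicated_after"):
        return "WITH_COMPLICATION"
    return "WITHOUT_COMPLICATION"


assigned = assigned_from("diabete de type 2 insulinodependant")
print(assigned)  # {'type': ['type 2'], 'insulin': ['insulinodep']}
print(detailed_status("main", assigned))  # WITHOUT_COMPLICATION
print(detailed_status("main", {"complicated_before": ["origine"]}))  # WITH_COMPLICATION
```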

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.diabetes())
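Because every pattern sets regex_attr="NORM", the regexes run on the normalized text produced by eds.normalizer, which is why they are written in lowercase and without accents. The approx_norm function below is a hypothetical stand-in that only mimics the lowercase=True and accents=True options with the standard unicodedata module; the real normalizer also handles quotes, spaces and pollution.

```python
import re
import unicodedata


def approx_norm(text: str) -> str:
    """Rough stand-in for the NORM attribute: lowercase, then strip accents.

    Assumption: mimics only lowercase=True and accents=True.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


norm = approx_norm("Une rétinopathie diabétique")
print(norm)  # une retinopathie diabetique
# The accent-free pattern "retinopat" only matches once accents are removed
print(bool(re.search(r"retinopat", "Une rétinopathie diabétique")))  # False
print(bool(re.search(r"retinopat", norm)))  # True
```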

Below are a few examples:

text = "Présence d'un DT2"
doc = nlp(text)
spans = doc.spans["diabetes"]

spans
# Out: [DT2]
text = "Présence d'un DNID"
doc = nlp(text)
spans = doc.spans["diabetes"]

spans
# Out: [DNID]
text = "Patient diabétique"
doc = nlp(text)
spans = doc.spans["diabetes"]

spans
# Out: [diabétique]
text = "Un diabète insipide"
doc = nlp(text)
spans = doc.spans["diabetes"]

spans
# Out: []
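The empty result for "Un diabète insipide" comes from the exclude rule of main_pattern: if one of its regexes fires within window=(-5, 5) around the match, the match is dropped. A minimal sketch, assuming whitespace tokens on normalized text and a hypothetical is_excluded helper; edsnlp's tokenizer and window handling may differ in detail.

```python
import re
from typing import List, Tuple

# A few of the exclusion regexes from main_pattern["exclude"] above
EXCLUDE_REGEX = ["insipide", "nephrogenique", "aigu"]


def is_excluded(tokens: List[str], match_idx: int, window: Tuple[int, int] = (-5, 5)) -> bool:
    """Return True if an exclusion regex fires in the tokens around the match.

    Assumptions: `tokens` are normalized whitespace tokens, the match is the
    single token at `match_idx`, and that token itself is not inspected.
    """
    start = max(0, match_idx + window[0])
    end = min(len(tokens), match_idx + 1 + window[1])
    context = tokens[start:match_idx] + tokens[match_idx + 1 : end]
    return any(re.search(pattern, tok) for pattern in EXCLUDE_REGEX for tok in context)


print(is_excluded("un diabete insipide".split(), 1))   # True
print(is_excluded("un diabete de type 2".split(), 1))  # False
```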
text = "Atteinte neurologique d'origine diabétique"
doc = nlp(text)
spans = doc.spans["diabetes"]

spans
# Out: [origine diabétique]

span = spans[0]

span._.detailed_status
# Out: WITH_COMPLICATION

span._.assigned
# Out: {'complicated_before': [origine]}
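The complicated_before value comes from the assign rule with window=-3, which looks for a complication keyword among the three tokens preceding the match. The sketch below is a simplified, hypothetical version on whitespace-split normalized text. Because whitespace splitting keeps d'origine together, it returns that token, whereas the component's tokenizer separates d' and reports origine.

```python
import re
from typing import List, Optional

# Same alternatives as the complicated_before regex in main_pattern above
COMPLICATIONS = [
    r"nephropat",
    r"neuropat",
    r"retinopat",
    r"glomerulopathi",
    r"glomeruloscleros",
    r"angiopathi",
    r"origine",
]
BEFORE_REGEX = r"(" + r"|".join(COMPLICATIONS) + r")"


def complicated_before(tokens: List[str], match_idx: int, window: int = 3) -> Optional[str]:
    """Return the nearest preceding token (within `window`) holding a complication keyword."""
    for tok in reversed(tokens[max(0, match_idx - window) : match_idx]):
        if re.search(BEFORE_REGEX, tok):
            return tok
    return None


print(complicated_before("atteinte neurologique d'origine diabetique".split(), 3))  # d'origine
print(complicated_before("une retinopathie diabetique".split(), 2))  # retinopathie
print(complicated_before("patient diabetique".split(), 1))  # None
```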
text = "Une rétinopathie diabétique"
doc = nlp(text)
spans = doc.spans["diabetes"]

spans
# Out: [rétinopathie diabétique]

span = spans[0]

span._.detailed_status
# Out: WITH_COMPLICATION

span._.assigned
# Out: {'complicated_before': [rétinopathie]}
text = "Il y a un mal perforant plantaire"
doc = nlp(text)
spans = doc.spans["diabetes"]

spans
# Out: [mal perforant plantaire]

span = spans[0]

span._.detailed_status
# Out: WITH_COMPLICATION
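The complicated_pattern matches foot complications such as mal perforant plantaire directly, but its exclude rule (window=-2) discards matches preceded by the section title soins aux. A minimal sketch with re, assuming normalized text and approximating the two-token window by the last two whitespace tokens before the match; complicated_match is a hypothetical helper.

```python
import re
from typing import Optional

# Regexes copied from complicated_pattern above
COMPLICATED_REGEX = [
    r"(mal|maux).perforants?(.plantaire)?",
    r"pieds? diabeti",
]
EXCLUDE_REGEX = r"soins aux"  # section title, excluded with window=-2


def complicated_match(norm_text: str) -> Optional[str]:
    """Return the first non-excluded complicated match, or None."""
    for pattern in COMPLICATED_REGEX:
        found = re.search(pattern, norm_text)
        if found is None:
            continue
        # Approximate window=-2: the two whitespace tokens right before the match
        preceding = " ".join(norm_text[: found.start()].split()[-2:])
        if re.search(EXCLUDE_REGEX, preceding):
            continue  # preceded by "soins aux": treated as a section title
        return found.group(0)
    return None


print(complicated_match("il y a un mal perforant plantaire"))  # mal perforant plantaire
print(complicated_match("soins aux pieds diabetiques"))        # None
```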

Parameters

  • nlp (TYPE: Optional[PipelineProtocol]): the pipeline
  • name (TYPE: Optional[str]): the name of the component
  • patterns: the patterns to use for matching. DEFAULT: [{'source': 'main', 'regex': ['\\bds?n?id\\b', ...
  • label (TYPE: str, DEFAULT: diabetes): the label to use for the Span object and the extension
  • span_setter (TYPE: SpanSetterArg, DEFAULT: {'ents': True, 'diabetes': True}): the span setter to use

Authors and citation

The eds.diabetes component was developed by AP-HP's Data Science team with a team of medical experts, building on the algorithm proposed by Petit-Jean et al., 2024 [1].


  1. Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp. 1280-1290. doi:10.1093/jamia/ocae069