Peptic ulcer disease

The eds.peptic_ulcer_disease pipeline component extracts mentions of peptic ulcer disease.

Details of the patterns used
# fmt: off
main_pattern = dict(
    source="main",
    regex=[
        r"ulcere.{1,10}gastr",
        r"ulcere.{1,10}duoden",
        r"ulcere.{1,10}antra",
        r"ulcere.{1,10}pept",
        r"ulcere.{1,10}estomac",
        r"ulcere.{1,10}curling",
        r"ulcere.{1,10}bulb",
        r"(œ|oe)sophagites.{1,5}pepti.{1,10}ulcer",
        r"gastrite.{1,20}ulcer",
        r"antrite.{1,5}ulcer",
    ],
    regex_attr="NORM",
)

acronym = dict(
    source="acronym",
    regex=[
        r"\bUGD\b",
    ],
    regex_attr="TEXT",
)

generic = dict(
    source="generic",
    regex=r"ulcere",
    regex_attr="NORM",
    assign=dict(
        name="is_peptic",
        regex=r"\b(gastr|digest)",
        window=(-20, 20),
        limit_to_sentence=False,
    ),
)

default_patterns = [
    main_pattern,
    acronym,
    generic,
]
# fmt: on
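To make the role of regex_attr more concrete, here is a minimal sketch that applies a few of the patterns above with Python's standard re module only. It does not use EDS-NLP. The helper names (rough_norm, match_main, match_acronym) are hypothetical, and rough_norm is only an approximation of the NORM attribute, assumed here to mean lowercased text with accents stripped. The raw regex match can be shorter than the span the component returns, since the documented output below shows whole tokens.

```python
import re
import unicodedata
from typing import List

# A subset of the "main" regexes listed above, copied verbatim.
MAIN_REGEXES = [
    r"ulcere.{1,10}gastr",
    r"ulcere.{1,10}duoden",
    r"gastrite.{1,20}ulcer",
]
# The acronym regex, matched on the raw text (regex_attr="TEXT").
ACRONYM_REGEX = r"\bUGD\b"


def rough_norm(text: str) -> str:
    """Approximate NORM (assumption): lowercase, then strip accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_main(text: str) -> List[str]:
    """Return the raw regex matches of the main patterns on normalized text."""
    norm = rough_norm(text)
    return [m.group(0) for rx in MAIN_REGEXES for m in re.finditer(rx, norm)]


def match_acronym(text: str) -> List[str]:
    """Return matches of the acronym pattern on the unmodified text."""
    return re.findall(ACRONYM_REGEX, text)


if __name__ == "__main__":
    print(match_main("Beaucoup d'ulcères gastriques"))
    print(match_acronym("Présence d'UGD"))
```

Matching the acronym on the raw text keeps it case-sensitive, so a lowercase "ugd" is not picked up in this sketch.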

Extensions

On each span that matches, the following attributes are available:

  • span._.detailed_status: set to None

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.peptic_ulcer_disease())

Below are a few examples:

text = "Beaucoup d'ulcères gastriques"
doc = nlp(text)
spans = doc.spans["peptic_ulcer_disease"]

spans
# Out: [ulcères gastriques]

text = "Présence d'UGD"
doc = nlp(text)
spans = doc.spans["peptic_ulcer_disease"]

spans
# Out: [UGD]

text = "La patient à des ulcères"
doc = nlp(text)
spans = doc.spans["peptic_ulcer_disease"]

spans
# Out: []

text = "Au niveau gastrique: blabla blabla blabla blabla blabla quelques ulcères"
doc = nlp(text)
spans = doc.spans["peptic_ulcer_disease"]

spans
# Out: [ulcères]

span = spans[0]

span._.assigned
# Out: {'is_peptic': [gastrique]}
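The last two examples suggest that a bare "ulcère" is kept only when a gastric or digestive cue appears nearby. The following sketch illustrates that idea in plain Python, without EDS-NLP. It rests on several assumptions: the window (-20, 20) is counted in tokens, the tokenizer is a crude regex rather than the EDS-NLP one, the input is already normalized, and a generic match with no cue is discarded. The function name find_peptic_context is hypothetical.

```python
import re
from typing import List, Tuple

TOKEN_RE = re.compile(r"\w+|[^\w\s]")        # crude tokenizer (assumption)
ANCHOR_RE = re.compile(r"ulcere")            # the "generic" regex
ASSIGN_RE = re.compile(r"\b(gastr|digest)")  # the "is_peptic" assign regex


def find_peptic_context(
    norm_text: str, window: Tuple[int, int] = (-20, 20)
) -> List[Tuple[str, List[str]]]:
    """Pair each 'ulcere' token with the cue tokens found inside its window."""
    tokens = TOKEN_RE.findall(norm_text)
    results = []
    for i, token in enumerate(tokens):
        if not ANCHOR_RE.search(token):
            continue
        # Clip the token window to the bounds of the text.
        lo = max(0, i + window[0])
        hi = min(len(tokens), i + window[1] + 1)
        cues = [t for t in tokens[lo:hi] if ASSIGN_RE.search(t)]
        if cues:  # assumption: anchors without any cue are dropped
            results.append((token, cues))
    return results


if __name__ == "__main__":
    text = "au niveau gastrique: blabla blabla blabla blabla blabla quelques ulceres"
    print(find_peptic_context(text))
    print(find_peptic_context("la patient a des ulceres"))
```

Because limit_to_sentence is False in the pattern above, the sketch deliberately ignores sentence boundaries when building the window.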

Parameters

PARAMETER: DESCRIPTION

  • nlp: The pipeline object. TYPE: Optional[PipelineProtocol]

  • name: The name of the component. TYPE: Optional[str]

  • patterns: The patterns to use for matching. DEFAULT: [{'source': 'main', 'regex': ['ulcere.{1,10}gas...

  • label: The label to use for the Span object and the extension. TYPE: str DEFAULT: peptic_ulcer_disease

  • span_setter: How to set matches on the doc. TYPE: SpanSetterArg DEFAULT: {'ents': True, 'peptic_ulcer_disease': True}

Authors and citation

The eds.peptic_ulcer_disease component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024.


  1. Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. doi:10.1093/jamia/ocae069