COPD

The eds.copd pipeline component extracts mentions of COPD (Chronic obstructive pulmonary disease). It will notably match:

  • Mentions of various diseases (see below)
  • Pulmonary hypertension
  • Long-term oxygen therapy
Details of the patterns used
# fmt: off
main_pattern = dict(
    source="main",
    regex=[
        r"alveolites.{1,5}fibrosante",
        r"asthm",
        r"broncho.?pneumopathies.{1,5}chroniques.{1,5}obstru",
        r"bronchites.{1,5}chroniques.{1,5}obstru",
        r"fibro.{1,20}(poumon|pulmo|pleur)",
        r"fibrose.{1,5}interstitielle.{1,5}diffuse.{1,5}idiopathique",
        r"fibrose.{1,5}intersti",
        r"obstruction.{1,5}chronique.{1,10}voie.{1,5}aerienne",
        r"pneumoconiose",
        r"pneumo(nie|pathie).{0,15}(intersti|radiq|infiltr|fibro|organis)",
        r"poumon.{1,5}noir",
        r"sclerose.{1,5}pulmo",
        r"fibro.?elastose.{1,5}pleuro.?paren",
        r"apnee.{1,25}sommeil",
        r"emphyseme",
        r"insuffisan.{1,5}respiratoire.{1,5}chron",
        r"mucoviscidose",
        r"bronchiolite.oblilerante.{1,10}pneumo.{1,20}organis",
    ],
    regex_attr="NORM",
)

htap = dict(
    source="htap",
    regex=[
        r"\bhtap\b",
        r"hypertension.{0,10}pulmo",
        r"hypertension.{1,5}arter.{1,15}(poumon|pulmo)",
    ],
    regex_attr="NORM",
    exclude=[
        dict(
            regex="minime",
            window=(0, 3),
        ),
    ],
)

oxygen = dict(
    source="oxygen",
    regex=[
        r"oxygeno.?dependance",
        r"oxygeno.?requeran",
        r"oxygenation",
        r"oxygeno.?ther",
        r"oxygene",
    ],
    regex_attr="NORM",
    assign=[
        dict(
            name="long",
            regex=r"(long.{1,10}(?:cour|dure)|chroni|domicil)",
            window=6,
        ),
        dict(
            name="long_bis",
            regex=r"(persist|major|minor)",
            window=-6,
        ),
        dict(
            name="need",
            regex=r"(besoin)",
            window=(-6, 6),
        ),
    ],
)

acronym = dict(
    source="acronym",
    regex=[
        r"\bBPCO\b",
        r"\bFPI\b",
        r"\bOLD\b",
        r"\bFEPP\b",
        r"\bPINS\b",
        r"\bPID\b",
        r"\bSAOS\b",
        r"\bSAS\b",
        r"\bSAHOS\b",
        r"\bBOOP\b",
    ],
    regex_attr="TEXT",
)

fid = dict(
    source="fid",
    regex=[
        r"\bfid\b",
    ],
    regex_attr="NORM",
    exclude=[
        dict(
            regex=[
                r"\bfig\b",
                r"palpation",
            ],
            window=(-7, 7),
        ),
    ],
)

default_patterns = [
    main_pattern,
    htap,
    oxygen,
    acronym,
    fid,
]
# fmt: on
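
To make the `regex_attr="NORM"` and `exclude` settings more concrete, here is a minimal standalone sketch that uses only Python's standard library. It is not how edsnlp implements matching: `normalize`, `word_end` and `find_htap` are hypothetical helpers, lowercasing plus accent stripping only approximates the NORM attribute, and whitespace-separated words stand in for tokens. It also assumes that `window=(0, 3)` covers the match itself plus the three following tokens.

```python
import re
import unicodedata
from typing import List


def normalize(text: str) -> str:
    """Lowercase and strip accents: a rough stand-in for the NORM attribute."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def word_end(text: str, pos: int) -> int:
    """Move pos forward to the end of the word it falls in."""
    while pos < len(text) and text[pos].isalnum():
        pos += 1
    return pos


# Regexes copied from the `htap` pattern above.
HTAP_REGEXES = [
    r"\bhtap\b",
    r"hypertension.{0,10}pulmo",
    r"hypertension.{1,5}arter.{1,15}(poumon|pulmo)",
]
EXCLUDE_REGEX = r"minime"
EXCLUDE_WORDS_AFTER = 3  # assumption: mirrors window=(0, 3)


def find_htap(text: str) -> List[str]:
    """Return normalized HTAP mentions that are not followed by an excluded term."""
    norm = normalize(text)
    kept = []
    for pattern in HTAP_REGEXES:
        for match in re.finditer(pattern, norm):
            end = word_end(norm, match.end())  # the regexes may stop mid-word
            following = norm[end:].split()[:EXCLUDE_WORDS_AFTER]
            context = " ".join([norm[match.start():end], *following])
            if re.search(EXCLUDE_REGEX, context):
                continue  # e.g. "hypertension pulmonaire minime"
            kept.append(norm[match.start():end])
    return kept


print(find_htap("Présence d'une HTAP."))                        # ['htap']
print(find_htap("On voit une hypertension pulmonaire minime"))  # []
```

The sketch returns normalized text rather than spans of the original document, which is enough to show why a mention qualified as "minime" is discarded.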

Extensions

On each matched span, the following attributes are available:

  • span._.detailed_status: set to None

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.copd())

Below are a few examples:

text = "Une fibrose interstitielle diffuse idiopathique"
doc = nlp(text)
spans = doc.spans["copd"]

spans
# Out: [fibrose interstitielle diffuse idiopathique]

text = "Patient atteint de pneumoconiose"
doc = nlp(text)
spans = doc.spans["copd"]

spans
# Out: [pneumoconiose]

text = "Présence d'une HTAP."
doc = nlp(text)
spans = doc.spans["copd"]

spans
# Out: [HTAP]

text = "On voit une hypertension pulmonaire minime"
doc = nlp(text)
spans = doc.spans["copd"]

spans
# Out: []

text = "La patiente a été mise sous oxygénorequérance"
doc = nlp(text)
spans = doc.spans["copd"]

spans
# Out: []

text = "La patiente est sous oxygénorequérance au long cours"
doc = nlp(text)
spans = doc.spans["copd"]

spans
# Out: [oxygénorequérance au long cours]

span = spans[0]

span._.assigned
# Out: {'long': [long cours]}
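
The last two examples suggest that an oxygen mention is kept only when at least one `assign` key (`long`, `long_bis` or `need`) is found in its window. That reading is inferred from the outputs above and is not stated explicitly on this page. The following standalone sketch illustrates it with the standard library only. `normalize`, `word_end`, `search_words` and `oxygen_mentions` are hypothetical helpers, whitespace-separated words approximate tokens, and a positive or negative integer window is assumed to mean "words after" or "words before" the mention.

```python
import re
import unicodedata
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and strip accents: a rough stand-in for the NORM attribute."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def word_end(text: str, pos: int) -> int:
    """Move pos forward to the end of the word it falls in."""
    while pos < len(text) and text[pos].isalnum():
        pos += 1
    return pos


# Regexes copied from the `oxygen` pattern, joined into one alternation.
OXYGEN_REGEX = "|".join([
    r"oxygeno.?dependance",
    r"oxygeno.?requeran",
    r"oxygenation",
    r"oxygeno.?ther",
    r"oxygene",
])

# (name, regex, words_before, words_after), mirroring the `assign` entries.
ASSIGN_RULES = [
    ("long", r"(long.{1,10}(?:cour|dure)|chroni|domicil)", 0, 6),
    ("long_bis", r"(persist|major|minor)", 6, 0),
    ("need", r"(besoin)", 6, 6),
]


def search_words(regex: str, words: List[str]) -> str:
    """Return the first regex hit in the joined words, extended to a word end."""
    context = " ".join(words)
    found = re.search(regex, context)
    if found is None:
        return ""
    return context[found.start():word_end(context, found.end())]


def oxygen_mentions(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (mention, assigned) pairs, dropping mentions with nothing assigned."""
    norm = normalize(text)
    results = []
    for match in re.finditer(OXYGEN_REGEX, norm):
        end = word_end(norm, match.end())
        before = norm[:match.start()].split()
        after = norm[end:].split()
        assigned = {}
        for name, regex, n_before, n_after in ASSIGN_RULES:
            left = before[-n_before:] if n_before else []
            right = after[:n_after]
            hit = search_words(regex, left) or search_words(regex, right)
            if hit:
                assigned[name] = hit
        if assigned:  # assumption: mentions without any assigned key are dropped
            results.append((norm[match.start():end], assigned))
    return results


print(oxygen_mentions("La patiente a été mise sous oxygénorequérance"))
# []
print(oxygen_mentions("La patiente est sous oxygénorequérance au long cours"))
# [('oxygenorequerance', {'long': 'long cours'})]
```

Unlike the component, the sketch does not extend the returned mention to cover the assigned words; it only reports which keys were found.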

Parameters

PARAMETER DESCRIPTION

  • nlp: The pipeline. TYPE: Optional[PipelineProtocol]
  • name: The name of the component. TYPE: Optional[str] DEFAULT: 'copd'
  • patterns: The patterns to use for matching. TYPE: Union[Dict[str, Any], List[Dict[str, Any]]] DEFAULT: [{'source': 'main', 'regex': ['alveolites.{1,5}...
  • label: The label to use for the Span object and the extension. TYPE: str DEFAULT: copd
  • span_setter: How to set matches on the doc. TYPE: SpanSetterArg DEFAULT: {'ents': True, 'copd': True}
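
Since `patterns` accepts either a single dictionary or a list of dictionaries, a narrower set of rules can presumably be passed in the same shape as `default_patterns`. The sketch below is an illustration under that assumption: `custom_patterns` is a hypothetical name, and the reduced choice of regexes is arbitrary rather than a recommendation. The import is guarded so that the snippet still runs where edsnlp is not installed.

```python
import re

# Hypothetical reduced pattern list, reusing keys and regexes from the defaults.
custom_patterns = [
    dict(
        source="main",
        regex=[r"emphyseme", r"mucoviscidose"],
        regex_attr="NORM",
    ),
    dict(
        source="acronym",
        regex=[r"\bBPCO\b"],
        regex_attr="TEXT",
    ),
]

# Sanity check: every regex should compile before it is handed to the component.
for pattern in custom_patterns:
    for regex in pattern["regex"]:
        re.compile(regex)

try:
    import edsnlp, edsnlp.pipes as eds
except ImportError:
    nlp = None  # edsnlp is not installed; only the pattern list is built
else:
    nlp = edsnlp.blank("eds")
    nlp.add_pipe(eds.sentences())
    nlp.add_pipe(eds.normalizer())
    nlp.add_pipe(eds.copd(patterns=custom_patterns))
```

With the default `label` and `span_setter`, matches from this reduced pipeline would still be written to `doc.ents` and `doc.spans["copd"]`.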

Authors and citation

The eds.copd component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024.


  1. Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association, 31, pp. 1280-1290. doi:10.1093/jamia/ocae069