Connective tissue disease

The eds.connective_tissue_disease pipeline component extracts mentions of connective tissue diseases.

Details of the patterns used
# fmt: off
TO_EXCLUDE = r"(?<!a )((\bacc\b)|anti.?coag|anti.?corps|buschke|(\bac\b)|(\bbio))"

main_pattern = dict(
    source="main",
    regex=[
        r"arthrites.{1,5}juveniles.{1,5}idiopa",
        r"myosite",
        r"myopathie.{1,5}inflammatoire",
        r"polyarthrite.{1,5}chronique.{1,5}evol",
        r"polymyosie",
        r"polyarthrites.{1,5}(rhizo|rhuma)",
        r"sclerodermie",
        r"connectivite",
        r"sarcoidose",
    ],
    exclude=dict(
        regex=[TO_EXCLUDE],
        window=(-7, 7),
    ),
    regex_attr="NORM",
)

lupus = dict(
    source="lupus",
    regex=[
        r"\blupus",
    ],
    regex_attr="NORM",
)

lupique = dict(
    source="lupique",
    regex=[
        r"\blupique",
    ],
    exclude=dict(
        regex=[TO_EXCLUDE],
        window=(-7, 7),
    ),
    regex_attr="NORM",
)

acronym = dict(
    source="acronyms",
    regex=[
        r"\bAJI\b",
        r"\bLED\b",
        r"\bPCE\b",
        r"\bCREST\b",
        r"\bPPR\b",
        r"\bMICI\b",
        r"\bMNAI\b",
    ],
    regex_attr="TEXT",
)

named_disease = dict(
    source="named_disease",
    regex=[
        r"libman.?lack",
        r"\bstill",
        r"felty",
        r"forestier.?certon",
        r"gou(g|j)erot",
        r"raynaud",
        r"thibierge.?weiss",
        r"sjogren",
        r"gou(g|j)erot.?sjogren",
    ],
    regex_attr="NORM",
)

default_patterns = [
    main_pattern,
    lupus,
    lupique,
    acronym,
    named_disease,
]
# fmt: on
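To make the role of TO_EXCLUDE and regex_attr more concrete, here is a minimal sketch using only Python's standard re module. It is not how the component is implemented: the helpers is_excluded and keep_lupique are hypothetical, they scan a whole string rather than the window=(-7, 7) context around a match, and the lowercase, accent-free strings only imitate what the NORM attribute might look like after normalization.

```python
import re

# Copied from the patterns listed above.
TO_EXCLUDE = r"(?<!a )((\bacc\b)|anti.?coag|anti.?corps|buschke|(\bac\b)|(\bbio))"
LUPIQUE = r"\blupique"
ACRONYM_LED = r"\bLED\b"


def is_excluded(context: str) -> bool:
    """Hypothetical helper: True if the exclusion regex fires anywhere in context."""
    return re.search(TO_EXCLUDE, context) is not None


def keep_lupique(normalized_text: str) -> bool:
    """Keep a 'lupique' mention only when no exclusion term is found in the text."""
    found = re.search(LUPIQUE, normalized_text) is not None
    return found and not is_excluded(normalized_text)


# Strings imitating the NORM attribute (lowercase, no accents): an assumption.
lupus_anticoagulant = "presence d'anticoagulants lupiques"
lupus_nephritis = "nephropathie lupique"

print(keep_lupique(lupus_anticoagulant))  # exclusion term present
print(keep_lupique(lupus_nephritis))      # no exclusion term

# Acronyms are matched on the raw text (regex_attr="TEXT"), so case matters.
print(re.search(ACRONYM_LED, "Suivi pour un LED.") is not None)
print(re.search(ACRONYM_LED, "eclairage par led") is not None)
```

Matching acronyms on TEXT rather than NORM avoids confusing an upper-case acronym such as LED with the same letters in lower case, which a lowercasing normalizer would otherwise make indistinguishable.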

Extensions

On each matched span, the following attribute is available:

  • span._.detailed_status: set to "PRESENT"

Examples

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe(
    "eds.normalizer",
    config=dict(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe("eds.connective_tissue_disease")

Below are a few examples:

text = "Présence d'une sclérodermie."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: [sclérodermie]

text = "Patient atteint d'un lupus."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: [lupus]

text = "Présence d'anticoagulants lupiques,"
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: []

text = "Il y a une MICI."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: [MICI]

text = "Syndrome de Raynaud"
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: [Raynaud]

Parameters

  • nlp: The pipeline object.
    TYPE: Optional[PipelineProtocol]

  • name: The name of the component.
    TYPE: str

  • patterns: The patterns to use for matching.
    TYPE: Optional[Dict[str, Any]]
    DEFAULT: [{'source': 'main', 'regex': ['arthrites.{1,5}j...

  • label: The label to use for the Span object and the extension.
    TYPE: str
    DEFAULT: connective_tissue_disease

  • span_setter: How to set matches on the doc.
    TYPE: SpanSetterArg
    DEFAULT: {'ents': True, 'connective_tissue_disease': True}
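The patterns parameter takes entries shaped like those in default_patterns. As a hedged sketch, the hypothetical helper check_patterns below verifies, with the standard library only, that every regular expression in a custom list compiles before the list is handed to the component. The custom list simply keeps the two lupus-related entries from the defaults. The final commented line assumes the list is passed through the config argument of add_pipe, in the same way as the normalizer configuration in the Examples section.

```python
import re
from typing import Any, Dict, List

TO_EXCLUDE = r"(?<!a )((\bacc\b)|anti.?coag|anti.?corps|buschke|(\bac\b)|(\bbio))"


def check_patterns(patterns: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: compile every regex and return the pattern sources."""
    sources = []
    for pattern in patterns:
        for regex in pattern["regex"]:
            re.compile(regex)  # raises re.error on an invalid expression
        exclude = pattern.get("exclude")
        if exclude is not None:
            for regex in exclude["regex"]:
                re.compile(regex)
        sources.append(pattern["source"])
    return sources


# A reduced pattern list: only the lupus-related entries from the defaults.
custom_patterns = [
    dict(source="lupus", regex=[r"\blupus"], regex_attr="NORM"),
    dict(
        source="lupique",
        regex=[r"\blupique"],
        exclude=dict(regex=[TO_EXCLUDE], window=(-7, 7)),
        regex_attr="NORM",
    ),
]

print(check_patterns(custom_patterns))

# Assumed usage, not executed here:
# nlp.add_pipe("eds.connective_tissue_disease", config=dict(patterns=custom_patterns))
```

Checking the expressions up front surfaces a malformed regex as an re.error at configuration time instead of during document processing.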

Authors and citation

The eds.connective_tissue_disease component was developed by AP-HP's Data Science team with a team of medical experts. A paper describing the development of these components in detail is being drafted and will be available soon.