Connective tissue disease

The eds.connective_tissue_disease pipeline component extracts mentions of connective tissue diseases.

Details of the patterns used

# fmt: off
TO_EXCLUDE = r"(?<!a )((\bacc\b)|anti.?coag|anti.?corps|buschke|(\bac\b)|(\bbio))"

main_pattern = dict(
    source="main",
    regex=[
        r"arthrites.{1,5}juveniles.{1,5}idiopa",
        r"myosite",
        r"myopathie.{1,5}inflammatoire",
        r"polyarthrite.{1,5}chronique.{1,5}evol",
        r"polymyosie",
        r"polyarthrites.{1,5}(rhizo|rhuma)",
        r"sclerodermie",
        r"connectivite",
        r"sarcoidose",
    ],
    exclude=dict(
        regex=[TO_EXCLUDE],
        window=(-7, 7),
    ),
    regex_attr="NORM",
)

lupus = dict(
    source="lupus",
    regex=[
        r"\blupus",
    ],
    regex_attr="NORM",
)

lupique = dict(
    source="lupique",
    regex=[
        r"\blupique",
    ],
    exclude=dict(
        regex=[TO_EXCLUDE],
        window=(-7, 7),
    ),
    regex_attr="NORM",
)

acronym = dict(
    source="acronyms",
    regex=[
        r"\bAJI\b",
        r"\bLED\b",
        r"\bPCE\b",
        r"\bCREST\b",
        r"\bPPR\b",
        r"\bMICI\b",
        r"\bMNAI\b",
    ],
    regex_attr="TEXT",
)

named_disease = dict(
    source="named_disease",
    regex=[
        r"libman.?lack",
        r"\bstill",
        r"felty",
        r"forestier.?certon",
        r"gou(g|j)erot",
        r"raynaud",
        r"thibierge.?weiss",
        r"sjogren",
        r"gou(g|j)erot.?sjogren",
    ],
    regex_attr="NORM",
)

default_patterns = [
    main_pattern,
    lupus,
    lupique,
    acronym,
    named_disease,
]
# fmt: on
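
The `exclude` entries above discard a match when `TO_EXCLUDE` is found near it. The sketch below illustrates the idea with the standard `re` module only. It is a rough approximation, not the library's implementation: `keep_match` is a hypothetical helper, the window is counted in whitespace-separated words rather than spaCy tokens, and the input is assumed to be already lowercased and accent-free.

```python
import re

# Copied from the patterns above.
TO_EXCLUDE = r"(?<!a )((\bacc\b)|anti.?coag|anti.?corps|buschke|(\bac\b)|(\bbio))"


def keep_match(norm_text: str, pattern: str, window: int = 7) -> bool:
    """Hypothetical helper: True if `pattern` matches and no exclusion term
    appears within `window` whitespace-separated words on either side."""
    match = re.search(pattern, norm_text)
    if match is None:
        return False
    # Approximate the (-7, 7) window with words instead of tokens.
    before = norm_text[: match.start()].split()[-window:]
    after = norm_text[match.end():].split()[:window]
    context = " ".join(before + [match.group()] + after)
    return re.search(TO_EXCLUDE, context) is None


# "anticoagulants" falls inside the window, so the match is dropped.
print(keep_match("presence d'anticoagulants lupiques,", r"\blupique"))  # False
# No exclusion term nearby, so the match is kept.
print(keep_match("nephropathie lupique severe", r"\blupique"))  # True
```

Note that `\bbio` matches any word starting with "bio", so a nearby "biopsie" would also suppress a match under this sketch.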

Extensions

On each span that matches, the following attributes are available:

span._.detailed_status: set to "PRESENT"

Examples

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe(
    "eds.normalizer",
    config=dict(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe("eds.connective_tissue_disease")

Below are a few examples:

text = "Présence d'une sclérodermie."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: [sclérodermie]

text = "Patient atteint d'un lupus."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: [lupus]

text = "Présence d'anticoagulants lupiques,"
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: []

text = "Il y a une MICI."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: [MICI]

text = "Syndrome de Raynaud"
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]

spans
# Out: [Raynaud]
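
The examples above match accented or capitalized text because most patterns set `regex_attr="NORM"`, while the acronym patterns set `regex_attr="TEXT"` and are therefore case-sensitive. The following sketch mimics that distinction with the standard library only. `rough_norm` is a hypothetical stand-in that merely lowercases and strips accents; the real `eds.normalizer` does more (quotes, spaces, pollution), so treat this as an illustration under that assumption.

```python
import re
import unicodedata


def rough_norm(text: str) -> str:
    """Hypothetical approximation of NORM: lowercase and strip accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks (category "Mn") left over after decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# NORM-style matching: the accent-free, lowercase pattern finds "Sclérodermie".
norm_hit = re.search(r"sclerodermie", rough_norm("Présence d'une Sclérodermie."))

# TEXT-style matching: the acronym pattern only fires on the uppercase form.
text_hit = re.search(r"\bLED\b", "Suivi pour un LED.")
text_miss = re.search(r"\bLED\b", "Eclairage led du bloc.")

print(norm_hit is not None, text_hit is not None, text_miss is None)
# True True True
```

Matching acronyms on the raw text avoids false positives on ordinary lowercase words that happen to share the same letters.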

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline object TYPE: `Optional[PipelineProtocol]`
`name`	The name of the component TYPE: `str`
`patterns`	The patterns to use for matching TYPE: `Optional[Dict[str, Any]]` DEFAULT: `[{'source': 'main', 'regex': ['arthrites.{1,5}j...`
`label`	The label to use for the `Span` object and the extension TYPE: `str` DEFAULT: `connective_tissue_disease`
`span_setter`	How to set matches on the doc TYPE: `SpanSetterArg` DEFAULT: `{'ents': True, 'connective_tissue_disease': True}`

Authors and citation

The eds.connective_tissue_disease component was developed by AP-HP's Data Science team together with a team of medical experts. A paper describing the development of these components in detail is being drafted and will soon be available.