Connective tissue disease
The eds.connective_tissue_disease pipeline component extracts mentions of connective tissue diseases.
Details of the patterns used
# fmt: off
TO_EXCLUDE = r"(?<!a )((\bacc\b)|anti.?coag|anti.?corps|buschke|(\bac\b)|(\bbio))"
main_pattern = dict(
    source="main",
    regex=[
        r"arthrites.{1,5}juveniles.{1,5}idiopa",
        r"myosite",
        r"myopathie.{1,5}inflammatoire",
        r"polyarthrite.{1,5}chronique.{1,5}evol",
        r"polymyosie",
        r"polyarthrites.{1,5}(rhizo|rhuma)",
        r"sclerodermie",
        r"connectivite",
        r"sarcoidose",
    ],
    exclude=dict(
        regex=[TO_EXCLUDE],
        window=(-7, 7),
    ),
    regex_attr="NORM",
)

lupus = dict(
    source="lupus",
    regex=[
        r"\blupus",
    ],
    regex_attr="NORM",
)

lupique = dict(
    source="lupique",
    regex=[
        r"\blupique",
    ],
    exclude=dict(
        regex=[TO_EXCLUDE],
        window=(-7, 7),
    ),
    regex_attr="NORM",
)

acronym = dict(
    source="acronyms",
    regex=[
        r"\bAJI\b",
        r"\bLED\b",
        r"\bPCE\b",
        r"\bCREST\b",
        r"\bPPR\b",
        r"\bMICI\b",
        r"\bMNAI\b",
    ],
    regex_attr="TEXT",
)

named_disease = dict(
    source="named_disease",
    regex=[
        r"libman.?lack",
        r"\bstill",
        r"felty",
        r"forestier.?certon",
        r"gou(g|j)erot",
        r"raynaud",
        r"thibierge.?weiss",
        r"sjogren",
        r"gou(g|j)erot.?sjogren",
    ],
    regex_attr="NORM",
)

default_patterns = [
    main_pattern,
    lupus,
    lupique,
    acronym,
    named_disease,
]
# fmt: on
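Most of the patterns above are written in lowercase and without accents because they are matched against the NORM attribute, whereas the acronyms are matched against the raw TEXT. The following is a minimal sketch of that distinction using only the standard library. The normalize helper is hypothetical and only approximates what eds.normalizer produces, and the case-sensitivity of TEXT matching is an assumption based on the uppercase acronym patterns.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical stand-in for NORM: lowercase the text and strip accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# NORM-style matching: the unaccented pattern finds the accented word.
norm_hit = re.search(r"sclerodermie", normalize("Présence d'une Sclérodermie."))

# TEXT-style matching: the acronym pattern is applied to the raw text,
# so (under the assumption above) a lowercase "led" is not a match.
text_hit = re.search(r"\bLED\b", "Il y a un LED.")
text_miss = re.search(r"\bLED\b", "Une lampe à led.")

print(bool(norm_hit), bool(text_hit), bool(text_miss))
```

Keeping acronyms on TEXT avoids matching ordinary lowercase words that happen to share the same letters.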
Extensions
On each matched span, the following attribute is available:
span._.detailed_status: set to "PRESENT"
Examples
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe(
    "eds.normalizer",
    config=dict(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe("eds.connective_tissue_disease")
Below are a few examples:
text = "Présence d'une sclérodermie."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]
spans
# Out: [sclérodermie]
text = "Patient atteint d'un lupus."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]
spans
# Out: [lupus]
text = "Présence d'anticoagulants lupiques,"
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]
spans
# Out: []
text = "Il y a une MICI."
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]
spans
# Out: [MICI]
text = "Syndrome de Raynaud"
doc = nlp(text)
spans = doc.spans["connective_tissue_disease"]
spans
# Out: [Raynaud]
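The empty result for "anticoagulants lupiques" follows from the exclude clause attached to the lupique pattern. As a rough sketch, the exclusion regex can be tried on its own with the standard re module. The is_excluded helper is hypothetical, it takes already-normalized text as input, and it ignores the (-7, 7) window that the component uses to limit the search around each match.

```python
import re

# Copied from the pattern definitions of the component.
TO_EXCLUDE = r"(?<!a )((\bacc\b)|anti.?coag|anti.?corps|buschke|(\bac\b)|(\bbio))"


def is_excluded(normalized_context: str) -> bool:
    """Hypothetical helper: True if the exclusion regex fires in the context."""
    return re.search(TO_EXCLUDE, normalized_context) is not None


# "anticoag" appears next to "lupiques", so the mention is discarded.
print(is_excluded("presence d'anticoagulants lupiques"))

# No excluding term nearby, so the mention would be kept.
print(is_excluded("nephropathie lupique"))
```

Note that the lupus pattern has no exclude clause, so only the main and lupique patterns are filtered this way.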
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object TYPE: |
name | The name of the component TYPE: |
patterns | The patterns to use for matching TYPE: |
label | The label to use for the TYPE: |
span_setter | How to set matches on the doc TYPE: |
Authors and citation
The eds.connective_tissue_disease component was developed by AP-HP's Data Science team together with a team of medical experts. A paper describing the development of these components in detail is being drafted and will be available soon.