COPD
The eds.copd pipeline component extracts mentions of COPD (Chronic obstructive pulmonary disease). It will notably match:
- Mentions of various diseases (see below)
- Pulmonary hypertension
- Long-term oxygen therapy
Details of the patterns used
# fmt: off
main_pattern = dict(
    source="main",
    regex=[
        r"alveolites.{1,5}fibrosante",
        r"asthm",
        r"broncho.?pneumopathies.{1,5}chroniques.{1,5}obstru",
        r"bronchites.{1,5}chroniques.{1,5}obstru",
        r"fibro.{1,20}(poumon|pulmo|pleur)",
        r"fibrose.{1,5}interstitielle.{1,5}diffuse.{1,5}idiopathique",
        r"fibrose.{1,5}intersti",
        r"obstruction.{1,5}chronique.{1,10}voie.{1,5}aerienne",
        r"pneumoconiose",
        r"pneumo(nie|pathie).{0,15}(intersti|radiq|infiltr|fibro|organis)",
        r"poumon.{1,5}noir",
        r"sclerose.{1,5}pulmo",
        r"fibro.?elastose.{1,5}pleuro.?paren",
        r"apnee.{1,25}sommeil",
        r"emphyseme",
        r"insuffisan.{1,5}respiratoire.{1,5}chron",
        r"mucoviscidose",
        r"bronchiolite.oblilerante.{1,10}pneumo.{1,20}organis",
    ],
    regex_attr="NORM",
)
htap = dict(
    source="htap",
    regex=[
        r"\bhtap\b",
        r"hypertension.{0,10}pulmo",
        r"hypertension.{1,5}arter.{1,15}(poumon|pulmo)",
    ],
    regex_attr="NORM",
    exclude=[
        dict(
            regex="minime",
            window=(0, 3),
        ),
    ],
)
oxygen = dict(
    source="oxygen",
    regex=[
        r"oxygeno.?dependance",
        r"oxygeno.?requeran",
        r"oxygenation",
        r"oxygeno.?ther",
        r"oxygene",
    ],
    regex_attr="NORM",
    assign=[
        dict(
            name="long",
            regex=r"(long.{1,10}(?:cour|dure)|chroni|domicil)",
            window=6,
        ),
        dict(
            name="long_bis",
            regex=r"(persist|major|minor)",
            window=-6,
        ),
        dict(
            name="need",
            regex=r"(besoin)",
            window=(-6, 6),
        ),
    ],
)
acronym = dict(
    source="acronym",
    regex=[
        r"\bBPCO\b",
        r"\bFPI\b",
        r"\bOLD\b",
        r"\bFEPP\b",
        r"\bPINS\b",
        r"\bPID\b",
        r"\bSAOS\b",
        r"\bSAS\b",
        r"\bSAHOS\b",
        r"\bBOOP\b",
    ],
    regex_attr="TEXT",
)
fid = dict(
    source="fid",
    regex=[
        r"\bfid\b",
    ],
    regex_attr="NORM",
    exclude=[
        dict(
            regex=[
                r"\bfig\b",
                r"palpation",
            ],
            window=(-7, 7),
        ),
    ],
)
default_patterns = [
    main_pattern,
    htap,
    oxygen,
    acronym,
    fid,
]
# fmt: on
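To illustrate the difference between regex_attr="NORM" and regex_attr="TEXT" in the patterns above, here is a minimal sketch that uses only the Python standard library. It is not the library's implementation: the normalize helper is a hypothetical stand-in for the eds.normalizer output (lowercasing and accent stripping only), and it assumes that TEXT patterns are matched on the raw text without a case-insensitive flag.

```python
import re
import unicodedata


def normalize(text):
    """Hypothetical stand-in for NORM: lowercase and strip accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


# Two entries copied from main_pattern["regex"] (matched on normalized text).
NORM_PATTERNS = [r"emphyseme", r"apnee.{1,25}sommeil"]
# Two entries copied from acronym["regex"] (matched on the raw text).
TEXT_PATTERNS = [r"\bBPCO\b", r"\bSAS\b"]


def find_mentions(text):
    """Return matched strings; NORM hits are reported in normalized form."""
    norm = normalize(text)
    hits = [m.group(0) for p in NORM_PATTERNS for m in re.finditer(p, norm)]
    hits += [m.group(0) for p in TEXT_PATTERNS for m in re.finditer(p, text)]
    return hits


print(find_mentions("Emphysème et apnée du sommeil"))
# ['emphyseme', 'apnee du sommeil']
print(find_mentions("BPCO connue"))
# ['BPCO']
print(find_mentions("Elle n'a pas de bpco"))
# []
```

Under these assumptions, accented and capitalised disease names are caught by the NORM patterns, while acronyms only match when written in upper case.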
Extensions
On each span that matches, the following attributes are available:
- span._.detailed_status: set to "PRESENT"
Examples
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe(
    "eds.normalizer",
    config=dict(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe("eds.copd")
Below are a few examples:
text = "Une fibrose interstitielle diffuse idiopathique"
doc = nlp(text)
spans = doc.spans["copd"]
spans
# Out: [fibrose interstitielle diffuse idiopathique]
text = "Patient atteint de pneumoconiose"
doc = nlp(text)
spans = doc.spans["copd"]
spans
# Out: [pneumoconiose]
text = "Présence d'une HTAP."
doc = nlp(text)
spans = doc.spans["copd"]
spans
# Out: [HTAP]
text = "On voit une hypertension pulmonaire minime"
doc = nlp(text)
spans = doc.spans["copd"]
spans
# Out: []
text = "La patiente a été mis sous oxygénorequérance"
doc = nlp(text)
spans = doc.spans["copd"]
spans
# Out: []
text = "La patiente est sous oxygénorequérance au long cours"
doc = nlp(text)
spans = doc.spans["copd"]
spans
# Out: [oxygénorequérance au long cours]
span = spans[0]
span._.assigned
# Out: {'long': [long cours]}
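The examples above suggest two context rules: a pulmonary hypertension mention is discarded when "minime" follows it closely, and an oxygen mention is kept only when a qualifier such as "long cours" is found nearby. The sketch below emulates this behaviour with the standard library only. It is a simplified illustration, not the component's code: whitespace splitting stands in for the tokenizer, normalize and words_after are hypothetical helpers, and the rule that unqualified oxygen mentions are dropped is inferred from the outputs shown above.

```python
import re
import unicodedata


def normalize(text):
    """Hypothetical stand-in for NORM: lowercase and strip accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def words_after(norm, end, n):
    """Return up to n whitespace-separated words following position `end`."""
    # Finish the word the match ended in, so a partial word is not counted.
    while end < len(norm) and not norm[end].isspace():
        end += 1
    return norm[end:].split()[:n]


HTAP = re.compile(r"hypertension.{0,10}pulmo")  # from htap["regex"]
OXYGEN = re.compile(r"oxygeno.?requeran")  # from oxygen["regex"]
LONG = re.compile(r"(long.{1,10}(?:cour|dure)|chroni|domicil)")  # "long" assign


def keeps_htap(text):
    """True if a mention is found and 'minime' is not in the next 3 words."""
    norm = normalize(text)
    m = HTAP.search(norm)
    if m is None:
        return False
    return "minime" not in " ".join(words_after(norm, m.end(), 3))


def keeps_oxygen(text):
    """True if a mention has a 'long' qualifier within the next 6 words."""
    norm = normalize(text)
    m = OXYGEN.search(norm)
    if m is None:
        return False
    return LONG.search(" ".join(words_after(norm, m.end(), 6))) is not None


print(keeps_htap("On voit une hypertension pulmonaire minime"))  # False
print(keeps_htap("On voit une hypertension pulmonaire"))  # True
print(keeps_oxygen("La patiente a été mis sous oxygénorequérance"))  # False
print(keeps_oxygen("La patiente est sous oxygénorequérance au long cours"))  # True
```

Only the "long" assignment (window=6, words after the mention) is emulated here; the "long_bis" and "need" assignments use windows before or around the mention and would need the symmetric helper.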
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline |
name | The name of the component |
patterns | The patterns to use for matching |
label | The label to use for the matched spans |
span_setter | How to set matches on the doc |
Authors and citation
The eds.copd component was developed by AP-HP's Data Science team with a team of medical experts. A paper describing in detail the development of those components is being drafted and will soon be available.