AIDS
The eds.aids pipeline component extracts mentions of AIDS. It will notably match:
- Mentions of VIH/HIV at the SIDA/AIDS stage
- Mentions of VIH/HIV with opportunistic infection(s)
Details of the patterns used
# fmt: off
aids = dict(
    source="aids",
    regex=[
        r"(vih.{1,5}stade.{1,5})?\bsida\b",
    ],
    regex_attr="NORM",
)

hiv = dict(
    source="hiv",
    regex=[
        r"\bhiv\b",
        r"\bvih\b",
    ],
    exclude=dict(
        regex=["serologie", "prelevement"],
        window=(-20, 20),
        limit_to_sentence=False,
    ),
    assign=[
        dict(
            name="opportunist",
            regex=r"("
            + r"|".join(
                [
                    r"kapo[sz]i",
                    r"toxoplasmose",
                    r"meningo.?encephalite.toxo",
                    r"pneumocystose",
                    r"\bpep\b",
                    r"pneumocystis",
                    r"cryptococcose",
                    r"cytomégalovirus",
                    r"myobact",
                    r"opportunist",
                    r"co.?infect",
                ]
            )
            + ")"
            + r"(?!.{0,20}(?:non|0))",
            window=(-10, 30),
            limit_to_sentence=False,
        ),
        dict(
            name="stage",
            regex=r"stade.{0,5}\b(b|c)\b",
            window=10,
        ),
    ],
    regex_attr="NORM",
)

default_patterns = [
    aids,
    hiv,
]
# fmt: on
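To make the `opportunist` assignment pattern easier to read, the sketch below rebuilds a shortened version of it with Python's standard `re` module and runs it on text that is assumed to be already normalized (lowercase, no accents). It only illustrates the regular expression itself. The helper `find_opportunist` is hypothetical, only a subset of the alternatives is used, and the window and sentence handling performed by the component is not reproduced.

```python
import re

# Subset of the alternatives listed in the `opportunist` pattern (illustrative).
alternatives = [
    r"kapo[sz]i",
    r"toxoplasmose",
    r"pneumocystose",
    r"opportunist",
    r"co.?infect",
]

# Same construction as in the patterns: a group of alternatives followed by a
# negative lookahead that rejects the match when "non" or "0" appears within
# the next 20 characters.
opportunist = re.compile(
    r"(" + r"|".join(alternatives) + r")" + r"(?!.{0,20}(?:non|0))"
)


def find_opportunist(normalized_text):
    """Hypothetical helper: return the raw regex matches, in order."""
    return [m.group(1) for m in opportunist.finditer(normalized_text)]


print(find_opportunist("vih avec coinfection pneumocystose"))
# ['coinfect', 'pneumocystose']
print(find_opportunist("vih, pneumocystose : non"))
# []
```

The second call returns nothing because "non" follows the infection name within 20 characters, which is how the lookahead discards mentions that are explicitly negated right after the term.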
On HIV infection
Pre-AIDS HIV infections are not extracted; only AIDS is.
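The rule above can be illustrated with a small self-contained sketch. It is a simplified approximation written with the standard `re` module, not the component's implementation: the function `is_aids_mention` and its shortened patterns are hypothetical, the input is assumed to be normalized text, and windows, sentence limits and exclusions are ignored.

```python
import re

# Shortened, illustrative versions of the patterns used by the component.
SIDA = re.compile(r"\bsida\b")
HIV = re.compile(r"\b(?:hiv|vih)\b")
OPPORTUNIST = re.compile(r"pneumocystose|toxoplasmose|opportunist|co.?infect")
STAGE = re.compile(r"stade.{0,5}\b(b|c)\b")


def is_aids_mention(normalized_text):
    """Hypothetical rule: an explicit AIDS mention always counts, while an
    HIV mention counts only with an opportunistic infection or a stage."""
    if SIDA.search(normalized_text):
        return True
    if not HIV.search(normalized_text):
        return False
    has_infection = OPPORTUNIST.search(normalized_text) is not None
    has_stage = STAGE.search(normalized_text) is not None
    return has_infection or has_stage


print(is_aids_mention("patient atteint du vih au stade sida."))  # True
print(is_aids_mention("patient atteint du vih."))  # False
print(is_aids_mention("presence d'un vih stade c"))  # True
```

An HIV mention on its own is rejected, which mirrors the behaviour described here: only mentions carrying evidence of the AIDS stage are kept.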
Extensions
On each span that matches, the following attributes are available:
- span._.detailed_status: set to None
- span._.assigned: dictionary with the following keys, if relevant:
  - opportunist: list of opportunistic infections extracted around the HIV mention
  - stage: stage of the HIV infection
Examples
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe("eds.aids")
Below are a few examples:
text = "Patient atteint du VIH au stade SIDA."
doc = nlp(text)
spans = doc.spans["aids"]
spans
# Out: [VIH au stade SIDA]

text = "Patient atteint du VIH."
doc = nlp(text)
spans = doc.spans["aids"]
spans
# Out: []

text = "Il y a un VIH avec coinfection pneumocystose"
doc = nlp(text)
spans = doc.spans["aids"]
spans
# Out: [VIH]
span = spans[0]
span._.assigned
# Out: {'opportunist': [coinfection, pneumocystose]}

text = "Présence d'un VIH stade C"
doc = nlp(text)
spans = doc.spans["aids"]
spans
# Out: [VIH]
span = spans[0]
span._.assigned
# Out: {'stage': [C]}
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object TYPE: |
name | The name of the component TYPE: |
patterns | The patterns to use for matching TYPE: |
label | The label to use for the TYPE: |
span_setter | How to set matches on the doc TYPE: |
Authors and citation
The eds.aids component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024.
Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069