AIDS

The eds.aids pipeline component extracts mentions of AIDS. It will notably match:

Mentions of VIH/HIV at the SIDA/AIDS stage
Mentions of VIH/HIV with opportunistic infection(s)

Details of the patterns used

# fmt: off
aids = dict(
    source="aids",
    regex=[
        r"(vih.{1,5}stade.{1,5})?\bsida\b",
    ],
    regex_attr="NORM",
)

hiv = dict(
    source="hiv",
    regex=[
        r"\bhiv\b",
        r"\bvih\b",
    ],
    exclude=dict(
        regex=["serologie", "prelevement"],
        window=(-20, 20),
        limit_to_sentence=False,
    ),
    assign=[
        dict(
            name="opportunist",
            regex=r"("
            + r"|".join(
                [
                    r"kapo[sz]i",
                    r"toxoplasmose",
                    r"meningo.?encephalite.toxo",
                    r"pneumocystose",
                    r"\bpep\b",
                    r"pneumocystis",
                    r"cryptococcose",
                    r"cytomégalovirus",
                    r"myobact",
                    r"opportunist",
                    r"co.?infect",
                ]
            )
            + ")"
            + r"(?!.{0,20}(?:non|0))",
            window=(-10, 30),
            limit_to_sentence=False,
        ),
        dict(
            name="stage",
            regex=r"stade.{0,5}\b(b|c)\b",
            window=10,
        ),
    ],
    regex_attr="NORM",
)

default_patterns = [
    aids,
    hiv,
]
# fmt: on
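To see what these expressions accept, the sketch below runs three of them through Python's standard `re` module on text that was lowercased and stripped of accents by hand, which only approximates the `NORM` attribute. It does not reproduce the component's token windows or its `exclude` rule. The sample strings are illustrative, and `OPPORTUNIST` uses only a subset of the alternatives listed above.

```python
import re

# Patterns copied from the listing above.
AIDS = re.compile(r"(vih.{1,5}stade.{1,5})?\bsida\b")
STAGE = re.compile(r"stade.{0,5}\b(b|c)\b")

# Subset of the "opportunist" alternatives, with the same negative lookahead
# that rejects a match followed within 20 characters by "non" or "0".
OPPORTUNIST = re.compile(
    r"("
    + r"|".join([r"toxoplasmose", r"pneumocystose", r"co.?infect"])
    + r")"
    + r"(?!.{0,20}(?:non|0))"
)

# Hand-normalized inputs (assumption: close to what NORM would yield).
aids_match = AIDS.search("patient atteint du vih au stade sida.")
print(aids_match.group(0))  # vih au stade sida

stage_match = STAGE.search("presence d'un vih stade c")
print(stage_match.group(1))  # c

# Raw regex hits; the component itself reports whole tokens.
hits = [m.group(0) for m in OPPORTUNIST.finditer("un vih avec coinfection pneumocystose")]
print(hits)  # ['coinfect', 'pneumocystose']

# A negated finding is rejected by the lookahead.
print(OPPORTUNIST.search("pneumocystose : non"))  # None
```

The optional `vih ... stade` prefix in the first pattern explains why the matched span can start at the HIV mention and not only at `sida`.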

On HIV infection

Pre-AIDS HIV infections are not extracted; only AIDS is.
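The examples further down suggest how this rule plays out: a bare `VIH` mention yields no span, while one accompanied by an opportunistic infection or a stage B/C does. Below is a minimal sketch of that filter, inferred from those examples and not taken from the component's source. `keep_mention` is a hypothetical helper, and its arguments mirror the `source` field of the patterns and the `assigned` dictionary.

```python
from typing import Any, Dict


def keep_mention(source: str, assigned: Dict[str, Any]) -> bool:
    """Hypothetical filter: decide whether a matched mention counts as AIDS."""
    if source == "aids":
        # An explicit SIDA/AIDS mention is kept as is.
        return True
    # An HIV mention is kept only when AIDS-defining context was assigned.
    return bool(assigned.get("opportunist")) or bool(assigned.get("stage"))


print(keep_mention("aids", {}))  # True
print(keep_mention("hiv", {}))  # False
print(keep_mention("hiv", {"stage": ["C"]}))  # True
```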

Extensions

On each matched span, the following attributes are available:

span._.detailed_status: set to "PRESENT"
span._.assigned: dictionary with the following keys, if relevant:
- opportunist: list of opportunistic infections extracted around the HIV mention
- stage: stage of the HIV infection

Examples

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe(
    "eds.normalizer",
    config=dict(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe("eds.aids")

Below are a few examples:

text = "Patient atteint du VIH au stade SIDA."
doc = nlp(text)
spans = doc.spans["aids"]

spans
# Out: [VIH au stade SIDA]

text = "Patient atteint du VIH."
doc = nlp(text)
spans = doc.spans["aids"]

spans
# Out: []

text = "Il y a un VIH avec coinfection pneumocystose"
doc = nlp(text)
spans = doc.spans["aids"]

spans
# Out: [VIH]

span = spans[0]

span._.assigned
# Out: {'opportunist': [coinfection, pneumocystose]}

text = "Présence d'un VIH stade C"
doc = nlp(text)
spans = doc.spans["aids"]

spans
# Out: [VIH]

span = spans[0]

span._.assigned
# Out: {'stage': [C]}

Parameters

PARAMETER	DESCRIPTION	TYPE	DEFAULT
`nlp`	The pipeline object	`Optional[PipelineProtocol]`	
`name`	The name of the component	`Optional[str]`	`'eds.aids'`
`patterns`	The patterns to use for matching	`Union[Dict[str, Any], List[Dict[str, Any]]]`	`[{'source': 'aids', 'regex': ['(vih.{1,5}stade....`
`label`	The label to use for the `Span` object and the extension	`str`	`aids`
`span_setter`	How to set matches on the doc	`SpanSetterArg`	`{'ents': True, 'aids': True}`
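As a hedged illustration of these parameters, the fragment below builds a configuration that would keep matches in `doc.spans["aids"]` without also writing them to `doc.ents`. The `custom_config` name is illustrative, and the accepted `span_setter` keys and boolean values are assumed from the default shown in the table.

```python
# Illustrative override of the documented defaults.
custom_config = dict(
    label="aids",
    span_setter={"ents": False, "aids": True},
)

# Assumed usage, mirroring how eds.normalizer is configured in the examples:
# nlp.add_pipe("eds.aids", config=custom_config)
```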

Authors and citation

The eds.aids component was developed by AP-HP's Data Science team with a team of medical experts. A paper describing in detail the development of those components is being drafted and will soon be available.