Leukemia

The eds.leukemia pipeline component extracts mentions of leukemia.

Details of the patterns used

# fmt: off
main_pattern = dict(
    source="main",
    regex=[
        r"leucemie?",
        r"(syndrome?.)?myelo\s*proliferatif",
        r"m[yi]eloprolifer",
    ],
    exclude=dict(
        regex=[
            "plasmocyte",
            "benin",
            "benign",
        ],
        window=5,
    ),
    regex_attr="NORM",
)

acronym = dict(
    source="acronym",
    regex=[
        r"\bLAM\b",
        r"\bLAM.?[0-9]",
        r"\bLAL\b",
        r"\bLMC\b",
        r"\bLCE\b",
        r"\bLMM[JC]\b",
        r"\bLCN\b",
        r"\bAREB\b",
        r"\bAPMF\b",
        r"\bLLC\b",
        r"\bSMD\b",
        r"LA my[éèe]lomonocytaire",
    ],
    regex_attr="TEXT",
    exclude=dict(
        regex="anti",
        window=-20,
    ),
)

other = dict(
    source="other",
    regex=[
        r"myelofibrose",
        r"vaquez",
        r"thrombocytem\w+.{1,3}essentiell?e?",
        r"splenomegal\w+.{1,3}myeloide",
        r"mastocytose.{1,5}maligne?",
        r"polyglobul\w+.{1,10}essentiell?e?",
        r"letterer.?siwe",
        r"anemie.refractaire.{1,20}blaste",
        r"m[iy]elod[iy]splasi",
        r"syndrome.myelo.?dysplasique",
    ],
    regex_attr="NORM",
)

default_patterns = [
    main_pattern,
    acronym,
    other,
]

# fmt: on
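
To make the role of `exclude` and `window` in `main_pattern` more concrete, here is a minimal standard-library sketch. It is not the component's implementation: `normalize` and `find_main_mentions` are hypothetical helpers, the `NORM` attribute is approximated by lowercasing and stripping accents, and a positive `window` is assumed to mean "the next N whitespace-separated words after the match".

```python
import re
import unicodedata

# Values copied from `main_pattern` above.
MAIN_REGEX = [
    r"leucemie?",
    r"(syndrome?.)?myelo\s*proliferatif",
    r"m[yi]eloprolifer",
]
EXCLUDE_REGEX = ["plasmocyte", "benin", "benign"]
WINDOW = 5  # assumption: number of words inspected after each match


def normalize(text: str) -> str:
    """Rough stand-in for NORM: lowercase, then drop accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_main_mentions(text: str) -> list[str]:
    """Return normalized matches that have no exclusion term in their window."""
    norm = normalize(text)
    kept = []
    for pattern in MAIN_REGEX:
        for match in re.finditer(pattern, norm):
            # Context = up to WINDOW words that follow the match.
            following = " ".join(norm[match.end():].split()[:WINDOW])
            if any(re.search(term, following) for term in EXCLUDE_REGEX):
                continue  # an exclusion term is close by: drop the match
            kept.append(match.group(0))
    return kept


print(find_main_mentions("Leucémie aiguë"))                   # ['leucemie']
print(find_main_mentions("Sydrome myéloprolifératif bénin"))  # []
```

Unlike the real component, this sketch does not merge overlapping matches, so a text matched by several patterns appears several times in the result.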

Extensions

On each matched span, the following attribute is available:

span._.detailed_status: set to None

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.leukemia())

Below are a few examples:

text = "Sydrome myéloprolifératif"
doc = nlp(text)
spans = doc.spans["leukemia"]

spans
# Out: [myéloprolifératif]

text = "Sydrome myéloprolifératif bénin"
doc = nlp(text)
spans = doc.spans["leukemia"]

spans
# Out: []

text = "Patient atteint d'une LAM"
doc = nlp(text)
spans = doc.spans["leukemia"]

spans
# Out: [LAM]

text = "Une maladie de Vaquez"
doc = nlp(text)
spans = doc.spans["leukemia"]

spans
# Out: [Vaquez]
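
The `LAM` example relies on the acronym patterns, which are declared with `regex_attr="TEXT"`, so they are applied to the raw text rather than the normalized form. The sketch below uses plain `re` on a subset of those patterns to show what that implies for case and word boundaries. `find_acronyms` is a hypothetical helper, and the `anti` exclusion is left out for brevity.

```python
import re

# Subset of the `acronym` patterns listed in the component's source.
ACRONYM_REGEX = [
    r"\bLAM\b",
    r"\bLAM.?[0-9]",
    r"\bLAL\b",
    r"\bLMC\b",
    r"\bLLC\b",
]


def find_acronyms(text: str) -> list[str]:
    """Return acronym matches found in the raw (un-normalized) text."""
    found = []
    for pattern in ACRONYM_REGEX:
        found.extend(match.group(0) for match in re.finditer(pattern, text))
    return found


print(find_acronyms("Patient atteint d'une LAM"))  # ['LAM']
print(find_acronyms("LAM3 diagnostiquée"))         # ['LAM3']
print(find_acronyms("Une lampe, une lame"))        # []
```

Because the patterns are uppercase and matched without normalization, lowercase words such as "lame" are not picked up in this sketch, and the `\b` anchors keep the acronym from matching inside longer words.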

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline. TYPE: `Optional[PipelineProtocol]`
`name`	The name of the component. TYPE: `Optional[str]`
`patterns`	The patterns to use for matching. TYPE: `FullConfig`. DEFAULT: `[{'source': 'main', 'regex': ['leucemie?', '(sy...`
`label`	The label to use for the `Span` object and the extension. TYPE: `str`. DEFAULT: `leukemia`
`span_setter`	How to set matches on the doc. TYPE: `SpanSetterArg`. DEFAULT: `{'ents': True, 'leukemia': True}`

Authors and citation

The eds.leukemia component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024.

Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069