
Lymphoma

The eds.lymphoma pipeline component extracts mentions of lymphoma.

Details of the patterns used
# fmt: off
main_pattern = dict(
    source="main",
    regex=[
        r"lymphom(?:.{1,10}hodgkin)",
        r"lymphom",
        r"lymphangio",
        r"sezary",
        r"burkitt",
        r"kaposi",
        r"hodgkin",
        r"amylose",
        r"plasm[ao]cytome",
        r"lympho.{1,3}sarcome",
        r"lympho.?prolif",
        r"hemopathie.{1,10}lymphoide",
        r"macroglobulinemie",
        r"immunocytome",
        r"maladie.des.chaine",
        r"histiocytose.{1,5}(maligne|langerhans)",
        r"waldenst(ro|or)m",
        r"mycos.{1,10}fongoide",
        r"myelome",
        r"maladie.{1,5}immunoproliferative.{1,5}maligne",
        r"leucemie.{1,10}plasmocyte",
    ],
    regex_attr="NORM",
)

acronym = dict(
    source="acronym",
    regex=[
        r"\bLNH\b",
        r"\bLH\b",
        r"\bEATL\b",
        r"\bLAGC\b",
        r"\bLDGCB\b",
    ],
    regex_attr="TEXT",
    exclude=dict(
        regex=["/L", "/mL"],
        window=10,
    ),
)


gammapathy = dict(
    source="gammapathy",
    regex=[
        r"gammapathie monoclonale",
    ],
    exclude=dict(
        regex=[
            "benin",
            "benign",
            "signification.indeter",
            "NMSI",
            "MGUS",
        ],
        window=(0, 5),
    ),
    regex_attr="NORM",
)


default_patterns = [
    main_pattern,
    acronym,
    # gammapathy,
]
# fmt: on
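To see how the "main" regexes behave, here is a minimal sketch that runs a few of them with Python's standard `re` module. It assumes the input is already normalized (lowercased, accents removed), which roughly approximates matching on the NORM attribute. The helper `find_main_matches` is hypothetical and not part of edsnlp.

```python
import re

# A subset of the "main" regexes, copied from the patterns above.
MAIN_REGEXES = [
    r"lymphom(?:.{1,10}hodgkin)",
    r"lymphom",
    r"waldenst(ro|or)m",
    r"plasm[ao]cytome",
    r"histiocytose.{1,5}(maligne|langerhans)",
]


def find_main_matches(norm_text: str) -> list[str]:
    """Hypothetical helper: return the first match of each regex that fires.

    norm_text is assumed to be already lowercased and accent-free.
    """
    found = []
    for pattern in MAIN_REGEXES:
        m = re.search(pattern, norm_text)
        if m:
            found.append(m.group(0))
    return found


print(find_main_matches("un lymphome de hodgkin."))  # ['lymphome de hodgkin', 'lymphom']
print(find_main_matches("maladie de waldenstrom"))   # ['waldenstrom']
```

In this naive loop both the long `lymphom(?:.{1,10}hodgkin)` regex and the short `lymphom` regex fire on the same phrase; it does not resolve overlaps, whereas the component's output in the examples (`[lymphome de Hodgkin]`) contains a single span.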

Extensions

On each matched span, the following attribute is available:

  • span._.detailed_status: set to None

Monoclonal gammapathy

Monoclonal gammapathies are not extracted by this pipeline: the gammapathy pattern above is defined but commented out of default_patterns.
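Although the gammapathy pattern is disabled by default, its exclude clause illustrates how context filtering is meant to work: a monoclonal gammapathy followed by a cue such as "benigne" or "MGUS" is discarded. Below is a minimal pure-Python sketch of that idea, assuming `window=(0, 5)` means "0 words before to 5 words after the match" and approximating words by whitespace splitting. The function `gammapathy_matches` is hypothetical, and it matches case-insensitively purely for simplicity, which may differ from how the component applies its NORM and exclusion regexes.

```python
import re

# Regexes copied from the (disabled) gammapathy pattern above.
GAMMAPATHY_REGEX = r"gammapathie monoclonale"
EXCLUDE_REGEXES = ["benin", "benign", "signification.indeter", "NMSI", "MGUS"]


def gammapathy_matches(text: str, words_after: int = 5) -> list[str]:
    """Hypothetical helper: keep gammapathy mentions with no exclusion cue nearby.

    Assumes window=(0, 5) means 0 words before and 5 words after the match.
    Matching is case-insensitive here purely for simplicity.
    """
    kept = []
    for m in re.finditer(GAMMAPATHY_REGEX, text, flags=re.IGNORECASE):
        # Context = the next `words_after` whitespace-separated words.
        context = " ".join(text[m.end():].split()[:words_after])
        if not any(re.search(rx, context, flags=re.IGNORECASE) for rx in EXCLUDE_REGEXES):
            kept.append(m.group(0))
    return kept


print(gammapathy_matches("Gammapathie monoclonale a IgM"))  # ['Gammapathie monoclonale']
print(gammapathy_matches("Gammapathie monoclonale de signification indeterminee"))  # []
```

Because the context stops after five words, a cue such as "MGUS" appearing later in the sentence would not trigger the exclusion in this sketch.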

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.lymphoma())
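The normalizer matters because the main patterns use `regex_attr="NORM"`, i.e. they run on the token norms set by eds.normalizer rather than on the raw text. This is why the "Waldenstörm" example below is matched by `waldenst(ro|or)m`, which contains neither capitals nor accents. As a rough stand-in for the `accents=True` and `lowercase=True` options only, the hypothetical helper `approx_norm` lowercases the text and strips accents with the standard `unicodedata` module; the real normalizer does more (quotes, spaces, pollution).

```python
import re
import unicodedata


def approx_norm(text: str) -> str:
    """Hypothetical helper: lowercase, then drop accents (combining marks)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


raw = "Atteint d'un Waldenstörm"
norm = approx_norm(raw)
print(norm)  # atteint d'un waldenstorm
print(re.search(r"waldenst(ro|or)m", raw))            # None: capital "W" and "ö"
print(re.search(r"waldenst(ro|or)m", norm).group(0))  # waldenstorm
```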

Below are a few examples:

text = "Un lymphome de Hodgkin."
doc = nlp(text)
spans = doc.spans["lymphoma"]

spans
# Out: [lymphome de Hodgkin]
text = "Atteint d'un Waldenstörm"
doc = nlp(text)
spans = doc.spans["lymphoma"]

spans
# Out: [Waldenstörm]
text = "Un LAGC"
doc = nlp(text)
spans = doc.spans["lymphoma"]

spans
# Out: [LAGC]
text = "anti LAGC: 10^4/mL"
doc = nlp(text)
spans = doc.spans["lymphoma"]

spans
# Out: []
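The last example returns no span because of the exclude clause on the acronym pattern: a `/L` or `/mL` after the acronym presumably signals a lab value rather than a lymphoma. A minimal sketch of that filter, assuming `window=10` means the 10 words following the match and approximating words by whitespace splitting. The helper `acronym_matches` is hypothetical; it matches case-sensitively on the raw text, mirroring `regex_attr="TEXT"`.

```python
import re

# Acronym regexes and exclusions copied from the "acronym" pattern above.
ACRONYM_REGEXES = [r"\bLNH\b", r"\bLH\b", r"\bEATL\b", r"\bLAGC\b", r"\bLDGCB\b"]
EXCLUDE_REGEXES = ["/L", "/mL"]


def acronym_matches(text: str, window: int = 10) -> list[str]:
    """Hypothetical helper: keep acronym matches not followed by a unit cue.

    Matching is case-sensitive on the raw text, mirroring regex_attr="TEXT".
    Assumes window=10 means the 10 whitespace-separated words after the match.
    """
    kept = []
    for pattern in ACRONYM_REGEXES:
        for m in re.finditer(pattern, text):
            context = " ".join(text[m.end():].split()[:window])
            if any(re.search(rx, context) for rx in EXCLUDE_REGEXES):
                continue  # a unit such as "/mL" follows: discard the match
            kept.append(m.group(0))
    return kept


print(acronym_matches("Un LAGC"))             # ['LAGC']
print(acronym_matches("anti LAGC: 10^4/mL"))  # []
```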

Parameters

  • nlp: the pipeline. TYPE: Optional[PipelineProtocol]
  • name: the name of the component. TYPE: Optional[str]
  • patterns: the patterns to use for matching. DEFAULT: [{'source': 'main', 'regex': ['lymphom(?:.{1,10...
  • label: the label to use for the Span object and the extension. TYPE: str. DEFAULT: lymphoma
  • span_setter: how to set matches on the doc. TYPE: SpanSetterArg. DEFAULT: {'ents': True, 'lymphoma': True}

Authors and citation

The eds.lymphoma component was developed by AP-HP's Data Science team with a team of medical experts, building on the algorithm proposed by Petit-Jean et al., 2024 [1].


  1. Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069