Lymphoma

The eds.lymphoma pipeline component extracts mentions of lymphoma.

Details of the patterns used

# fmt: off
main_pattern = dict(
    source="main",
    regex=[
        r"lymphom(?:.{1,10}hodgkin)",
        r"lymphom",
        r"lymphangio",
        r"sezary",
        r"burkitt?",
        r"kaposi",
        r"hodgkin",
        r"amylose",
        r"plasm[ao]cytome",
        r"lympho.{1,3}sarcome",
        r"lympho.?prolif",
        r"hemopathie.{1,10}lymphoide",
        r"macroglobulinemie",
        r"imm?unocytome",
        r"maladie.des.chaines?",
        r"histi?ocytose.{1,5}(maligne|langerhans?)",
        r"waldenst(ro|or)m",
        r"mycos.{1,10}fongoide",
        r"myelome",
        r"maladie.{1,5}imm?uno\s*proliferative.{1,5}maligne",
        r"leucemie.{1,10}plasmocyte",
    ],
    regex_attr="NORM",
)

acronym = dict(
    source="acronym",
    regex=[
        r"\bLNH\b",
        r"\bLH\b",
        r"\bEATL\b",
        r"\bLAGC\b",
        r"\bLDGCB\b",
    ],
    regex_attr="TEXT",
    exclude=dict(
        regex=["/L", "/mL"],
        window=10,
    ),
)


gammapathy = dict(
    source="gammapathy",
    regex=[
        r"gam?mapath\w+\s*monoclonale",
    ],
    exclude=dict(
        regex=[
            "benin",
            "benign",
            "signification.indeter",
            "NMSI",
            "MGUS",
        ],
        window=(0, 5),
    ),
    regex_attr="NORM",
)


default_patterns = [
    main_pattern,
    acronym,
    # gammapathy,
]

# fmt: on
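To make the listing above more concrete, here is a minimal sketch of how a few of the `main_pattern` regexes behave on normalized text. It uses only the standard library and is not the EDS-NLP matcher: `normalize`, `MAIN_SUBSET` and `first_match` are hypothetical names, and `normalize` only approximates the `NORM` attribute by lowercasing and stripping accents.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Rough stand-in for NORM (assumption): lowercase and strip accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks (category "Mn") left over after decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# A small subset of the regexes from main_pattern, copied verbatim.
MAIN_SUBSET = [
    r"lymphom(?:.{1,10}hodgkin)",
    r"lymphom",
    r"waldenst(ro|or)m",
    r"myelome",
]


def first_match(text: str):
    """Return the first normalized substring matched by any pattern, else None."""
    norm = normalize(text)
    for pattern in MAIN_SUBSET:
        match = re.search(pattern, norm)
        if match:
            return match.group(0)
    return None


print(first_match("Un lymphome de Hodgkin."))
print(first_match("Atteint d'un Waldenstörm"))
```

Unlike the real component, this sketch returns the normalized substring rather than a span over the original text. It also shows why `waldenst(ro|or)m` is written with an alternation: both the usual spelling and the transposed one are caught once accents are removed.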

Extensions

On each matched span `span`, the following attribute is available:

span._.detailed_status: set to None

Monoclonal gammapathy

Monoclonal gammapathies are not extracted by this pipeline: the gammapathy pattern is defined in the patterns file but commented out of default_patterns.
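For readers curious about what the disabled gammapathy pattern would do, the following is a rough standard-library sketch of its matching and exclusion logic. It is an illustration under assumptions, not the EDS-NLP implementation: `keeps_gammapathy_match` is a hypothetical helper, the input is assumed to be already lowercased and accent-free, the acronyms `NMSI` and `MGUS` are lowercased accordingly, and the `(0, 5)` window is approximated as the five whitespace-separated words following the match.

```python
import re

# Regex copied from the gammapathy pattern.
GAMMAPATHY = re.compile(r"gam?mapath\w+\s*monoclonale")

# Exclusion terms from the pattern, lowercased here (assumption) because the
# sketch works on text that is already normalized.
EXCLUDE = re.compile(r"benin|benign|signification.indeter|nmsi|mgus")


def keeps_gammapathy_match(norm_text: str, window: int = 5) -> bool:
    """Return True if a gammapathy mention is found and not excluded."""
    match = GAMMAPATHY.search(norm_text)
    if match is None:
        return False
    # Approximate the (0, 5) window as the next `window` words after the match.
    following = " ".join(norm_text[match.end():].split()[:window])
    return EXCLUDE.search(following) is None


print(keeps_gammapathy_match("gammapathie monoclonale igg kappa"))
print(keeps_gammapathy_match("gammapathie monoclonale de signification indeterminee"))
```

The exclusion terms target mentions qualified as benign or of undetermined significance, so only unqualified mentions would survive the filter in this sketch.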

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.lymphoma())

Below are a few examples:


text = "Un lymphome de Hodgkin."
doc = nlp(text)
spans = doc.spans["lymphoma"]

spans
# Out: [lymphome de Hodgkin]

text = "Atteint d'un Waldenstörm"
doc = nlp(text)
spans = doc.spans["lymphoma"]

spans
# Out: [Waldenstörm]

text = "Un LAGC"
doc = nlp(text)
spans = doc.spans["lymphoma"]

spans
# Out: [LAGC]

text = "anti LAGC: 10^4/mL"
doc = nlp(text)
spans = doc.spans["lymphoma"]

spans
# Out: []
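The last two examples differ only in their context: the acronym pattern matches on the raw text (`regex_attr="TEXT"`, so it is case-sensitive) and drops a match when `/L` or `/mL` appears in the window after it. The sketch below imitates that behaviour with the standard library only. `acronym_matches` is a hypothetical helper, and reading `window=10` as the ten whitespace-separated words after the match is an assumption; the real component works on its own tokens.

```python
import re

# Acronym regexes from the pattern, joined into one case-sensitive expression.
ACRONYMS = re.compile(r"\bLNH\b|\bLH\b|\bEATL\b|\bLAGC\b|\bLDGCB\b")

# Exclusion regexes from the pattern: units that suggest a lab measurement.
EXCLUDE = re.compile(r"/L|/mL")


def acronym_matches(text: str, window: int = 10) -> list:
    """Return the acronyms kept after applying the unit-based exclusion."""
    kept = []
    for match in ACRONYMS.finditer(text):
        # Approximate the window as the next `window` words after the match.
        following = " ".join(text[match.end():].split()[:window])
        if EXCLUDE.search(following) is None:
            kept.append(match.group(0))
    return kept


print(acronym_matches("Un LAGC"))
print(acronym_matches("anti LAGC: 10^4/mL"))
```

Matching on raw text keeps short acronyms such as `LH` from firing on lowercase words, while the unit filter is a guard against antibody titres and similar measurements.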

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline. TYPE: `Optional[PipelineProtocol]`
`name`	The name of the component. TYPE: `Optional[str]`
`patterns`	The patterns to use for matching. TYPE: `FullConfig`. DEFAULT: `[{'source': 'main', 'regex': ['lymphom(?:.{1,10...`
`label`	The label to use for the `Span` object and the extension. TYPE: `str`. DEFAULT: `lymphoma`
`span_setter`	How to set matches on the doc. TYPE: `SpanSetterArg`. DEFAULT: `{'ents': True, 'lymphoma': True}`

Authors and citation

The eds.lymphoma component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024.

Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069