Solid tumor

The eds.solid_tumor pipeline component extracts mentions of solid tumors, using the patterns listed below.

Details of the patterns used
# fmt: off
BENINE = r"benign|benin|(grade.?\b[i1]\b)"
STAGE = r"stade ([^\s]*)"

main_pattern = dict(
    source="main",
    regex=[
        r"carcinom(?!.{0,10}in.?situ)",
        r"seminome",
        r"(?<!lympho)(?<!lympho-)sarcome",
        r"blastome",
        r"cancer([^o]|\s|\b)",
        r"adamantinome",
        r"chordome",
        r"craniopharyngiome",
        r"melanome",
        r"neoplasie",
        r"neoplasme",
        r"linite",
        r"melanome",
        r"mesoteliome",
        r"mesotheliome",
        r"seminome",
        r"myxome",
        r"paragangliome",
        r"craniopharyngiome",
        r"k .{0,5}(prostate|sein)",
        r"pancoast.?tobias",
        r"syndrome.{1,10}lynch",
        r"li.?fraumeni",
        r"germinome",
        r"adeno[\s-]?k",
        r"thymome",
        r"\bnut\b",
        r"\bgist\b",
        r"\bchc\b",
        r"\badk\b",
        r"\btves\b",
        r"\btv.tves\b",
        r"lesion.{1,20}tumor",
        r"tumeur",
        r"carcinoid",
        r"histiocytome",
        r"ependymome",
        # r"primitif",  # too many false positives
    ],
    exclude=dict(
        regex=BENINE,
        window=(0, 5),
    ),
    regex_attr="NORM",
    assign=[
        dict(
            name="metastasis",
            regex=r"(metasta|multinodul)",
            window=(-3, 7),
            reduce_mode="keep_last",
        ),
        dict(
            name="stage",
            regex=STAGE,
            window=7,
            reduce_mode="keep_last",
        ),
    ],
)

metastasis_pattern = dict(
    source="metastasis",
    regex=[
        r"cellule.{1,5}tumorale.{1,5}circulantes",
        r"metasta",
        r"multinodul",
        r"carcinose",
        r"ruptures.{1,5}corticale",
        r"envahissement.{0,15}parties\smolle",
        r"(localisation|lesion)s?.{0,20}second",
        r"(lymphangite|meningite).{1,5}carcinomateuse",
    ],
    regex_attr="NORM",
    exclude=dict(
        regex=r"goitre",
        window=-3,
    ),
)

default_patterns = [
    main_pattern,
    metastasis_pattern,
]
# fmt: on
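To make the lookarounds in these patterns easier to read, here is a minimal sketch that runs a few of them with Python's standard `re` module. The regular expressions are copied from the listing above. The helper `found` and the sample strings are hypothetical. The strings are written in lowercase without accents to approximate the NORM attribute the component matches on. The sketch only exercises the regular expressions themselves. It does not reproduce the token windows used by `exclude` and `assign`.

```python
import re

# Patterns copied verbatim from the listing above.
CARCINOMA = r"carcinom(?!.{0,10}in.?situ)"
SARCOMA = r"(?<!lympho)(?<!lympho-)sarcome"
SECONDARY = r"(localisation|lesion)s?.{0,20}second"
BENINE = r"benign|benin|(grade.?\b[i1]\b)"
STAGE = r"stade ([^\s]*)"


def found(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` occurs anywhere in `text`."""
    return re.search(pattern, text) is not None


# Negative lookahead: "in situ" shortly after "carcinom" blocks the match.
print(found(CARCINOMA, "carcinome hepatocellulaire"))  # True
print(found(CARCINOMA, "carcinome in situ"))           # False

# Negative lookbehinds: "sarcome" is rejected when preceded by "lympho(-)".
print(found(SARCOMA, "osteosarcome"))    # True
print(found(SARCOMA, "lymphosarcome"))   # False
print(found(SARCOMA, "lympho-sarcome"))  # False

# Metastasis pattern: a lesion followed closely by "second...".
print(found(SECONDARY, "presence de nombreuses lesions secondaires"))  # True

# Exclusion regex: benign wording or grade 1 (but not grade 2).
print(found(BENINE, "tumeur benigne"))  # True
print(found(BENINE, "tumeur grade 1"))  # True
print(found(BENINE, "tumeur grade 2"))  # False

# Stage capture: the group holds the token that follows "stade ".
print(re.search(STAGE, "cancer du poumon au stade 4").group(1))  # 4
```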

Extensions

On each matching span, the following attributes are available:

  • span._.detailed_status: set to either
    • "METASTASIS" for tumors at the metastatic stage
    • "LOCALIZED" otherwise
  • span._.assigned: dictionary with the following keys, if relevant:
    • stage: stage of the tumor

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.solid_tumor())

Below are a few examples:

text = "Présence d'un carcinome intra-hépatique."
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [carcinome]

text = "Patient avec un K sein."
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [K sein]

text = "Il y a une tumeur bénigne"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: []

text = "Tumeur métastasée"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [Tumeur métastasée]

span = spans[0]

span._.detailed_status
# Out: METASTASIS

span._.assigned
# Out: {'metastasis': métastasée}

text = "Cancer du poumon au stade 4"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [Cancer du poumon au stade 4]

span = spans[0]

span._.detailed_status
# Out: METASTASIS

span._.assigned
# Out: {'stage': 4}

text = "Cancer du poumon au stade 2"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [Cancer du poumon au stade 2]

span = spans[0]

span._.assigned
# Out: {'stage': 2}

text = "Présence de nombreuses lésions secondaires"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [lésions secondaires]

span = spans[0]

span._.detailed_status
# Out: METASTASIS
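
The statuses shown in the examples above are consistent with a simple rule: a span is metastatic when it comes from the metastasis patterns, when a metastasis mention was assigned to it, or when its assigned stage is 4. The sketch below restates that rule in plain Python. It is inferred from the examples only and is not taken from the component's source. The function name `infer_status` is hypothetical, and treating stage 4 as the only metastatic stage is an assumption.

```python
from typing import Any, Dict


def infer_status(source: str, assigned: Dict[str, Any]) -> str:
    """Hypothetical restatement of the rule suggested by the examples."""
    # A match from the metastasis patterns, or a main match with a nearby
    # metastasis mention, is reported as metastatic.
    if source == "metastasis" or "metastasis" in assigned:
        return "METASTASIS"
    # Assumption: stage 4 is the only stage treated as metastatic.
    if str(assigned.get("stage", "")).strip() == "4":
        return "METASTASIS"
    return "LOCALIZED"


print(infer_status("main", {"metastasis": "metastasee"}))  # METASTASIS
print(infer_status("main", {"stage": "4"}))                # METASTASIS
print(infer_status("metastasis", {}))                      # METASTASIS
print(infer_status("main", {}))                            # LOCALIZED
```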

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline

TYPE: Optional[PipelineProtocol]

name

The name of the component

TYPE: Optional[str]

patterns

The patterns to use for matching

DEFAULT: [{'source': 'main', 'regex': ['carcinom(?!.{0,1...

label

The label to use for the Span object and the extension

TYPE: str DEFAULT: solid_tumor

span_setter

How to set matches on the doc

TYPE: SpanSetterArg DEFAULT: {'ents': True, 'solid_tumor': True}

use_tnm

Whether to use TNM scores matching as well

TYPE: bool DEFAULT: False

Authors and citation

The eds.solid_tumor component was developed by AP-HP's Data Science team with a team of medical experts. A paper describing in detail the development of these components is being drafted and will soon be available.