Solid tumor

The eds.solid_tumor pipeline component extracts mentions of solid tumors. The patterns it matches are listed below.

Details of the patterns used

# fmt: off
BENINE = r"benign|benin|(grade.?\b[i1]\b)"
STAGE = r"stade ([^\s]*)"

main_pattern = dict(
    source="main",
    regex=[
        r"carcinom(?!.{0,10}in.?situ)",
        r"seminome",
        r"(?<!lympho)(?<!lympho-)sarcome",
        r"blastome",
        r"cancer([^o]|\s|\b)",
        r"adamantinome",
        r"chordome",
        r"craniopharyngiome",
        r"melanome",
        r"neoplasie",
        r"neoplasme",
        r"linite",
        r"melanome",
        r"mesoteliome",
        r"mesotheliome",
        r"seminome",
        r"myxome",
        r"paragangliome",
        r"craniopharyngiome",
        r"k .{0,5}(prostate|sein)",
        r"pancoast.?tobias",
        r"syndrome.{1,10}lynch",
        r"li.?fraumeni",
        r"germinome",
        r"adeno[\s-]?k",
        r"thymome",
        r"\bnut\b",
        r"\bgist\b",
        r"\bchc\b",
        r"\badk\b",
        r"\btves\b",
        r"\btv.tves\b",
        r"lesion.{1,20}tumor",
        r"tumeur",
        r"carcinoid",
        r"histiocytome",
        r"ependymome",
        # r"primitif", Trop de FP
    ],
    exclude=dict(
        regex=BENINE,
        window=(0, 5),
    ),
    regex_attr="NORM",
    assign=[
        dict(
            name="metastasis",
            regex=r"(metasta|multinodul)",
            window=(-3, 7),
            reduce_mode="keep_last",
        ),
        dict(
            name="stage",
            regex=STAGE,
            window=7,
            reduce_mode="keep_last",
        ),
    ],
)

metastasis_pattern = dict(
    source="metastasis",
    regex=[
        r"cellule.{1,5}tumorale.{1,5}circulantes",
        r"metasta",
        r"multinodul",
        r"carcinose",
        r"ruptures.{1,5}corticale",
        r"envahissement.{0,15}parties\smolle",
        r"(localisation|lesion)s?.{0,20}second",
        r"(lymphangite|meningite).{1,5}carcinomateuse",
    ],
    regex_attr="NORM",
    exclude=dict(
        regex=r"goitre",
        window=-3,
    ),
)

# Patterns developed for CT-Scan reports
metastasis_ct_scan = dict(
    source="metastasis_ct_scan",
    regex=[
        r"(?i)(m[ée]tasta(se|tique)s?)",
        r"(diss[ée]min[ée]e?s?)",
        r"(carcinose)",
        r"(((allure|l[ée]sion|localisation|progression)s?\s)(suspecte?s?)?.{0,50}(secondaire)s?)",
        r"(l(a|â)ch(é|e|er)\sde\sballons?)",
        r"(l[ée]sions?\s(non\s)?cibles?)",
        r"(rupture.{1,20}corticale)",
        r"(envahissement.{0,15}parties\smolles)",
        r"((l[i,y]se).{1,20}os)|ost[eé]ol[i,y]|rupture.{1,20}corticale|envahissement.{1,20}parties\smolles|ost[eé]ocondensa.{1,20}(suspect|secondaire|[ée]volutive)",
        r"(l[ée]sion|anomalie|image).{1,20}os.{1,30}(suspect|secondaire|[ée]volutive)",
        r"os.{1,30}(l[ée]sion|anomalie|image).{1,20}(suspect|secondaire|[ée]volutive)",
        r"(l[ée]sion|anomalie|image).{1,20}l[i,y]tique",
        r"(l[ée]sion|anomalie|image).{1,20}condensant.{1,20}(suspect|secondaire|[ée]volutive)",
        r"fracture.{1,30}(suspect|secondaire|[ée]volutive)",
        r"((l[ée]sion|anomalie|image|nodule).{1,80}(secondaire))",
        r"((l[ée]sion|anomalie|image|nodule)s.{1,40}suspec?ts?)",
    ],
    regex_attr="NORM",
)

default_patterns = [
    main_pattern,
    metastasis_pattern,
]

# fmt: on
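
Several of the regexes above use lookarounds to reject near-misses. The sketch below exercises three of them with Python's standard `re` module on hand-normalized strings (lowercase, no accents), imitating what regex_attr="NORM" would supply. The helper name `hits` is hypothetical, and calling `re` directly rather than going through the component's own matcher is an assumption made for illustration only.

```python
import re

# Regexes copied verbatim from main_pattern.
CARCINOMA = re.compile(r"carcinom(?!.{0,10}in.?situ)")
SARCOMA = re.compile(r"(?<!lympho)(?<!lympho-)sarcome")
CANCER = re.compile(r"cancer([^o]|\s|\b)")


def hits(pattern, text):
    """Return True if the pattern is found anywhere in the normalized text."""
    return pattern.search(text) is not None


# Negative lookahead: "in situ" shortly after the stem blocks the match.
print(hits(CARCINOMA, "carcinome intra-hepatique"))  # True
print(hits(CARCINOMA, "carcinome in situ"))          # False

# Negative lookbehinds: lymphosarcomas are left out.
print(hits(SARCOMA, "osteosarcome"))                 # True
print(hits(SARCOMA, "lympho-sarcome"))               # False

# "cancer" must not be followed by "o", which skips "cancerologie".
print(hits(CANCER, "cancer du poumon"))              # True
print(hits(CANCER, "service de cancerologie"))       # False
```

This only checks the regexes in isolation; the token windows used by exclude and assign are applied by the component itself and are not reproduced here.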

Extensions

On each matching span, the following attributes are available:

span._.detailed_status: set to either
- "METASTASIS" for tumors at the metastatic stage
- "LOCALIZED" otherwise
span._.assigned: dictionary with the following keys, if relevant:
- stage: stage of the tumor
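
As a minimal sketch of how these two attributes might be consumed downstream, the hypothetical helper `summarize` below collects them into plain dictionaries. The stand-in objects and their values are illustrative assumptions so that the snippet runs without edsnlp installed; they are not output produced by the component.

```python
from types import SimpleNamespace


def summarize(spans):
    """Collect text, detailed status and stage (if any) for each span."""
    rows = []
    for span in spans:
        # `assigned` may be missing or empty when nothing was captured.
        assigned = getattr(span._, "assigned", None) or {}
        rows.append(
            {
                "text": span.text,
                "status": span._.detailed_status,
                "stage": assigned.get("stage"),
            }
        )
    return rows


# Stand-in objects imitating matched spans (illustrative values only).
fake_spans = [
    SimpleNamespace(
        text="Cancer du poumon au stade 4",
        _=SimpleNamespace(detailed_status="METASTASIS", assigned={"stage": "4"}),
    ),
    SimpleNamespace(
        text="carcinome",
        _=SimpleNamespace(detailed_status="LOCALIZED", assigned={}),
    ),
]

for row in summarize(fake_spans):
    print(row)
```

With a real pipeline, doc.spans["solid_tumor"] would be passed in place of `fake_spans`.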

Examples

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.solid_tumor())

Below are a few examples:

text = "Présence d'un carcinome intra-hépatique."
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [carcinome]

text = "Patient avec un K sein."
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [K sein]

text = "Il y a une tumeur bénigne"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: []

text = "Tumeur métastasée"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [Tumeur métastasée]

span = spans[0]

span._.detailed_status
# Out: METASTASIS

span._.assigned
# Out: {'metastasis': métastasée}

text = "Cancer du poumon au stade 4"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [Cancer du poumon au stade 4]

span = spans[0]

span._.detailed_status
# Out: METASTASIS

span._.assigned
# Out: {'stage': 4}

text = "Cancer du poumon au stade 2"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [Cancer du poumon au stade 2]

span = spans[0]

span._.assigned
# Out: {'stage': 2}

text = "Présence de nombreuses lésions secondaires"
doc = nlp(text)
spans = doc.spans["solid_tumor"]

spans
# Out: [lésions secondaires]

span = spans[0]

span._.detailed_status
# Out: METASTASIS

Parameters

PARAMETER	DESCRIPTION	TYPE	DEFAULT
`nlp`	The pipeline	`Optional[PipelineProtocol]`	
`name`	The name of the component	`Optional[str]`	`'solid_tumor'`
`patterns`	The patterns to use for matching	`Union[Dict[str, Any], List[Dict[str, Any]]]`	`[{'source': 'main', 'regex': ['carcinom(?!.{0,1...`
`label`	The label to use for the `Span` object and the extension	`str`	`solid_tumor`
`span_setter`	How to set matches on the doc	`SpanSetterArg`	`{'ents': True, 'solid_tumor': True}`
`use_tnm`	Whether to use TNM scores matching as well	`bool`	`False`
`use_patterns_metastasis_ct_scan`	Whether to use the metastasis patterns developed for CT-scan reports	`bool`	`False`

Authors and citation

The eds.solid_tumor component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024 and Kempf et al., 2022.

Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069
Kempf E., Priou S., Lamé G., Daniel C., Bellamine A., Sommacale D., Belkacemi Y., Bey R., Galula G., Taright N., Tannier X., Rance B., Flicoteaux R., Hemery F., Audureau E., Chatellier G. and Tournigand C., 2022. Impact of two waves of Sars-Cov2 outbreak on the number, clinical presentation, care trajectories and survival of patients newly referred for a colorectal cancer: A French multicentric cohort study from a large group of University hospitals. International Journal of Cancer. 150, pp.1609-1618. 10.1002/ijc.33928