Solid tumor
The eds.solid_tumor pipeline component extracts mentions of solid tumors. It will notably match:
Details of the used patterns
# fmt: off
BENINE = r"benign|benin|(grade.?\b[i1]\b)"
STAGE = r"stade ([^\s]*)"
main_pattern = dict(
    source="main",
    regex=[
        r"carcinom(?!.{0,10}in.?situ)",
        r"seminome",
        r"(?<!lympho)(?<!lympho-)sarcome",
        r"blastome",
        r"cancer([^o]|\s|\b)",
        r"adamantinome",
        r"chordome",
        r"craniopharyngiome",
        r"melanome",
        r"neoplasie",
        r"neoplasme",
        r"linite",
        r"melanome",
        r"mesoteliome",
        r"mesotheliome",
        r"seminome",
        r"myxome",
        r"paragangliome",
        r"craniopharyngiome",
        r"k .{0,5}(prostate|sein)",
        r"pancoast.?tobias",
        r"syndrome.{1,10}lynch",
        r"li.?fraumeni",
        r"germinome",
        r"adeno[\s-]?k",
        r"thymome",
        r"\bnut\b",
        r"\bgist\b",
        r"\bchc\b",
        r"\badk\b",
        r"\btves\b",
        r"\btv.tves\b",
        r"lesion.{1,20}tumor",
        r"tumeur",
        r"carcinoid",
        r"histiocytome",
        r"ependymome",
        # r"primitif",  # too many false positives
    ],
    exclude=dict(
        regex=BENINE,
        window=(0, 5),
    ),
    regex_attr="NORM",
    assign=[
        dict(
            name="metastasis",
            regex=r"(metasta|multinodul)",
            window=(-3, 7),
            reduce_mode="keep_last",
        ),
        dict(
            name="stage",
            regex=STAGE,
            window=7,
            reduce_mode="keep_last",
        ),
    ],
)
metastasis_pattern = dict(
    source="metastasis",
    regex=[
        r"cellule.{1,5}tumorale.{1,5}circulantes",
        r"metasta",
        r"multinodul",
        r"carcinose",
        r"ruptures.{1,5}corticale",
        r"envahissement.{0,15}parties\smolle",
        r"(localisation|lesion)s?.{0,20}second",
        r"(lymphangite|meningite).{1,5}carcinomateuse",
    ],
    regex_attr="NORM",
    exclude=dict(
        regex=r"goitre",
        window=-3,
    ),
)
# Patterns developed for CT-Scan reports
metastasis_ct_scan = dict(
    source="metastasis_ct_scan",
    regex=[
        r"(?i)(m[ée]tasta(se|tique)s?)",
        r"(diss[ée]min[ée]e?s?)",
        r"(carcinose)",
        r"(((allure|l[ée]sion|localisation|progression)s?\s)(suspecte?s?)?.{0,50}(secondaire)s?)",
        r"(l(a|â)ch(é|e|er)\sde\sballons?)",
        r"(l[ée]sions?\s(non\s)?cibles?)",
        r"(rupture.{1,20}corticale)",
        r"(envahissement.{0,15}parties\smolles)",
        r"((l[i,y]se).{1,20}os)|ost[eé]ol[i,y]|rupture.{1,20}corticale|envahissement.{1,20}parties\smolles|ost[eé]ocondensa.{1,20}(suspect|secondaire|[ée]volutive)",
        r"(l[ée]sion|anomalie|image).{1,20}os.{1,30}(suspect|secondaire|[ée]volutive)",
        r"os.{1,30}(l[ée]sion|anomalie|image).{1,20}(suspect|secondaire|[ée]volutive)",
        r"(l[ée]sion|anomalie|image).{1,20}l[i,y]tique",
        r"(l[ée]sion|anomalie|image).{1,20}condensant.{1,20}(suspect|secondaire|[ée]volutive)",
        r"fracture.{1,30}(suspect|secondaire|[ée]volutive)",
        r"((l[ée]sion|anomalie|image|nodule).{1,80}(secondaire))",
        r"((l[ée]sion|anomalie|image|nodule)s.{1,40}suspec?ts?)",
    ],
    regex_attr="NORM",
)
default_patterns = [
    main_pattern,
    metastasis_pattern,
]
# fmt: on
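To see how a few of these expressions behave, the sketch below runs three of the main patterns and the STAGE pattern through Python's standard re module on lowercase, accent-free strings. This only approximates matching on the NORM attribute: the helper first_match and the sample strings are illustrative assumptions, and the exclusion windows and assignments applied by the component are not reproduced here.

```python
import re

# Patterns copied from the listing above.
CARCINOMA = r"carcinom(?!.{0,10}in.?situ)"
SARCOMA = r"(?<!lympho)(?<!lympho-)sarcome"
CANCER = r"cancer([^o]|\s|\b)"
STAGE = r"stade ([^\s]*)"


def first_match(pattern, text):
    """Return the first matched substring, or None (hypothetical helper)."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


# The negative lookahead rejects carcinomas described as "in situ".
print(first_match(CARCINOMA, "carcinome hepatocellulaire"))  # carcinom
print(first_match(CARCINOMA, "carcinome in situ"))  # None

# The negative lookbehinds reject "lymphosarcome".
print(first_match(SARCOMA, "osteosarcome"))  # sarcome
print(first_match(SARCOMA, "lymphosarcome"))  # None

# "cancerologue" (oncologist) is not a tumor mention.
print(first_match(CANCER, "cancerologue"))  # None

# The capture group of STAGE holds the stage value.
print(re.search(STAGE, "cancer du poumon au stade 4").group(1))  # 4
```

Lookarounds keep the patterns short while filtering frequent false positives directly at the regex level, before any exclusion window is considered.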
Extensions
On each matched span, named span below, the following attributes are available:
span._.detailed_status: set to either "METASTASIS" for tumors at the metastatic stage, or "LOCALIZED" otherwise
span._.assigned: dictionary with the following keys, if relevant:
stage: stage of the tumor
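As a minimal sketch of how these attributes might be consumed downstream, the helper below flattens spans into plain dictionaries. The function summarize and the stand-in object fake_span are hypothetical and exist only so the snippet runs without a pipeline; in practice the spans would come from doc.spans["solid_tumor"].

```python
from types import SimpleNamespace


def summarize(spans):
    """Collect the text, status and stage of each span (hypothetical helper)."""
    rows = []
    for span in spans:
        # `assigned` may be empty when no stage was found near the mention.
        assigned = span._.assigned or {}
        rows.append(
            {
                "text": span.text,
                "detailed_status": span._.detailed_status,
                "stage": assigned.get("stage"),
            }
        )
    return rows


# Stand-in mimicking the attributes described above (an assumption made for
# illustration, not an object produced by the library).
fake_span = SimpleNamespace(
    text="Cancer du poumon au stade 4",
    _=SimpleNamespace(detailed_status="METASTASIS", assigned={"stage": "4"}),
)
print(summarize([fake_span]))
```

Using .get("stage") rather than indexing avoids a KeyError for spans matched without any stage in their context window.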
Examples
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.solid_tumor())
Below are a few examples:
text = "Présence d'un carcinome intra-hépatique."
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [carcinome]
text = "Patient avec un K sein."
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [K sein]
text = "Il y a une tumeur bénigne"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: []
text = "Tumeur métastasée"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [Tumeur métastasée]
span = spans[0]
span._.detailed_status
# Out: METASTASIS
span._.assigned
# Out: {'metastasis': métastasée}
text = "Cancer du poumon au stade 4"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [Cancer du poumon au stade 4]
span = spans[0]
span._.detailed_status
# Out: METASTASIS
span._.assigned
# Out: {'stage': 4}
text = "Cancer du poumon au stade 2"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [Cancer du poumon au stade 2]
span = spans[0]
span._.assigned
# Out: {'stage': 2}
text = "Présence de nombreuses lésions secondaires"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [lésions secondaires]
span = spans[0]
span._.detailed_status
# Out: METASTASIS
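The examples above suggest a simple decision rule for detailed_status. The sketch below is inferred from those outputs only and is not the library's implementation: the function infer_status, its arguments, and the handling of the stage value as the plain string "4" are assumptions.

```python
def infer_status(source, assigned):
    """Guess the detailed status from a match's pattern source and assigned values.

    Illustrative rule inferred from the examples, not the library's code.
    """
    # A match from the metastasis patterns, or a tumor mention with a nearby
    # metastasis term, is reported as metastatic.
    if source == "metastasis" or "metastasis" in assigned:
        return "METASTASIS"
    # Stage 4 is reported as metastatic in the "stade 4" example.
    if str(assigned.get("stage", "")).strip() == "4":
        return "METASTASIS"
    return "LOCALIZED"


print(infer_status("main", {"metastasis": "métastasée"}))  # METASTASIS
print(infer_status("main", {"stage": "4"}))  # METASTASIS
print(infer_status("metastasis", {}))  # METASTASIS
print(infer_status("main", {"stage": "2"}))  # LOCALIZED
```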
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline TYPE: |
name | The name of the component TYPE: |
patterns | The patterns to use for matching DEFAULT: |
label | The label to use for the TYPE: |
span_setter | How to set matches on the doc TYPE: |
use_tnm | Whether to use TNM scores matching as well TYPE: |
use_patterns_metastasis_ct_scan | Whether to use the metastasis patterns developed for the CT-Scans TYPE: |
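To give a feel for the kind of mentions targeted by the CT-scan patterns that use_patterns_metastasis_ct_scan refers to, the sketch below applies three of them with Python's standard re module. The helper matched_ct_scan_mentions and the sample sentences are assumptions for illustration; how the component combines these patterns with the default ones is not shown here.

```python
import re

# Three of the CT-scan patterns, copied from the pattern listing.
CT_SCAN_REGEXES = [
    r"(?i)(m[ée]tasta(se|tique)s?)",
    r"(l(a|â)ch(é|e|er)\sde\sballons?)",
    r"(l[ée]sions?\s(non\s)?cibles?)",
]


def matched_ct_scan_mentions(text):
    """Return every substring matched by the sample patterns (hypothetical helper)."""
    found = []
    for pattern in CT_SCAN_REGEXES:
        found.extend(m.group(0) for m in re.finditer(pattern, text))
    return found


print(matched_ct_scan_mentions("lacher de ballons pulmonaire"))
# ['lacher de ballons']
print(matched_ct_scan_mentions("lesions cibles stables, pas de metastase"))
# ['metastase', 'lesions cibles']
print(matched_ct_scan_mentions("examen normal"))
# []
```

Results are grouped by pattern order rather than by position in the text, which is why 'metastase' is listed before 'lesions cibles' in the second call.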
Authors and citation
The eds.solid_tumor component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024 and Kempf et al., 2022.
Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069
Kempf E., Priou S., Lamé G., Daniel C., Bellamine A., Sommacale D., Belkacemi Y., Bey R., Galula G., Taright N., Tannier X., Rance B., Flicoteaux R., Hemery F., Audureau E., Chatellier G. and Tournigand C., 2022. Impact of two waves of Sars-Cov2 outbreak on the number, clinical presentation, care trajectories and survival of patients newly referred for a colorectal cancer: A French multicentric cohort study from a large group of University hospitals. International Journal of Cancer. 150, pp.1609-1618. 10.1002/ijc.33928