Solid tumor
The eds.solid_tumor pipeline component extracts mentions of solid tumors. The patterns it matches are listed below.
Details of the patterns used
# fmt: off
BENINE = r"benign|benin|(grade.?\b[i1]\b)"
STAGE = r"stade ([^\s]*)"
main_pattern = dict(
    source="main",
    regex=[
        r"carcinom(?!.{0,10}in.?situ)",
        r"seminome",
        r"(?<!lympho)(?<!lympho-)sarcome",
        r"blastome",
        r"cancer([^o]|\s|\b)",
        r"adamantinome",
        r"chordome",
        r"craniopharyngiome",
        r"melanome",
        r"neoplasie",
        r"neoplasme",
        r"linite",
        r"melanome",
        r"mesoteliome",
        r"mesotheliome",
        r"seminome",
        r"myxome",
        r"paragangliome",
        r"craniopharyngiome",
        r"k .{0,5}(prostate|sein)",
        r"pancoast.?tobias",
        r"syndrome.{1,10}lynch",
        r"li.?fraumeni",
        r"germinome",
        r"adeno[\s-]?k",
        r"thymome",
        r"\bnut\b",
        r"\bgist\b",
        r"\bchc\b",
        r"\badk\b",
        r"\btves\b",
        r"\btv.tves\b",
        r"lesion.{1,20}tumor",
        r"tumeur",
        r"carcinoid",
        r"histiocytome",
        r"ependymome",
        # r"primitif",  # disabled: too many false positives
    ],
    exclude=dict(
        regex=BENINE,
        window=(0, 5),
    ),
    regex_attr="NORM",
    assign=[
        dict(
            name="metastasis",
            regex=r"(metasta|multinodul)",
            window=(-3, 7),
            reduce_mode="keep_last",
        ),
        dict(
            name="stage",
            regex=STAGE,
            window=7,
            reduce_mode="keep_last",
        ),
    ],
)
metastasis_pattern = dict(
    source="metastasis",
    regex=[
        r"cellule.{1,5}tumorale.{1,5}circulantes",
        r"metasta",
        r"multinodul",
        r"carcinose",
        r"ruptures.{1,5}corticale",
        r"envahissement.{0,15}parties\smolle",
        r"(localisation|lesion)s?.{0,20}second",
        r"(lymphangite|meningite).{1,5}carcinomateuse",
    ],
    regex_attr="NORM",
    exclude=dict(
        regex=r"goitre",
        window=-3,
    ),
)
default_patterns = [
    main_pattern,
    metastasis_pattern,
]
# fmt: on
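To see how a few of these expressions behave in isolation, here is a minimal sketch using only Python's standard `re` module. It is not how the component itself runs: the component matches against the token-level NORM attribute and expresses `window` values in tokens, whereas this sketch applies the regexes to plain strings that are assumed to be already normalized (lowercase, no accents). The `matches` helper and the constant names other than `BENINE` and `STAGE` are hypothetical, introduced only for illustration.

```python
import re

# Copied verbatim from the patterns above.
BENINE = r"benign|benin|(grade.?\b[i1]\b)"
STAGE = r"stade ([^\s]*)"
CARCINOMA = r"carcinom(?!.{0,10}in.?situ)"
SARCOMA = r"(?<!lympho)(?<!lympho-)sarcome"
CANCER = r"cancer([^o]|\s|\b)"


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` is found anywhere in `text`."""
    return re.search(pattern, text) is not None


# The negative lookahead rejects "in situ" carcinomas but keeps the others.
print(matches(CARCINOMA, "carcinome intra-hepatique"))  # True
print(matches(CARCINOMA, "carcinome in situ"))  # False

# The two negative lookbehinds reject lymphosarcomas, with or without a hyphen.
print(matches(SARCOMA, "osteosarcome"))  # True
print(matches(SARCOMA, "lymphosarcome"))  # False
print(matches(SARCOMA, "lympho-sarcome"))  # False

# "cancer" must not be followed by "o", which filters out "cancerologue".
print(matches(CANCER, "cancer du poumon"))  # True
print(matches(CANCER, "cancerologue"))  # False

# The exclusion regex fires on benign mentions; the stage regex captures
# the token that follows "stade".
print(matches(BENINE, "tumeur benigne"))  # True
print(re.search(STAGE, "cancer du poumon au stade 4").group(1))  # 4
```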
Extensions
On each matched span, the following attributes are available:
- span._.detailed_status: set to either
  - "METASTASIS" for tumors at the metastatic stage
  - "LOCALIZED" otherwise
- span._.assigned: dictionary with the following keys, if relevant:
  - stage: stage of the tumor
  - metastasis: the matched metastasis mention
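The attributes above can be gathered into plain Python dictionaries, for instance before exporting results. The following is a minimal sketch, not part of the library: `summarize_solid_tumors` is a hypothetical helper, and it assumes the spans are stored under `doc.spans["solid_tumor"]`, as in the examples on this page.

```python
from typing import Any, Dict, List


def summarize_solid_tumors(doc: Any, key: str = "solid_tumor") -> List[Dict[str, Any]]:
    """Hypothetical helper: build one plain dict per span in doc.spans[key]."""
    spans = doc.spans[key] if key in doc.spans else []
    rows = []
    for span in spans:
        # `assigned` may be missing or empty when nothing was captured.
        assigned = span._.assigned or {}
        rows.append(
            {
                "text": span.text,
                "status": span._.detailed_status,
                # Assigned values may be spans or plain values; keep their text.
                "assigned": {name: str(value) for name, value in assigned.items()},
            }
        )
    return rows
```

Given a document processed by a pipeline that contains eds.solid_tumor, calling `summarize_solid_tumors(doc)` returns one row per extracted mention, or an empty list when the span group is absent.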
Examples
import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe(
    "eds.normalizer",
    config=dict(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe("eds.solid_tumor")
Below are a few examples:
text = "Présence d'un carcinome intra-hépatique."
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [carcinome]
text = "Patient avec un K sein."
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [K sein]
text = "Il y a une tumeur bénigne"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: []
text = "Tumeur métastasée"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [Tumeur métastasée]
span = spans[0]
span._.detailed_status
# Out: METASTASIS
span._.assigned
# Out: {'metastasis': métastasée}
text = "Cancer du poumon au stade 4"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [Cancer du poumon au stade 4]
span = spans[0]
span._.detailed_status
# Out: METASTASIS
span._.assigned
# Out: {'stage': 4}
text = "Cancer du poumon au stade 2"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [Cancer du poumon au stade 2]
span = spans[0]
span._.assigned
# Out: {'stage': 2}
text = "Présence de nombreuses lésions secondaires"
doc = nlp(text)
spans = doc.spans["solid_tumor"]
spans
# Out: [lésions secondaires]
span = spans[0]
span._.detailed_status
# Out: METASTASIS
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline |
name | The name of the component |
patterns | The patterns to use for matching |
label | The label to use for the matched spans |
span_setter | How to set matches on the doc |
use_tnm | Whether to also match TNM scores |
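As a sketch of how the `patterns` parameter might be used, the snippet below builds a small custom pattern list in the same shape as `default_patterns` and checks that every regex compiles before it is handed to the component. The pattern contents, the `custom_patterns` name and the `validate_patterns` helper are illustrative assumptions, not part of the library. The final `add_pipe` call is left commented out: it needs an existing `nlp` object and assumes the component accepts a `config` dictionary in the same way as eds.normalizer in the examples.

```python
import re
from typing import Any, Dict, List

# Illustrative pattern list (an assumption, not the library default),
# reusing the keys found in the documented default patterns.
custom_patterns: List[Dict[str, Any]] = [
    dict(
        source="main",
        regex=[r"carcinom(?!.{0,10}in.?situ)", r"tumeur"],
        regex_attr="NORM",
        exclude=dict(regex=r"benign|benin", window=(0, 5)),
    ),
]


def validate_patterns(patterns: List[Dict[str, Any]]) -> int:
    """Hypothetical helper: compile every regex and return how many were checked."""
    checked = 0
    for pattern in patterns:
        regexes = list(pattern["regex"])
        if "exclude" in pattern:
            regexes.append(pattern["exclude"]["regex"])
        for regex in regexes:
            re.compile(regex)  # raises re.error on an invalid expression
            checked += 1
    return checked


print(validate_patterns(custom_patterns))  # 3

# Assumed usage, mirroring how eds.normalizer is configured in the examples:
# nlp.add_pipe("eds.solid_tumor", config=dict(patterns=custom_patterns))
```

Validating the expressions up front surfaces a malformed regex as a plain `re.error` rather than as a failure inside the pipeline.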
Authors and citation
The eds.solid_tumor component was developed by AP-HP's Data Science team with a team of medical experts. A paper describing the development of these components in detail is being drafted and will soon be available.