Lymphoma
The eds.lymphoma pipeline component extracts mentions of lymphoma.
Details of the patterns used
# fmt: off
main_pattern = dict(
    source="main",
    regex=[
        r"lymphom(?:.{1,10}hodgkin)",
        r"lymphom",
        r"lymphangio",
        r"sezary",
        r"burkitt",
        r"kaposi",
        r"hodgkin",
        r"amylose",
        r"plasm[ao]cytome",
        r"lympho.{1,3}sarcome",
        r"lympho.?prolif",
        r"hemopathie.{1,10}lymphoide",
        r"macroglobulinemie",
        r"immunocytome",
        r"maladie.des.chaine",
        r"histiocytose.{1,5}(maligne|langerhans)",
        r"waldenst(ro|or)m",
        r"mycos.{1,10}fongoide",
        r"myelome",
        r"maladie.{1,5}immunoproliferative.{1,5}maligne",
        r"leucemie.{1,10}plasmocyte",
    ],
    regex_attr="NORM",
)
acronym = dict(
    source="acronym",
    regex=[
        r"\bLNH\b",
        r"\bLH\b",
        r"\bEATL\b",
        r"\bLAGC\b",
        r"\bLDGCB\b",
    ],
    regex_attr="TEXT",
    exclude=dict(
        regex=["/L", "/mL"],
        window=10,
    ),
)
gammapathy = dict(
    source="gammapathy",
    regex=[
        r"gammapathie monoclonale",
    ],
    exclude=dict(
        regex=[
            "benin",
            "benign",
            "signification.indeter",
            "NMSI",
            "MGUS",
        ],
        window=(0, 5),
    ),
    regex_attr="NORM",
)
default_patterns = [
    main_pattern,
    acronym,
    # gammapathy,
]
# fmt: on
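The exclude entry of the acronym pattern is meant to drop matches that sit next to a laboratory unit such as /L or /mL. The following is a minimal standard-library sketch of that idea, not the EDS-NLP implementation: the function name find_acronyms is hypothetical, the window is assumed to cover the 10 whitespace-separated words that follow the match, and the real component works on its own tokens rather than on whitespace splits.

```python
import re

# Regexes copied from the acronym pattern above.
ACRONYM_REGEXES = [r"\bLNH\b", r"\bLH\b", r"\bEATL\b", r"\bLAGC\b", r"\bLDGCB\b"]
EXCLUDE_REGEXES = ["/L", "/mL"]
WINDOW = 10  # assumption: number of words inspected after the match


def find_acronyms(text):
    """Return acronym matches that are not followed by an excluded unit."""
    kept = []
    for pattern in ACRONYM_REGEXES:
        for match in re.finditer(pattern, text):
            # Assumption: the context is the next WINDOW whitespace-separated words.
            context = " ".join(text[match.end():].split()[:WINDOW])
            if any(re.search(excluded, context) for excluded in EXCLUDE_REGEXES):
                continue  # looks like a lab measurement, drop the match
            kept.append(match.group())
    return kept


print(find_acronyms("Un LAGC"))             # ['LAGC']
print(find_acronyms("anti LAGC: 10^4/mL"))  # []
```

Because the acronym regexes are applied to the raw text (regex_attr="TEXT"), they stay case-sensitive, which keeps short acronyms such as LH from matching ordinary lowercase words.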
Extensions
On each span that matches, the following attribute is available:
span._.detailed_status: set to None
Monoclonal gammapathy
Monoclonal gammapathies are not extracted by this pipeline: the gammapathy pattern is defined but commented out of default_patterns.
Examples
import edsnlp, edsnlp.pipes as eds
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.normalizer(
        accents=True,
        lowercase=True,
        quotes=True,
        spaces=True,
        pollution=dict(
            information=True,
            bars=True,
            biology=True,
            doctors=True,
            web=True,
            coding=True,
            footer=True,
        ),
    ),
)
nlp.add_pipe(eds.lymphoma())
Below are a few examples:
text = "Un lymphome de Hodgkin."
doc = nlp(text)
spans = doc.spans["lymphoma"]
spans
# Out: [lymphome de Hodgkin]
text = "Atteint d'un Waldenstörm"
doc = nlp(text)
spans = doc.spans["lymphoma"]
spans
# Out: [Waldenstörm]
text = "Un LAGC"
doc = nlp(text)
spans = doc.spans["lymphoma"]
spans
# Out: [LAGC]
text = "anti LAGC: 10^4/mL"
doc = nlp(text)
spans = doc.spans["lymphoma"]
spans
# Out: []
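The main patterns are written in lowercase without accents because they are matched against the normalized text (regex_attr="NORM"). This is why "Waldenstörm" is caught by waldenst(ro|or)m. The sketch below approximates that step with the standard library only: normalize and first_match are hypothetical helpers, and lowercasing plus accent stripping is an assumption that covers only part of what eds.normalizer does. It also returns the normalized substring, whereas the component returns a span over the original text.

```python
import re
import unicodedata

# A few regexes copied from main_pattern, in the same order.
PATTERNS = [
    r"lymphom(?:.{1,10}hodgkin)",
    r"lymphom",
    r"waldenst(ro|or)m",
]


def normalize(text):
    """Lowercase the text and strip diacritics (rough stand-in for NORM)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def first_match(text):
    """Return the first normalized substring matched by a pattern, or None."""
    norm = normalize(text)
    for pattern in PATTERNS:
        match = re.search(pattern, norm)
        if match:
            return match.group()
    return None


print(first_match("Atteint d'un Waldenstörm"))  # waldenstorm
print(first_match("Un lymphome de Hodgkin."))   # lymphome de hodgkin
```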
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline |
name | The name of the component |
patterns | The patterns to use for matching |
label | The label to use |
span_setter | How to set matches on the doc |
Authors and citation
The eds.lymphoma component was developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024.
Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069