UMLS

The eds.umls pipeline component matches terms from the UMLS (Unified Medical Language System, from the NIH) terminology.

Very low recall

In exact matching mode, this component has very low recall. The simstring mode retrieves approximate matches instead, at the cost of a significantly longer computation time.
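To illustrate why exact matching misses so many mentions, here is a minimal, standard-library sketch that compares an exact lookup with an approximate lookup based on character trigrams and the Dice coefficient. It is not the component's implementation: the two-entry terminology, the `CUI_AMPHIBIAN` code, the helper names and the 0.75 threshold are all assumptions made for the example, and the linear scan stands in for the indexed search a real approximate matcher would use. In the pipeline itself, the approximate mode is selected with the `term_matcher` parameter described below.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def dice(a, b):
    """Dice coefficient between the character n-gram sets of two strings."""
    x, y = char_ngrams(a), char_ngrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical toy terminology: surface form -> concept identifier.
# "CUI_AMPHIBIAN" is a placeholder, not a real UMLS code.
terminology = {"toux": "C0010200", "amphibien": "CUI_AMPHIBIAN"}


def exact_match(word):
    """Return the concept whose surface form equals the word, if any."""
    return terminology.get(word.lower())


def approximate_match(word, threshold=0.75):
    """Return the concept of the most similar term if it reaches the threshold."""
    best = max(terminology, key=lambda term: dice(word, term))
    return terminology[best] if dice(word, best) >= threshold else None


print(exact_match("Amphibiens"))        # the plural form is not in the terminology
print(approximate_match("Amphibiens"))  # close enough to "amphibien"
print(approximate_match("genou"))       # nothing similar in the toy terminology
```

The approximate lookup recovers the inflected form that the exact lookup misses, but it has to score candidate terms rather than perform a single dictionary access, which is where the extra computation time comes from.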

Examples

eds.umls is an additional module that needs to be set up as follows:

Install the downloader: pip install -U umls_downloader
Sign up for a UMLS Terminology Services account. After filling in a short form, you will receive your API key within a few days.
Set UMLS_API_KEY locally: export UMLS_API_KEY=your_api_key
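Since the key is read from the environment, it can help to check that it is present before building the pipeline. The helper below is a small sketch using only the standard library; `get_umls_api_key` is a hypothetical name introduced for this example and is not part of edsnlp or umls_downloader.

```python
import os


def get_umls_api_key(env=os.environ):
    """Return the UMLS API key from the environment, or raise a clear error."""
    key = env.get("UMLS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UMLS_API_KEY is not set; run `export UMLS_API_KEY=your_api_key` first."
        )
    return key
```

Calling `get_umls_api_key()` once at the start of a script surfaces a missing key immediately, rather than partway through the terminology download.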

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.umls())

text = "Grosse toux: le malade a été mordu par des Amphibiens " "sous le genou"

doc = nlp(text)

doc.ents
# Out: (toux, a, par, Amphibiens, genou)

ent = doc.ents[0]

ent.label_
# Out: umls

ent._.umls
# Out: C0010200

You can easily change the default languages and sources with the pattern_config argument:

import edsnlp, edsnlp.pipes as eds

# Enable the French and English languages, through the French MeSH and LOINC
pattern_config = dict(languages=["FRE", "ENG"], sources=["MSHFRE", "LNC"])

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.umls(pattern_config=pattern_config))

See more options of languages and sources here.

Parameters

PARAMETER	DESCRIPTION
`nlp`	spaCy `Language` object. TYPE: `PipelineProtocol`
`name`	The name of the pipe TYPE: `str` DEFAULT: `'umls'`
`attr`	Attribute to match on, e.g. `TEXT` or `NORM`. TYPE: `Union[str, Dict[str, str]]` DEFAULT: `'NORM'`
`ignore_excluded`	Whether to skip excluded tokens during matching. TYPE: `bool` DEFAULT: `False`
`ignore_space_tokens`	Whether to skip space tokens during matching. TYPE: `bool` DEFAULT: `False`
`term_matcher`	The term matcher to use, either "exact" or "simstring" TYPE: `TerminologyTermMatcher` DEFAULT: `'exact'`
`term_matcher_config`	The configuration for the term matcher TYPE: `Dict[str, Any]` DEFAULT: `{}`
`pattern_config`	The pattern retriever configuration TYPE: `Dict[str, Any]` DEFAULT: `dict(languages=['FRE'], sources=None)`
`label`	Label name to use for the `Span` object and the extension TYPE: `str` DEFAULT: `'umls'`
`span_setter`	How to set matches on the doc TYPE: `SpanSetterArg` DEFAULT: `{'ents': True, 'umls': True}`

Authors and citation

The eds.umls pipeline was developed by AP-HP's Data Science team and INRIA SODA's team.