UMLS
The eds.umls pipeline component matches terms from the UMLS (Unified Medical Language System, from the NIH) terminology.
Very low recall
In exact matching mode, this component has very poor recall. The simstring mode retrieves approximate matches instead, at the cost of a significantly higher computation time.
Usage
eds.umls is an additional module that needs to be set up as follows:
- Install the downloader: pip install -U umls_downloader
- Sign up for a UMLS Terminology Services account. After filling in a short form, you will receive your API key within a few days.
- Set UMLS_API_KEY locally: export UMLS_API_KEY=your_api_key
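Because the setup expects UMLS_API_KEY to be present in the environment, it can be useful to check for it before building the pipeline. The snippet below is a minimal sketch using only the standard library; the helper name require_umls_api_key is hypothetical and not part of EDS-NLP.

```python
import os


def require_umls_api_key(env=None):
    """Return the UMLS API key, or raise a clear error if it is missing.

    Hypothetical helper (not part of EDS-NLP). `env` defaults to the
    process environment and can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("UMLS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UMLS_API_KEY is not set; run "
            "`export UMLS_API_KEY=your_api_key` first."
        )
    return key
```

Calling require_umls_api_key() at the top of a script makes a missing key fail early with an explicit message.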
import spacy
nlp = spacy.blank("fr")
nlp.add_pipe("eds.umls")
text = "Grosse toux: le malade a été mordu par des Amphibiens " "sous le genou"
doc = nlp(text)
doc.ents
# Out: (toux, a, par, Amphibiens, genou)
ent = doc.ents[0]
ent.label_
# Out: umls
ent._.umls
# Out: C0010200
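To inspect every match at once rather than one entity at a time, the attributes used above can be collected in a loop. This is a small sketch; entity_cuis is a hypothetical helper name, and it assumes every entity in doc.ents carries the _.umls extension shown in the example.

```python
def entity_cuis(doc):
    """Collect (text, label, CUI) for each entity of a processed document.

    Hypothetical helper: assumes each entity exposes `text`, `label_`
    and the `_.umls` extension, as in the example above.
    """
    return [(ent.text, ent.label_, ent._.umls) for ent in doc.ents]
```

With the pipeline defined above, entity_cuis(nlp(text)) returns one tuple per matched entity.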
You can easily change the default languages and sources with the pattern_config argument:
import spacy
# Enable the french and english languages, through the french MeSH and LOINC
pattern_config = dict(languages=["FRE", "ENG"], sources=["MSHFRE", "LNC"])
nlp = spacy.blank("fr")
nlp.add_pipe("eds.umls", config=dict(pattern_config=pattern_config))
See the UMLS documentation for the other available languages and sources.
Configuration
The pipeline can be configured using the following parameters:
PARAMETER | DESCRIPTION |
---|---|
attr | Attribute to match on. |
ignore_excluded | Whether to skip excluded tokens during matching. |
ignore_space_tokens | Whether to skip space tokens during matching. |
term_matcher | The term matcher to use, either exact or simstring. |
term_matcher_config | The configuration for the term matcher. |
pattern_config | The pattern retriever configuration. |
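As an illustration of how these parameters combine, the sketch below builds a configuration dictionary that selects the matching mode and, optionally, a pattern_config. The helper build_umls_config and the constant ALLOWED_TERM_MATCHERS are hypothetical names introduced for this example; only the term_matcher and pattern_config keys and the exact and simstring modes come from this page.

```python
# Matching modes mentioned on this page.
ALLOWED_TERM_MATCHERS = ("exact", "simstring")


def build_umls_config(term_matcher="simstring", pattern_config=None):
    """Build a config dict for `nlp.add_pipe("eds.umls", config=...)`.

    Hypothetical helper (not part of EDS-NLP). Only the options that are
    explicitly given are included, so the component keeps its own defaults
    for everything else.
    """
    if term_matcher not in ALLOWED_TERM_MATCHERS:
        raise ValueError(
            f"term_matcher must be one of {ALLOWED_TERM_MATCHERS}, "
            f"got {term_matcher!r}"
        )
    config = {"term_matcher": term_matcher}
    if pattern_config is not None:
        # Copy so that later changes by the caller do not leak into the config.
        config["pattern_config"] = dict(pattern_config)
    return config


# Intended usage (requires the setup described in the Usage section):
# import spacy
# nlp = spacy.blank("fr")
# nlp.add_pipe(
#     "eds.umls",
#     config=build_umls_config(
#         term_matcher="simstring",
#         pattern_config=dict(languages=["FRE", "ENG"], sources=["MSHFRE", "LNC"]),
#     ),
# )
```

Validating the mode up front gives an immediate error on a typo instead of a failure while the pipeline is being built.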
Authors and citation
The eds.umls pipeline was developed by AP-HP's Data Science team and INRIA SODA's team.