UMLS
The eds.umls pipeline component matches the UMLS (Unified Medical Language System, from the NIH) terminology.
Very low recall
When using the exact matching mode, this component has very poor recall. The simstring mode retrieves approximate matches, at the cost of a significantly higher computation time.
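To make the recall trade-off concrete, here is a toy, standard-library-only sketch. It is not the code that eds.umls or simstring runs: the helper names, the character-trigram Jaccard similarity and the 0.5 threshold are all assumptions chosen for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:  # both strings empty with n == 1
        return 0.0
    return len(grams_a & grams_b) / len(union)


def exact_match(mention, terms):
    """Keep only the terms that are strictly identical to the mention."""
    return [term for term in terms if term == mention]


def approximate_match(mention, terms, threshold=0.5):
    """Keep the terms whose n-gram similarity to the mention reaches the threshold."""
    return [term for term in terms if jaccard(mention, term) >= threshold]


# Hypothetical mini-terminology: the singular mention differs from the
# stored plural term by a single character.
terms = ["amphibiens", "genou", "toux"]
exact_hits = exact_match("amphibien", terms)
approx_hits = approximate_match("amphibien", terms)
print(exact_hits)   # exact lookup misses the variant
print(approx_hits)  # approximate lookup recovers it
```

In this sketch the approximate lookup scores the mention against every term, which hints at why approximate matching over a large terminology costs more than an exact lookup.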
Examples
eds.umls is an additional module that needs to be set up by:
- Installing the downloader: pip install -U umls_downloader
- Signing up for a UMLS Terminology Services Account. After filling in a short form, you will receive your API token within a few days.
- Setting UMLS_API_KEY locally: export UMLS_API_KEY=your_api_key
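Since the API key is read from the environment, it can help to check for it before building the pipeline. The sketch below uses only the standard library. umls_api_key is a hypothetical helper, not part of edsnlp or umls_downloader, and the error message is an assumption. How eds.umls itself reacts to a missing key is not covered here.

```python
import os


def umls_api_key(environ=os.environ):
    """Return the UMLS API key, or raise a readable error if it is missing."""
    key = environ.get("UMLS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UMLS_API_KEY is not set; run `export UMLS_API_KEY=your_api_key` first."
        )
    return key


# Demonstration with an explicit mapping instead of the real environment,
# so the sketch runs whether or not the variable is exported.
demo_key = umls_api_key({"UMLS_API_KEY": "your_api_key"})
print(demo_key)
```

Calling umls_api_key() with no argument checks the real environment of the current process.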
import edsnlp, edsnlp.pipes as eds
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.umls())
text = "Grosse toux: le malade a été mordu par des Amphibiens sous le genou"
doc = nlp(text)
doc.ents
# Out: (toux, a, par, Amphibiens, genou)
ent = doc.ents[0]
ent.label_
# Out: umls
ent._.umls
# Out: C0010200
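To inspect every match at once rather than one entity at a time, the same attributes can be collected in a loop. entity_codes is a hypothetical helper, and it assumes the default label "umls" shown above. The stand-in objects below only mimic the attributes used (text, label_ and _.umls) so that the sketch runs without downloading UMLS.

```python
from types import SimpleNamespace


def entity_codes(doc, label="umls"):
    """Return (entity text, UMLS code) pairs for the entities carrying `label`."""
    return [(ent.text, ent._.umls) for ent in doc.ents if ent.label_ == label]


# Stand-in for a processed document; in practice, pass the doc returned by nlp(text).
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="toux", label_="umls", _=SimpleNamespace(umls="C0010200")),
        SimpleNamespace(text="AP-HP", label_="other", _=SimpleNamespace(umls=None)),
    ]
)
pairs = entity_codes(fake_doc)
print(pairs)
```

With the doc from the example above, the first pair would be the entity toux with the code C0010200.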
You can easily change the default languages and sources with the pattern_config argument:
import edsnlp, edsnlp.pipes as eds
# Enable the French and English languages, through the French MeSH and LOINC
pattern_config = dict(languages=["FRE", "ENG"], sources=["MSHFRE", "LNC"])
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.umls(pattern_config=pattern_config))
See more options of languages and sources here.
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | spaCy |
name | The name of the pipe |
attr | Attribute to match on, eg |
ignore_excluded | Whether to skip excluded tokens during matching. |
ignore_space_tokens | Whether to skip space tokens during matching. |
term_matcher | The term matcher to use, either "exact" or "simstring" |
term_matcher_config | The configuration for the term matcher |
pattern_config | The pattern retriever configuration |
label | Label name to use for the |
span_setter | How to set matches on the doc |
Authors and citation
The eds.umls pipeline was developed by AP-HP's Data Science team and INRIA SODA's team.