CIM10
The eds.cim10 pipeline component extracts terms from documents using the CIM10 (French-language ICD-10) terminology as a reference.
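For intuition about what exact terminology matching involves, here is a minimal pure-Python sketch. It is not the edsnlp implementation. TOY_TERMINOLOGY is a hypothetical one-entry stand-in for the CIM10 resource, and tokenize and exact_match are illustrative helpers that only lowercase tokens before comparing token sequences.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical one-entry stand-in for the CIM10 terminology (term -> code)
TOY_TERMINOLOGY: Dict[str, str] = {"fièvres typhoïde et paratyphoïde": "A01"}

# Words and individual punctuation marks; ordinary whitespace is skipped
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split text into (lowercased token, start offset, end offset) triples."""
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def exact_match(text: str, terminology: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (surface text, code) for every exact token-sequence match of a term."""
    tokens = tokenize(text)
    words = [tok for tok, _, _ in tokens]
    matches = []
    for term, code in terminology.items():
        term_words = [tok for tok, _, _ in tokenize(term)]
        n = len(term_words)
        if n == 0:
            continue
        # Slide over the document and compare token sequences one-for-one
        for i in range(len(words) - n + 1):
            if words[i:i + n] == term_words:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((text[start:end], code))
    return matches
```

In this toy, "Le patient est suivi pour fièvres typhoïde et paratyphoïde." yields one A01 match, while the accent-free variant "fievres typhoide et paratyphoide" yields none: a single differing character is enough to lose the match, which illustrates why exact matching has low recall.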
Very low recall
When using the exact matching mode, this component has very poor recall. The simstring mode retrieves approximate matches instead, at the cost of a significantly higher computation time.
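The sketch below illustrates the kind of approximate comparison that SimString-style matching relies on. Strings are compared as sets of character trigrams using the Dice coefficient, and a window of tokens matches a term when its similarity reaches a threshold. This is a brute-force illustration under our own assumptions, not edsnlp's simstring matcher: the trigram size, the 0.75 threshold, the window limit, TOY_TERMINOLOGY and all function names are hypothetical. A real SimString index avoids comparing every window against every term.

```python
import re
import unicodedata
from typing import Dict, Optional, Set, Tuple

# Hypothetical one-entry stand-in for the CIM10 terminology (term -> code)
TOY_TERMINOLOGY: Dict[str, str] = {"fièvres typhoïde et paratyphoïde": "A01"}


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lowercased text, padded with one space on each side."""
    padded = " " + unicodedata.normalize("NFC", text.lower()) + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient between the n-gram sets of a and b."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def approximate_match(
    text: str,
    terminology: Dict[str, str],
    threshold: float = 0.75,
    max_window: int = 8,
) -> Dict[str, Tuple[str, float]]:
    """For each code, return its best-scoring token window if it reaches the threshold."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    results: Dict[str, Tuple[str, float]] = {}
    for term, code in terminology.items():
        best: Optional[Tuple[str, float]] = None
        # Brute force: compare every window of 1..max_window tokens with the term
        for size in range(1, max_window + 1):
            for start in range(len(tokens) - size + 1):
                candidate = " ".join(tokens[start:start + size])
                score = dice(candidate, term)
                if best is None or score > best[1]:
                    best = (candidate, score)
        if best is not None and best[1] >= threshold:
            results[code] = best
    return results
```

On "Le patient est suivi pour fievres typhoide et paratyphoide.", the accent-free window scores 0.76 against the accented term and is accepted, whereas an exact comparison would reject it. Even this toy scans every token window against every term, which hints at why approximate matching costs more than exact lookup.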
Examples
```python
import edsnlp

nlp = edsnlp.blank("eds")
# Use approximate (simstring) matching to improve recall
nlp.add_pipe("eds.cim10", config=dict(term_matcher="simstring"))

text = "Le patient est suivi pour fièvres typhoïde et paratyphoïde."
doc = nlp(text)

doc.ents
# Out: (fièvres typhoïde et paratyphoïde,)

ent = doc.ents[0]

ent.label_
# Out: cim10

ent.kb_id_  # the matched CIM10 code
# Out: A01
```
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object. |
name | The name of the component. |
attr | The default attribute to use for matching. |
ignore_excluded | Whether to skip excluded tokens (requires an upstream pipeline to mark excluded tokens). |
ignore_space_tokens | Whether to skip space tokens during matching. |
term_matcher | The matcher to use for matching phrases. One of: exact, simstring. |
term_matcher_config | Parameters of the term matcher. |
label | Label name to use for the matched spans. |
span_setter | How to set matches on the doc. |
RETURNS | DESCRIPTION |
---|---|
TerminologyMatcher | |
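The attr and ignore_space_tokens parameters are easier to grasp on a toy example. The following self-contained sketch is loosely modeled on spaCy-style token attributes but is not edsnlp code. Token, make_tokens, match, TOY_TERMINOLOGY and the accent-stripping norm value are all hypothetical, and edsnlp's own NORM attribute may be computed differently.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical one-entry stand-in for the CIM10 terminology (term -> code)
TOY_TERMINOLOGY: Dict[str, str] = {"fièvres typhoïde et paratyphoïde": "A01"}

# Toy tokenizer: newline runs become "space tokens", ordinary spaces are dropped
TOKEN_RE = re.compile(r"\n+|\w+|[^\w\s]")


@dataclass
class Token:
    text: str  # surface form, as written
    norm: str  # our own normalized form: lowercased, accents removed
    is_space: bool
    start: int
    end: int


def strip_accents(s: str) -> str:
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def make_tokens(text: str) -> List[Token]:
    return [
        Token(m.group(), strip_accents(m.group().lower()), m.group().isspace(), m.start(), m.end())
        for m in TOKEN_RE.finditer(text)
    ]


def match(
    text: str,
    terminology: Dict[str, str],
    attr: str = "norm",
    ignore_space_tokens: bool = True,
) -> List[Tuple[str, str]]:
    """Exact matching on the chosen token attribute, optionally skipping space tokens."""

    def keep(tokens: List[Token]) -> List[Token]:
        return [t for t in tokens if not (ignore_space_tokens and t.is_space)]

    doc_tokens = keep(make_tokens(text))
    values = [getattr(t, attr) for t in doc_tokens]
    matches = []
    for term, code in terminology.items():
        term_values = [getattr(t, attr) for t in keep(make_tokens(term))]
        n = len(term_values)
        if n == 0:
            continue
        for i in range(len(values) - n + 1):
            if values[i:i + n] == term_values:
                # In this sketch, report the original character span,
                # including any space tokens that were skipped
                start, end = doc_tokens[i].start, doc_tokens[i + n - 1].end
                matches.append((text[start:end], code))
    return matches
```

With the text "Suivi pour fievres\ntyphoide et paratyphoide.", only the combination attr="norm" and ignore_space_tokens=True finds the term in this toy; switching to attr="text" or disabling ignore_space_tokens makes the match disappear.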
Authors and citation
The eds.cim10 pipeline was developed by AP-HP's Data Science team.