CIM10

The `eds.cim10` pipeline component matches terms from CIM10, the French-language version of the ICD-10 terminology.

Very low recall

When using the `exact` matching mode, this component has very poor recall. The `simstring` mode retrieves approximate matches, at the cost of a significantly higher computation time.
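To see why approximate matching helps recall, here is a toy sketch in plain Python. It is not the component's actual matcher: it only illustrates the general idea of comparing character n-grams with a similarity threshold. The helper names (`char_ngrams`, `cosine`, `approximate_match`), the trigram size, and the 0.7 threshold are all assumptions made for this illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return the multiset of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram multisets."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def approximate_match(query, terms, threshold=0.7):
    """Return the terms whose n-gram similarity to the query reaches the threshold."""
    query_grams = char_ngrams(query)
    return [t for t in terms if cosine(query_grams, char_ngrams(t)) >= threshold]


# Hypothetical two-entry terminology and a mention that differs by one letter.
terms = ["fièvres typhoïde et paratyphoïde", "choléra"]
mention = "fièvre typhoïde et paratyphoïde"  # singular "fièvre"

exact_hits = [t for t in terms if t == mention]  # strict equality misses the variant
approx_hits = approximate_match(mention, terms)  # the near-identical term is kept
```

Strict equality rejects the singular variant, while the similarity-based lookup still finds the corresponding term. The price is that every candidate span has to be scored against the terminology, which is where the extra computation time comes from.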

Usage

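import spacy

# Blank French pipeline; "eds.cim10" is the only component added here
nlp = spacy.blank("fr")
# Use approximate (simstring) matching instead of the default exact matcher
nlp.add_pipe("eds.cim10", config=dict(term_matcher="simstring"))

text = "Le patient est suivi pour fièvres typhoïde et paratyphoïde."

doc = nlp(text)

# Matched terms are exposed as entities
doc.ents
# Out: (fièvres typhoïde et paratyphoïde,)

ent = doc.ents[0]

# Every match carries the same label
ent.label_
# Out: cim10

# The CIM10 code is stored in the entity's knowledge base ID
ent.kb_id_
# Out: A01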
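When processing many documents, it can be convenient to flatten the matched entities into plain records. The sketch below assumes only the three attributes shown above (`text`, `label_`, `kb_id_`). The helper name `entities_to_rows` is hypothetical, and a stand-in object replaces the real `doc` so that the snippet runs without loading a pipeline.

```python
from types import SimpleNamespace


def entities_to_rows(doc):
    """Collect the text, label and CIM10 code of each entity on a spaCy-like doc."""
    return [
        {"text": ent.text, "label": ent.label_, "code": ent.kb_id_}
        for ent in doc.ents
    ]


# Stand-in mimicking the `doc` from the usage example (not a real spaCy Doc).
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(
            text="fièvres typhoïde et paratyphoïde",
            label_="cim10",
            kb_id_="A01",
        )
    ]
)

rows = entities_to_rows(fake_doc)
```

With a real pipeline, `entities_to_rows(nlp(text))` would be called in the same way.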

Configuration

The pipeline can be configured using the following parameters:

PARAMETER	DESCRIPTION
`attr`	Attribute to match on, e.g. `TEXT`, `NORM`, etc. TYPE: `Union[str, Dict[str, str]]` DEFAULT: `'NORM'`
`ignore_excluded`	Whether to skip excluded tokens during matching. TYPE: `bool` DEFAULT: `False`
`ignore_space_tokens`	Whether to skip space tokens during matching. TYPE: `bool` DEFAULT: `False`
`term_matcher`	The term matcher to use, either `TerminologyTermMatcher.exact` or `TerminologyTermMatcher.simstring` TYPE: `TerminologyTermMatcher` DEFAULT: `TerminologyTermMatcher.exact`
`term_matcher_config`	The configuration for the term matcher TYPE: `Dict[str, Any]` DEFAULT: `{}`
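As a minimal sketch, the defaults listed above can be written out explicitly as a config dictionary. The string form `"exact"` for `term_matcher` is an assumption, made by analogy with the `"simstring"` string used in the usage example. The `build_pipeline` helper is hypothetical and imports spaCy lazily, so the dictionary can be inspected without it.

```python
# Every parameter from the table, set to its documented default.
cim10_config = dict(
    attr="NORM",
    ignore_excluded=False,
    ignore_space_tokens=False,
    term_matcher="exact",  # assumed string form of TerminologyTermMatcher.exact
    term_matcher_config={},
)


def build_pipeline(config):
    """Hypothetical helper: create a blank French pipeline with eds.cim10 configured."""
    import spacy  # lazy import so the config above stays usable on its own

    nlp = spacy.blank("fr")
    nlp.add_pipe("eds.cim10", config=config)
    return nlp
```

To switch to approximate matching, change `term_matcher` to `"simstring"` and pass the result to `build_pipeline`.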

Authors and citation

The eds.cim10 pipeline was developed by AP-HP's Data Science team.