CIM10

The eds.cim10 pipeline component extracts terms from documents using the CIM10 (French-language ICD) terminology as a reference.

Very low recall

In exact matching mode, this component has very low recall: a mention is only found if it matches a terminology entry as written. The simstring mode retrieves approximate matches instead, at the cost of significantly higher computation time.
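To give an intuition for the trade-off, here is a minimal, standard-library sketch of exact versus approximate lookup. It is not EDS-NLP's implementation: the helper names (`char_ngrams`, `dice`, `lookup`), the one-entry `TERMS` dictionary, and the 0.75 threshold are all assumptions made for illustration, and character-trigram Dice similarity is only one of the measures commonly used for this kind of matching.

```python
from typing import Optional

# Toy terminology with a single entry, taken from the example on this page.
TERMS = {"fièvres typhoïde et paratyphoïde": "A01"}


def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of character n-grams of `text`, padded with spaces."""
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient between the character n-gram sets of two strings."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def lookup(mention: str, threshold: Optional[float] = None) -> Optional[str]:
    """Return the code for `mention`, or None if nothing matches.

    With no threshold, only an identical string matches (exact mode).
    With a threshold, the most similar term is accepted if it scores
    at least `threshold` (approximate mode).
    """
    if threshold is None:
        return TERMS.get(mention)
    best = max(TERMS, key=lambda term: dice(mention, term))
    return TERMS[best] if dice(mention, best) >= threshold else None


# A variant of the terminology entry, missing the plural "s" on "fièvres".
mention = "fièvre typhoïde et paratyphoïde"

exact_code = lookup(mention)                  # None: the strings differ
fuzzy_code = lookup(mention, threshold=0.75)  # "A01": the trigrams mostly overlap
```

In this sketch, the approximate lookup scores the mention against every term, whereas the exact lookup is a single dictionary access. That gives a rough sense of why approximate matching costs more, although a real matcher would use an index rather than a linear scan.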

Examples

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.cim10", config=dict(term_matcher="simstring"))

text = "Le patient est suivi pour fièvres typhoïde et paratyphoïde."

doc = nlp(text)

doc.ents
# Out: (fièvres typhoïde et paratyphoïde,)

ent = doc.ents[0]

ent.label_
# Out: cim10

ent.kb_id_
# Out: A01

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline object TYPE: `PipelineProtocol`
`name`	The name of the component TYPE: `str` DEFAULT: `'eds.cim10'`
`attr`	The default attribute to use for matching. TYPE: `str` DEFAULT: `'NORM'`
`ignore_excluded`	Whether to skip excluded tokens (requires an upstream pipeline to mark excluded tokens). TYPE: `bool` DEFAULT: `False`
`ignore_space_tokens`	Whether to skip space tokens during matching. TYPE: `bool` DEFAULT: `False`
`term_matcher`	The matcher to use for matching phrases. One of `exact` or `simstring`. TYPE: `Literal['exact', 'simstring']` DEFAULT: `'exact'`
`term_matcher_config`	Parameters passed to the term matcher. TYPE: `Dict[str, Any]` DEFAULT: `{}`
`label`	Label name to use for the `Span` object and the extension TYPE: `str` DEFAULT: `'cim10'`
`span_setter`	How to set matches on the doc TYPE: `SpanSetterArg` DEFAULT: `{'ents': True, 'cim10': True}`

RETURNS	DESCRIPTION
`TerminologyMatcher`

Authors and citation

The eds.cim10 pipeline was developed by AP-HP's Data Science team.