CIM10

The eds.cim10 pipeline component extracts terms from documents using the CIM10 (French-language ICD) terminology as a reference.

Very low recall

In exact matching mode, this component has very poor recall. The simstring mode retrieves approximate matches, at the cost of a significantly higher computation time.
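To make the trade-off concrete, here is a minimal standard-library sketch of exact versus approximate matching. It is an illustration only, not the implementation used by EDS-NLP: the helper names (`char_ngrams`, `dice`, `match`), the trigram size, and the 0.75 threshold are all assumptions chosen for the example.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b):
    """Dice coefficient between the character n-gram sets of two strings."""
    x, y = char_ngrams(a), char_ngrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def match(mention, terms, threshold=None):
    """Return the terminology entries that match `mention`.

    threshold=None means exact string equality; otherwise an entry matches
    when its Dice similarity with the mention reaches the threshold.
    """
    if threshold is None:
        return [t for t in terms if t == mention]
    # Hypothetical linear scan: every entry is scored against the mention.
    return [t for t in terms if dice(mention, t) >= threshold]


terms = ["fièvres typhoïde et paratyphoïde", "choléra"]
mention = "fièvre typhoïde et paratyphoïde"  # singular form, one character off

exact_hits = match(mention, terms)                   # misses the variant
fuzzy_hits = match(mention, terms, threshold=0.75)   # recovers it
```

The exact lookup misses a spelling variant that differs by a single character, while the similarity-based lookup recovers it but has to score candidate terms, which is where the extra computation comes from.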

Examples

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.cim10", config=dict(term_matcher="simstring"))

text = "Le patient est suivi pour fièvres typhoïde et paratyphoïde."

doc = nlp(text)

doc.ents
# Out: (fièvres typhoïde et paratyphoïde,)

ent = doc.ents[0]

ent.label_
# Out: cim10

ent.kb_id_
# Out: A01

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline object

TYPE: PipelineProtocol

name

The name of the component

TYPE: str DEFAULT: 'eds.cim10'

attr

The default attribute to use for matching.

TYPE: str DEFAULT: 'NORM'

ignore_excluded

Whether to skip excluded tokens (requires an upstream pipeline to mark excluded tokens).

TYPE: bool DEFAULT: False

ignore_space_tokens

Whether to skip space tokens during matching.

TYPE: bool DEFAULT: False

term_matcher

The matcher to use for matching phrases: one of exact or simstring.

TYPE: Literal['exact', 'simstring'] DEFAULT: 'exact'

term_matcher_config

Parameters of the term matcher

TYPE: Dict[str, Any] DEFAULT: {}

label

Label name to use for the Span object and the extension

TYPE: str DEFAULT: 'cim10'

span_setter

How to set matches on the doc

TYPE: SpanSetterArg DEFAULT: {'ents': True, 'cim10': True}

RETURNS DESCRIPTION
TerminologyMatcher
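As a quick reference, the configurable parameters and their documented defaults can be written out as a plain dictionary. The variable names `default_config` and `config` are hypothetical; the keys and values are copied from the parameter list above, and the `add_pipe` call is left as a comment so the snippet runs without EDS-NLP installed.

```python
# Defaults as listed in the parameter descriptions above.
# `nlp` and `name` are omitted: they are supplied when the component is added.
default_config = dict(
    attr="NORM",
    ignore_excluded=False,
    ignore_space_tokens=False,
    term_matcher="exact",
    term_matcher_config={},
    label="cim10",
    span_setter={"ents": True, "cim10": True},
)

# Override only what differs from the defaults, here the matcher type.
config = {**default_config, "term_matcher": "simstring"}

# Passed to the pipeline in the same way as in the Examples section:
# nlp.add_pipe("eds.cim10", config=config)
```

In practice only the overridden keys need to be passed, as the Examples section does with term_matcher alone.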

Authors and citation

The eds.cim10 pipeline was developed by AP-HP's Data Science team.