Disorders
Presentation
The following components extract 16 different conditions from the Charlson Comorbidity Index. Each component is based on the ContextualMatcher component.
The components were developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by Petit-Jean et al., 2024.
Some general considerations about those components:
- Extracted entities are stored in doc.ents and doc.spans. For instance, the eds.tobacco component stores matches in doc.spans["tobacco"].
- The matched comorbidity is also available under the ent.label_ of each match.
- Matches have an associated _.status attribute taking the value 1 or 2. A corresponding _.detailed_status attribute stores the human-readable status, which can be component-dependent. See each component's documentation for more details.
- Some components add additional information to matches. For instance, the tobacco component adds, if relevant, the extracted pack-years (= paquet-année). This information is available under the ent._.assigned attribute.
- Those components work on normalized documents. Please use the eds.normalizer pipeline with the following parameters:

    import edsnlp, edsnlp.pipes as eds
    ...
    nlp.add_pipe(
        eds.normalizer(
            accents=True,
            lowercase=True,
            quotes=True,
            spaces=True,
            pollution=dict(
                information=True,
                bars=True,
                biology=True,
                doctors=True,
                web=True,
                coding=True,
                footer=True,
            ),
        ),
    )
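The attributes listed above can be read off a processed document in a few lines. The sketch below is only illustrative: `describe_matches` is a hypothetical helper, and the document is a stand-in built with `SimpleNamespace` so that the snippet runs without EDS-NLP installed. The sample text, the `"PRESENT"` detailed status and the `"PA"` key are made-up values, not output from the real component.

```python
from types import SimpleNamespace


def describe_matches(doc, key):
    """Return one plain dict per match stored in doc.spans[key] (hypothetical helper)."""
    rows = []
    for span in doc.spans.get(key, []):
        rows.append(
            {
                "text": span.text,
                "label": span.label_,
                "status": span._.status,
                "detailed_status": span._.detailed_status,
                # Only some components fill `assigned`; fall back to an empty dict.
                "assigned": dict(span._.assigned or {}),
            }
        )
    return rows


# Stand-in objects with made-up values, mimicking the attributes described above.
fake_span = SimpleNamespace(
    text="Tabagisme évalué à 15 PA",
    label_="tobacco",
    _=SimpleNamespace(status=1, detailed_status="PRESENT", assigned={"PA": 15}),
)
fake_doc = SimpleNamespace(spans={"tobacco": [fake_span]})

rows = describe_matches(fake_doc, "tobacco")
print(rows)
```

With a real pipeline, the same helper would presumably be called on the document returned by `nlp(text)`, using the span key of the component of interest.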
Use qualifiers
Those components should be used with a qualification pipeline to avoid extracting unwanted matches. At the very least, you can use the available rule-based qualifiers (eds.negation, eds.hypothesis and eds.family). Better, a machine learning qualification component was developed and trained specifically for those components. For privacy reasons, the model isn't publicly available yet.
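Once rule-based qualifiers have run, unwanted matches can be dropped by checking the flags they set. The sketch below assumes that each qualifier exposes a boolean on the match under `_.negation`, `_.hypothesis` and `_.family`; those attribute names are an assumption not stated in this page, and `is_relevant` / `keep_relevant` are hypothetical helpers. Stand-in spans are used so the snippet runs on its own.

```python
from types import SimpleNamespace

# Assumed names of the boolean attributes set by the rule-based qualifiers.
QUALIFIER_FLAGS = ("negation", "hypothesis", "family")


def is_relevant(span):
    """A match is kept only if no qualifier flagged it."""
    return not any(getattr(span._, flag, False) for flag in QUALIFIER_FLAGS)


def keep_relevant(spans):
    """Filter out negated, hypothetical and family-related matches."""
    return [span for span in spans if is_relevant(span)]


def fake_span(text, **flags):
    """Build a stand-in match carrying only the qualifier flags (illustrative)."""
    return SimpleNamespace(text=text, _=SimpleNamespace(**flags))


spans = [
    fake_span("diabète", negation=False, hypothesis=False, family=False),
    fake_span("pas de diabète", negation=True, hypothesis=False, family=False),
    fake_span("mère diabétique", negation=False, hypothesis=False, family=True),
]

kept = keep_relevant(spans)
print([span.text for span in kept])
```

Using `getattr` with a default keeps the filter usable when only some of the qualifiers were added to the pipeline.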
Use the ML model
The model will soon be available in the models catalogue of AP-HP's CDW.
On the medical definition of the comorbidities
Those components were developed to extract chronic and symptomatic conditions only.
Aggregation
For relevant phenotyping, matches should be aggregated at the document level. For instance, a document might mention complicated diabetes at the beginning ("Le patient a une rétinopathie diabétique"), and then refer to this diabetes without mentioning that it is complicated ("Concernant son diabète, le patient ..."). Thus, a good and simple aggregation rule is, for each comorbidity, to:
- disregard all entities tagged as irrelevant by the qualification component(s)
- take the maximum (i.e., the most severe) status of the leftover entities
An implementation of this rule is presented here.
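Independently of the referenced implementation, the two-step rule can be sketched in plain Python. This is a minimal illustration under stated assumptions: `Match` is a hypothetical flat record (one per extracted entity), `relevant` stands for the outcome of whatever qualification step was applied, and the status values 1 and 2 in the sample data are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical flat record for one extracted entity."""

    label: str      # comorbidity name, as in ent.label_
    status: int     # severity, as in ent._.status
    relevant: bool  # False when a qualifier tagged the match as irrelevant


def aggregate(matches):
    """Return {comorbidity: highest status among relevant matches}."""
    doc_level = {}
    for match in matches:
        # Step 1: disregard entities tagged as irrelevant.
        if not match.relevant:
            continue
        # Step 2: keep the maximum (most severe) status per comorbidity.
        previous = doc_level.get(match.label, match.status)
        doc_level[match.label] = max(previous, match.status)
    return doc_level


matches = [
    Match("diabetes", status=2, relevant=True),   # "rétinopathie diabétique"
    Match("diabetes", status=1, relevant=True),   # "Concernant son diabète"
    Match("tobacco", status=1, relevant=False),   # e.g. a negated mention
]

doc_level = aggregate(matches)
print(doc_level)  # {'diabetes': 2}
```

A comorbidity whose matches are all discarded is simply absent from the result, which downstream code can treat as "not found in this document".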
Petit-Jean T., Gérardin C., Berthelot E., Chatellier G., Frank M., Tannier X., Kempf E. and Bey R., 2024. Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions. Journal of the American Medical Informatics Association. 31, pp.1280-1290. 10.1093/jamia/ocae069