Sections

The eds.sections component extracts section titles from clinical documents. A "section" is then defined as the span of text between two titles.

Here is the list of sections that are currently targeted:

allergies
antécédents
antécédents familiaux
traitements entrée
conclusion
conclusion entrée
habitus
correspondants
diagnostic
données biométriques entrée
examens
examens complémentaires
facteurs de risques
histoire de la maladie
actes
motif
prescriptions
traitements sortie
evolution
modalites sortie
vaccinations
introduction

Remarks:

The introduction section corresponds to the span of text between the header "COMPTE RENDU D'HOSPITALISATION" (usually denoting the beginning of the document) and the title of the next detected section.
This matcher works well for hospitalization summaries (CRH), but not necessarily for all types of documents (in particular emergency summaries or scan summaries, CR-IMAGERIE).

Experimental

If you rely on eds.sections for critical downstream tasks, validate the results to make sure that the component works in your case.

Examples

The following snippet detects section titles. It is complete and can be run as is.

import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sections())

text = """
CRU du 10/09/2021
Motif :
Patient admis pour suspicion de COVID
"""

doc = nlp(text)

doc.spans["section_titles"]
# Out: [Motif]

Extensions

The eds.sections matcher adds two fields to the doc.spans attribute:

The section_titles key contains the list of all section titles extracted using the list declared in the terms.py module.
The sections key contains a list of sections, i.e. spans of text between two section titles (or between the last title and the end of the document).
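To make the relation between the two keys concrete, here is a minimal standalone sketch in plain Python. It is not the component's implementation: it works on character offsets rather than on spaCy spans, the helper names `find_title` and `sections_from_titles` are hypothetical, and it assumes that a section starts at its own title.

```python
from typing import List, Tuple

# (start offset, end offset, label); offsets are character positions in the text.
Span = Tuple[int, int, str]


def find_title(text: str, title: str, label: str) -> Span:
    """Locate a title in the text (hypothetical helper for this sketch)."""
    start = text.index(title)
    return (start, start + len(title), label)


def sections_from_titles(text: str, titles: List[Span]) -> List[Span]:
    """Derive sections from titles: each section runs up to the next title,
    and the last one runs up to the end of the document."""
    ordered = sorted(titles)
    sections = []
    for i, (start, _end, label) in enumerate(ordered):
        # Assumption of this sketch: the section begins at its own title.
        stop = ordered[i + 1][0] if i + 1 < len(ordered) else len(text)
        sections.append((start, stop, label))
    return sections


text = "Motif :\nSuspicion de COVID\nConclusion :\nRetour au domicile\n"
titles = [
    find_title(text, "Motif", "motif"),
    find_title(text, "Conclusion", "conclusion"),
]
sections = sections_from_titles(text, titles)

for start, stop, label in sections:
    print(label, repr(text[start:stop]))
```

Each title yields exactly one section, so the two lists have the same length and share their labels.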

If the document already has entities when this matcher is called, a section attribute is added to each entity.
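The idea of tagging each entity with its enclosing section can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the library's code: sections and entities are reduced to character offsets, and the function name `section_of` and the sample offsets are hypothetical.

```python
from typing import List, Optional, Tuple

# (start offset, end offset, label)
Section = Tuple[int, int, str]


def section_of(entity_start: int, sections: List[Section]) -> Optional[str]:
    """Return the label of the section containing the entity, if any."""
    for start, end, label in sections:
        if start <= entity_start < end:
            return label
    # Entities located before the first detected title belong to no section.
    return None


# Hypothetical offsets for a two-section document.
sections = [(10, 40, "motif"), (40, 80, "conclusion")]
entities = {"COVID": 25, "domicile": 60, "en-tête": 3}

labels = {name: section_of(pos, sections) for name, pos in entities.items()}
print(labels)
```

In this sketch an entity that falls outside every section gets `None`; how the component itself handles that case should be checked against its behaviour on your documents.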

Parameters

PARAMETER	DESCRIPTION
`nlp`	The pipeline object. TYPE: `PipelineProtocol`
`sections`	Dictionary of terms to look for. TYPE: `Dict[str, List[str]]` DEFAULT: `{'allergies': ['allergies'], 'antécédents': ['a...`
`attr`	Default attribute to match on. TYPE: `str` DEFAULT: `NORM`
`add_patterns`	Whether to update the patterns so that they match at the start / end of lines. TYPE: `bool` DEFAULT: `True`
`ignore_excluded`	Whether to skip excluded tokens. TYPE: `bool` DEFAULT: `True`
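The `sections` parameter expects a dictionary that maps a section name to the list of terms announcing it. The sketch below only illustrates that `Dict[str, List[str]]` shape with a crude line-based lookup. The terms are made up for the example, `match_title` is a hypothetical helper, and lowercasing stands in for the real matching, which operates on the token attribute given by `attr` (`NORM` by default).

```python
from typing import Dict, List, Optional

# Hypothetical custom dictionary: section name -> terms that announce it.
custom_sections: Dict[str, List[str]] = {
    "motif": ["motif", "motif d'hospitalisation"],
    "conclusion": ["conclusion", "au total"],
}


def match_title(line: str, sections: Dict[str, List[str]]) -> Optional[str]:
    """Return the section name whose terms include the line, once the line is
    lowercased and stripped of surrounding spaces and a trailing colon."""
    candidate = line.strip().rstrip(":").strip().lower()
    for name, terms in sections.items():
        if candidate in terms:
            return name
    return None


for line in ["Motif :", "AU TOTAL", "Patient admis pour suspicion de COVID"]:
    print(repr(line), "->", match_title(line, custom_sections))
```

A dictionary of this shape is what would be passed through the `sections` parameter to replace the default terms.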

Authors and citation

The eds.sections matcher was developed by AP-HP's Data Science team.