
Sections

The eds.sections component extracts section titles from clinical documents. A "section" is then defined as the span of text between one title and the next (or the end of the document).
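
To make this definition concrete, here is a minimal pure-Python sketch that turns detected titles into sections. It is not the eds.sections implementation. It assumes titles are given as hypothetical (label, start_char, end_char) tuples. As a choice of this sketch, each section starts at its own title and stops where the next title begins, or at the end of the text.

```python
from typing import List, Tuple

# Hypothetical representation of a detected title: (label, start_char, end_char).
Title = Tuple[str, int, int]


def titles_to_sections(titles: List[Title], doc_length: int) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) for each section.

    Each section starts at its title and ends where the next title starts,
    or at the end of the document for the last title.
    """
    ordered = sorted(titles, key=lambda t: t[1])
    sections = []
    for i, (label, start, _end) in enumerate(ordered):
        stop = ordered[i + 1][1] if i + 1 < len(ordered) else doc_length
        sections.append((label, start, stop))
    return sections


text = "Motif :\nFièvre.\nConclusion :\nRetour à domicile.\n"
titles = [("motif", 0, 5), ("conclusion", 16, 26)]
print(titles_to_sections(titles, len(text)))  # [('motif', 0, 16), ('conclusion', 16, 48)]
```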

Here is the list of sections that are currently targeted:

  • allergies
  • antécédents
  • antécédents familiaux
  • traitements entrée
  • conclusion
  • conclusion entrée
  • habitus
  • correspondants
  • diagnostic
  • données biométriques entrée
  • examens
  • examens complémentaires
  • facteurs de risques
  • histoire de la maladie
  • actes
  • motif
  • prescriptions
  • traitements sortie
  • evolution
  • modalites sortie
  • vaccinations
  • introduction
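
Each label above is a key of a dictionary that maps a section to its possible title terms. This is the sections parameter, typed Dict[str, List[str]]. The sketch below finds such terms with a plain regular expression. The title variants are made up for illustration, and the regex approach only approximates how eds.sections actually matches.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical excerpt: the labels come from the list above, but the title
# variants are invented for illustration (the real terms live in terms.py).
SECTIONS: Dict[str, List[str]] = {
    "motif": ["motif", "motif d'hospitalisation"],
    "conclusion": ["conclusion", "synthese"],
}


def find_titles(text: str, sections: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) for every title term found in the text."""
    found = []
    for label, terms in sections.items():
        # Try longer variants first so "motif d'hospitalisation" beats "motif".
        variants = sorted(terms, key=len, reverse=True)
        pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in variants) + r")\b")
        for match in pattern.finditer(text):
            found.append((label, match.start(), match.end()))
    return sorted(found, key=lambda item: item[1])


text = "motif d'hospitalisation :\ndouleur thoracique\nconclusion :\nretour a domicile\n"
print(find_titles(text, SECTIONS))
```

This naive search would also match "motif" in the middle of a sentence. That is the kind of false positive the add_patterns option (described under Parameters) appears intended to limit.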

Remarks:

  • the introduction section corresponds to the span of text between the header "COMPTE RENDU D'HOSPITALISATION" (usually marking the beginning of the document) and the title of the next detected section
  • this matcher works well for hospitalization summaries (CRH), but not necessarily for all types of documents (in particular emergency summaries or imaging reports, CR-IMAGERIE)
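
The rule for the introduction section can be sketched as follows. The helper introduction_span is hypothetical and works on raw character offsets. It also assumes something the text above does not state: when no title follows the header, the span runs to the end of the text.

```python
from typing import List, Optional, Tuple

HEADER = "COMPTE RENDU D'HOSPITALISATION"


def introduction_span(text: str, title_starts: List[int]) -> Optional[Tuple[int, int]]:
    """Character span from the header to the next detected title."""
    start = text.find(HEADER)
    if start == -1:
        return None  # no header, so no introduction section
    header_end = start + len(HEADER)
    following = [s for s in title_starts if s >= header_end]
    # Assumption: with no later title, the introduction runs to the end of the text.
    end = min(following) if following else len(text)
    return start, end


text = "COMPTE RENDU D'HOSPITALISATION\nPatient de 60 ans.\nMotif :\nDyspnée.\n"
print(introduction_span(text, [text.find("Motif")]))
```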

Experimental

If you rely on eds.sections for critical downstream tasks, validate the results to make sure that the component works on your documents.
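
One simple way to run such a validation is to compare predicted titles with hand-annotated ones using precision and recall. The sketch below assumes both are available as hypothetical (label, start_char, end_char) tuples. Exact matching is a choice of this sketch, and a more lenient overlap criterion may suit your use case better.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (label, start_char, end_char), a hypothetical format


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted titles against gold titles."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


gold = {("motif", 0, 5), ("conclusion", 16, 26)}
predicted = {("motif", 0, 5), ("conclusion", 16, 26), ("examens", 40, 47)}
print(precision_recall(predicted, gold))
```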

Examples

The following snippet detects section titles. It is complete and can be run as is.

import edsnlp

nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sections")

text = """
CRU du 10/09/2021
Motif :
Patient admis pour suspicion de COVID
"""

doc = nlp(text)

doc.spans["section_titles"]
# Out: [Motif]
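
The example runs eds.normalizer before eds.sections. By default, the matcher works on the NORM attribute (see the attr parameter below), so normalization decides which surface forms of a title can match. The sketch below gives a rough picture of the effect by lowercasing and stripping accents. It is an assumption-laden stand-in, and the actual eds.normalizer component does more than this.

```python
import unicodedata


def approx_norm(text: str) -> str:
    """Lowercase and strip accents: a rough stand-in for the NORM attribute."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop combining marks such as the acute accent split off from "é".
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(approx_norm("ANTÉCÉDENTS Familiaux"))  # antecedents familiaux
print(approx_norm("Antécédents familiaux") == approx_norm("ANTECEDENTS FAMILIAUX"))  # True
```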

Extensions

The eds.sections matcher adds two keys to the doc.spans attribute:

  1. The section_titles key contains the list of all section titles extracted using the list declared in the terms.py module.
  2. The sections key contains the list of sections, i.e. spans of text between two section titles (or between the last title and the end of the document).

If the document already contains entities when this matcher is called, a section attribute is added to each entity.
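
Assigning an entity to a section amounts to finding the section whose span contains it. The sketch below uses hypothetical (label, start, end) tuples and a strict containment rule. The text above does not say how eds.sections handles entities that straddle a section boundary, so returning None for them is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

Section = Tuple[str, int, int]  # hypothetical (label, start_char, end_char)


def section_of(ent_start: int, ent_end: int, sections: List[Section]) -> Optional[str]:
    """Label of the section that fully contains the entity, else None."""
    for label, start, end in sections:
        if start <= ent_start and ent_end <= end:
            return label
    return None


sections = [("motif", 0, 16), ("conclusion", 16, 48)]
print(section_of(8, 14, sections))   # motif
print(section_of(10, 20, sections))  # None: straddles two sections
```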

Parameters

PARAMETER DESCRIPTION
nlp

The pipeline object.

TYPE: PipelineProtocol

sections

Dictionary of terms to look for.

TYPE: Dict[str, List[str]] DEFAULT: {'allergies': ['allergies'], 'antécédents': ['a...

attr

Default attribute to match on.

TYPE: str DEFAULT: NORM

add_patterns

Whether to update the patterns so that they match the start / end of lines.

TYPE: bool DEFAULT: True

ignore_excluded

Whether to skip excluded tokens.

TYPE: bool DEFAULT: True
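
The add_patterns and ignore_excluded options can be illustrated in plain Python. Both helpers below are hypothetical approximations, not the patterns eds.sections actually builds. line_anchored only accepts a title at the start of a line that is followed by a colon or the end of the line. match_tokens skips tokens flagged as excluded before comparing them with a title.

```python
import re
from typing import List, Tuple


def line_anchored(term: str) -> "re.Pattern":
    # Hypothetical rule: the title sits at the start of a line (leading
    # spaces allowed) and is followed by optional spaces, then ":" or end of line.
    return re.compile(r"^[ \t]*(" + re.escape(term) + r")[ \t]*(?::|$)", re.MULTILINE)


# Hypothetical token representation: (text, is_excluded).
Token = Tuple[str, bool]


def match_tokens(tokens: List[Token], term: List[str], ignore_excluded: bool = True) -> bool:
    # Drop excluded tokens only when ignore_excluded is True.
    kept = [text for text, excluded in tokens if not (ignore_excluded and excluded)]
    n = len(term)
    return any(kept[i:i + n] == term for i in range(len(kept) - n + 1))


text = "motif :\npatient vu pour un motif inconnu\n"
print([m.start(1) for m in line_anchored("motif").finditer(text)])  # [0]

tokens = [("histoire", False), ("<br>", True), ("de", False), ("la", False), ("maladie", False)]
term = ["histoire", "de", "la", "maladie"]
print(match_tokens(tokens, term))                         # True
print(match_tokens(tokens, term, ignore_excluded=False))  # False
```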

Authors and citation

The eds.sections matcher was developed by AP-HP's Data Science team.