Sections

The detected sections are:

allergies
antécédents
antécédents familiaux
traitements entrée
conclusion
conclusion entrée
habitus
correspondants
diagnostic
données biométriques entrée
examens
examens complémentaires
facteurs de risques
histoire de la maladie
actes
motif
prescriptions
traitements sortie

The pipeline extracts section titles. A "section" is then defined as the span of text between two consecutive titles.
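To make the definition concrete, here is a minimal pure-Python sketch of the "span between two titles" idea. It is an illustration only, not the eds.sections implementation: `split_sections` is a hypothetical helper, titles are matched as literal strings, and the title is excluded from the section body (whether the component includes the title in the span is not specified here).

```python
import re


def split_sections(text, titles):
    """Return (title, body) pairs for every title found in `text`.

    Each body runs from the end of a title to the start of the next
    title, or to the end of the text for the last title.
    """
    if not titles:
        return []
    # Try longer titles first so "conclusion entrée" wins over "conclusion".
    ordered = sorted(titles, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    matches = list(pattern.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((match.group(0), text[match.end():end].strip()))
    return sections


example = (
    "Motif :\n"
    "Patient admis pour suspicion de COVID\n"
    "Conclusion :\n"
    "Retour à domicile"
)
result = split_sections(example, ["Motif :", "Conclusion :"])
```

With this input, `result` holds one pair per title, and the last section extends to the end of the text.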

Use at your own risk

If you rely on eds.sections for critical downstream tasks, validate the pipeline first to make sure that the component works as expected. For instance, the eds.history pipeline can use sections to make its predictions, but that option is disabled by default.

Usage

The following snippet detects section titles. It is complete and can be run as is.

import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sections")

text = "CRU du 10/09/2021\n" "Motif :\n" "Patient admis pour suspicion de COVID"

doc = nlp(text)

doc.spans["section_titles"]
# Out: [Motif :]

Configuration

The pipeline can be configured using the following parameters:

Parameter	Explanation	Default
`sections`	Section patterns	`None` (use pre-defined patterns)
`add_patterns`	Whether to add end-of-line patterns	`False`
`attr`	spaCy attribute to match on, e.g. `NORM` or `TEXT`	`"NORM"`
`ignore_excluded`	Whether to ignore excluded tokens	`True`
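As an illustration, the parameters above could be passed through the `config` argument of spaCy's `add_pipe`. This is a sketch under assumptions: the `sections` value is assumed to be a mapping from section name to a list of title variants, the patterns shown are hypothetical, and `build_pipeline` is a helper introduced for this example. The other values repeat the defaults listed in the table.

```python
# Hypothetical custom patterns (assumed shape: section name -> title variants).
custom_sections = {
    "motif": ["motif", "motif d'hospitalisation"],
    "conclusion": ["conclusion", "au total"],
}

# Remaining values mirror the defaults listed in the table above.
sections_config = {
    "sections": custom_sections,
    "add_patterns": False,
    "attr": "NORM",
    "ignore_excluded": True,
}


def build_pipeline():
    """Build a French pipeline with a configured eds.sections component."""
    import spacy  # imported here so the config can be inspected without spaCy

    nlp = spacy.blank("fr")
    nlp.add_pipe("eds.normalizer")
    nlp.add_pipe("eds.sections", config=sections_config)
    return nlp
```

Calling `build_pipeline()` requires spaCy and EDS-NLP to be installed.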

Declared extensions

The eds.sections pipeline adds two fields to the doc.spans attribute:

The section_titles key contains the list of all section titles extracted using the list declared in the terms.py module.
The sections key contains a list of sections, i.e. spans of text between two section titles (or the last title and the end of the document).
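The two keys can be read like any other span group. The sketch below uses a hypothetical helper, `describe_sections`, and assumes that each span carries the section name in its `label_` attribute. It accepts any object exposing `spans`, such as the `doc` produced in the Usage snippet.

```python
def describe_sections(doc):
    """Return (titles, sections), each a list of (label, text) pairs."""
    # Titles matched by the component.
    titles = [(span.label_, span.text) for span in doc.spans["section_titles"]]
    # Spans of text delimited by those titles.
    sections = [(span.label_, span.text) for span in doc.spans["sections"]]
    return titles, sections
```

For example, `titles, sections = describe_sections(doc)` gives two lists that can be printed or compared against manual annotations when validating the component.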

Authors and citation

The eds.sections pipeline was developed by AP-HP's Data Science team.