Sections
Detected sections are:
- allergies
- antécédents
- antécédents familiaux
- traitements entrée
- conclusion
- conclusion entrée
- habitus
- correspondants
- diagnostic
- données biométriques entrée
- examens
- examens complémentaires
- facteurs de risques
- histoire de la maladie
- actes
- motif
- prescriptions
- traitements sortie
The pipeline extracts section titles. A "section" is then defined as the span of text between two consecutive titles.
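The definition above can be illustrated with a minimal, library-free sketch. It is not the eds.sections implementation: the function name `split_sections` and the `(start, end, label)` title tuples are hypothetical, title detection is assumed to have happened already, and each section is assumed to start at its own title. Text that precedes the first title is not assigned to any section here.

```python
from typing import List, Tuple

# Hypothetical title format: (start offset, end offset, label).
Title = Tuple[int, int, str]


def split_sections(text: str, titles: List[Title]) -> List[Tuple[str, str]]:
    """Cut `text` into (label, section text) pairs, one per detected title."""
    ordered = sorted(titles)
    sections = []
    for i, (start, _end, label) in enumerate(ordered):
        # A section stops where the next title starts, or at the end of the text.
        stop = ordered[i + 1][0] if i + 1 < len(ordered) else len(text)
        sections.append((label, text[start:stop]))
    return sections


text = "Motif :\nSuspicion de COVID\nConclusion :\nRetour à domicile"
conclusion_start = text.index("Conclusion :")
titles = [
    (0, len("Motif :"), "motif"),
    (conclusion_start, conclusion_start + len("Conclusion :"), "conclusion"),
]
for label, body in split_sections(text, titles):
    print(label, repr(body))
```

The last section runs to the end of the document, which matches the rule that a section ends at the next title or, failing that, at the end of the text.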
Use at your own risk
Should you rely on eds.sections for critical downstream tasks, validate the pipeline first to confirm that the component works as expected.
For instance, the eds.history pipeline can use sections to make its predictions, but that possibility is deactivated by default.
Usage
The following snippet detects section titles. It is complete and can be run as is.
```python
import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sections")

text = (
    "CRU du 10/09/2021\n"
    "Motif :\n"
    "Patient admis pour suspicion de COVID"
)

doc = nlp(text)

doc.spans["section_titles"]
# Out: [Motif :]
```
Configuration
The pipeline can be configured using the following parameters:
| Parameter | Explanation | Default |
|---|---|---|
| sections | Section patterns | None (use pre-defined patterns) |
| add_patterns | Whether to add end-of-line patterns | False |
| attr | spaCy attribute to match on, e.g. NORM or TEXT | "NORM" |
| ignore_excluded | Whether to ignore excluded tokens | True |
Declared extensions
The eds.sections pipeline adds two fields to the doc.spans attribute:
- The section_titles key contains the list of all section titles extracted using the list declared in the terms.py module.
- The sections key contains a list of sections, i.e. spans of text between two section titles (or the last title and the end of the document).
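As a rough sketch of how the sections key might be consumed downstream, the helper below lists one (label, preview) pair per section. The name `section_summary` is hypothetical, and the sketch assumes that each section span carries the section name in its `label_` attribute. A stand-in object replaces the spaCy document so that the snippet runs without any pipeline installed.

```python
from types import SimpleNamespace


def section_summary(doc, width=40):
    """List (label, text preview) pairs for the spans under doc.spans["sections"]."""
    rows = []
    for span in doc.spans["sections"]:
        # Collapse whitespace so that each preview fits on one line.
        preview = " ".join(span.text.split())[:width]
        # Assumption: the span label holds the section name (e.g. "motif").
        rows.append((span.label_, preview))
    return rows


# Stand-in document with hypothetical data, mimicking doc.spans["sections"].
fake_doc = SimpleNamespace(
    spans={
        "sections": [
            SimpleNamespace(
                label_="motif",
                text="Motif :\nPatient admis pour suspicion de COVID",
            )
        ]
    }
)
print(section_summary(fake_doc))
```

With a real pipeline, the same helper could be called as `section_summary(nlp(text))` on a document processed as in the Usage snippet.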
Authors and citation
The eds.sections pipeline was developed by AP-HP's Data Science team.