Sections
The eds.sections component extracts section titles from clinical documents. A "section" is then defined as the span of text between two titles.
Here is the list of sections that are currently targeted:
- allergies
- antécédents
- antécédents familiaux
- traitements entrée
- conclusion
- conclusion entrée
- habitus
- correspondants
- diagnostic
- données biométriques entrée
- examens
- examens complémentaires
- facteurs de risques
- histoire de la maladie
- actes
- motif
- prescriptions
- traitements sortie
- evolution
- modalites sortie
- vaccinations
- introduction
Remarks:
- The introduction section corresponds to the span of text between the header "COMPTE RENDU D'HOSPITALISATION" (usually denoting the beginning of the document) and the title of the following detected section.
- This matcher works well for hospitalization summaries (CRH), but not necessarily for all types of documents (in particular emergency reports or imaging reports, CR-IMAGERIE).
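The header-based rule for the introduction section can be illustrated with plain character offsets. The following is a minimal sketch, not the component's implementation: the introduction_span helper is hypothetical, it assumes the start offsets of the other detected titles are already known, and both the case-insensitive header search and the inclusion of the header itself in the span are assumptions.

```python
import re
from typing import List, Optional, Tuple

# Header taken from the remark above; matching it case-insensitively is an assumption.
HEADER = "COMPTE RENDU D'HOSPITALISATION"


def introduction_span(text: str, title_starts: List[int]) -> Optional[Tuple[int, int]]:
    """Hypothetical helper returning (start, end) offsets of the introduction.

    The span starts at the header (included, by assumption) and stops at the
    first detected title that follows it, or at the end of the text.
    """
    match = re.search(re.escape(HEADER), text, flags=re.IGNORECASE)
    if match is None:
        # No header found (e.g. many non-CRH documents): no introduction section.
        return None
    following = [start for start in title_starts if start >= match.end()]
    end = min(following) if following else len(text)
    return match.start(), end


text = "COMPTE RENDU D'HOSPITALISATION\nPatient de 60 ans.\nMotif :\nToux\n"
print(introduction_span(text, [text.index("Motif")]))
```

Documents without the header simply get no introduction, which mirrors the remark that the matcher is tuned for CRH documents.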
Experimental
Should you rely on eds.sections for critical downstream tasks, validate the results to make sure that the component works in your case.
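One way to run such a validation is to compare the detected titles against a small manually annotated sample. The sketch below is an illustration only: the title_scores helper is hypothetical, and it assumes each title is represented as a (start_char, end_char, label) tuple. For spaCy spans such a set could be built as {(s.start_char, s.end_char, s.label_) for s in doc.spans["section_titles"]}, assuming the span label holds the section name. It computes exact-match precision, recall and F1.

```python
from typing import Dict, Set, Tuple

# A title is represented as (start_char, end_char, label); this layout is an assumption.
Title = Tuple[int, int, str]


def title_scores(predicted: Set[Title], gold: Set[Title]) -> Dict[str, float]:
    """Hypothetical helper: exact-match precision, recall and F1 for section titles."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted = {(0, 5, "motif"), (40, 50, "conclusion")}
gold = {(0, 5, "motif"), (60, 69, "evolution")}
print(title_scores(predicted, gold))  # precision, recall and F1 are all 0.5 here
```

Exact matching is strict; a lenient variant could accept overlapping spans with the same label instead.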
Examples
The following snippet detects section titles. It is complete and can be run as is.
import edsnlp
nlp = edsnlp.blank("eds")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sections")
text = """
CRU du 10/09/2021
Motif :
Patient admis pour suspicion de COVID
"""
doc = nlp(text)
doc.spans["section_titles"]
# Out: [Motif]
Extensions
The eds.sections matcher adds two keys to the doc.spans attribute:
- The section_titles key contains the list of all section titles extracted using the list declared in the terms.py module.
- The sections key contains a list of sections, i.e. spans of text between two section titles (or between the last title and the end of the document).
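The relation between the two keys can be sketched with character offsets. In this hypothetical build_sections helper, each title is a (start, end, label) tuple, and a section runs from the end of its title to the start of the next title, or to the end of the document for the last one. Starting the section after its title is a literal reading of "between two section titles"; the component may also include the title in the section span.

```python
from typing import List, Tuple

# (start, end, label) character offsets; this representation is an assumption.
Span = Tuple[int, int, str]


def build_sections(titles: List[Span], doc_length: int) -> List[Span]:
    """Hypothetical helper: one section per title, ending at the next title."""
    ordered = sorted(titles)
    sections = []
    for i, (_start, end, label) in enumerate(ordered):
        # The last section extends to the end of the document.
        stop = ordered[i + 1][0] if i + 1 < len(ordered) else doc_length
        sections.append((end, stop, label))
    return sections


print(build_sections([(20, 25, "conclusion"), (0, 5, "motif")], 40))
# [(5, 20, 'motif'), (25, 40, 'conclusion')]
```

Text before the first title belongs to no section in this sketch.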
If the document already contains entities when this matcher is called, a section attribute is added to each entity.
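Assigning a section to an entity amounts to finding the section whose span contains it. A minimal sketch, assuming sections are (start, end, label) character offsets and entities are (start, end) pairs; the section_of helper is hypothetical, and returning None for entities that fall outside every section (for instance before the first title) is an assumption about how such cases are handled.

```python
from typing import List, Optional, Tuple

Section = Tuple[int, int, str]  # (start, end, label), assumed representation
Entity = Tuple[int, int]  # (start, end) character offsets


def section_of(entity: Entity, sections: List[Section]) -> Optional[str]:
    """Hypothetical helper: label of the section that fully contains the entity."""
    start, end = entity
    for section_start, section_end, label in sections:
        if section_start <= start and end <= section_end:
            return label
    # Entity outside every section (e.g. before the first title or straddling two sections).
    return None


sections = [(5, 20, "motif"), (25, 40, "conclusion")]
print([section_of(ent, sections) for ent in [(8, 12), (30, 35), (0, 3)]])
# ['motif', 'conclusion', None]
```

A linear scan is enough for the handful of sections in a typical document; sorted sections with binary search would scale better for long texts.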
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| nlp | The pipeline object. |
| sections | Dictionary of terms to look for. |
| attr | Default attribute to match on. |
| add_patterns | Whether to update the patterns so that titles are only matched at the start / end of lines. |
| ignore_excluded | Whether to skip excluded tokens. |
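The add_patterns parameter restricts matches to titles that sit at the start / end of a line. A minimal sketch of that idea with the standard re module follows; the line_anchored and find_titles helpers, the 5-character slack and the exact regular expression are hypothetical illustrations, not the component's actual patterns. Terms are treated as regular expressions.

```python
import re
from typing import List, Tuple


def line_anchored(term: str, slack: int = 5) -> str:
    """Hypothetical pattern: the term must sit on its own line, with at most
    `slack` extra characters before it (e.g. a bullet) and after it (e.g. " :")."""
    return rf"^[^\n]{{0,{slack}}}?({term})[^\n]{{0,{slack}}}$"


def find_titles(text: str, terms: List[str], anchor: bool = True) -> List[Tuple[int, int]]:
    """Return sorted (start, end) offsets of the matched terms."""
    spans = []
    for term in terms:
        pattern = line_anchored(term) if anchor else f"({term})"
        for match in re.finditer(pattern, text, flags=re.MULTILINE | re.IGNORECASE):
            # Group 1 holds the term itself, without the surrounding slack.
            spans.append(match.span(1))
    return sorted(spans)


text = "Motif :\nPatient admis pour motif familial\nConclusion\nRAS\n"
print(find_titles(text, ["motif", "conclusion"]))  # 2 titles
print(find_titles(text, ["motif", "conclusion"], anchor=False))  # 3 matches
```

Without anchoring, "motif" also fires inside running text ("admis pour motif familial"), which is presumably the kind of false positive the line constraint is meant to avoid.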
Authors and citation
The eds.sections matcher was developed by AP-HP's Data Science team.