Sections
The eds.sections component extracts section titles from clinical documents. A "section" is then defined as the span of text between two titles.
Here is the list of sections that are currently targeted:
allergies
antécédents
antécédents familiaux
traitements entrée
conclusion
conclusion entrée
habitus
correspondants
diagnostic
données biométriques entrée
examens
examens complémentaires
facteurs de risques
histoire de la maladie
actes
motif
prescriptions
traitements sortie
evolution
modalites sortie
vaccinations
introduction
Remarks:
- The introduction section corresponds to the span of text between the header "COMPTE RENDU D'HOSPITALISATION" (usually denoting the beginning of the document) and the title of the following detected section.
- This matcher works well for hospitalization summaries (CRH), but not necessarily for all types of documents (in particular emergency or scan summaries, CR-IMAGERIE).
Experimental
If you rely on eds.sections for critical downstream tasks, validate the results to make sure that the component works in your case.
Examples
The following snippet detects section titles. It is complete and can be run as is.
import edsnlp, edsnlp.pipes as eds
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sections())
text = """
CRU du 10/09/2021
Motif :
Patient admis pour suspicion de COVID
"""
doc = nlp(text)
doc.spans["section_titles"]
# Out: [Motif]
Extensions
The eds.sections matcher adds two fields to the doc.spans attribute:
- The section_titles key contains the list of all section titles extracted using the list declared in the terms.py module.
- The sections key contains a list of sections, i.e. spans of text between two section titles (or between the last title and the end of the document).
If the document already has entities when this matcher is called, a section attribute is added to each entity.
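To make the relationship between titles, sections and entities concrete, here is a minimal pure-Python sketch of the idea. It is not the eds.sections implementation: the TITLE_TERMS dictionary and the find_titles, build_sections and section_of helpers are hypothetical names invented for this illustration, titles are matched with a simple regular expression at the start of a line, and the sketch assumes that a section starts at its own title.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical, tiny title vocabulary used only for this sketch.
TITLE_TERMS: Dict[str, List[str]] = {
    "motif": ["motif"],
    "conclusion": ["conclusion"],
}

# (start offset, end offset, label), all in characters
Span = Tuple[int, int, str]


def find_titles(text: str) -> List[Span]:
    """Find title terms that appear at the start of a line."""
    titles: List[Span] = []
    for label, terms in TITLE_TERMS.items():
        for term in terms:
            pattern = re.compile(
                r"^[ \t]*(" + re.escape(term) + r")\b",
                re.IGNORECASE | re.MULTILINE,
            )
            for match in pattern.finditer(text):
                titles.append((match.start(1), match.end(1), label))
    return sorted(titles)


def build_sections(text: str, titles: List[Span]) -> List[Span]:
    """A section runs from its title to the next title, or to the end of the text."""
    sections: List[Span] = []
    for i, (start, _end, label) in enumerate(titles):
        stop = titles[i + 1][0] if i + 1 < len(titles) else len(text)
        sections.append((start, stop, label))
    return sections


def section_of(offset: int, sections: List[Span]) -> Optional[str]:
    """Return the label of the section containing the offset, if any."""
    for start, stop, label in sections:
        if start <= offset < stop:
            return label
    return None


text = (
    "Motif :\n"
    "Patient admis pour suspicion de COVID\n"
    "Conclusion :\n"
    "Retour à domicile\n"
)
titles = find_titles(text)
sections = build_sections(text, titles)
entity_section = section_of(text.index("COVID"), sections)
```

In this sketch, an entity located before the first detected title gets no section at all (section_of returns None), which is one reason to validate the output on your own document types.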
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object. |
sections | Dictionary of terms to look for. |
attr | Default attribute to match on. |
add_patterns | Whether to update the patterns to match the start / end of lines. |
ignore_excluded | Whether to skip excluded tokens. |
Authors and citation
The eds.sections matcher was developed by AP-HP's Data Science team.