edsnlp.pipelines.misc.sections
patterns
These section titles were extracted from work carried out by Ivan Lerner at AP-HP, which supplied a number of documents annotated for section titles.
The section titles were then reviewed by Gilles Chatellier, who provided valuable insights.
See the sections/section-dataset notebook for details.
allergies = ['allergies']
module-attribute
antecedents = ['antecedents', 'antecedents medicaux et chirurgicaux', 'antecedents personnels', 'antecedents medicaux', 'antecedents chirurgicaux', 'atcd']
module-attribute
antecedents_familiaux = ['antecedents familiaux']
module-attribute
traitements_entree = ['attitude therapeutique initiale', "traitement a l'entree", 'traitement actuel', 'traitement en cours', "traitements a l'entree"]
module-attribute
conclusion = ['au total', 'conclusion', 'conclusion de sortie', 'syntese medicale / conclusion', 'synthese', 'synthese medicale', 'synthese medicale/conclusion', 'conclusion medicale']
module-attribute
conclusion_entree = ["conclusion a l'entree"]
module-attribute
habitus = ['contexte familial et social', 'habitus', 'mode de vie', 'mode de vie - scolarite', 'situation sociale, mode de vie']
module-attribute
correspondants = ['correspondants']
module-attribute
diagnostic = ['diagnostic retenu']
module-attribute
donnees_biometriques_entree = ["donnees biometriques et parametres vitaux a l'entree", "parametres vitaux et donnees biometriques a l'entree"]
module-attribute
examens = ['examen clinique', "examen clinique a l'entree"]
module-attribute
examens_complementaires = ['examen(s) complementaire(s)', 'examens complementaires', "examens complementaires a l'entree", 'examens complementaires realises pendant le sejour', 'examens para-cliniques']
module-attribute
facteurs_de_risques = ['facteurs de risque', 'facteurs de risques']
module-attribute
histoire_de_la_maladie = ['histoire de la maladie', 'histoire de la maladie - explorations', 'histoire de la maladie actuelle', 'histoire du poids', 'histoire recente', 'histoire recente de la maladie', 'rappel clinique', 'resume', 'resume clinique']
module-attribute
actes = ['intervention']
module-attribute
motif = ['motif', "motif d'hospitalisation", "motif de l'hospitalisation", 'motif medical']
module-attribute
prescriptions = ['prescriptions de sortie', 'prescriptions medicales de sortie']
module-attribute
traitements_sortie = ['traitement de sortie']
module-attribute
sections = {'allergies': allergies, 'antécédents': antecedents, 'antécédents familiaux': antecedents_familiaux, 'traitements entrée': traitements_entree, 'conclusion': conclusion, 'conclusion entrée': conclusion_entree, 'habitus': habitus, 'correspondants': correspondants, 'diagnostic': diagnostic, 'données biométriques entrée': donnees_biometriques_entree, 'examens': examens, 'examens complémentaires': examens_complementaires, 'facteurs de risques': facteurs_de_risques, 'histoire de la maladie': histoire_de_la_maladie, 'actes': actes, 'motif': motif, 'prescriptions': prescriptions, 'traitements sortie': traitements_sortie}
module-attribute
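Since sections maps each section name to its list of title variants, inverting it gives a direct lookup from a title to its section. The sketch below works on a three-entry excerpt of the dictionary. The names title_to_section, normalise and lookup are hypothetical helpers, not part of edsnlp, and the accent-stripping step is only a rough stand-in for the normalisation implied by matching on NORM.

```python
import unicodedata

# Excerpt of the `sections` dictionary documented above.
sections = {
    "antécédents": ["antecedents", "antecedents medicaux", "atcd"],
    "motif": ["motif", "motif d'hospitalisation"],
    "conclusion": ["au total", "conclusion", "synthese"],
}

# Invert the mapping: title variant -> section name.
title_to_section = {
    title: name for name, titles in sections.items() for title in titles
}


def normalise(text: str) -> str:
    """Lowercase and strip accents (assumption: approximates NORM)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c)).strip()


def lookup(title: str):
    """Return the section name for a raw title, or None if unknown."""
    return title_to_section.get(normalise(title))


print(lookup("Antécédents médicaux"))  # antécédents
print(lookup("Synthèse"))              # conclusion
print(lookup("Bilan"))                 # None
```

The patterns are stored without accents, which is why the raw title has to be normalised before the lookup.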
sections
Sections
Bases: GenericMatcher
Divides the document into sections.
By default, the component uses titles drawn from a dataset of documents annotated for section titles, based on work by Ivan Lerner and reviewed by Gilles Chatellier.
Detected sections are:
- allergies
- antécédents
- antécédents familiaux
- traitements entrée
- conclusion
- conclusion entrée
- habitus
- correspondants
- diagnostic
- données biométriques entrée
- examens
- examens complémentaires
- facteurs de risques
- histoire de la maladie
- actes
- motif
- prescriptions
- traitements sortie
The component looks for section titles within the document and stores them in the section_title extension.
For ease of use, the component also populates a section extension, which contains a list of spans corresponding to the "sections" of the document. Each span runs from the start of one section title to the start of the next, which can introduce obvious bias should an intermediate section title go undetected.
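The span construction described above, and the bias caused by a missed title, can be illustrated with a minimal pure-Python sketch. It is not the component's actual code: split_into_sections is a hypothetical helper working on token offsets, and letting the last section run to the end of the document is an assumption.

```python
from typing import List, Tuple


def split_into_sections(
    titles: List[Tuple[int, str]], doc_length: int
) -> List[Tuple[int, int, str]]:
    """Turn (start_offset, label) title hits into (start, end, label) spans.

    Each section runs from its title's start to the start of the next
    title. Assumption: the last section runs to the end of the document.
    """
    ordered = sorted(titles)
    spans = []
    for i, (start, label) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else doc_length
        spans.append((start, end, label))
    return spans


# Three titles detected in a 100-token document.
hits = [(0, "motif"), (20, "antécédents"), (60, "conclusion")]
print(split_into_sections(hits, 100))
# [(0, 20, 'motif'), (20, 60, 'antécédents'), (60, 100, 'conclusion')]

# If the "antécédents" title is missed, its text is absorbed by "motif".
print(split_into_sections([hits[0], hits[2]], 100))
# [(0, 60, 'motif'), (60, 100, 'conclusion')]
```

In the second call, tokens 20 to 60 are attributed to the wrong section, which is exactly the bias the description warns about.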
PARAMETER | DESCRIPTION |
---|---|
nlp | spaCy pipeline object. |
sections | Dictionary of terms to look for. |
attr | Default attribute to match on. |
ignore_excluded | Whether to skip excluded tokens. |
Source code in edsnlp/pipelines/misc/sections/sections.py
add_patterns = add_patterns
instance-attribute
__init__(nlp, sections, add_patterns, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/sections/sections.py
set_extensions()
Source code in edsnlp/pipelines/misc/sections/sections.py
__call__(doc)
Divides the doc into sections
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object. |

RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object, annotated for sections. |
Source code in edsnlp/pipelines/misc/sections/sections.py
factory
DEFAULT_CONFIG = dict(sections=None, add_patterns=True, attr='NORM', ignore_excluded=True)
module-attribute
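spaCy factories generally merge their default configuration with whatever the caller passes through the config argument of nlp.add_pipe. Assuming DEFAULT_CONFIG plays that role here, the merge can be sketched in plain Python as follows. resolve_config is a hypothetical helper written for illustration, not an edsnlp function.

```python
DEFAULT_CONFIG = dict(
    sections=None,
    add_patterns=True,
    attr="NORM",
    ignore_excluded=True,
)


def resolve_config(**overrides):
    """Merge user overrides into the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise TypeError(f"unknown config keys: {sorted(unknown)}")
    # Later keys win, so overrides replace the defaults.
    return {**DEFAULT_CONFIG, **overrides}


# Override a single setting; every other key keeps its default.
config = resolve_config(attr="LOWER")
print(config["attr"])          # LOWER
print(config["add_patterns"])  # True
```

Only the overridden key changes, so a caller who wants different matching behaviour does not need to restate the remaining defaults.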
create_component(nlp, name, sections, add_patterns, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/sections/factory.py