edsnlp.pipelines.misc.sections
factory
DEFAULT_CONFIG = dict(sections=None, add_patterns=True, attr='NORM', ignore_excluded=True)
module-attribute
create_component(nlp, name, sections, add_patterns, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/sections/factory.py
sections
Sections
Bases: GenericMatcher
Divides the document into sections.
By default, the component relies on a dataset of documents annotated for section titles, based on work by Ivan Lerner and reviewed by Gilles Chatellier.
Detected sections are:
- allergies;
- antécédents;
- antécédents familiaux;
- traitements entrée;
- conclusion;
- conclusion entrée;
- habitus;
- correspondants;
- diagnostic;
- données biométriques entrée;
- examens;
- examens complémentaires;
- facteurs de risques;
- histoire de la maladie;
- actes;
- motif;
- prescriptions;
- traitements sortie.
The component looks for section titles within the document and stores them in the section_title extension.
For ease of use, the component also populates a section extension, which contains a list of spans corresponding to the "sections" of the document. Each span runs from the start of one section title to the next, which can introduce bias if an intermediate section title goes undetected.
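The title-to-title logic described above can be illustrated with a minimal, library-free sketch. It is not the component's actual implementation: the `Title` and `Section` tuples, the `build_sections` helper, the token offsets, and the choice to extend the last section to the end of the document are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Title(NamedTuple):
    """A detected section title (hypothetical structure)."""
    label: str
    start: int  # index of the first token of the title
    end: int    # index just past the last token of the title


class Section(NamedTuple):
    """A section spanning from one title to the next (hypothetical structure)."""
    label: str
    start: int
    end: int


def build_sections(titles: List[Title], doc_length: int) -> List[Section]:
    """Build one section per title, ending where the next title starts.

    Assumption: the last section runs to the end of the document.
    """
    ordered = sorted(titles, key=lambda t: t.start)
    sections = []
    for i, title in enumerate(ordered):
        is_last = i + 1 == len(ordered)
        next_start = doc_length if is_last else ordered[i + 1].start
        sections.append(Section(title.label, title.start, next_start))
    return sections


titles = [
    Title("motif", 0, 1),
    Title("antécédents", 10, 11),
    Title("conclusion", 25, 26),
]

# All three titles detected: three contiguous sections.
full = build_sections(titles, doc_length=40)

# The intermediate title is missed: "motif" silently absorbs its content.
missed = build_sections([titles[0], titles[2]], doc_length=40)
```

In the second call the "motif" section covers tokens 0 to 25 instead of 0 to 10, which is the kind of bias the description warns about when a title goes undetected.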
PARAMETER | DESCRIPTION |
---|---|
nlp | spaCy pipeline object. |
sections | Dictionary of terms to look for. |
attr | Default attribute to match on. |
ignore_excluded | Whether to skip excluded tokens. |
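As a rough illustration of how a dictionary of terms and a normalised matching attribute could interact, here is a simplified, library-free sketch. It is not how the component works internally: it matches whole lines rather than tokens, the `normalize` and `find_titles` helpers are hypothetical, and the terms "atcd" and "au total" are invented examples rather than entries from the default patterns.

```python
import unicodedata
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and strip accents: a crude stand-in for a NORM-like attribute."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Hypothetical dictionary: section label -> terms that may introduce it.
sections: Dict[str, List[str]] = {
    "antécédents": ["antécédents", "atcd"],
    "conclusion": ["conclusion", "au total"],
}


def find_titles(lines: List[str], sections: Dict[str, List[str]]) -> List[Tuple[int, str]]:
    """Return (line index, section label) for each line that matches a term."""
    lookup = {
        normalize(term): label
        for label, terms in sections.items()
        for term in terms
    }
    found = []
    for index, line in enumerate(lines):
        # Drop surrounding spaces and a trailing colon before the lookup.
        key = normalize(line).strip(" :")
        if key in lookup:
            found.append((index, lookup[key]))
    return found


lines = ["ANTÉCÉDENTS :", "Diabète de type 2.", "Au total", "Retour à domicile."]
titles = find_titles(lines, sections)
```

Matching on the normalised form lets "ANTÉCÉDENTS :" and "Au total" hit their terms despite differences in case and accents, which is the motivation for matching on a normalised attribute rather than on raw text.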
Source code in edsnlp/pipelines/misc/sections/sections.py
add_patterns = add_patterns
instance-attribute
__init__(nlp, sections, add_patterns, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/sections/sections.py
set_extensions()
Source code in edsnlp/pipelines/misc/sections/sections.py
__call__(doc)
Divides the doc into sections
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object. |
RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object, annotated for sections. |
Source code in edsnlp/pipelines/misc/sections/sections.py
patterns
These section titles were extracted from work carried out by Ivan Lerner at AP-HP, which supplied a number of documents annotated for section titles.
The section titles were reviewed by Gilles Chatellier, who gave meaningful insights.
See the sections/section-dataset notebook for details.