edsnlp.pipelines.misc.sections.sections
Sections
Bases: GenericMatcher
Divides the document into sections.
By default, the component uses a dataset of documents annotated for section titles, based on work by Ivan Lerner and reviewed by Gilles Chatellier.
The detected sections are:
- allergies ;
- antécédents ;
- antécédents familiaux ;
- traitements entrée ;
- conclusion ;
- conclusion entrée ;
- habitus ;
- correspondants ;
- diagnostic ;
- données biométriques entrée ;
- examens ;
- examens complémentaires ;
- facteurs de risques ;
- histoire de la maladie ;
- actes ;
- motif ;
- prescriptions ;
- traitements sortie.
The component looks for section titles within the document and stores them in the section_title extension.
For ease of use, the component also populates a section extension, which contains a list of spans corresponding to the "sections" of the document. Each span runs from the start of one section title to the start of the next, so if an intermediate section title goes undetected, its content is wrongly attributed to the preceding section.
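The title-to-title spanning rule described above can be sketched in plain Python. This is a minimal illustration, not the component's actual code: `build_sections`, the `(label, start)` tuples, and the offsets are all hypothetical, and no spaCy objects are involved.

```python
from typing import List, Tuple

# Hypothetical representation: a title is (label, start offset),
# a section is (label, start, end) with an exclusive end.
Title = Tuple[str, int]
Section = Tuple[str, int, int]


def build_sections(titles: List[Title], doc_length: int) -> List[Section]:
    """Span each section from its title to the start of the next title."""
    ordered = sorted(titles, key=lambda t: t[1])
    sections = []
    # Pair each title with the one that follows it.
    for (label, start), (_, next_start) in zip(ordered, ordered[1:]):
        sections.append((label, start, next_start))
    # The last title runs to the end of the document.
    if ordered:
        label, start = ordered[-1]
        sections.append((label, start, doc_length))
    return sections


titles = [("motif", 0), ("antécédents", 12), ("conclusion", 40)]
all_found = build_sections(titles, doc_length=50)

# If the "antécédents" title is missed, "motif" absorbs its content.
one_missed = build_sections([titles[0], titles[2]], doc_length=50)
```

In the second call, the "motif" section stretches over offsets 0 to 40, which is the bias mentioned above: text that belongs to the undetected section is silently merged into the previous one.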
PARAMETER | DESCRIPTION |
---|---|
nlp | spaCy pipeline object. |
sections | Dictionary of terms to look for. |
attr | Default attribute to match on. |
ignore_excluded | Whether to skip excluded tokens. |
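As an illustration of what a "dictionary of terms" might look like, the sketch below maps section labels to title variants and finds them with the standard `re` module. The dictionary contents, the `find_titles` helper, and the line-anchored pattern are assumptions made for this example. They are not the patterns shipped with the component, which relies on its GenericMatcher base class for matching.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical dictionary: section label -> title variants to look for.
sections: Dict[str, List[str]] = {
    "motif": ["motif", "motif d'hospitalisation"],
    "conclusion": ["conclusion", "au total"],
}


def find_titles(text: str, terms: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) character offsets for each title match."""
    matches = []
    for label, variants in terms.items():
        # Try longer variants first so the longest title wins.
        ordered = sorted(variants, key=len, reverse=True)
        alternatives = "|".join(re.escape(v) for v in ordered)
        # A title is assumed to sit alone on its line, optionally followed by ":".
        pattern = re.compile(
            rf"^[ \t]*({alternatives})[ \t]*:?[ \t]*$",
            re.IGNORECASE | re.MULTILINE,
        )
        for m in pattern.finditer(text):
            matches.append((label, m.start(1), m.end(1)))
    # Sort by position in the text.
    return sorted(matches, key=lambda m: m[1])


text = "Motif :\nDouleur thoracique.\nConclusion\nRetour à domicile."
found = find_titles(text, sections)
```

Matching on a normalised attribute (the `attr` parameter) rather than on raw text would play the role that `re.IGNORECASE` plays in this simplified version.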
Source code in edsnlp/pipelines/misc/sections/sections.py
add_patterns = add_patterns
instance-attribute
__init__(nlp, sections, add_patterns, attr, ignore_excluded)
set_extensions()
tag_ents(sections)
__call__(doc)
Divides the doc into sections.
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object. |
RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object, annotated for sections. |