edsnlp.pipelines.misc
sections
factory
DEFAULT_CONFIG = dict(sections=None, add_patterns=True, attr='NORM', ignore_excluded=True)
module-attribute
create_component(nlp, name, sections, add_patterns, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/sections/factory.py
patterns
These section titles were extracted from work performed by Ivan Lerner at AP-HP, which supplied a number of documents annotated for section titles.
The section titles were reviewed by Gilles Chatellier, who gave meaningful insights.
See the sections/section-dataset notebook for details.
allergies = ['allergies']
module-attribute
antecedents = ['antecedents', 'antecedents medicaux et chirurgicaux', 'antecedents personnels', 'antecedents medicaux', 'antecedents chirurgicaux', 'atcd']
module-attribute
antecedents_familiaux = ['antecedents familiaux']
module-attribute
traitements_entree = ['attitude therapeutique initiale', "traitement a l'entree", 'traitement actuel', 'traitement en cours', "traitements a l'entree"]
module-attribute
conclusion = ['au total', 'conclusion', 'conclusion de sortie', 'syntese medicale / conclusion', 'synthese', 'synthese medicale', 'synthese medicale/conclusion', 'conclusion medicale']
module-attribute
conclusion_entree = ["conclusion a l'entree"]
module-attribute
habitus = ['contexte familial et social', 'habitus', 'mode de vie', 'mode de vie - scolarite', 'situation sociale, mode de vie']
module-attribute
correspondants = ['correspondants']
module-attribute
diagnostic = ['diagnostic retenu']
module-attribute
donnees_biometriques_entree = ["donnees biometriques et parametres vitaux a l'entree", "parametres vitaux et donnees biometriques a l'entree"]
module-attribute
examens = ['examen clinique', "examen clinique a l'entree"]
module-attribute
examens_complementaires = ['examen(s) complementaire(s)', 'examens complementaires', "examens complementaires a l'entree", 'examens complementaires realises pendant le sejour', 'examens para-cliniques']
module-attribute
facteurs_de_risques = ['facteurs de risque', 'facteurs de risques']
module-attribute
histoire_de_la_maladie = ['histoire de la maladie', 'histoire de la maladie - explorations', 'histoire de la maladie actuelle', 'histoire du poids', 'histoire recente', 'histoire recente de la maladie', 'rappel clinique', 'resume', 'resume clinique']
module-attribute
actes = ['intervention']
module-attribute
motif = ['motif', "motif d'hospitalisation", "motif de l'hospitalisation", 'motif medical']
module-attribute
prescriptions = ['prescriptions de sortie', 'prescriptions medicales de sortie']
module-attribute
traitements_sortie = ['traitement de sortie']
module-attribute
sections = {'allergies': allergies, 'antécédents': antecedents, 'antécédents familiaux': antecedents_familiaux, 'traitements entrée': traitements_entree, 'conclusion': conclusion, 'conclusion entrée': conclusion_entree, 'habitus': habitus, 'correspondants': correspondants, 'diagnostic': diagnostic, 'données biométriques entrée': donnees_biometriques_entree, 'examens': examens, 'examens complémentaires': examens_complementaires, 'facteurs de risques': facteurs_de_risques, 'histoire de la maladie': histoire_de_la_maladie, 'actes': actes, 'motif': motif, 'prescriptions': prescriptions, 'traitements sortie': traitements_sortie}
module-attribute
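Because `sections` maps each label to its list of title variants, finding the label of a matched title requires the reverse mapping. The sketch below shows one way to build it on a two-entry excerpt of the dictionary above; `title_to_label` is a hypothetical helper written for illustration, not part of EDS-NLP.

```python
from typing import Dict, List

# Excerpt of the `sections` dictionary documented above (label -> title variants).
sections: Dict[str, List[str]] = {
    "antécédents familiaux": ["antecedents familiaux"],
    "motif": [
        "motif",
        "motif d'hospitalisation",
        "motif de l'hospitalisation",
        "motif medical",
    ],
}


def title_to_label(mapping: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: invert label -> titles into title -> label."""
    return {title: label for label, titles in mapping.items() for title in titles}


lookup = title_to_label(sections)
print(lookup["motif medical"])
```

Note that the labels keep their accents while the title variants do not, which fits a component that matches on the `NORM` attribute by default.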
sections
Sections
Bases: GenericMatcher
Divides the document into sections.
By default, the component uses a dataset of documents annotated for section titles, based on work done by Ivan Lerner and reviewed by Gilles Chatellier.
Detected sections are:
- allergies
- antécédents
- antécédents familiaux
- traitements entrée
- conclusion
- conclusion entrée
- habitus
- correspondants
- diagnostic
- données biométriques entrée
- examens
- examens complémentaires
- facteurs de risques
- histoire de la maladie
- actes
- motif
- prescriptions
- traitements sortie
The component looks for section titles within the document, and stores them in the `section_title` extension.
For ease of use, the component also populates a `section` extension, which contains a list of spans corresponding to the "sections" of the document. Each span runs from the start of one section title to the start of the next, which can introduce obvious bias should an intermediate section title go undetected.
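The title-to-next-title rule can be sketched in plain Python. This is an illustration only, not the component's code: `split_sections` and the `(offset, label)` title format are hypothetical, and character offsets stand in for spaCy spans.

```python
from typing import List, Tuple


def split_sections(
    text: str, titles: List[Tuple[int, str]]
) -> List[Tuple[str, str]]:
    """Cut `text` into (label, content) pairs, one per detected title.

    `titles` holds (start_offset, label) pairs for the detected section titles.
    Each section runs from its title to the start of the next title,
    and the last section runs to the end of the text.
    """
    ordered = sorted(titles)
    result = []
    for i, (start, label) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else len(text)
        result.append((label, text[start:end]))
    return result


text = "Motif : chute. Antecedents : diabete. Conclusion : retour a domicile."
titles = [
    (text.index("Motif"), "motif"),
    (text.index("Antecedents"), "antécédents"),
    (text.index("Conclusion"), "conclusion"),
]
all_sections = split_sections(text, titles)

# If the middle title goes undetected, the first section absorbs its content.
missed = split_sections(text, [titles[0], titles[2]])
```

The `missed` case shows the bias mentioned above: the "motif" section now also contains the text about antecedents.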
PARAMETER | DESCRIPTION |
---|---|
nlp | spaCy pipeline object. |
sections | Dictionary of terms to look for. |
attr | Default attribute to match on. |
ignore_excluded | Whether to skip excluded tokens. |
Source code in edsnlp/pipelines/misc/sections/sections.py
add_patterns = add_patterns
instance-attribute
__init__(nlp, sections, add_patterns, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/sections/sections.py
set_extensions()
Source code in edsnlp/pipelines/misc/sections/sections.py
__call__(doc)
Divides the doc into sections
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object |
RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object, annotated for sections |
Source code in edsnlp/pipelines/misc/sections/sections.py
consultation_dates
consultation_dates
ConsultationDates
Bases: GenericMatcher
Class to extract consultation dates from "CR-CONS" documents.
The pipeline populates the `doc.spans['consultation_dates']` list.
For each extraction `s` in this list, the corresponding date is available as `s._.consultation_date`.
PARAMETER | DESCRIPTION |
---|---|
nlp | Language pipeline object |
consultation_mention | List of RegEx for consultation mentions. |
town_mention | List of RegEx for mentions of the towns of AP-HP hospitals. If a list is given, it overrides the default list. If a boolean is given, True uses the default list and False disables the patterns. TYPE: Union[List[str], bool] |
document_date_mention | List of RegEx for document date. If a list is given, it overrides the default list. If a boolean is given, True uses the default list and False disables the patterns. TYPE: Union[List[str], bool] |
Source code in edsnlp/pipelines/misc/consultation_dates/consultation_dates.py
date_matcher = Dates(nlp, **config)
instance-attribute
__init__(nlp, consultation_mention, town_mention, document_date_mention, attr, **kwargs)
Source code in edsnlp/pipelines/misc/consultation_dates/consultation_dates.py
set_extensions()
Source code in edsnlp/pipelines/misc/consultation_dates/consultation_dates.py
__call__(doc)
Finds entities
PARAMETER | DESCRIPTION |
---|---|
doc | |
RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object with additional |
Source code in edsnlp/pipelines/misc/consultation_dates/consultation_dates.py
factory
DEFAULT_CONFIG = dict(consultation_mention=True, town_mention=False, document_date_mention=False, attr='NORM')
module-attribute
create_component(nlp, name, attr, consultation_mention, town_mention, document_date_mention)
Source code in edsnlp/pipelines/misc/consultation_dates/factory.py
patterns
consultation_mention = ['rendez-vous pris', 'consultation', 'consultation.{1,8}examen', 'examen clinique', 'de compte rendu', "date de l'examen", 'examen realise le', 'date de la visite']
module-attribute
town_mention = ['paris', 'kremlin.bicetre', 'creteil', 'boulogne.billancourt', 'villejuif', 'clamart', 'bobigny', 'clichy', 'ivry.sur.seine', 'issy.les.moulineaux', 'draveil', 'limeil', 'champcueil', 'roche.guyon', 'bondy', 'colombes', 'hendaye', 'herck.sur.mer', 'labruyere', 'garches', 'sevran', 'hyeres']
module-attribute
document_date_mention = ['imprime le', 'signe electroniquement', 'signe le', 'saisi le', 'dicte le', 'tape le', 'date de reference', 'date\\s*:', 'dactylographie le', 'date du rapport']
module-attribute
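Each entry in these lists is a regular expression, so a list can be tried as a single alternation. The sketch below does this with the standard `re` module on an excerpt of `document_date_mention`. `find_mentions` is a hypothetical helper, and the component itself relies on its own matcher rather than on this code. The input is lowercased and unaccented, mimicking matching on `NORM`.

```python
import re
from typing import List

# Excerpt of `document_date_mention` from the module above.
document_date_mention = ["imprime le", "signe le", "date\\s*:", "date du rapport"]


def find_mentions(text: str, patterns: List[str]) -> List[str]:
    """Hypothetical helper: return every substring matched by any pattern."""
    # Wrap each pattern in a non-capturing group before joining with "|".
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [m.group(0) for m in combined.finditer(text)]


found = find_mentions("date : 12/03/2020, signe le 13/03/2020", document_date_mention)
print(found)
```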
dates
models
Direction
Bases: Enum
Source code in edsnlp/pipelines/misc/dates/models.py
FUTURE = 'FUTURE'
class-attribute
PAST = 'PAST'
class-attribute
CURRENT = 'CURRENT'
class-attribute
Mode
Bases: Enum
Source code in edsnlp/pipelines/misc/dates/models.py
FROM = 'FROM'
class-attribute
UNTIL = 'UNTIL'
class-attribute
DURATION = 'DURATION'
class-attribute
Period
Bases: BaseModel
Source code in edsnlp/pipelines/misc/dates/models.py
FROM: Optional[Span] = None
class-attribute
UNTIL: Optional[Span] = None
class-attribute
DURATION: Optional[Span] = None
class-attribute
Config
Source code in edsnlp/pipelines/misc/dates/models.py
arbitrary_types_allowed = True
class-attribute
BaseDate
Bases: BaseModel
Source code in edsnlp/pipelines/misc/dates/models.py
mode: Optional[Mode] = None
class-attribute
validate_strings(d)
Source code in edsnlp/pipelines/misc/dates/models.py
AbsoluteDate
Bases: BaseDate
Source code in edsnlp/pipelines/misc/dates/models.py
year: Optional[int] = None
class-attribute
month: Optional[int] = None
class-attribute
day: Optional[int] = None
class-attribute
hour: Optional[int] = None
class-attribute
minute: Optional[int] = None
class-attribute
second: Optional[int] = None
class-attribute
to_datetime(tz='Europe/Paris', **kwargs)
Source code in edsnlp/pipelines/misc/dates/models.py
norm()
Source code in edsnlp/pipelines/misc/dates/models.py
validate_year(v)
Source code in edsnlp/pipelines/misc/dates/models.py
Relative
Bases: BaseDate
Source code in edsnlp/pipelines/misc/dates/models.py
year: Optional[int] = None
class-attribute
month: Optional[int] = None
class-attribute
week: Optional[int] = None
class-attribute
day: Optional[int] = None
class-attribute
hour: Optional[int] = None
class-attribute
minute: Optional[int] = None
class-attribute
second: Optional[int] = None
class-attribute
parse_unit(d)
Units need to be handled separately.
This validator modifies the key corresponding to the unit with the detected value
PARAMETER | DESCRIPTION |
---|---|
d | Original data |
RETURNS | DESCRIPTION |
---|---|
Dict[str, str] | Transformed data |
Source code in edsnlp/pipelines/misc/dates/models.py
to_datetime(**kwargs)
Source code in edsnlp/pipelines/misc/dates/models.py
RelativeDate
Bases: Relative
Source code in edsnlp/pipelines/misc/dates/models.py
direction: Direction = Direction.CURRENT
class-attribute
to_datetime(note_datetime=None)
Source code in edsnlp/pipelines/misc/dates/models.py
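One way to read `to_datetime(note_datetime=None)` is that a relative date is an offset applied to the date of the note, with `direction` giving the sign. The sketch below illustrates that reading with the standard library only. `resolve_relative` is hypothetical, covers days and weeks only, and the actual method may behave differently, for instance when `note_datetime` is missing.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Direction(Enum):
    FUTURE = "FUTURE"
    PAST = "PAST"
    CURRENT = "CURRENT"


def resolve_relative(
    note_datetime: Optional[datetime],
    direction: Direction,
    day: int = 0,
    week: int = 0,
) -> Optional[datetime]:
    """Hypothetical helper: shift the note date by a signed offset."""
    if note_datetime is None:
        # Assumption: without a reference date, nothing can be resolved.
        return None
    offset = timedelta(days=day, weeks=week)
    if direction is Direction.PAST:
        offset = -offset
    # Assumption: FUTURE and CURRENT both add the (possibly zero) offset.
    return note_datetime + offset


note = datetime(2021, 9, 15)
yesterday = resolve_relative(note, Direction.PAST, day=1)
in_two_weeks = resolve_relative(note, Direction.FUTURE, week=2)
```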
norm()
Source code in edsnlp/pipelines/misc/dates/models.py
handle_specifics(d)
Specific patterns such as `aujourd'hui`, `hier`, etc. need to be handled separately.
PARAMETER | DESCRIPTION |
---|---|
d | Original data. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, str] | Modified data. |
Source code in edsnlp/pipelines/misc/dates/models.py
Duration
Bases: Relative
Source code in edsnlp/pipelines/misc/dates/models.py
mode: Mode = Mode.DURATION
class-attribute
norm()
Source code in edsnlp/pipelines/misc/dates/models.py
factory
DEFAULT_CONFIG = dict(absolute=None, relative=None, duration=None, false_positive=None, detect_periods=False, on_ents_only=False, attr='LOWER')
module-attribute
create_component(nlp, name, absolute, relative, duration, false_positive, on_ents_only, detect_periods, attr)
Source code in edsnlp/pipelines/misc/dates/factory.py
dates
`eds.dates` pipeline.
PERIOD_PROXIMITY_THRESHOLD = 3
module-attribute
Dates
Bases: BaseComponent
Tags and normalizes dates, using the open-source `dateparser` library.
The pipeline uses spaCy's `filter_spans` function. It filters out false positives, and introduces a hierarchy between patterns. For instance, in case of ambiguity, the pipeline will decide that a date is a date without a year rather than a date without a day.
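spaCy documents `filter_spans` as preferring the (first) longest span when candidates overlap. The pure-Python sketch below mimics that rule on `(start, end)` token offsets, to show how a longer match wins over shorter overlapping ones. `filter_overlaps` is a hypothetical stand-in written for illustration, not the spaCy function.

```python
from typing import List, Tuple

Offsets = Tuple[int, int]  # (start, end) token offsets, end exclusive


def filter_overlaps(spans: List[Offsets]) -> List[Offsets]:
    """Keep the longest spans, dropping any span that overlaps a kept one."""
    # Longest first; among equal lengths, the earliest start comes first.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: List[Offsets] = []
    seen = set()
    for start, end in ordered:
        if not seen.intersection(range(start, end)):
            kept.append((start, end))
            seen.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)


candidates = [(0, 2), (0, 3), (5, 6), (2, 4)]
result = filter_overlaps(candidates)
```

Here `(0, 3)` is kept, so both `(0, 2)` and `(2, 4)` are discarded because they share tokens with it.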
PARAMETER | DESCRIPTION |
---|---|
nlp | Language pipeline object |
absolute | List of regular expressions for absolute dates. |
relative | List of regular expressions for relative dates. |
duration | List of regular expressions for durations. |
false_positive | List of regular expressions for false positive (eg phone numbers, etc). |
on_ents_only | Whether to look for dates in the whole document or in specific sentences. |
detect_periods | Whether to detect periods (experimental) |
attr | spaCy attribute to use |
Source code in edsnlp/pipelines/misc/dates/dates.py
nlp = nlp
instance-attribute
on_ents_only = on_ents_only
instance-attribute
regex_matcher = RegexMatcher(attr=attr, alignment_mode='strict')
instance-attribute
detect_periods = detect_periods
instance-attribute
__init__(nlp, absolute, relative, duration, false_positive, on_ents_only, detect_periods, attr)
Source code in edsnlp/pipelines/misc/dates/dates.py
set_extensions()
Set extensions for the dates pipeline.
Source code in edsnlp/pipelines/misc/dates/dates.py
process(doc)
Find dates in doc.
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object |
RETURNS | DESCRIPTION |
---|---|
dates | list of date spans |
Source code in edsnlp/pipelines/misc/dates/dates.py
parse(dates)
Parse dates using the groupdict returned by the matcher.
PARAMETER | DESCRIPTION |
---|---|
dates | List of tuples containing the spans and groupdict returned by the matcher. |
RETURNS | DESCRIPTION |
---|---|
List[Span] | List of processed spans, with the date parsed. |
Source code in edsnlp/pipelines/misc/dates/dates.py
process_periods(dates)
Experimental period detection.
PARAMETER | DESCRIPTION |
---|---|
dates | List of detected dates. |
RETURNS | DESCRIPTION |
---|---|
List[Span] | List of detected periods. |
Source code in edsnlp/pipelines/misc/dates/dates.py
__call__(doc)
Tags dates.
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object |
RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object, annotated for dates |
Source code in edsnlp/pipelines/misc/dates/dates.py
patterns
false_positive
false_positive_pattern = make_pattern(['(\\d+' + delimiter_pattern + '){3,}\\d+(?!:\\d\\d)\\b', '\\d\\/\\d'])
module-attribute
relative
specific = {'minus1': ('hier', dict(direction='PAST', day=1)), 'minus2': ('avant[-\\s]hier', dict(direction='PAST', day=2)), 'plus1': ('demain', dict(direction='FUTURE', day=1)), 'plus2': ('après[-\\s]demain', dict(direction='FUTURE', day=2))}
module-attribute
specific_pattern = make_pattern(['(?P<specific_{k}>{p})' for (k, (p, _)) in specific.items()])
module-attribute
specific_dict = {k: v for (k, (_, v)) in specific.items()}
module-attribute
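The three attributes above work together: each entry of `specific` contributes a named group `specific_<key>` to the pattern, and the key of the group that matched is then looked up in `specific_dict`. A minimal sketch with `re`, in which the patterns are joined with a plain `|` instead of `make_pattern` (whose exact output is not shown here) and `lookup_specific` is a hypothetical helper:

```python
import re
from typing import Optional

specific = {
    "minus1": ("hier", dict(direction="PAST", day=1)),
    "minus2": ("avant[-\\s]hier", dict(direction="PAST", day=2)),
    "plus1": ("demain", dict(direction="FUTURE", day=1)),
    "plus2": ("après[-\\s]demain", dict(direction="FUTURE", day=2)),
}
specific_dict = {k: v for k, (_, v) in specific.items()}

# Stand-in for make_pattern: a plain alternation of the named groups.
pattern = re.compile(
    "|".join(f"(?P<specific_{k}>{p})" for k, (p, _) in specific.items())
)


def lookup_specific(text: str) -> Optional[dict]:
    """Hypothetical helper: return the normalised values for a specific term."""
    match = pattern.fullmatch(text)
    if match is None:
        return None
    for group, value in match.groupdict().items():
        if value is not None:
            # Strip the "specific_" prefix to recover the dictionary key.
            return specific_dict[group[len("specific_"):]]
    return None
```

For example, `lookup_specific("avant-hier")` resolves through the `specific_minus2` group to a past offset of two days.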
relative_pattern = ['(?<=' + mode_pattern + '.{,3})?' + p for p in relative_pattern]
module-attribute
make_specific_pattern(mode='forward')
Source code in edsnlp/pipelines/misc/dates/patterns/relative.py
current
current_patterns: List[str] = ['(?P<year_0>cette\\s+ann[ée]e)(?![-\\s]l[àa])', "(?P<day_0>ce\\s+jour|aujourd['\\s]?hui)", '(?P<week_0>cette\\s+semaine|ces\\sjours[-\\s]ci)', '(?P<month_0>ce\\smois([-\\s]ci)?)']
module-attribute
current_pattern = make_pattern(current_patterns, with_breaks=True)
module-attribute
absolute
no_year_pattern = [day + raw_delimiter_with_spaces_pattern + month + time_pattern + post_num_pattern for day in [ante_num_pattern + numeric_day_pattern, letter_day_pattern] for month in [numeric_month_pattern + post_num_pattern, letter_month_pattern]]
module-attribute
no_day_pattern = [letter_month_pattern + raw_delimiter_with_spaces_pattern + year_pattern + post_num_pattern, ante_num_pattern + lz_numeric_month_pattern + raw_delimiter_with_spaces_pattern + year_pattern + post_num_pattern]
module-attribute
full_year_pattern = ante_num_pattern + fy_pattern + post_num_pattern
module-attribute
absolute_pattern = ['(?<=' + mode_pattern + '.{,3})?' + p for p in absolute_pattern]
module-attribute
duration
cue_pattern = '(pendant|durant|pdt)'
module-attribute
duration_pattern = [cue_pattern + '.{,3}' + numbers.number_pattern + '\\s*' + units.unit_pattern]
module-attribute
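The duration pattern reads as: a cue word, up to three arbitrary characters, a number, optional whitespace and a unit. The reduced sketch below reproduces that shape with `re`. The `number` and `unit` expressions are simplified stand-ins for `numbers.number_pattern` and `units.unit_pattern` (digits only, three units), so this is an approximation of the real pattern.

```python
import re

cue_pattern = "(pendant|durant|pdt)"
# Simplified stand-ins for numbers.number_pattern and units.unit_pattern.
number = r"(?P<number>\d{1,2})"
unit = r"(?P<unit>jours?|semaines?|mois)\b"

duration = re.compile(cue_pattern + ".{,3}" + number + r"\s*" + unit)

match = duration.search("traitement pendant 3 semaines")
no_cue = duration.search("3 semaines")
```

Without one of the cue words, the same number and unit are not treated as a duration, which is why `no_cue` is `None`.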
atomic
delimiters
raw_delimiters = ['\\/', '\\-']
module-attribute
delimiters = raw_delimiters + ['\\.', '[^\\S\\r\\n]+']
module-attribute
raw_delimiter_pattern = make_pattern(raw_delimiters)
module-attribute
raw_delimiter_with_spaces_pattern = make_pattern(raw_delimiters + ['[^\\S\\r\\n]+'])
module-attribute
delimiter_pattern = make_pattern(delimiters)
module-attribute
ante_num_pattern = '(?<!.(?:{raw_delimiter_pattern})|[0-9][.,])'
module-attribute
post_num_pattern = '(?!{raw_delimiter_pattern})'
module-attribute
time
hour_pattern = '(?<!\\d)(?P<hour>0?[1-9]|1\\d|2[0-3])(?!\\d)'
module-attribute
lz_hour_pattern = '(?<!\\d)(?P<hour>0[1-9]|[12]\\d|3[01])(?!\\d)'
module-attribute
minute_pattern = '(?<!\\d)(?P<minute>0?[1-9]|[1-5]\\d)(?!\\d)'
module-attribute
lz_minute_pattern = '(?<!\\d)(?P<minute>0[1-9]|[1-5]\\d)(?!\\d)'
module-attribute
second_pattern = '(?<!\\d)(?P<second>0?[1-9]|[1-5]\\d)(?!\\d)'
module-attribute
lz_second_pattern = '(?<!\\d)(?P<second>0[1-9]|[1-5]\\d)(?!\\d)'
module-attribute
time_pattern = '(\\s.{,3}' + '{hour_pattern}[h:]({lz_minute_pattern})?' + '((:|m|min){lz_second_pattern})?' + ')?'
module-attribute
units
units = ['(?P<unit_year>ans?|ann[ée]es?)', '(?P<unit_semester>semestres?)', '(?P<unit_trimester>trimestres?)', '(?P<unit_month>mois)', '(?P<unit_week>semaines?)', '(?P<unit_day>jours?|journ[ée]es?)', '(?P<unit_hour>h|heures?)', '(?P<unit_minute>min|minutes?)', '(?P<unit_second>sec|secondes?|s)']
module-attribute
unit_pattern = make_pattern(units, with_breaks=True)
module-attribute
days
letter_days = ['(?P<day_01>premier|1\\s*er)', '(?P<day_02>deux)', '(?P<day_03>trois)', '(?P<day_04>quatre)', '(?P<day_05>cinq)', '(?P<day_06>six)', '(?P<day_07>sept)', '(?P<day_08>huit)', '(?P<day_09>neuf)', '(?P<day_10>dix)', '(?P<day_11>onze)', '(?P<day_12>douze)', '(?P<day_13>treize)', '(?P<day_14>quatorze)', '(?P<day_15>quinze)', '(?P<day_16>seize)', '(?P<day_17>dix\\-?\\s*sept)', '(?P<day_18>dix\\-?\\s*huit)', '(?P<day_19>dix\\-?\\s*neuf)', '(?P<day_20>vingt)', '(?P<day_21>vingt\\-?\\s*et\\-?\\s*un)', '(?P<day_22>vingt\\-?\\s*deux)', '(?P<day_23>vingt\\-?\\s*trois)', '(?P<day_24>vingt\\-?\\s*quatre)', '(?P<day_25>vingt\\-?\\s*cinq)', '(?P<day_26>vingt\\-?\\s*six)', '(?P<day_27>vingt\\-?\\s*sept)', '(?P<day_28>vingt\\-?\\s*huit)', '(?P<day_29>vingt\\-?\\s*neuf)', '(?P<day_30>trente)', '(?P<day_31>trente\\-?\\s*et\\-?\\s*un)']
module-attribute
letter_day_pattern = make_pattern(letter_days)
module-attribute
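In `letter_days`, the numeric value of the day is carried by the name of the capturing group (`day_17` for "dix-sept"). The sketch below shows how that value can be recovered after a match. It uses three of the patterns above joined with `|` rather than `make_pattern`, and `day_from_letters` is a hypothetical helper.

```python
import re
from typing import Optional

# Three entries taken from `letter_days` above.
letter_days = [
    "(?P<day_01>premier|1\\s*er)",
    "(?P<day_10>dix)",
    "(?P<day_17>dix\\-?\\s*sept)",
]
pattern = re.compile("|".join(letter_days))


def day_from_letters(text: str) -> Optional[int]:
    """Hypothetical helper: map a spelled-out day to its number."""
    # fullmatch avoids stopping at "dix" when the text is "dix-sept".
    match = pattern.fullmatch(text)
    if match is None:
        return None
    for name, value in match.groupdict().items():
        if value is not None:
            # Group names look like "day_17": the suffix is the day number.
            return int(name.split("_")[1])
    return None
```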
nlz_numeric_day_pattern = '(?<!\\d)([1-9]|[12]\\d|3[01])(?!\\d)'
module-attribute
numeric_day_pattern = '(?P<day>{numeric_day_pattern})'
module-attribute
lz_numeric_day_pattern = '(?P<day>{lz_numeric_day_pattern})'
module-attribute
day_pattern = '({letter_day_pattern}|{numeric_day_pattern})'
module-attribute
numbers
letter_numbers = ["(?P<number_01>l'|le|la|une?|ce|cette|cet)", '(?P<number_02>deux)', '(?P<number_03>trois)', '(?P<number_04>quatre)', '(?P<number_05>cinq)', '(?P<number_06>six)', '(?P<number_07>sept)', '(?P<number_08>huit)', '(?P<number_09>neuf)', '(?P<number_10>dix)', '(?P<number_11>onze)', '(?P<number_12>douze)', '(?P<number_12>treize)', '(?P<number_13>quatorze)', '(?P<number_14>quinze)', '(?P<number_15>seize)', '(?P<number_16>dix[-\\s]sept)', '(?P<number_17>dix[-\\s]huit)', '(?P<number_18>dix[-\\s]neuf)', '(?P<number_20>vingt)', '(?P<number_21>vingt[-\\s]et[-\\s]un)', '(?P<number_22>vingt[-\\s]deux)', '(?P<number_23>vingt[-\\s]trois)', '(?P<number_24>vingt[-\\s]quatre)', '(?P<number_25>vingt[-\\s]cinq)', '(?P<number_26>vingt[-\\s]six)', '(?P<number_27>vingt[-\\s]sept)', '(?P<number_28>vingt[-\\s]huit)', '(?P<number_29>vingt[-\\s]neuf)', '(?P<number_30>trente)']
module-attribute
numeric_numbers = [str(i) for i in range(1, 100)]
module-attribute
letter_number_pattern = make_pattern(letter_numbers, with_breaks=True)
module-attribute
numeric_number_pattern = make_pattern(numeric_numbers, name='number')
module-attribute
number_pattern = '({letter_number_pattern}|{numeric_number_pattern})'
module-attribute
directions
preceding_directions = ['(?P<direction_PAST>depuis|depuis\\s+le|il\\s+y\\s+a)', '(?P<direction_FUTURE>dans)']
module-attribute
following_directions = ['(?P<direction_FUTURE>prochaine?s?|suivante?s?|plus\\s+tard)', '(?P<direction_PAST>derni[eè]re?s?|passée?s?|pr[ée]c[ée]dente?s?|plus\\s+t[ôo]t)']
module-attribute
preceding_direction_pattern = make_pattern(preceding_directions, with_breaks=True)
module-attribute
following_direction_pattern = make_pattern(following_directions, with_breaks=True)
module-attribute
months
letter_months = ['(?P<month_01>janvier|janv\\.?)', '(?P<month_02>f[ée]vrier|f[ée]v\\.?)', '(?P<month_03>mars|mar\\.?)', '(?P<month_04>avril|avr\\.?)', '(?P<month_05>mai)', '(?P<month_06>juin)', '(?P<month_07>juillet|juill?\\.?)', '(?P<month_08>ao[uû]t)', '(?P<month_09>septembre|sept?\\.?)', '(?P<month_10>octobre|oct\\.?)', '(?P<month_11>novembre|nov\\.)', '(?P<month_12>d[ée]cembre|d[ée]c\\.?)']
module-attribute
letter_month_pattern = make_pattern(letter_months, with_breaks=True)
module-attribute
numeric_month_pattern = '(?P<month>{numeric_month_pattern})'
module-attribute
lz_numeric_month_pattern = '(?P<month>{lz_numeric_month_pattern})'
module-attribute
month_pattern = '({letter_month_pattern}|{numeric_month_pattern})'
module-attribute
years
year_patterns: List[str] = ['19\\d\\d'] + [str(year) for year in range(2000, date.today().year + 2)]
module-attribute
full_year_pattern = '(?<!\\d)' + full_year_pattern + '(?!\\d)'
module-attribute
year_pattern = '(?<!\\d)' + year_pattern + '(?!\\d)'
module-attribute
modes
modes = ['(?P<mode_FROM>depuis|depuis\\s+le|[àa]\\s+partir\\s+d[eu]|du)', "(?P<mode_UNTIL>jusqu'[àa]u?|au)"]
module-attribute
mode_pattern = make_pattern(modes, with_breaks=True)
module-attribute
measures
factory
DEFAULT_CONFIG = dict(attr='NORM', ignore_excluded=False, measures=['eds.measures.size', 'eds.measures.weight', 'eds.measures.angle'])
module-attribute
create_component(nlp, name, measures, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/measures/factory.py
patterns
CompositeSize
Bases: CompositeMeasure
Composite size measure. Supports the following units:
- mm
- cm
- dm
- m
Source code in edsnlp/pipelines/misc/measures/patterns.py
mm = property(make_multi_getter('mm'))
class-attribute
cm = property(make_multi_getter('cm'))
class-attribute
dm = property(make_multi_getter('dm'))
class-attribute
m = property(make_multi_getter('m'))
class-attribute
Size
Bases: SimpleMeasure
Size measure. Supports the following units:
- mm
- cm
- dm
- m
Source code in edsnlp/pipelines/misc/measures/patterns.py
COMPOSITE = CompositeSize
class-attribute
UNITS = {'mm': {'prefix': 'mill?im', 'abbr': 'mm', 'value': 1}, 'cm': {'prefix': 'centim', 'abbr': 'cm', 'value': 10}, 'dm': {'prefix': 'decim', 'abbr': 'dm', 'value': 100}, 'm': {'prefix': 'metre', 'abbr': 'm', 'value': 1000}}
class-attribute
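In `UNITS`, each `value` expresses the unit in terms of the smallest one (1 cm is 10 mm), so converting between two units amounts to a ratio of their values. The sketch below shows that arithmetic, assuming this is how the scale factors are meant to be used. `convert` is a hypothetical helper, not a method of `Size`.

```python
# Scale factors copied from Size.UNITS above (expressed in millimetres).
UNITS = {
    "mm": {"prefix": "mill?im", "abbr": "mm", "value": 1},
    "cm": {"prefix": "centim", "abbr": "cm", "value": 10},
    "dm": {"prefix": "decim", "abbr": "dm", "value": 100},
    "m": {"prefix": "metre", "abbr": "m", "value": 1000},
}


def convert(value: float, unit: str, target: str) -> float:
    """Hypothetical helper: express `value` (given in `unit`) in `target` units."""
    return value * UNITS[unit]["value"] / UNITS[target]["value"]


in_mm = convert(1.26, "cm", "mm")
in_cm = convert(2, "m", "cm")
```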
mm = property(make_simple_getter('mm'))
class-attribute
cm = property(make_simple_getter('cm'))
class-attribute
dm = property(make_simple_getter('dm'))
class-attribute
m = property(make_simple_getter('m'))
class-attribute
parse(int_part, dec_part, unit, infix=False)
Source code in edsnlp/pipelines/misc/measures/patterns.py
Weight
Bases: SimpleMeasure
Weight measure. Supports the following units:
- mg
- cg
- dg
- g
- kg
Source code in edsnlp/pipelines/misc/measures/patterns.py
COMPOSITE = None
class-attribute
UNITS = {'mg': {'prefix': 'mill?ig', 'abbr': 'mg', 'value': 1}, 'cg': {'prefix': 'centig', 'abbr': 'cg', 'value': 10}, 'dg': {'prefix': 'decig', 'abbr': 'dg', 'value': 100}, 'g': {'prefix': 'gram', 'abbr': 'g', 'value': 1000}, 'kg': {'prefix': 'kilo', 'abbr': 'kg', 'value': 1000000}}
class-attribute
mg = property(make_simple_getter('mg'))
class-attribute
cg = property(make_simple_getter('cg'))
class-attribute
dg = property(make_simple_getter('dg'))
class-attribute
g = property(make_simple_getter('g'))
class-attribute
kg = property(make_simple_getter('kg'))
class-attribute
parse(int_part, dec_part, unit, infix=False)
Source code in edsnlp/pipelines/misc/measures/patterns.py
Angle
Bases: SimpleMeasure
Angle measure. Supports the following units:
- h
Source code in edsnlp/pipelines/misc/measures/patterns.py
COMPOSITE = None
class-attribute
UNITS = {'h': {'prefix': 'heur', 'abbr': 'h', 'value': 1}}
class-attribute
h = property(make_simple_getter('h'))
class-attribute
parse(int_part, dec_part, unit, infix=False)
Source code in edsnlp/pipelines/misc/measures/patterns.py
measures
Measure
Bases: abc.ABC
Source code in edsnlp/pipelines/misc/measures/measures.py
INTEGER = '(?:[0-9]+)'
class-attribute
CONJUNCTIONS = 'et|ou'
class-attribute
COMPOSERS = '[x*]|par'
class-attribute
UNITS = {}
class-attribute
COMPOSITE = None
class-attribute
__iter__()
Iter over items of the measure (only one for SimpleMeasure)
RETURNS | DESCRIPTION |
---|---|
iterable | |
Source code in edsnlp/pipelines/misc/measures/measures.py
__getitem__(item)
Access items of the measure (only one for SimpleMeasure)
PARAMETER | DESCRIPTION |
---|---|
item | |
RETURNS | DESCRIPTION |
---|---|
measure | |
Source code in edsnlp/pipelines/misc/measures/measures.py
SimpleMeasure
Bases: Measure
Source code in edsnlp/pipelines/misc/measures/measures.py
value = value
instance-attribute
unit = unit
instance-attribute
__init__(value, unit)
The SimpleMeasure class contains the value and unit for a single non-composite measure
PARAMETER | DESCRIPTION |
---|---|
value | |
unit | |
Source code in edsnlp/pipelines/misc/measures/measures.py
parse(int_part, dec_part, unit, infix)
Class method to create an instance from the match groups
PARAMETER | DESCRIPTION |
---|---|
int_part | The integer part of the match (eg 12 in 12 metres 50 or 12.50metres). TYPE: str |
dec_part | The decimal part of the match (eg 50 in 12 metres 50 or 12.50metres). TYPE: str |
unit | The normalized variant of the unit (eg "m" for 12 metre 50). TYPE: str |
infix | Whether the unit was placed before (True) or after (False) the decimal part. TYPE: bool |
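The sketch below illustrates how the integer and decimal parts of the two written forms ("12 metres 50" and "12.50metres") might be combined into one value. It assumes `dec_part` is always read as the digits after the decimal separator, wherever the unit appeared. The regular expressions are simplified to a single unit, and `combine_parts` and `parse_metres` are hypothetical helpers, not the class method itself.

```python
import re

# Simplified patterns for one unit only; the real ones are built from UNITS.
SUFFIX = re.compile(r"(?P<int>\d+)(?:[.,](?P<dec>\d+))?\s*metres?")
INFIX = re.compile(r"(?P<int>\d+)\s*metres?\s*(?P<dec>\d+)")


def combine_parts(int_part: str, dec_part: str = "") -> float:
    """Hypothetical helper: read dec_part as the digits after the separator."""
    return float(f"{int_part}.{dec_part or '0'}")


def parse_metres(text: str) -> float:
    """Parse either written form; fullmatch requires the whole string to match."""
    match = INFIX.fullmatch(text) or SUFFIX.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised measure: {text!r}")
    return combine_parts(match.group("int"), match.group("dec"))


infix_value = parse_metres("12 metres 50")
suffix_value = parse_metres("12.50metres")
```

Both forms yield the same value, which is the point of passing `int_part` and `dec_part` separately from the position of the unit.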
Source code in edsnlp/pipelines/misc/measures/measures.py
_get_scale_to(unit)
Source code in edsnlp/pipelines/misc/measures/measures.py
__iter__()
Source code in edsnlp/pipelines/misc/measures/measures.py
__getitem__(item)
Source code in edsnlp/pipelines/misc/measures/measures.py
__str__()
Source code in edsnlp/pipelines/misc/measures/measures.py
__repr__()
Source code in edsnlp/pipelines/misc/measures/measures.py
__eq__(other)
Source code in edsnlp/pipelines/misc/measures/measures.py
__lt__(other)
Source code in edsnlp/pipelines/misc/measures/measures.py
__le__(other)
Source code in edsnlp/pipelines/misc/measures/measures.py
CompositeMeasure
Bases: Measure
The CompositeMeasure class contains a sequence of multiple SimpleMeasure instances
PARAMETER | DESCRIPTION |
---|---|
measures | |
Source code in edsnlp/pipelines/misc/measures/measures.py
measures = list(measures)
instance-attribute
__init__(measures)
Source code in edsnlp/pipelines/misc/measures/measures.py
__iter__()
Source code in edsnlp/pipelines/misc/measures/measures.py
__getitem__(item)
Source code in edsnlp/pipelines/misc/measures/measures.py
__str__()
Source code in edsnlp/pipelines/misc/measures/measures.py
__repr__()
Source code in edsnlp/pipelines/misc/measures/measures.py
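The item-access behaviour documented for the measure classes (a single item for a simple measure, several for a composite one) can be sketched with two hypothetical toy classes. They are not the library's classes; they only mirror the `__iter__` / `__getitem__` contract described here, so that calling code can loop over either kind of measure the same way.

```python
from typing import Iterable, Iterator


class ToySimple:
    """Hypothetical single measure: iterating yields the measure itself."""

    def __init__(self, value: float, unit: str):
        self.value = value
        self.unit = unit

    def __iter__(self) -> Iterator["ToySimple"]:
        yield self

    def __getitem__(self, item: int) -> "ToySimple":
        # Only index 0 (or -1) is valid, as there is exactly one item.
        return [self][item]


class ToyComposite:
    """Hypothetical composite measure: a sequence of simple measures."""

    def __init__(self, measures: Iterable[ToySimple]):
        self.measures = list(measures)

    def __iter__(self) -> Iterator[ToySimple]:
        return iter(self.measures)

    def __getitem__(self, item: int) -> ToySimple:
        return self.measures[item]


simple = ToySimple(1.26, "cm")
composite = ToyComposite([simple, ToySimple(2.34, "mm")])
for part in composite:
    print(part.value, part.unit)
```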
Measures
Bases: BaseComponent
Matcher component to extract measures. A measure is most often composed of a number and a unit, like
1,26 cm
The unit can also be positioned in place of the decimal dot/comma:
1 cm 26
Some measures can be composite:
1,26 cm x 2,34 mm
And sometimes they are factorized:
Les trois kystes mesurent 1, 2 et 3cm. (The three cysts measure 1, 2 and 3 cm.)
The recognized measures are stored in the "measures" SpanGroup.
Each span has a Measure object stored in the "value" extension attribute.
PARAMETER | DESCRIPTION |
---|---|
nlp | The spaCy object. |
measures | The registry names of the measures to extract. |
attr | Whether to match on the text ('TEXT') or on the normalized text ('NORM'). |
ignore_excluded | Whether to exclude pollution patterns when matching in the text. |
Source code in edsnlp/pipelines/misc/measures/measures.py
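To make the simple, infix and composite forms described for this component concrete, here is a deliberately naive sketch using only the standard `re` module. The unit list, the pattern and the function name `extract_toy_measures` are all assumptions made for illustration; the component itself builds its patterns from registered measure classes, and the factorized form ("1, 2 et 3cm") is not handled here.

```python
import re
from typing import List, Tuple

# Assumption: a tiny hand-written unit list, for illustration only.
UNIT = r"(?P<unit>cm|mm|m)\b"
PATTERN = re.compile(
    r"(?P<int>\d+)(?:[.,](?P<dec>\d+))?\s*" + UNIT + r"(?:\s*(?P<infix_dec>\d+))?"
)


def extract_toy_measures(text: str) -> List[Tuple[float, str]]:
    """Return (value, unit) pairs for simple and infix measures in text."""
    results = []
    for match in PATTERN.finditer(text):
        # The decimal part comes either right after the comma/dot ("1,26 cm")
        # or after the unit ("1 cm 26").
        dec = match.group("dec") or match.group("infix_dec") or "0"
        value = float(f"{match.group('int')}.{dec}")
        results.append((value, match.group("unit")))
    return results


for sample in ["1,26 cm", "1 cm 26", "1,26 cm x 2,34 mm"]:
    print(sample, "->", extract_toy_measures(sample))
```

A composite measure simply shows up as two consecutive matches in this sketch; grouping them into one object is left out.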
regex_matcher = RegexMatcher(attr=attr, ignore_excluded=ignore_excluded)
instance-attribute
extraction_regexes = {}
instance-attribute
measures: Dict[str, Measure] = {}
instance-attribute
__init__(nlp, measures, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/measures/measures.py
set_extensions()
Source code in edsnlp/pipelines/misc/measures/measures.py
__call__(doc)
Adds measures to document's "measures" SpanGroup.
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object. |
RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object, annotated for extracted terms. |
Source code in edsnlp/pipelines/misc/measures/measures.py
disj_capture(regexes, capture=True)
Source code in edsnlp/pipelines/misc/measures/measures.py
rightmost_largest_sort_key(span)
Source code in edsnlp/pipelines/misc/measures/measures.py
make_patterns(measure)
Build recognition and extraction patterns for a given Measure class
PARAMETER | DESCRIPTION |
---|---|
measure | The measure to build recognition and extraction patterns for. |
RETURNS | DESCRIPTION |
---|---|
trigger | |
extraction | |
Source code in edsnlp/pipelines/misc/measures/measures.py
make_simple_getter(name)
Source code in edsnlp/pipelines/misc/measures/measures.py
make_multi_getter(name)
Source code in edsnlp/pipelines/misc/measures/measures.py
reason
factory
DEFAULT_CONFIG = dict(reasons=None, attr='TEXT', use_sections=False, ignore_excluded=False)
module-attribute
create_component(nlp, name, reasons, attr, use_sections, ignore_excluded)
Source code in edsnlp/pipelines/misc/reason/factory.py
reason
Reason
Bases: GenericMatcher
Pipeline to identify the reason for the hospitalisation.
It declares a Span extension called ents_reason and adds the key reasons to doc.spans.
It also declares the boolean extension is_reason. This extension is set to True for the Reason Spans but also for the entities that overlap the reason span.
PARAMETER | DESCRIPTION |
---|---|
nlp | spaCy nlp pipeline to use for matching. |
reasons | The terminology of reasons. |
attr | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. We can also add a key for each regex. |
use_sections | Whether or not to use the sections pipeline. |
ignore_excluded | Whether to skip excluded tokens. |
Source code in edsnlp/pipelines/misc/reason/reason.py
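The is_reason behaviour described for this component (True for the reason span itself and for any entity overlapping it) can be sketched with plain offsets. ToySpan and flag_reason are hypothetical names, not spaCy or edsnlp objects, and treating the returned list as analogous to ents_reason is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySpan:
    """Hypothetical span with token offsets; `end` is exclusive."""

    start: int
    end: int
    label: str
    is_reason: bool = False


def flag_reason(reason: ToySpan, entities: List[ToySpan]) -> List[ToySpan]:
    """Mark the reason span and every entity overlapping it."""
    reason.is_reason = True
    overlapping = []
    for ent in entities:
        # Two half-open intervals overlap when each starts before the other ends.
        if ent.start < reason.end and reason.start < ent.end:
            ent.is_reason = True
            overlapping.append(ent)
    return overlapping


reason = ToySpan(0, 10, "reasons")
entities = [ToySpan(3, 5, "entity_a"), ToySpan(12, 14, "entity_b")]
linked = flag_reason(reason, entities)
print([ent.label for ent in linked])
```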
use_sections = use_sections and ('eds.sections' in self.nlp.pipe_names or 'sections' in self.nlp.pipe_names)
instance-attribute
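In Python, `and` binds more tightly than `or`, so the grouping of an expression like the one above changes its result when use_sections is False. The sketch below uses two hypothetical helper functions (not part of the library) to show the difference; the assumption is that the sections check should only matter when use_sections was requested.

```python
from typing import List


def flag_ungrouped(use_sections: bool, pipe_names: List[str]) -> bool:
    # Parsed as: (use_sections and A) or B
    return use_sections and "eds.sections" in pipe_names or "sections" in pipe_names


def flag_grouped(use_sections: bool, pipe_names: List[str]) -> bool:
    # Parsed as: use_sections and (A or B)
    return use_sections and (
        "eds.sections" in pipe_names or "sections" in pipe_names
    )


pipes = ["sections"]
print(flag_ungrouped(False, pipes), flag_grouped(False, pipes))
```

With the ungrouped form, a pipeline containing a component named "sections" turns the flag on even when the caller passed use_sections=False.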
__init__(nlp, reasons, attr, use_sections, ignore_excluded)
Source code in edsnlp/pipelines/misc/reason/reason.py
set_extensions()
Source code in edsnlp/pipelines/misc/reason/reason.py
_enhance_with_sections(sections, reasons)
Enhance the list of reasons with the section information. If a reason overlaps with a history section, it is removed from the list.
PARAMETER | DESCRIPTION |
---|---|
sections | Spans of sections identified with the sections pipeline. |
reasons | List of reasons identified by the regex. |
RETURNS | DESCRIPTION |
---|---|
List | Updated list of reason spans. |
Source code in edsnlp/pipelines/misc/reason/reason.py
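The filtering step described here can be sketched with plain (start, end) offsets. The function name drop_reasons_in_history and the tuple representation are hypothetical, chosen only to show the overlap test; the method itself works on spaCy spans and may apply further rules.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) offsets, with `end` exclusive.
Offsets = Tuple[int, int]


def overlaps(a: Offsets, b: Offsets) -> bool:
    """True when the two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def drop_reasons_in_history(
    history_sections: List[Offsets], reasons: List[Offsets]
) -> List[Offsets]:
    """Keep only the reasons that do not overlap any history section."""
    return [
        reason
        for reason in reasons
        if not any(overlaps(reason, section) for section in history_sections)
    ]


kept = drop_reasons_in_history([(10, 20)], [(0, 5), (12, 15), (18, 25)])
print(kept)
```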
__call__(doc)
Find spans related to the reasons of the hospitalisation
PARAMETER | DESCRIPTION |
---|---|
doc | |
RETURNS | DESCRIPTION |
---|---|
Doc | |
Source code in edsnlp/pipelines/misc/reason/reason.py