edsnlp.pipelines.misc.dates
dates
eds.dates pipeline.
PERIOD_PROXIMITY_THRESHOLD = 3
module-attribute
Dates
Bases: BaseComponent
Tags and normalizes dates, using the open-source dateparser library.
The pipeline uses spaCy's filter_spans function. It filters out false positives and introduces a hierarchy between patterns. For instance, in case of ambiguity, the pipeline will decide that a date is a date without a year rather than a date without a day.
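To make the hierarchy concrete, here is a minimal, library-free sketch of the greedy rule spaCy's filter_spans applies: longer spans win, and between spans of equal length the earlier one wins. Spans are modelled as hypothetical (start, end, label) token-offset tuples instead of spaCy Span objects, and the way false positives are handled (they compete in the filtering, then are discarded) is an assumption made for illustration.

```python
from typing import List, Tuple

# Hypothetical span representation: (start_token, end_token, label), end exclusive.
SpanTuple = Tuple[int, int, str]


def filter_spans(spans: List[SpanTuple]) -> List[SpanTuple]:
    """Keep non-overlapping spans, preferring longer ones, then earlier ones."""
    # Sort by length (descending), then by start position (ascending).
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[SpanTuple] = []
    seen = set()
    for start, end, label in ordered:
        tokens = set(range(start, end))
        if tokens & seen:
            continue  # overlaps a span that was already kept
        kept.append((start, end, label))
        seen |= tokens
    return sorted(kept, key=lambda s: s[0])


def select_dates(matches: List[SpanTuple]) -> List[SpanTuple]:
    """Filter overlapping matches, then drop those labelled as false positives."""
    return [s for s in filter_spans(matches) if s[2] != "false_positive"]


matches = [
    (0, 3, "absolute"),         # e.g. a full date such as "12 mars 2021"
    (1, 3, "absolute"),         # a shorter, overlapping reading
    (5, 10, "false_positive"),  # e.g. a phone number
    (6, 8, "absolute"),         # a spurious date inside the phone number
]
print(select_dates(matches))  # [(0, 3, 'absolute')]
```

The false-positive match is longer than the spurious date it contains, so it wins the overlap and takes the spurious date down with it when it is discarded.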
PARAMETER | DESCRIPTION |
---|---|
nlp | Language pipeline object |
absolute | List of regular expressions for absolute dates. |
relative | List of regular expressions for relative dates (e.g. …). |
duration | List of regular expressions for durations (e.g. …). |
false_positive | List of regular expressions for false positives (e.g. phone numbers). |
on_ents_only | Whether to look for dates in the whole document or only in specific sentences: … |
detect_periods | Whether to detect periods (experimental). |
attr | spaCy attribute to use. |
Source code in edsnlp/pipelines/misc/dates/dates.py
nlp = nlp
instance-attribute
on_ents_only = on_ents_only
instance-attribute
regex_matcher = RegexMatcher(attr=attr, alignment_mode='strict')
instance-attribute
detect_periods = detect_periods
instance-attribute
__init__(nlp, absolute, relative, duration, false_positive, on_ents_only, detect_periods, attr)
Source code in edsnlp/pipelines/misc/dates/dates.py
set_extensions()
Set extensions for the dates pipeline.
Source code in edsnlp/pipelines/misc/dates/dates.py
process(doc)
Find dates in doc.
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object |
RETURNS | DESCRIPTION |
---|---|
dates | List of date spans |
Source code in edsnlp/pipelines/misc/dates/dates.py
parse(dates)
Parse dates using the groupdict returned by the matcher.
PARAMETER | DESCRIPTION |
---|---|
dates | List of tuples containing the spans and groupdict returned by the matcher. |
RETURNS | DESCRIPTION |
---|---|
List[Span] | List of processed spans, with the date parsed. |
Source code in edsnlp/pipelines/misc/dates/dates.py
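parse works from the groupdict of each regex match. The sketch below shows the general idea with the standard re module: a hypothetical absolute-date pattern with named groups day, month and year, whose groupdict() is converted to integers, with unmatched groups left as None. The pattern, the ParsedDate dataclass and the helper names are illustrative stand-ins, not the patterns or models shipped with the pipeline.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical pattern: "12/03/2021" or "12/03" (the year is optional).
ABSOLUTE = re.compile(
    r"(?P<day>\d{1,2})[/.-](?P<month>\d{1,2})(?:[/.-](?P<year>\d{4}))?"
)


@dataclass
class ParsedDate:
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None


def parse_groupdict(groups: Dict[str, Optional[str]]) -> ParsedDate:
    """Convert the matched strings to ints, keeping unmatched groups as None."""
    values = {k: int(v) for k, v in groups.items() if v is not None}
    return ParsedDate(**values)


def find_dates(text: str) -> List[ParsedDate]:
    """Run the pattern over the text and parse every match."""
    return [parse_groupdict(m.groupdict()) for m in ABSOLUTE.finditer(text)]


print(find_dates("Vu le 12/03/2021, revu le 02/04."))
# [ParsedDate(year=2021, month=3, day=12), ParsedDate(year=None, month=4, day=2)]
```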
process_periods(dates)
Experimental period detection.
PARAMETER | DESCRIPTION |
---|---|
dates | List of detected dates. |
RETURNS | DESCRIPTION |
---|---|
List[Span] | List of detected periods. |
Source code in edsnlp/pipelines/misc/dates/dates.py
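Period detection is experimental, and this page only exposes PERIOD_PROXIMITY_THRESHOLD = 3 and the Period model with its FROM, UNTIL and DURATION slots. One plausible reading is that nearby dates with complementary modes are grouped into a single Period. The sketch below assumes the threshold is a distance in tokens and uses hypothetical (start, end, mode) tuples; it illustrates the idea and is not the pipeline's algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PERIOD_PROXIMITY_THRESHOLD = 3  # assumed to be measured in tokens

# Hypothetical date representation: (start_token, end_token, mode).
DateSpan = Tuple[int, int, str]


@dataclass
class Period:
    FROM: Optional[DateSpan] = None
    UNTIL: Optional[DateSpan] = None
    DURATION: Optional[DateSpan] = None


def group_periods(dates: List[DateSpan]) -> List[Period]:
    """Greedily merge neighbouring dates with distinct modes into periods."""
    periods: List[Period] = []
    current: Optional[Period] = None
    last_end: Optional[int] = None
    for span in sorted(dates):
        start, end, mode = span
        close = last_end is not None and start - last_end <= PERIOD_PROXIMITY_THRESHOLD
        if current is not None and close and getattr(current, mode) is None:
            setattr(current, mode, span)  # fill a free slot of the open period
        else:
            if current is not None:
                periods.append(current)
            current = Period(**{mode: span})  # start a new candidate period
        last_end = end
    if current is not None:
        periods.append(current)
    # A lone date is not a period: keep only groups with at least two slots filled.
    return [p for p in periods if sum(x is not None for x in vars(p).values()) >= 2]


dates = [(1, 3, "FROM"), (4, 6, "UNTIL"), (20, 22, "FROM")]
print(group_periods(dates))
# [Period(FROM=(1, 3, 'FROM'), UNTIL=(4, 6, 'UNTIL'), DURATION=None)]
```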
__call__(doc)
Tags dates.
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object |
RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object, annotated for dates |
Source code in edsnlp/pipelines/misc/dates/dates.py
models
Direction
Bases: Enum
Source code in edsnlp/pipelines/misc/dates/models.py
FUTURE = 'FUTURE'
class-attribute
PAST = 'PAST'
class-attribute
CURRENT = 'CURRENT'
class-attribute
Mode
Bases: Enum
Source code in edsnlp/pipelines/misc/dates/models.py
FROM = 'FROM'
class-attribute
UNTIL = 'UNTIL'
class-attribute
DURATION = 'DURATION'
class-attribute
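Direction and Mode are string-valued enums whose values equal their names, so a string captured by a regex (for instance 'PAST') can be turned into a member by calling the enum with it. The sketch below restates both enums with the standard enum module; the coerce helper is hypothetical and only illustrates the string-to-enum step that pydantic performs when a BaseModel field is annotated with an enum type.

```python
from enum import Enum
from typing import Optional


class Direction(Enum):
    FUTURE = "FUTURE"
    PAST = "PAST"
    CURRENT = "CURRENT"


class Mode(Enum):
    FROM = "FROM"
    UNTIL = "UNTIL"
    DURATION = "DURATION"


def coerce(value: Optional[str], enum_cls):
    """Hypothetical helper: map a matched string to an enum member (None passes through)."""
    if value is None:
        return None
    return enum_cls(value.upper())  # raises ValueError on unknown values


print(coerce("past", Direction))  # Direction.PAST
print(coerce("DURATION", Mode))   # Mode.DURATION
```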
Period
Bases: BaseModel
Source code in edsnlp/pipelines/misc/dates/models.py
FROM: Optional[Span] = None
class-attribute
UNTIL: Optional[Span] = None
class-attribute
DURATION: Optional[Span] = None
class-attribute
Config
Source code in edsnlp/pipelines/misc/dates/models.py
arbitrary_types_allowed = True
class-attribute
BaseDate
Bases: BaseModel
Source code in edsnlp/pipelines/misc/dates/models.py
mode: Optional[Mode] = None
class-attribute
validate_strings(d)
Source code in edsnlp/pipelines/misc/dates/models.py
AbsoluteDate
Bases: BaseDate
Source code in edsnlp/pipelines/misc/dates/models.py
year: Optional[int] = None
class-attribute
month: Optional[int] = None
class-attribute
day: Optional[int] = None
class-attribute
hour: Optional[int] = None
class-attribute
minute: Optional[int] = None
class-attribute
second: Optional[int] = None
class-attribute
to_datetime(tz='Europe/Paris', **kwargs)
Source code in edsnlp/pipelines/misc/dates/models.py
norm()
Source code in edsnlp/pipelines/misc/dates/models.py
validate_year(v)
Source code in edsnlp/pipelines/misc/dates/models.py
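AbsoluteDate stores every component as an optional integer, so a date can be partial (a day and a month without a year, for instance). The stdlib sketch below shows two natural operations on such a model: building a datetime only when year, month and day are all known (returning None otherwise, including for impossible dates such as 31/02), and rendering a normalized string where missing components appear as "?". The None behaviour, the "????-MM-DD" format and the AbsoluteDateSketch name are assumptions for illustration; unlike the documented signature (tz='Europe/Paris'), tz defaults to None here so the example does not depend on the system time-zone database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo


@dataclass
class AbsoluteDateSketch:
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None
    minute: Optional[int] = None
    second: Optional[int] = None

    def to_datetime(self, tz: Optional[str] = None) -> Optional[datetime]:
        # Assumption: a partial date cannot be turned into a datetime.
        if self.year is None or self.month is None or self.day is None:
            return None
        tzinfo = ZoneInfo(tz) if tz else None  # e.g. tz="Europe/Paris"
        try:
            return datetime(
                self.year, self.month, self.day,
                self.hour or 0, self.minute or 0, self.second or 0,
                tzinfo=tzinfo,
            )
        except ValueError:  # e.g. February 31st
            return None

    def norm(self) -> str:
        # Hypothetical format: missing components are rendered as "?".
        year = f"{self.year:04d}" if self.year is not None else "????"
        month = f"{self.month:02d}" if self.month is not None else "??"
        day = f"{self.day:02d}" if self.day is not None else "??"
        return f"{year}-{month}-{day}"


print(AbsoluteDateSketch(year=2021, month=3, day=12).to_datetime())  # 2021-03-12 00:00:00
print(AbsoluteDateSketch(month=3, day=12).norm())                    # ????-03-12
```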
Relative
Bases: BaseDate
Source code in edsnlp/pipelines/misc/dates/models.py
year: Optional[int] = None
class-attribute
month: Optional[int] = None
class-attribute
week: Optional[int] = None
class-attribute
day: Optional[int] = None
class-attribute
hour: Optional[int] = None
class-attribute
minute: Optional[int] = None
class-attribute
second: Optional[int] = None
class-attribute
parse_unit(d)
Units need to be handled separately.
This validator stores the detected value under the key corresponding to the detected unit.
PARAMETER | DESCRIPTION |
---|---|
d | Original data |
RETURNS | DESCRIPTION |
---|---|
Dict[str, str] | Transformed data |
Source code in edsnlp/pipelines/misc/dates/models.py
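parse_unit rewrites the validator input so that the detected unit becomes the key holding the value. The sketch below performs that kind of rewrite on a plain dictionary. The input keys number and unit, and the French unit spellings in UNITS, are hypothetical: the group names actually produced by the relative-date patterns are not documented on this page.

```python
from typing import Dict, Optional

# Hypothetical mapping from French unit spellings to model field names.
UNITS = {
    "an": "year", "ans": "year", "année": "year", "années": "year",
    "mois": "month",
    "semaine": "week", "semaines": "week",
    "jour": "day", "jours": "day",
    "heure": "hour", "heures": "hour",
}


def parse_unit(d: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Move the matched number under the key named after the detected unit."""
    result = dict(d)
    unit = result.pop("unit", None)
    number = result.pop("number", None)
    if unit is not None and number is not None:
        result[UNITS[unit.lower()]] = number
    return result


print(parse_unit({"direction": "PAST", "number": "3", "unit": "semaines"}))
# {'direction': 'PAST', 'week': '3'}
```

Values stay strings, matching the documented Dict[str, str] return type; the model's integer fields would then handle the conversion.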
to_datetime(**kwargs)
Source code in edsnlp/pipelines/misc/dates/models.py
RelativeDate
Bases: Relative
Source code in edsnlp/pipelines/misc/dates/models.py
direction: Direction = Direction.CURRENT
class-attribute
to_datetime(note_datetime=None)
Source code in edsnlp/pipelines/misc/dates/models.py
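A relative date only becomes a calendar date once it is anchored to a reference, which is why RelativeDate.to_datetime takes a note_datetime argument. The sketch below anchors an offset with the standard library and signs it according to Direction. It is limited to the units datetime.timedelta supports (weeks, days, hours, minutes, seconds), since years and months need calendar-aware arithmetic; the relative_to_datetime name and the choice to return None without a note_datetime are assumptions.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Direction(Enum):
    FUTURE = "FUTURE"
    PAST = "PAST"
    CURRENT = "CURRENT"


def relative_to_datetime(
    note_datetime: Optional[datetime],
    direction: Direction = Direction.CURRENT,
    week: int = 0,
    day: int = 0,
    hour: int = 0,
    minute: int = 0,
    second: int = 0,
) -> Optional[datetime]:
    """Anchor a relative offset to the note's datetime (timedelta units only)."""
    if note_datetime is None:
        return None  # assumption: no anchor, no absolute date
    delta = timedelta(weeks=week, days=day, hours=hour, minutes=minute, seconds=second)
    if direction is Direction.PAST:
        return note_datetime - delta
    if direction is Direction.FUTURE:
        return note_datetime + delta
    return note_datetime  # CURRENT (e.g. "aujourd'hui"): the offset is ignored


note = datetime(2021, 3, 12, 10, 0)
print(relative_to_datetime(note, Direction.PAST, day=2))  # 2021-03-10 10:00:00
```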
norm()
Source code in edsnlp/pipelines/misc/dates/models.py
handle_specifics(d)
Specific patterns such as aujourd'hui, hier, etc. need to be handled separately.
PARAMETER | DESCRIPTION |
---|---|
d | Original data. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, str] | Modified data. |
Source code in edsnlp/pipelines/misc/dates/models.py
Duration
Bases: Relative
Source code in edsnlp/pipelines/misc/dates/models.py
mode: Mode = Mode.DURATION
class-attribute
norm()
Source code in edsnlp/pipelines/misc/dates/models.py
factory
DEFAULT_CONFIG = dict(absolute=None, relative=None, duration=None, false_positive=None, detect_periods=False, on_ents_only=False, attr='LOWER')
module-attribute
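DEFAULT_CONFIG lists the factory defaults. User settings are usually layered on top of such defaults; the sketch below does this with a plain dictionary merge and rejects unknown keys so that a typo such as detect_period fails loudly. The make_config helper is hypothetical and not part of the factory.

```python
from typing import Any, Dict

DEFAULT_CONFIG = dict(
    absolute=None,
    relative=None,
    duration=None,
    false_positive=None,
    detect_periods=False,
    on_ents_only=False,
    attr="LOWER",
)


def make_config(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: layer user overrides on top of DEFAULT_CONFIG."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"Unknown configuration keys: {sorted(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}


config = make_config(detect_periods=True, attr="NORM")
print(config["detect_periods"], config["attr"], config["on_ents_only"])  # True NORM False
```

In a spaCy pipeline, overrides like these would be passed through nlp.add_pipe("eds.dates", config=...), and spaCy fills in the remaining defaults itself.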
create_component(nlp, name, absolute, relative, duration, false_positive, on_ents_only, detect_periods, attr)
Source code in edsnlp/pipelines/misc/dates/factory.py
patterns
duration
cue_pattern = '(pendant|durant|pdt)'
module-attribute
duration_pattern = [cue_pattern + '.{,3}' + numbers.number_pattern + '\\s*' + units.unit_pattern]
module-attribute
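duration_pattern concatenates a cue (pendant, durant, pdt), up to three arbitrary characters, a number and a unit. numbers.number_pattern and units.unit_pattern are defined elsewhere and not shown here, so the sketch below substitutes simplified, hypothetical versions of both (with word boundaries so that the lax ".{,3}" cannot swallow the first digit of "10") and matches on lowercased text, in the spirit of the default attr='LOWER'.

```python
import re

cue_pattern = "(pendant|durant|pdt)"  # as in the source

# Hypothetical stand-ins for numbers.number_pattern and units.unit_pattern.
number_pattern = r"\b(?P<number>\d+)"
unit_pattern = r"(?P<unit>ans?|mois|semaines?|jours?|heures?)\b"

duration_pattern = [cue_pattern + ".{,3}" + number_pattern + r"\s*" + unit_pattern]

regex = re.compile(duration_pattern[0])


def find_durations(text: str):
    """Return (number, unit) pairs found in the lowercased text."""
    return [(m["number"], m["unit"]) for m in regex.finditer(text.lower())]


print(find_durations("Traitement Pendant 3 semaines, puis pdt 10 jours."))
# [('3', 'semaines'), ('10', 'jours')]
```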
relative
specific = {'minus1': ('hier', dict(direction='PAST', day=1)), 'minus2': ('avant[-\\s]hier', dict(direction='PAST', day=2)), 'plus1': ('demain', dict(direction='FUTURE', day=1)), 'plus2': ('après[-\\s]demain', dict(direction='FUTURE', day=2))}
module-attribute
specific_pattern = make_pattern(['(?P<specific_{k}>{p})' for (k, (p, _)) in specific.items()])
module-attribute
specific_dict = {k: v for (k, (_, v)) in specific.items()}
module-attribute
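The specific table maps each hard-coded expression to the fields it stands for, and specific_pattern wraps each expression in a named group specific_<key>, so the name of the group that matched identifies the entry; handle_specifics then has to turn that back into the corresponding values. The sketch below reuses the specific table from the source. make_pattern is not shown on this page, so it is replaced by a hypothetical helper that joins the alternatives with "|", and resolve is a hypothetical stand-in for the lookup, not the model's validator.

```python
import re
from typing import Dict, Optional

specific = {
    "minus1": ("hier", dict(direction="PAST", day=1)),
    "minus2": ("avant[-\\s]hier", dict(direction="PAST", day=2)),
    "plus1": ("demain", dict(direction="FUTURE", day=1)),
    "plus2": ("après[-\\s]demain", dict(direction="FUTURE", day=2)),
}


def make_pattern(patterns):
    """Hypothetical stand-in: join alternatives into a single group."""
    return "(" + "|".join(patterns) + ")"


specific_pattern = make_pattern(
    [f"(?P<specific_{k}>{p})" for k, (p, _) in specific.items()]
)
specific_dict = {k: v for k, (_, v) in specific.items()}


def resolve(text: str) -> Optional[Dict]:
    """Return the fields of the first specific expression found in text."""
    match = re.search(specific_pattern, text.lower())
    if match is None:
        return None
    for name, value in match.groupdict().items():
        if value is not None:
            return specific_dict[name[len("specific_"):]]
    return None


print(resolve("Patient vu avant-hier"))  # {'direction': 'PAST', 'day': 2}
```

Because re.search returns the leftmost match, "avant-hier" is recognised as a whole at its first character before the shorter "hier" alternative could match later in the string.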
relative_pattern = ['(?<=' + mode_pattern + '.{,3})?' + p for p in relative_pattern]
module-attribute
make_specific_pattern(mode='forward')
Source code in edsnlp/pipelines/misc/dates/patterns/relative.py