edsnlp
EDS-NLP
__version__ = '0.5.0'
module-attribute
BASE_DIR = Path(__file__).parent
module-attribute
conjugator
conjugate_verb(verb, conjugator)
Conjugates the verb using an instance of mlconjug3, and formats the results in a pandas DataFrame.
| PARAMETER | DESCRIPTION |
|---|---|
| `verb` | Verb to conjugate. |
| `conjugator` | mlconjug3 instance for conjugating. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | Normalized dataframe containing all conjugated forms for the verb. |
Source code in edsnlp/conjugator.py
conjugate(verbs, language='fr')
Conjugate a list of verbs.
| PARAMETER | DESCRIPTION |
|---|---|
| `verbs` | List of verbs to conjugate. |
| `language` | Language to conjugate. Defaults to French (`"fr"`). |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | Dataframe containing the conjugations for the provided verbs. Columns: |
Source code in edsnlp/conjugator.py
get_conjugated_verbs(verbs, matches, language='fr')
Get a list of conjugated verbs.
| PARAMETER | DESCRIPTION |
|---|---|
| `verbs` | List of verbs to conjugate. |
| `matches` | List of dictionaries describing the mode/tense/persons to keep. |
| `language` | Language to conjugate, by default `"fr"` (French). |

| RETURNS | DESCRIPTION |
|---|---|
| `List[str]` | List of terms to look for. |
Examples:
>>> get_conjugated_verbs(
"aimer",
dict(mode="Indicatif", tense="Présent", person="1p"),
)
['aimons']
Source code in edsnlp/conjugator.py
language
__all__ = ['EDSLanguage']
module-attribute
EDSDefaults
Bases: FrenchDefaults
Defaults for the EDSLanguage class. Mostly identical to the FrenchDefaults, but without tokenization info.
Source code in edsnlp/language.py
tokenizer_exceptions = {}
class-attribute
infixes = []
class-attribute
lex_attr_getters = LEX_ATTRS
class-attribute
syntax_iterators = SYNTAX_ITERATORS
class-attribute
stop_words = STOP_WORDS
class-attribute
config = FrenchDefaults.config.merge({'nlp': {'tokenizer': {'@tokenizers': 'eds.tokenizer'}}})
class-attribute
EDSLanguage
Bases: French
French clinical language.
It is shipped with the EDSTokenizer, a tokenizer that better handles tokenization for French clinical documents.
Source code in edsnlp/language.py
lang = 'eds'
class-attribute
Defaults = EDSDefaults
class-attribute
default_config = Defaults
class-attribute
EDSTokenizer
Bases: DummyTokenizer
Source code in edsnlp/language.py
vocab = vocab
instance-attribute
word_regex = regex.compile('({num_like}|[{punct}]|\\n|[ ]+|{default})([ ])?')
instance-attribute
__init__(vocab)
Tokenizer class for French clinical documents.
It better handles tokenization around:
- numbers: `"ACR5"` -> `["ACR", "5"]` instead of `["ACR5"]`
- newlines: `"\n\n\n"` -> `["\n", "\n", "\n"]` instead of `["\n\n\n"]`

and should be around 5-6 times faster than its standard French counterpart.
Parameters
----------
vocab: Vocab
    The spaCy vocabulary
Source code in edsnlp/language.py
__call__(text)
Tokenizes the text using the EDSTokenizer.

| PARAMETER | DESCRIPTION |
|---|---|
| `text` | Text to tokenize. |

| RETURNS | DESCRIPTION |
|---|---|
| `Doc` | The tokenized document. |
Source code in edsnlp/language.py
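To illustrate, a minimal usage sketch (assuming EDS-NLP is installed, so that importing `edsnlp` registers the `eds` language with spaCy):

```python
import spacy
import edsnlp  # noqa: F401 -- importing edsnlp registers the "eds" language

# A blank "eds" pipeline uses the EDSTokenizer described above.
nlp = spacy.blank("eds")

doc = nlp("Patient ACR5, rendez-vous le 12/10/2021.")
# Numbers glued to letters ("ACR5") are split into separate tokens.
print([token.text for token in doc])
```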
create_eds_tokenizer()
Creates a factory that returns new EDSTokenizer instances.

| RETURNS | DESCRIPTION |
|---|---|
| `EDSTokenizer` | A new EDSTokenizer instance. |
Source code in edsnlp/language.py
extensions
components
pipelines
base
BaseComponent
Bases: object
The BaseComponent adds a set_extensions method, called at the creation of the object.
It helps decouple the initialisation of the pipeline from the creation of extensions, and is particularly useful when distributing EDS-NLP on a cluster, since the serialisation mechanism imposes that the extensions be reset.
Source code in edsnlp/pipelines/base.py
__init__(*args, **kwargs)
Source code in edsnlp/pipelines/base.py
set_extensions()
Set `Doc`, `Span` and `Token` extensions.
Source code in edsnlp/pipelines/base.py
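As an illustrative sketch, a subclass would typically declare its extensions idempotently (the component and the `my_flag` extension here are hypothetical):

```python
from spacy.tokens import Doc

from edsnlp.pipelines.base import BaseComponent


class MyComponent(BaseComponent):
    @staticmethod
    def set_extensions() -> None:
        # Declare the extension only if it is missing, so the method can
        # safely be called again after deserialisation on a cluster.
        if not Doc.has_extension("my_flag"):
            Doc.set_extension("my_flag", default=False)
```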
_boundaries(doc, terminations=None)
Create sub-sentences based on the sentences and terminations found in the text.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object |
| `terminations` | List of tuples with (match_id, start, end) |

| RETURNS | DESCRIPTION |
|---|---|
| `boundaries` | List of tuples with (start, end) of spans |
Source code in edsnlp/pipelines/base.py
terminations
termination: List[str] = ['et', 'bien que', 'même si', 'mais', 'or', 'alors que', 'sauf', 'cependant', 'pourtant', 'cause de', 'source de', 'hormis', 'car', 'parce que', 'pourtant', 'puisque', 'ni', 'en raison de', 'qui', 'que', 'ainsi que', 'avec', 'toutefois', 'en dehors', 'dans le cadre', 'du fait', '.', ',', ';', '...', '…', '(', ')', '"']
module-attribute
factories
ner
scores
base_score
Score
Bases: AdvancedRegex
Matcher component to extract a numeric score
| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | The spaCy object. |
| `score_name` | The name of the extracted score. |
| `regex` | A list of regexes to identify the score. |
| `attr` | Whether to match on the text ('TEXT') or on the normalized text ('NORM'). |
| `after_extract` | Regex with capturing group to get the score value. |
| `score_normalization` | Function that takes the "raw" extracted value and returns the normalised score. |
| `window` | Number of tokens to include after the score's mention to find the score's value. |
Source code in edsnlp/pipelines/ner/scores/base_score.py
score_name = score_name
instance-attribute
score_normalization = registry.get('misc', score_normalization)
instance-attribute
__init__(nlp, score_name, regex, attr, after_extract, score_normalization, window, verbose, ignore_excluded)
Source code in edsnlp/pipelines/ner/scores/base_score.py
set_extensions()
Source code in edsnlp/pipelines/ner/scores/base_score.py
__call__(doc)
Adds spans to document.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object |

| RETURNS | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object, annotated for extracted terms. |
Source code in edsnlp/pipelines/ner/scores/base_score.py
score_filtering(ents)
Extracts, if available, the value of the score.
Normalizes the score via the provided `self.score_normalization` method.

| PARAMETER | DESCRIPTION |
|---|---|
| `ents` | List of spaCy's spans extracted by the score matcher |

| RETURNS | DESCRIPTION |
|---|---|
| `ents` | List of spaCy's spans, with the normalised score added when found. |
Source code in edsnlp/pipelines/ner/scores/base_score.py
factory
DEFAULT_CONFIG = dict(attr='NORM', window=7, verbose=0, ignore_excluded=False)
module-attribute
create_component(nlp, name, score_name, regex, after_extract, score_normalization, attr, window, verbose, ignore_excluded)
Source code in edsnlp/pipelines/ner/scores/factory.py
emergency
gemsa
factory
DEFAULT_CONFIG = dict(regex=patterns.regex, after_extract=patterns.after_extract, score_normalization=patterns.score_normalization_str, attr='NORM', window=20, verbose=0, ignore_excluded=False)
module-attribute
create_component(nlp, name, regex, after_extract, score_normalization, attr, window, verbose, ignore_excluded)
Source code in edsnlp/pipelines/ner/scores/emergency/gemsa/factory.py
patterns
regex = ['\\bgemsa\\b']
module-attribute
after_extract = 'gemsa.*?[\\n\\W]*?(\\d+)'
module-attribute
score_normalization_str = 'score_normalization.gemsa'
module-attribute
score_normalization(extracted_score)
GEMSA score normalization. If available, returns the integer value of the GEMSA score.
Source code in edsnlp/pipelines/ner/scores/emergency/gemsa/patterns.py
ccmu
factory
DEFAULT_CONFIG = dict(regex=patterns.regex, after_extract=patterns.after_extract, score_normalization=patterns.score_normalization_str, attr='NORM', window=20, verbose=0, ignore_excluded=False)
module-attribute
create_component(nlp, name, regex, after_extract, score_normalization, attr, window, verbose, ignore_excluded)
Source code in edsnlp/pipelines/ner/scores/emergency/ccmu/factory.py
patterns
regex = ['\\bccmu\\b']
module-attribute
after_extract = 'ccmu.*?[\\n\\W]*?(\\d+)'
module-attribute
score_normalization_str = 'score_normalization.ccmu'
module-attribute
score_normalization(extracted_score)
CCMU score normalization. If available, returns the integer value of the CCMU score.
Source code in edsnlp/pipelines/ner/scores/emergency/ccmu/patterns.py
priority
factory
DEFAULT_CONFIG = dict(regex=patterns.regex, after_extract=patterns.after_extract, score_normalization=patterns.score_normalization_str, attr='NORM', window=7, verbose=0, ignore_excluded=False)
module-attribute
create_component(nlp, name, regex, after_extract, score_normalization, attr, window, verbose, ignore_excluded)
Source code in edsnlp/pipelines/ner/scores/emergency/priority/factory.py
patterns
regex = ['\\bpriorite\\b']
module-attribute
after_extract = 'priorite.*?[\\n\\W]*?(\\d+)'
module-attribute
score_normalization_str = 'score_normalization.priority'
module-attribute
score_normalization(extracted_score)
Priority score normalization. If available, returns the integer value of the priority score.
Source code in edsnlp/pipelines/ner/scores/emergency/priority/patterns.py
charlson
factory
DEFAULT_CONFIG = dict(regex=patterns.regex, after_extract=patterns.after_extract, score_normalization=patterns.score_normalization_str, attr='NORM', window=7, verbose=0, ignore_excluded=False)
module-attribute
create_component(nlp, name, regex, after_extract, score_normalization, attr, window, verbose, ignore_excluded)
Source code in edsnlp/pipelines/ner/scores/charlson/factory.py
patterns
regex = ['charlson']
module-attribute
after_extract = 'charlson.*?[\\n\\W]*?(\\d+)'
module-attribute
score_normalization_str = 'score_normalization.charlson'
module-attribute
score_normalization(extracted_score)
Charlson score normalization. If available, returns the integer value of the Charlson score.
Source code in edsnlp/pipelines/ner/scores/charlson/patterns.py
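As a usage sketch (pipe names assumed from the factories above; scores match on the `NORM` attribute by default, hence the normaliser):

```python
import spacy
import edsnlp  # noqa: F401 -- registers the eds.* factories

nlp = spacy.blank("eds")
nlp.add_pipe("eds.normalizer")  # the score matches on NORM by default
nlp.add_pipe("eds.charlson")    # uses the regex/after_extract patterns above

doc = nlp("Patient avec un score de Charlson à 8.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```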
sofa
factory
DEFAULT_CONFIG = dict(regex=patterns.regex, method_regex=patterns.method_regex, value_regex=patterns.value_regex, score_normalization=patterns.score_normalization_str, attr='NORM', window=20, verbose=0, ignore_excluded=False)
module-attribute
create_component(nlp, name, regex, method_regex, value_regex, score_normalization, attr, window, verbose, ignore_excluded)
Source code in edsnlp/pipelines/ner/scores/sofa/factory.py
patterns
regex = ['\\bsofa\\b']
module-attribute
method_regex = 'sofa.*?((?P<max>max\\w*)|(?P<vqheures>24h\\w*)|(?P<admission>admission\\w*))(?P<after_value>(.|\\n)*)'
module-attribute
value_regex = '.*?.[\\n\\W]*?(\\d+)[^h\\d]'
module-attribute
score_normalization_str = 'score_normalization.sofa'
module-attribute
score_normalization(extracted_score)
Sofa score normalization. If available, returns the integer value of the SOFA score.
Source code in edsnlp/pipelines/ner/scores/sofa/patterns.py
sofa
Sofa
Bases: Score
Matcher component to extract the SOFA score
| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | The spaCy object. |
| `score_name` | The name of the extracted score. |
| `regex` | A list of regexes to identify the SOFA score. |
| `attr` | Whether to match on the text ('TEXT') or on the normalized text ('CUSTOM_NORM'). |
| `method_regex` | Regex with capturing group to get the score extraction method (e.g. "à l'admission", "à 24H", "Maximum"). |
| `value_regex` | Regex to extract the score value. |
| `score_normalization` | Function that takes the "raw" extracted value and returns the normalised score. |
| `window` | Number of tokens to include after the score's mention to find the score's value. |
Source code in edsnlp/pipelines/ner/scores/sofa/sofa.py
method_regex = method_regex
instance-attribute
value_regex = value_regex
instance-attribute
__init__(nlp, score_name, regex, attr, method_regex, value_regex, score_normalization, window, verbose, ignore_excluded)
Source code in edsnlp/pipelines/ner/scores/sofa/sofa.py
set_extensions()
Source code in edsnlp/pipelines/ner/scores/sofa/sofa.py
score_filtering(ents)
Extracts, if available, the value of the score.
Normalizes the score via the provided `self.score_normalization` method.

| PARAMETER | DESCRIPTION |
|---|---|
| `ents` | List of spaCy's spans extracted by the score matcher |

| RETURNS | DESCRIPTION |
|---|---|
| `ents` | List of spaCy's spans, with the normalised score added when found. |
Source code in edsnlp/pipelines/ner/scores/sofa/sofa.py
covid
factory
DEFAULT_CONFIG = dict(attr='LOWER', ignore_excluded=False)
module-attribute
create_component(nlp, name, attr, ignore_excluded)
Source code in edsnlp/pipelines/ner/covid/factory.py
patterns
covid = ['covid([-\\s]?19)?', 'sars[-\\s]?cov[-\\s]?2', 'corona[-\\s]?virus']
module-attribute
diseases = ['pneumopathies?', 'infections?']
module-attribute
pattern = '(' + make_pattern(diseases) + '\\s[àa]u?\\s)?' + make_pattern(covid)
module-attribute
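A usage sketch tying these patterns together (the `eds.covid` pipe name is assumed from the factory above):

```python
import spacy
import edsnlp  # noqa: F401

nlp = spacy.blank("eds")
nlp.add_pipe("eds.covid")  # matches the `pattern` regex built above

doc = nlp("Patient admis pour pneumopathie à la covid-19.")
print([(ent.text, ent.label_) for ent in doc.ents])
```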
core
endlines
factory
create_component(nlp, name, model_path)
Source code in edsnlp/pipelines/core/endlines/factory.py
functional
_get_label(prediction)
Returns the label for the prediction `PREDICTED_END_LINE`.

| PARAMETER | DESCRIPTION |
|---|---|
| `prediction` | Value of `PREDICTED_END_LINE`. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | Label for `PREDICTED_END_LINE`. |
Source code in edsnlp/pipelines/core/endlines/functional.py
get_dir_path(file)
Source code in edsnlp/pipelines/core/endlines/functional.py
build_path(file, relative_path)
Function to build an absolute path.

| PARAMETER | DESCRIPTION |
|---|---|
| `file` | |
| `relative_path` | Relative path from the main file to the desired output. |

| RETURNS | DESCRIPTION |
|---|---|
| `path` | |
Source code in edsnlp/pipelines/core/endlines/functional.py
_convert_series_to_array(s)
Converts a pandas Series of n elements to an array of shape (n, 1).

| PARAMETER | DESCRIPTION |
|---|---|
| `s` | |

| RETURNS | DESCRIPTION |
|---|---|
| `np.ndarray` | |
Source code in edsnlp/pipelines/core/endlines/functional.py
endlinesmodel
EndLinesModel
Model to classify whether an end-of-line is a real one or should be a space.

| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | spaCy nlp pipeline to use for matching. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
nlp = nlp
instance-attribute
__init__(nlp)
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_preprocess_data(corpus)
| PARAMETER | DESCRIPTION |
|---|---|
| `corpus` | Corpus of documents. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | Preprocessed data. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
fit_and_predict(corpus)
Fit the model and predict for the training data.

| PARAMETER | DESCRIPTION |
|---|---|
| `corpus` | An iterable of Documents. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | One line per end-of-line prediction. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
predict(df)
Use the model for inference.
The df should have the following columns: `["A1", "A2", "A3", "A4", "B1", "B2", "BLANK_LINE"]`.

| PARAMETER | DESCRIPTION |
|---|---|
| `df` | The df should have the columns listed above. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | The result is added to the column `PREDICTED_END_LINE`. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
save(path='base_model.pkl')
Save a pickle of the model. It can be read by the pipeline later.

| PARAMETER | DESCRIPTION |
|---|---|
| `path` | Path to a .pkl file, by default `"base_model.pkl"`. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
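Putting the documented pieces together, a hedged training sketch (`texts` is a stand-in for your own corpus of raw strings):

```python
import spacy
import edsnlp  # noqa: F401
from edsnlp.pipelines.core.endlines.endlinesmodel import EndLinesModel

nlp = spacy.blank("eds")
texts = ["Compte rendu\nd'hospitalisation", "Patient adressé\npar son médecin"]
corpus = [nlp(text) for text in texts]

model = EndLinesModel(nlp=nlp)
df = model.fit_and_predict(corpus)  # one line per end-of-line prediction
model.save(path="base_model.pkl")   # pickle, reloadable by the endlines pipe
```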
_convert_A(df, col)
| PARAMETER | DESCRIPTION |
|---|---|
| `df` | |
| `col` | Column to translate. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_convert_B(df, col)
| PARAMETER | DESCRIPTION |
|---|---|
| `df` | |
| `col` | Column to translate. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_convert_raw_data_to_codes(df)
Function to translate data as extracted from spaCy to the model codes.
`A1` and `A2` are not translated because they are supposed to already be correctly encoded.

| PARAMETER | DESCRIPTION |
|---|---|
| `df` | It should have columns |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_convert_line_to_attribute(df, expr, col)
Function to convert a line into an attribute (column) of the previous row. In particular, we use it to identify `"\n"` and `"\n\n"` that are considered tokens, and to express this information as an attribute of the previous token.

| PARAMETER | DESCRIPTION |
|---|---|
| `df` | |
| `expr` | Pattern to search in the text. Ex.: `"\n"` |
| `col` | Name of the new column. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_compute_a3(df)
A3 (respectively A4) encodes the typographic form of the left (respectively right) word:
- All in capital letters
- Starts with a capital letter
- Starts with a lowercase letter
- It's a number
- Strong punctuation
- Soft punctuation
- A number followed or preceded by a punctuation mark (the case of enumerations)

| PARAMETER | DESCRIPTION |
|---|---|
| `df` | |

| RETURNS | DESCRIPTION |
|---|---|
| `df` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_fit_M1(A1, A2, A3, A4, label)
Function to train M1 classifier (Naive Bayes).

| PARAMETER | DESCRIPTION |
|---|---|
| `A1` | |
| `A2` | |
| `A3` | |
| `A4` | |
| `label` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_fit_M2(B1, B2, label)
Function to train M2 classifier (Naive Bayes).

| PARAMETER | DESCRIPTION |
|---|---|
| `B1` | |
| `B2` | |
| `label` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_get_X_for_M1(A1, A2, A3, A4)
Get X matrix for classifier.

| PARAMETER | DESCRIPTION |
|---|---|
| `A1` | |
| `A2` | |
| `A3` | |
| `A4` | |

| RETURNS | DESCRIPTION |
|---|---|
| `np.ndarray` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_get_X_for_M2(B1, B2)
Get X matrix for classifier.

| PARAMETER | DESCRIPTION |
|---|---|
| `B1` | |
| `B2` | |

| RETURNS | DESCRIPTION |
|---|---|
| `np.ndarray` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_predict_M1(A1, A2, A3, A4)
Use M1 for prediction.

| PARAMETER | DESCRIPTION |
|---|---|
| `A1` | |
| `A2` | |
| `A3` | |
| `A4` | |

| RETURNS | DESCRIPTION |
|---|---|
| `Dict[str, Any]` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_predict_M2(B1, B2)
Use M2 for prediction.

| PARAMETER | DESCRIPTION |
|---|---|
| `B1` | |
| `B2` | |

| RETURNS | DESCRIPTION |
|---|---|
| `Dict[str, Any]` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_fit_encoder_2S(S1, S2)
Fit a one-hot encoder with two Series: it concatenates the series, then fits the encoder.

| PARAMETER | DESCRIPTION |
|---|---|
| `S1` | |
| `S2` | |

| RETURNS | DESCRIPTION |
|---|---|
| `OneHotEncoder` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_fit_encoder_1S(S1)
Fit a one-hot encoder with one Series.

| PARAMETER | DESCRIPTION |
|---|---|
| `S1` | |

| RETURNS | DESCRIPTION |
|---|---|
| `OneHotEncoder` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_encode_series(encoder, S)
Use the one-hot encoder to transform a series.

| PARAMETER | DESCRIPTION |
|---|---|
| `encoder` | |
| `S` | A series to encode (transform). |

| RETURNS | DESCRIPTION |
|---|---|
| `np.ndarray` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
set_spans(corpus, df)
Function to set the results of the algorithm (pd.DataFrame) as spans of the spaCy document.

| PARAMETER | DESCRIPTION |
|---|---|
| `corpus` | Iterable of spaCy Documents. |
| `df` | It should have the columns: `["DOC_ID", "original_token_index", "PREDICTED_END_LINE"]`. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_retrieve_lines(dfg)
Function to give a sentence_id to each token.

| PARAMETER | DESCRIPTION |
|---|---|
| `dfg` | |

| RETURNS | DESCRIPTION |
|---|---|
| `DataFrameGroupBy` | Same DataFrameGroupBy, with the sentence-id column added. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_create_vocabulary(x)
Function to create a vocabulary for attributes in the training set.

| PARAMETER | DESCRIPTION |
|---|---|
| `x` | |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_compute_B(df)
Function to compute B1 and B2.

| PARAMETER | DESCRIPTION |
|---|---|
| `df` | |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_shift_col(df, col, new_col, direction='backward', fill=None)
Shifts a column one position in the backward / forward direction.

| PARAMETER | DESCRIPTION |
|---|---|
| `df` | |
| `col` | Column to shift. |
| `new_col` | Column name to save the results. |
| `direction` | One of {"backward", "forward"}, by default "backward". |
| `fill` | By default None. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | Same df, with the shifted column added. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_get_attributes(doc, i=0)
Function to get the attributes of the tokens of a spaCy doc in a pd.DataFrame format.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc. |
| `i` | Document id, by default 0. |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | Returns a dataframe with one line per token. It has the following columns: |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_get_string(_id, string_store)
Returns the string corresponding to the token id.

| PARAMETER | DESCRIPTION |
|---|---|
| `_id` | Token id. |
| `string_store` | spaCy Language StringStore. |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | String representation of the token. |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
_fit_one_hot_encoder(X)
Fit a one-hot encoder.

| PARAMETER | DESCRIPTION |
|---|---|
| `X` | Of shape (n, 1). |

| RETURNS | DESCRIPTION |
|---|---|
| `OneHotEncoder` | |
Source code in edsnlp/pipelines/core/endlines/endlinesmodel.py
endlines
EndLines
Bases: GenericMatcher
spaCy pipeline to detect whether a newline character should be considered a space (i.e. introduced by the PDF).
The pipeline adds the extension `end_line` to spans and tokens. The `end_line` attribute is a boolean or `None`, set to `True` if the pipeline predicts that the newline is an end-of-line character, and to `False` if the newline is classified as a space. If no classification has been done over that token, it remains `None`.

| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | spaCy nlp pipeline to use for matching. |
| `end_lines_model` | Optional[Union[str, EndLinesModel]], by default None. Path to a trained model. If None, a default model is used. |
Source code in edsnlp/pipelines/core/endlines/endlines.py
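A hedged inference sketch (the `eds.endlines` pipe name is assumed; the `model_path` config key follows the factory documented above):

```python
import spacy
import edsnlp  # noqa: F401

nlp = spacy.blank("eds")
nlp.add_pipe("eds.endlines", config=dict(model_path="base_model.pkl"))

doc = nlp("Compte rendu\nd'hospitalisation")
for token in doc:
    if token.text == "\n":
        print(repr(token.text), token._.end_line)  # True, False or None
```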
__init__(nlp, end_lines_model, **kwargs)
Source code in edsnlp/pipelines/core/endlines/endlines.py
_read_model(end_lines_model)
| PARAMETER | DESCRIPTION |
|---|---|
| `end_lines_model` | |

| RAISES | DESCRIPTION |
|---|---|
| `TypeError` | |
Source code in edsnlp/pipelines/core/endlines/endlines.py
_spacy_compute_a3a4(token)
Function to compute A3 and A4.

| PARAMETER | DESCRIPTION |
|---|---|
| `token` | |

| RETURNS | DESCRIPTION |
|---|---|
| `str` | |
Source code in edsnlp/pipelines/core/endlines/endlines.py
_compute_length(doc, start, end)
Compute length without spaces.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | |
| `start` | |
| `end` | |

| RETURNS | DESCRIPTION |
|---|---|
| `int` | |
Source code in edsnlp/pipelines/core/endlines/endlines.py
_get_df(doc, new_lines)
Get a pandas DataFrame to call the classifier.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | |
| `new_lines` | |

| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | |
Source code in edsnlp/pipelines/core/endlines/endlines.py
__call__(doc)
Predict, for each new line, whether it is an end of line or a space.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | |

| RETURNS | DESCRIPTION |
|---|---|
| `doc` | |
Source code in edsnlp/pipelines/core/endlines/endlines.py
normalizer
factory
DEFAULT_CONFIG = dict(accents=True, lowercase=True, quotes=True, pollution=True)
module-attribute
create_component(nlp, name, accents, lowercase, quotes, pollution)
Source code in edsnlp/pipelines/core/normalizer/factory.py
normalizer
Normalizer
Bases: object
Normalisation pipeline. Modifies the `NORM` attribute, acting on four dimensions:
- `lowercase`: using the default `NORM`
- `accents`: deterministic and fixed-length normalisation of accents.
- `quotes`: deterministic and fixed-length normalisation of quotation marks.
- `pollution`: removal of pollutions.

| PARAMETER | DESCRIPTION |
|---|---|
| `lowercase` | Whether to remove case. |
| `accents` | Optional |
| `quotes` | Optional |
| `pollution` | Optional |
Source code in edsnlp/pipelines/core/normalizer/normalizer.py
lowercase = lowercase
instance-attribute
accents = accents
instance-attribute
quotes = quotes
instance-attribute
pollution = pollution
instance-attribute
__init__(lowercase, accents, quotes, pollution)
Source code in edsnlp/pipelines/core/normalizer/normalizer.py
__call__(doc)
Apply the normalisation pipeline, one component at a time.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy `Doc` object. |

| RETURNS | DESCRIPTION |
|---|---|
| `Doc` | Doc object with a modified `NORM` attribute. |
Source code in edsnlp/pipelines/core/normalizer/normalizer.py
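A short usage sketch (the `eds.normalizer` pipe name is assumed from the factory above):

```python
import spacy
import edsnlp  # noqa: F401

nlp = spacy.blank("eds")
nlp.add_pipe("eds.normalizer")  # accents, lowercase, quotes, pollution all on by default

doc = nlp("Le patient est agité.")
print([token.norm_ for token in doc])  # eg "agite": lowercased, accents stripped
```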
lowercase
factory
remove_lowercase(doc)
Add case on the `NORM` custom attribute. Should always be applied first.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | The spaCy `Doc` object. |

| RETURNS | DESCRIPTION |
|---|---|
| `Doc` | The document, with case put back in `NORM`. |
Source code in edsnlp/pipelines/core/normalizer/lowercase/factory.py
pollution
factory
DEFAULT_CONFIG = dict(pollution=None)
module-attribute
create_component(nlp, name, pollution)
Source code in edsnlp/pipelines/core/normalizer/pollution/factory.py
pollution
Pollution
Bases: BaseComponent
Tags pollution tokens.
Populates a number of spaCy extensions:
- `Token._.pollution`: indicates whether the token is a pollution
- `Doc._.clean`: lists non-pollution tokens
- `Doc._.clean_`: original text with pollutions removed.
- `Doc._.char_clean_span`: method to create a Span using character indices extracted from the cleaned text.

| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | Language pipeline object |
| `pollution` | Dictionary containing regular expressions of pollution. |
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
nlp = nlp
instance-attribute
pollution = pollution
instance-attribute
regex_matcher = RegexMatcher()
instance-attribute
__init__(nlp, pollution)
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
build_patterns()
Builds the patterns for phrase matching.
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
process(doc)
Find pollutions in the doc and clean candidate negations to remove pseudo-negations.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object |

| RETURNS | DESCRIPTION |
|---|---|
| `pollution` | List of pollution spans |
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
__call__(doc)
Tags pollutions.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object |

| RETURNS | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object, annotated for pollutions. |
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
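A hedged sketch of these extensions in action (pipe name assumed; the `bars` pattern below matches the `==========` run):

```python
import spacy
import edsnlp  # noqa: F401

nlp = spacy.blank("eds")
nlp.add_pipe("eds.normalizer", config=dict(pollution=True))

doc = nlp("Le patient est arrivé ========== aux urgences.")
print(doc._.clean_)  # original text with the pollution span removed
```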
patterns
information = "(?s)(=====+\\s*)?(L\\s*e\\s*s\\sdonnées\\s*administratives,\\s*sociales\\s*|I?nfo\\s*rmation\\s*aux?\\s*patients?|L[’']AP-HP\\s*collecte\\s*vos\\s*données\\s*administratives|L[’']Assistance\\s*Publique\\s*-\\s*Hôpitaux\\s*de\\s*Paris\\s*\\(?AP-HP\\)?\\s*a\\s*créé\\s*une\\s*base\\s*de\\s*données).{,2000}https?:\\/\\/recherche\\.aphp\\.fr\\/eds\\/droit-opposition[\\s\\.]*"
module-attribute
bars = '(?i)([nbw]|_|-|=){5,}'
module-attribute
pollution = dict(information=information, bars=bars)
module-attribute
quotes
factory
DEFAULT_CONFIG = dict(quotes=None)
module-attribute
create_component(nlp, name, quotes)
Source code in edsnlp/pipelines/core/normalizer/quotes/factory.py
quotes
Quotes
Bases: object
We normalise quotes, following [this source](https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html).

| PARAMETER | DESCRIPTION |
|---|---|
| `quotes` | List of quotation characters and their transcription. |
Source code in edsnlp/pipelines/core/normalizer/quotes/quotes.py
translation_table = str.maketrans(''.join(quote_group for (quote_group, _) in quotes), ''.join(rep * len(quote_group) for (quote_group, rep) in quotes))
instance-attribute
__init__(quotes)
Source code in edsnlp/pipelines/core/normalizer/quotes/quotes.py
__call__(doc)
Normalises quotes.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | Document to process. |

| RETURNS | DESCRIPTION |
|---|---|
| `Doc` | Same document, with quotes normalised. |
Source code in edsnlp/pipelines/core/normalizer/quotes/quotes.py
patterns
quotes: List[str] = ['"', '〃', 'ײ', '᳓', '″', '״', '‶', '˶', 'ʺ', '“', '”', '˝', '‟']
module-attribute
apostrophes: List[str] = ['`', '΄', ''', 'ˈ', 'ˊ', 'ᑊ', 'ˋ', 'ꞌ', 'ᛌ', '𖽒', '𖽑', '‘', '’', 'י', '՚', '‛', '՝', '`', '`', '′', '׳', '´', 'ʹ', '˴', 'ߴ', '‵', 'ߵ', 'ʹ', 'ʻ', 'ʼ', '´', '᾽', 'ʽ', '῾', 'ʾ', '᾿']
module-attribute
quotes_and_apostrophes: List[Tuple[str, str]] = [(''.join(quotes), '"'), (''.join(apostrophes), "'")]
module-attribute
accents
factory
DEFAULT_CONFIG = dict(accents=None)
module-attribute
create_component(nlp, name, accents)
Source code in edsnlp/pipelines/core/normalizer/accents/factory.py
patterns
accents: List[Tuple[str, str]] = [('ç', 'c'), ('àáâä', 'a'), ('èéêë', 'e'), ('ìíîï', 'i'), ('òóôö', 'o'), ('ùúûü', 'u')]
module-attribute
accents
Accents
Bases: object
Normalises accents, using a same-length strategy.
| PARAMETER | DESCRIPTION |
|---|---|
| `accents` | List of accentuated characters and their transcription. |
Source code in edsnlp/pipelines/core/normalizer/accents/accents.py
translation_table = str.maketrans(''.join(accent_group for (accent_group, _) in accents), ''.join(rep * len(accent_group) for (accent_group, rep) in accents))
instance-attribute
__init__(accents)
Source code in edsnlp/pipelines/core/normalizer/accents/accents.py
__call__(doc)
Remove accents from the spaCy `NORM` attribute.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | The spaCy `Doc` object. |

| RETURNS | DESCRIPTION |
|---|---|
| `Doc` | The document, with accents removed in `NORM`. |
Source code in edsnlp/pipelines/core/normalizer/accents/accents.py
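The same-length strategy can be illustrated with plain Python, mirroring the `translation_table` construction above (a standalone sketch, independent of spaCy):

```python
# Same-length accent normalisation, as in the patterns module above.
accents = [("ç", "c"), ("àáâä", "a"), ("èéêë", "e"),
           ("ìíîï", "i"), ("òóôö", "o"), ("ùúûü", "u")]

translation_table = str.maketrans(
    "".join(group for group, _ in accents),
    "".join(rep * len(group) for group, rep in accents),
)

print("hôpital âgé".translate(translation_table))  # -> "hopital age", same length
```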
matcher
matcher
GenericMatcher
Bases: BaseComponent
Provides a generic matcher component.
| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | The spaCy object. |
| `terms` | A dictionary of terms. |
| `regex` | A dictionary of regular expressions. |
| `attr` | The default attribute to use for matching. Can be overridden using the |
| `filter_matches` | Whether to filter out matches. |
| `on_ents_only` | Whether to look for matches around pre-extracted entities only. |
| `ignore_excluded` | Whether to skip excluded tokens (requires an upstream pipeline to mark excluded tokens). |
Source code in edsnlp/pipelines/core/matcher/matcher.py
nlp = nlp
instance-attribute
attr = attr
instance-attribute
phrase_matcher = EDSPhraseMatcher(self.nlp.vocab, attr=attr, ignore_excluded=ignore_excluded)
instance-attribute
regex_matcher = RegexMatcher(attr=attr, ignore_excluded=ignore_excluded)
instance-attribute
__init__(nlp, terms, regex, attr, ignore_excluded)
Source code in edsnlp/pipelines/core/matcher/matcher.py
process(doc)
Find matching spans in doc.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object. |

| RETURNS | DESCRIPTION |
|---|---|
| `spans` | List of Spans returned by the matchers. |
Source code in edsnlp/pipelines/core/matcher/matcher.py
__call__(doc)
Adds spans to document.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object |

| RETURNS | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object, annotated for extracted terms. |
Source code in edsnlp/pipelines/core/matcher/matcher.py
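A usage sketch (the `eds.matcher` pipe name is assumed from the factory below; the terms and regex dictionaries follow the shapes documented above):

```python
import spacy
import edsnlp  # noqa: F401

nlp = spacy.blank("eds")
nlp.add_pipe(
    "eds.matcher",
    config=dict(
        terms=dict(diabete=["diabète", "diabétique"]),
        regex=dict(anomalie=["anomalies?"]),
        attr="TEXT",
    ),
)

doc = nlp("Patient diabétique, sans anomalie détectée.")
print([(ent.text, ent.label_) for ent in doc.ents])
```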
factory
DEFAULT_CONFIG = dict(terms=None, regex=None, attr='TEXT', ignore_excluded=False)
module-attribute
create_component(nlp, name, terms, attr, regex, ignore_excluded)
Source code in edsnlp/pipelines/core/matcher/factory.py
context
factory
DEFAULT_CONFIG = dict(context=['note_id'])
module-attribute
create_component(nlp, name, context)
Source code in edsnlp/pipelines/core/context/factory.py
context
ContextAdder
Bases: BaseComponent
Provides a generic context adder component.
| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | The spaCy object. |
| `context` | The list of extensions to add to the `Doc`. |
Source code in edsnlp/pipelines/core/context/context.py
nlp = nlp
instance-attribute
context = context
instance-attribute
__init__(nlp, context)
Source code in edsnlp/pipelines/core/context/context.py
set_extensions()
Source code in edsnlp/pipelines/core/context/context.py
__call__(doc)
Source code in edsnlp/pipelines/core/context/context.py
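A minimal sketch (the `eds.context` pipe name is assumed; `note_id` is the documented default context):

```python
import spacy
import edsnlp  # noqa: F401

nlp = spacy.blank("eds")
nlp.add_pipe("eds.context", config=dict(context=["note_id"]))

doc = nlp("Patient admis pour douleurs abdominales.")
doc._.note_id = "note-1234"  # the declared extension can now carry metadata
print(doc._.note_id)
```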
sentences
factory
DEFAULT_CONFIG = dict(punct_chars=None, use_endlines=True)
module-attribute
create_component(nlp, name, punct_chars, use_endlines)
Source code in edsnlp/pipelines/core/sentences/factory.py
sentences
SentenceSegmenter
Bases: object
Segments the Doc into sentences using a rule-based strategy, specific to AP-HP documents.
Applies the same rule-based pipeline as spaCy's sentencizer, and adds a simple rule on new lines: if a new line is followed by a capitalised word, then it is also an end of sentence.
DOCS: https://spacy.io/api/sentencizer

| PARAMETER | DESCRIPTION |
|---|---|
| `punct_chars` | Optional[List[str]]. Punctuation characters. |
| `use_endlines` | bool. Whether to use endlines prediction. |
Source code in edsnlp/pipelines/core/sentences/sentences.py
punct_chars = set(punct_chars)
instance-attribute
use_endlines = use_endlines
instance-attribute
__init__(punct_chars, use_endlines)
Source code in edsnlp/pipelines/core/sentences/sentences.py
__call__(doc)
Segments the document in sentences.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | A spaCy Doc object. |

| RETURNS | DESCRIPTION |
|---|---|
| `doc` | A spaCy Doc object, annotated for sentences. |
Source code in edsnlp/pipelines/core/sentences/sentences.py
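A usage sketch (the `eds.sentences` pipe name is assumed from the factory above):

```python
import spacy
import edsnlp  # noqa: F401

nlp = spacy.blank("eds")
nlp.add_pipe("eds.sentences")

doc = nlp("Compte rendu d'hospitalisation.\nPatient adressé par son médecin traitant.")
for sent in doc.sents:
    print(repr(sent.text))
```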
terms
punctuation = ['!', '.', '?', '։', '؟', '۔', '܀', '܁', '܂', '߹', '।', '॥', '၊', '။', '።', '፧', '፨', '᙮', '᜵', '᜶', '᠃', '᠉', '᥄', '᥅', '᪨', '᪩', '᪪', '᪫', '᭚', '᭛', '᭞', '᭟', '᰻', '᰼', '᱾', '᱿', '‼', '‽', '⁇', '⁈', '⁉', '⸮', '⸼', '꓿', '꘎', '꘏', '꛳', '꛷', '꡶', '꡷', '꣎', '꣏', '꤯', '꧈', '꧉', '꩝', '꩞', '꩟', '꫰', '꫱', '꯫', '﹒', '﹖', '﹗', '!', '.', '?', '𐩖', '𐩗', '𑁇', '𑁈', '𑂾', '𑂿', '𑃀', '𑃁', '𑅁', '𑅂', '𑅃', '𑇅', '𑇆', '𑇍', '𑇞', '𑇟', '𑈸', '𑈹', '𑈻', '𑈼', '𑊩', '𑑋', '𑑌', '𑗂', '𑗃', '𑗉', '𑗊', '𑗋', '𑗌', '𑗍', '𑗎', '𑗏', '𑗐', '𑗑', '𑗒', '𑗓', '𑗔', '𑗕', '𑗖', '𑗗', '𑙁', '𑙂', '𑜼', '𑜽', '𑜾', '𑩂', '𑩃', '𑪛', '𑪜', '𑱁', '𑱂', '𖩮', '𖩯', '𖫵', '𖬷', '𖬸', '𖭄', '𛲟', '𝪈', '。', '。']
module-attribute
advanced
advanced
AdvancedRegex
Bases: GenericMatcher
Allows additional matching in the surrounding context of the main match group, for qualification/filtering.
| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | The spaCy object. |
| `regex_config` | Configuration for the main expression. |
| `window` | Number of tokens to consider before and after the main expression. |
| `attr` | Attribute to match on, eg `TEXT` or `NORM`. |
| `verbose` | Verbosity level, useful for debugging. |
| `ignore_excluded` | Whether to skip excluded tokens. |
Source code in edsnlp/pipelines/core/advanced/advanced.py
regex_config = _check_regex_config(regex_config)
instance-attribute
window = window
instance-attribute
verbose = verbose
instance-attribute
ignore_excluded = ignore_excluded
instance-attribute
__init__(nlp, regex_config, window, attr, verbose, ignore_excluded)
Source code in edsnlp/pipelines/core/advanced/advanced.py
set_extensions()
Source code in edsnlp/pipelines/core/advanced/advanced.py
process(doc)
Process the document, looking for named entities.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object |

| RETURNS | DESCRIPTION |
|---|---|
| `List[Span]` | List of detected spans. |
Source code in edsnlp/pipelines/core/advanced/advanced.py
__call__(doc)
Adds spans to document.
| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object |

| RETURNS | DESCRIPTION |
|---|---|
| `doc` | spaCy Doc object, annotated for extracted terms. |
Source code in edsnlp/pipelines/core/advanced/advanced.py
_postprocessing_pipeline(ents)
Source code in edsnlp/pipelines/core/advanced/advanced.py
_add_window(ent)
Source code in edsnlp/pipelines/core/advanced/advanced.py
get_text(span, label)
Source code in edsnlp/pipelines/core/advanced/advanced.py
_exclude_filter(ent)
Source code in edsnlp/pipelines/core/advanced/advanced.py
_snippet_extraction(ent)
Source code in edsnlp/pipelines/core/advanced/advanced.py
_check_regex_config(regex_config)
Source code in edsnlp/pipelines/core/advanced/advanced.py
factory
DEFAULT_CONFIG = dict(window=10, verbose=0, ignore_excluded=False, attr='NORM')
module-attribute
create_component(nlp, name, regex_config, window, verbose, ignore_excluded, attr)
Source code in edsnlp/pipelines/core/advanced/factory.py
misc
reason
factory
DEFAULT_CONFIG = dict(reasons=None, attr='TEXT', use_sections=False, ignore_excluded=False)
module-attribute
create_component(nlp, name, reasons, attr, use_sections, ignore_excluded)
Source code in edsnlp/pipelines/misc/reason/factory.py
patterns
reasons = dict(reasons=['(?i)motif de l.?hospitalisation : .+', '(?i)hospitalis[ée].?.*(pour|. cause|suite [àa]).+', '(?i)(consulte|prise en charge(?!\\set\\svous\\sassurer\\sun\\straitement\\sadapté)).*pour.+', '(?i)motif\\sd.hospitalisation\\s:.+', '(?i)au total\\s?\\:?\\s?\\n?.+', '(?i)motif\\sde\\sla\\sconsultation', '(?i)motif\\sd.admission', '(?i)conclusion\\smedicale'])
module-attribute
sections_reason = ['motif', 'conclusion']
module-attribute
section_exclude = ['antécédents', 'antécédents familiaux', 'histoire de la maladie']
module-attribute
reason
Reason
Bases: GenericMatcher
Pipeline to identify the reason of the hospitalisation.
It declares a Span extension called `ents_reason` and adds the key `reasons` to `doc.spans`.
It also declares the boolean extension `is_reason`. This extension is set to True for the Reason Spans but also for the entities that overlap the reason span.

| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | spaCy nlp pipeline to use for matching. |
| `reasons` | The terminology of reasons. |
| `attr` | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. We can also add a key for each regex. |
| `use_sections` | Whether or not to use the `sections` pipeline. |
| `ignore_excluded` | Whether to skip excluded tokens. |
Source code in edsnlp/pipelines/misc/reason/reason.py
use_sections = use_sections and ('eds.sections' in self.nlp.pipe_names or 'sections' in self.nlp.pipe_names)
instance-attribute
__init__(nlp, reasons, attr, use_sections, ignore_excluded)
Source code in edsnlp/pipelines/misc/reason/reason.py
set_extensions()
Source code in edsnlp/pipelines/misc/reason/reason.py
_enhance_with_sections(sections, reasons)
Enhance the list of reasons with the section information. If a reason overlaps with the history section, it is removed from the list.

| PARAMETER | DESCRIPTION |
|---|---|
| `sections` | Spans of sections identified with the `sections` pipeline. |
| `reasons` | Reasons list identified by the regex. |

| RETURNS | DESCRIPTION |
|---|---|
| `List` | Updated list of reason spans. |
Source code in edsnlp/pipelines/misc/reason/reason.py
__call__(doc)
Find spans related to the reasons of the hospitalisation.

| PARAMETER | DESCRIPTION |
|---|---|
| `doc` | |

| RETURNS | DESCRIPTION |
|---|---|
| `Doc` | |
Source code in edsnlp/pipelines/misc/reason/reason.py
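A hedged usage sketch (the `eds.reason` pipe name is assumed from the factory above; the default regexes match expressions such as "motif de l'hospitalisation"):

```python
import spacy
import edsnlp  # noqa: F401

nlp = spacy.blank("eds")
nlp.add_pipe("eds.reason")  # default TEXT attribute, regex-based

doc = nlp("Motif de l'hospitalisation : douleurs abdominales.")
print(doc.spans["reasons"])  # the key declared by the component
```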
consultation_dates
factory
DEFAULT_CONFIG = dict(consultation_mention=True, town_mention=False, document_date_mention=False, attr='NORM')
module-attribute
create_component(nlp, name, attr, consultation_mention, town_mention, document_date_mention)
Source code in edsnlp/pipelines/misc/consultation_dates/factory.py
patterns
consultation_mention = ['rendez-vous pris', 'consultation', 'consultation.{1,8}examen', 'examen clinique', 'de compte rendu', "date de l'examen", 'examen realise le', 'date de la visite']
module-attribute
town_mention = ['paris', 'kremlin.bicetre', 'creteil', 'boulogne.billancourt', 'villejuif', 'clamart', 'bobigny', 'clichy', 'ivry.sur.seine', 'issy.les.moulineaux', 'draveil', 'limeil', 'champcueil', 'roche.guyon', 'bondy', 'colombes', 'hendaye', 'herck.sur.mer', 'labruyere', 'garches', 'sevran', 'hyeres']
module-attribute
document_date_mention = ['imprime le', 'signe electroniquement', 'signe le', 'saisi le', 'dicte le', 'tape le', 'date de reference', 'date\\s*:', 'dactylographie le', 'date du rapport']
module-attribute
consultation_dates
ConsultationDates
Bases: GenericMatcher
Class to extract consultation dates from "CR-CONS" documents.
The pipeline populates the `doc.spans['consultation_dates']` list.
For each extraction `s` in this list, the corresponding date is available as `s._.consultation_date`.

| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | Language pipeline object |
| `consultation_mention` | List of RegEx for consultation mentions. |
| `town_mention` | Union[List[str], bool]. List of RegEx for all AP-HP hospitals' town mentions. If `type==list`: overrides the default list. If `type==bool`: uses the default list if True, disables the check if False. |
| `document_date_mention` | Union[List[str], bool]. List of RegEx for the document date. Same list/bool semantics as `town_mention`. |
Source code in edsnlp/pipelines/misc/consultation_dates/consultation_dates.py
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 |
|
date_matcher = Dates(nlp, **config)
instance-attribute
__init__(nlp, consultation_mention, town_mention, document_date_mention, attr, **kwargs)
Source code in edsnlp/pipelines/misc/consultation_dates/consultation_dates.py, lines 45-107
set_extensions()
Source code in edsnlp/pipelines/misc/consultation_dates/consultation_dates.py, lines 109-112
__call__(doc)
Finds consultation date entities.
PARAMETER | DESCRIPTION
---|---
doc | spaCy Doc object
RETURNS | DESCRIPTION
---|---
doc | spaCy Doc object with additional consultation date annotations
Source code in edsnlp/pipelines/misc/consultation_dates/consultation_dates.py, lines 114-162
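A minimal usage sketch; the span key and extension come from the description above, while the factory name is assumed from the module path and should be checked against your edsnlp version:
import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.consultation_dates")  # hypothetical factory name, inferred from the module path

doc = nlp("Consultation du 03/10/2018")
for span in doc.spans["consultation_dates"]:
    print(span, span._.consultation_date)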
sections
factory
DEFAULT_CONFIG = dict(sections=None, add_patterns=True, attr='NORM', ignore_excluded=True)
module-attribute
create_component(nlp, name, sections, add_patterns, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/sections/factory.py, lines 17-33
sections
Sections
Bases: GenericMatcher
Divides the document into sections.
By default, we use a dataset of documents annotated for section titles, building on work performed by Ivan Lerner and reviewed by Gilles Chatellier.
Detected sections are:
- allergies ;
- antécédents ;
- antécédents familiaux ;
- traitements entrée ;
- conclusion ;
- conclusion entrée ;
- habitus ;
- correspondants ;
- diagnostic ;
- données biométriques entrée ;
- examens ;
- examens complémentaires ;
- facteurs de risques ;
- histoire de la maladie ;
- actes ;
- motif ;
- prescriptions ;
- traitements sortie.
The component looks for section titles within the document, and stores them in the section_title extension.
For ease of use, the component also populates a section extension, which contains a list of spans corresponding to the "sections" of the document. These spans run from the start of one section title to the next, which can introduce obvious bias should an intermediate section title go undetected.
PARAMETER | DESCRIPTION
---|---
nlp | spaCy pipeline object.
sections | Dictionary of terms to look for.
attr | Default attribute to match on.
ignore_excluded | Whether to skip excluded tokens.
Source code in edsnlp/pipelines/misc/sections/sections.py, lines 13-146
add_patterns = add_patterns
instance-attribute
__init__(nlp, sections, add_patterns, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/sections/sections.py, lines 62-95
set_extensions()
Source code in edsnlp/pipelines/misc/sections/sections.py, lines 97-104
__call__(doc)
Divides the doc into sections.
PARAMETER | DESCRIPTION
---|---
doc | spaCy Doc object
RETURNS | DESCRIPTION
---|---
doc | spaCy Doc object, annotated for sections
Source code in edsnlp/pipelines/misc/sections/sections.py, lines 107-146
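A minimal usage sketch (assumption: detected sections are exposed through doc.spans["sections"], with the section name as the span label; verify against your edsnlp version):
import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.sections")

doc = nlp("Antécédents :\nDiabète de type 2.\nConclusion :\nRetour à domicile.")
for section in doc.spans["sections"]:
    print(section.label_, "->", section.text)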
patterns
These section titles were extracted from a work performed by Ivan Lerner at AP-HP, which supplied a number of documents annotated for section titles.
The section titles were reviewed by Gilles Chatellier, who gave meaningful insights.
See the sections/section-dataset notebook for details.
allergies = ['allergies']
module-attribute
antecedents = ['antecedents', 'antecedents medicaux et chirurgicaux', 'antecedents personnels', 'antecedents medicaux', 'antecedents chirurgicaux', 'atcd']
module-attribute
antecedents_familiaux = ['antecedents familiaux']
module-attribute
traitements_entree = ['attitude therapeutique initiale', "traitement a l'entree", 'traitement actuel', 'traitement en cours', "traitements a l'entree"]
module-attribute
conclusion = ['au total', 'conclusion', 'conclusion de sortie', 'syntese medicale / conclusion', 'synthese', 'synthese medicale', 'synthese medicale/conclusion', 'conclusion medicale']
module-attribute
conclusion_entree = ["conclusion a l'entree"]
module-attribute
habitus = ['contexte familial et social', 'habitus', 'mode de vie', 'mode de vie - scolarite', 'situation sociale, mode de vie']
module-attribute
correspondants = ['correspondants']
module-attribute
diagnostic = ['diagnostic retenu']
module-attribute
donnees_biometriques_entree = ["donnees biometriques et parametres vitaux a l'entree", "parametres vitaux et donnees biometriques a l'entree"]
module-attribute
examens = ['examen clinique', "examen clinique a l'entree"]
module-attribute
examens_complementaires = ['examen(s) complementaire(s)', 'examens complementaires', "examens complementaires a l'entree", 'examens complementaires realises pendant le sejour', 'examens para-cliniques']
module-attribute
facteurs_de_risques = ['facteurs de risque', 'facteurs de risques']
module-attribute
histoire_de_la_maladie = ['histoire de la maladie', 'histoire de la maladie - explorations', 'histoire de la maladie actuelle', 'histoire du poids', 'histoire recente', 'histoire recente de la maladie', 'rappel clinique', 'resume', 'resume clinique']
module-attribute
actes = ['intervention']
module-attribute
motif = ['motif', "motif d'hospitalisation", "motif de l'hospitalisation", 'motif medical']
module-attribute
prescriptions = ['prescriptions de sortie', 'prescriptions medicales de sortie']
module-attribute
traitements_sortie = ['traitement de sortie']
module-attribute
sections = {'allergies': allergies, 'antécédents': antecedents, 'antécédents familiaux': antecedents_familiaux, 'traitements entrée': traitements_entree, 'conclusion': conclusion, 'conclusion entrée': conclusion_entree, 'habitus': habitus, 'correspondants': correspondants, 'diagnostic': diagnostic, 'données biométriques entrée': donnees_biometriques_entree, 'examens': examens, 'examens complémentaires': examens_complementaires, 'facteurs de risques': facteurs_de_risques, 'histoire de la maladie': histoire_de_la_maladie, 'actes': actes, 'motif': motif, 'prescriptions': prescriptions, 'traitements sortie': traitements_sortie}
module-attribute
dates
dates
eds.dates pipeline.
PERIOD_PROXIMITY_THRESHOLD = 3
module-attribute
Dates
Bases: BaseComponent
Tags and normalizes dates, using the open-source dateparser library.
The pipeline uses spaCy's filter_spans function. It filters out false positives, and introduces a hierarchy between patterns. For instance, in case of ambiguity, the pipeline will decide that a date is a date without a year rather than a date without a day.
PARAMETER | DESCRIPTION
---|---
nlp | Language pipeline object
absolute | List of regular expressions for absolute dates.
relative | List of regular expressions for relative dates (eg hier, demain).
duration | List of regular expressions for durations (eg pendant trois mois).
false_positive | List of regular expressions for false positives (eg phone numbers).
on_ents_only | Whether to look for dates in the whole document or only in specific sentences.
detect_periods | Whether to detect periods (experimental).
attr | spaCy attribute to use
Source code in edsnlp/pipelines/misc/dates/dates.py, lines 20-274
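A minimal usage sketch (assumptions: matches are stored in doc.spans["dates"], and the parsed model described below is available through the date extension; verify against your edsnlp version):
import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.dates")

doc = nlp("Patient hospitalisé le 12 octobre 2020, sorti hier.")
for span in doc.spans["dates"]:
    print(span, span._.date.norm())  # norm() is defined on the date models below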
nlp = nlp
instance-attribute
on_ents_only = on_ents_only
instance-attribute
regex_matcher = RegexMatcher(attr=attr, alignment_mode='strict')
instance-attribute
detect_periods = detect_periods
instance-attribute
__init__(nlp, absolute, relative, duration, false_positive, on_ents_only, detect_periods, attr)
Source code in edsnlp/pipelines/misc/dates/dates.py, lines 57-102
set_extensions()
Set extensions for the dates pipeline.
Source code in edsnlp/pipelines/misc/dates/dates.py, lines 104-117
process(doc)
Find dates in doc.
PARAMETER | DESCRIPTION
---|---
doc | spaCy Doc object
RETURNS | DESCRIPTION
---|---
dates | List of date spans
Source code in edsnlp/pipelines/misc/dates/dates.py, lines 119-166
parse(dates)
Parse dates using the groupdict returned by the matcher.
PARAMETER | DESCRIPTION
---|---
dates | List of tuples containing the spans and groupdict returned by the matcher.
RETURNS | DESCRIPTION
---|---
List[Span] | List of processed spans, with the date parsed.
Source code in edsnlp/pipelines/misc/dates/dates.py, lines 168-194
process_periods(dates)
Experimental period detection.
PARAMETER | DESCRIPTION
---|---
dates | List of detected dates.
RETURNS | DESCRIPTION
---|---
List[Span] | List of detected periods.
Source code in edsnlp/pipelines/misc/dates/dates.py, lines 196-250
__call__(doc)
Tags dates.
PARAMETER | DESCRIPTION
---|---
doc | spaCy Doc object
RETURNS | DESCRIPTION
---|---
doc | spaCy Doc object, annotated for dates
Source code in edsnlp/pipelines/misc/dates/dates.py, lines 252-274
factory
DEFAULT_CONFIG = dict(absolute=None, relative=None, duration=None, false_positive=None, detect_periods=False, on_ents_only=False, attr='LOWER')
module-attribute
create_component(nlp, name, absolute, relative, duration, false_positive, on_ents_only, detect_periods, attr)
Source code in edsnlp/pipelines/misc/dates/factory.py, lines 20-42
models
Direction
Bases: Enum
Source code in edsnlp/pipelines/misc/dates/models.py, lines 12-16
FUTURE = 'FUTURE'
class-attribute
PAST = 'PAST'
class-attribute
CURRENT = 'CURRENT'
class-attribute
Mode
Bases: Enum
Source code in edsnlp/pipelines/misc/dates/models.py, lines 19-23
FROM = 'FROM'
class-attribute
UNTIL = 'UNTIL'
class-attribute
DURATION = 'DURATION'
class-attribute
Period
Bases: BaseModel
Source code in edsnlp/pipelines/misc/dates/models.py, lines 26-32
FROM: Optional[Span] = None
class-attribute
UNTIL: Optional[Span] = None
class-attribute
DURATION: Optional[Span] = None
class-attribute
Config
Source code in edsnlp/pipelines/misc/dates/models.py, lines 31-32
arbitrary_types_allowed = True
class-attribute
BaseDate
Bases: BaseModel
Source code in edsnlp/pipelines/misc/dates/models.py, lines 35-48
mode: Optional[Mode] = None
class-attribute
validate_strings(d)
Source code in edsnlp/pipelines/misc/dates/models.py, lines 39-48
AbsoluteDate
Bases: BaseDate
Source code in edsnlp/pipelines/misc/dates/models.py, lines 51-101
year: Optional[int] = None
class-attribute
month: Optional[int] = None
class-attribute
day: Optional[int] = None
class-attribute
hour: Optional[int] = None
class-attribute
minute: Optional[int] = None
class-attribute
second: Optional[int] = None
class-attribute
to_datetime(tz='Europe/Paris', **kwargs)
Source code in edsnlp/pipelines/misc/dates/models.py, lines 60-74
norm()
Source code in edsnlp/pipelines/misc/dates/models.py, lines 76-93
validate_year(v)
Source code in edsnlp/pipelines/misc/dates/models.py, lines 95-101
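As a quick illustration of the model above (a sketch, assuming the import path follows this page's module layout):
from edsnlp.pipelines.misc.dates.models import AbsoluteDate

d = AbsoluteDate(year=2020, month=10, day=12)
dt = d.to_datetime()  # timezone-aware datetime, tz="Europe/Paris" by default per the signature above
print(dt)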
Relative
Bases: BaseDate
Source code in edsnlp/pipelines/misc/dates/models.py, lines 104-150
year: Optional[int] = None
class-attribute
month: Optional[int] = None
class-attribute
week: Optional[int] = None
class-attribute
day: Optional[int] = None
class-attribute
hour: Optional[int] = None
class-attribute
minute: Optional[int] = None
class-attribute
second: Optional[int] = None
class-attribute
parse_unit(d)
Units need to be handled separately.
This validator modifies the key corresponding to the unit with the detected value.
PARAMETER | DESCRIPTION
---|---
d | Original data
RETURNS | DESCRIPTION
---|---
Dict[str, str] | Transformed data
Source code in edsnlp/pipelines/misc/dates/models.py, lines 114-137
to_datetime(**kwargs)
Source code in edsnlp/pipelines/misc/dates/models.py, lines 139-150
RelativeDate
Bases: Relative
Source code in edsnlp/pipelines/misc/dates/models.py, lines 153-206
direction: Direction = Direction.CURRENT
class-attribute
to_datetime(note_datetime=None)
Source code in edsnlp/pipelines/misc/dates/models.py, lines 156-164
norm()
Source code in edsnlp/pipelines/misc/dates/models.py, lines 166-181
handle_specifics(d)
Specific patterns such as aujourd'hui, hier, etc., need to be handled separately.
PARAMETER | DESCRIPTION
---|---
d | Original data.
RETURNS | DESCRIPTION
---|---
Dict[str, str] | Modified data.
Source code in edsnlp/pipelines/misc/dates/models.py, lines 183-206
Duration
Bases: Relative
Source code in edsnlp/pipelines/misc/dates/models.py, lines 209-215
mode: Mode = Mode.DURATION
class-attribute
norm()
Source code in edsnlp/pipelines/misc/dates/models.py, lines 212-215
patterns
absolute
no_year_pattern = [day + raw_delimiter_with_spaces_pattern + month + time_pattern + post_num_pattern for day in [ante_num_pattern + numeric_day_pattern, letter_day_pattern] for month in [numeric_month_pattern + post_num_pattern, letter_month_pattern]]
module-attribute
no_day_pattern = [letter_month_pattern + raw_delimiter_with_spaces_pattern + year_pattern + post_num_pattern, ante_num_pattern + lz_numeric_month_pattern + raw_delimiter_with_spaces_pattern + year_pattern + post_num_pattern]
module-attribute
full_year_pattern = ante_num_pattern + fy_pattern + post_num_pattern
module-attribute
absolute_pattern = ['(?<=' + mode_pattern + '.{,3})?' + p for p in absolute_pattern]
module-attribute
current
current_patterns: List[str] = ['(?P<year_0>cette\\s+ann[ée]e)(?![-\\s]l[àa])', "(?P<day_0>ce\\s+jour|aujourd['\\s]?hui)", '(?P<week_0>cette\\s+semaine|ces\\sjours[-\\s]ci)', '(?P<month_0>ce\\smois([-\\s]ci)?)']
module-attribute
current_pattern = make_pattern(current_patterns, with_breaks=True)
module-attribute
false_positive
false_positive_pattern = make_pattern(['(\\d+' + delimiter_pattern + '){3,}\\d+(?!:\\d\\d)\\b', '\\d\\/\\d'])
module-attribute
relative
specific = {'minus1': ('hier', dict(direction='PAST', day=1)), 'minus2': ('avant[-\\s]hier', dict(direction='PAST', day=2)), 'plus1': ('demain', dict(direction='FUTURE', day=1)), 'plus2': ('après[-\\s]demain', dict(direction='FUTURE', day=2))}
module-attribute
specific_pattern = make_pattern([f'(?P<specific_{k}>{p})' for (k, (p, _)) in specific.items()])
module-attribute
specific_dict = {k: v for (k, (_, v)) in specific.items()}
module-attribute
relative_pattern = ['(?<=' + mode_pattern + '.{,3})?' + p for p in relative_pattern]
module-attribute
make_specific_pattern(mode='forward')
Source code in edsnlp/pipelines/misc/dates/patterns/relative.py, lines 8-31
duration
cue_pattern = '(pendant|durant|pdt)'
module-attribute
duration_pattern = [cue_pattern + '.{,3}' + numbers.number_pattern + '\\s*' + units.unit_pattern]
module-attribute
atomic
units
units = ['(?P<unit_year>ans?|ann[ée]es?)', '(?P<unit_semester>semestres?)', '(?P<unit_trimester>trimestres?)', '(?P<unit_month>mois)', '(?P<unit_week>semaines?)', '(?P<unit_day>jours?|journ[ée]es?)', '(?P<unit_hour>h|heures?)', '(?P<unit_minute>min|minutes?)', '(?P<unit_second>sec|secondes?|s)']
module-attribute
unit_pattern = make_pattern(units, with_breaks=True)
module-attribute
numbers
letter_numbers = ["(?P<number_01>l'|le|la|une?|ce|cette|cet)", '(?P<number_02>deux)', '(?P<number_03>trois)', '(?P<number_04>quatre)', '(?P<number_05>cinq)', '(?P<number_06>six)', '(?P<number_07>sept)', '(?P<number_08>huit)', '(?P<number_09>neuf)', '(?P<number_10>dix)', '(?P<number_11>onze)', '(?P<number_12>douze)', '(?P<number_13>treize)', '(?P<number_14>quatorze)', '(?P<number_15>quinze)', '(?P<number_16>seize)', '(?P<number_17>dix[-\\s]sept)', '(?P<number_18>dix[-\\s]huit)', '(?P<number_19>dix[-\\s]neuf)', '(?P<number_20>vingt)', '(?P<number_21>vingt[-\\s]et[-\\s]un)', '(?P<number_22>vingt[-\\s]deux)', '(?P<number_23>vingt[-\\s]trois)', '(?P<number_24>vingt[-\\s]quatre)', '(?P<number_25>vingt[-\\s]cinq)', '(?P<number_26>vingt[-\\s]six)', '(?P<number_27>vingt[-\\s]sept)', '(?P<number_28>vingt[-\\s]huit)', '(?P<number_29>vingt[-\\s]neuf)', '(?P<number_30>trente)']
module-attribute
numeric_numbers = [str(i) for i in range(1, 100)]
module-attribute
letter_number_pattern = make_pattern(letter_numbers, with_breaks=True)
module-attribute
numeric_number_pattern = make_pattern(numeric_numbers, name='number')
module-attribute
number_pattern = f'({letter_number_pattern}|{numeric_number_pattern})'
module-attribute
days
letter_days = ['(?P<day_01>premier|1\\s*er)', '(?P<day_02>deux)', '(?P<day_03>trois)', '(?P<day_04>quatre)', '(?P<day_05>cinq)', '(?P<day_06>six)', '(?P<day_07>sept)', '(?P<day_08>huit)', '(?P<day_09>neuf)', '(?P<day_10>dix)', '(?P<day_11>onze)', '(?P<day_12>douze)', '(?P<day_13>treize)', '(?P<day_14>quatorze)', '(?P<day_15>quinze)', '(?P<day_16>seize)', '(?P<day_17>dix\\-?\\s*sept)', '(?P<day_18>dix\\-?\\s*huit)', '(?P<day_19>dix\\-?\\s*neuf)', '(?P<day_20>vingt)', '(?P<day_21>vingt\\-?\\s*et\\-?\\s*un)', '(?P<day_22>vingt\\-?\\s*deux)', '(?P<day_23>vingt\\-?\\s*trois)', '(?P<day_24>vingt\\-?\\s*quatre)', '(?P<day_25>vingt\\-?\\s*cinq)', '(?P<day_26>vingt\\-?\\s*six)', '(?P<day_27>vingt\\-?\\s*sept)', '(?P<day_28>vingt\\-?\\s*huit)', '(?P<day_29>vingt\\-?\\s*neuf)', '(?P<day_30>trente)', '(?P<day_31>trente\\-?\\s*et\\-?\\s*un)']
module-attribute
letter_day_pattern = make_pattern(letter_days)
module-attribute
nlz_numeric_day_pattern = '(?<!\\d)([1-9]|[12]\\d|3[01])(?!\\d)'
module-attribute
numeric_day_pattern = f'(?P<day>{numeric_day_pattern})'
module-attribute
lz_numeric_day_pattern = f'(?P<day>{lz_numeric_day_pattern})'
module-attribute
day_pattern = f'({letter_day_pattern}|{numeric_day_pattern})'
module-attribute
months
letter_months = ['(?P<month_01>janvier|janv\\.?)', '(?P<month_02>f[ée]vrier|f[ée]v\\.?)', '(?P<month_03>mars|mar\\.?)', '(?P<month_04>avril|avr\\.?)', '(?P<month_05>mai)', '(?P<month_06>juin)', '(?P<month_07>juillet|juill?\\.?)', '(?P<month_08>ao[uû]t)', '(?P<month_09>septembre|sept?\\.?)', '(?P<month_10>octobre|oct\\.?)', '(?P<month_11>novembre|nov\\.)', '(?P<month_12>d[ée]cembre|d[ée]c\\.?)']
module-attribute
letter_month_pattern = make_pattern(letter_months, with_breaks=True)
module-attribute
numeric_month_pattern = f'(?P<month>{numeric_month_pattern})'
module-attribute
lz_numeric_month_pattern = f'(?P<month>{lz_numeric_month_pattern})'
module-attribute
month_pattern = f'({letter_month_pattern}|{numeric_month_pattern})'
module-attribute
years
year_patterns: List[str] = ['19\\d\\d'] + [str(year) for year in range(2000, date.today().year + 2)]
module-attribute
full_year_pattern = '(?<!\\d)' + full_year_pattern + '(?!\\d)'
module-attribute
year_pattern = '(?<!\\d)' + year_pattern + '(?!\\d)'
module-attribute
directions
preceding_directions = ['(?P<direction_PAST>depuis|depuis\\s+le|il\\s+y\\s+a)', '(?P<direction_FUTURE>dans)']
module-attribute
following_directions = ['(?P<direction_FUTURE>prochaine?s?|suivante?s?|plus\\s+tard)', '(?P<direction_PAST>derni[eè]re?s?|passée?s?|pr[ée]c[ée]dente?s?|plus\\s+t[ôo]t)']
module-attribute
preceding_direction_pattern = make_pattern(preceding_directions, with_breaks=True)
module-attribute
following_direction_pattern = make_pattern(following_directions, with_breaks=True)
module-attribute
delimiters
raw_delimiters = ['\\/', '\\-']
module-attribute
delimiters = raw_delimiters + ['\\.', '[^\\S\\r\\n]+']
module-attribute
raw_delimiter_pattern = make_pattern(raw_delimiters)
module-attribute
raw_delimiter_with_spaces_pattern = make_pattern(raw_delimiters + ['[^\\S\\r\\n]+'])
module-attribute
delimiter_pattern = make_pattern(delimiters)
module-attribute
ante_num_pattern = f'(?<!.(?:{raw_delimiter_pattern})|[0-9][.,])'
module-attribute
post_num_pattern = f'(?!{raw_delimiter_pattern})'
module-attribute
modes
modes = ['(?P<mode_FROM>depuis|depuis\\s+le|[àa]\\s+partir\\s+d[eu]|du)', "(?P<mode_UNTIL>jusqu'[àa]u?|au)"]
module-attribute
mode_pattern = make_pattern(modes, with_breaks=True)
module-attribute
time
hour_pattern = '(?<!\\d)(?P<hour>0?[1-9]|1\\d|2[0-3])(?!\\d)'
module-attribute
lz_hour_pattern = '(?<!\\d)(?P<hour>0[1-9]|[12]\\d|3[01])(?!\\d)'
module-attribute
minute_pattern = '(?<!\\d)(?P<minute>0?[1-9]|[1-5]\\d)(?!\\d)'
module-attribute
lz_minute_pattern = '(?<!\\d)(?P<minute>0[1-9]|[1-5]\\d)(?!\\d)'
module-attribute
second_pattern = '(?<!\\d)(?P<second>0?[1-9]|[1-5]\\d)(?!\\d)'
module-attribute
lz_second_pattern = '(?<!\\d)(?P<second>0[1-9]|[1-5]\\d)(?!\\d)'
module-attribute
time_pattern = '(\\s.{,3}' + f'{hour_pattern}[h:]({lz_minute_pattern})?' + f'((:|m|min){lz_second_pattern})?' + ')?'
module-attribute
measures
factory
DEFAULT_CONFIG = dict(attr='NORM', ignore_excluded=False, measures=['eds.measures.size', 'eds.measures.weight', 'eds.measures.angle'])
module-attribute
create_component(nlp, name, measures, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/measures/factory.py, lines 14-27
patterns
CompositeSize
Bases: CompositeMeasure
Composite size measure. Supports the following units: mm, cm, dm, m.
Source code in edsnlp/pipelines/misc/measures/patterns.py, lines 11-23
mm = property(make_multi_getter('mm'))
class-attribute
cm = property(make_multi_getter('cm'))
class-attribute
dm = property(make_multi_getter('dm'))
class-attribute
m = property(make_multi_getter('m'))
class-attribute
Size
Bases: SimpleMeasure
Size measure. Supports the following units: mm, cm, dm, m.
Source code in edsnlp/pipelines/misc/measures/patterns.py, lines 26-52
COMPOSITE = CompositeSize
class-attribute
UNITS = {'mm': {'prefix': 'mill?im', 'abbr': 'mm', 'value': 1}, 'cm': {'prefix': 'centim', 'abbr': 'cm', 'value': 10}, 'dm': {'prefix': 'decim', 'abbr': 'dm', 'value': 100}, 'm': {'prefix': 'metre', 'abbr': 'm', 'value': 1000}}
class-attribute
mm = property(make_simple_getter('mm'))
class-attribute
cm = property(make_simple_getter('cm'))
class-attribute
dm = property(make_simple_getter('dm'))
class-attribute
m = property(make_simple_getter('m'))
class-attribute
parse(int_part, dec_part, unit, infix=False)
Source code in edsnlp/pipelines/misc/measures/patterns.py, lines 44-47
Weight
Bases: SimpleMeasure
Weight measure. Supports the following units: mg, cg, dg, g, kg.
Source code in edsnlp/pipelines/misc/measures/patterns.py, lines 55-84
COMPOSITE = None
class-attribute
UNITS = {'mg': {'prefix': 'mill?ig', 'abbr': 'mg', 'value': 1}, 'cg': {'prefix': 'centig', 'abbr': 'cg', 'value': 10}, 'dg': {'prefix': 'decig', 'abbr': 'dg', 'value': 100}, 'g': {'prefix': 'gram', 'abbr': 'g', 'value': 1000}, 'kg': {'prefix': 'kilo', 'abbr': 'kg', 'value': 1000000}}
class-attribute
mg = property(make_simple_getter('mg'))
class-attribute
cg = property(make_simple_getter('cg'))
class-attribute
dg = property(make_simple_getter('dg'))
class-attribute
g = property(make_simple_getter('g'))
class-attribute
kg = property(make_simple_getter('kg'))
class-attribute
parse(int_part, dec_part, unit, infix=False)
Source code in edsnlp/pipelines/misc/measures/patterns.py, lines 75-78
Angle
Bases: SimpleMeasure
Angle measure. Supports the following unit: h.
Source code in edsnlp/pipelines/misc/measures/patterns.py, lines 87-107
COMPOSITE = None
class-attribute
UNITS = {'h': {'prefix': 'heur', 'abbr': 'h', 'value': 1}}
class-attribute
h = property(make_simple_getter('h'))
class-attribute
parse(int_part, dec_part, unit, infix=False)
Source code in edsnlp/pipelines/misc/measures/patterns.py, lines 99-105
measures
Measure
Bases: abc.ABC
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 123-153
INTEGER = '(?:[0-9]+)'
class-attribute
CONJUNCTIONS = 'et|ou'
class-attribute
COMPOSERS = '[x*]|par'
class-attribute
UNITS = {}
class-attribute
COMPOSITE = None
class-attribute
__iter__()
Iterates over the items of the measure (a single item for SimpleMeasure).
RETURNS | DESCRIPTION
---|---
iterable | Iterable over the items of the measure
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 131-139
__getitem__(item)
Accesses an item of the measure (a single item for SimpleMeasure).
PARAMETER | DESCRIPTION
---|---
item | Index of the item to access
RETURNS | DESCRIPTION
---|---
measure | The indexed measure
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 141-153
SimpleMeasure
Bases: Measure
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 156-212
value = value
instance-attribute
unit = unit
instance-attribute
__init__(value, unit)
The SimpleMeasure class contains the value and unit for a single non-composite measure.
PARAMETER | DESCRIPTION
---|---
value | Value of the measure
unit | Unit of the measure
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 157-169
parse(int_part, dec_part, unit, infix)
Class method to create an instance from the match groups
int_part : str
    The integer part of the match (eg 12 in "12 metres 50" or "12.50metres")
dec_part : str
    The decimal part of the match (eg 50 in "12 metres 50" or "12.50metres")
unit : str
    The normalized variant of the unit (eg "m" for "12 metre 50")
infix : bool
    Whether the unit came before (True) or after (False) the decimal part
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 171-187
_get_scale_to(unit)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 189-190
__iter__()
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 192-193
__getitem__(item)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 195-197
__str__()
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 199-200
__repr__()
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 202-203
__eq__(other)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 205-206
__lt__(other)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 208-209
__le__(other)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 211-212
CompositeMeasure
Bases: Measure
The CompositeMeasure class contains a sequence of multiple SimpleMeasure instances.
PARAMETER | DESCRIPTION
---|---
measures | The SimpleMeasure instances to aggregate
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 215-241
measures = list(measures)
instance-attribute
__init__(measures)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 225-227
__iter__()
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 229-230
__getitem__(item)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 232-235
__str__()
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 237-238
__repr__()
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 240-241
Measures
Bases: BaseComponent
Matcher component to extract measures. A measure is most often composed of a number and a unit, like "1,26 cm".
The unit can also be positioned in place of the decimal dot or comma ("1 cm 26"). Some measures can be composite ("1,26 cm x 2,34 mm"), and sometimes they are factorized ("Les trois kystes mesurent 1, 2 et 3cm").
The recognized measures are stored in the "measures" SpanGroup.
Each span has a Measure object stored in the "value" extension attribute.
PARAMETER | DESCRIPTION
---|---
nlp | The spaCy object.
measures | The registry names of the measures to extract
attr | Whether to match on the text ('TEXT') or on the normalized text ('NORM')
ignore_excluded | Whether to exclude pollution patterns when matching in the text
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 244-376
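A minimal usage sketch; the "measures" SpanGroup, the "value" extension and the unit getters come from this reference, while the eds.measures factory name is assumed and should be checked against your edsnlp version:
import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.measures")

doc = nlp("Le kyste mesure 1,26 cm.")
for measure in doc.spans["measures"]:
    print(measure, measure._.value)  # a Measure object
    print(measure._.value.cm)        # unit conversion via the getters above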
regex_matcher = RegexMatcher(attr=attr, ignore_excluded=ignore_excluded)
instance-attribute
extraction_regexes = {}
instance-attribute
measures: Dict[str, Measure] = {}
instance-attribute
__init__(nlp, measures, attr, ignore_excluded)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 271-293
set_extensions()
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 295-299
__call__(doc)
Adds measures to document's "measures" SpanGroup.
PARAMETER | DESCRIPTION
---|---
doc | spaCy Doc object
RETURNS | DESCRIPTION
---|---
doc | spaCy Doc object, annotated for extracted terms
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 301-376
disj_capture(regexes, capture=True)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 14-20
rightmost_largest_sort_key(span)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 23-24
make_patterns(measure)
Build recognition and extraction patterns for a given Measure class
PARAMETER | DESCRIPTION
---|---
measure | The measure to build recognition and extraction patterns for
RETURNS | DESCRIPTION
---|---
trigger | Recognition pattern
extraction | Extraction pattern
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 27-84
make_simple_getter(name)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 87-102
make_multi_getter(name)
Source code in edsnlp/pipelines/misc/measures/measures.py, lines 105-120
qualifiers
base
Qualifier
Bases: BaseComponent
Implements the NegEx algorithm.
PARAMETER | DESCRIPTION
---|---
nlp | spaCy nlp pipeline to use for matching.
attr | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. A key can also be added for each regex.
on_ents_only | Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.
explain | Whether to keep track of cues for each entity.
**terms | Terms to look for.
Source code in edsnlp/pipelines/qualifiers/base.py, lines 25-115
defaults = dict()
class-attribute
phrase_matcher = EDSPhraseMatcher(vocab=nlp.vocab, attr=attr)
instance-attribute
on_ents_only = on_ents_only
instance-attribute
explain = explain
instance-attribute
__init__(nlp, attr, on_ents_only, explain, **terms)
Source code in edsnlp/pipelines/qualifiers/base.py, lines 48-64
get_defaults(**kwargs)
Merge terms with their defaults. Null keys are replaced with defaults.
RETURNS | DESCRIPTION
---|---
Dict[str, List[str]] | Merged dictionary
Source code in edsnlp/pipelines/qualifiers/base.py, lines 66-84
get_matches(doc)
Extract matches.
PARAMETER | DESCRIPTION
---|---
doc | spaCy Doc object
RETURNS | DESCRIPTION
---|---
List[Span] | List of detected spans
Source code in edsnlp/pipelines/qualifiers/base.py, lines 86-112
__call__(doc)
Source code in edsnlp/pipelines/qualifiers/base.py, lines 114-115
check_normalizer(nlp)
Source code in edsnlp/pipelines/qualifiers/base.py, lines 12-22
factories
hypothesis
hypothesis
Hypothesis
Bases: Qualifier
Hypothesis detection with spaCy.
The component looks for five kinds of expressions in the text:
- preceding hypothesis, ie cues that precede a hypothetic expression
- following hypothesis, ie cues that follow a hypothetic expression
- pseudo hypothesis: expressions that contain a hypothesis cue, but are not hypotheses (eg "pas de doute"/"no doubt")
- hypothetic verbs: verbs indicating hypothesis (eg "douter")
- classic verbs conjugated to the conditional, thus indicating hypothesis
PARAMETER | DESCRIPTION
---|---
nlp | spaCy nlp pipeline to use for matching.
pseudo | List of pseudo hypothesis cues.
preceding | List of preceding hypothesis cues.
following | List of following hypothesis cues.
verbs_hyp | List of hypothetic verbs.
verbs_eds | List of mainstream verbs.
filter_matches | Whether to filter out overlapping matches.
attr | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. A key can also be added for each regex.
on_ents_only | Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.
within_ents | Whether to consider cues within entities.
explain | Whether to keep track of cues for each entity.
regex | A dictionary of regex patterns.
Source code in edsnlp/pipelines/qualifiers/hypothesis/hypothesis.py, lines 15-234
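A minimal usage sketch (assumptions: the factory is registered as eds.hypothesis and entities receive a hypothesis extension; verify against your edsnlp version):
import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.matcher", config=dict(terms=dict(fracture="fracture")))
nlp.add_pipe("eds.hypothesis")

doc = nlp("Suspicion de fracture du poignet.")
print(doc.ents[0]._.hypothesis)  # True, assuming the `hypothesis` extension name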
defaults = dict(following=following, preceding=preceding, pseudo=pseudo, termination=termination, verbs_eds=verbs_eds, verbs_hyp=verbs_hyp)
class-attribute
within_ents = within_ents
instance-attribute
__init__(nlp, attr, pseudo, preceding, following, termination, verbs_eds, verbs_hyp, on_ents_only, within_ents, explain)
Source code in edsnlp/pipelines/qualifiers/hypothesis/hypothesis.py, lines 68-105
set_extensions()
Source code in edsnlp/pipelines/qualifiers/hypothesis/hypothesis.py, lines 107-131
load_verbs(verbs_hyp, verbs_eds)
Conjugate "classic" verbs to conditional, and add hypothesis verbs conjugated to all tenses.
PARAMETER | DESCRIPTION
---|---
verbs_hyp | List of hypothetic verbs
verbs_eds | List of mainstream verbs
RETURNS | DESCRIPTION
---|---
list | Hypothesis verbs conjugated at all tenses, and classic verbs conjugated to the conditional.
Source code in edsnlp/pipelines/qualifiers/hypothesis/hypothesis.py, lines 133-160
process(doc)
Finds entities related to hypothesis.
PARAMETER | DESCRIPTION
---|---
doc | spaCy Doc object
RETURNS | DESCRIPTION
---|---
doc | spaCy Doc object, annotated for hypothesis
Source code in edsnlp/pipelines/qualifiers/hypothesis/hypothesis.py, lines 162-234
factory
DEFAULT_CONFIG = dict(pseudo=None, preceding=None, following=None, termination=None, verbs_hyp=None, verbs_eds=None, attr='NORM', on_ents_only=True, within_ents=False, explain=False)
module-attribute
create_component(nlp, name, attr, pseudo, preceding, following, termination, verbs_eds, verbs_hyp, on_ents_only, within_ents, explain)
Source code in edsnlp/pipelines/qualifiers/hypothesis/factory.py, lines 22-50
patterns
pseudo: List[str] = ['aucun doute', 'même si', 'pas de condition', 'pas de doute', 'sans aucun doute', 'sans condition', 'sans risque']
module-attribute
confirmation: List[str] = ['certain', 'certaine', 'certainement', 'certaines', 'certains', 'confirmer', 'évidemment', 'évident', 'évidente', 'montrer que', 'visiblement']
module-attribute
preceding: List[str] = ['à condition', 'à la condition que', 'à moins que', 'au cas où', 'conditionnellement', 'doute', 'en admettant que', 'en cas', 'en considérant que', 'en supposant que', 'éventuellement', 'faudrait', 'hypothèse', 'hypothèses', 'idée de', 'pas confirmer', 'pas sûr', 'pas sûre', 'peut correspondre', 'peut-être', 'peuvent correspondre', 'possible', 'possiblement', 'potentiel', 'potentielle', 'potentiellement', 'potentielles', 'potentiels', 'prédisposant à', 'probable', 'probablement', 'probables', "recherche d'", 'recherche de', 'recherche des', 'risque', 'sauf si', 'selon', 'si', "s'il", 'soit', 'sous condition', 'sous réserve', 'suspicion']
module-attribute
following: List[str] = ['?', 'envisagé', 'envisageable', 'envisageables', 'envisagées', 'envisagés', 'hypothétique', 'hypothétiquement', 'hypothétiques', 'pas certain', 'pas certaine', 'pas clair', 'pas claire', 'pas confirmé', 'pas confirmée', 'pas confirmées', 'pas confirmés', 'pas évident', 'pas évidente', 'pas sûr', 'pas sûre', 'possible', 'potentiel', 'potentielle', 'potentiels', 'probable', 'probables', ': \n', ':\n']
module-attribute
verbs_hyp: List[str] = ['douter', 'envisager', "s'apparenter", 'sembler', 'soupçonner', 'suggérer', 'suspecter']
module-attribute
verbs_eds: List[str] = ['abandonner', 'abolir', 'aborder', 'accepter', 'accidenter', 'accompagnemer', 'accompagner', 'acoller', 'acquérir', 'activer', 'actualiser', 'adapter', 'adhérer', 'adjuver', 'admettre', 'administrer', 'adopter', 'adresser', 'aggraver', 'agir', 'agréer', 'aider', 'aimer', 'alcooliser', 'alerter', 'alimenter', 'aller', 'allonger', 'alléger', 'alterner', 'altérer', 'amender', 'amener', 'améliorer', 'amyotrophier', 'améliorer', 'analyser', 'anesthésier', 'animer', 'annexer', 'annuler', 'anonymiser', 'anticiper', 'anticoaguler', 'apercevoir', 'aplatir', 'apparaître', 'appareiller', 'appeler', 'appliquer', 'apporter', 'apprendre', 'apprécier', 'appuyer', 'argumenter', 'arquer', 'arrêter', 'arriver', 'arrêter', 'articuler', 'aspirer', 'asseoir', 'assister', 'associer', 'assurer', 'assécher', 'attacher', 'atteindre', 'attendre', 'attribuer', 'augmenter', 'autonomiser', 'autoriser', 'avaler', 'avancer', 'avertir', 'avoir', 'avérer', 'aérer', 'baisser', 'ballonner', 'blesser', 'bloquer', 'boire', 'border', 'brancher', 'brûler', 'bénéficier', 'cadrer', 'calcifier', 'calculer', 'calmer', 'canaliser', 'capter', 'carencer', 'casser', 'centrer', 'cerner', 'certifier', 'changer', 'charger', 'chevaucher', 'choisir', 'chronomoduler', 'chuter', 'cicatriser', 'circoncire', 'circuler', 'classer', 'codéiner', 'coincer', 'colorer', 'combler', 'commander', 'commencer', 'communiquer', 'comparer', 'compliquer', 'compléter', 'comporter', 'comprendre', 'comprimer', 'concerner', 'conclure', 'condamner', 'conditionner', 'conduire', 'confiner', 'confirmer', 'confronter', 'congeler', 'conjoindre', 'conjuguer', 'connaître', 'connecter', 'conseiller', 'conserver', 'considérer', 'consommer', 'constater', 'constituer', 'consulter', 'contacter', 'contaminer', 'contenir', 'contentionner', 'continuer', 'contracter', 'contrarier', 'contribuer', 'contrôler', 'convaincre', 'convenir', 'convier', 'convoquer', 'copier', 'correspondre', 'corriger', 'corréler', 'coucher', 'coupler', 'couvrir', 'crapotter', 'creuser', 'croire', 'croiser', 'créer', 'crémer', 'crépiter', 'cumuler', 'curariser', 'céder', 'dater', 'demander', 'demeurer', 'destiner', 'devenir', 'devoir', 'diagnostiquer', 'dialyser', 'dicter', 'diffuser', 'différencier', 'différer', 'digérer', 'dilater', 'diluer', 'diminuer', 'diner', 'dire', 'diriger', 'discuter', 'disparaître', 'disposer', 'dissocier', 'disséminer', 'disséquer', 'distendre', 'distinguer', 'divorcer', 'documenter', 'donner', 'dorer', 'doser', 'doubler', 'durer', 'dyaliser', 'dyspner', 'débuter', 'décaler', 'déceler', 'décider', 'déclarer', 'déclencher', 'découvrir', 'décrire', 'décroître', 'décurariser', 'décéder', 'dédier', 'définir', 'dégrader', 'délivrer', 'dépasser', 'dépendre', 'déplacer', 'dépolir', 'déposer', 'dériver', 'dérouler', 'désappareiller', 'désigner', 'désinfecter', 'désorienter', 'détecter', 'déterminer', 'détruire', 'développer', 'dévouer', 'dîner', 'écraser', 'effacer', 'effectuer', 'effondrer', 'emboliser', 'emmener', 'empêcher', 'encadrer', 'encourager', 'endormir', 'endurer', 'enlever', 'enregistrer', 'entamer', 'entendre', 'entourer', 'entraîner', 'entreprendre', 'entrer', 'envahir', 'envisager', 'envoyer', 'espérer', 'essayer', 'estimer', 'être', 'examiner', 'excentrer', 'exciser', 'exclure', 'expirer', 'expliquer', 'explorer', 'exposer', 'exprimer', 'extérioriser', 'exécuter', 'faciliter', 'faire', 'fatiguer', 'favoriser', 'faxer', 'fermer', 'figurer', 'fixer', 'focaliser', 'foncer', 'former', 'fournir', 'fractionner', 'fragmenter', 'fuiter', 'fusionner', 
'garder', 'graver', 'guider', 'gérer', 'gêner', 'honorer', 'hopsitaliser', 'hospitaliser', 'hydrater', 'hyperartérialiser', 'hyperfixer', 'hypertrophier', 'hésiter', 'identifier', 'illustrer', 'immuniser', 'impacter', 'implanter', 'impliquer', 'importer', 'imposer', 'impregner', 'imprimer', 'inclure', 'indifferencier', 'indiquer', 'infecter', 'infertiliser', 'infiltrer', 'informer', 'inhaler', 'initier', 'injecter', 'inscrire', 'insister', 'installer', 'interdire', 'interpréter', 'interrompre', 'intervenir', 'intituler', 'introduire', 'intéragir', 'inverser', 'inviter', 'ioder', 'ioniser', 'irradier', 'itérativer', 'joindre', 'juger', 'justifier', 'laisser', 'laminer', 'lancer', 'latéraliser', 'laver', 'lever', 'lier', 'ligaturer', 'limiter', 'lire', 'localiser', 'loger', 'louper', 'luire', 'lutter', 'lyricer', 'lyser', 'maculer', 'macérer', 'maintenir', 'majorer', 'malaiser', 'manger', 'manifester', 'manipuler', 'manquer', 'marcher', 'marier', 'marmoner', 'marquer', 'masquer', 'masser', 'mater', 'mener', 'mesurer', 'meteoriser', 'mettre', 'mitiger', 'modifier', 'moduler', 'modérer', 'monter', 'montrer', 'motiver', 'moucheter', 'mouler', 'mourir', 'multiopéréer', 'munir', 'muter', 'médicaliser', 'météoriser', 'naître', 'normaliser', 'noter', 'nuire', 'numériser', 'nécessiter', 'négativer', 'objectiver', 'observer', 'obstruer', 'obtenir', 'occasionner', 'occuper', 'opposer', 'opérer', 'organiser', 'orienter', 'ouvrir', 'palper', 'parasiter', 'paraître', 'parcourir', 'parer', 'paresthésier', 'parfaire', 'partager', 'partir', 'parvenir', 'passer', 'penser', 'percevoir', 'perdre', 'perforer', 'permettre', 'persister', 'personnaliser', 'peser', 'pigmenter', 'piloter', 'placer', 'plaindre', 'planifier', 'plier', 'plonger', 'porter', 'poser', 'positionner', 'posséder', 'poursuivre', 'pousser', 'pouvoir', 'pratiquer', 'preciser', 'prendre', 'prescrire', 'prier', 'produire', 'programmer', 'prolonger', 'prononcer', 'proposer', 'prouver', 'provoquer', 'préciser', 'précéder', 'prédominer', 'préexister', 'préférer', 'prélever', 'préparer', 'présenter', 'préserver', 'prévenir', 'prévoir', 'puruler', 'pénétrer', 'radiofréquencer', 'ralentir', 'ramener', 'rappeler', 'rapporter', 'rapprocher', 'rassurer', 'rattacher', 'rattraper', 'realiser', 'recenser', 'recevoir', 'rechercher', 'recommander', 'reconnaître', 'reconsulter', 'recontacter', 'recontrôler', 'reconvoquer', 'recouvrir', 'recueillir', 'recuperer', 'redescendre', 'rediscuter', 'refaire', 'refouler', 'refuser', 'regarder', 'rehausser', 'relancer', 'relayer', 'relever', 'relire', 'relâcher', 'remanier', 'remarquer', 'remercier', 'remettre', 'remonter', 'remplacer', 'remplir', 'rencontrer', 'rendormir', 'rendre', 'renfermer', 'renforcer', 'renouveler', 'renseigner', 'rentrer', 'reparler', 'repasser', 'reporter', 'reprendre', 'represcrire', 'reproduire', 'reprogrammer', 'représenter', 'repérer', 'requérir', 'respecter', 'ressembler', 'ressentir', 'rester', 'restreindre', 'retarder', 'retenir', 'retirer', 'retrouver', 'revasculariser', 'revenir', 'reverticaliser', 'revoir', 'rompre', 'rouler', 'réadapter', 'réadmettre', 'réadresser', 'réaliser', 'récidiver', 'récupérer', 'rédiger', 'réduire', 'réessayer', 'réexpliquer', 'référer', 'régler', 'régresser', 'réhausser', 'réopérer', 'répartir', 'répondre', 'répéter', 'réserver', 'résorber', 'résoudre', 'réséquer', 'réveiller', 'révéler', 'réévaluer', 'rêver', 'sacrer', 'saisir', 'satisfaire', 'savoir', 'scanner', 'scolariser', 'sembler', 'sensibiliser', 'sentir', 'serrer', 'servir', 'sevrer', 'signaler', 
'signer', 'situer', 'siéger', 'soigner', 'sommeiller', 'sonder', 'sortir', 'souffler', 'souhaiter', 'soulager', 'soussigner', 'souvenir', 'spécialiser', 'stabiliser', 'statuer', 'stenter', 'stopper', 'stratifier', 'subir', 'substituer', 'sucrer', 'suggérer', 'suivre', 'supporter', 'supprimer', 'surajouter', 'surmonter', 'surveiller', 'survenir', 'suspecter', 'suspendre', 'suturer', 'synchroniser', 'systématiser', 'sécréter', 'sécuriser', 'sédater', 'séjourner', 'séparer', 'taire', 'taper', 'teinter', 'tendre', 'tenir', 'tenter', 'terminer', 'tester', 'thromboser', 'tirer', 'tiroir', 'tissulaire', 'titulariser', 'tolérer', 'tourner', 'tracer', 'trachéotomiser', 'traduire', 'traiter', 'transcrire', 'transférer', 'transmettre', 'transporter', 'trasnfixer', 'travailler', 'tronquer', 'trouver', 'téléphoner', 'ulcérer', 'uriner', 'utiliser', 'vacciner', 'valider', 'valoir', 'varier', 'vasculariser', 'venir', 'verifier', 'vieillir', 'viser', 'visualiser', 'vivre', 'voir', 'vouloir', 'vérifier', 'ébaucher', 'écarter', 'échographier', 'échoguider', 'échoir', 'échouer', 'éclairer', 'écraser', 'élargir', 'éliminer', 'émousser', 'épaissir', 'épargner', 'épuiser', 'épurer', 'équilibrer', 'établir', 'étager', 'étendre', 'étiqueter', 'étrangler', 'évaluer', 'éviter', 'évoluer', 'évoquer', 'être']
module-attribute
family
factory
DEFAULT_CONFIG = dict(family=None, termination=None, attr='NORM', use_sections=False, explain=False, on_ents_only=True)
module-attribute
create_component(nlp, name, family, termination, attr, explain, on_ents_only, use_sections)
Source code in edsnlp/pipelines/qualifiers/family/factory.py, lines 18-38
family
FamilyContext
Bases: Qualifier
Implements a family context detection algorithm.
The component looks for terms indicating family references in the text.
PARAMETER | DESCRIPTION
---|---
nlp | spaCy nlp pipeline to use for matching.
family | List of terms indicating family reference.
terminations | List of termination terms, to separate syntagmas.
attr | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key 'term_attr'. A key can also be added for each regex.
on_ents_only | Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.
regex | A dictionary of regex patterns.
explain | Whether to keep track of cues for each entity.
use_sections | Whether to use annotated sections (namely the antécédents familiaux section).
Source code in edsnlp/pipelines/qualifiers/family/family.py, lines 15-183
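A minimal usage sketch (assumptions: the factory is registered as eds.family and entities receive a family extension; verify against your edsnlp version):
import spacy

nlp = spacy.blank("fr")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.normalizer")
nlp.add_pipe("eds.matcher", config=dict(terms=dict(diabete="diabète")))
nlp.add_pipe("eds.family")

doc = nlp("Le père du patient présente un diabète.")
print(doc.ents[0]._.family)  # True, assuming the `family` extension name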
defaults = dict(family=family, termination=termination)
class-attribute
sections = use_sections and ('eds.sections' in nlp.pipe_names or 'sections' in nlp.pipe_names)
instance-attribute
__init__(nlp, attr, family, termination, use_sections, explain, on_ents_only)
Source code in edsnlp/pipelines/qualifiers/family/family.py, lines 49-83
set_extensions()
Source code in edsnlp/pipelines/qualifiers/family/family.py, lines 85-109
process(doc)
Finds entities related to family context.
PARAMETER | DESCRIPTION
---|---
doc | spaCy Doc object
RETURNS | DESCRIPTION
---|---
doc | spaCy Doc object, annotated for family context
Source code in edsnlp/pipelines/qualifiers/family/family.py, lines 111-183
patterns
family: List[str] = ['aïeul', 'aïeux', 'antécédent familial', 'antécédents familiaux', 'arrière-grand-mère', 'arrière-grand-père', 'arrière-grands-parents', 'cousin', 'cousine', 'cousines', 'cousins', 'enfant', 'enfants', 'épouse', 'époux', 'familial', 'familiale', 'familiales', 'familiaux', 'famille', 'fiancé', 'fiancée', 'fils', 'frère', 'frères', 'grand-mère', 'grand-père', 'grands-parents', 'maman', 'mari', 'mère', 'oncle', 'papa', 'parent', 'parents', 'père', 'soeur', 'sœur', 'sœurs', 'soeurs', 'tante']
module-attribute
negation
factory
DEFAULT_CONFIG = dict(pseudo=None, preceding=None, following=None, termination=None, verbs=None, attr='NORM', on_ents_only=True, within_ents=False, explain=False)
module-attribute
create_component(nlp, name, attr, pseudo, preceding, following, termination, verbs, on_ents_only, within_ents, explain)
Source code in edsnlp/pipelines/qualifiers/negation/factory.py, lines 21-48
patterns
pseudo: List[str] = ['aucun changement', 'aucun doute', 'aucune hésitation', 'aucune diminution', 'ne permet pas d', 'ne permet pas de', "n'exclut pas", 'non négligeable', "pas d'amélioration", "pas d'augmentation", "pas d'autre", 'pas de changement', 'pas de diminution', 'pas de doute', 'pas exclu', 'pas exclue', 'pas exclues', 'pas exclus', 'pas immunisé', 'pas immunisée', 'pas immunisés', 'pas immunisées', 'sans amélioration', 'sans aucun doute', 'sans augmentation', 'sans certitude', 'sans changement', 'sans diminution', 'sans doute', 'sans être certain']
module-attribute
preceding: List[str] = ['à la place de', 'absence', 'absence de signe de', 'absence de', 'aucun signe de', 'aucun', 'aucune preuve', 'aucune', 'aucunes', 'aucuns', 'décline', 'décliné', 'dépourvu', 'dépourvue', 'dépourvues', 'dépourvus', 'disparition de', 'disparition des', 'excluent', 'exclut', 'impossibilité de', 'immunisé', 'immunisée', 'immunisés', 'immunisées', 'incompatible avec', 'incompatibles avec', 'jamais', 'ne manifestaient pas', 'ne manifestait pas', 'ne manifeste pas', 'ne manifestent pas', 'ne pas', 'ne présentaient pas', 'ne présentait pas', 'ne présente pas', 'ne présentent pas', 'ne ressemble pas', 'ne ressemblent pas', 'négatif pour', "n'est pas", "n'était pas", 'ni', 'niant', 'nie', 'nié', 'nullement', 'pas d', 'pas de cause de', 'pas de signe de', 'pas de signes de', 'pas de', 'pas nécessaire de', 'pas', "permet d'exclure", "plus d'aspect de", 'sans manifester de', 'sans présenter de', 'sans', 'symptôme atypique']
module-attribute
following: List[str] = [':0', ': 0', ':non', ': non', 'absent', 'absente', 'absentes', 'absents', 'dépourvu', 'dépourvue', 'dépourvues', 'dépourvus', 'disparaissent', 'disparait', 'est exclu', 'est exclue', 'immunisé', 'immunisée', 'immunisés', 'immunisées', 'impossible', 'improbable', 'négatif', 'négatifs', 'négative', 'négatives', 'négligeable', 'négligeables', 'nié', 'niée', 'non', 'pas nécessaire', 'peu probable', 'sont exclues', 'sont exclus']
module-attribute
verbs: List[str] = ['éliminer', 'exclure', 'interdire', 'nier', 'réfuter', 'rejeter']
module-attribute
negation
Negation
Bases: Qualifier
Implements the NegEx algorithm.
The component looks for five kinds of expressions in the text:

- preceding negations, i.e. cues that precede a negated expression
- following negations, i.e. cues that follow a negated expression
- pseudo negations: expressions that contain a negation cue but are not negations (e.g. "pas de doute" / "no doubt")
- negation verbs, i.e. verbs that indicate a negation
- terminations, i.e. words that delimit propositions. The negation spans from the preceding cue to the termination.
PARAMETER | DESCRIPTION
---|---
`nlp` | spaCy nlp pipeline to use for matching.
`attr` | spaCy's attribute to use.
`pseudo` | List of pseudo negation terms.
`preceding` | List of preceding negation terms.
`following` | List of following negation terms.
`termination` | List of termination terms.
`verbs` | List of negation verbs.
`on_ents_only` | Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.
`within_ents` | Whether to consider cues within entities.
`explain` | Whether to keep track of cues for each entity.

Source code in edsnlp/pipelines/qualifiers/negation/negation.py
defaults = dict(following=following, preceding=preceding, pseudo=pseudo, verbs=verbs, termination=termination)
class-attribute
within_ents = within_ents
instance-attribute
__init__(nlp, attr, pseudo, preceding, following, termination, verbs, on_ents_only, within_ents, explain)
Source code in edsnlp/pipelines/qualifiers/negation/negation.py
set_extensions()
Source code in edsnlp/pipelines/qualifiers/negation/negation.py
load_verbs(verbs)
Conjugate negating verbs to specific tenses.
PARAMETER | DESCRIPTION
---|---
`verbs` | List of negation verbs to conjugate.

RETURNS | DESCRIPTION
---|---
`list_neg_verbs` | List of conjugated negation verbs.

Source code in edsnlp/pipelines/qualifiers/negation/negation.py
annotate_entity(ent, sub_preceding, sub_following)
Annotate entities using preceding and following negations.
PARAMETER | DESCRIPTION
---|---
`ent` | Entity to annotate.
`sub_preceding` | List of preceding negation cues.
`sub_following` | List of following negation cues.

Source code in edsnlp/pipelines/qualifiers/negation/negation.py
process(doc)
Finds entities related to negation.
PARAMETER | DESCRIPTION
---|---
`doc` | spaCy `Doc` object

RETURNS | DESCRIPTION
---|---
`doc` | spaCy `Doc` object, annotated for negation

Source code in edsnlp/pipelines/qualifiers/negation/negation.py
__call__(doc)
Source code in edsnlp/pipelines/qualifiers/negation/negation.py
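A similar sketch for the negation qualifier, under the same assumptions (`eds.negation` factory, `negation` span extension; illustrative matcher terms):

```python
import spacy

nlp = spacy.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.matcher", config=dict(terms=dict(fracture="fracture")))
nlp.add_pipe("eds.negation")

doc = nlp("Le patient ne présente pas de fracture.")
for ent in doc.ents:
    # `negation` is True when the entity is negated
    print(ent, ent._.negation)
```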
history
history
History
Bases: Qualifier
Implements a medical history detection algorithm.
The component looks for terms indicating medical history in the text.
PARAMETER | DESCRIPTION
---|---
`nlp` | spaCy nlp pipeline to use for matching.
`history` | List of terms indicating medical history reference.
`termination` | List of syntagma termination terms.
`use_sections` | Whether to use the sections pipeline to detect the medical history section.
`attr` | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key `term_attr`; a key can also be added for each regex.
`on_ents_only` | Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.
`regex` | A dictionary of regex patterns.
`explain` | Whether to keep track of cues for each entity.

Source code in edsnlp/pipelines/qualifiers/history/history.py
defaults = dict(history=history, termination=termination)
class-attribute
sections = use_sections and 'eds.sections' in nlp.pipe_names or 'sections' in nlp.pipe_names
instance-attribute
__init__(nlp, attr, history, termination, use_sections, explain, on_ents_only)
Source code in edsnlp/pipelines/qualifiers/history/history.py
set_extensions()
Source code in edsnlp/pipelines/qualifiers/history/history.py
process(doc)
Finds entities related to history.
PARAMETER | DESCRIPTION
---|---
`doc` | spaCy `Doc` object

RETURNS | DESCRIPTION
---|---
`doc` | spaCy `Doc` object, annotated for history

Source code in edsnlp/pipelines/qualifiers/history/history.py
factory
DEFAULT_CONFIG = dict(attr='NORM', history=patterns.history, termination=termination, use_sections=False, explain=False, on_ents_only=True)
module-attribute
create_component(nlp, name, history, termination, use_sections, attr, explain, on_ents_only)
Source code in edsnlp/pipelines/qualifiers/history/factory.py
patterns
history = ['antécédents', 'atcd', 'atcds', 'tacds', 'antécédent']
module-attribute
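A sketch for the history qualifier, under the same assumptions (`eds.history` factory, `history` span extension; illustrative matcher terms):

```python
import spacy

nlp = spacy.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.matcher", config=dict(terms=dict(diabete="diabète")))
nlp.add_pipe("eds.history")

doc = nlp("Antécédents : diabète de type 2.")
for ent in doc.ents:
    # `history` is True when the entity belongs to the patient's medical history
    print(ent, ent._.history)
```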
reported_speech
factory
DEFAULT_CONFIG = dict(pseudo=None, preceding=None, following=None, quotation=None, verbs=None, attr='NORM', on_ents_only=True, within_ents=False, explain=False)
module-attribute
create_component(nlp, name, attr, pseudo, preceding, following, quotation, verbs, on_ents_only, within_ents, explain)
Source code in edsnlp/pipelines/qualifiers/reported_speech/factory.py
patterns
verbs: List[str] = ['affirmer', 'ajouter', 'assurer', 'confirmer', 'demander', 'dire', 'déclarer', 'décrire', 'décrire', 'démontrer', 'expliquer', 'faire remarquer', 'indiquer', 'informer', 'insinuer', 'insister', 'jurer', 'nier', 'nier', 'noter', 'objecter', 'observer', 'parler', 'promettre', 'préciser', 'prétendre', 'prévenir', 'raconter', 'rappeler', 'rapporter', 'reconnaître', 'réfuter', 'répliquer', 'répondre', 'répéter', 'révéler', 'se plaindre', 'souhaiter', 'souligner', 'supplier', 'verbaliser', 'vouloir', 'vouloir']
module-attribute
following: List[str] = ["d'après le patient", "d'après la patiente"]
module-attribute
preceding: List[str] = ['pas de critique de', 'crainte de', 'menace de', 'insiste sur le fait que', "d'après le patient", "d'après la patiente", 'peur de']
module-attribute
quotation: str = '(\\".+\\")|(\\«.+\\»)'
module-attribute
reported_speech
ReportedSpeech
Bases: Qualifier
Implements a reported speech detection algorithm.
The component looks for terms indicating patient statements, and for quotations, to detect patient speech.
PARAMETER | DESCRIPTION
---|---
`nlp` | spaCy nlp pipeline to use for matching.
`quotation` | String gathering all quotation cues.
`verbs` | List of reported speech verbs.
`following` | List of terms following a reported speech.
`preceding` | List of terms preceding a reported speech.
`filter_matches` | Whether to filter out overlapping matches.
`attr` | spaCy's attribute to use: a string with the value "TEXT" or "NORM", or a dict with the key `term_attr`; a key can also be added for each regex.
`on_ents_only` | Whether to look for matches around detected entities only. Useful for faster inference in downstream tasks.
`within_ents` | Whether to consider cues within entities.
`explain` | Whether to keep track of cues for each entity.

Source code in edsnlp/pipelines/qualifiers/reported_speech/reported_speech.py
defaults = dict(following=following, preceding=preceding, verbs=verbs, quotation=quotation)
class-attribute
regex_matcher = RegexMatcher(attr=attr)
instance-attribute
within_ents = within_ents
instance-attribute
__init__(nlp, attr, pseudo, preceding, following, quotation, verbs, on_ents_only, within_ents, explain)
Source code in edsnlp/pipelines/qualifiers/reported_speech/reported_speech.py
set_extensions()
Source code in edsnlp/pipelines/qualifiers/reported_speech/reported_speech.py
load_verbs(verbs)
Conjugates reporting verbs to specific tenses (third person).
PARAMETER | DESCRIPTION
---|---
`verbs` | List of reported speech verbs to conjugate.

RETURNS | DESCRIPTION
---|---
`list_rep_verbs` | List of conjugated reporting verbs.

Source code in edsnlp/pipelines/qualifiers/reported_speech/reported_speech.py
process(doc)
Finds entities related to reported speech.
PARAMETER | DESCRIPTION
---|---
`doc` | spaCy `Doc` object

RETURNS | DESCRIPTION
---|---
`doc` | spaCy `Doc` object, annotated for reported speech

Source code in edsnlp/pipelines/qualifiers/reported_speech/reported_speech.py
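A sketch for the reported speech qualifier, under the same assumptions (`eds.reported_speech` factory, `reported_speech` span extension; illustrative matcher terms):

```python
import spacy

nlp = spacy.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.matcher", config=dict(terms=dict(douleur="douleur")))
nlp.add_pipe("eds.reported_speech")

doc = nlp("Le patient rapporte une douleur thoracique.")
for ent in doc.ents:
    # `reported_speech` is True when the entity is part of a patient statement
    print(ent, ent._.reported_speech)
```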
matchers
regex
RegexMatcher
Bases: object
Simple RegExp matcher.
PARAMETER | DESCRIPTION
---|---
`alignment_mode` | How spans should be aligned with tokens. Possible values are `"strict"`, `"contract"` and `"expand"`.
`attr` | Default attribute to match on, by default "TEXT". Can be overridden in the `add` method.
`ignore_excluded` | Whether to skip exclusions.

Source code in edsnlp/matchers/regex.py
alignment_mode = alignment_mode
instance-attribute
regex = []
instance-attribute
default_attr = attr
instance-attribute
ignore_excluded = ignore_excluded
instance-attribute
__init__(alignment_mode='expand', attr='TEXT', ignore_excluded=False)
Source code in edsnlp/matchers/regex.py
build_patterns(regex)
Builds patterns and adds them for matching. Helper function for pipelines using this matcher.
PARAMETER | DESCRIPTION
---|---
`regex` | Dictionary of label/terms, or label/dictionary of terms/attribute.

Source code in edsnlp/matchers/regex.py
add(key, patterns, attr=None, ignore_excluded=None, alignment_mode=None)
Add a pattern to the registry.
PARAMETER | DESCRIPTION
---|---
`key` | Key of the new/updated pattern.
`patterns` | List of patterns to add.
`attr` | Attribute to use for matching. By default, uses the `default_attr` attribute set at initialisation.
`ignore_excluded` | Whether to skip excluded tokens during matching.
`alignment_mode` | Overwrite alignment mode.

Source code in edsnlp/matchers/regex.py
remove(key)
Removes a pattern from the registry.
PARAMETER | DESCRIPTION
---|---
`key` | Key of the pattern to remove.

RAISES | DESCRIPTION
---|---
`ValueError` | If the key is not present in the registered patterns.

Source code in edsnlp/matchers/regex.py
__len__()
Source code in edsnlp/matchers/regex.py
match(doclike)
Iterates on the matches.
PARAMETER | DESCRIPTION
---|---
`doclike` | spaCy `Doc` or `Span` object to match on.

YIELDS | DESCRIPTION
---|---
`span` | A match.

Source code in edsnlp/matchers/regex.py
__call__(doclike, as_spans=False, return_groupdict=False)
Performs matching. Yields matches.
PARAMETER | DESCRIPTION
---|---
`doclike` | spaCy `Doc` or `Span` object.
`as_spans` | Returns matches as spans. DEFAULT: `False`.

YIELDS | DESCRIPTION
---|---
`span` | A match.
`groupdict` | Additional information coming from the named patterns in the regular expression.

Source code in edsnlp/matchers/regex.py
get_first_included(doclike)
Source code in edsnlp/matchers/regex.py
create_span(doclike, start_char, end_char, key, attr, alignment_mode, ignore_excluded)
spaCy only allows strict alignment mode for char_span on Spans. This method circumvents this.
PARAMETER | DESCRIPTION
---|---
`doclike` | spaCy `Doc` or `Span` object.
`start_char` | Character index of the start, within the Doc-like object.
`end_char` | Character index of the end, within the Doc-like object.
`key` | The key used to match.
`attr` | The attribute used for matching.
`alignment_mode` | The alignment mode.
`ignore_excluded` | Whether to skip excluded tokens.

RETURNS | DESCRIPTION
---|---
`span` | A span matched on the Doc-like object.

Source code in edsnlp/matchers/regex.py
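The methods above compose as follows; a short sketch using only the documented API (a blank French pipeline is used for illustration):

```python
import spacy
from edsnlp.matchers.regex import RegexMatcher

nlp = spacy.blank("fr")
doc = nlp("Le patient est admis pour une pneumopathie au niveau du lobe supérieur.")

# Build a matcher on the raw text and register labelled patterns
matcher = RegexMatcher(attr="TEXT", alignment_mode="expand")
matcher.build_patterns(regex=dict(pneumo=["pneumopathie", r"lobe\s\w+"]))

# Iterate over matches, returned as spans
for span in matcher(doc, as_spans=True):
    print(span.label_, ":", span.text)
```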
utils
ListOrStr = Union[List[str], str]
module-attribute
DictOrPattern = Union[Dict[str, ListOrStr], ListOrStr]
module-attribute
Patterns = Dict[str, DictOrPattern]
module-attribute
ATTRIBUTES = {'LOWER': 'lower_', 'TEXT': 'text', 'NORM': 'norm_', 'SHAPE': 'shape_'}
module-attribute
offset
token_length(token, custom, attr)
Source code in edsnlp/matchers/utils/offset.py
alignment(doc, attr='TEXT', ignore_excluded=True)
Aligns different representations of a `Doc` or `Span` object.
PARAMETER | DESCRIPTION
---|---
`doc` | spaCy `Doc` or `Span` object.
`attr` | Attribute to use, by default "TEXT".
`ignore_excluded` | Whether to remove excluded tokens, by default True.

RETURNS | DESCRIPTION
---|---
`Tuple[List[int], List[int]]` | An alignment tuple: original and clean lists.

Source code in edsnlp/matchers/utils/offset.py
offset(doc, attr, ignore_excluded, index)
Computes the offset between the original text and a given representation (defined by the couple `attr`, `ignore_excluded`). The alignment itself is computed with `alignment`.
PARAMETER | DESCRIPTION
---|---
`doc` | The spaCy `Doc` object.
`attr` | The attribute used by the matcher.
`ignore_excluded` | Whether the RegexMatcher ignores excluded tokens.
`index` | The index in the pre-processed text.

RETURNS | DESCRIPTION
---|---
`int` | The offset. To get the character index in the original document, add the offset to the index in the pre-processed text.

Source code in edsnlp/matchers/utils/offset.py
text
get_text(doclike, attr, ignore_excluded)
Get text using a custom attribute, possibly ignoring excluded tokens.
PARAMETER | DESCRIPTION
---|---
`doclike` | Doc or Span to get text from.
`attr` | Attribute to use.
`ignore_excluded` | Whether to skip excluded tokens, by default False.

RETURNS | DESCRIPTION
---|---
`str` | Extracted text.

Source code in edsnlp/matchers/utils/text.py
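A sketch of `get_text`, assuming it is importable from `edsnlp.matchers.utils` (the exact import path may differ):

```python
import spacy
from edsnlp.matchers.utils import get_text

nlp = spacy.blank("fr")
doc = nlp("Le PATIENT est admis.")

# Text of the document as seen through the LOWER attribute
print(get_text(doc, attr="LOWER", ignore_excluded=False))
```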
connectors
brat
BratConnector
Bases: object
Two-way connector with BRAT. Supports entities only.
PARAMETER | DESCRIPTION
---|---
`directory` | Directory containing the BRAT files.
`n_jobs` | Number of jobs for multiprocessing, by default 1.

Source code in edsnlp/connectors/brat.py
directory = directory
instance-attribute
n_jobs = n_jobs
instance-attribute
__init__(directory, n_jobs=1)
Source code in edsnlp/connectors/brat.py
full_path(filename)
Source code in edsnlp/connectors/brat.py
read_file(filename)
Reads a file within the BRAT directory.
PARAMETER | DESCRIPTION
---|---
`filename` | The path to the file within the BRAT directory.

RETURNS | DESCRIPTION
---|---
`text` | The text content of the file.

Source code in edsnlp/connectors/brat.py
read_texts()
Reads all texts from the BRAT folder.
RETURNS | DESCRIPTION
---|---
`texts` | DataFrame containing all texts in the BRAT directory.

Source code in edsnlp/connectors/brat.py
read_brat_annotation(note_id)
Reads BRAT annotation inside the BRAT directory.
PARAMETER | DESCRIPTION
---|---
`note_id` | Note ID within the BRAT directory.

RETURNS | DESCRIPTION
---|---
`annotations` | DataFrame containing the annotations for the given note.

Source code in edsnlp/connectors/brat.py
read_annotations(texts)
Source code in edsnlp/connectors/brat.py
get_brat()
Reads texts and annotations, and returns two DataFrame objects.
RETURNS | DESCRIPTION
---|---
`texts` | A DataFrame containing two fields, `note_id` and `note_text`.
`annotations` | A DataFrame containing the annotations.

Source code in edsnlp/connectors/brat.py
brat2docs(nlp)
Transforms a BRAT folder to a list of spaCy documents.
PARAMETER | DESCRIPTION
---|---
`nlp` | A spaCy pipeline.

RETURNS | DESCRIPTION
---|---
`docs` | List of spaCy documents, with annotations in the `ents` attribute.

Source code in edsnlp/connectors/brat.py
doc2brat(doc)
Writes a spaCy document to file in the BRAT directory.
PARAMETER | DESCRIPTION
---|---
`doc` | spaCy Doc object. The spans in `ents` are written to file.

Source code in edsnlp/connectors/brat.py
docs2brat(docs)
Writes a list of spaCy documents to file.
PARAMETER | DESCRIPTION
---|---
`docs` | List of spaCy documents.

Source code in edsnlp/connectors/brat.py
read_brat_annotation(filename)
Reads a BRAT annotation file and returns a pandas DataFrame.
PARAMETER | DESCRIPTION
---|---
`filename` | Path to the annotation file.

RETURNS | DESCRIPTION
---|---
`annotations` | DataFrame containing the annotations.

Source code in edsnlp/connectors/brat.py
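A round-trip sketch with `BratConnector`, using only the methods documented above (directory paths are placeholders):

```python
import spacy
from edsnlp.connectors.brat import BratConnector

nlp = spacy.blank("eds")

# Read a BRAT-formatted corpus into spaCy documents
brat = BratConnector("path/to/brat")
docs = brat.brat2docs(nlp)

# ... process the documents, then write them back as BRAT files
out = BratConnector("path/to/output")
out.docs2brat(docs)
```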
omop
OmopConnector
Bases: object
Connector between OMOP-formatted pandas DataFrames and spaCy documents.
PARAMETER | DESCRIPTION
---|---
`nlp` | spaCy language object.
`start_char` | Name of the column containing the start character index of the entity, by default "start_char".
`end_char` | Name of the column containing the end character index of the entity, by default "end_char".

Source code in edsnlp/connectors/omop.py
start_char = start_char
instance-attribute
end_char = end_char
instance-attribute
nlp = nlp
instance-attribute
__init__(nlp, start_char='start_char', end_char='end_char')
Source code in edsnlp/connectors/omop.py
preprocess(note, note_nlp)
Preprocess the input OMOP tables: modification of the column names.
PARAMETER | DESCRIPTION
---|---
`note` | OMOP `note` table.
`note_nlp` | OMOP `note_nlp` table.

RETURNS | DESCRIPTION
---|---
`note` | Preprocessed OMOP `note` table.
`note_nlp` | Preprocessed OMOP `note_nlp` table.

Source code in edsnlp/connectors/omop.py
postprocess(note, note_nlp)
Postprocess the input OMOP tables: modification of the column names.
PARAMETER | DESCRIPTION
---|---
`note` | OMOP `note` table.
`note_nlp` | OMOP `note_nlp` table.

RETURNS | DESCRIPTION
---|---
`note` | Postprocessed OMOP `note` table.
`note_nlp` | Postprocessed OMOP `note_nlp` table.

Source code in edsnlp/connectors/omop.py
omop2docs(note, note_nlp, extensions=None)
Transforms OMOP tables to a list of spaCy documents.
PARAMETER | DESCRIPTION
---|---
`note` | OMOP `note` table.
`note_nlp` | OMOP `note_nlp` table.
`extensions` | Extensions to keep, by default None.

RETURNS | DESCRIPTION
---|---
`List[Doc]` | List of spaCy documents.

Source code in edsnlp/connectors/omop.py
docs2omop(docs, extensions=None)
Transforms a list of spaCy documents to a pair of OMOP tables.
PARAMETER | DESCRIPTION
---|---
`docs` | List of spaCy documents.
`extensions` | Extensions to keep, by default None.

RETURNS | DESCRIPTION
---|---
`note` | OMOP `note` table.
`note_nlp` | OMOP `note_nlp` table.

Source code in edsnlp/connectors/omop.py
omop2docs(note, note_nlp, nlp, extensions=None)
Transforms an OMOP-formatted pair of dataframes into a list of documents.
PARAMETER | DESCRIPTION
---|---
`note` | The OMOP `note` table.
`note_nlp` | The OMOP `note_nlp` table.
`nlp` | spaCy language object.
`extensions` | Extensions to keep, by default None.

RETURNS | DESCRIPTION
---|---
`List[Doc]` | List of spaCy documents.

Source code in edsnlp/connectors/omop.py
docs2omop(docs, extensions=None)
Transforms a list of spaCy docs to a pair of OMOP tables.
PARAMETER | DESCRIPTION
---|---
`docs` | List of documents to transform.
`extensions` | Extensions to keep, by default None.

RETURNS | DESCRIPTION
---|---
`Tuple[pd.DataFrame, pd.DataFrame]` | Pair of OMOP tables (`note` and `note_nlp`).

Source code in edsnlp/connectors/omop.py
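A sketch of the `OmopConnector` round trip. The parquet file names are hypothetical; `note` and `note_nlp` must be OMOP-formatted pandas DataFrames as described above:

```python
import pandas as pd
import spacy
from edsnlp.connectors.omop import OmopConnector

nlp = spacy.blank("eds")
connector = OmopConnector(nlp)

# Hypothetical inputs: OMOP-formatted `note` and `note_nlp` tables
note = pd.read_parquet("note.parquet")
note_nlp = pd.read_parquet("note_nlp.parquet")

# Convert OMOP tables to spaCy documents, and back
docs = connector.omop2docs(note, note_nlp)
note, note_nlp = connector.docs2omop(docs)
```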
labeltool
docs2labeltool(docs, extensions=None)
Returns a labeltool-ready DataFrame from a list of annotated documents.
PARAMETER | DESCRIPTION
---|---
`docs` | List of annotated spaCy docs.
`extensions` | List of extensions to use by labeltool.

RETURNS | DESCRIPTION
---|---
`df` | DataFrame tailored for labeltool.

Source code in edsnlp/connectors/labeltool.py
processing
distributed
pyspark_type_finder(obj)
Returns (when possible) the PySpark type of any Python object.
Source code in edsnlp/processing/distributed.py
module_checker(func, *args, **kwargs)
Source code in edsnlp/processing/distributed.py
pipe(note, nlp, context=[], additional_spans='discarded', extensions=[])
Function to apply a spaCy pipe to a PySpark or Koalas DataFrame `note`.
PARAMETER | DESCRIPTION
---|---
`note` | A PySpark or Koalas DataFrame with a `note_id` and a `note_text` column.
`nlp` | A spaCy pipe.
`context` | A list of columns to add to the generated spaCy documents as extensions. For instance, if `context=["note_id"]`, the value found in the `note_id` column is stored in `doc._.note_id`.
`additional_spans` | A name (or list of names) of the span groups on which to apply the pipe, too. Span groups are available as `doc.spans[spangroup_name]`.
`extensions` | Span extensions to add to the extracted results. For instance, if `extensions=["negation"]`, each extraction will also include the span's `_.negation` value.

RETURNS | DESCRIPTION
---|---
`DataFrame` | A PySpark DataFrame with one line per extraction.

Source code in edsnlp/processing/distributed.py
wrapper
pipe(note, nlp, n_jobs=-2, context=[], additional_spans='discarded', extensions=[], **kwargs)
Function to apply a spaCy pipe to a pandas or PySpark DataFrame.
PARAMETER | DESCRIPTION
---|---
`note` | A pandas/PySpark/Koalas DataFrame with a `note_id` and a `note_text` column.
`nlp` | A spaCy pipe.
`context` | A list of columns to add to the generated spaCy documents as extensions. For instance, if `context=["note_id"]`, the value found in the `note_id` column is stored in `doc._.note_id`.
`n_jobs` | Only used when providing a pandas DataFrame.
`additional_spans` | A name (or list of names) of the span groups on which to apply the pipe, too. Span groups are available as `doc.spans[spangroup_name]`.
`extensions` | Span extensions to add to the extracted results. For instance, if `extensions=["negation"]`, each extraction will also include the span's `_.negation` value.
`kwargs` | Additional parameters, depending on the type of the input DataFrame.

RETURNS | DESCRIPTION
---|---
`DataFrame` | A DataFrame with one line per extraction.

Source code in edsnlp/processing/wrapper.py
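A sketch of the wrapper on a pandas DataFrame, assuming the function is exposed as `edsnlp.processing.pipe` and that the input follows the `note_id`/`note_text` convention above:

```python
import pandas as pd
import spacy
from edsnlp.processing import pipe

nlp = spacy.blank("eds")
nlp.add_pipe("eds.sentences")
nlp.add_pipe("eds.matcher", config=dict(terms=dict(fracture="fracture")))
nlp.add_pipe("eds.negation")

note = pd.DataFrame(
    dict(note_id=[0], note_text=["Le patient ne présente pas de fracture."])
)

# One line per extracted entity, with the negation status attached
results = pipe(note, nlp, n_jobs=1, extensions=["negation"])
```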
parallel
nlp = spacy.blank('eds')
module-attribute
_define_nlp(new_nlp)
Sets the global `nlp` variable. Doing it this way saves a non-negligible amount of time.
Source code in edsnlp/processing/parallel.py
_chunker(iterable, total_length, chunksize)
Takes an iterable and chunks it.
Source code in edsnlp/processing/parallel.py
_process_chunk(note, **pipe_kwargs)
Source code in edsnlp/processing/parallel.py
pipe(note, nlp, context=[], additional_spans='discarded', extensions=[], chunksize=100, n_jobs=-2, progress_bar=True, **pipe_kwargs)
Function to apply a spaCy pipe to a pandas DataFrame `note`, using multiprocessing.
PARAMETER | DESCRIPTION
---|---
`note` | A pandas DataFrame with a `note_id` and a `note_text` column.
`nlp` | A spaCy pipe.
`context` | A list of columns to add to the generated spaCy documents as extensions. For instance, if `context=["note_id"]`, the value found in the `note_id` column is stored in `doc._.note_id`.
`additional_spans` | A name (or list of names) of the span groups on which to apply the pipe, too. Span groups are available as `doc.spans[spangroup_name]`.
`extensions` | Span extensions to add to the extracted results. For instance, if `extensions=["negation"]`, each extraction will also include the span's `_.negation` value.
`chunksize` | Batch size used to split tasks.
`n_jobs` | Max number of parallel jobs. The default value uses the maximum number of available cores.
`progress_bar` | Whether to display a progress bar or not.
`**pipe_kwargs` | Arguments exposed in the underlying processing function.

RETURNS | DESCRIPTION
---|---
`DataFrame` | A pandas DataFrame with one line per extraction.

Source code in edsnlp/processing/parallel.py
simple
nlp = spacy.blank('eds')
module-attribute
ExtensionSchema = Union[str, List[str], Dict[str, Any]]
module-attribute
_df_to_spacy(note, nlp, context)
Takes a pandas DataFrame and returns a generator that can be used in `nlp.pipe()`.
PARAMETER | DESCRIPTION
---|---
`note` | A pandas DataFrame with at least a `note_id` and a `note_text` column.

RETURNS | DESCRIPTION
---|---
`generator` | A generator yielding items of the form (text, context), with `text` the document text and `context` a dictionary of contextual values.

Source code in edsnlp/processing/simple.py
_flatten(list_of_lists)
Flatten a list of lists to a combined list.
Source code in edsnlp/processing/simple.py
_pipe_generator(note, nlp, context=[], additional_spans='discarded', extensions=[], batch_size=50, progress_bar=True)
Source code in edsnlp/processing/simple.py
_single_schema(ent, span_type='ents', extensions=[])
Source code in edsnlp/processing/simple.py
_full_schema(doc, additional_spans=[], extensions=[])
Function used when parallelising tasks via joblib. Takes a `Doc` as input and returns a list of serialisable objects.
Note
Parallelisation requires output objects to be serialisable: after splitting the task into separate jobs, intermediate results are stored in memory before being aggregated, hence the need for serialisability. For instance, spaCy's spans aren't serialisable since they are merely a view of the parent document.
Check the source code of this function for an example.
Source code in edsnlp/processing/simple.py
pipe(note, nlp, context=[], additional_spans='discarded', extensions=[], batch_size=1000, progress_bar=True)
Function to apply a spaCy pipe to a pandas DataFrame `note`. For a large DataFrame, prefer the parallel version.
PARAMETER | DESCRIPTION
---|---
`note` | A pandas DataFrame with a `note_id` and a `note_text` column.
`nlp` | A spaCy pipe.
`context` | A list of columns to add to the generated spaCy documents as extensions. For instance, if `context=["note_id"]`, the value found in the `note_id` column is stored in `doc._.note_id`.
`additional_spans` | A name (or list of names) of the span groups on which to apply the pipe, too. Span groups are available as `doc.spans[spangroup_name]`.
`extensions` | Span extensions to add to the extracted results. For instance, if `extensions=["negation"]`, each extraction will also include the span's `_.negation` value.
`batch_size` | Batch size used by spaCy's pipe.
`progress_bar` | Whether to display a progress bar or not.

RETURNS | DESCRIPTION
---|---
`DataFrame` | A pandas DataFrame with one line per extraction.

Source code in edsnlp/processing/simple.py
helpers
DataFrames = None
module-attribute
spec = importlib.util.find_spec(module.value)
module-attribute
DataFrameModules
Bases: Enum
Source code in edsnlp/processing/helpers.py
PANDAS = 'pandas'
class-attribute
PYSPARK = 'pyspark.sql'
class-attribute
KOALAS = 'databricks.koalas'
class-attribute
get_module(df)
Source code in edsnlp/processing/helpers.py
check_spacy_version_for_context()
Source code in edsnlp/processing/helpers.py
utils
examples
entity_pattern = re.compile('(<ent[^<>]*>[^<>]+</ent>)')
module-attribute
text_pattern = re.compile('<ent.*>(.+)</ent>')
module-attribute
modifiers_pattern = re.compile('<ent\\s?(.*)>.+</ent>')
module-attribute
Match
Bases: BaseModel
Source code in edsnlp/utils/examples.py
start_char: int = None
class-attribute
end_char: int = None
class-attribute
text: str = None
class-attribute
modifiers: str = None
class-attribute
Modifier
Bases: BaseModel
Source code in edsnlp/utils/examples.py
key: str = None
class-attribute
value: Union[int, float, bool, str] = None
class-attribute
Entity
Bases: BaseModel
Source code in edsnlp/utils/examples.py
start_char: int = None
class-attribute
end_char: int = None
class-attribute
modifiers: List[Modifier] = None
class-attribute
find_matches(example)
Finds entities within the example.
PARAMETER | DESCRIPTION
---|---
`example` | Example to process.

RETURNS | DESCRIPTION
---|---
`List[re.Match]` | List of matches for entities.

Source code in edsnlp/utils/examples.py
parse_match(match)
Parse a regex match representing an entity.
PARAMETER | DESCRIPTION
---|---
`match` | Match for an entity.

RETURNS | DESCRIPTION
---|---
`Match` | Usable representation for the entity match.

Source code in edsnlp/utils/examples.py
parse_example(example)
Parses an example: finds entities and removes the tags.
PARAMETER | DESCRIPTION
---|---
`example` | Example to process.

RETURNS | DESCRIPTION
---|---
`Tuple[str, List[Entity]]` | Cleaned text and extracted entities.

Source code in edsnlp/utils/examples.py
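A sketch of `parse_example` on a tagged string. The `<ent …>` markup follows the patterns above; the `negation=true` modifier is an illustrative key/value pair:

```python
from edsnlp.utils.examples import parse_example

example = "Le patient ne présente pas de <ent negation=true>fracture</ent>."
text, entities = parse_example(example)

print(text)  # the tags are removed
for entity in entities:
    # Each entity carries character offsets and parsed modifiers
    print(entity.start_char, entity.end_char, entity.modifiers)
```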
blocs
Utility that extracts code blocs and runs them.
Largely inspired by https://github.com/koaning/mktestdocs
BLOCK_PATTERN = re.compile('((?P<skip><!-- no-check -->)\\s+)?(?P<indent> *)```(?P<title>.*?)\\n(?P<code>.+?)```', flags=re.DOTALL)
module-attribute
OUTPUT_PATTERN = '# Out: '
module-attribute
check_outputs(code)
Looks for output patterns, and modifies the bloc:

- the preceding line becomes `v = expr`
- the output line becomes an `assert` statement

PARAMETER | DESCRIPTION
---|---
`code` | Code block.

RETURNS | DESCRIPTION
---|---
`str` | Modified code bloc with assert statements.

Source code in edsnlp/utils/blocs.py
remove_indentation(code, indent)
Remove indentation from a code bloc.
PARAMETER | DESCRIPTION
---|---
`code` | Code bloc.
`indent` | Level of indentation.

RETURNS | DESCRIPTION
---|---
`str` | Modified code bloc.

Source code in edsnlp/utils/blocs.py
grab_code_blocks(docstring, lang='python')
Given a docstring, grabs all the markdown code blocks found in it.
PARAMETER | DESCRIPTION
---|---
`docstring` | Full text.
`lang` | Language to execute, by default "python".

RETURNS | DESCRIPTION
---|---
`List[str]` | Extracted code blocks.

Source code in edsnlp/utils/blocs.py
printer(code)
Prints a code bloc with line numbers for easier debugging.
PARAMETER | DESCRIPTION
---|---
`code` | Code bloc.

Source code in edsnlp/utils/blocs.py
check_docstring(obj, lang='')
Given a function, test the contents of the docstring.
Source code in edsnlp/utils/blocs.py
check_raw_string(raw, lang='python')
Given a raw string, test the contents.
Source code in edsnlp/utils/blocs.py
check_raw_file_full(raw, lang='python')
Source code in edsnlp/utils/blocs.py
check_md_file(path, memory=False)
Given a markdown file, parses the contents for Python code blocs and checks that each independent bloc does not cause an error.
PARAMETER | DESCRIPTION
---|---
`path` | Path to the markdown file to execute.
`memory` | Whether to keep results from one bloc to the next, by default `False`.

Source code in edsnlp/utils/blocs.py
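A sketch of `check_md_file`, e.g. to test the code blocks of a documentation page (the path is a placeholder):

```python
from edsnlp.utils.blocs import check_md_file

# Executes every Python bloc in the file, sharing state between blocs
check_md_file("docs/index.md", memory=True)
```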
filter
default_sort_key(span)
Returns the sort key for filtering spans.
PARAMETER | DESCRIPTION
---|---
`span` | Span to sort.

RETURNS | DESCRIPTION
---|---
`key` | Sort key.

Source code in edsnlp/utils/filter.py
start_sort_key(span)
Returns the sort key for filtering spans by start order.
PARAMETER | DESCRIPTION
---|---
`span` | Span to sort.

RETURNS | DESCRIPTION
---|---
`key` | Sort key.

Source code in edsnlp/utils/filter.py
filter_spans(spans, label_to_remove=None, return_discarded=False, sort_key=default_sort_key)
Re-definition of spaCy's filtering function, that returns discarded spans as well as filtered ones.
It can also accept a `label_to_remove` argument, useful for filtering out pseudo cues. If set, results can contain overlapping spans: only spans overlapping with excluded labels are removed. The main expected use case is for pseudo-cues.
It can handle an iterable of tuples instead of an iterable of `Span`s. The primary use case is in combination with the `RegexMatcher`'s capacity to return the span's `groupdict`.
The spaCy documentation states:
Filter a sequence of spans and remove duplicates or overlaps. Useful for creating named entities (where one token can only be part of one entity) or when merging spans with `Retokenizer.merge`. When spans overlap, the (first) longest span is preferred over shorter spans.
Filtering out spans
If the `label_to_remove` argument is supplied, it might be tempting to filter overlapping spans that are not part of a label to remove. The reason we keep all other possibly overlapping labels is that in qualifier pipelines, the same cue can precede and follow a marked entity. Hence we need to keep every example.
PARAMETER | DESCRIPTION
---|---
`spans` | Spans to filter.
`return_discarded` | Whether to return discarded spans.
`label_to_remove` | Label to remove. If set, results can contain overlapping spans.
`sort_key` | Key for sorting spans before applying overlap conflict resolution. A span with a higher key will have precedence over another span. By default, the largest, leftmost spans are selected first.

RETURNS | DESCRIPTION
---|---
`results` | Filtered spans.
`discarded` | Discarded spans.

Source code in edsnlp/utils/filter.py
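A short sketch of `filter_spans` resolving an overlap; by default, the longest span wins:

```python
import spacy
from edsnlp.utils.filter import filter_spans

nlp = spacy.blank("fr")
doc = nlp("Le patient présente une fracture du poignet.")

short = doc[4:5]  # "fracture"
long = doc[4:7]   # "fracture du poignet"

results, discarded = filter_spans([short, long], return_discarded=True)
# The longest span is kept, the shorter overlapping one is discarded
print(results, discarded)
```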
consume_spans(spans, filter, second_chance=None)
Consumes a list of spans, according to a filter.
Warning
This method makes the strong hypothesis that:

- spans are sorted;
- spans are consumed in sequence and only once.

The second item is problematic for the way we treat long entities, hence the `second_chance` parameter, which lets entities be seen more than once.
PARAMETER | DESCRIPTION
---|---
`spans` | List of spans to filter.
`filter` | Filtering function. Should return `True` when the item is to be included.
`second_chance` | Optional list of spans to include again (useful for long entities), by default None.

RETURNS | DESCRIPTION
---|---
`matches` | List of spans consumed by the filter.
`remainder` | List of remaining spans in the original list.

Source code in edsnlp/utils/filter.py
get_spans(spans, label)
Extracts spans with a given label. Prefer using a hashed label for performance reasons.
PARAMETER | DESCRIPTION
---|---
`spans` | List of spans to filter.
`label` | Label to filter on.

RETURNS | DESCRIPTION
---|---
`List[Span]` | Filtered spans.

Source code in edsnlp/utils/filter.py
regex
make_pattern(patterns, with_breaks=False, name=None)
Creates an OR pattern from a list of patterns.
PARAMETER | DESCRIPTION
---|---
`patterns` | List of patterns to merge.
`with_breaks` | Whether to add word breaks (`\b`) around the pattern.
`name` | Name of the capturing group.

RETURNS | DESCRIPTION
---|---
`str` | Merged pattern.

Source code in edsnlp/utils/regex.py
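A sketch of `make_pattern`; the exact pattern string produced may differ, this only shows the call shape:

```python
from edsnlp.utils.regex import make_pattern

# Merge alternatives into a single OR pattern, with word breaks
pattern = make_pattern(["fracture", "fractures"], with_breaks=True)
print(pattern)
```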
compile_regex(reg)
This function tries to compile `reg` using the `re` module, and falls back to the more permissive `regex` module.
PARAMETER | DESCRIPTION
---|---
`reg` | Pattern to compile.

RETURNS | DESCRIPTION
---|---
`Union[re.Pattern, regex.Pattern]` | Compiled pattern.

Source code in edsnlp/utils/regex.py
inclusion
check_inclusion(span, start, end)
Checks whether the span overlaps the boundaries.
PARAMETER | DESCRIPTION
---|---
`span` | Span to check.
`start` | Start of the boundary.
`end` | End of the boundary.

RETURNS | DESCRIPTION
---|---
`bool` | Whether the span overlaps the boundaries.

Source code in edsnlp/utils/inclusion.py
resources
get_verbs(verbs=None, check_contains=True)
Extract verbs from the resources, as a pandas dataframe.
PARAMETER | DESCRIPTION
---|---
`verbs` | List of verbs to keep. Returns all verbs by default.
`check_contains` | Whether to check that no verb is missing if a list of verbs was provided. By default True.

RETURNS | DESCRIPTION
---|---
`pd.DataFrame` | DataFrame containing conjugated verbs.

Source code in edsnlp/utils/resources.py
deprecation
deprecated_extension(name, new_name)
Source code in edsnlp/utils/deprecation.py
deprecated_getter_factory(name, new_name)
Source code in edsnlp/utils/deprecation.py
deprecation(name, new_name=None)
Source code in edsnlp/utils/deprecation.py
deprecated_factory(name, new_name=None, default_config=None, func=None)
Executes the `Language.factory` method on a modified factory function. The modification adds a deprecation warning.
PARAMETER | DESCRIPTION
---|---
`name` | The deprecated name for the pipeline.
`new_name` | The new name for the pipeline, which should be used, by default None.
`default_config` | The configuration that should be passed to `Language.factory`, by default None.
`func` | The function to decorate, by default None.

RETURNS | DESCRIPTION
---|---
`Callable` | The decorated factory function.

Source code in edsnlp/utils/deprecation.py