edsnlp.language
EDSDefaults
Bases: FrenchDefaults
Defaults for the EDSLanguage class.
Mostly identical to the FrenchDefaults, but without tokenization info.
Source code in edsnlp/language.py
EDSLanguage
Bases: French
French clinical language.
It is shipped with the EDSTokenizer tokenizer, which better handles tokenization for French clinical documents.
Source code in edsnlp/language.py
EDSTokenizer
Bases: DummyTokenizer
Source code in edsnlp/language.py
__init__(vocab)
Tokenizer class for French clinical documents.
It better handles tokenization around:
- numbers: "ACR5" -> ["ACR", "5"] instead of ["ACR5"]
- newlines: "\n \n \n" -> ["\n", "\n", "\n"] instead of ["\n \n \n"]

and should be around 5-6 times faster than its standard French counterpart.
PARAMETER | DESCRIPTION |
---|---|
vocab | The spacy vocabulary TYPE: Vocab |
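The two behaviours listed above can be illustrated with a small standard-library sketch. This is not the EDSTokenizer implementation: `TOKEN_PATTERN` and `toy_tokenize` are hypothetical names, and the regular expression is an assumption chosen only to reproduce the two documented examples.

```python
import re
from typing import List

# Hypothetical pattern (NOT the one used by EDSTokenizer): a token is
# a newline, a run of digits, a run of letters, or any other single
# non-whitespace character. Plain spaces are skipped.
TOKEN_PATTERN = re.compile(r"\n|\d+|[^\W\d_]+|\S")


def toy_tokenize(text: str) -> List[str]:
    """Split letters from digits and keep each newline as its own token."""
    return TOKEN_PATTERN.findall(text)


print(toy_tokenize("ACR5"))      # ['ACR', '5']
print(toy_tokenize("\n \n \n"))  # ['\n', '\n', '\n']
```

Keeping digit runs and letter runs in separate alternatives is what separates "ACR" from "5"; matching a single newline at a time is what yields one token per line break.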
Source code in edsnlp/language.py
__call__(text)
Tokenizes the text using the EDSTokenizer
PARAMETER | DESCRIPTION |
---|---|
text | TYPE: str |
RETURNS | DESCRIPTION |
---|---|
Doc | |
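A spaCy `Doc` can be built from two parallel lists: the token texts (`words`) and a boolean per token saying whether it is followed by a space (`spaces`). The sketch below computes such lists with the standard library only. It assumes, without confirmation from this page, that a tokenizer's `__call__` would pass similar lists to `Doc(vocab, words=words, spaces=spaces)`; `TOKEN_WITH_SPACE` and `words_and_spaces` are hypothetical names.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: one token (digit run, letter run, single whitespace
# character, single punctuation character, or underscore), followed by at
# most one trailing space.
TOKEN_WITH_SPACE = re.compile(r"(\d+|[^\W\d_]+|\s|[^\w\s]|_)( ?)")


def words_and_spaces(text: str) -> Tuple[List[str], List[bool]]:
    """Return token texts and, for each token, whether a space follows it."""
    words: List[str] = []
    spaces: List[bool] = []
    for match in TOKEN_WITH_SPACE.finditer(text):
        words.append(match.group(1))
        spaces.append(match.group(2) == " ")
    return words, spaces


words, spaces = words_and_spaces("ACR5 positif\n")
print(words)   # ['ACR', '5', 'positif', '\n']
print(spaces)  # [False, True, False, False]
```

Because every character is either part of a token or a recorded trailing space, joining each word with its optional space gives back the original text, which is the property spaCy expects from a tokenizer.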
Source code in edsnlp/language.py
create_eds_tokenizer()
Creates a factory that returns new EDSTokenizer instances
RETURNS | DESCRIPTION |
---|---|
EDSTokenizer | |
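The "factory that returns new instances" pattern can be sketched as follows. It follows the usual spaCy convention in which a tokenizer factory receives the `nlp` object and builds a tokenizer from its vocabulary. `ToyTokenizer`, `create_toy_tokenizer` and the `SimpleNamespace` stand-in for `nlp` are hypothetical and only mirror the shape of the real function.

```python
from types import SimpleNamespace
from typing import Callable, List


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer bound to a vocabulary."""

    def __init__(self, vocab: object) -> None:
        self.vocab = vocab

    def __call__(self, text: str) -> List[str]:
        # Placeholder logic: whitespace splitting, not clinical tokenization.
        return text.split()


def create_toy_tokenizer() -> Callable[[object], ToyTokenizer]:
    """Return a factory that builds a fresh tokenizer for a given pipeline."""

    def toy_tokenizer_factory(nlp: object) -> ToyTokenizer:
        # Each call creates a new instance sharing the pipeline's vocabulary.
        return ToyTokenizer(nlp.vocab)

    return toy_tokenizer_factory


# Stand-in for a pipeline object exposing a `vocab` attribute.
fake_nlp = SimpleNamespace(vocab={"ACR", "5"})
factory = create_toy_tokenizer()
first, second = factory(fake_nlp), factory(fake_nlp)
print(first is second)              # False
print(first.vocab is second.vocab)  # True
```

Returning a factory rather than an instance lets the pipeline decide when the tokenizer is built and which vocabulary it is attached to.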
Source code in edsnlp/language.py