edsnlp.pipelines.core.normalizer.pollution
patterns
information = "(?s)(=====+\\s*)?(L\\s*e\\s*s\\sdonnées\\s*administratives,\\s*sociales\\s*|I?nfo\\s*rmation\\s*aux?\\s*patients?|L[’']AP-HP\\s*collecte\\s*vos\\s*données\\s*administratives|L[’']Assistance\\s*Publique\\s*-\\s*Hôpitaux\\s*de\\s*Paris\\s*\\(?AP-HP\\)?\\s*a\\s*créé\\s*une\\s*base\\s*de\\s*données).{,2000}https?:\\/\\/recherche\\.aphp\\.fr\\/eds\\/droit-opposition[\\s\\.]*"
module-attribute
bars = '(?i)([nbw]|_|-|=){5,}'
module-attribute
pollution = dict(information=information, bars=bars)
module-attribute
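To make the effect of these patterns concrete, here is a minimal sketch that applies them with the standard `re` module alone. The two regular expressions are copied from the attributes above; the helper names `find_pollution` and `strip_pollution` and the sample text are hypothetical and not part of edsnlp, which matches on spaCy documents rather than raw strings.

```python
import re

# Patterns copied verbatim from the module attributes listed above.
information = "(?s)(=====+\\s*)?(L\\s*e\\s*s\\sdonnées\\s*administratives,\\s*sociales\\s*|I?nfo\\s*rmation\\s*aux?\\s*patients?|L[’']AP-HP\\s*collecte\\s*vos\\s*données\\s*administratives|L[’']Assistance\\s*Publique\\s*-\\s*Hôpitaux\\s*de\\s*Paris\\s*\\(?AP-HP\\)?\\s*a\\s*créé\\s*une\\s*base\\s*de\\s*données).{,2000}https?:\\/\\/recherche\\.aphp\\.fr\\/eds\\/droit-opposition[\\s\\.]*"
bars = "(?i)([nbw]|_|-|=){5,}"
pollution = dict(information=information, bars=bars)


def find_pollution(text, patterns):
    """Hypothetical helper: return sorted (start, end, label) tuples."""
    spans = []
    for label, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def strip_pollution(text, spans):
    """Hypothetical helper: drop every character covered by a span."""
    keep = [True] * len(text)
    for start, end, _ in spans:
        keep[start:end] = [False] * (end - start)
    return "".join(ch for ch, kept in zip(text, keep) if kept)


# Made-up sample: a bar of "=" followed by a patient information notice.
text = (
    "Examen clinique normal.\n"
    "==========\n"
    "Information aux patients : vos données peuvent être réutilisées. "
    "https://recherche.aphp.fr/eds/droit-opposition\n"
    "Fin du document."
)

spans = find_pollution(text, pollution)
cleaned = strip_pollution(text, spans)
print(cleaned)
```

The `information` match here starts at the bar and runs through the opposition URL, so the `bars` match is nested inside it. Masking characters rather than slicing span by span keeps overlapping matches from being removed twice.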
pollution
Pollution
Bases: BaseComponent
Tags pollution tokens.
Populates a number of spaCy extensions:
Token._.pollution
: indicates whether the token is a pollution
Doc._.clean
: lists non-pollution tokens
Doc._.clean_
: original text with pollutions removed.
Doc._.char_clean_span
: method to create a Span using character indices extracted using the cleaned text.
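The idea behind `Doc._.char_clean_span`, mapping character indices found in the cleaned text back to the original document, can be sketched on plain strings. This is only an illustration under stated assumptions: the function names `build_clean_text` and `clean_to_original` are hypothetical, the pollution span is given by hand, and the real extension returns a spaCy Span rather than a pair of offsets.

```python
def build_clean_text(text, pollution_spans):
    """Return the cleaned text and, for each cleaned character,
    its index in the original text (hypothetical helper)."""
    kept = [
        i
        for i in range(len(text))
        if not any(start <= i < end for start, end in pollution_spans)
    ]
    return "".join(text[i] for i in kept), kept


def clean_to_original(start, end, kept):
    """Map a non-empty [start, end) range of the cleaned text
    to the matching [start, end) range of the original text."""
    return kept[start], kept[end - 1] + 1


text = "Douleur ===== abdominale"
pollution_spans = [(8, 14)]  # the bar and the space after it, set by hand

cleaned, kept = build_clean_text(text, pollution_spans)

# Locate a term in the cleaned text, then recover its original offsets.
clean_start = cleaned.index("abdominale")
clean_end = clean_start + len("abdominale")
orig_start, orig_end = clean_to_original(clean_start, clean_end, kept)
print(cleaned, (orig_start, orig_end), text[orig_start:orig_end])
```

Keeping one original index per cleaned character is the simplest possible mapping; a token-based implementation would store offsets per token instead.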
PARAMETER | DESCRIPTION |
---|---|
nlp | Language pipeline object |
pollution | Dictionary containing regular expressions of pollution. |
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
nlp = nlp
instance-attribute
pollution = pollution
instance-attribute
regex_matcher = RegexMatcher()
instance-attribute
__init__(nlp, pollution)
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
build_patterns()
Builds the patterns for phrase matching.
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
process(doc)
Finds pollution spans in the document.
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object |

RETURNS | DESCRIPTION |
---|---|
pollution | list of pollution spans |
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
__call__(doc)
Tags pollutions.
PARAMETER | DESCRIPTION |
---|---|
doc | spaCy Doc object |

RETURNS | DESCRIPTION |
---|---|
doc | spaCy Doc object, annotated for pollutions. |
Source code in edsnlp/pipelines/core/normalizer/pollution/pollution.py
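As a rough picture of what token-level tagging amounts to, the sketch below flags every token that overlaps a pattern match. It is an assumption-laden stand-in, not the component's code: tokens are approximated by whitespace-separated chunks instead of spaCy tokens, matching uses `re` instead of the `RegexMatcher`, and `tag_tokens` is a hypothetical name. Only the `bars` pattern is taken from the module above.

```python
import re

# Pattern copied from the `bars` module attribute.
bars = "(?i)([nbw]|_|-|=){5,}"


def tag_tokens(text, patterns):
    """Hypothetical helper: return (token_text, is_pollution) pairs."""
    # Character spans of every pattern match.
    spans = [
        match.span()
        for pattern in patterns.values()
        for match in re.finditer(pattern, text)
    ]
    tagged = []
    # Whitespace chunks stand in for spaCy tokens in this sketch.
    for token in re.finditer(r"\S+", text):
        polluted = any(
            token.start() < end and start < token.end() for start, end in spans
        )
        tagged.append((token.group(), polluted))
    return tagged


tagged = tag_tokens("Douleur ===== abdominale", dict(bars=bars))

# Analogue of Doc._.clean: keep only the tokens not flagged as pollution.
clean_tokens = [tok for tok, polluted in tagged if not polluted]
print(tagged, clean_tokens)
```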
factory
DEFAULT_CONFIG = dict(pollution=None)
module-attribute
create_component(nlp, name, pollution)
Source code in edsnlp/pipelines/core/normalizer/pollution/factory.py