edsnlp.pipes.core.normalizer.pollution.factory
create_component = registry.factory.register('eds.pollution', assigns=['doc.spans'], deprecated=['pollution'])(PollutionTagger) module-attribute
Tags pollution tokens.
Populates a number of spaCy extensions:

- `Token._.pollution`: indicates whether the token is a pollution.
- `Doc._.clean`: lists non-pollution tokens.
- `Doc._.clean_`: original text with pollutions removed.
- `Doc._.char_clean_span`: method to create a `Span` using character indices extracted from the cleaned text.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| `nlp` | The pipeline object. |
| `name` | The component name. |
| `pollution` | Dictionary containing regular expressions of pollution. |
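
To make the idea behind the extensions above concrete, here is a minimal standalone sketch that uses only Python's `re` module. It is not EDS-NLP's implementation: the `POLLUTION` patterns and the helpers `find_pollution`, `clean_text` and `char_clean_span` are hypothetical names chosen for illustration. It shows how a dictionary of regular expressions can tag polluted character ranges, produce a cleaned text, and map character indices found in the cleaned text back to the original text.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns, for illustration only (not EDS-NLP's defaults).
POLLUTION: Dict[str, str] = {
    "footer": r"Page \d+ sur \d+\s*",
    "bars": r"={3,}",
}


def find_pollution(text: str, patterns: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, label) character spans matched by any pattern."""
    spans = []
    for label, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def clean_text(text: str, spans: List[Tuple[int, int, str]]) -> Tuple[str, List[int]]:
    """Drop polluted characters.

    Returns the cleaned text and, for each cleaned character,
    its index in the original text.
    """
    polluted = set()
    for start, end, _ in spans:
        polluted.update(range(start, end))
    kept = [i for i in range(len(text)) if i not in polluted]
    return "".join(text[i] for i in kept), kept


def char_clean_span(kept: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map a non-empty [start, end) range in the cleaned text to original offsets."""
    return kept[start], kept[end - 1] + 1


text = "Le patient Page 1 sur 3 est stable."
spans = find_pollution(text, POLLUTION)
cleaned, kept = clean_text(text, spans)

# Locate a word in the cleaned text, then recover it in the original text.
c_start = cleaned.index("stable")
o_start, o_end = char_clean_span(kept, c_start, c_start + len("stable"))
print(cleaned)
print(text[o_start:o_end])
```

Keeping an explicit index map is the simplest way to let downstream matching run on the cleaned text while still reporting positions in the original document; the actual component works at the token level on spaCy `Doc` objects.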