edsnlp.pipes.core.normalizer.pollution.factory
create_component = registry.factory.register('eds.pollution', assigns=['doc.spans'], deprecated=['pollution'])(PollutionTagger)
module-attribute
Tags pollution tokens.
Populates a number of spaCy extensions:
Token._.pollution
: indicates whether the token is a pollution
Doc._.clean
: lists non-pollution tokens
Doc._.clean_
: original text with pollutions removed
Doc._.char_clean_span
: method to create a Span using character indices extracted using the cleaned text.
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | The pipeline object. TYPE: |
name | The component name. TYPE: |
pollution | Dictionary containing regular expressions of pollution. TYPE: |
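
The idea behind the extensions and the `pollution` parameter described above can be sketched with the standard library alone. This is not EDS-NLP's implementation, which works on spaCy tokens. The pattern dictionary `POLLUTION`, the helpers `tag_pollution` and `clean_view`, and the standalone `char_clean_span` function are all hypothetical names chosen for illustration, and whitespace splitting stands in for real tokenization.

```python
import re
from bisect import bisect_right

# Hypothetical pollution patterns (label -> regex); NOT EDS-NLP's built-in ones.
POLLUTION = {
    "bars": r"={3,}",
    "footer": r"Page \d+ sur \d+",
}


def tag_pollution(text, patterns):
    """Return whitespace tokens as (start, end) offsets and one flag per
    token telling whether it overlaps a pollution match."""
    matches = [m.span() for rx in patterns.values() for m in re.finditer(rx, text)]
    tokens = [m.span() for m in re.finditer(r"\S+", text)]
    flags = [any(s < end and start < e for s, e in matches) for start, end in tokens]
    return tokens, flags


def clean_view(text, tokens, flags):
    """Build the cleaned text, plus the start offset of every kept token
    in the cleaned text and in the original text."""
    parts, starts_clean, starts_orig = [], [], []
    pos = 0
    for (start, end), polluted in zip(tokens, flags):
        if polluted:
            continue
        starts_clean.append(pos)
        starts_orig.append(start)
        parts.append(text[start:end])
        pos += (end - start) + 1  # kept tokens are joined with one space
    return " ".join(parts), starts_clean, starts_orig


def char_clean_span(clean_start, clean_end, starts_clean, starts_orig):
    """Map character offsets in the cleaned text back to the original text."""
    def to_orig(i):
        k = bisect_right(starts_clean, i) - 1  # kept token containing offset i
        return starts_orig[k] + (i - starts_clean[k])
    # Map the last covered character, then convert back to an exclusive end.
    return to_orig(clean_start), to_orig(clean_end - 1) + 1


text = "Le patient ===== Page 1 sur 3 ===== est stable"
tokens, flags = tag_pollution(text, POLLUTION)
clean, starts_clean, starts_orig = clean_view(text, tokens, flags)

# Locate a phrase in the cleaned text, then recover its original offsets.
start = clean.index("patient est")
span = char_clean_span(start, start + len("patient est"), starts_clean, starts_orig)
print(clean)
print(text[span[0]:span[1]])
```

In this sketch, `flags` plays the role of `Token._.pollution`, `clean` the role of `Doc._.clean_`, and the final mapping the role of `Doc._.char_clean_span`. The recovered span covers the pollution lying between the two matched words, because the offsets are mapped back onto the unmodified original text.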