edsnlp.matchers.regex
RegexMatcher
Bases: object
Simple RegExp matcher.
Parameters
PARAMETER | DESCRIPTION |
---|---|
alignment_mode | How spans should be aligned with tokens. Possible values are TYPE: |
attr | Default attribute to match on, by default "TEXT". Can be overridden in the TYPE: |
flags | Additional flags provided to the TYPE: |
ignore_excluded | Whether to skip excluded tokens during matching. TYPE: |
ignore_space_tokens | Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of TYPE: |
span_from_group | If set to TYPE: |
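As a rough illustration of what alignment means here, the stdlib-only sketch below maps a character range onto token indices. It assumes the modes follow spaCy's char_span conventions ("strict", "contract", "expand"); the align helper and the token boundaries are hypothetical and not part of edsnlp.

```python
from typing import List, Optional, Tuple

# Hypothetical (start_char, end_char) boundaries of each token in "le patient a 40 ans".
TOKENS: List[Tuple[int, int]] = [(0, 2), (3, 10), (11, 12), (13, 15), (16, 19)]


def align(
    tokens: List[Tuple[int, int]], start_char: int, end_char: int, mode: str
) -> Optional[Tuple[int, int]]:
    """Map a character range to a (start_token, end_token) range, end exclusive.

    Returns None when no token range satisfies the requested mode.
    """
    if mode == "strict":
        # Both character offsets must sit exactly on token boundaries.
        starts = [i for i, (s, _) in enumerate(tokens) if s == start_char]
        ends = [i for i, (_, e) in enumerate(tokens) if e == end_char]
        if not starts or not ends or starts[0] > ends[0]:
            return None
        return starts[0], ends[0] + 1
    if mode == "contract":
        # Keep only the tokens lying entirely inside the character range.
        kept = [i for i, (s, e) in enumerate(tokens) if s >= start_char and e <= end_char]
    elif mode == "expand":
        # Keep every token that overlaps the character range, even partially.
        kept = [i for i, (s, e) in enumerate(tokens) if s < end_char and e > start_char]
    else:
        raise ValueError(f"Unknown alignment mode: {mode!r}")
    if not kept:
        return None
    return kept[0], kept[-1] + 1


# "atient a 40" (chars 4 to 15) starts in the middle of "patient".
print(align(TOKENS, 4, 15, "strict"))    # None
print(align(TOKENS, 4, 15, "contract"))  # (2, 4) -> "a 40"
print(align(TOKENS, 4, 15, "expand"))    # (1, 4) -> "patient a 40"
```

The trade-off shows up on partial overlaps: "contract" drops the partially covered token, "expand" keeps it, and "strict" refuses the match altogether.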
build_patterns
Builds patterns and adds them for matching. Helper function for pipelines using this matcher.
Parameters
PARAMETER | DESCRIPTION |
---|---|
regex | Dictionary mapping each label to its terms, or mapping each label to a dictionary of terms and attribute. TYPE: |
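To make the two accepted shapes of regex concrete, here is a minimal sketch that flattens such a dictionary into (label, patterns, attribute) triples, which could then be registered one by one. The "regex" and "attr" key names and the normalize_patterns helper are assumptions for illustration, not confirmed edsnlp names.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical configuration type: label -> terms, or label -> {"regex": terms, "attr": ...}.
RegexConfig = Dict[str, Union[str, List[str], Dict[str, Union[str, List[str]]]]]


def normalize_patterns(
    regex: RegexConfig, default_attr: str = "TEXT"
) -> List[Tuple[str, List[str], str]]:
    """Flatten the configuration into (label, patterns, attr) triples."""
    normalized = []
    for label, value in regex.items():
        if isinstance(value, dict):
            # Dictionary form: the terms plus the attribute to match on.
            terms = value["regex"]
            attr = value.get("attr", default_attr)
        else:
            # Plain form: only the terms; fall back on the default attribute.
            terms, attr = value, default_attr
        if isinstance(terms, str):
            terms = [terms]
        normalized.append((label, list(terms), attr))
    return normalized


config = {
    "age": r"\d+ ans",
    "date": {"regex": [r"\d{2}/\d{2}/\d{4}"], "attr": "NORM"},
}
for label, patterns, attr in normalize_patterns(config):
    print(label, patterns, attr)
```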
add
Add a pattern to the registry.
Parameters
PARAMETER | DESCRIPTION |
---|---|
key | Key of the new/updated pattern. TYPE: |
patterns | List of patterns to add. TYPE: |
attr | Attribute to use for matching. By default, uses the TYPE: |
ignore_excluded | Whether to skip excluded tokens during matching. TYPE: |
ignore_space_tokens | Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of TYPE: |
alignment_mode | Overrides the alignment mode. TYPE: Optional[str] |
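A minimal sketch of what registering a pattern involves, assuming patterns are compiled once with the matcher-level flags and that a missing attr falls back on the matcher's default attribute. TinyRegexRegistry is a hypothetical stand-in, not the edsnlp class, and its replace-on-re-add behaviour is a choice of this sketch.

```python
import re
from typing import Dict, List, Optional, Tuple


class TinyRegexRegistry:
    """Hypothetical stand-in for the matcher's pattern registry, not the edsnlp class."""

    def __init__(self, attr: str = "TEXT", flags: int = 0) -> None:
        self.default_attr = attr
        self.flags = flags
        # key -> (compiled patterns, attribute to match on)
        self.patterns: Dict[str, Tuple[List[re.Pattern], str]] = {}

    def add(self, key: str, patterns: List[str], attr: Optional[str] = None) -> None:
        # Compile once at registration time, using the matcher-level flags.
        compiled = [re.compile(p, self.flags) for p in patterns]
        # Without an explicit attr, fall back on the matcher-level default.
        # In this sketch, re-adding an existing key replaces its patterns.
        self.patterns[key] = (compiled, attr or self.default_attr)


registry = TinyRegexRegistry(flags=re.IGNORECASE)
registry.add("fever", [r"fever", r"pyrexia"])
registry.add("date", [r"\d{2}/\d{2}/\d{4}"], attr="NORM")
print({key: attr for key, (_, attr) in registry.patterns.items()})
# {'fever': 'TEXT', 'date': 'NORM'}
```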
remove
Remove a pattern from the registry.
Parameters
PARAMETER | DESCRIPTION |
---|---|
key | Key of the pattern to remove. TYPE: |
RAISES | DESCRIPTION |
---|---|
ValueError | If the key is not present in the registered patterns. |
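The documented contract for removal can be sketched with a plain dictionary. The class below is hypothetical and only illustrates raising ValueError for an unknown key; note that a bare del on a dict would raise KeyError instead.

```python
import re
from typing import Dict, List


class TinyRegexRegistry:
    """Hypothetical registry mapping keys to compiled patterns (not the edsnlp class)."""

    def __init__(self) -> None:
        self.patterns: Dict[str, List[re.Pattern]] = {}

    def add(self, key: str, patterns: List[str]) -> None:
        self.patterns[key] = [re.compile(p) for p in patterns]

    def remove(self, key: str) -> None:
        # Follow the documented contract: an unknown key raises ValueError
        # (a plain `del` would raise KeyError instead).
        if key not in self.patterns:
            raise ValueError(f"Key {key!r} is not present in the registered patterns.")
        del self.patterns[key]


registry = TinyRegexRegistry()
registry.add("fever", [r"fever"])
registry.remove("fever")
try:
    registry.remove("fever")
except ValueError as error:
    print(error)  # Key 'fever' is not present in the registered patterns.
```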
match
Iterates over the matches.
Parameters
PARAMETER | DESCRIPTION |
---|---|
doclike | spaCy Doc or Span object to match on. TYPE: |
YIELDS | DESCRIPTION |
---|---|
span | A match. TYPE: |
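Stripped of spaCy, iterating over the matches amounts to running every registered pattern over the text and collecting character offsets. The sketch below shows that core loop with the standard re module only. iter_matches is a hypothetical helper; the documented method yields spans of the Doc or Span rather than raw offsets.

```python
import re
from typing import Dict, Iterator, List, Tuple


def iter_matches(
    text: str, patterns: Dict[str, List[str]], flags: int = 0
) -> Iterator[Tuple[str, int, int]]:
    """Yield (key, start_char, end_char) for every match, pattern by pattern."""
    for key, regexes in patterns.items():
        for regex in regexes:
            for match in re.finditer(regex, text, flags):
                yield key, match.start(), match.end()


text = "Fever since 12/03/2021, no fever today."
patterns = {"fever": [r"fever"], "date": [r"\d{2}/\d{2}/\d{4}"]}
print(list(iter_matches(text, patterns, re.IGNORECASE)))
# [('fever', 0, 5), ('fever', 27, 32), ('date', 12, 22)]
```

Matches come out grouped by pattern rather than in document order; sort the results by start offset if order matters.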
match_with_groupdict_as_spans
Iterates over the matches.
Parameters
PARAMETER | DESCRIPTION |
---|---|
doclike | spaCy Doc or Span object to match on. TYPE: |
YIELDS | DESCRIPTION |
---|---|
span | A match. TYPE: |
__call__
Performs matching. Yields matches.
Parameters
PARAMETER | DESCRIPTION |
---|---|
doclike | spaCy Doc or Span object. TYPE: |
as_spans | Whether to return matches as spans. DEFAULT: |
YIELDS | DESCRIPTION |
---|---|
span | A match. TYPE: |
groupdict | Additional information coming from the named patterns in the regular expression. TYPE: |
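Assuming the groupdict corresponds to Python's re.Match.groupdict(), each named group (?P<name>...) in a pattern becomes one entry. The sketch below uses a hypothetical helper and pattern to show the resulting dictionaries, including an optional group that did not participate.

```python
import re
from typing import Dict, Iterator, Optional, Tuple


def iter_with_groupdict(
    text: str, pattern: str
) -> Iterator[Tuple[Tuple[int, int], Dict[str, Optional[str]]]]:
    """Yield ((start_char, end_char), groupdict) for each match of pattern."""
    for match in re.finditer(pattern, text):
        # Named groups (?P<name>...) become groupdict keys;
        # optional groups that did not participate map to None.
        yield match.span(), match.groupdict()


PATTERN = r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>kg)?"
for span, groups in iter_with_groupdict("weight 72.5 kg, dose 500", PATTERN):
    print(span, groups)
# (7, 14) {'value': '72.5', 'unit': 'kg'}
# (21, 24) {'value': '500', 'unit': None}
```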
spans_generator
Iterates over every group, then yields the full match.
Parameters
PARAMETER | DESCRIPTION |
---|---|
match | A match object TYPE: |
YIELDS | DESCRIPTION |
---|---|
Tuple[int, int] | A tuple containing the start and end of the group or match |
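A minimal sketch of this generator using only re.Match. Whether the real implementation skips groups that did not participate in the match is not stated above; this sketch chooses to skip them, since Python reports their span as (-1, -1).

```python
import re
from typing import Iterator, Tuple


def spans_generator(match: re.Match) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each capturing group, then for the full match."""
    for idx in range(1, match.re.groups + 1):
        start, end = match.span(idx)
        # A group that did not take part in the match reports (-1, -1);
        # this sketch chooses to skip it.
        if start != -1:
            yield start, end
    yield match.span()


match = re.search(r"(\d+)/(\d+)", "ratio 3/4")
print(list(spans_generator(match)))  # [(6, 7), (8, 9), (6, 9)]
```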
span_from_match
Return the span (as a (start, end) tuple) of the first matching group if span_from_group=True; otherwise, return the full match.
Parameters
PARAMETER | DESCRIPTION |
---|---|
match | The Match object TYPE: |
span_from_group | Whether to work on groups or on the full match TYPE: |
RETURNS | DESCRIPTION |
---|---|
Tuple[int, int] | A tuple containing the start and end of the group or match |
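A sketch of the selection logic, assuming (as the parameter name suggests) that span_from_group=True selects the first capturing group that took part in the match, and that the full match is used as a fallback when no group matched. Both the fallback and the helper body are assumptions, not the edsnlp source.

```python
import re
from typing import Tuple


def span_from_match(match: re.Match, span_from_group: bool) -> Tuple[int, int]:
    """Return (start, end) of the first participating group, or of the full match."""
    if span_from_group:
        for idx in range(1, match.re.groups + 1):
            # Skip groups that did not take part in the match (start == -1).
            if match.start(idx) != -1:
                return match.span(idx)
    # span_from_group=False, or no capturing group matched: use the full match.
    return match.span()


match = re.search(r"age (\d+)", "patient age 42")
print(span_from_match(match, span_from_group=True))   # (12, 14)
print(span_from_match(match, span_from_group=False))  # (8, 14)
```

Using a capturing group this way lets a pattern require context ("age ") without that context ending up in the returned span.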
create_span
spaCy only allows the strict alignment mode for char_span on Spans. This method circumvents that limitation.
Parameters
PARAMETER | DESCRIPTION |
---|---|
doclike | spaCy Doc or Span object. TYPE: |
start_char | Character index of the start, within the Doc-like object. TYPE: |
end_char | Character index of the end, within the Doc-like object. TYPE: |
key | The key used to match. TYPE: |
alignment_mode | The alignment mode. TYPE: |
ignore_excluded | Whether to skip excluded tokens. TYPE: |
ignore_space_tokens | Whether to skip space tokens. TYPE: |
RETURNS | DESCRIPTION |
---|---|
span | A span matched on the Doc-like object. TYPE: |
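The description does not detail the workaround. One plausible approach, assumed here rather than taken from the edsnlp source, is to translate the character offsets (relative to the Span) into Doc-level offsets and align them against the Doc's tokens, where non-strict modes are available. The sketch uses plain (start_char, end_char) tuples in place of spaCy objects; the helper name and token data are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical doc-level token boundaries for "Note. Fever since yesterday."
DOC_TOKENS: List[Tuple[int, int]] = [(0, 4), (4, 5), (6, 11), (12, 17), (18, 27), (27, 28)]


def span_chars_to_doc_tokens(
    doc_tokens: List[Tuple[int, int]],
    span_start_token: int,
    start_char: int,
    end_char: int,
) -> Optional[Tuple[int, int]]:
    """Map character offsets relative to a sub-span onto doc-level token indices.

    Uses "expand"-style alignment: every token overlapping the range is kept.
    """
    # Doc-level character offset at which the sub-span begins.
    offset = doc_tokens[span_start_token][0]
    doc_start, doc_end = start_char + offset, end_char + offset
    covered = [
        i for i, (s, e) in enumerate(doc_tokens) if s < doc_end and e > doc_start
    ]
    if not covered:
        return None
    return covered[0], covered[-1] + 1


# Sub-span = second sentence (starts at token 2, char 6): "Fever since yesterday."
# Relative chars 1 to 9 ("ever sin") expand to "Fever since".
print(span_chars_to_doc_tokens(DOC_TOKENS, 2, 1, 9))  # (2, 4)
```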