edsnlp.matchers.regex
RegexMatcher
Bases: object
Simple RegExp matcher.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| alignment_mode | How spans should be aligned with tokens. Possible values are TYPE: |
| attr | Default attribute to match on, by default "TEXT". Can be overridden in the TYPE: |
| flags | Additional flags provided to the TYPE: |
| ignore_excluded | Whether to skip exclusions TYPE: |
| ignore_space_tokens | Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of TYPE: |
| span_from_group | If set to TYPE: |
build_patterns
Builds patterns and adds them for matching. Helper function for pipelines using this matcher.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| regex | Dictionary of label/terms, or label/dictionary of terms/attribute. TYPE: |
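The two accepted shapes of the `regex` dictionary can be illustrated with a small standard-library sketch. The helper name `normalize_patterns` and the inner key names `"regex"` and `"attr"` are assumptions made for illustration only; they are not taken from this page.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical input shape: a label maps either to regex strings directly,
# or to a dictionary holding the regex strings plus the attribute to match on.
PatternSpec = Union[str, List[str], Dict[str, object]]


def normalize_patterns(
    regex: Dict[str, PatternSpec],
    default_attr: str = "TEXT",
) -> List[Tuple[str, List[str], str]]:
    """Flatten both shapes into (label, list_of_regexes, attr) triples."""
    normalized = []
    for label, spec in regex.items():
        attr = default_attr
        if isinstance(spec, dict):
            # "attr" and "regex" are assumed key names, for illustration only.
            attr = spec.get("attr") or default_attr
            spec = spec.get("regex", [])
        if isinstance(spec, str):
            # A single expression is treated as a one-element list.
            spec = [spec]
        normalized.append((label, list(spec), attr))
    return normalized


patterns = normalize_patterns(
    {
        "fever": [r"fever", r"pyrexia"],
        "covid": {"regex": r"covid[- ]?19", "attr": "LOWER"},
    }
)
```

Each resulting triple carries what a call to `add` would need: a key, a list of patterns, and the attribute to match on.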
add
Add a pattern to the registry.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| key | Key of the new/updated pattern. TYPE: |
| patterns | List of patterns to add. TYPE: |
| attr | Attribute to use for matching. By default, uses the TYPE: |
| ignore_excluded | Whether to skip excluded tokens during matching. TYPE: |
| ignore_space_tokens | Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of TYPE: |
| alignment_mode | Overwrite alignment mode. TYPE: |
remove
Remove a pattern from the registry.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| key | Key of the pattern to remove. TYPE: |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the key is not present in the registered patterns. |
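The `add`/`remove` contract described above can be sketched with a toy registry built on the standard library. `ToyRegistry` is a hypothetical stand-in, not the library's class, and replacing the patterns when a key already exists is a simplification chosen for this sketch.

```python
import re
from typing import Dict, List


class ToyRegistry:
    """Hypothetical stand-in for a keyed pattern registry."""

    def __init__(self) -> None:
        self.patterns: Dict[str, List[re.Pattern]] = {}

    def add(self, key: str, patterns: List[str]) -> None:
        # Simplification of this sketch: an existing key is overwritten.
        self.patterns[key] = [re.compile(p) for p in patterns]

    def remove(self, key: str) -> None:
        # Mirrors the documented behaviour: unknown keys raise ValueError.
        if key not in self.patterns:
            raise ValueError(f"No pattern registered under key {key!r}")
        del self.patterns[key]


registry = ToyRegistry()
registry.add("year", [r"\b(19|20)\d{2}\b"])
registry.remove("year")

try:
    registry.remove("year")  # already removed
    second_removal_failed = False
except ValueError:
    second_removal_failed = True
```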
match
Iterates over the matches.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| doclike | spaCy Doc or Span object to match on. TYPE: |

| YIELDS | DESCRIPTION |
|---|---|
| span | A match. TYPE: |
match_with_groupdict_as_spans
Iterates over the matches.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| doclike | spaCy Doc or Span object to match on. TYPE: |

| YIELDS | DESCRIPTION |
|---|---|
| span | A match. TYPE: |
__call__
Performs matching. Yields matches.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| doclike | spaCy Doc or Span object. TYPE: |
| as_spans | Returns matches as spans. DEFAULT: |

| YIELDS | DESCRIPTION |
|---|---|
| span | A match. TYPE: |
| groupdict | Additional information coming from the named patterns in the regular expression. TYPE: |
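The pairing of a match with its `groupdict` can be illustrated at the character level with Python's `re` module. This is only a sketch: it yields character offsets rather than spaCy spans, and `iter_matches` is a hypothetical helper name.

```python
import re
from typing import Dict, Iterator, Optional, Tuple


def iter_matches(
    pattern: str, text: str
) -> Iterator[Tuple[Tuple[int, int], Dict[str, Optional[str]]]]:
    """Yield ((start_char, end_char), groupdict) for every match in text."""
    for m in re.finditer(pattern, text):
        # groupdict() maps each named group to the text it captured.
        yield m.span(), m.groupdict()


results = list(
    iter_matches(
        r"(?P<value>\d+)\s*(?P<unit>mg|g)",
        "Paracetamol 500 mg, then 1 g",
    )
)
```

Named groups are what make the second yielded element useful: unnamed groups do not appear in `groupdict()`.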
spans_generator
Iterates over every group, and then yields the full match.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| match | A match object TYPE: |

| YIELDS | DESCRIPTION |
|---|---|
| Tuple[int, int] | A tuple containing the start and end of the group or match |
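The iteration order described above (every group first, the full match last) can be sketched with the standard `re` module. `iter_group_spans` is a hypothetical name, and the body is an illustration rather than the library's code.

```python
import re
from typing import Iterator, Tuple


def iter_group_spans(match: re.Match) -> Iterator[Tuple[int, int]]:
    """Yield the span of each capturing group, then the full-match span."""
    # match.re.groups is the number of capturing groups in the pattern.
    for idx in range(1, match.re.groups + 1):
        yield match.span(idx)
    # Group 0 is the whole match.
    yield match.span(0)


m = re.search(r"(\d+)-(\d+)", "pages 12-15")
spans = list(iter_group_spans(m))
```

A group that did not take part in the match reports the span `(-1, -1)`, which callers may want to filter out.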
span_from_match
Return the span (as a (start, end) tuple) of the full match. If span_from_group=True, returns the span of the first matching group instead.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| match | The Match object TYPE: |
| span_from_group | Whether to work on groups or on the full match TYPE: |

| RETURNS | DESCRIPTION |
|---|---|
| Tuple[int, int] | A tuple containing the start and end of the group or match |
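A minimal sketch of this selection, using only the standard `re` module. It assumes that `span_from_group=True` means "use the first group that took part in the match" and that `False` means "use the full match", in line with the parameter description above. `pick_span` is a hypothetical name.

```python
import re
from typing import Tuple


def pick_span(match: re.Match, span_from_group: bool) -> Tuple[int, int]:
    """Return (start, end) of the full match or of the first matched group."""
    if not span_from_group:
        return match.span()
    for idx in range(1, match.re.groups + 1):
        start, end = match.span(idx)
        if start >= 0:  # (-1, -1) marks a group that did not participate
            return start, end
    # No capturing group matched: fall back to the full match.
    return match.span()


m = re.search(r"(?:weight|poids)\s*:\s*(\d+)", "weight: 72 kg")
full_span = pick_span(m, span_from_group=False)
group_span = pick_span(m, span_from_group=True)
```

Working from a group is convenient when the surrounding context ("weight:") is needed to anchor the expression but should not be part of the resulting span.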
create_span
spaCy only allows strict alignment mode for char_span on Spans. This method circumvents this.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| doclike | TYPE: |
| start_char | Character index within the Doc-like object. TYPE: |
| end_char | Character index of the end, within the Doc-like object. TYPE: |
| key | The key used to match. TYPE: |
| alignment_mode | The alignment mode. TYPE: |
| ignore_excluded | Whether to skip excluded tokens. TYPE: |
| ignore_space_tokens | Whether to skip space tokens. TYPE: |

| RETURNS | DESCRIPTION |
|---|---|
| span | A span matched on the Doc-like object. TYPE: |
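To make the alignment modes concrete, here is a standard-library sketch of mapping a character range onto token boundaries. The mode names "strict", "contract" and "expand" are those accepted by spaCy's `Doc.char_span`; the whitespace tokenizer and the `token_offsets` and `align` helpers are hypothetical and do not reproduce this method's implementation.

```python
import re
from typing import List, Optional, Tuple


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Toy whitespace tokenizer: (start_char, end_char) for each token."""
    return [m.span() for m in re.finditer(r"\S+", text)]


def align(
    start_char: int,
    end_char: int,
    offsets: List[Tuple[int, int]],
    mode: str = "strict",
) -> Optional[Tuple[int, int]]:
    """Return (start_token, end_token), end exclusive, or None if no span fits."""
    if mode == "strict":
        # Both boundaries must coincide exactly with token boundaries.
        starts = [i for i, (s, _) in enumerate(offsets) if s == start_char]
        ends = [i for i, (_, e) in enumerate(offsets) if e == end_char]
        if starts and ends and starts[0] <= ends[0]:
            return starts[0], ends[0] + 1
        return None
    if mode == "expand":
        # Keep every token that overlaps the character range.
        idx = [i for i, (s, e) in enumerate(offsets) if s < end_char and e > start_char]
    elif mode == "contract":
        # Keep only tokens lying entirely inside the character range.
        idx = [i for i, (s, e) in enumerate(offsets) if s >= start_char and e <= end_char]
    else:
        raise ValueError(f"Unknown alignment mode: {mode!r}")
    return (idx[0], idx[-1] + 1) if idx else None


offsets = token_offsets("chest pain since 2019")
# Characters 0..12 cover "chest pain s": the range ends inside "since".
strict = align(0, 12, offsets, "strict")
expanded = align(0, 12, offsets, "expand")
contracted = align(0, 12, offsets, "contract")
```

Regular expressions operate on characters and often end mid-token, which is why a non-strict mode is useful when turning a match into a span.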