edsnlp.matchers.regex

RegexMatcher [source]

Bases: object

Simple RegExp matcher.

Parameters

PARAMETER DESCRIPTION
alignment_mode

How spans should be aligned with tokens. Possible values are "strict" (character indices must be aligned with token boundaries), "contract" (span of all tokens completely within the character span), and "expand" (span of all tokens at least partially covered by the character span). Defaults to "expand".

TYPE: str DEFAULT: 'expand'

attr

Default attribute to match on, by default "TEXT". Can be overridden in the add method.

TYPE: str DEFAULT: 'TEXT'

flags

Additional flags provided to the re module. Can be overridden in the add method.

TYPE: Union[RegexFlag, int] DEFAULT: 0

ignore_excluded

Whether to skip excluded tokens during matching.

TYPE: bool DEFAULT: False

ignore_space_tokens

Whether to skip space tokens during matching.

You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of eds.normalizer is enabled (which it is by default).

TYPE: bool DEFAULT: False

span_from_group

If set to False, will create spans based on the regex's full match. If set to True, will use the first matching capturing group as a span (and fall back to using the full match if no capturing group matches).

TYPE: bool DEFAULT: False
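The three alignment_mode values described above can be illustrated without spaCy. The following is a minimal sketch that works on plain (start, end) character offsets per token; the helper name align_to_tokens and the offset-list representation are assumptions made for illustration and are not part of edsnlp.

```python
from typing import List, Optional, Tuple


def align_to_tokens(
    token_offsets: List[Tuple[int, int]],
    start_char: int,
    end_char: int,
    alignment_mode: str = "expand",
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: map a character span to a token span.

    Returns (first_token, last_token_exclusive), or None when the
    character span cannot be aligned under the requested mode.
    """
    if alignment_mode == "strict":
        # Both character indices must fall exactly on token boundaries.
        starts = {s: i for i, (s, _) in enumerate(token_offsets)}
        ends = {e: i for i, (_, e) in enumerate(token_offsets)}
        if (
            start_char in starts
            and end_char in ends
            and starts[start_char] <= ends[end_char]
        ):
            return starts[start_char], ends[end_char] + 1
        return None
    if alignment_mode == "contract":
        # Keep only tokens completely inside the character span.
        kept = [
            i
            for i, (s, e) in enumerate(token_offsets)
            if s >= start_char and e <= end_char
        ]
    elif alignment_mode == "expand":
        # Keep every token at least partially covered by the character span.
        kept = [
            i
            for i, (s, e) in enumerate(token_offsets)
            if s < end_char and e > start_char
        ]
    else:
        raise ValueError(f"Unknown alignment mode: {alignment_mode!r}")
    if not kept:
        return None
    return kept[0], kept[-1] + 1


# Token offsets for "le patient tousse": le, patient, tousse
tokens = [(0, 2), (3, 10), (11, 17)]

# The character span 1..13 cuts through "le" and "tousse".
strict = align_to_tokens(tokens, 1, 13, "strict")
contract = align_to_tokens(tokens, 1, 13, "contract")
expand = align_to_tokens(tokens, 1, 13, "expand")
```

With these inputs, "strict" finds no alignment, "contract" keeps only the token "patient", and "expand" covers all three tokens.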

build_patterns [source]

Builds patterns and adds them for matching. Helper function for pipelines using this matcher.

Parameters

PARAMETER DESCRIPTION
regex

Dictionary of label/terms, or label/dictionary of terms/attribute.

TYPE: Patterns

add [source]

Add a pattern to the registry.

Parameters

PARAMETER DESCRIPTION
key

Key of the new/updated pattern.

TYPE: str

patterns

List of patterns to add.

TYPE: List[str]

attr

Attribute to use for matching. By default, uses the default_attr attribute.

TYPE: Optional[str] DEFAULT: None

ignore_excluded

Whether to skip excluded tokens during matching.

TYPE: Optional[bool] DEFAULT: None

ignore_space_tokens

Whether to skip space tokens during matching.

You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of eds.normalizer is enabled (which it is by default).

TYPE: Optional[bool] DEFAULT: None

alignment_mode

Overwrite alignment mode.

TYPE: Optional[str]

remove [source]

Remove a pattern from the registry.

Parameters

PARAMETER DESCRIPTION
key

Key of the pattern to remove.

TYPE: str

RAISES DESCRIPTION
ValueError

If the key is not present in the registered patterns.
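The add/remove contract described above can be pictured with a small dictionary-backed registry. This is only a sketch under stated assumptions: the class name PatternRegistry is hypothetical, and the choice to replace (rather than extend) the patterns stored under an existing key is a guess, not something this page confirms about RegexMatcher.

```python
import re
from typing import Dict, List


class PatternRegistry:
    """Hypothetical stand-in for a key -> compiled patterns registry."""

    def __init__(self, flags: int = 0) -> None:
        self.flags = flags
        self.patterns: Dict[str, List[re.Pattern]] = {}

    def add(self, key: str, patterns: List[str]) -> None:
        # Assumption: adding under an existing key replaces its patterns.
        self.patterns[key] = [re.compile(p, self.flags) for p in patterns]

    def remove(self, key: str) -> None:
        # A plain dict would raise KeyError; ValueError is raised explicitly
        # to mirror the behaviour documented for remove.
        if key not in self.patterns:
            raise ValueError(f"No pattern registered under key {key!r}")
        del self.patterns[key]


registry = PatternRegistry(flags=re.IGNORECASE)
registry.add("fever", [r"fi[eè]vre", r"f[ée]brile"])
registry.remove("fever")

# Removing the same key a second time raises ValueError.
try:
    registry.remove("fever")
    second_removal_failed = False
except ValueError:
    second_removal_failed = True
```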

match [source]

Iterates on the matches.

Parameters

PARAMETER DESCRIPTION
doclike

spaCy Doc or Span object to match on.

TYPE: Union[Doc, Span]

YIELDS DESCRIPTION
span

A match.

TYPE: Tuple[Span, Match]

match_with_groupdict_as_spans [source]

Iterates on the matches.

Parameters

PARAMETER DESCRIPTION
doclike

spaCy Doc or Span object to match on.

TYPE: Union[Doc, Span]

YIELDS DESCRIPTION
span

A match.

TYPE: Tuple[Span, Dict[str, Span]]

__call__ [source]

Performs matching. Yields matches.

Parameters

PARAMETER DESCRIPTION
doclike

spaCy Doc or Span object.

TYPE: Union[Doc, Span]

as_spans

Returns matches as spans.

DEFAULT: False

YIELDS DESCRIPTION
span

A match.

TYPE: Union[Span, Tuple[Span, Dict[str, Any]]]

groupdict

Additional information coming from the named patterns in the regular expression.

TYPE: Union[Span, Tuple[Span, Dict[str, Any]]]
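The groupdict mentioned above comes from named groups in the regular expression. As a rough illustration using only the standard re module (the pattern and the sample text are invented for this example), named groups expose both their matched text and their character offsets:

```python
import re

# Named groups "value" and "unit" are illustrative, not taken from edsnlp.
pattern = re.compile(r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>mg|g)\b")
text = "Paracetamol 500 mg twice a day"

match = pattern.search(text)

# Mapping from group name to matched text, as returned by the re module.
groupdict = match.groupdict()

# Character offsets of the full match and of each named group.
full_span = match.span()
group_spans = {name: match.span(name) for name in groupdict}
```

Here groupdict holds the matched strings, while group_spans holds the (start, end) offsets that a matcher could turn into spans, in the spirit of match_with_groupdict_as_spans.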

spans_generator [source]

Iterates over every group, and then yields the full match.

Parameters

PARAMETER DESCRIPTION
match

A match object

TYPE: Match

YIELDS DESCRIPTION
Tuple[int, int]

A tuple containing the start and end of the group or match
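The iteration order described above (capturing groups first, full match last) can be sketched with the standard re module. The function name spans_generator_sketch is an assumption used to keep it distinct from the library's own function, whose exact implementation is not shown on this page.

```python
import re
from typing import Iterator, Tuple


def spans_generator_sketch(match: re.Match) -> Iterator[Tuple[int, int]]:
    """Yield the span of each capturing group, then the full match span."""
    # Group indices start at 1; index 0 is the full match.
    for idx in range(1, match.re.groups + 1):
        yield match.span(idx)
    yield match.span(0)


# Only the second alternative participates in this match.
match = re.search(r"(?:(a)|(b))c", "bc")
spans = list(spans_generator_sketch(match))
```

Note that the re module reports (-1, -1) for a group that did not participate in the match, so a caller has to filter those out before using the offsets.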

span_from_match [source]

Return the span (as a (start, end) tuple) of the first matching group. If span_from_group=False, returns the full match instead.

Parameters

PARAMETER DESCRIPTION
match

The Match object

TYPE: Match

span_from_group

Whether to work on groups or on the full match

TYPE: bool

RETURNS DESCRIPTION
Tuple[int, int]

A tuple containing the start and end of the group or match
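The selection rule can be sketched as follows, using only the standard re module. The function name span_from_match_sketch and the sample pattern are assumptions for illustration; the fallback to the full match when no group matched follows the description of the span_from_group parameter.

```python
import re
from typing import Tuple


def span_from_match_sketch(match: re.Match, span_from_group: bool) -> Tuple[int, int]:
    """Pick the first matching group's span, or the full match span."""
    if not span_from_group:
        return match.span()
    for idx in range(1, match.re.groups + 1):
        start, end = match.span(idx)
        # Groups that did not participate are reported as (-1, -1).
        if start >= 0:
            return start, end
    # No capturing group matched: fall back to the full match.
    return match.span()


text = "weight: 72 kg"
match = re.search(r"weight:\s*(\d+)\s*kg", text)

full = span_from_match_sketch(match, span_from_group=False)
group = span_from_match_sketch(match, span_from_group=True)

# A pattern without capturing groups falls back to the full match.
no_group = span_from_match_sketch(re.search(r"\d+", text), span_from_group=True)
```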

create_span [source]

spaCy only allows strict alignment mode for char_span on Spans. This method circumvents this.

Parameters

PARAMETER DESCRIPTION
doclike

Doc or Span.

TYPE: Union[Doc, Span]

start_char

Character index within the Doc-like object.

TYPE: int

end_char

Character index of the end, within the Doc-like object.

TYPE: int

key

The key used to match.

TYPE: str

alignment_mode

The alignment mode.

TYPE: str

ignore_excluded

Whether to skip excluded tokens.

TYPE: bool

ignore_space_tokens

Whether to skip space tokens.

TYPE: bool

RETURNS DESCRIPTION
span

A span matched on the Doc-like object.

TYPE: Optional[Span]