`edsnlp.matchers.regex`

`RegexMatcher` [source]

Bases: object

Simple RegExp matcher.

Parameters

PARAMETER	DESCRIPTION
`alignment_mode`	How spans should be aligned with tokens. Possible values are `strict` (character indices must be aligned with token boundaries), `contract` (span of all tokens completely within the character span), and `expand` (span of all tokens at least partially covered by the character span). Defaults to `expand`. TYPE: `str` DEFAULT: `'expand'`
`attr`	Default attribute to match on, by default `TEXT`. Can be overridden in the `add` method. TYPE: `str` DEFAULT: `'TEXT'`
`flags`	Additional flags provided to the `re` module. Can be overridden in the `add` method. TYPE: `Union[RegexFlag, int]` DEFAULT: `0`
`ignore_excluded`	Whether to skip excluded tokens during matching. TYPE: `bool` DEFAULT: `False`
`ignore_space_tokens`	Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of `eds.normalizer` is enabled (which is the default). TYPE: `bool` DEFAULT: `False`
`span_from_group`	If set to `False`, will create spans based on the regex's full match. If set to `True`, will use the first matching capturing group as a span (and fall back to using the full match if no capturing group matches). TYPE: `bool` DEFAULT: `False`
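
The three `alignment_mode` values can be illustrated without spaCy. The sketch below is not the library's implementation: `align` and `TOKENS` are hypothetical names, and tokens are represented as plain `(start_char, end_char)` pairs. It only shows how a character span that cuts through a token would be resolved under each mode.

```python
from typing import List, Optional, Tuple

# Character offsets of the tokens in "fever and cough" (hypothetical example data).
TOKENS: List[Tuple[int, int]] = [(0, 5), (6, 9), (10, 15)]


def align(
    tokens: List[Tuple[int, int]], start: int, end: int, mode: str = "expand"
) -> Optional[Tuple[int, int]]:
    """Map a character span to (first_token, last_token + 1), or None."""
    if mode == "strict":
        # Both boundaries must coincide with token boundaries.
        starts = {s: i for i, (s, _) in enumerate(tokens)}
        ends = {e: i + 1 for i, (_, e) in enumerate(tokens)}
        if start in starts and end in ends:
            return starts[start], ends[end]
        return None
    if mode == "contract":
        # Keep only tokens lying completely inside the character span.
        kept = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
    elif mode == "expand":
        # Keep every token at least partially covered by the character span.
        kept = [i for i, (s, e) in enumerate(tokens) if s < end and e > start]
    else:
        raise ValueError(f"Unknown alignment mode: {mode!r}")
    if not kept:
        return None
    return kept[0], kept[-1] + 1


# The character span (2, 9) covers "ver and", which starts inside "fever".
print(align(TOKENS, 2, 9, "strict"))    # None
print(align(TOKENS, 2, 9, "contract"))  # (1, 2)
print(align(TOKENS, 2, 9, "expand"))    # (0, 2)
```

With `expand`, a regex hit that starts mid-token still produces a span, which is presumably why it is the default here.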

`build_patterns` [source]

Builds patterns and adds them for matching. Helper function for pipelines using this matcher.

Parameters

PARAMETER	DESCRIPTION
`regex`	Dictionary of label/terms, or label/dictionary of terms/attribute. TYPE: `Patterns`

`add` [source]

Add a pattern to the registry.

Parameters

PARAMETER	DESCRIPTION
`key`	Key of the new/updated pattern. TYPE: `str`
`patterns`	List of patterns to add. TYPE: `List[str]`
`attr`	Attribute to use for matching. By default, uses the `default_attr` attribute. TYPE: `Optional[str]` DEFAULT: `None`
`ignore_excluded`	Whether to skip excluded tokens during matching. TYPE: `Optional[bool]` DEFAULT: `None`
`ignore_space_tokens`	Whether to skip space tokens during matching. You won't be able to match on newlines if this is enabled and the "spaces"/"newline" option of `eds.normalizer` is enabled (which is the default). TYPE: `Optional[bool]` DEFAULT: `None`
`alignment_mode`	Overwrite alignment mode. TYPE: `Optional[str]` DEFAULT: `None`

`remove` [source]

Remove a pattern from the registry.

Parameters

PARAMETER	DESCRIPTION
`key`	Key of the pattern to remove. TYPE: `str`

RAISES	DESCRIPTION
`ValueError`	If the key is not present in the registered patterns.
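
The `add`/`remove` contract described above (patterns stored under a key, `ValueError` on removal of an unknown key) can be sketched with a toy registry. `ToyRegistry` is a hypothetical stand-in, not the `RegexMatcher` class, and replacing the patterns when a key is added twice is an assumption of this sketch.

```python
import re
from typing import Dict, List


class ToyRegistry:
    """Hypothetical key -> compiled patterns store, for illustration only."""

    def __init__(self) -> None:
        self.patterns: Dict[str, List[re.Pattern]] = {}

    def add(self, key: str, patterns: List[str], flags: int = 0) -> None:
        # Assumption: adding an existing key replaces its patterns.
        self.patterns[key] = [re.compile(p, flags) for p in patterns]

    def remove(self, key: str) -> None:
        # Mirrors the documented behaviour: unknown keys raise ValueError.
        if key not in self.patterns:
            raise ValueError(f"No pattern registered under key {key!r}")
        del self.patterns[key]


registry = ToyRegistry()
registry.add("fever", [r"fever", r"pyrexia"], flags=re.IGNORECASE)
registry.remove("fever")
```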

`match` [source]

Iterates over the matches.

Parameters

PARAMETER	DESCRIPTION
`doclike`	spaCy Doc or Span object to match on. TYPE: `Union[Doc, Span]`

YIELDS	DESCRIPTION
`span`	A match. TYPE: `Tuple[Span, Match]`

`match_with_groupdict_as_spans` [source]

Iterates over the matches, yielding each matched span together with its named groups as spans.

Parameters

PARAMETER	DESCRIPTION
`doclike`	spaCy Doc or Span object to match on. TYPE: `Union[Doc, Span]`

YIELDS	DESCRIPTION
`span`	A match. TYPE: `Tuple[Span, Dict[str, Span]]`
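
To make the `Dict[str, Span]` part of the yielded tuple more concrete, the sketch below uses only the standard `re` module and returns character offsets instead of spaCy spans. `named_group_offsets` is a hypothetical helper, not part of the library; turning these offsets into `Span` objects is assumed to be a separate step.

```python
import re
from typing import Dict, Iterator, Tuple

Offsets = Tuple[int, int]


def named_group_offsets(
    pattern: str, text: str
) -> Iterator[Tuple[Offsets, Dict[str, Offsets]]]:
    """Yield the full-match offsets and the offsets of each named group."""
    for m in re.finditer(pattern, text):
        groups = {
            name: m.span(name)
            for name, value in m.groupdict().items()
            if value is not None  # skip named groups that did not participate
        }
        yield m.span(), groups


results = list(
    named_group_offsets(r"(?P<value>\d+)\s*(?P<unit>mg|g)", "aspirin 500 mg daily")
)
print(results)  # [((8, 14), {'value': (8, 11), 'unit': (12, 14)})]
```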

`__call__` [source]

Performs matching. Yields matches.

Parameters

PARAMETER	DESCRIPTION
`doclike`	spaCy Doc or Span object. TYPE: `Union[Doc, Span]`
`as_spans`	Whether to return matches as spans. DEFAULT: `False`

YIELDS	DESCRIPTION
`span`	A match. TYPE: `Union[Span, Tuple[Span, Dict[str, Any]]]`
`groupdict`	Additional information coming from the named patterns in the regular expression. TYPE: `Union[Span, Tuple[Span, Dict[str, Any]]]`

`spans_generator` [source]

Iterates over every group, and then yields the full match.

Parameters

PARAMETER	DESCRIPTION
`match`	A match object. TYPE: `Match`

YIELDS	DESCRIPTION
`Tuple[int, int]`	A tuple containing the start and end of the group or match
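
The iteration order described above (capturing groups first, full match last) can be sketched with the standard `re` module. `iter_group_spans` is a hypothetical name, and whether the library yields non-participating groups is not stated here; in this sketch they appear as `(-1, -1)`, which is what `re.Match.span` returns for them.

```python
import re
from typing import Iterator, Tuple


def iter_group_spans(match: re.Match) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each capturing group, then for the full match."""
    for index in range(1, match.re.groups + 1):
        # A group that did not take part in the match has span (-1, -1).
        yield match.span(index)
    yield match.span(0)


m = re.search(r"(a)|(b)", "b")
print(list(iter_group_spans(m)))  # [(-1, -1), (0, 1), (0, 1)]
```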

`span_from_match` [source]

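Return the span of the match as a `(start, end)` tuple. If `span_from_group=True`, returns the span of the first matching capturing group (falling back to the full match if no group matches); otherwise, returns the span of the full match.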

Parameters

PARAMETER	DESCRIPTION
`match`	The Match object. TYPE: `Match`
`span_from_group`	Whether to work on groups or on the full match. TYPE: `bool`

RETURNS	DESCRIPTION
`Tuple[int, int]`	A tuple containing the start and end of the group or match
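
The selection rule given for the `span_from_group` constructor parameter (first matching capturing group when `True`, full match otherwise or as a fallback) can be sketched as follows. `pick_span` is a hypothetical helper written against the standard `re` module, not the library's function.

```python
import re
from typing import Tuple


def pick_span(match: re.Match, span_from_group: bool) -> Tuple[int, int]:
    """Return (start, end) of the first matching group, or of the full match."""
    if span_from_group:
        for index in range(1, match.re.groups + 1):
            if match.start(index) >= 0:  # this group took part in the match
                return match.span(index)
    # No group requested, or no capturing group matched: use the full match.
    return match.span()


m = re.search(r"dose:\s*(\d+)", "dose: 40")
print(pick_span(m, span_from_group=False))  # (0, 8)
print(pick_span(m, span_from_group=True))   # (6, 8)
```

A pattern without any capturing group returns the full match in both cases.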

`create_span` [source]

spaCy only allows the strict alignment mode for `char_span` on `Span` objects. This method works around that limitation.

Parameters

PARAMETER	DESCRIPTION
`doclike`	`Doc` or `Span`. TYPE: `Union[Doc, Span]`
`start_char`	Character index within the Doc-like object. TYPE: `int`
`end_char`	Character index of the end, within the Doc-like object. TYPE: `int`
`key`	The key used to match. TYPE: `str`
`alignment_mode`	The alignment mode. TYPE: `str`
`ignore_excluded`	Whether to skip excluded tokens. TYPE: `bool`
`ignore_space_tokens`	Whether to skip space tokens. TYPE: `bool`

RETURNS	DESCRIPTION
`span`	A span matched on the Doc-like object. TYPE: `Optional[Span]`
