edsnlp.utils
inclusion
check_inclusion(span, start, end)
Checks whether the span overlaps the boundaries.
PARAMETER | DESCRIPTION |
---|---|
span | Span to check. |
start | Start of the boundary. |
end | End of the boundary. |
RETURNS | DESCRIPTION |
---|---|
bool | Whether the span overlaps the boundaries. |
Source code in edsnlp/utils/inclusion.py
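The overlap test described above can be sketched without spaCy. This is an illustration only: `FakeSpan` and `overlaps` are hypothetical names, and treating `start` and `end` as a half-open token range is an assumption that this page does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span (token offsets, end exclusive)."""

    start: int
    end: int


def overlaps(span: FakeSpan, start: int, end: int) -> bool:
    """Return True when [span.start, span.end) intersects [start, end)."""
    return span.start < end and span.end > start


print(overlaps(FakeSpan(2, 5), 4, 8))  # the two ranges share token 4
print(overlaps(FakeSpan(2, 5), 5, 8))  # the span ends where the boundary starts
```

Under the half-open assumption, a span that merely touches the boundary does not count as overlapping.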
filter
default_sort_key(span)
Returns the sort key for filtering spans.
PARAMETER | DESCRIPTION |
---|---|
span | Span to sort. |
RETURNS | DESCRIPTION |
---|---|
key | Sort key. |
Source code in edsnlp/utils/filter.py
filter_spans(spans, label_to_remove=None, return_discarded=False, sort_key=default_sort_key)
Re-definition of spaCy's filtering function that returns discarded spans as well as filtered ones.
Also accepts a label_to_remove argument, whose main expected use case is filtering out pseudo-cues. If set, results can contain overlapping spans: only spans overlapping with excluded labels are removed.
The spaCy documentation states:
Filter a sequence of spans and remove duplicates or overlaps. Useful for creating named entities (where one token can only be part of one entity) or when merging spans with Retokenizer.merge. When spans overlap, the (first) longest span is preferred over shorter spans.
Filtering out spans
If the label_to_remove argument is supplied, it might be tempting to also filter out overlapping spans that do not carry the label to remove. We keep all other, possibly overlapping, spans because in qualifier pipelines the same cue can both precede and follow a marked entity. Hence we need to keep every example.
PARAMETER | DESCRIPTION |
---|---|
spans | Spans to filter. |
return_discarded | Whether to return discarded spans. |
label_to_remove | Label to remove. If set, results can contain overlapping spans. |
sort_key | Key used to sort spans before applying overlap conflict resolution. A span with a higher key takes precedence over another span. By default, the largest, leftmost spans are selected first. |
RETURNS | DESCRIPTION |
---|---|
results | Filtered spans. |
discarded | Discarded spans. |
Source code in edsnlp/utils/filter.py
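The default behaviour (largest, leftmost spans first, no `label_to_remove`) can be sketched as a greedy pass over sorted spans. The snippet below is an illustration under stated assumptions, not the library's code: `FakeSpan`, `sort_key` and `filter_overlaps` are hypothetical names, and the `label_to_remove` logic is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span (token offsets, end exclusive)."""

    start: int
    end: int
    label: str


def sort_key(span: FakeSpan) -> Tuple[int, int]:
    # Longer spans get a higher key; ties go to the leftmost span.
    return span.end - span.start, -span.start


def filter_overlaps(spans: List[FakeSpan]) -> Tuple[List[FakeSpan], List[FakeSpan]]:
    """Greedy filtering: keep the highest-key span, discard whatever overlaps it."""
    kept: List[FakeSpan] = []
    discarded: List[FakeSpan] = []
    seen_tokens = set()
    for span in sorted(spans, key=sort_key, reverse=True):
        tokens = range(span.start, span.end)
        if any(token in seen_tokens for token in tokens):
            discarded.append(span)
        else:
            kept.append(span)
            seen_tokens.update(tokens)
    # Return both lists in document order.
    kept.sort(key=lambda s: s.start)
    discarded.sort(key=lambda s: s.start)
    return kept, discarded


spans = [FakeSpan(0, 2, "A"), FakeSpan(1, 5, "B"), FakeSpan(6, 7, "C")]
kept, discarded = filter_overlaps(spans)
print(kept)       # the 4-token span and the span that overlaps nothing
print(discarded)  # the shorter span that overlaps the 4-token span
```

Passing a different key function changes which span wins a conflict, which is the role of the sort_key parameter described above.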
consume_spans(spans, filter, second_chance=None)
Consume a list of spans, according to a filter.
Warning
This method makes two strong assumptions:
- Spans are sorted.
- Spans are consumed in sequence and only once.
The second assumption is problematic for the way we treat long entities, hence the second_chance parameter, which lets entities be seen more than once.
PARAMETER | DESCRIPTION |
---|---|
spans | List of spans to filter. |
filter | Filtering function. Should return True when the item is to be included. |
second_chance | Optional list of spans to include again (useful for long entities), by default None. |
RETURNS | DESCRIPTION |
---|---|
matches | List of spans consumed by the filter. |
remainder | List of remaining spans from the original list. |
Source code in edsnlp/utils/filter.py
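The two assumptions in the warning suggest a "take the leading run" strategy. The sketch below illustrates that idea with plain integers standing in for sorted spans; `consume` is a hypothetical name and the handling of `second_chance` is an assumption based on the parameter description, not a copy of the library's code.

```python
from typing import Callable, List, Optional, Tuple


def consume(
    items: List[int],
    keep: Callable[[int], bool],
    second_chance: Optional[List[int]] = None,
) -> Tuple[List[int], List[int]]:
    """Take the leading run of items accepted by `keep`; return (matches, remainder)."""
    cut = 0
    for item in items:
        if not keep(item):
            break
        cut += 1
    matches = items[:cut]
    remainder = items[cut:]
    # Items given a second chance are tested again even if consumed earlier.
    matches.extend(x for x in (second_chance or []) if keep(x))
    return matches, remainder


# Integers stand in for sorted spans (for instance, their start offsets).
matches, remainder = consume([1, 2, 3, 8, 9], keep=lambda x: x < 5)
print(matches, remainder)
```

Because iteration stops at the first rejected item, an unsorted input would leave valid items stranded in the remainder, which is why the sorting assumption matters.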
get_spans(spans, label)
Extracts spans with a given label. Prefer passing the label's hash rather than its string form, for performance reasons.
PARAMETER | DESCRIPTION |
---|---|
spans | List of spans to filter. |
label | Label to filter on. |
RETURNS | DESCRIPTION |
---|---|
List[Span] | Filtered spans. |
Source code in edsnlp/utils/filter.py
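A minimal sketch of label-based selection follows. It relies on the spaCy convention that a span exposes its label both as an integer hash (`label`) and as a string (`label_`); `FakeSpan`, `spans_with_label` and the hash values are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span: `label` is a hash, `label_` a string."""

    start: int
    end: int
    label: int
    label_: str


def spans_with_label(spans: List[FakeSpan], label: Union[int, str]) -> List[FakeSpan]:
    """Select spans by hash when given an int, by string otherwise."""
    if isinstance(label, int):
        return [span for span in spans if span.label == label]
    return [span for span in spans if span.label_ == label]


# The hash values below are made up for the example.
spans = [FakeSpan(0, 1, 101, "negation"), FakeSpan(2, 3, 202, "history")]
print(spans_with_label(spans, 101))
print(spans_with_label(spans, "history"))
```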
resources
get_verbs(verbs=None, check_contains=True)
Extract verbs from the resources, as a pandas DataFrame.
PARAMETER | DESCRIPTION |
---|---|
verbs | List of verbs to keep. Returns all verbs by default. |
check_contains | Whether to check that no verb is missing if a list of verbs was provided, by default True. |
RETURNS | DESCRIPTION |
---|---|
pd.DataFrame | DataFrame containing conjugated verbs. |
Source code in edsnlp/utils/resources.py
examples
entity_pattern = re.compile('(<ent[^<>]*>[^<>]+</ent>)')
module-attribute
text_pattern = re.compile('<ent.*>(.+)</ent>')
module-attribute
modifiers_pattern = re.compile('<ent\\s?(.*)>.+</ent>')
module-attribute
Match
Bases: BaseModel
Source code in edsnlp/utils/examples.py
start_char: int = None
class-attribute
end_char: int = None
class-attribute
text: str = None
class-attribute
modifiers: str = None
class-attribute
Modifier
Bases: BaseModel
Source code in edsnlp/utils/examples.py
key: str = None
class-attribute
value: Union[int, float, bool, str] = None
class-attribute
Entity
Bases: BaseModel
Source code in edsnlp/utils/examples.py
start_char: int = None
class-attribute
end_char: int = None
class-attribute
modifiers: List[Modifier] = None
class-attribute
find_matches(example)
Finds entities within the example.
PARAMETER | DESCRIPTION |
---|---|
example | Example to process. |
RETURNS | DESCRIPTION |
---|---|
List[re.Match] | List of matches for entities. |
Source code in edsnlp/utils/examples.py
parse_match(match)
Parse a regex match representing an entity.
PARAMETER | DESCRIPTION |
---|---|
match | Match for an entity. |
RETURNS | DESCRIPTION |
---|---|
Match | Usable representation for the entity match. |
Source code in edsnlp/utils/examples.py
parse_example(example)
Parses an example: finds the entities and removes the tags.
PARAMETER | DESCRIPTION |
---|---|
example | Example to process. |
RETURNS | DESCRIPTION |
---|---|
Tuple[str, List[Entity]] | Cleaned text and extracted entities. |
Source code in edsnlp/utils/examples.py
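To make the tag format concrete, the sketch below applies the three module-level patterns listed above to a tagged sentence and rebuilds the untagged text with character offsets. `strip_tags` is a hypothetical helper written for this illustration; it returns plain tuples rather than the Entity and Modifier models, and it keeps the modifiers as a raw string because this page does not document how they are parsed.

```python
import re
from typing import List, Tuple

# Patterns copied from the module attributes of edsnlp.utils.examples.
entity_pattern = re.compile(r"(<ent[^<>]*>[^<>]+</ent>)")
text_pattern = re.compile(r"<ent.*>(.+)</ent>")
modifiers_pattern = re.compile(r"<ent\s?(.*)>.+</ent>")


def strip_tags(example: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Return the untagged text and (start_char, end_char, raw modifiers) per entity."""
    pieces: List[str] = []
    entities: List[Tuple[int, int, str]] = []
    cursor = 0  # position in the tagged example
    length = 0  # length of the cleaned text built so far
    for match in entity_pattern.finditer(example):
        before = example[cursor:match.start()]
        tag = match.group()
        inner = text_pattern.match(tag).group(1)
        modifiers = modifiers_pattern.match(tag).group(1)
        start_char = length + len(before)
        entities.append((start_char, start_char + len(inner), modifiers))
        pieces += [before, inner]
        length = start_char + len(inner)
        cursor = match.end()
    pieces.append(example[cursor:])
    return "".join(pieces), entities


text, entities = strip_tags("Le patient a une <ent negated=true>pneumonie</ent>.")
print(text)
print(entities)
```

The offsets refer to the cleaned text, so slicing `text` with them recovers the entity's surface form.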
deprecation
deprecated_extension(name, new_name)
Source code in edsnlp/utils/deprecation.py
deprecated_getter_factory(name, new_name)
Source code in edsnlp/utils/deprecation.py
deprecation(name, new_name=None)
Source code in edsnlp/utils/deprecation.py
deprecated_factory(name, new_name=None, default_config=None, func=None)
Execute the Language.factory method on a modified factory function. The modification adds a deprecation warning.
PARAMETER | DESCRIPTION |
---|---|
name | The deprecated name for the pipeline. |
new_name | The new name for the pipeline, which should be used, by default None. |
default_config | The configuration that should be passed to Language.factory, by default None. |
func | The function to decorate, by default None. |
RETURNS | DESCRIPTION |
---|---|
Callable | |
Source code in edsnlp/utils/deprecation.py
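The general idea of wrapping a factory function so that it emits a deprecation warning can be sketched with the standard library alone. This is not the library's implementation: `deprecated_alias`, `create_component` and the component names are hypothetical, and the registration step with Language.factory is omitted entirely.

```python
import functools
import warnings
from typing import Callable, Optional


def deprecated_alias(name: str, new_name: Optional[str] = None) -> Callable:
    """Wrap a factory function so that calling it under `name` emits a warning."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            message = f'"{name}" is deprecated'
            if new_name:
                message += f', please use "{new_name}" instead'
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated_alias("old_component", new_name="eds.new_component")
def create_component(threshold: float = 0.5) -> dict:
    # Placeholder factory: returns its configuration instead of a real pipe.
    return {"threshold": threshold}
```

Calling `create_component()` still returns the component, so existing pipelines keep working while users are pointed to the new name.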
regex
make_pattern(patterns, with_breaks=False, name=None)
Create OR pattern from a list of patterns.
PARAMETER | DESCRIPTION |
---|---|
patterns | List of patterns to merge. |
with_breaks | Whether to add breaks ( |
name | Name of the group, using regex |
RETURNS | DESCRIPTION |
---|---|
str | Merged pattern. |
Source code in edsnlp/utils/regex.py
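Merging alternatives into a single OR pattern can be sketched as follows. Two points are assumptions, since the parameter descriptions on this page are truncated: that "breaks" means the `\b` word-boundary anchor, and that the group name is applied with the `(?P<name>...)` syntax. `or_pattern` is a hypothetical name.

```python
import re
from typing import List, Optional


def or_pattern(
    patterns: List[str], with_breaks: bool = False, name: Optional[str] = None
) -> str:
    """Join alternatives into one group, optionally named and wrapped in word breaks."""
    prefix = f"(?P<{name}>" if name else "("
    # Longest alternatives first, so a short one cannot shadow a longer match.
    merged = prefix + "|".join(sorted(patterns, key=len, reverse=True)) + ")"
    if with_breaks:
        merged = r"\b" + merged + r"\b"
    return merged


pattern = or_pattern(["pas", "pas de"], with_breaks=True, name="negation")
print(pattern)
print(re.search(pattern, "il n'y a pas de fièvre").group("negation"))
```

Sorting by length is a design choice of this sketch: regex alternation is ordered, so listing "pas" before "pas de" would stop the match at the shorter cue.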
compile_regex(reg)
Tries to compile reg using the re module, and falls back to the more permissive regex module.
PARAMETER | DESCRIPTION |
---|---|
reg | |
RETURNS | DESCRIPTION |
---|---|
Union[re.Pattern, regex.Pattern] | |
Source code in edsnlp/utils/regex.py
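The fallback strategy described above can be sketched in a few lines. `compile_with_fallback` is a hypothetical name, and the error handling is an assumption: this sketch simply lets the second compilation error propagate, which may differ from what the library does.

```python
import re


def compile_with_fallback(reg: str):
    """Compile with the standard library first, then with the third-party regex package."""
    try:
        return re.compile(reg)
    except re.error:
        # Only needed when `re` rejects the pattern, hence the lazy import.
        import regex

        return regex.compile(reg)


compiled = compile_with_fallback(r"\d{2}/\d{2}/\d{4}")
print(compiled.search("vu le 12/03/2021").group())
```

Both modules return pattern objects with a compatible `search`/`match` interface, so callers do not need to know which one compiled the expression.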