edsnlp.matchers.utils.offset
alignment(doc, attr='TEXT', ignore_excluded=True)
Align different representations of a `Doc` or `Span` object.
PARAMETER | DESCRIPTION |
---|---|
`doc` | spaCy `Doc` or `Span` object. TYPE: `Union[Doc, Span]` |
`attr` | Attribute to use, by default `'TEXT'`. TYPE: `str` |
`ignore_excluded` | Whether to remove excluded tokens, by default `True`. TYPE: `bool` |
RETURNS | DESCRIPTION |
---|---|
`Tuple[List[int], List[int]]` | An alignment tuple `(original, clean)`: the original and clean index lists. |
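To make the `(original, clean)` pair concrete, here is a minimal pure-Python sketch under stated assumptions: it replaces the spaCy `Doc` with a toy token model (tuples of text, trailing whitespace and an excluded flag), only covers an `attr='TEXT'`-like representation, and the helpers `toy_alignment` and `toy_clean_text` are hypothetical illustrations, not the edsnlp implementation. For each kept token it records the token's start and end offsets in both texts.

```python
from typing import List, Tuple

# Hypothetical token model: (text, trailing_whitespace, is_excluded).
Token = Tuple[str, str, bool]


def toy_alignment(
    tokens: List[Token], ignore_excluded: bool = True
) -> Tuple[List[int], List[int]]:
    """Return aligned (original, clean) character offsets.

    For every kept token, its start and end offsets are recorded both in
    the original text and in the clean text, where the clean text drops
    excluded tokens (and their trailing whitespace).
    """
    original: List[int] = []
    clean: List[int] = []
    o = c = 0  # cursors in the original and clean texts
    for text, ws, excluded in tokens:
        if not (ignore_excluded and excluded):
            original += [o, o + len(text)]
            clean += [c, c + len(text)]
            c += len(text) + len(ws)
        o += len(text) + len(ws)
    return original, clean


def toy_clean_text(tokens: List[Token], ignore_excluded: bool = True) -> str:
    """Build the clean representation the alignment refers to."""
    return "".join(
        text + ws
        for text, ws, excluded in tokens
        if not (ignore_excluded and excluded)
    )


tokens = [("Le", " ", False), ("==", " ", True), ("patient", "", False)]
print(toy_clean_text(tokens))  # Le patient
print(toy_alignment(tokens))   # ([0, 2, 6, 13], [0, 2, 3, 10])
```

Because the excluded `==` token is dropped, the two lists drift apart after it: clean offset 3 (the start of `patient`) corresponds to original offset 6. With `ignore_excluded=False`, both lists are identical in this toy.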
Source code in edsnlp/matchers/utils/offset.py
offset(doc, attr, ignore_excluded, index)
Compute the offset between the original text and a given representation (defined by the pair `attr`, `ignore_excluded`).
The alignment itself is computed with `alignment`.
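PARAMETER | DESCRIPTION |
---|---|
`doc` | The spaCy `Doc` object. TYPE: `Doc` |
`attr` | The attribute used by the `RegexMatcher`. TYPE: `str` |
`ignore_excluded` | Whether the `RegexMatcher` ignores excluded tokens. TYPE: `bool` |
`index` | The character index in the pre-processed text (the representation defined by `attr` and `ignore_excluded`). TYPE: `int` |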
RETURNS | DESCRIPTION |
---|---|
`int` | The offset. To get the character index in the original document, compute `original = index + offset(doc, attr, ignore_excluded, index)`. |
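Mapping a clean-text index back to the original text can be illustrated with a small, self-contained sketch. It assumes an alignment in the shape described for `alignment` (two lists of matching offsets, hard-coded here for the original text `"Le == patient"` with the `==` token excluded) and uses a hypothetical `toy_offset` helper that looks up the nearest aligned boundary with the standard-library `bisect_left`; it is an illustration, not the edsnlp source.

```python
from bisect import bisect_left
from typing import List


def toy_offset(original: List[int], clean: List[int], index: int) -> int:
    """Shift to add to a clean-text index to reach the original text.

    Hypothetical helper: find the first aligned boundary at or after
    `index` in the clean text and return the shift at that boundary.
    """
    i = bisect_left(clean, index)
    i = min(i, len(original) - 1)  # clamp indices past the last boundary
    return original[i] - clean[i]


# Toy alignment for "Le == patient" with the "==" token excluded,
# which yields the clean text "Le patient" (start/end of each kept token).
original_text, clean_text = "Le == patient", "Le patient"
original, clean = [0, 2, 6, 13], [0, 2, 3, 10]

# A match found in the clean text...
start = clean_text.index("patient")  # 3
end = start + len("patient")         # 10

# ...mapped back to the original text with index + offset
orig_start = start + toy_offset(original, clean, start)
orig_end = end + toy_offset(original, clean, end)
print(orig_start, orig_end, original_text[orig_start:orig_end])  # 6 13 patient
```

In this toy, an index strictly inside a kept token resolves to that token's end boundary, which carries the same shift as its start because the representation keeps token lengths unchanged.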