edspdf.classifiers
mask
MaskClassifier
Bases: BaseClassifier
Mask classifier that reproduces the PdfBox behaviour.
Source code in edspdf/classifiers/mask.py
align
align_labels(lines, labels, threshold=0.0001)
Align lines with possibly overlapping (and non-exhaustive) labels.
Possible matches are sorted by covered area. Lines with no overlap at all
| PARAMETER | DESCRIPTION |
|---|---|
| lines | DataFrame containing the lines. |
| labels | DataFrame containing the labels. |
| threshold | Threshold to use for discounting a label. Used if the … DEFAULT: 0.0001 |
| RETURNS | DESCRIPTION |
|---|---|
| pd.DataFrame | A copy of the lines table, with the labels added. |
Source code in edspdf/classifiers/align.py
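The matching idea described for align_labels (overlapping label boxes, candidates ranked by covered area) can be illustrated with a small stand-alone sketch. This is not the edspdf implementation: it uses plain Python objects instead of DataFrames, and the `Box` fields, the `align_sketch` name, the reading of `threshold` as a minimum covered fraction of the line, and the choice of `None` for unmatched lines are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; the field names are assumptions for this sketch."""

    x0: float
    y0: float
    x1: float
    y1: float
    label: Optional[str] = None

    @property
    def area(self) -> float:
        return max(self.x1 - self.x0, 0.0) * max(self.y1 - self.y0, 0.0)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they do not intersect)."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(width, 0.0) * max(height, 0.0)


def align_sketch(
    lines: List[Box], labels: List[Box], threshold: float = 0.0001
) -> List[Optional[str]]:
    """Return one label per line: the label box covering the most area, or None."""
    result = []
    for line in lines:
        best_label, best_ratio = None, 0.0
        for candidate in labels:
            # Fraction of the line's area covered by this label box.
            ratio = overlap_area(line, candidate) / line.area if line.area > 0 else 0.0
            # Assumption: candidates below the threshold are discounted entirely.
            if ratio >= threshold and ratio > best_ratio:
                best_label, best_ratio = candidate.label, ratio
        result.append(best_label)
    return result


lines = [Box(0.1, 0.1, 0.5, 0.2), Box(0.1, 0.8, 0.5, 0.9)]
labels = [Box(0.0, 0.0, 1.0, 0.5, "body"), Box(0.0, 0.0, 0.3, 0.15, "header")]
aligned = align_sketch(lines, labels)
```

Here the first line lies entirely inside the "body" box and only partly inside the "header" box, so the larger covered area wins; the second line overlaps nothing and stays unlabelled in this sketch.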
random
RandomClassifier
Bases: BaseClassifier
Random classifier, for chaos purposes. Classifies each line to a random element.
Source code in edspdf/classifiers/random.py
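A random baseline of this kind might look roughly like the following. The sketch is self-contained and does not import edspdf; the class name `RandomClassifierSketch`, the constructor arguments `labels` and `seed`, and the use of a plain sequence for `lines` are hypothetical choices, not the library's signature.

```python
import random
from typing import List, Sequence


class RandomClassifierSketch:
    """Hypothetical stand-in: assigns each line a label drawn at random."""

    def __init__(self, labels: Sequence[str], seed: int = 0):
        self.labels = list(labels)
        # A dedicated generator keeps the draws reproducible for a given seed.
        self.rng = random.Random(seed)

    def predict(self, lines: Sequence[object]) -> List[str]:
        # One independent draw per line; the line content is ignored.
        return [self.rng.choice(self.labels) for _ in lines]


label_set = ["body", "header", "footer"]
predicted = RandomClassifierSketch(label_set, seed=42).predict(range(5))
```

Seeding the generator is a design choice for this sketch: it makes a "chaos" run repeatable when a failure needs to be reproduced.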
dummy
DummyClassifier
Bases: BaseClassifier
"Dummy" classifier, for testing purposes. Classifies every line as body.
Source code in edspdf/classifiers/dummy.py
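The constant-label behaviour described above can be sketched in a few lines. The class name `DummyClassifierSketch` and the list-based input are assumptions for illustration; the sketch does not use edspdf itself.

```python
from typing import List, Sequence


class DummyClassifierSketch:
    """Hypothetical stand-in: labels every line as "body"."""

    def predict(self, lines: Sequence[object]) -> List[str]:
        # The content of each line is ignored; only the count matters.
        return ["body" for _ in lines]


dummy_labels = DummyClassifierSketch().predict(["Title", "First paragraph", "Page 1"])
```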
base
BaseClassifier
Bases: ABC
Source code in edspdf/classifiers/base.py
predict(lines)
abstractmethod
Handles the classification.
Source code in edspdf/classifiers/base.py
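The contract implied by an ABC with an abstract predict(lines) method can be sketched as follows. Everything here is a stand-in: `BaseClassifierSketch` mirrors only what this page states, while `TopOfPageClassifier`, the `y0` key, and the 0.1 cut-off are hypothetical and exist only to show a subclass fulfilling the contract.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseClassifierSketch(ABC):
    """Stand-in for an abstract base with a single abstract predict method."""

    @abstractmethod
    def predict(self, lines: List[Dict[str, float]]) -> List[str]:
        """Handles the classification."""


class TopOfPageClassifier(BaseClassifierSketch):
    """Hypothetical subclass: lines starting in the top 10% of the page are headers."""

    def predict(self, lines: List[Dict[str, float]]) -> List[str]:
        return ["header" if line["y0"] < 0.1 else "body" for line in lines]


try:
    BaseClassifierSketch()  # abstract classes cannot be instantiated
    instantiable = True
except TypeError:
    instantiable = False

predictions = TopOfPageClassifier().predict([{"y0": 0.05}, {"y0": 0.4}])
```

Because predict is abstract, Python refuses to instantiate the base class directly, so every concrete classifier has to provide its own implementation.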