edspdf.extractors.functional
get_blocs(layout)
Extract text blocs from a PDFMiner layout generator.
Arguments
layout: PDFMiner layout generator.
| YIELDS | DESCRIPTION |
|---|---|
| `bloc` | Text bloc. |
Source code in edspdf/extractors/functional.py
get_lines(layout)
Extract lines from a PDFMiner layout object.
Each line's coordinates are reframed so that the origin is the top-left corner of the page (PDFMiner itself places the origin at the bottom left).
| PARAMETER | DESCRIPTION |
|---|---|
| `layout` | PDFMiner layout object. |
| YIELDS | DESCRIPTION |
|---|---|
| `Iterator[Line]` | Single line object. |
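The reframing described above can be illustrated with a minimal sketch. It is not the library's implementation: the `Box` class and the `to_top_left` helper are hypothetical names introduced here, and the only assumption taken from PDFMiner is that a bounding box is given as `(x0, y0, x1, y1)` with the origin at the bottom-left corner of the page.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical container for illustration; not the library's Line type."""

    x0: float
    y0: float  # top edge once the origin is the top-left corner
    x1: float
    y1: float  # bottom edge once the origin is the top-left corner


def to_top_left(bbox, page_height):
    """Convert a bottom-left-origin bbox to a top-left-origin Box."""
    x0, y0, x1, y1 = bbox
    # Flip the vertical axis: the old top edge (y1) becomes the smaller y value.
    return Box(x0=x0, y0=page_height - y1, x1=x1, y1=page_height - y0)


# A 12-point-high line near the top of a US Letter page (792 points tall).
box = to_top_left((72.0, 700.0, 300.0, 712.0), page_height=792.0)
print(box)
```

After the flip, `y0` is smaller than `y1`, so lines read top to bottom in increasing `y` order.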
Source code in edspdf/extractors/functional.py
remove_outside_lines(lines, strict_mode=False)
Filter out lines that are outside the canvas.
| PARAMETER | DESCRIPTION |
|---|---|
| `lines` | DataFrame of extracted lines. |
| `strict_mode` | Whether to remove a line if any part of it is outside the canvas. Defaults to `False`. |
| RETURNS | DESCRIPTION |
|---|---|
| `pd.DataFrame` | Filtered lines. |
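To make the effect of `strict_mode` concrete, here is a rough sketch of the filtering logic. It rests on several assumptions not confirmed by this page: coordinates are taken to be relative to a unit canvas (0 to 1 on both axes), the non-strict mode is assumed to drop only lines lying entirely outside the canvas, and plain dictionaries stand in for DataFrame rows to keep the example dependency-free. The names `is_inside` and `filter_lines` are hypothetical.

```python
def is_inside(line, strict):
    """Check a line (dict with x0, y0, x1, y1) against a unit canvas (assumption)."""
    if strict:
        # Strict: every edge of the line must lie within the canvas.
        return (
            line["x0"] >= 0
            and line["y0"] >= 0
            and line["x1"] <= 1
            and line["y1"] <= 1
        )
    # Lenient: keep the line as long as some part of it overlaps the canvas.
    return line["x1"] > 0 and line["y1"] > 0 and line["x0"] < 1 and line["y0"] < 1


def filter_lines(lines, strict_mode=False):
    """Return only the lines considered to be on the canvas."""
    return [line for line in lines if is_inside(line, strict_mode)]


lines = [
    {"text": "inside", "x0": 0.1, "y0": 0.1, "x1": 0.5, "y1": 0.2},
    {"text": "overflowing", "x0": 0.8, "y0": 0.3, "x1": 1.2, "y1": 0.4},
    {"text": "outside", "x0": 1.1, "y0": 0.5, "x1": 1.3, "y1": 0.6},
]

lenient = [line["text"] for line in filter_lines(lines)]
strict = [line["text"] for line in filter_lines(lines, strict_mode=True)]
print(lenient, strict)
```

Under these assumptions, the default mode keeps the overflowing line and strict mode discards it.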
Source code in edspdf/extractors/functional.py