edspdf.structures
PDFDoc
Bases: BaseModel
This is the main data structure of the library to hold PDFs. It contains the content of the PDF, as well as box annotations and text outputs.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| content | The content of the PDF document. |
| id | The ID of the PDF document. |
| pages | The pages of the PDF document. |
| error | Whether there was an error when processing this PDF document. |
| content_boxes | The content boxes/annotations of the PDF document. |
| aggregated_texts | The aggregated text outputs of the PDF document. |
| text_boxes | The text boxes of the PDF document. |
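As a rough illustration of how these attributes fit together, the sketch below uses hypothetical stand-in dataclasses (`SketchDoc`, `SketchTextBox`) rather than the real `edspdf.structures` classes. The attribute types, and the idea that `text_boxes` is a filtered view over `content_boxes`, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchTextBox:
    # Hypothetical stand-in for a text box: only the fields needed here.
    page_num: int
    text: str
    label: str = ""


@dataclass
class SketchDoc:
    # Hypothetical stand-in mirroring the PDFDoc attribute names above.
    content: bytes
    id: str = ""
    error: bool = False
    content_boxes: List[SketchTextBox] = field(default_factory=list)
    aggregated_texts: Dict[str, str] = field(default_factory=dict)

    @property
    def text_boxes(self) -> List[SketchTextBox]:
        # Assumption: text boxes are the content boxes that carry text.
        return [b for b in self.content_boxes if isinstance(b, SketchTextBox)]


doc = SketchDoc(content=b"%PDF-1.4 ...", id="report-1")
doc.content_boxes.append(SketchTextBox(page_num=0, text="Hello", label="body"))
doc.aggregated_texts["body"] = "Hello"
```

Here the document holds the raw bytes, the annotated boxes, and the per-label text outputs side by side, which is the shape the attribute table describes.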
Page
Bases: BaseModel
The Page class represents a page of a PDF document.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| page_num | The page number of the page. |
| width | The width of the page. |
| height | The height of the page. |
| doc | The PDF document that this page belongs to. |
| image | The rendered image of the page, stored as a NumPy array. |
| text_boxes | The text boxes of the page. |
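One way to picture a page's `text_boxes` is as the subset of a document's boxes that share its `page_num`. The sketch below groups boxes by page using a hypothetical `PagedBox` stand-in and a hypothetical helper `boxes_by_page`; neither is part of the library, and the real classes may compute this differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PagedBox:
    # Hypothetical stand-in: a box that only records its page and text.
    page_num: int
    text: str


def boxes_by_page(boxes: List[PagedBox]) -> Dict[int, List[PagedBox]]:
    """Group boxes by page_num, preserving input order within a page."""
    grouped: Dict[int, List[PagedBox]] = defaultdict(list)
    for box in boxes:
        grouped[box.page_num].append(box)
    return dict(grouped)


boxes = [PagedBox(0, "Title"), PagedBox(1, "Appendix"), PagedBox(0, "Intro")]
pages = boxes_by_page(boxes)
```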
TextProperties
Bases: BaseModel
The TextProperties class represents the style properties of a span of text in a
TextBox.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| italic | Whether the text is italic. |
| bold | Whether the text is bold. |
| begin | The beginning index of the span of text. |
| end | The ending index of the span of text. |
| fontname | The font name of the span of text. |
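To make the role of `begin` and `end` concrete, the sketch below slices a string with a list of style spans and keeps the bold ones. `SketchProps` and `bold_spans` are hypothetical names, and the example assumes that `begin` and `end` are character offsets with `end` exclusive, which the table above does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchProps:
    # Hypothetical stand-in mirroring the TextProperties attribute names.
    italic: bool
    bold: bool
    begin: int
    end: int
    fontname: str = ""


def bold_spans(text: str, props: List[SketchProps]) -> List[str]:
    # Assumption: begin/end are character offsets, with end exclusive.
    return [text[p.begin:p.end] for p in props if p.bold]


text = "Diagnosis: stable"
props = [
    SketchProps(italic=False, bold=True, begin=0, end=10),
    SketchProps(italic=False, bold=False, begin=10, end=17),
]
```

Calling `bold_spans(text, props)` on this input returns only the first span, since the second one is not bold.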
Box
Bases: BaseModel
The Box class represents a box annotation in a PDF document. It is the base class
of TextBox.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| doc | The PDF document that this box belongs to. |
| page_num | The page number of the box. |
| x0 | The left x-coordinate of the box. |
| x1 | The right x-coordinate of the box. |
| y0 | The top y-coordinate of the box. |
| y1 | The bottom y-coordinate of the box. |
| label | The label of the box. |
| page | The page object that this box belongs to. |
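Because `y0` is the top edge and `y1` the bottom edge, the y-axis grows downwards, and simple geometry follows directly from the four coordinates. The sketch below uses a hypothetical `GeomBox` stand-in; the `width`, `height`, and `overlaps` helpers are illustrative and are not claimed to exist on the real `Box` class, and the coordinate units are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class GeomBox:
    # Hypothetical stand-in carrying only the four Box coordinates.
    x0: float  # left
    x1: float  # right
    y0: float  # top
    y1: float  # bottom

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        # y0 is the top and y1 the bottom, so y grows downwards.
        return self.y1 - self.y0

    def overlaps(self, other: "GeomBox") -> bool:
        # Two boxes overlap when their extents intersect on both axes.
        return (
            self.x0 < other.x1 and other.x0 < self.x1
            and self.y0 < other.y1 and other.y0 < self.y1
        )


a = GeomBox(x0=0.1, x1=0.5, y0=0.2, y1=0.4)
b = GeomBox(x0=0.4, x1=0.9, y0=0.3, y1=0.6)
c = GeomBox(x0=0.6, x1=0.9, y0=0.0, y1=0.1)
```

In this setup `a` and `b` intersect on both axes, while `c` lies entirely to the right of and above `a`.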
Text
Bases: BaseModel
The Text class represents a text object that is not bound to any box.
It can be used, for example, to store text aggregated from multiple boxes.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| text | The text content. |
| properties | The style properties of the text. |
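The aggregation use case mentioned above can be sketched as joining the text of all boxes that share a label. `LabelledTextBox` and `aggregate_by_label` are hypothetical names, and the newline separator is an assumption; the library's own aggregators may order and join text differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LabelledTextBox:
    # Hypothetical stand-in for a labelled text box.
    text: str
    label: str


def aggregate_by_label(boxes: List[LabelledTextBox]) -> Dict[str, str]:
    """Join the text of boxes sharing a label, in input order."""
    parts: Dict[str, List[str]] = {}
    for box in boxes:
        parts.setdefault(box.label, []).append(box.text)
    # Assumption: a newline separator; a real aggregator may differ.
    return {label: "\n".join(texts) for label, texts in parts.items()}


boxes = [
    LabelledTextBox("Patient report", "header"),
    LabelledTextBox("First paragraph.", "body"),
    LabelledTextBox("Second paragraph.", "body"),
]
texts = aggregate_by_label(boxes)
```

The resulting strings are no longer tied to a single box, which is the situation a box-independent text object is meant to cover.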
TextBox
Bases: Box
The TextBox class represents a text box annotation in a PDF document.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| text | The text content of the text box. |
| props | The style properties of the text box. |