Skip to content

Alternatives & Comparison

EDS-PDF was developed to propose a more modular and extendable approach to PDF extraction than PDFBox, the legacy implementation at APHP's clinical data warehouse.

EDS-PDF takes inspiration from Explosion's spaCy pipelining system and closely follows its API. Therefore, the core object within EDS-PDF is the Pipeline, which organises the processing of PDF documents into multiple components. However, unlike spaCy, the library is built around a single deep learning framework, pytorch, which makes model development easier.