Skip to content

Changelog

v0.5.2 - 2022-08-30

Changed

  • Fix aggregation for empty documents

v0.5.1 - 2022-07-26

Changed

  • Drop the pdf2image dependency, replacing it with pypdfium2 (easier installation)

v0.5.0 - 2022-07-25

Changed

  • Major refactoring of the library. Moved from concepts (aggregation) to plural names (aggregators).

v0.4.3 - 2022-07-20

Fixed

  • Multi page boxes alignment

v0.4.2 - 2022-07-06

Added

  • package-resource.v1 in the misc registry

v0.4.1 - 2022-06-14

Fixed

  • Remove importlib.metadata dependency, which led to issues with Python 3.7

v0.4.0 - 2022-06-14

Added

  • Python 3.7 support, by relaxing dependency constraints
  • Support for package-resource pipeline for sklearn-pipeline.v1

v0.3.2 - 2022-06-03

Added

  • compare_results in visualisation

v0.3.1 - 2022-06-02

Fixed

  • Rescale transform now keeps origin on top-left corner

v0.3.0 - 2022-06-01

Added

  • Styles management within the extractor
  • styled.v1 aggregator, to handle styles
  • rescale.v1 transform, to go back to the original height and width

Changed

  • Styles and text extraction is handled by the extractor directly
  • The PDFMiner line object is not carried around any more

Removed

  • Outdated params entry in the EDS-PDF registry.

v0.2.2 - 2022-05-12

Changed

  • Fixed merge_lines bug when lines were empty
  • Modified the demo consequently

v0.2.1 - 2022-05-09

Changed

  • The extractor always returns a pandas DataFrame, be it empty. It enhances robustness and stability.

v0.2.0 - 2022-05-09

Added

  • aggregation submodule to handle the specifics of aggregating text blocs
  • Base classes for better-defined modules
  • Uniformise the columns to labels
  • Add arbitrary contextual information

Removed

  • typer legacy dependency
  • models submodule, which handled the configurations for Spark distribution (deferred to another package)
  • specific orbis context, which was APHP-specific

v0.1.0 - 2022-05-06

Inception ! 🎉

Features

  • spaCy-like configuration system
  • Available classifiers :
  • dummy.v1, that classifies everything to body
  • mask.v1, for simple rule-based classification
  • sklearn.v1, that uses a Scikit-Learn pipeline
  • random.v1, to better sow chaos
  • Merge different blocs together for easier visualisation
  • Streamlit demo with visualisation