Changelog
v0.5.3 - 2022-08-31
Added
- Add label mapping parameter to aggregators (to merge different types of blocks such as title and body)
- Improved line aggregation formula
v0.5.2 - 2022-08-30
Fixed
- Fix aggregation for empty documents
v0.5.1 - 2022-07-26
Changed
- Drop the pdf2image dependency, replacing it with pypdfium2 (easier installation)
v0.5.0 - 2022-07-25
Changed
- Major refactoring of the library. Moved from concepts (aggregation) to plural names (aggregators).
v0.4.3 - 2022-07-20
Fixed
- Multi-page boxes alignment
v0.4.2 - 2022-07-06
Added
- package-resource.v1 in the misc registry
v0.4.1 - 2022-06-14
Fixed
- Remove importlib.metadata dependency, which led to issues with Python 3.7
v0.4.0 - 2022-06-14
Added
- Python 3.7 support, by relaxing dependency constraints
- Support for package-resource pipeline for sklearn-pipeline.v1
v0.3.2 - 2022-06-03
Added
- compare_results in visualisation
v0.3.1 - 2022-06-02
Fixed
- Rescale transform now keeps origin on top-left corner
v0.3.0 - 2022-06-01
Added
- Styles management within the extractor
- styled.v1 aggregator, to handle styles
- rescale.v1 transform, to go back to the original height and width
Changed
- Styles and text extraction are handled by the extractor directly
- The PDFMiner line object is not carried around any more
Removed
- Outdated params entry in the EDS-PDF registry.
v0.2.2 - 2022-05-12
Changed
- Fixed merge_lines bug when lines were empty
- Modified the demo accordingly
v0.2.1 - 2022-05-09
Changed
- The extractor always returns a pandas DataFrame, even if it is empty. This improves robustness and stability.
v0.2.0 - 2022-05-09
Added
- aggregation submodule to handle the specifics of aggregating text blocks
- Base classes for better-defined modules
- Uniformise the columns to labels
- Add arbitrary contextual information
Removed
- typer legacy dependency
- models submodule, which handled the configurations for Spark distribution (deferred to another package)
- Specific orbis context, which was APHP-specific
v0.1.0 - 2022-05-06
Inception!
Features
- spaCy-like configuration system
- Available classifiers:
  - dummy.v1, which classifies everything as body
  - mask.v1, for simple rule-based classification
  - sklearn.v1, which uses a Scikit-Learn pipeline
  - random.v1, to better sow chaos
- Merge different blocks together for easier visualisation
- Streamlit demo with visualisation