edspdf.utils.alignment

align_box_labels(src_boxes, dst_boxes, threshold=0.0001, group_by_source=False, pollution_label=None)

Align lines with possibly overlapping (and non-exhaustive) labels.

Possible matches are sorted by covered area. Lines with no overlap at all

PARAMETER DESCRIPTION
src_boxes

The labelled boxes that will be used to determine the label of the dst_boxes

TYPE: Sequence[Box]

dst_boxes

The non-labelled boxes that will be assigned a label

TYPE: Sequence[Box]

group_by_source

Whether to perform majority voting between different sources of annotations, if any

TYPE: bool DEFAULT: False

threshold

Threshold to use for discounting a label. Used if the labels DataFrame does not provide a threshold column, or to fill NaN values thereof.

TYPE: float DEFAULT: 0.0001

RETURNS DESCRIPTION
List[Box]

A copy of the boxes, with the labels mapped from the source boxes
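
To make the idea of overlap-based label mapping concrete, here is a minimal, self-contained sketch. It is not the edspdf implementation: `SimpleBox`, `overlap_area` and `assign_labels` are hypothetical names, and the scoring rule (pick the source box covering the largest share of the destination box, fall back to `pollution_label` when no overlap reaches `threshold`) is an assumption made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Optional, Sequence


@dataclass(frozen=True)
class SimpleBox:
    # Hypothetical stand-in for a box: corner coordinates and an optional label.
    x0: float
    y0: float
    x1: float
    y1: float
    label: Optional[Any] = None


def overlap_area(a: SimpleBox, b: SimpleBox) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    dx = min(a.x1, b.x1) - max(a.x0, b.x0)
    dy = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(dx, 0.0) * max(dy, 0.0)


def assign_labels(
    src_boxes: Sequence[SimpleBox],
    dst_boxes: Sequence[SimpleBox],
    threshold: float = 0.0001,
    pollution_label: Any = None,
) -> List[SimpleBox]:
    """Return labelled copies of dst_boxes (assumed rule, see lead-in)."""
    labelled = []
    for dst in dst_boxes:
        # Guard against zero-area boxes to avoid dividing by zero.
        dst_area = (dst.x1 - dst.x0) * (dst.y1 - dst.y0) or 1.0
        best_label, best_cover = pollution_label, 0.0
        for src in src_boxes:
            # Fraction of the destination box covered by this source box.
            cover = overlap_area(src, dst) / dst_area
            if cover >= threshold and cover > best_cover:
                best_label, best_cover = src.label, cover
        # Copy the box instead of mutating it, mirroring "a copy of the boxes".
        labelled.append(replace(dst, label=best_label))
    return labelled


src = [SimpleBox(0, 0, 1, 0.5, "header"), SimpleBox(0, 0.5, 1, 1, "body")]
dst = [
    SimpleBox(0.1, 0.1, 0.9, 0.2),  # entirely inside the "header" zone
    SimpleBox(0.1, 0.4, 0.9, 0.8),  # straddles both zones, mostly "body"
    SimpleBox(2, 2, 3, 3),          # overlaps nothing
]
result = assign_labels(src, dst, pollution_label="pollution")
print([b.label for b in result])
```

In this sketch the second destination box overlaps both source boxes, and the one covering the larger share of its area wins. The third box overlaps nothing and receives the fallback label.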

Source code in edspdf/utils/alignment.py
def align_box_labels(
    src_boxes: Sequence[Box],
    dst_boxes: Sequence[Box],
    threshold: float = 0.0001,
    group_by_source: bool = False,
    pollution_label: Any = None,
) -> Sequence[Box]:
    """
    Align lines with possibly overlapping (and non-exhaustive) labels.

    Possible matches are sorted by covered area. Lines with no overlap at all

    Parameters
    ----------
    src_boxes: Sequence[Box]
        The labelled boxes that will be used to determine the label of the dst_boxes
    dst_boxes: Sequence[Box]
        The non-labelled boxes that will be assigned a label
    group_by_source: bool
        Whether to perform majority voting between different sources of
        annotations if any
    threshold: float, default 0.0001
        Threshold to use for discounting a label. Used if the `labels` DataFrame
        does not provide a `threshold` column, or to fill `NaN` values thereof.

    Returns
    -------
    List[Box]
        A copy of the boxes, with the labels mapped from the source boxes
    """

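    # Align page by page, in sorted page order. A box whose page is None is
    # included in the comparison for every page.
    return [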
        b
        for page in sorted(set((b.page for b in dst_boxes)))
        for b in _align_box_labels_on_page(
            src_boxes=[
                b for b in src_boxes if page is None or b.page is None or b.page == page
            ],
            dst_boxes=[
                b for b in dst_boxes if page is None or b.page is None or b.page == page
            ],
            threshold=threshold,
            group_by_source=group_by_source,
            pollution_label=pollution_label,
        )
    ]
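
The page-by-page grouping in the listing above can be tried in isolation. The sketch below reuses the same filter condition on a hypothetical `PagedBox` tuple (not an edspdf type) and only shows which boxes end up being compared together. It does not perform any alignment.

```python
from typing import List, NamedTuple, Optional, Sequence


class PagedBox(NamedTuple):
    # Hypothetical stand-in for a box: only a name and an optional page number.
    name: str
    page: Optional[int]


def on_page(boxes: Sequence[PagedBox], page: Optional[int]) -> List[PagedBox]:
    """Same condition as in the listing: page-less boxes match every page."""
    return [b for b in boxes if page is None or b.page is None or b.page == page]


src = [
    PagedBox("header-zone", None),  # no page: compared on every page
    PagedBox("body-zone-p0", 0),
    PagedBox("body-zone-p1", 1),
]
dst = [PagedBox("line-a", 1), PagedBox("line-b", 0), PagedBox("line-c", 1)]

# One (source, destination) pair of lists per distinct destination page.
groups = {
    page: (on_page(src, page), on_page(dst, page))
    for page in sorted({b.page for b in dst})
}

for page, (page_src, page_dst) in groups.items():
    print(page, [b.name for b in page_src], [b.name for b in page_dst])
```

Because pages are visited in sorted order, the output follows page order rather than the input order of the destination boxes. Note that in Python 3, `sorted` raises a `TypeError` if the page values mix `None` and integers, so this sketch keeps all destination pages as integers.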