
BoxLayoutEmbedding

This component encodes the geometric features of a box, as extracted by the BoxLayoutPreprocessor module, into an embedding. Each position mode can be set to:

  • "sin" to embed positions with a fixed SinusoidalEmbedding
  • "learned" to embed positions using a standard learned PyTorch embedding layer

Each produced embedding is the concatenation of the embeddings of the box width, height, and the top, left, bottom and right coordinates, each embedded according to the corresponding *_mode parameter.
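The concatenation described above can be illustrated with a minimal, dependency-free sketch. This is not the component's actual implementation: the helper names (`sinusoidal`, `make_embedder`, `embed_box`) are hypothetical, the plain Python table stands in for a trainable PyTorch embedding layer, and the sketch assumes that each of the six features receives `size // 6` dimensions and that positions are already integer indexes below `n_positions`.

```python
import math
import random
from typing import Callable, Dict, List


def sinusoidal(position: int, dim: int) -> List[float]:
    """Fixed sin/cos encoding of an integer position (dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    out: List[float] = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        out.append(math.sin(position * freq))
        out.append(math.cos(position * freq))
    return out


def make_embedder(
    mode: str, n_positions: int, dim: int, seed: int = 0
) -> Callable[[int], List[float]]:
    """Return a function mapping a position index to a vector of length dim."""
    if mode == "sin":
        return lambda pos: sinusoidal(pos, dim)
    if mode == "learned":
        # Stand-in for a trainable lookup table: one random row per position.
        rng = random.Random(seed)
        table = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_positions)
        ]
        return lambda pos: list(table[pos])
    raise ValueError(f"unknown mode: {mode!r}")


def embed_box(
    box: Dict[str, int],
    size: int,
    n_positions: int,
    x_mode: str = "sin",
    y_mode: str = "sin",
    w_mode: str = "sin",
    h_mode: str = "sin",
) -> List[float]:
    """Concatenate the embeddings of width, height, top, left, bottom, right."""
    if size % 6 != 0:
        raise ValueError("this sketch assumes size is divisible by 6")
    dim = size // 6  # assumption: equal share of the output per feature
    embed_x = make_embedder(x_mode, n_positions, dim)
    embed_y = make_embedder(y_mode, n_positions, dim)
    embed_w = make_embedder(w_mode, n_positions, dim)
    embed_h = make_embedder(h_mode, n_positions, dim)
    return (
        embed_w(box["width"])
        + embed_h(box["height"])
        + embed_y(box["top"])
        + embed_x(box["left"])
        + embed_y(box["bottom"])
        + embed_x(box["right"])
    )


box = {"width": 3, "height": 5, "top": 10, "left": 2, "bottom": 15, "right": 5}
vector = embed_box(box, size=72, n_positions=64, x_mode="learned")
print(len(vector))
```

In this sketch the left and right coordinates share the x embedder and the top and bottom coordinates share the y embedder, which is one plausible reading of the four *_mode parameters.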

Parameters

PARAMETER DESCRIPTION
size

Size of the output box embedding

TYPE: int

n_positions

Number of position embeddings stored in the PositionEmbedding module

TYPE: int
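As a rough illustration of how n_positions bounds the usable positions, the sketch below maps a coordinate to an integer index. It assumes coordinates are normalized to the range [0, 1], which this page does not state, and the helper `to_position_index` is hypothetical rather than part of the library.

```python
def to_position_index(value: float, n_positions: int) -> int:
    """Map a normalized coordinate to an index in [0, n_positions - 1]."""
    if n_positions < 1:
        raise ValueError("n_positions must be positive")
    index = int(value * n_positions)
    # Clamp so that out-of-range values still address a valid embedding row.
    return max(0, min(index, n_positions - 1))


print(to_position_index(0.5, 100))
```

A larger n_positions gives a finer grid of distinguishable positions, at the cost of a larger table when a "learned" mode is used.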

x_mode

Position embedding mode of the x coordinates

TYPE: Literal['sin', 'learned'] DEFAULT: 'sin'

y_mode

Position embedding mode of the y coordinates

TYPE: Literal['sin', 'learned'] DEFAULT: 'sin'

w_mode

Position embedding mode of the width features

TYPE: Literal['sin', 'learned'] DEFAULT: 'sin'

h_mode

Position embedding mode of the height features

TYPE: Literal['sin', 'learned'] DEFAULT: 'sin'