BoxTransformerModule

Box Transformer architecture combining multiple BoxTransformerLayer modules. It is mainly used in BoxTransformer.

Parameters

PARAMETER DESCRIPTION
input_size

Input embedding size

TYPE: Optional[int] DEFAULT: None

num_heads

Number of attention heads in the attention layers

TYPE: int DEFAULT: 2

n_relative_positions

Maximum range of embeddable relative positions between boxes (further distances are capped to ±n_relative_positions // 2)

TYPE: Optional[int] DEFAULT: None

dropout_p

Dropout probability for both the attention layers and the embedding projections

TYPE: float DEFAULT: 0.0

head_size

Head size of the attention layers

TYPE: Optional[int] DEFAULT: None

activation

Activation function used in the linear->activation->linear transformations

TYPE: ActivationFunction DEFAULT: 'gelu'

init_resweight

Initial weight of the residual gates. At 0, the layer acts (initially) as an identity function, and at 1 as a standard Transformer layer. Initializing with a value close to 0 can help the training converge.

TYPE: float DEFAULT: 0.0

attention_mode

Modes of the relative-position-infused attention layers. See the relative attention documentation for more information.

TYPE: Sequence[Literal['c2c', 'c2p', 'p2c']] DEFAULT: ('c2c', 'c2p', 'p2c')

n_layers

Number of layers in the Transformer

TYPE: int DEFAULT: 2

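A minimal construction sketch using the parameters documented above. The import path is an assumption (it may differ between versions of the library), and the values are purely illustrative:

```python
# Assumed import path; adjust to your installed version of edspdf.
from edspdf.layers.box_transformer import BoxTransformerModule

module = BoxTransformerModule(
    input_size=72,            # size of the incoming box embeddings
    num_heads=2,              # attention heads per layer
    n_relative_positions=64,  # relative distances beyond ±32 are capped
    dropout_p=0.1,            # dropout on attention and embedding projections
    head_size=16,             # per-head dimension
    activation="gelu",        # used in the linear->activation->linear blocks
    init_resweight=0.0,       # start close to identity to ease convergence
    attention_mode=("c2c", "c2p", "p2c"),
    n_layers=2,
)
```
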
forward

Forward pass of the BoxTransformer

PARAMETER DESCRIPTION
embeds

Embeddings to contextualize. Shape: n_samples * n_keys * input_size

TYPE: FoldedTensor

boxes

Layout features of the input elements

TYPE: Dict

RETURNS DESCRIPTION
Tuple[FloatTensor, List[FloatTensor]]
  • Output of the last BoxTransformerLayer. Shape: n_samples * n_keys * input_size
  • Attention logits of all layers. Shape: n_samples * n_queries * n_keys * n_heads
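
A minimal usage sketch of the forward pass, assuming `module` was built as in the example above. The helper name `contextualize` is hypothetical, the `foldedtensor` import is an assumption, and `embeds` / `boxes` are expected to come from an upstream box-embedding component rather than being built here:

```python
from typing import Dict, List, Tuple

import torch
from foldedtensor import FoldedTensor  # assumed import for the FoldedTensor type


def contextualize(
    module: "BoxTransformerModule",
    embeds: FoldedTensor,
    boxes: Dict,
) -> Tuple[torch.FloatTensor, List[torch.FloatTensor]]:
    # embeds: box embeddings of shape n_samples * n_keys * input_size
    # boxes: layout features consumed by the relative attention layers
    output, attention_logits = module(embeds, boxes)
    # output: contextualized embeddings from the last BoxTransformerLayer
    # attention_logits: one tensor per layer, each of shape
    #   n_samples * n_queries * n_keys * n_heads
    return output, attention_logits
```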