BoxTransformerModule

Box Transformer architecture combining multiple BoxTransformerLayer modules. It is mainly used in BoxTransformer.

Parameters

PARAMETER DESCRIPTION
input_size

Input embedding size

TYPE: Optional[int] DEFAULT: None

num_heads

Number of attention heads in the attention layers

TYPE: int DEFAULT: 2

n_relative_positions

Maximum range of embeddable relative positions between boxes (further distances are capped to ±n_relative_positions // 2)

TYPE: Optional[int] DEFAULT: None

dropout_p

Dropout probability for both the attention layers and the embedding projections

TYPE: float DEFAULT: 0.0

head_size

Head size of the attention layers

TYPE: Optional[int] DEFAULT: None

activation

Activation function used in the linear->activation->linear transformations

TYPE: ActivationFunction DEFAULT: 'gelu'

init_resweight

Initial weight of the residual gates. At 0, the layer acts (initially) as an identity function, and at 1 as a standard Transformer layer. Initializing with a value close to 0 can help the training converge.

TYPE: float DEFAULT: 0.0

attention_mode

Modes of the relative-position-infused attention layers. See the relative attention documentation for more information.

TYPE: Sequence[Literal['c2c', 'c2p', 'p2c']] DEFAULT: ('c2c', 'c2p', 'p2c')

n_layers

Number of layers in the Transformer

TYPE: int DEFAULT: 2

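A minimal construction sketch using the parameters documented above. The import path is an assumption (it may differ between versions of the library), and the values are purely illustrative:

```python
# Assumed import path; adjust to your installed version of edspdf.
from edspdf.layers.box_transformer import BoxTransformerModule

module = BoxTransformerModule(
    input_size=72,            # size of the incoming box embeddings
    num_heads=2,              # attention heads per layer
    n_relative_positions=64,  # relative distances beyond ±32 are capped
    dropout_p=0.1,            # dropout on attention and embedding projections
    head_size=16,             # per-head dimension
    activation="gelu",        # used in the linear->activation->linear blocks
    init_resweight=0.0,       # start close to identity to ease convergence
    attention_mode=("c2c", "c2p", "p2c"),
    n_layers=2,
)
```
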
forward

Forward pass of the BoxTransformer

PARAMETER DESCRIPTION
embeds

Embeddings to contextualize. Shape: n_samples * n_keys * input_size

TYPE: FoldedTensor

boxes

Layout features of the input elements

TYPE: Dict

RETURNS DESCRIPTION
Tuple[FloatTensor, List[FloatTensor]]
  • Output of the last BoxTransformerLayer. Shape: n_samples * n_keys * input_size
  • Attention logits of all layers. Shape: n_samples * n_queries * n_keys * n_heads
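
A minimal usage sketch of the forward pass, assuming `module` was built as in the example above. The helper name `contextualize` is hypothetical, the `foldedtensor` import is an assumption, and `embeds` / `boxes` are expected to come from an upstream box-embedding component rather than being built here:

```python
from typing import Dict, List, Tuple

import torch
from foldedtensor import FoldedTensor  # assumed import for the FoldedTensor type


def contextualize(
    module: "BoxTransformerModule",
    embeds: FoldedTensor,
    boxes: Dict,
) -> Tuple[torch.FloatTensor, List[torch.FloatTensor]]:
    # embeds: box embeddings of shape n_samples * n_keys * input_size
    # boxes: layout features consumed by the relative attention layers
    output, attention_logits = module(embeds, boxes)
    # output: contextualized embeddings from the last BoxTransformerLayer
    # attention_logits: one tensor per layer, each of shape
    #   n_samples * n_queries * n_keys * n_heads
    return output, attention_logits
```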