BoxTransformerLayer
BoxTransformerLayer combines a self-attention layer with a linear->activation->linear transformation. This layer is used in the BoxTransformerModule module.
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| input_size | Input embedding size |
| num_heads | Number of attention heads in the attention layer |
| dropout_p | Dropout probability, used both in the attention layer and in the embedding projections |
| head_size | Head size of the attention layer |
| activation | Activation function used in the linear->activation->linear transformation |
| init_resweight | Initial weight of the residual gates. At 0, the layer (initially) acts as an identity function, and at 1 as a standard Transformer layer. Initializing with a value close to 0 can help the training converge (see the sketch below). |
| attention_mode | Mode of the relative-position-infused attention layer. See the relative attention documentation for more information. |
| position_embedding | Position embedding to use as the key/query position embedding in the attention computation. |
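To make the role of init_resweight concrete, here is a minimal sketch of a gated residual connection, assuming a single learnable scalar gate per sub-layer (an illustrative assumption, not the library's exact implementation):

```python
import torch


class GatedResidual(torch.nn.Module):
    """Minimal sketch of a residual gate (illustrative assumption)."""

    def __init__(self, init_resweight: float = 0.0):
        super().__init__()
        # Learnable scalar gate; starting near 0 keeps the layer close to
        # the identity early in training, which can ease convergence.
        self.resweight = torch.nn.Parameter(torch.tensor(float(init_resweight)))

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # resweight == 0 -> output == x (identity)
        # resweight == 1 -> output == x + sublayer_out (standard residual)
        return x + self.resweight * sublayer_out
```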
forward
Forward pass of the BoxTransformerLayer
| PARAMETER | DESCRIPTION |
|---|---|
| embeds | Embeddings to contextualize |
| mask | Mask of the embeddings. 0 means padding element. |
| relative_positions | Position of the keys relative to the query elements |
| no_position_mask | Key/query pairs for which the position attention terms should be disabled |
| RETURNS | DESCRIPTION |
|---|---|
| Tuple[FloatTensor, FloatTensor] | |
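The snippet below is a self-contained sketch of the kind of computation this forward pass performs (masked self-attention followed by a linear->activation->linear block, each behind a gated residual), not the library's actual code: the relative-position terms are ignored, and the tensor shapes, the helper name sketch_forward, and the interpretation of the two returned tensors are assumptions made for illustration.

```python
import torch


def sketch_forward(embeds, mask, attn, ffn, resweight):
    # Self-attention sub-layer: padded elements (mask == 0) are ignored as keys.
    attn_out, attn_weights = attn(
        embeds, embeds, embeds, key_padding_mask=~mask
    )
    embeds = embeds + resweight * attn_out

    # Position-wise linear -> activation -> linear sub-layer.
    embeds = embeds + resweight * ffn(embeds)

    # Two tensors, mirroring the documented Tuple[FloatTensor, FloatTensor]
    # return type (what each tensor contains exactly is an assumption here).
    return embeds, attn_weights


input_size, num_heads = 64, 2
attn = torch.nn.MultiheadAttention(input_size, num_heads, batch_first=True)
ffn = torch.nn.Sequential(
    torch.nn.Linear(input_size, input_size),
    torch.nn.GELU(),
    torch.nn.Linear(input_size, input_size),
)
resweight = torch.nn.Parameter(torch.tensor(0.0))  # init_resweight = 0 -> identity at start

# Assumed toy shapes: embeds (n_samples, n_boxes, input_size), mask (n_samples, n_boxes).
embeds = torch.randn(2, 5, input_size)
mask = torch.ones(2, 5, dtype=torch.bool)
contextualized, weights = sketch_forward(embeds, mask, attn, ffn, resweight)
```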