RelativeAttention
A self/cross-attention layer that takes the relative positions of elements into account when computing the attention weights. In a relative attention layer, keys and queries are represented using both content and position embeddings, where the position embeddings are retrieved from the relative positions of the keys with respect to the queries.
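To make this concrete, here is a minimal, illustrative sketch (not the layer's actual implementation) of how attention logits can decompose into content-to-content, content-to-position, and position-to-content terms for a single head, assuming the position embeddings have already been gathered from the relative key/query positions:

```python
import torch

# Illustrative sketch of disentangled attention logits (single head).
# All tensors are random stand-ins; this is not the layer's implementation.
n_queries, n_keys, dim = 4, 4, 16
content_q = torch.randn(n_queries, dim)       # content query embeddings
content_k = torch.randn(n_keys, dim)          # content key embeddings
pos_q = torch.randn(n_queries, n_keys, dim)   # position embeddings gathered from
pos_k = torch.randn(n_queries, n_keys, dim)   # the relative key/query positions

c2c = content_q @ content_k.T                        # content-to-content term
c2p = torch.einsum("qd,qkd->qk", content_q, pos_k)   # content-to-position term
p2c = torch.einsum("kd,qkd->qk", content_k, pos_q)   # position-to-content term

attn_logits = (c2c + c2p + p2c) / dim ** 0.5
attn_weights = attn_logits.softmax(-1)               # attention weights per query
```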
Parameters
| PARAMETER | DESCRIPTION |
|---|---|
| `size` | The size of the output embeddings. Also serves as the default if `query_size`, `pos_size`, or `key_size` is `None`. |
| `n_heads` | The number of attention heads. |
| `query_size` | The size of the query embeddings. |
| `key_size` | The size of the key embeddings. |
| `value_size` | The size of the value embeddings. |
| `head_size` | The size of each query / key / value chunk used in the attention dot product. |
| `position_embedding` | The position embedding used as key and query embeddings. |
| `dropout_p` | Dropout probability applied on the attention weights. Default: `0.1` |
| `same_key_query_proj` | Whether to use the same projection operator for content keys and queries when computing the pre-attention key and query embedding chunks. Default: `False` |
| `same_positional_key_query_proj` | Whether to use the same projection operator for positional keys and queries when computing the pre-attention key and query embedding chunks. Default: `False` |
| `n_coordinates` | The number of positional coordinates. For instance, text is 1D so 1 coordinate, images are 2D so 2 coordinates, etc. Default: `1` |
| `head_bias` | Whether to learn a bias term to add to the attention logits. This is only useful if you plan to use the attention logits for subsequent operations, since attention weights are unaffected by bias terms. |
| `do_pooling` | Whether to compute the output embedding. If you only plan to use the attention logits, you should disable this parameter. Default: `True` |
| `mode` | Which attention terms to compute: content-to-content (c2c), content-to-position (c2p), and/or position-to-content (p2c). |
| `n_additional_heads` | The number of additional head logits to compute. These are not used to compute output embeddings, but may be useful in subsequent operations. Default: `0` |
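As a usage illustration, here is a hedged instantiation sketch based only on the parameters documented above. The import path and the exact format of `mode` are assumptions and may differ in your installation:

```python
import torch

# Hypothetical import; replace with the actual module path of RelativeAttention
# in the library you are using.
from relative_attention import RelativeAttention

layer = RelativeAttention(
    size=128,                     # output embedding size
    n_heads=4,                    # number of attention heads
    dropout_p=0.1,                # dropout applied on the attention weights
    n_coordinates=2,              # e.g. 2D layouts -> 2 positional coordinates
    mode=("c2c", "c2p", "p2c"),   # assumed format: the attention terms to compute
    do_pooling=True,              # compute the pooled output embedding
)
```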
forward
Forward pass of the RelativeAttention layer.
| PARAMETER | DESCRIPTION |
|---|---|
| `content_queries` | The content query embeddings to use in the attention computation. |
| `content_keys` | The content key embeddings to use in the attention computation. If `None`, defaults to the `content_queries`. |
| `content_values` | The content value embeddings to use in the final pooling computation. If `None`, pooling won't be performed. |
| `mask` | The attention mask to apply in the attention computation. |
| `relative_positions` | The relative positions of keys with respect to queries. If `None`, positional attention terms won't be computed. |
| `no_position_mask` | Key / query pairs for which the positional attention terms should be disabled. |
| `base_attn` | Attention logits to add to the computed attention logits. |
| RETURNS | DESCRIPTION |
|---|---|
| `Union[Tuple[FloatTensor, FloatTensor], FloatTensor]` | The output embeddings and/or the attention logits, depending on the configuration. |
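Continuing the instantiation sketch above, here is a hedged example of a forward call using only the documented arguments. The tensor shapes are assumptions (batched sequences of elements with `size`-dimensional embeddings):

```python
batch, n, size = 2, 10, 128

content = torch.randn(batch, n, size)          # assumed shape: batch * n_elements * size
mask = torch.ones(batch, n, dtype=torch.bool)  # assumed shape: batch * n_elements

# Content-only self-attention: content_keys defaults to content_queries, and
# leaving relative_positions as None skips the positional attention terms.
output = layer(
    content_queries=content,
    content_values=content,
    mask=mask,
)
```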