`edsnlp.core.torch_component`

`TorchComponent`

Bases: BaseComponent, Module, Generic[BatchOutput, BatchInput]

A TorchComponent is a trainable Component that inherits from torch.nn.Module. You can use it either as a torch module inside a more complex neural network, or as a standalone component in a Pipeline.

In addition to the methods of a torch module, a TorchComponent adds a few methods to handle preprocessing and collating features, as well as caching intermediate results for components that share a common subcomponent.

`post_init`

This method completes the attributes of the component by inspecting a sample of documents. It is especially useful for building vocabularies or detecting the labels of a classification task.

Parameters

PARAMETER	DESCRIPTION
`gold_data`	The documents to use for initialization. TYPE: `Iterable[Doc]`
`exclude`	The names of components to exclude from initialization. This argument is updated in place with the names of the components as they are initialized. TYPE: `Set[str]`
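To illustrate the kind of work post_init is meant for, here is a minimal sketch that does not use edsnlp itself. `ToyClassifier`, the `label` key and the dict-based documents are hypothetical stand-ins; only the `gold_data` and `exclude` parameter names come from the description above.

```python
from typing import Dict, Iterable, List, Set


class ToyClassifier:
    """Hypothetical stand-in for a trainable component (not the edsnlp API)."""

    def __init__(self, name: str):
        self.name = name
        self.labels: List[str] = []

    def post_init(self, gold_data: Iterable[Dict[str, str]], exclude: Set[str]) -> None:
        # Skip components that have already been initialized.
        if self.name in exclude:
            return
        # Record our name so a shared subcomponent is initialized only once.
        exclude.add(self.name)
        # Complete the missing attribute (here, the label set) from the data.
        self.labels = sorted({doc["label"] for doc in gold_data})


gold = [{"text": "a", "label": "pos"}, {"text": "b", "label": "neg"}]
exclude: Set[str] = set()
clf = ToyClassifier("clf")
clf.post_init(gold, exclude)
```

After the call, `clf.labels` holds the labels seen in the data and `exclude` contains the component name, so a second call with the same set would be skipped.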

`preprocess`

Preprocess the document to extract features that will be used by the neural network to perform its predictions.

Parameters

PARAMETER	DESCRIPTION
`doc`	Document to preprocess. TYPE: `Doc`

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary (optionally nested) containing the features extracted from the document.

`collate`

Collate the batch of features into a single batch of tensors that can be used by the forward method of the component.

Parameters

PARAMETER	DESCRIPTION
`batch`	Batch of features. TYPE: `Dict[str, Any]`

RETURNS	DESCRIPTION
`BatchInput`	Dictionary (optionally nested) containing the collated tensors
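As a rough sketch of what a collate step typically does, the function below pads variable-length features to a rectangular shape and builds a mask. Plain nested lists stand in for tensors so the snippet runs without torch; the `token_ids` and `mask` keys and the `pad_value` parameter are assumptions made for the example.

```python
from typing import Any, Dict, List


def collate(batch: Dict[str, List[List[int]]], pad_value: int = 0) -> Dict[str, Any]:
    # batch["token_ids"] holds one variable-length list per document.
    sequences = batch["token_ids"]
    max_len = max((len(seq) for seq in sequences), default=0)
    return {
        # Pad every sequence to the same length so the result is rectangular.
        "token_ids": [seq + [pad_value] * (max_len - len(seq)) for seq in sequences],
        # The mask records which positions hold real tokens.
        "mask": [[i < len(seq) for i in range(max_len)] for seq in sequences],
    }


collated = collate({"token_ids": [[5, 6, 7], [8]]})
```

In a real component, the padded lists would usually be converted to tensors before being returned.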

`batch_to_device`

Move the batch of tensors to the specified device.

Parameters

PARAMETER	DESCRIPTION
`batch`	Batch of tensors. TYPE: `BatchInput`
`device`	Device to move the tensors to. TYPE: `Optional[Union[str, device]]`

RETURNS	DESCRIPTION
`BatchInput`
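One way to move an optionally nested batch is to recurse through its containers and call `.to(device)` on every leaf that supports it. The sketch below shows that idea under stated assumptions: it is not the library's implementation, and `FakeTensor` is a hypothetical stand-in for torch.Tensor so that the snippet runs without torch.

```python
from typing import Any


def batch_to_device(batch: Any, device: Any) -> Any:
    # Recurse into containers and move every leaf that exposes a `.to` method.
    if isinstance(batch, dict):
        return {key: batch_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(batch_to_device(value, device) for value in batch)
    if hasattr(batch, "to"):
        return batch.to(device)
    # Non-tensor leaves (ints, strings, ...) are returned unchanged.
    return batch


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor, tracking only its device."""

    def __init__(self, device: str = "cpu"):
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


moved = batch_to_device(
    {"ids": FakeTensor(), "nested": {"mask": FakeTensor()}, "n": 2},
    "cuda:0",
)
```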

`forward`

Perform the forward pass of the neural network.

Parameters

PARAMETER	DESCRIPTION
`batch`	Batch of tensors (nested dictionary) computed by the collate method. TYPE: `BatchInput`

RETURNS	DESCRIPTION
`BatchOutput`

`module_forward`

This is a wrapper around torch.nn.Module.__call__ to avoid conflict with the Component.__call__ method.
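The conflict arises because both base classes define `__call__` with different meanings. The toy classes below are hypothetical stand-ins (not edsnlp or torch classes) that reproduce the situation and show how an explicit wrapper keeps the module-side call reachable.

```python
class Component:
    """Hypothetical pipeline-side base: calling it processes a document."""

    def __call__(self, doc):
        return self.process(doc)


class Module:
    """Hypothetical stand-in for torch.nn.Module: calling it runs forward."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class ToyTorchComponent(Component, Module):
    def forward(self, batch):
        return [x * 2 for x in batch]

    def process(self, doc):
        doc["pred"] = self.module_forward([doc["value"]])[0]
        return doc

    def module_forward(self, *args, **kwargs):
        # Explicitly reach the Module-side __call__, which the method
        # resolution order would otherwise hide behind Component.__call__.
        return Module.__call__(self, *args, **kwargs)


comp = ToyTorchComponent()
doc = comp({"value": 21})  # resolves to Component.__call__
```

With torch, going through the module's `__call__` rather than calling `forward` directly matters because registered hooks only run on that path.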

`make_batch`

Convenience method to preprocess a batch of documents and collate them. Features corresponding to the same path are grouped together in a list, under the same key.

Parameters

PARAMETER	DESCRIPTION
`docs`	Batch of documents. TYPE: `Sequence[Doc]`
`supervision`	Whether to extract supervision features or not. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`Dict[str, Sequence[Any]]`
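The grouping described above amounts to turning a list of per-document feature dictionaries into a dictionary of lists. The helper below is a hypothetical illustration of that transformation, not the library's code; it assumes every document yields the same keys.

```python
from typing import Any, Dict, Sequence


def group_features(per_doc: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Turn a list of (possibly nested) feature dicts into a dict of lists."""
    if not per_doc:
        return {}
    grouped: Dict[str, Any] = {}
    for key in per_doc[0]:
        values = [features[key] for features in per_doc]
        if isinstance(values[0], dict):
            # Nested features: group each sub-path separately.
            grouped[key] = group_features(values)
        else:
            grouped[key] = values
    return grouped


features = [
    {"token_ids": [1, 2], "meta": {"length": 2}},
    {"token_ids": [3], "meta": {"length": 1}},
]
batch = group_features(features)
```

Each path, such as `meta` then `length`, ends up with one list whose i-th entry belongs to the i-th document.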

`batch_process`

Process a batch of documents using the neural network. This differs from the pipe method in that it does not return an iterator, but executes the component on the whole batch at once.

Parameters

PARAMETER	DESCRIPTION
`docs`	Batch of documents. TYPE: `Sequence[Doc]`

RETURNS	DESCRIPTION
`Sequence[Doc]`	Batch of updated documents
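The sketch below shows how the methods documented on this page could chain together when a whole batch is processed at once. The ordering (preprocess, group, collate, forward, postprocess) is an assumption inferred from the method descriptions, and `ToyLengthTagger` with its dict-based documents is hypothetical.

```python
from typing import Any, Dict, List, Sequence

Doc = Dict[str, Any]  # hypothetical stand-in for a real Doc object


class ToyLengthTagger:
    def preprocess(self, doc: Doc) -> Dict[str, Any]:
        return {"n_chars": len(doc["text"])}

    def collate(self, batch: Dict[str, List[Any]]) -> Dict[str, Any]:
        # A real component would build tensors here.
        return {"n_chars": list(batch["n_chars"])}

    def forward(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        # Stand-in for the network: flag texts longer than 5 characters.
        return {"is_long": [n > 5 for n in batch["n_chars"]]}

    def postprocess(self, docs: Sequence[Doc], results: Dict[str, Any]) -> Sequence[Doc]:
        # Write each prediction back onto its document.
        for doc, is_long in zip(docs, results["is_long"]):
            doc["is_long"] = is_long
        return docs

    def batch_process(self, docs: Sequence[Doc]) -> Sequence[Doc]:
        if not docs:
            return docs
        features = [self.preprocess(doc) for doc in docs]
        grouped = {key: [f[key] for f in features] for key in features[0]}
        inputs = self.collate(grouped)
        return self.postprocess(docs, self.forward(inputs))


docs = ToyLengthTagger().batch_process([{"text": "short"}, {"text": "a longer text"}])
```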

`postprocess`

Update the documents with the predictions of the neural network. By default, this is a no-op.

Parameters

PARAMETER	DESCRIPTION
`docs`	Batch of documents. TYPE: `Sequence[Doc]`
`batch`	Batch of predictions, as returned by the forward method. TYPE: `BatchOutput`

RETURNS	DESCRIPTION
`Sequence[Doc]`

`preprocess_supervised`

Preprocess the document to extract features that will be used by the neural network to perform its training. By default, this returns the same features as the preprocess method.

Parameters

PARAMETER	DESCRIPTION
`doc`	Document to preprocess. TYPE: `Doc`

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary (optionally nested) containing the features extracted from the document.
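A component that needs gold targets during training might override this method to extend the inference features. The class below is a hypothetical sketch of that pattern; the `n_tokens`, `target` and `label` names are invented for the example.

```python
from typing import Any, Dict


class ToyComponent:
    labels = ["neg", "pos"]

    def preprocess(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # Features needed at inference time.
        return {"n_tokens": len(doc["text"].split())}

    def preprocess_supervised(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # Reuse the inference features and add the gold target as an index.
        return {
            **self.preprocess(doc),
            "target": self.labels.index(doc["label"]),
        }


supervised = ToyComponent().preprocess_supervised({"text": "a b c", "label": "pos"})
```

Keeping the supervised features a superset of the inference features means the same collate and forward code can serve both training and prediction.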

`pipe`

Applies the component on a collection of documents. It is recommended to use the Pipeline.pipe method instead of this one to apply a pipeline on a collection of documents, to benefit from the caching of intermediate results.

Parameters

PARAMETER	DESCRIPTION
`docs`	Input docs. TYPE: `Iterable[Doc]`
`batch_size`	Batch size to use when making batches to be processed at once. DEFAULT: `1`
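The relationship between an iterator-style pipe and batch-at-once processing can be sketched with a small generator. `pipe_in_batches` and the `tag` callback are hypothetical names used for illustration; this is not the library's implementation.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def pipe_in_batches(
    docs: Iterable[T],
    batch_process: Callable[[List[T]], Sequence[T]],
    batch_size: int = 1,
) -> Iterator[T]:
    iterator = iter(docs)
    while True:
        # Pull at most batch_size documents from the input stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        # Process the whole batch at once, then yield documents one by one.
        yield from batch_process(batch)


seen_sizes: List[int] = []


def tag(batch: List[str]) -> List[str]:
    seen_sizes.append(len(batch))
    return [doc.upper() for doc in batch]


out = list(pipe_in_batches(["a", "b", "c"], tag, batch_size=2))
```

Because the generator is lazy, documents are only read from the input as the caller consumes the output.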

`__call__`

Applies the component on a single doc. For multiple documents, prefer batch processing via the batch_process method. In general, prefer the Pipeline methods.

Parameters

PARAMETER	DESCRIPTION
`doc`	TYPE: `Doc`

RETURNS	DESCRIPTION
`Doc`