
edsnlp.core.torch_component

TorchComponent

Bases: BaseComponent, Module, Generic[BatchOutput, BatchInput]

A TorchComponent is a trainable Component that inherits from torch.nn.Module. You can use it either as a torch module inside a more complex neural network, or as a standalone component in a Pipeline.

In addition to the methods of a torch module, a TorchComponent adds a few methods to handle preprocessing and collating features, as well as caching intermediate results for components that share a common subcomponent.

post_init

This method completes the attributes of the component by looking at some documents. It is especially useful to build vocabularies or detect the labels of a classification task.

Parameters

PARAMETER DESCRIPTION
gold_data

The documents to use for initialization.

TYPE: Iterable[Doc]

exclude

The names of components to exclude from initialization. This argument will be gradually updated with the names of initialized components.

TYPE: Set[str]
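
As a rough illustration of the behaviour described above, the sketch below collects a label set from a few documents and records the component's name in exclude. ToyDoc, ToyClassifier and the label attribute are hypothetical stand-ins and not part of edsnlp; a real component would subclass TorchComponent and read from Doc objects.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class ToyDoc:
    # Hypothetical stand-in for a Doc carrying a gold label
    text: str
    label: str


class ToyClassifier:
    # Hypothetical component; this is not the edsnlp implementation
    def __init__(self, name: str):
        self.name = name
        self.labels: List[str] = []

    def post_init(self, gold_data: Iterable[ToyDoc], exclude: Set[str]) -> None:
        # Skip components that were already initialized
        if self.name in exclude:
            return
        # Complete the missing attribute (the label set) from the documents
        self.labels = sorted({doc.label for doc in gold_data})
        # Record that this component is now initialized
        exclude.add(self.name)


clf = ToyClassifier("classifier")
seen: Set[str] = set()
docs = [ToyDoc("a", "pos"), ToyDoc("b", "neg"), ToyDoc("c", "pos")]
clf.post_init(docs, exclude=seen)
```

Calling post_init a second time with the same exclude set leaves the component untouched, which is what allows a shared subcomponent to be initialized only once.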

preprocess

Preprocess the document to extract features that will be used by the neural network to perform its predictions.

Parameters

PARAMETER DESCRIPTION
doc

Document to preprocess

TYPE: Doc

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary (optionally nested) containing the features extracted from the document.

collate

Collate the batch of features into a single batch of tensors that can be used by the forward method of the component.

Parameters

PARAMETER DESCRIPTION
batch

Batch of features

TYPE: Dict[str, Any]

RETURNS DESCRIPTION
BatchInput

Dictionary (optionally nested) containing the collated tensors
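
A minimal sketch of what collating can look like, assuming the batch of features holds variable-length lists of token ids under a hypothetical token_ids key. Plain Python lists stand in for tensors to keep the example dependency-free; a real component would typically build torch tensors from the padded lists.

```python
from typing import Any, Dict, List


def collate(batch: Dict[str, Any], pad_id: int = 0) -> Dict[str, Any]:
    # batch["token_ids"] holds one list of ids per document (hypothetical key)
    sequences: List[List[int]] = batch["token_ids"]
    max_len = max((len(seq) for seq in sequences), default=0)
    # Pad every sequence to the length of the longest one
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # The mask marks real tokens (True) versus padding (False)
    mask = [[i < len(seq) for i in range(max_len)] for seq in sequences]
    return {"token_ids": padded, "mask": mask}


collated = collate({"token_ids": [[5, 6, 7], [8]]})
```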

batch_to_device

Move the batch of tensors to the specified device.

Parameters

PARAMETER DESCRIPTION
batch

Batch of tensors

TYPE: BatchInput

device

Device to move the tensors to

TYPE: Optional[Union[str, device]]

RETURNS DESCRIPTION
BatchInput
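
Moving a nested batch to a device amounts to a recursive walk over its containers. The sketch below is one possible way to write that walk and is not taken from the edsnlp source. It relies on duck typing (anything exposing a to method is moved), and FakeTensor is a hypothetical stand-in for torch.Tensor so that the example runs without torch installed.

```python
from typing import Any


class FakeTensor:
    # Hypothetical stand-in for torch.Tensor, used to keep the sketch dependency-free
    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device

    def to(self, device):
        return FakeTensor(self.data, device)


def batch_to_device(batch: Any, device: str) -> Any:
    # Recurse into containers, move anything exposing `.to`, keep the rest as-is
    if isinstance(batch, dict):
        return {key: batch_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(batch_to_device(value, device) for value in batch)
    if hasattr(batch, "to"):
        return batch.to(device)
    return batch


moved = batch_to_device(
    {"a": FakeTensor([1]), "b": {"c": [FakeTensor([2])], "n": 3}},
    "cuda:0",
)
```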

forward

Perform the forward pass of the neural network.

Parameters

PARAMETER DESCRIPTION
batch

Batch of tensors (nested dictionary) computed by the collate method

TYPE: BatchInput

RETURNS DESCRIPTION
BatchOutput

module_forward

This is a wrapper around torch.nn.Module.__call__ to avoid a conflict with the Component.__call__ method.

make_batch

Convenience method to preprocess a batch of documents and collate them. Features corresponding to the same path are grouped together in a list, under the same key.

Parameters

PARAMETER DESCRIPTION
docs

Batch of documents

TYPE: Sequence[Doc]

supervision

Whether to extract supervision features or not

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Dict[str, Sequence[Any]]
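
The grouping of features by path can be pictured as turning a list of nested dictionaries into one nested dictionary of lists. The helper below is a hypothetical sketch of that idea and not the library's own code. It assumes every document yields the same keys, and the feature names are invented for the example.

```python
from typing import Any, Dict, List


def group_features(samples: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Turn a list of (possibly nested) feature dicts into one dict of lists
    grouped: Dict[str, Any] = {}
    for key in samples[0]:
        values = [sample[key] for sample in samples]
        if isinstance(values[0], dict):
            # Nested dict: group its leaves recursively
            grouped[key] = group_features(values)
        else:
            grouped[key] = values
    return grouped


features = [
    {"token_ids": [5, 6, 7], "embedding": {"length": 3}},
    {"token_ids": [8], "embedding": {"length": 1}},
]
batch = group_features(features)
```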

batch_process

Process a batch of documents using the neural network. This differs from the pipe method in that it does not return an iterator, but executes the component on the whole batch at once.

Parameters

PARAMETER DESCRIPTION
docs

Batch of documents

TYPE: Sequence[Doc]

RETURNS DESCRIPTION
Sequence[Doc]

Batch of updated documents
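
Read together, the method descriptions on this page suggest a chain of preprocess, collate, forward and postprocess. The toy class below sketches that chain on plain dictionaries. The ordering is inferred from the descriptions rather than copied from the implementation, and ToyLengthComponent and its features are hypothetical.

```python
from typing import Any, Dict, List


class ToyLengthComponent:
    # Hypothetical component working on dict "documents"; not the edsnlp implementation

    def preprocess(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        return {"n_words": len(doc["text"].split())}

    def collate(self, batch: Dict[str, List[Any]]) -> Dict[str, Any]:
        # Nothing to pad here: the grouped features are already rectangular
        return batch

    def forward(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        # Stand-in for the neural network: flag documents longer than two words
        return {"is_long": [n > 2 for n in batch["n_words"]]}

    def postprocess(self, docs, results):
        # Write the predictions back onto the documents
        for doc, is_long in zip(docs, results["is_long"]):
            doc["is_long"] = is_long
        return docs

    def batch_process(self, docs):
        features = [self.preprocess(doc) for doc in docs]
        # Group features sharing the same key into lists
        grouped = {key: [f[key] for f in features] for key in features[0]}
        inputs = self.collate(grouped)
        outputs = self.forward(inputs)
        return self.postprocess(docs, outputs)


docs = [{"text": "short note"}, {"text": "a much longer note"}]
updated = ToyLengthComponent().batch_process(docs)
```

The whole batch goes through each step once, which is the difference from pipe noted above: the result is a sequence, not an iterator.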

postprocess

Update the documents with the predictions of the neural network. By default, this is a no-op.

Parameters

PARAMETER DESCRIPTION
docs

Batch of documents

TYPE: Sequence[Doc]

batch

Batch of predictions, as returned by the forward method

TYPE: BatchOutput

RETURNS DESCRIPTION
Sequence[Doc]

preprocess_supervised

Preprocess the document to extract features that will be used by the neural network to perform its training. By default, this returns the same features as the preprocess method.

Parameters

PARAMETER DESCRIPTION
doc

Document to preprocess

TYPE: Doc

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary (optionally nested) containing the features extracted from the document.
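
A component that needs gold annotations during training can extend the inference features with targets. The sketch below shows that pattern on plain dictionaries; ToyTagger and the tags field are hypothetical and only illustrate the relationship between the two methods.

```python
from typing import Any, Dict


class ToyTagger:
    # Hypothetical component working on dict "documents"; not the edsnlp implementation

    def preprocess(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # Features needed for prediction
        return {"tokens": doc["text"].split()}

    def preprocess_supervised(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # Reuse the inference features and add the gold targets for training
        return {**self.preprocess(doc), "targets": doc["tags"]}


sample = {"text": "fever since monday", "tags": ["SYMPTOM", "O", "DATE"]}
features = ToyTagger().preprocess_supervised(sample)
```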

pipe

Applies the component on a collection of documents. It is recommended to use the Pipeline.pipe method instead of this one to apply a pipeline on a collection of documents, to benefit from the caching of intermediate results.

Parameters

PARAMETER DESCRIPTION
docs

Input docs

TYPE: Iterable[Doc]

batch_size

Batch size to use when making batches to be processed at once

DEFAULT: 1
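
The relationship between pipe and batch_size can be sketched as a generator that cuts the input into batches and yields processed documents one at a time. The batchify helper and the batch_process argument below are hypothetical names chosen for the example; this is an assumption about the general pattern, not the library's implementation.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batchify(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    # Lazily cut any iterable into lists of at most `batch_size` items
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def pipe(
    docs: Iterable[T],
    batch_process: Callable[[List[T]], List[T]],
    batch_size: int = 1,
) -> Iterator[T]:
    # Process one batch at a time and yield the documents one by one
    for batch in batchify(docs, batch_size):
        yield from batch_process(batch)


results = list(pipe(["a", "b", "c"], lambda b: [x.upper() for x in b], batch_size=2))
```

Because both functions are generators, nothing is processed until the result is iterated, and at most batch_size documents are held in a batch at any time.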

__call__

Applies the component on a single doc. For multiple documents, prefer batch processing via the batch_process method. In general, prefer the Pipeline methods.

Parameters

PARAMETER DESCRIPTION
doc

TYPE: Doc

RETURNS DESCRIPTION
Doc