edsnlp.core.torch_component
TorchComponent
Bases: BaseComponent, Module, Generic[BatchOutput, BatchInput]
A TorchComponent is a Component that can be trained and inherits from torch.nn.Module. You can use it either as a torch module inside a more complex neural network, or as a standalone component in a Pipeline.
In addition to the methods of a torch module, a TorchComponent adds a few methods to handle preprocessing and collating features, as well as caching intermediate results for components that share a common subcomponent.
post_init
Completes the attributes of the component by inspecting a set of documents. It is especially useful for building vocabularies or detecting the labels of a classification task.
Parameters
PARAMETER | DESCRIPTION |
---|---|
gold_data | The documents to use for initialization. |
exclude | The names of components to exclude from initialization. This argument is gradually updated with the names of the components that have been initialized. |
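As a toy illustration of the `post_init` contract, the sketch below builds a label vocabulary from gold documents and records its own name in `exclude`, so that a second call is skipped. It is framework-free and assumes documents are plain dicts with a hypothetical `"label"` key; `ToyClassifier` is a hypothetical class, not part of edsnlp.

```python
from typing import Iterable, List, Set


class ToyClassifier:
    """Hypothetical stand-in for a TorchComponent that classifies documents."""

    def __init__(self, name: str):
        self.name = name
        self.labels: List[str] = []

    def post_init(self, gold_data: Iterable[dict], exclude: Set[str]) -> None:
        # Skip components that have already been initialized
        if self.name in exclude:
            return
        # Mark this component as initialized so it is not processed twice
        exclude.add(self.name)
        # Collect the labels seen in the gold documents, in a stable order
        self.labels = sorted({doc["label"] for doc in gold_data})


clf = ToyClassifier("classifier")
seen: Set[str] = set()
clf.post_init([{"label": "neg"}, {"label": "pos"}, {"label": "neg"}], seen)
print(clf.labels, seen)  # ['neg', 'pos'] {'classifier'}
```

Sharing one `exclude` set across calls is what lets several components that reference the same subcomponent avoid initializing it twice.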
preprocess
Preprocess the document to extract features that will be used by the neural network to perform its predictions.
Parameters
PARAMETER | DESCRIPTION |
---|---|
doc | Document to preprocess. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | Dictionary (optionally nested) containing the features extracted from the document. |
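A minimal sketch of what a `preprocess` output can look like, assuming a whitespace tokenizer and a word-to-id vocabulary. A plain string stands in for the `Doc`, and the `"embedding"` sub-key is a hypothetical example of the optional nesting (e.g. features destined for a sub-module).

```python
from typing import Any, Dict, List


def preprocess(text: str, vocab: Dict[str, int]) -> Dict[str, Any]:
    """Toy preprocess (hypothetical): turn a text into a nested feature dict."""
    words: List[str] = text.split()
    # Unknown words are mapped to id 0 (an assumption of this sketch)
    ids = [vocab.get(word.lower(), 0) for word in words]
    return {
        "embedding": {"word_ids": ids},  # nested: features of a sub-module
        "length": len(ids),
    }


vocab = {"the": 1, "patient": 2, "has": 3}
print(preprocess("The patient has fever", vocab))
# {'embedding': {'word_ids': [1, 2, 3, 0]}, 'length': 4}
```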
collate
Collate the batch of features into a single batch of tensors that can be used by the forward method of the component.
Parameters
PARAMETER | DESCRIPTION |
---|---|
batch | Batch of features. |
RETURNS | DESCRIPTION |
---|---|
BatchInput | Dictionary (optionally nested) containing the collated tensors |
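A common collation step is padding variable-length sequences into a rectangular batch together with a mask. The sketch below assumes the batch is a dict of per-document lists (features grouped under the same key, as described for make_batch) and uses nested Python lists where a real component would build torch tensors, e.g. with `torch.tensor(padded)`. The function is hypothetical.

```python
from typing import Any, Dict, List


def collate(batch: Dict[str, List[Any]], pad_id: int = 0) -> Dict[str, Any]:
    """Toy collate (hypothetical): pad id lists into a rectangular batch plus a mask."""
    sequences: List[List[int]] = batch["word_ids"]
    max_len = max((len(seq) for seq in sequences), default=0)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # True where a position holds a real token, False where it is padding
    mask = [[i < len(seq) for i in range(max_len)] for seq in sequences]
    return {"word_ids": padded, "mask": mask}


out = collate({"word_ids": [[1, 2, 3], [4]]})
print(out["word_ids"])  # [[1, 2, 3], [4, 0, 0]]
print(out["mask"])      # [[True, True, True], [True, False, False]]
```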
batch_to_device
Move the batch of tensors to the specified device.
Parameters
PARAMETER | DESCRIPTION |
---|---|
batch | Batch of tensors. |
device | Device to move the tensors to. |
RETURNS | DESCRIPTION |
---|---|
BatchInput | The same batch, with its tensors moved to `device`. |
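Since the collated batch is an (optionally nested) dictionary, moving it to a device amounts to a recursive traversal. This sketch treats any object exposing a `.to(device)` method as a tensor, which is how `torch.Tensor` behaves; `FakeTensor` is a hypothetical stand-in so the example runs without torch, and the function is not the edsnlp implementation.

```python
from typing import Any


def batch_to_device(batch: Any, device: Any) -> Any:
    """Recursively move every object exposing `.to(device)` (e.g. a torch.Tensor)."""
    if hasattr(batch, "to"):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: batch_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        # Rebuild the same container type (plain list or tuple)
        return type(batch)(batch_to_device(value, device) for value in batch)
    # Leave other leaves (ints, strings, None, ...) untouched
    return batch


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor, recording the device it lives on."""

    def __init__(self, device: str = "cpu"):
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


moved = batch_to_device({"embedding": {"ids": FakeTensor()}, "lengths": [FakeTensor(), 3]}, "cuda")
print(moved["embedding"]["ids"].device, moved["lengths"][0].device, moved["lengths"][1])
# cuda cuda 3
```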
forward
Perform the forward pass of the neural network.
Parameters
PARAMETER | DESCRIPTION |
---|---|
batch | Batch of tensors (nested dictionary) computed by the collate method. |
RETURNS | DESCRIPTION |
---|---|
BatchOutput | Batch of predictions, to be consumed by the postprocess method. |
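To make the data flow concrete, here is a hypothetical forward pass over the padded batch and mask produced by a collate step: a bag-of-words linear score where padding positions are ignored. A real component would run vectorized torch operations on tensors; the explicit loops here only show which inputs are consumed and what is returned.

```python
from typing import Any, Dict, List


class ToyBagOfWords:
    """Hypothetical component whose forward pass is a masked bag-of-words score."""

    def __init__(self, weights: Dict[int, float], bias: float = 0.0):
        self.weights = weights
        self.bias = bias

    def forward(self, batch: Dict[str, Any]) -> Dict[str, List[float]]:
        scores = []
        for ids, mask in zip(batch["word_ids"], batch["mask"]):
            # Padding positions (mask == False) do not contribute to the score
            total = sum(self.weights.get(i, 0.0) for i, keep in zip(ids, mask) if keep)
            scores.append(total + self.bias)
        return {"scores": scores}


model = ToyBagOfWords({1: 0.5, 2: -1.0, 4: 2.0}, bias=0.25)
batch = {"word_ids": [[1, 2, 3], [4, 0, 0]], "mask": [[True, True, True], [True, False, False]]}
print(model.forward(batch))  # {'scores': [-0.25, 2.25]}
```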
module_forward
A wrapper around torch.nn.Module.__call__, used to avoid a conflict with the Component.__call__ method.
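The conflict arises because a TorchComponent is both a component, whose `__call__` takes a document, and a torch module, whose `__call__` runs `forward`. The pure-Python sketch below shows one way to keep the module behaviour reachable; `Module` is a hypothetical minimal stand-in for `torch.nn.Module` (the real one also runs hooks), and `ToyComponent` is hypothetical too.

```python
from typing import Any, Dict


class Module:
    """Minimal stand-in for torch.nn.Module: calling the module runs forward()."""

    def forward(self, *args: Any, **kwargs: Any) -> Any:
        raise NotImplementedError

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # The real torch.nn.Module.__call__ also runs hooks; this stand-in does not
        return self.forward(*args, **kwargs)


class ToyComponent(Module):
    """Hypothetical component whose __call__ takes a document, not a batch."""

    def forward(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        return {"doubled": [2 * x for x in batch["values"]]}

    def module_forward(self, *args: Any, **kwargs: Any) -> Any:
        # Explicitly reach Module.__call__, which the __call__ below shadows
        return Module.__call__(self, *args, **kwargs)

    def __call__(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # Component-style call: featurize one doc, run the network, update the doc
        output = self.module_forward({"values": [len(doc["text"])]})
        doc["doubled_length"] = output["doubled"][0]
        return doc


comp = ToyComponent()
print(comp({"text": "abc"}))                    # {'text': 'abc', 'doubled_length': 6}
print(comp.module_forward({"values": [1, 2]}))  # {'doubled': [2, 4]}
```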
make_batch
Convenience method to preprocess a batch of documents and collate them. Features corresponding to the same path are grouped together in a list, under the same key.
Parameters
PARAMETER | DESCRIPTION |
---|---|
docs | Batch of documents. |
supervision | Whether to extract supervision features or not. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, Sequence[Any]] | Features of the batch, grouped by key. |
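The grouping step described above turns a list of per-document feature dicts into a single dict of lists, keeping the nesting intact. Below is one possible implementation, assuming every document yields the same keys; `group_features` is a hypothetical helper, not an edsnlp function.

```python
from typing import Any, Dict, List


def group_features(features: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Turn a list of (nested) feature dicts into one (nested) dict of lists."""
    if not features:
        return {}
    grouped: Dict[str, Any] = {}
    for key in features[0]:
        values = [feature[key] for feature in features]
        if isinstance(values[0], dict):
            # Recurse so that nested paths keep their structure
            grouped[key] = group_features(values)
        else:
            grouped[key] = values
    return grouped


feats = [
    {"embedding": {"word_ids": [1, 2]}, "length": 2},
    {"embedding": {"word_ids": [3]}, "length": 1},
]
print(group_features(feats))
# {'embedding': {'word_ids': [[1, 2], [3]]}, 'length': [2, 1]}
```

The grouped dict is what a collate step, such as padding `word_ids`, would then receive.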
batch_process
Process a batch of documents using the neural network. This differs from the pipe method in that it does not return an iterator, but executes the component on the whole batch at once.
Parameters
PARAMETER | DESCRIPTION |
---|---|
docs | Batch of documents. |
RETURNS | DESCRIPTION |
---|---|
Sequence[Doc] | Batch of updated documents |
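The sketch below chains the steps documented on this page (preprocess, grouping, collate, forward, postprocess) for a whole batch at once. It is a hypothetical, framework-free component: documents are plain dicts, no tensors are built, and steps the real method may also perform, such as moving the batch to the component's device, are omitted.

```python
from typing import Any, Dict, List


class ToyLengthTagger:
    """Hypothetical component flagging documents with more than three words."""

    def preprocess(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        return {"n_words": len(doc["text"].split())}

    def collate(self, batch: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
        return batch  # a real component would build tensors here

    def forward(self, batch: Dict[str, List[Any]]) -> Dict[str, List[bool]]:
        return {"is_long": [n > 3 for n in batch["n_words"]]}

    def postprocess(self, docs: List[Dict[str, Any]], batch: Dict[str, List[bool]]) -> List[Dict[str, Any]]:
        for doc, flag in zip(docs, batch["is_long"]):
            doc["is_long"] = flag
        return docs

    def batch_process(self, docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        if not docs:
            return docs
        features = [self.preprocess(doc) for doc in docs]
        # Group features under the same key (flat features only, for brevity)
        grouped = {key: [f[key] for f in features] for key in features[0]}
        outputs = self.forward(self.collate(grouped))  # whole batch at once
        return self.postprocess(docs, outputs)


docs = [{"text": "short note"}, {"text": "a somewhat longer clinical note"}]
print(ToyLengthTagger().batch_process(docs))
# [{'text': 'short note', 'is_long': False}, {'text': 'a somewhat longer clinical note', 'is_long': True}]
```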
postprocess
Update the documents with the predictions of the neural network. By default, this is a no-op.
Parameters
PARAMETER | DESCRIPTION |
---|---|
docs | Batch of documents. |
batch | Batch of predictions, as returned by the forward method. |
RETURNS | DESCRIPTION |
---|---|
Sequence[Doc] | Batch of updated documents. |
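Since the default `postprocess` is a no-op, a component that produces predictions typically overrides it to write them back onto the documents. This hypothetical sketch contrasts the default with an override that thresholds scores into labels; documents are plain dicts and the `"scores"` key and `threshold` parameter are assumptions of the example.

```python
from typing import Any, Dict, List, Sequence


class BaseToyComponent:
    def postprocess(self, docs: Sequence[Any], batch: Dict[str, Any]) -> Sequence[Any]:
        # Default behaviour: leave the documents untouched
        return docs


class ToyDocClassifier(BaseToyComponent):
    """Hypothetical override: turn forward() scores into a label on each doc."""

    def __init__(self, threshold: float = 0.0):
        self.threshold = threshold

    def postprocess(self, docs: List[Dict[str, Any]], batch: Dict[str, Any]) -> List[Dict[str, Any]]:
        for doc, score in zip(docs, batch["scores"]):
            doc["label"] = "pos" if score > self.threshold else "neg"
        return docs


docs = [{"text": "a"}, {"text": "b"}]
print(BaseToyComponent().postprocess(docs, {"scores": [1.0, -1.0]}))
# [{'text': 'a'}, {'text': 'b'}]
print(ToyDocClassifier().postprocess(docs, {"scores": [1.0, -1.0]}))
# [{'text': 'a', 'label': 'pos'}, {'text': 'b', 'label': 'neg'}]
```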
preprocess_supervised
Preprocess the document to extract features that will be used by the neural network to perform its training. By default, this returns the same features as the preprocess method.
Parameters
PARAMETER | DESCRIPTION |
---|---|
doc | Document to preprocess. |
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | Dictionary (optionally nested) containing the features extracted from the document. |
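A typical override starts from the inference features and adds the gold targets needed to compute a loss; presumably this is what the `supervision` flag of make_batch selects. In this hypothetical sketch, documents are plain dicts with a `"label"` key and the target is the label's index in a fixed label list.

```python
from typing import Any, Dict, List


class ToySupervisedClassifier:
    """Hypothetical component showing how supervised features extend inference ones."""

    def __init__(self, labels: List[str]):
        self.label_to_id = {label: i for i, label in enumerate(labels)}

    def preprocess(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        return {"n_words": len(doc["text"].split())}

    def preprocess_supervised(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # Start from the inference features, then add the gold target
        return {**self.preprocess(doc), "target": self.label_to_id[doc["label"]]}


clf = ToySupervisedClassifier(["neg", "pos"])
doc = {"text": "no fever today", "label": "neg"}
print(clf.preprocess(doc))             # {'n_words': 3}
print(clf.preprocess_supervised(doc))  # {'n_words': 3, 'target': 0}
```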
pipe
Applies the component to a collection of documents. It is recommended to use the Pipeline.pipe method instead of this one to apply a pipeline to a collection of documents, to benefit from the caching of intermediate results.
Parameters
PARAMETER | DESCRIPTION |
---|---|
docs | Input docs. |
batch_size | Number of documents to process at once in each batch. |
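The relation between `pipe` and `batch_process` can be sketched as lazy chunking: documents are consumed `batch_size` at a time, each chunk is processed at once, and the results are yielded one by one. This hypothetical free function takes the batch-processing callable as an argument so it runs standalone; it is not the edsnlp signature.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

Doc = TypeVar("Doc")


def pipe(
    batch_process: Callable[[List[Doc]], List[Doc]],
    docs: Iterable[Doc],
    batch_size: int,
) -> Iterator[Doc]:
    """Lazily apply `batch_process` to `docs`, `batch_size` documents at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        # Each chunk is processed at once, then its documents are yielded one by one
        yield from batch_process(batch)


calls: List[int] = []


def fake_batch_process(batch: List[str]) -> List[str]:
    calls.append(len(batch))  # record the size of each chunk
    return [doc.upper() for doc in batch]


print(list(pipe(fake_batch_process, ["a", "b", "c", "d", "e"], batch_size=2)))
# ['A', 'B', 'C', 'D', 'E']
print(calls)  # [2, 2, 1]
```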
__call__
Applies the component to a single doc. For multiple documents, prefer batch processing via the batch_process method. In general, prefer the Pipeline methods.
Parameters
PARAMETER | DESCRIPTION |
---|---|
doc | Document to process. |
RETURNS | DESCRIPTION |
---|---|
Doc | The updated document. |
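Processing a single document can be expressed as a batch of one, which is why batch processing is preferable for many documents: it runs the network once per batch rather than once per document. In this hypothetical sketch, documents are plain dicts and `ToyUpperCaser` is not part of edsnlp.

```python
from typing import Any, Dict, List


class ToyUpperCaser:
    """Hypothetical component: batch_process is the primitive, __call__ wraps it."""

    def batch_process(self, docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        for doc in docs:
            doc["upper"] = doc["text"].upper()
        return docs

    def __call__(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        # A single doc is just a batch of one
        return self.batch_process([doc])[0]


print(ToyUpperCaser()({"text": "fever"}))  # {'text': 'fever', 'upper': 'FEVER'}
```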