Inference
Once you have obtained a pipeline, either by composing rule-based components, training a model or loading a model from disk, you can use it to make predictions on documents. This is referred to as inference. This page answers the following questions:

- How do we leverage computational resources to run a model on many documents?
- How do we connect to various data sources to retrieve documents?
Inference on a single document
In EDS-NLP, computing the prediction on a single document is done by calling the pipeline on the document. The input can be either:
- a text string
- or a Doc object
nlp = ...
text = "... my text ..."
doc = nlp(text)
If you're lucky enough to have a GPU, you can use it to speed up inference by moving the model to the GPU before calling the pipeline. To leverage multiple GPUs, refer to the multiprocessing accelerator description below.
nlp.to("cuda") # same semantics as pytorch
doc = nlp(text)
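If you are unsure whether a GPU will be available at runtime, you can guard the device move. This is a minimal sketch; it only assumes that nlp.to follows the PyTorch device semantics shown above.

import torch

# move the model to the GPU only when one is actually available
if torch.cuda.is_available():
    nlp.to("cuda")

doc = nlp(text)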
Inference on multiple documents
When processing multiple documents, it is usually more efficient to use the nlp.pipe(...) method, especially when using deep learning components, since this allows matrix multiplications to be batched together. Depending on your computational resources and requirements, EDS-NLP comes with various "accelerators" to speed up inference (see the Accelerators section below for more details). By default, the .pipe() method uses the simple accelerator, but you can switch to a different one by passing the accelerator argument.
nlp = ...
docs = nlp.pipe(
    [text1, text2, ...],
    batch_size=16,  # optional, defaults to the one defined in the pipeline
    accelerator=my_accelerator,
)
The pipe method supports the following arguments:
Parameters

PARAMETER | DESCRIPTION
---|---
inputs | The inputs to create the Docs from, or Docs directly.
batch_size | The batch size to use. If not provided, the batch size of the pipeline object will be used.
accelerator | The accelerator to use for processing the documents. If not provided, the default accelerator will be used.
to_doc | The function to use to convert the inputs to Doc objects.
from_doc | The function to use to convert the Doc objects to outputs. By default, the Doc objects will be returned directly.
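As an illustration, from_doc can be used to return a lighter output than full Doc objects. The sketch below is hedged: the lambda converter is hypothetical, and only the from_doc parameter itself comes from the table above.

nlp = ...

docs_ents = list(
    nlp.pipe(
        [text1, text2, ...],
        # hypothetical converter: keep only the entities instead of the full Doc
        from_doc=lambda doc: [(e.text, e.label_) for e in doc.ents],
    )
)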
Accelerators
Simple accelerator
This is the simplest accelerator: it batches the documents and processes each batch on the main process (the one calling .pipe()).
Examples
docs = list(nlp.pipe([content1, content2, ...]))
or, if you want to override the batch size defined in the model
docs = list(nlp.pipe([content1, content2, ...], batch_size=8))
which is equivalent to passing a confit dict
docs = list(
    nlp.pipe(
        [text1, text2, ...],
        accelerator={
            "@accelerator": "simple",
            "batch_size": 8,
        },
    )
)
or the instantiated accelerator directly
from edsnlp.accelerators.simple import SimpleAccelerator
accelerator = SimpleAccelerator(batch_size=8)
docs = list(nlp.pipe([content1, content2, ...], accelerator=accelerator))
If you have a GPU, make sure to move the model to the appropriate device before calling .pipe(). If you have multiple GPUs, use the multiprocessing accelerator instead.
nlp.to("cuda")
docs = list(nlp.pipe([content1, content2, ...]))
Parameters

PARAMETER | DESCRIPTION
---|---
batch_size | The number of documents to process in each batch.
Multiprocessing (GPU) accelerator
If you have multiple CPU cores, and optionally multiple GPUs, we provide a multiprocessing accelerator that runs inference across multiple processes.
This accelerator dispatches the batches between multiple workers (data-parallelism), and distributes the computation of a given batch on one or two workers (model-parallelism). This is done by creating two types of workers:

- a CPUWorker, which handles the non-deep-learning components and the preprocessing, collating and postprocessing of deep-learning components
- a GPUWorker, which handles the forward call of the deep-learning components

The advantage of dedicating a worker to the deep-learning components is that it allows multiple batches to be prepared in parallel by several CPUWorkers, and ensures that the GPUWorkers never wait for a batch to be ready.
The overall architecture is described in the following figure, for 3 CPU workers and 2 GPU workers.

Here is how a small pipeline with rule-based components and deep-learning components is distributed between the workers:

Examples
docs = list(
    nlp.pipe(
        [text1, text2, ...],
        accelerator={
            "@accelerator": "multiprocessing",
            "num_cpu_workers": 3,
            "num_gpu_workers": 2,
            "batch_size": 8,
        },
    )
)
Parameters

PARAMETER | DESCRIPTION
---|---
batch_size | Number of documents to process at a time in a CPU/GPU worker.
num_cpu_workers | Number of CPU workers. A CPU worker handles the non-deep-learning components and the preprocessing, collating and postprocessing of deep-learning components.
num_gpu_workers | Number of GPU workers. A GPU worker handles the forward call of the deep-learning components.
gpu_pipe_names | List of pipe names to accelerate on a GPUWorker; defaults to all pipes that inherit from TorchComponent.
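For example, GPU inference can be restricted to specific components with gpu_pipe_names. The sketch below reuses the confit dict shown above; the "ner" pipe name is hypothetical and stands for whichever deep-learning component your pipeline actually contains.

docs = list(
    nlp.pipe(
        [text1, text2, ...],
        accelerator={
            "@accelerator": "multiprocessing",
            "num_cpu_workers": 3,
            "num_gpu_workers": 1,
            "batch_size": 8,
            # hypothetical pipe name: only this component runs on the GPU workers
            "gpu_pipe_names": ["ner"],
        },
    )
)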