Trainable classifier
This component predicts a label for each box over the whole document using machine learning.
Note
You must train your model before using this classifier. See Model training for more information.
Examples
The classifier is composed of the following blocks:
- a configurable embedding layer
- a linear classification layer
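To make the second block concrete, here is a minimal pure-Python sketch of what a linear classification layer does with per-box embeddings: it scores every label with a weighted sum and keeps the best one. The function name `linear_classify` and the toy `weights` and `biases` are hypothetical and chosen by hand for illustration. In the real component these parameters are learned during training and the computation presumably runs on tensors, so this is a sketch of the idea, not the library's code.

```python
from typing import List, Sequence


def linear_classify(
    box_embeddings: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],  # hypothetical: one row of weights per label
    biases: Sequence[float],             # hypothetical: one bias per label
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label for each box embedding."""
    predictions = []
    for embedding in box_embeddings:
        # One score per label: dot product of the label's weights with the embedding, plus bias
        scores = [
            sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, biases)
        ]
        # Keep the label with the highest score
        best = max(range(len(labels)), key=scores.__getitem__)
        predictions.append(labels[best])
    return predictions


labels = ["body", "pollution"]
# Two toy boxes with 2-dimensional embeddings (real embeddings are much larger)
box_embeddings = [[1.0, 0.0], [0.0, 2.0]]
weights = [[1.0, -1.0], [-1.0, 1.0]]
biases = [0.0, 0.0]

predicted = linear_classify(box_embeddings, weights, biases, labels)
print(predicted)
```

The embedding layer is what makes this useful in practice: it turns each box into a vector that the linear layer can separate.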
In this example, we use a simple CNN-based embedding layer (sub-box-cnn-pooler),
which applies a stack of CNN layers to the embeddings computed by a text embedding
layer (simple-text-embedding).
pipeline.add_pipe(
    "trainable-classifier",
    name="classifier",
    config={
        # simple embedding computed by pooling embeddings of words in each box
        "embedding": {
            "@factory": "sub-box-cnn-pooler",
            "out_channels": 64,
            "kernel_sizes": (3, 4, 5),
            "embedding": {
                "@factory": "simple-text-embedding",
                "size": 72,
            },
        },
        "labels": ["body", "pollution"],
    },
)
The same component can be declared in a configuration file:

[components.classifier]
@factory = "trainable-classifier"
labels = ["body", "pollution"]

[components.classifier.embedding]
@factory = "sub-box-cnn-pooler"
out_channels = 64
kernel_sizes = (3, 4, 5)

[components.classifier.embedding.embedding]
@factory = "simple-text-embedding"
size = 72
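The Python call and the configuration sections describe the same nested structure: each nested dictionary becomes a section whose name extends its parent's with a dot. The sketch below illustrates that correspondence with a hypothetical helper, `to_sections`, written only for this explanation. It is not how the library parses or serializes configurations. In the Python call the factory name is passed as the first argument of `add_pipe`, so it appears here as an `@factory` key only to mirror the configuration file.

```python
from typing import Any, Dict


def to_sections(config: Dict[str, Any], prefix: str) -> Dict[str, Dict[str, Any]]:
    """Split a nested dict into {dotted section name: flat key/value dict}."""
    sections: Dict[str, Dict[str, Any]] = {prefix: {}}
    for key, value in config.items():
        if isinstance(value, dict):
            # A nested dict becomes its own section, named after its parent
            sections.update(to_sections(value, f"{prefix}.{key}"))
        else:
            sections[prefix][key] = value
    return sections


classifier_config = {
    "@factory": "trainable-classifier",
    "labels": ["body", "pollution"],
    "embedding": {
        "@factory": "sub-box-cnn-pooler",
        "out_channels": 64,
        "kernel_sizes": (3, 4, 5),
        "embedding": {"@factory": "simple-text-embedding", "size": 72},
    },
}

sections = to_sections(classifier_config, "components.classifier")
for name, values in sections.items():
    print(f"[{name}]")
    for key, value in values.items():
        print(f"{key} = {value!r}")
```

Reading the section names from left to right gives the path to each nested block, which is why the text embedding sits under `components.classifier.embedding.embedding`.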
Parameters
PARAMETER | DESCRIPTION |
---|---|
labels | Initial labels of the classifier (will be completed during initialization) |
embedding | Embedding module to encode the PDF boxes |