edsnlp.utils.training

make_spacy_corpus_config

Helper that creates a spaCy corpus config from training and dev data: it loads the documents as needed and exports them with spaCy's DocBin.

Parameters

PARAMETER DESCRIPTION
train_data

The training data. Can be:
- a list of spacy.Doc
- a path to a given dataset

TYPE: Union[str, List[Doc]]

dev_data

The development data. Can be:
- a list of spacy.Doc
- a path to a given dataset
- the number of documents to take from the training data
- the fraction of documents to take from the training data

TYPE: Union[str, List[Doc], int, float]

data_format

Optional data format that determines how the documents are loaded from disk

TYPE: Union[Optional[DataFormat], str] DEFAULT: None

nlp

Optional spaCy model used to load documents from non-spaCy formats (such as brat)

TYPE: Optional[Language] DEFAULT: None

seed

Seed used to shuffle the data when the dataset has to be split

TYPE: int DEFAULT: 0

reader

Which spaCy reader to use when loading the data

TYPE: str DEFAULT: 'spacy.Corpus.v1'

RETURNS DESCRIPTION
Config
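
The int and float forms of dev_data, together with seed, describe a seeded split of the training documents. The sketch below illustrates that idea with the standard library only. It is not the library's implementation: the helper name split_train_dev, the rounding rule for fractions, and the choice to shuffle before slicing are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar, Union

T = TypeVar("T")


def split_train_dev(
    docs: Sequence[T],
    dev_size: Union[int, float],
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: carve a dev set out of the training documents.

    dev_size is either a number of documents (int) or a fraction of the
    training data (float), mirroring the two numeric forms of dev_data.
    """
    if isinstance(dev_size, float):
        if not 0.0 <= dev_size <= 1.0:
            raise ValueError("a fractional dev_size must be between 0 and 1")
        # Assumption: fractions are rounded down to a whole number of documents.
        n_dev = int(len(docs) * dev_size)
    else:
        n_dev = dev_size
    if not 0 <= n_dev <= len(docs):
        raise ValueError("dev_size exceeds the number of available documents")

    # Shuffle a copy with a dedicated generator so that the split is
    # reproducible for a given seed and the caller's list is left untouched.
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_dev:], shuffled[:n_dev]


# Usage: hold out 20% of ten placeholder documents.
train_docs, dev_docs = split_train_dev(list(range(10)), 0.2, seed=0)
```

Using a dedicated random.Random(seed) instance rather than the global generator keeps the split deterministic regardless of other random calls in the program.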

train

Training helper that learns the weights of the trainable components in a pipeline. This function has been adapted from https://github.com/explosion/spaCy/blob/397197e/spacy/cli/train.py#L18

Parameters

PARAMETER DESCRIPTION
nlp

spaCy model to train

TYPE: Language

output_path

Path where the trained model is saved

TYPE: Union[Path, str]

config

Optional config overrides

TYPE: Union[Config, dict]

use_gpu

Which GPU to use for training (-1 means CPU)

TYPE: int DEFAULT: -1
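
As a rough sketch of how the two helpers on this page might be combined, the snippet below builds a corpus config and passes it to train. Only the parameter names come from the descriptions above. The wrapper name train_from_brat, the "brat" string for data_format, and the idea of passing the returned corpus config directly as the config overrides are assumptions that should be checked against the installed version of edsnlp.

```python
from pathlib import Path
from typing import Any


def train_from_brat(nlp: Any, train_dir: str, dev_dir: str, output_path: Path) -> None:
    """Hypothetical wrapper around the two documented helpers."""
    # Imported lazily so that this sketch can be defined without edsnlp installed.
    from edsnlp.utils.training import make_spacy_corpus_config, train

    corpus_config = make_spacy_corpus_config(
        train_data=train_dir,
        dev_data=dev_dir,
        data_format="brat",  # assumption: the format can be given as this string
        nlp=nlp,             # needed to load documents from non-spaCy formats
        seed=0,
    )

    # Assumption: the corpus config is accepted as the config overrides.
    train(
        nlp=nlp,
        output_path=output_path,
        config=corpus_config,
        use_gpu=-1,  # -1 means CPU
    )
```

Keyword arguments are used throughout because the parameter order is not stated on this page.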