`edsnlp.utils.training`

`make_spacy_corpus_config`

Helper to create a spaCy corpus config from training and dev data: the documents are loaded according to the given data format, then exported using spaCy's `DocBin`.

Parameters

PARAMETER	DESCRIPTION
`train_data`	The training data. Can be either a list of `spacy.tokens.Doc` objects or a path to a dataset. TYPE: `Union[str, List[Doc]]`
`dev_data`	The development data. Can be a list of `spacy.tokens.Doc` objects, a path to a dataset, the number of documents to take from the training data (`int`), or the fraction of documents to take from the training data (`float`). TYPE: `Union[str, List[Doc], int, float]`
`data_format`	Optional data format that determines how the documents are loaded from disk TYPE: `Union[Optional[DataFormat], str]` DEFAULT: `None`
`nlp`	Optional spaCy pipeline used to load documents from non-spaCy formats (such as brat) TYPE: `Optional[Language]` DEFAULT: `None`
`seed`	Seed used to shuffle the data when the dev set has to be split from the training data TYPE: `int` DEFAULT: `0`
`reader`	The spaCy reader to use when loading the data TYPE: `str` DEFAULT: `'spacy.Corpus.v1'`
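When `dev_data` is an `int` or a `float`, the dev set is carved out of the training data, and `seed` controls the shuffle. The stdlib-only sketch below illustrates that idea; `split_train_dev` is a hypothetical helper, not part of edsnlp, and the rounding and shuffling details are assumptions rather than the library's actual behavior.

```python
import random
from typing import List, Sequence, Tuple, TypeVar, Union

T = TypeVar("T")


def split_train_dev(
    docs: Sequence[T],
    dev_size: Union[int, float],
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: carve a dev set out of the training documents.

    - an int is read as a number of documents
    - a float is read as a fraction of the documents, strictly between 0 and 1
    """
    # bool is a subclass of int; reject it explicitly to avoid surprises
    if isinstance(dev_size, bool):
        raise TypeError("dev_size must be an int or a float")
    if isinstance(dev_size, float):
        if not 0.0 < dev_size < 1.0:
            raise ValueError("a float dev_size must be between 0 and 1")
        n_dev = int(round(len(docs) * dev_size))
    else:
        n_dev = dev_size
    if not 0 <= n_dev <= len(docs):
        raise ValueError("dev_size must be between 0 and the number of documents")

    # Shuffle a copy with a dedicated RNG so the split is reproducible for a given seed
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_dev:], shuffled[:n_dev]


train, dev = split_train_dev([f"doc-{i}" for i in range(10)], 0.2, seed=0)
print(len(train), len(dev))  # 8 2
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching Python's global random state.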

RETURNS	DESCRIPTION
`Config`	The spaCy corpus config
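For orientation, the sketch below shows the kind of `[corpora]` section spaCy's trainer reads: one entry per split, each naming a registered reader (such as the default `spacy.Corpus.v1`) and the path of a `DocBin` (`.spacy`) file. `corpus_config_dict` is a hypothetical, stdlib-only helper; the exact keys and paths in the `Config` returned by `make_spacy_corpus_config` are assumptions here, not taken from the library.

```python
from pathlib import Path
from typing import Any, Dict


def corpus_config_dict(
    train_path: Path,
    dev_path: Path,
    reader: str = "spacy.Corpus.v1",
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the [corpora] block of a spaCy training config.

    Equivalent spaCy config syntax:

        [corpora.train]
        @readers = "spacy.Corpus.v1"
        path = "corpus/train.spacy"

        [corpora.dev]
        @readers = "spacy.Corpus.v1"
        path = "corpus/dev.spacy"
    """
    return {
        "corpora": {
            # Each split points the chosen registered reader at a DocBin (.spacy) file
            "train": {"@readers": reader, "path": str(train_path)},
            "dev": {"@readers": reader, "path": str(dev_path)},
        }
    }


cfg = corpus_config_dict(Path("corpus/train.spacy"), Path("corpus/dev.spacy"))
print(cfg["corpora"]["dev"]["@readers"])  # spacy.Corpus.v1
```

The mapping is kept as a plain dict so the sketch runs without spaCy installed; it could be wrapped in thinc's `Config` to obtain a real config object.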

`train`

Training helper to learn the weights of the trainable components in a pipeline. This function has been adapted from https://github.com/explosion/spaCy/blob/397197e/spacy/cli/train.py#L18

Parameters

PARAMETER	DESCRIPTION
`nlp`	spaCy pipeline to train TYPE: `Language`
`output_path`	Path to save the model TYPE: `Union[Path, str]`
`config`	Optional config overrides TYPE: `Union[Config, dict]`
`use_gpu`	ID of the GPU to use for training (`-1` means CPU) TYPE: `int` DEFAULT: `-1`
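The `config` argument carries overrides on top of the pipeline's training configuration. How edsnlp applies them is not described here, so the stdlib-only sketch below simply assumes a recursive merge of nested dictionaries, where override values win and untouched keys are kept. `merge_overrides` and `describe_device` are hypothetical helpers; the `training` section and its `max_epochs` and `dropout` keys follow spaCy's config layout.

```python
import copy
from typing import Any, Dict


def merge_overrides(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: recursively apply `overrides` on top of `base`.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the one in `base`. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def describe_device(use_gpu: int = -1) -> str:
    """Hypothetical helper: interpret `use_gpu` (this sketch treats any negative ID, including -1, as CPU)."""
    return "cpu" if use_gpu < 0 else f"gpu:{use_gpu}"


base = {"training": {"max_epochs": 0, "dropout": 0.1}}
overrides = {"training": {"max_epochs": 10}}
print(merge_overrides(base, overrides)["training"])  # {'max_epochs': 10, 'dropout': 0.1}
print(describe_device(-1), describe_device(0))  # cpu gpu:0
```

Deep-copying both the base and the override values keeps the caller's dictionaries unchanged, so the same base config can be reused across several training runs.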