edsnlp.utils.training
make_spacy_corpus_config
Helper to create a spaCy corpus config from training and dev data by loading the documents and exporting them with spaCy's DocBin.
Parameters
PARAMETER | DESCRIPTION |
---|---|
train_data | The training data. Can be a list of spacy.Doc or a path to a dataset. TYPE: |
dev_data | The development data. Can be a list of spacy.Doc, a path to a dataset, the number of documents to take from the training data, or the fraction of documents to take from the training data. TYPE: |
data_format | Optional data format that determines how the documents are loaded from disk TYPE: |
nlp | Optional spaCy model used to load documents from non-spaCy formats (like brat) TYPE: |
seed | The seed used to shuffle the data when the dataset is split TYPE: |
reader | Which spaCy reader to use when loading the data TYPE: |
RETURNS | DESCRIPTION |
---|---|
Config | |
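The `dev_data` and `seed` parameters describe a held-out split: an integer is a number of documents, a fraction is a share of the training data, and the seed controls the shuffle. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The helper name `split_train_dev`, the truncating rounding, and the clamping to the dataset size are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar, Union

T = TypeVar("T")


def split_train_dev(
    docs: Sequence[T],
    dev_size: Union[int, float],
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: hold out part of `docs` as a dev set."""
    shuffled = list(docs)
    # A dedicated Random instance keeps the split reproducible for a given seed
    random.Random(seed).shuffle(shuffled)
    if isinstance(dev_size, float):
        # A float is read as a fraction of the documents (assumption: truncated)
        n_dev = int(len(shuffled) * dev_size)
    else:
        # An int is read as an absolute number of documents
        n_dev = dev_size
    # Assumption: never take fewer than 0 or more than all documents
    n_dev = max(0, min(n_dev, len(shuffled)))
    return shuffled[n_dev:], shuffled[:n_dev]


docs = [f"doc-{i}" for i in range(10)]
train_docs, dev_docs = split_train_dev(docs, 0.2, seed=42)
```

Calling the helper twice with the same seed returns the same split, which is the point of exposing a `seed` parameter.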
train
Training helps learn the weights of the trainable components in a pipeline. This function has been adapted from https://github.com/explosion/spaCy/blob/397197e/spacy/cli/train.py#L18
Parameters
PARAMETER | DESCRIPTION |
---|---|
nlp | spaCy model to train TYPE: |
output_path | Path to save the model TYPE: |
config | Optional config overrides TYPE: |
use_gpu | Which GPU to use for training (-1 means CPU) TYPE: |
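The two functions on this page are meant to be combined: one builds the corpus config, the other trains the pipeline. The sketch below shows how a call might look, using only the module path and keyword names listed in the parameter tables. The wrapper `run_training` is hypothetical, and passing the corpus config directly as `config` is an assumption, since the tables do not say what other keys `config` expects or what `train` returns.

```python
from pathlib import Path
from typing import Any


def run_training(nlp: Any, train_dir: str, output_dir: str) -> Any:
    """Hypothetical wrapper; keyword names follow the parameter tables."""
    # Imported lazily so the sketch can be defined without edsnlp installed
    from edsnlp.utils.training import make_spacy_corpus_config, train

    corpus_config = make_spacy_corpus_config(
        train_data=train_dir,  # a path to a dataset
        dev_data=0.2,          # hold out a fraction of the training documents
        nlp=nlp,               # used to load documents from non-spaCy formats
        seed=42,               # makes the shuffle before the split reproducible
    )
    # Assumption: the corpus config is accepted as-is for `config`
    return train(
        nlp=nlp,
        output_path=Path(output_dir),
        config=corpus_config,
        use_gpu=-1,            # -1 means CPU
    )
```

Depending on the dataset, `data_format` and `reader` may also need to be set. Their accepted values are not listed here, so they are left out of the sketch.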