Loggers

When training a model, it is important to keep track of the training process, of the model's performance at different stages, and of statistics about the training data over time. This is where loggers come in: they store this information so that it can be analyzed and visualized later.

The EDS-NLP training API (edsnlp.train) relies on accelerate's integration of popular loggers, as well as a few custom loggers. You can configure loggers in edsnlp.train via the logger parameter of the train function by specifying:

a single logger, given as a string, a class instance, or a partially initialized class instance, e.g.

Via the Python API:

from edsnlp.training.loggers import CSVLogger
from edsnlp.training import train

logger = CSVLogger.draft()
train(..., logger=logger)
# or train(..., logger="csv")

Via a config file:

train:
  ...
  logger:
    "@loggers": csv !draft
    ...

or a list of strings and/or logger instances, e.g.

Via the Python API:

from edsnlp.training.loggers import CSVLogger
from edsnlp.training import train

loggers = ["tensorboard", CSVLogger.draft(...)]
train(..., logger=loggers)

Via a config file:

train:
  ...
  logger:
      - tensorboard  # as a string
      - "@loggers": csv !draft
        ...
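To make the accepted forms concrete, here is a minimal, self-contained sketch of how a `logger` argument given as a string, an instance, or a list could be normalized into a list of logger objects. It is not the edsnlp.train implementation: the `ToyLogger` class, the `TOY_REGISTRY` dict and the `normalize_loggers` helper are hypothetical stand-ins for the real loggers registry.

```python
from typing import Dict, List, Type, Union


class ToyLogger:
    """Hypothetical stand-in for a real logger class."""

    def __init__(self, name: str):
        self.name = name


# Hypothetical registry mapping string shorthands to logger classes.
TOY_REGISTRY: Dict[str, Type[ToyLogger]] = {
    "csv": ToyLogger,
    "tensorboard": ToyLogger,
}

LoggerSpec = Union[str, ToyLogger]


def normalize_loggers(spec: Union[LoggerSpec, List[LoggerSpec]]) -> List[ToyLogger]:
    """Turn a single logger spec, or a list of specs, into a list of loggers."""
    specs = spec if isinstance(spec, list) else [spec]
    loggers: List[ToyLogger] = []
    for item in specs:
        if isinstance(item, str):
            # String shorthand: build the logger from the (hypothetical) registry.
            loggers.append(TOY_REGISTRY[item](name=item))
        else:
            # Already a logger instance: keep it as-is.
            loggers.append(item)
    return loggers


loggers = normalize_loggers(["tensorboard", ToyLogger(name="my-csv")])
print([lg.name for lg in loggers])  # ['tensorboard', 'my-csv']
```

Wrapping a single spec into a one-element list lets the rest of the training code handle one logger and several loggers the same way.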

Draft objects

edsnlp.train can provide a default project_name and logging_dir to the loggers that require these parameters. If you don't want to set them yourself, you can:

call CSVLogger.draft(...) with the usual init parameters, except project_name or logging_dir. This returns a Draft[CSVLogger] object, which will be instantiated later, once the missing parameters are available;
use "@loggers": csv !draft in the config file, which is the config file equivalent of the .draft() method above;
or use the string shorthands logger: ["csv", "tensorboard", ...], which rely on the default project name and logging directory.
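The deferred instantiation described above can be pictured with the toy sketch below. It only illustrates the idea, assuming that a draft stores the arguments given by the user and merges in the missing ones later; the `ToyDraft` and `ToyCSVLogger` classes and the `instantiate` method are hypothetical and do not reproduce edsnlp's `Draft` class.

```python
from typing import Any, Generic, Type, TypeVar

T = TypeVar("T")


class ToyDraft(Generic[T]):
    """Hypothetical draft: remembers a class and part of its init arguments."""

    def __init__(self, cls: Type[T], **kwargs: Any):
        self.cls = cls
        self.kwargs = kwargs

    def instantiate(self, **defaults: Any) -> T:
        # Arguments given by the user take precedence over the defaults
        # supplied later, e.g. by the training loop.
        return self.cls(**{**defaults, **self.kwargs})


class ToyCSVLogger:
    """Hypothetical logger that needs a logging_dir to be built."""

    def __init__(self, logging_dir: str, file_name: str = "metrics.csv"):
        self.logging_dir = logging_dir
        self.file_name = file_name

    @classmethod
    def draft(cls, **kwargs: Any) -> "ToyDraft[ToyCSVLogger]":
        return ToyDraft(cls, **kwargs)


# The user only chooses the file name...
draft = ToyCSVLogger.draft(file_name="my_metrics.csv")
# ...and the training code provides logging_dir once it is known.
logger = draft.instantiate(logging_dir="artifact")
print(logger.logging_dir, logger.file_name)  # artifact my_metrics.csv
```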

The supported loggers are listed below.

RichLogger

A logger that displays logs in a Rich-based table using rich-logger. This logger is also available via the loggers registry as rich.

No Disk Logging

This logger doesn't save logs to disk: it is only meant to display logs in a pretty table during training. If you need to save logs to disk, combine it with one of the other loggers.

Parameters

PARAMETER	DESCRIPTION
`fields`	Field descriptors, containing the goal (`"lower_is_better"` or `"higher_is_better"`), format and display name of the columns. Each key is a regex used to match the fields to log, and each value is either `False`, to hide the matching columns, or a dict with entries such as `name` (the display name of the column) and `goal` (`"lower_is_better"` or `"higher_is_better"`). Defaults to a set of metrics and stats that are commonly logged during EDS-NLP training. TYPE: `Dict[str, Union[Dict, bool]]` DEFAULT: `None`
`key`	Key to group the logs. TYPE: `Optional[str]` DEFAULT: `None`
`hijack_tqdm`	Whether to replace the tqdm progress bar with a rich progress bar, since rich progress bars integrate better with the rich table. TYPE: `bool` DEFAULT: `True`
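To illustrate how a `fields` mapping keyed by regexes could be resolved against the names of logged fields, here is a small sketch. It assumes that the first regex that fully matches a field name wins and that a `False` value hides the column; these matching rules, the `resolve_field` helper and the example field names are assumptions for illustration, not rich-logger's documented behavior.

```python
import re
from typing import Dict, Optional, Union

FieldSpec = Union[Dict[str, str], bool]

# Hypothetical fields mapping: regex -> column descriptor, or False to hide.
fields: Dict[str, FieldSpec] = {
    r"step": {"name": "Step"},
    r".*loss": {"name": "Loss", "goal": "lower_is_better"},
    r".*f1": {"name": "F1", "goal": "higher_is_better"},
    r"lr": False,  # hide the learning-rate column
}


def resolve_field(field_name: str, fields: Dict[str, FieldSpec]) -> Optional[Dict[str, str]]:
    """Return the descriptor of the first fully matching regex.

    Assumption: the first full match wins. Returns None when the column
    is hidden (False) or when no regex matches.
    """
    for pattern, spec in fields.items():
        if re.fullmatch(pattern, field_name):
            return spec if isinstance(spec, dict) else None
    return None


for name in ["step", "train_loss", "ner/f1", "lr"]:
    print(name, "->", resolve_field(name, fields))
```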

CSVLogger

A simple CSV-based logger that writes logs to a CSV file. When used with edsnlp.train, the CSV file is located by default at ${CWD}/artifact/metrics.csv.

Consistent Keys

This logger expects that the values dictionary passed to log has consistent keys across all calls. If a new key is encountered in a subsequent call, it will be ignored and a warning will be issued.
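This behavior can be reproduced with the standard library alone. The sketch below is a minimal illustration, assuming the header is fixed by the first call: `csv.DictWriter(extrasaction="ignore")` drops unknown keys and `warnings.warn` reports them. The `ToyCSVWriter` class is hypothetical and is not the CSVLogger source.

```python
import csv
import os
import tempfile
import warnings
from typing import Any, Dict, List, Optional


class ToyCSVWriter:
    """Hypothetical CSV logger whose columns are fixed by the first call."""

    def __init__(self, path: str):
        self.path = path
        self.fieldnames: Optional[List[str]] = None

    def log(self, values: Dict[str, Any]) -> None:
        if self.fieldnames is None:
            # First call: the keys of this dict become the CSV header.
            self.fieldnames = list(values)
            with open(self.path, "w", newline="") as f:
                csv.DictWriter(f, fieldnames=self.fieldnames).writeheader()
        new_keys = [key for key in values if key not in self.fieldnames]
        if new_keys:
            warnings.warn(f"Ignoring keys absent from the first call: {new_keys}")
        with open(self.path, "a", newline="") as f:
            # extrasaction="ignore" drops keys that are not in the header.
            writer = csv.DictWriter(f, fieldnames=self.fieldnames, extrasaction="ignore")
            writer.writerow(values)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.csv")
    writer = ToyCSVWriter(path)
    writer.log({"step": 1, "loss": 0.9})
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        writer.log({"step": 2, "loss": 0.7, "f1": 0.5})  # "f1" is dropped
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

print(rows)
```

Fixing the columns on the first call keeps every row aligned with a single header, which is why a key that first appears later cannot simply be added to the file.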

Parameters

PARAMETER	DESCRIPTION
`logging_dir`	Directory in which to store the CSV file. TYPE: `str` or `PathLike`
`file_name`	Name of the CSV file. Defaults to "metrics.csv". TYPE: `str` DEFAULT: `'metrics.csv'`

JSONLogger

A simple JSON-based logger that writes logs to a JSON file as a list of dictionaries. When used with edsnlp.train, the JSON file is located by default at ${CWD}/artifact/metrics.json.

This logger is not recommended for large or frequent logging, as it rewrites the entire JSON file on every call. Prefer CSVLogger for frequent and heavy logging.
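The sketch below shows why this pattern becomes costly. Assuming the file holds a single JSON list, each call has to read, extend and re-serialize the whole list, so the work per call grows with the number of entries already logged. `append_json_log` is a hypothetical helper, not JSONLogger's code.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def append_json_log(path: str, values: Dict[str, Any]) -> int:
    """Append one entry by rewriting the whole JSON list; return its new length."""
    entries: List[Dict[str, Any]] = []
    if os.path.exists(path):
        with open(path) as f:
            entries = json.load(f)
    entries.append(values)
    with open(path, "w") as f:
        # The full list is serialized again on every call.
        json.dump(entries, f)
    return len(entries)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.json")
    written = [append_json_log(path, {"step": step}) for step in range(1, 4)]
    with open(path) as f:
        final = json.load(f)

print(written, sum(written))  # [1, 2, 3] 6
```

Each call serializes every entry logged so far, so n calls serialize n(n+1)/2 entries in total (6 for the 3 calls above). By contrast, an append-only format such as CSV only needs to write the new row.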

Parameters

PARAMETER	DESCRIPTION
`logging_dir`	Directory in which to store the JSON file. TYPE: `str` or `PathLike`
`file_name`	Name of the JSON file. Defaults to "metrics.json". TYPE: `str` DEFAULT: `'metrics.json'`

TensorBoardLogger

Logger for TensorBoard. This logger is also available via the loggers registry as tensorboard.

Parameters

PARAMETER	DESCRIPTION
`project_name`	Name of the project. TYPE: `str`
`logging_dir`	Directory in which to store the TensorBoard logs. Logs of different runs will be stored in logging_dir/project_name. The environment variable TENSORBOARD_LOGGING_DIR takes precedence over this argument. TYPE: `Optional[Union[str, PathLike]]` DEFAULT: `None`
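A minimal sketch of the directory resolution described above, assuming that the environment variable simply replaces the `logging_dir` argument before `project_name` is appended. `resolve_tensorboard_dir` and its `environ` parameter are hypothetical; the real logger may resolve paths differently, and since the case where neither value is set is not described here, the sketch just raises.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union


def resolve_tensorboard_dir(
    project_name: str,
    logging_dir: Optional[Union[str, os.PathLike]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the directory in which the runs of `project_name` would be stored."""
    env = os.environ if environ is None else environ
    # Assumption: TENSORBOARD_LOGGING_DIR, when set, replaces logging_dir.
    base = env.get("TENSORBOARD_LOGGING_DIR") or logging_dir
    if base is None:
        # Assumption: this sketch simply requires one of the two to be set.
        raise ValueError("No TensorBoard logging directory was provided")
    # Runs of a project are grouped under <base>/<project_name>.
    return Path(base) / project_name


print(resolve_tensorboard_dir("my_project", "logs", environ={}))
print(resolve_tensorboard_dir("my_project", "logs", environ={"TENSORBOARD_LOGGING_DIR": "/tmp/tb"}))
```

Passing the environment as an explicit mapping keeps the precedence rule easy to test without modifying the process environment.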

AimLogger

Logger for Aim.

Parameters

PARAMETER	DESCRIPTION
`project_name`	Name of the project. TYPE: `str`
`logging_dir`	Directory in which to store the Aim logs. The environment variable AIM_LOGGING_DIR takes precedence over this argument. TYPE: `Optional[Union[str, PathLike]]` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Aim init function. DEFAULT: `{}`

WandBLogger

Logger for Weights & Biases. This logger is also available via the loggers registry as wandb.

Parameters

PARAMETER	DESCRIPTION
`project_name`	Name of the project. This will become the project parameter in wandb.init. TYPE: `str`
`kwargs`	Additional keyword arguments to pass to the WandB init function. DEFAULT: `{}`

MLflowLogger

Logger for MLflow. This logger is also available via the loggers registry as mlflow.

Parameters

PARAMETER	DESCRIPTION
`project_name`	Name of the project. This will become the mlflow experiment name. TYPE: `str`
`logging_dir`	Directory in which to store the MLflow logs. TYPE: `Optional[Union[str, PathLike]]` DEFAULT: `None`
`run_id`	If specified, get the run with the specified UUID and log parameters and metrics under that run. The run’s end time is unset and its status is set to running, but the run’s other attributes (source_version, source_type, etc.) are not changed. Environment variable MLFLOW_RUN_ID has priority over this argument. TYPE: `Optional[str]` DEFAULT: `None`
`tags`	An optional `dict` of `str` keys and values, or a `str` dump from a `dict`, to set as tags on the run. If a run is being resumed, these tags are set on the resumed run. If a new run is being created, these tags are set on the new run. Environment variable MLFLOW_TAGS has priority over this argument. TYPE: `Optional[Union[Dict[str, Any], str]]` DEFAULT: `None`
`nested_run`	Controls whether the run is nested in a parent run: True creates a nested run. Environment variable MLFLOW_NESTED_RUN has priority over this argument. TYPE: `Optional[bool]` DEFAULT: `False`
`run_name`	Name of new run (stored as a mlflow.runName tag). Used only when `run_id` is unspecified. TYPE: `Optional[str]` DEFAULT: `None`
`description`	An optional string that populates the description box of the run. If a run is being resumed, the description is set on the resumed run. If a new run is being created, the description is set on the new run. TYPE: `Optional[str]` DEFAULT: `None`
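The environment-variable precedence listed in this table can be summarized by the following sketch. It assumes that `MLFLOW_TAGS` and a string `tags` argument contain a JSON-encoded dict, and that `MLFLOW_NESTED_RUN` is read as a case-insensitive "true" or "1"; these parsing rules, the `environ` parameter and the `resolve_mlflow_options` helper are assumptions made for illustration, not MLflowLogger's actual code.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional, Union


def resolve_mlflow_options(
    run_id: Optional[str] = None,
    tags: Optional[Union[Dict[str, Any], str]] = None,
    nested_run: Optional[bool] = False,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Apply the environment-variable precedence listed for MLflowLogger."""
    env = os.environ if environ is None else environ
    # Environment variables, when set, win over the arguments.
    run_id = env.get("MLFLOW_RUN_ID", run_id)
    tags = env.get("MLFLOW_TAGS", tags)
    if "MLFLOW_NESTED_RUN" in env:
        # Assumption: "true" or "1" (case-insensitive) means a nested run.
        nested_run = env["MLFLOW_NESTED_RUN"].strip().lower() in ("true", "1")
    if isinstance(tags, str):
        # Assumption: a string value is a JSON dump of a dict.
        tags = json.loads(tags)
    return {"run_id": run_id, "tags": tags, "nested_run": bool(nested_run)}


print(resolve_mlflow_options(tags='{"team": "nlp"}', environ={}))
```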

CometMLLogger

Logger for CometML. This logger is also available via the loggers registry as cometml.

Parameters

PARAMETER	DESCRIPTION
`project_name`	Name of the project. TYPE: `str`
`kwargs`	Additional keyword arguments to pass to the CometML Experiment object. DEFAULT: `{}`

DVCLiveLogger