#######################################
Deep Reinforcement Learning integration
#######################################
*Sinergym* is compatible with any controller that operates under the Gymnasium interface, and can be used with most existing **Deep Reinforcement Learning** (DRL) libraries.
It integrates closely with `Stable Baselines 3 `__, especially regarding the use of **callbacks**. Callbacks are functions called at specific stages of a DRL agent's execution. They allow access to the internal state of the DRL model during training, enabling monitoring, auto-saving, model manipulation, progress visualization, and more.
The pre-implemented callbacks provided by *Sinergym* inherit from Stable Baselines 3 callback classes and can be found in `sinergym/sinergym/utils/callbacks.py `__.
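For illustration, a minimal custom callback in Stable Baselines 3 looks like the following sketch. It is a generic example, not one of the callbacks shipped with *Sinergym*, and the class and parameter names are made up for illustration:

.. code-block:: python

    from stable_baselines3.common.callbacks import BaseCallback


    class PrintTimestepsCallback(BaseCallback):
        """Toy callback that inspects the model state during training."""

        def __init__(self, print_freq: int = 1000, verbose: int = 0):
            super().__init__(verbose)
            self.print_freq = print_freq

        def _on_step(self) -> bool:
            # Called after every environment step; self.model and
            # self.num_timesteps expose the internal training state.
            if self.num_timesteps % self.print_freq == 0:
                print(f'Trained timesteps so far: {self.num_timesteps}')
            # Returning False would stop training early
            return True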
******************
LoggerEvalCallback
******************
The ``LoggerEvalCallback`` is used to evaluate the different model versions obtained during the training process of the agent. It saves the best model obtained, not necessarily the final one from the training process. This callback inherits from the ``EventCallback`` of Stable Baselines 3.
This callback is similar to the ``EvalCallback`` of Stable Baselines 3 but includes numerous enhancements and specific adaptations for *Sinergym*, in particular for logging relevant simulation data during the training process.
The evaluation environment must first be wrapped with a child class of ``BaseLoggerWrapper``. This is essential for the callback to access the logger's methods and attributes and to log the information correctly.
In addition, this callback stores the best model and evaluation summaries (in CSV format) in a folder named ``evaluation`` within the training environment output.
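A minimal usage sketch follows, assuming a standard *Sinergym* environment ID; the constructor argument names shown here are assumptions and should be checked against the current signature in ``callbacks.py``:

.. code-block:: python

    import gymnasium as gym

    import sinergym
    from sinergym.utils.callbacks import LoggerEvalCallback
    from sinergym.utils.wrappers import LoggerWrapper

    # Training environment (the ID is illustrative)
    train_env = gym.make('Eplus-5zone-hot-continuous-stochastic-v1')
    train_env = LoggerWrapper(train_env)

    # The evaluation environment must be wrapped by a BaseLoggerWrapper child
    eval_env = gym.make('Eplus-5zone-hot-continuous-stochastic-v1')
    eval_env = LoggerWrapper(eval_env)

    # Evaluate periodically and keep the best model found so far
    # (argument names may differ between versions; check callbacks.py)
    eval_callback = LoggerEvalCallback(
        eval_env=eval_env,
        train_env=train_env,
        n_eval_episodes=1,
        eval_freq_episodes=2)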
Weights and Biases logging
~~~~~~~~~~~~~~~~~~~~~~~~~~
To log all this data to the `Weights and Biases `__ platform, the training environment must first be wrapped with the ``WandbLoggerWrapper`` class (see :ref:`Logger Wrappers`). Wrapping the evaluation environment is not necessary unless detailed monitoring of the evaluation episodes is desired.
The data logged to the platform (in the *Evaluations* section) depends on the specific logger wrapper used and its episode summary. Therefore, to get new metrics, the logger wrapper must be modified, not the callback. In addition, this callback overwrites certain metrics whenever a new best model is found, so that the logged values always correspond to the best model obtained during training.
The number of episodes run in each evaluation and their frequency can be configured, and metrics from the underlying logger can be excluded if desired. Moreover, if the observation space is normalized, the callback **automatically copies the calibration parameters** from the training environment to the evaluation environment.
More episodes lead to more accurate averages of the reward-based indicators, providing a more realistic assessment of the current model's performance. However, this will increase the time required. For a detailed usage example, see :ref:`Training a model`.
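As a sketch of the wrapping order for the training environment, assuming the environment ID used above; the ``WandbLoggerWrapper`` arguments shown here, such as the entity and project name, are assumptions, so check :ref:`Logger Wrappers` for the exact interface:

.. code-block:: python

    import gymnasium as gym

    import sinergym
    from sinergym.utils.wrappers import LoggerWrapper, WandbLoggerWrapper

    # Only the training environment needs the WandB wrapper; the callback
    # reports evaluation results to the platform through it.
    train_env = gym.make('Eplus-5zone-hot-continuous-stochastic-v1')
    train_env = LoggerWrapper(train_env)
    train_env = WandbLoggerWrapper(
        train_env,
        entity='my_wandb_entity',      # hypothetical W&B entity
        project_name='sinergym_drl')   # hypothetical W&B project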
*****
Usage
*****
Model training
~~~~~~~~~~~~~~
If you want to train a DRL agent using *Sinergym*, you can use the script `sinergym/scripts/train/local_confs/train_agent_local_conf.py `__, which is easily adaptable for custom experiments.
Here are a few key points to consider (the sketch after this list puts them together):
* Models are instantiated using an algorithm constructor, each with its own **specific parameters**. Defaults are used if none are provided.
* If you apply a normalization wrapper to the environment, models will **train** using these **normalized** spaces.
* Callbacks are **combined** using a ``CallbackList`` from Stable Baselines 3.
* Training starts when the ``model.learn()`` method is called. Important parameters such as ``total_timesteps``, ``callback``, and ``log_interval`` are passed here.
* **Sequential / curriculum learning** is supported by providing a path to a previously trained model using the ``model`` parameter. This allows resuming or fine-tuning a model.
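Putting these points together, a minimal training sketch might look as follows. The environment ID, wrapper stack, callback arguments, and timestep count are illustrative; the actual script builds all of this from the YAML configuration:

.. code-block:: python

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import CallbackList

    import sinergym
    from sinergym.utils.callbacks import LoggerEvalCallback
    from sinergym.utils.wrappers import (LoggerWrapper, NormalizeAction,
                                         NormalizeObservation)

    def make_env():
        # Normalized spaces plus logging (wrapper stack is illustrative)
        env = gym.make('Eplus-5zone-hot-continuous-stochastic-v1')
        env = NormalizeObservation(env)
        env = NormalizeAction(env)
        return LoggerWrapper(env)

    env, eval_env = make_env(), make_env()

    # Algorithm constructor with its own parameters (defaults if omitted)
    model = PPO('MlpPolicy', env, learning_rate=3e-4, verbose=1)
    # To resume or fine-tune a previously trained model instead:
    # model = PPO.load('path/to/previous_model.zip', env=env)

    # Callbacks are combined in a single CallbackList; the evaluation
    # callback arguments are assumptions (see callbacks.py)
    callbacks = CallbackList([
        LoggerEvalCallback(eval_env=eval_env, train_env=env,
                           n_eval_episodes=1, eval_freq_episodes=2)
    ])

    # Training starts here
    model.learn(total_timesteps=35040, callback=callbacks, log_interval=1)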
The ``train_agent_local_conf.py`` script requires a single argument (``-conf``), which should point to a YAML configuration file. An example configuration file with detailed comments can be found here: `train_agent_PPO.yaml `__.
We distinguish between *mandatory* and *optional* configuration parameters (an illustrative sketch follows this list):
* **Mandatory**: environment, number of training episodes, and algorithm (including non-default hyperparameters if needed).
* **Optional**: environment parameters (override defaults), random seed, pretrained model path, experiment ID, wrappers (in order), evaluation settings, and cloud integration options.
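As a purely illustrative sketch, the key names below are assumptions and the commented ``train_agent_PPO.yaml`` example is the authoritative reference, a configuration file might be organised along these lines and passed to the script with ``-conf``:

.. code-block:: yaml

    # Illustrative structure only; check train_agent_PPO.yaml for the real keys.

    # --- Mandatory ---
    environment: Eplus-5zone-hot-continuous-stochastic-v1
    episodes: 5
    algorithm:
      name: PPO
      parameters:
        learning_rate: 0.0003

    # --- Optional ---
    experiment_name: ppo-5zone-example
    seed: 3
    wrappers:
      - NormalizeObservation: {}
      - NormalizeAction: {}
      - LoggerWrapper: {}
    evaluation:
      eval_freq_episodes: 2
      eval_length: 1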
Once executed, the script performs the following steps:
1. **Generate the experiment name**, based on the ``experiment_name`` value if it is specified, or on a default naming pattern otherwise.
2. **Load a pretrained model**, if defined in the configuration:
- From a local file path.
- From a Weights & Biases (WandB) artifact.
- From a Google Cloud Storage bucket.
3. **Load and configure environment parameters**:
- If an environment YAML configuration is provided, load all parameters from it (:ref:`Environment Configuration Serialization`).
- Optionally override or extend specific parameters using ``env_params`` in the configuration.
- Set the ``env_name`` to match the experiment name for better traceability.
4. **Apply wrappers to the environment**, if specified:
- Load wrapper settings from a YAML file (:ref:`Wrapper Serialization and Restoration`).
- Optionally override or add wrappers defined directly in the configuration.
- Supports custom objects or callables specified in the configuration.