Wrappers
Sinergym provides several wrappers to add functionality that is not included in the environments by default.
The implementations of these wrappers are available in sinergym/sinergym/utils/wrappers.py.
Custom wrappers can be created by inheriting from gym.Wrapper
or one of its variants, as seen in the
Gymnasium documentation.
Note
For examples on how to use these wrappers, refer to Wrappers example.
Important
Pay attention to the wrapping order when using multiple nested wrappers, as each wrapper may alter the inputs and outputs of the others.
Below are the wrappers that are pre-implemented in Sinergym.
MultiObjectiveReward
When using it, every environment step will return a reward vector (one scalar per reward term) instead of a single scalar value. Refer to #301 for additional information.
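For illustration, a minimal sketch is shown below. The reward term names passed to the wrapper are assumed to match keys exposed in the environment's info dictionary (here, those of the default linear reward), and the parameter name should be checked against the API reference:

    import gymnasium as gym
    import sinergym
    from sinergym.utils.wrappers import MultiObjectiveReward

    env = gym.make('Eplus-5zone-hot-continuous-v1')
    # Reward terms assumed to be present in the info dictionary of this environment
    env = MultiObjectiveReward(env, reward_terms=['energy_term', 'comfort_term'])

    obs, info = env.reset()
    # The reward is now a vector with one scalar per requested term
    obs, reward_vector, terminated, truncated, info = env.step(env.action_space.sample())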
PreviousObservationWrapper
This wrapper adds observations from the previous timestep to the current environment observation. You can select the variables you want to track for their previous observation values.
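A possible usage sketch, assuming the listed variables exist in the wrapped environment's observation (the variable names and the parameter name are illustrative; see the API reference):

    from sinergym.utils.wrappers import PreviousObservationWrapper

    # Keep the previous-timestep value of these observation variables
    env = PreviousObservationWrapper(
        env,
        previous_variables=['htg_setpoint', 'clg_setpoint', 'air_temperature'])
    # The observation is extended with the previous values of the selected variables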
DatetimeWrapper
This wrapper replaces the day_of_month value with the is_weekend flag, and the hour and month values with sin and cos values. The observation space is automatically updated.
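This wrapper takes no extra arguments, so usage is a single wrapping call:

    from sinergym.utils.wrappers import DatetimeWrapper

    # day_of_month -> is_weekend; hour and month -> sin/cos encodings
    env = DatetimeWrapper(env)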
NormalizeAction
This wrapper normalizes the action space, which is particularly useful for DRL algorithms, since normalized action values are generally recommended.
By default, normalization is applied in the range [-1, 1], but a different range can be specified when the wrapper is instantiated. Sinergym internally maps the normalized values back to the real action space defined in the original environment before sending them to the EnergyPlus simulator via the API middleware.
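A brief sketch; the normalization range parameter name follows the API reference and should be verified for your Sinergym version:

    from sinergym.utils.wrappers import NormalizeAction

    # Default range is [-1, 1]; a custom range can be passed at instantiation
    env = NormalizeAction(env, normalize_range=(-1.0, 1.0))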
DiscretizeEnv
This wrapper discretizes the action space. The discrete space should be defined according to the Gymnasium standard, as either gym.spaces.Discrete, gym.spaces.MultiDiscrete, or gym.spaces.MultiBinary.
An action mapping function is also provided to map these values into ones that are compatible with the underlying continuous environment, just before sending them to the simulator.
Important
The discrete space must be a discretization of the original continuous space; that is, every discrete action must map to a value contained in the original continuous action space.
Users define this action mapping function to specify the transition from discrete to continuous values. If the output of the action mapping function does not fit within the original environment action space, an error will be raised. Refer to Action discretization wrapper for a usage example.
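As an illustration, the sketch below discretizes a hypothetical two-variable continuous action space (heating and cooling setpoints); the mapping values are arbitrary examples and must lie inside the original continuous space:

    import gymnasium as gym
    from sinergym.utils.wrappers import DiscretizeEnv

    # Map each discrete action index to a (heating, cooling) setpoint pair
    def action_mapping(action: int):
        mapping = {
            0: [15.0, 30.0],
            1: [18.0, 27.0],
            2: [21.0, 24.0],
        }
        return mapping[action]

    env = DiscretizeEnv(
        env,
        discrete_space=gym.spaces.Discrete(3),
        action_mapping=action_mapping)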
IncrementalWrapper
This wrapper converts some continuous variables into actions that indicate an increment/decrement with respect to their current value, rather than directly setting a value.
A dictionary is given as an argument to define the possible increments/decrements for each variable. Its keys are the names of the variables to be transformed, and each value is a (delta, step) tuple that determines the set of available increments for that variable:
delta: the maximum range of increments and decrements.
step: the interval of intermediate values within the ranges.
The following figure illustrates its operation: essentially, the values are rounded to the nearest available increment and added to the current real values of the simulation.
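A sketch with illustrative values; the action variable names and the constructor parameters (incremental_dict, initial_values) are assumptions to be checked against the API reference:

    from sinergym.utils.wrappers import IncrementalWrapper

    env = IncrementalWrapper(
        env,
        incremental_dict={
            'Heating_Setpoint_RL': (2.0, 0.5),   # delta=2.0, step=0.5
            'Cooling_Setpoint_RL': (1.0, 0.25),  # delta=1.0, step=0.25
        },
        initial_values=[21.0, 25.0])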
DiscreteIncrementalWrapper
This wrapper transforms the environment into a discrete one whose action mapping is built from the specified delta and step values. Each discrete action selects an increment/decrement that is added to the current setpoint values rather than overwriting them; the value sent to the simulator is therefore the current setpoint plus the selected increment, not the discrete action value itself.
Warning
This wrapper fully changes the action space from continuous to discrete, meaning that increments/decrements
apply to all variables. In essence, selecting variables individually as the IncrementalWrapper
does is not possible.
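A sketch with illustrative values; the parameter names follow the API reference and should be verified for your Sinergym version:

    from sinergym.utils.wrappers import DiscreteIncrementalWrapper

    # Discrete actions become increments/decrements applied to the current setpoints
    env = DiscreteIncrementalWrapper(
        env,
        initial_values=[21.0, 25.0],
        delta_temp=2.0,
        step_temp=0.5)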
NormalizeObservation
This wrapper is used to transform observations received from the simulator into values in [-1,1].
It is based on the dynamic normalization wrapper of Gymnasium.
Initially, it may not be precise and the values may often be out of range, so use this wrapper with caution.
However, Sinergym extends its functionality with some additional features:
It includes the last unnormalized observation as an environment attribute, which is useful for logging.
It provides access to the means and standard deviations used for normalization calibration, thus addressing the low-level issues found in the original wrapper.
Similarly, these calibration values can be set via a method or in the wrapper constructor, either as a list, a numpy array, or the path of a previously generated calibration file. See the API reference for more information. Automatic calibration can be enabled or disabled when interacting with the environment, allowing the calibration to remain static rather than adaptive, which is useful for model evaluation.
In addition, this wrapper saves the mean and standard deviation values as part of the Sinergym output. These can be used when loading and evaluating a trained model.
An example of its use can be found in Loading and evaluating a trained model. It is also important that the normalization calibration update be disabled during evaluation.
Sinergym will also save intermediate mean and standard deviation values in files within episode directories, as well as for the best model obtained if LoggerEvalCallback is active during training.
These features are crucial when evaluating models trained using this wrapper. For more details, see #407.
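A sketch of typical training and evaluation usage; the constructor parameters and the method for freezing the calibration are taken from the API reference and should be treated as illustrative:

    from sinergym.utils.wrappers import NormalizeObservation

    # Training: calibration (mean/std) is updated automatically during interaction
    env = NormalizeObservation(env)

    # Evaluation: load a previously saved calibration and keep it static
    eval_env = NormalizeObservation(
        eval_env,
        mean='mean.txt',  # list, numpy array or path to a saved calibration file
        var='var.txt')
    eval_env.deactivate_update()  # disable automatic calibration updates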
Logger Wrappers
These wrappers use the Sinergym LoggerStorage
class functionalities to save information during environment interactions. For more details, see Logging system overview.
The diagram below illustrates the relationship between the wrappers and the logger, with explanations provided in the following subsections.
LoggerWrapper
BaseLoggerWrapper is the abstract class for logger wrappers. It stores all the relevant information during environment interactions. A new attribute, data_logger, is added to the environment; it is an instance of LoggerStorage containing the interaction information. A custom LoggerStorage class can be passed to the constructor to change the logging backend.
Inherit from this class to create a new logger wrapper and implement abstract methods to define custom and episode summary metrics from the available data.
Sinergym uses this base class to implement the LoggerWrapper
, the default logger, but custom loggers can be implemented easily from this abstract class (see LoggerWrapper customization).
The summary metrics currently provided by this default Sinergym wrapper are: episode_num, mean_reward, std_reward, mean_reward_comfort_term, std_reward_comfort_term, mean_reward_energy_term, std_reward_energy_term, mean_abs_comfort_penalty, std_abs_comfort_penalty, mean_abs_energy_penalty, std_abs_energy_penalty, mean_temperature_violation, std_temperature_violation, mean_power_demand, std_power_demand, cumulative_power_demand, comfort_violation_time(%), length(timesteps), time_elapsed(hours), terminated and truncated.
Note how data is refreshed with each new episode. However, this wrapper can be combined with others to store all data and summaries in different locations and formats. For this purpose, Sinergym implements CSVLogger and WandBLogger.
CSVLogger
This wrapper works with the data_logger instance of LoggerWrapper, enabling the parsing and saving of data in CSV files during simulations. A file named progress.csv is generated in the root of the output directory. This file contains general simulation results, updated per episode. The structure of this file is defined in the LoggerWrapper class.
Each episode directory includes a monitor
directory with several CSV files for data such as observations, actions, rewards, info and custom metrics, as detailed in Sinergym output.
Note that the observation and info CSVs contain one additional row, since their first entries are recorded when reset is called at the beginning of the episode. As a result, a given row index holds the observation and info of a state, together with the action taken in that state and the reward obtained from that action.
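A sketch of the usual stacking order, where LoggerWrapper is applied first so that CSVLogger can read its data_logger:

    from sinergym.utils.wrappers import LoggerWrapper, CSVLogger

    env = LoggerWrapper(env)  # collects interaction data in env.data_logger
    env = CSVLogger(env)      # dumps that data to progress.csv and per-episode monitor CSVs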
WandBLogger
This wrapper works with the data_logger
instance of LoggerWrapper
to dump information to the Weights and Biases platform in real-time.
This solution is ideal for monitoring the real-time training process and can be integrated with Stable Baselines 3 callbacks. The initialization process allows the user to define a number of key parameters, including the project, entity, run groups, tags, and whether code or outputs are saved as platform artifacts. Additionally, the user can specify the dump frequency, any excluded info keys, and excluded summary metric keys.
This wrapper can be used with an existing WandB session, eliminating the need to specify the entity or project (which, if provided, will be ignored). In the absence of a pre-existing WandB session, it is necessary to provide the entity and project fields.
This wrapper only saves episode summaries for episodes that have reached at least 90% completion. This threshold can be modified when creating the wrapper.
Important
A Weights and Biases account is required to use this wrapper, with an environment variable containing the API key for login.
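A sketch assuming no pre-existing WandB session and that the API key is available as an environment variable; the entity and project names are placeholders, and the parameter names should be checked against the API reference:

    from sinergym.utils.wrappers import LoggerWrapper, WandBLogger

    env = LoggerWrapper(env)
    env = WandBLogger(
        env,
        entity='my_entity',          # placeholder
        project_name='my_project')   # placeholder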
ReduceObservationWrapper
This wrapper reduces the original observation space by removing the variables specified in a string list parameter. The removed variables are returned in the info dictionary under the key removed_variables and are ignored by the agent.
If combined with the LoggerWrapper in subsequent layers, the removed variables will be saved in the output files, even if they are not used. This makes it perfect for monitoring simulation values that are not part of the problem to be solved.
Similarly, any other wrapper applied in layers prior to this one will affect the removed variables, which can be observed in the info dictionary.
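A sketch removing some variables from the observation; the variable names and the parameter name are illustrative:

    from sinergym.utils.wrappers import ReduceObservationWrapper

    env = ReduceObservationWrapper(
        env,
        obs_reduction=['outdoor_humidity', 'wind_speed', 'wind_direction'])
    # The removed values are still reported in info['removed_variables']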
MultiObsWrapper
This wrapper stacks observations received in a history queue.
The size of the queue can be customized.
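A sketch keeping the last five observations; the parameter names follow the API reference:

    from sinergym.utils.wrappers import MultiObsWrapper

    # Stack the last 5 observations (flattened) as the agent observation
    env = MultiObsWrapper(env, n=5, flatten=True)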
WeatherForecastingWrapper
This wrapper adds weather forecast information to the current observation.
EnergyCostWrapper
This wrapper adds energy cost information to the current observation.
Warning
It internally uses the EnergyCostLinearReward reward function, regardless of the reward function set when creating the environment.
DeltaTempWrapper
This wrapper adds to the observation the temperature delta values of the specified zones, that is, the difference between each zone air temperature and its setpoint value.
The air temperature and setpoint variables must be specified in the wrapper constructor.
If the environment has a single setpoint variable shared by all zones, you can specify just that variable; otherwise, specify a list of setpoint variables, one per zone. A usage sketch is provided after the note below.
Important
The air temperature and setpoint variables must be specified in the same order. If a different setpoint variable is used per zone, both lists must have the same length.
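A sketch for a hypothetical two-zone environment; the variable and parameter names are illustrative and must match the wrapped environment's observation variables:

    from sinergym.utils.wrappers import DeltaTempWrapper

    # One setpoint variable per air temperature variable, in the same order
    env = DeltaTempWrapper(
        env,
        temperature_variables=['air_temperature_1', 'air_temperature_2'],
        setpoint_variables=['setpoint_1', 'setpoint_2'])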