12. Wrappers

Sinergym provides several wrappers to add functionality to the environment that isn’t included by default. The code is available in sinergym/sinergym/utils/wrappers.py. You can create your own wrappers by inheriting from gym.Wrapper or one of its variants, as seen in the Gymnasium documentation.
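
As a minimal sketch of that approach (the class name, variable and clipping value are purely illustrative), a custom observation wrapper could look like this:

    import gymnasium as gym
    import numpy as np

    class MyCustomWrapper(gym.ObservationWrapper):
        """Illustrative wrapper that clips every observation value."""

        def __init__(self, env: gym.Env, clip_value: float = 100.0):
            super().__init__(env)
            self.clip_value = clip_value

        def observation(self, obs):
            # Transform the observation before returning it to the agent
            return np.clip(obs, -self.clip_value, self.clip_value)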

Note

For examples about how to use these wrappers, visit Wrappers example.

Important

Be careful when nesting several wrappers: each wrapper works on top of the previous one. Since Sinergym v3.0.5, the order in which they are applied is flexible.

12.1. MultiObjectiveReward

The environment step will return a vector reward (selected elements in the wrapper constructor, one for each objective) instead of a traditional scalar value. Refer to #301 for more information.
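
A minimal usage sketch (the environment ID and the reward term names are illustrative and should match your environment's reward terms):

    import gymnasium as gym
    import sinergym
    from sinergym.utils.wrappers import MultiObjectiveReward

    env = gym.make('Eplus-5zone-hot-continuous-v1')
    # One reward element is returned per selected objective
    env = MultiObjectiveReward(env, reward_terms=['energy_term', 'comfort_term'])

    obs, info = env.reset()
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    print(reward)  # vector reward, e.g. [energy_term, comfort_term], instead of a scalar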

12.2. PreviousObservationWrapper

This wrapper adds observation values from the previous timestep to the current environment observation. You can select the variables you want to track for their previous observation values.
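
For example (the parameter name previous_variables and the variable names are assumptions; check the API reference):

    from sinergym.utils.wrappers import PreviousObservationWrapper

    # Append the previous value of the selected variables to each observation
    env = PreviousObservationWrapper(
        env,
        previous_variables=['htg_setpoint', 'clg_setpoint', 'air_temperature'])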

12.3. DatetimeWrapper

This wrapper replaces the day_of_month value with the is_weekend flag, and the hour and month values with sin and cos values. The observation space is automatically updated.
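
Since the transformation is fixed, the wrapper only needs the environment to wrap:

    from sinergym.utils.wrappers import DatetimeWrapper

    env = DatetimeWrapper(env)
    # day_of_month -> is_weekend; hour and month -> sin/cos encodings
    print(env.observation_space)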

12.4. NormalizeAction

This wrapper applies normalization to the action space. It is particularly useful for DRL algorithms, as some of them only work correctly with normalized action values, and it makes environments more generic for DRL solutions.

By default, normalization is applied in the range [-1,1]. However, a different range can be specified when the wrapper is instantiated.

Internally, Sinergym parses these normalized values back to the real action space defined in the original environment before sending them to the EnergyPlus simulator via the API middleware.

Normalize action wrapper graph.
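
A usage sketch (the normalize_range parameter name is an assumption; see the API reference for the exact signature):

    from sinergym.utils.wrappers import NormalizeAction

    # Default normalization range is [-1, 1]
    env = NormalizeAction(env)
    # A different range can also be specified (assumed keyword name):
    # env = NormalizeAction(env, normalize_range=(-1.0, 1.0))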

12.5. DiscretizeEnv

Wrapper to discretize the action space. The discrete space must be defined according to the Gymnasium standard, as gym.spaces.Discrete, gym.spaces.MultiDiscrete, or gym.spaces.MultiBinary. An action mapping function must also be provided to map these discrete values into values compatible with the underlying continuous environment (before sending them to the simulator).

Important

The discrete space must discretize the original continuous space. Hence, the discrete space should only reach values that are considered in the original environment action space.

Users can define this action mapping function to specify the transition from discrete to continuous values. If the output of the action mapping function doesn’t align with the original environment action space, an error will be raised. Refer to Environment Discretization Wrapper for a usage example.

Discretize wrapper graph.
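
A sketch of a possible discretization (the parameter names and the setpoint values in the mapping are illustrative; the mapped values must fall inside the original continuous action space):

    import gymnasium as gym
    from sinergym.utils.wrappers import DiscretizeEnv

    # Map each discrete action to a valid continuous action
    # (heating setpoint, cooling setpoint)
    def action_mapping(action: int):
        mapping = {
            0: [15.0, 30.0],
            1: [18.0, 27.0],
            2: [21.0, 24.0],
        }
        return mapping[action]

    env = DiscretizeEnv(
        env,
        discrete_space=gym.spaces.Discrete(3),
        action_mapping=action_mapping)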

12.6. IncrementalWrapper

A wrapper is available to convert some of the continuous environment variables into actions that indicate an increase/decrease in their current value, rather than directly setting the value. A dictionary is specified as an argument to calculate the possible increments/decrements for each variable. This dictionary uses the name of each variable to be transformed as the key, and a tuple of two values, delta and step, as the value, which defines the set of possible increments for that variable:

  • delta: The maximum range of increments and decrements.

  • step: The interval of intermediate values within the ranges.

The following figure illustrates its operation. Essentially, the values are rounded to the nearest increment value and added to the current real values of the simulation:

Incremental wrapper graph.
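
A sketch of the dictionary format (the parameter name incremental_dict, the variable names and the numeric values are assumptions; check the API reference):

    from sinergym.utils.wrappers import IncrementalWrapper

    env = IncrementalWrapper(
        env,
        incremental_dict={
            # variable name: (delta, step)
            'Heating_Setpoint_RL': (2.0, 0.5),   # increments in [-2.0, 2.0], every 0.5
            'Cooling_Setpoint_RL': (1.0, 0.25),  # increments in [-1.0, 1.0], every 0.25
        })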

12.7. DiscreteIncrementalWrapper

A wrapper for an incremental setpoint action-space environment is also available. It transforms the environment into a discrete one, building the action mapping function and action space from the specified delta and step values. Each discrete action defines an increment/decrement that is added to the current setpoint values instead of overwriting them; the value actually sent to the simulator is therefore the current setpoint plus the selected increment, not the discrete action itself.

Warning

This wrapper fully changes the action space from continuous to discrete, meaning that increments/decrements apply to all variables. In essence, selecting variables individually as in IncrementalWrapper is not possible.

Discrete incremental wrapper graph.
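
A usage sketch (parameter names such as initial_values, delta_temp and step_temp are assumptions; check the API reference):

    from sinergym.utils.wrappers import DiscreteIncrementalWrapper

    env = DiscreteIncrementalWrapper(
        env,
        initial_values=[21.0, 25.0],  # starting setpoint values
        delta_temp=2.0,               # maximum increment/decrement
        step_temp=0.5)                # interval between possible increments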

12.8. NormalizeObservation

This is used to transform observations received from the simulator into values between -1 and 1. It’s based on the dynamic normalization wrapper of Gymnasium. Initially, it may not be precise and the values might often be out of range, so use this wrapper with caution.

However, Sinergym enhances its functionality with some additional features:

  • It includes the last unnormalized observation as an environment attribute, which is very useful for logging.

  • It provides access to the means and variances used for normalization calibration, addressing the low-level issues found in the original wrapper.

  • Similarly, these calibration values can be set via a method or in the wrapper constructor, either in list/NumPy array format or as the path to a previously generated txt file. Refer to the API reference for more information.

  • The automatic calibration can be enabled or disabled as you interact with the environment, allowing the calibration to remain static instead of adaptive (useful for model evaluations).

In addition, this wrapper saves the mean and var values in txt files in the Sinergym output, which should be used when evaluating the model later. An example of their use can be found in the use case Loading a model. It is also important that the normalization calibration update is deactivated during evaluation processes.

With this wrapper, Sinergym saves the mean and variance files in the root output directory as the last calibration values. It also saves the intermediate values in each episode directory, and in the best model found if the Evaluation Callback is active during a DRL training process.

These functionalities are crucial when evaluating models trained using this wrapper. For more details, visit #407.
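
A sketch of using the wrapper during training and then freezing the calibration for evaluation (the mean/var parameter names and the deactivation method name are assumptions; see the API reference):

    from sinergym.utils.wrappers import NormalizeObservation

    # Training: calibration is updated automatically while interacting
    env = NormalizeObservation(env)

    # Evaluation: load previously saved calibration and keep it static
    eval_env = NormalizeObservation(
        eval_env,
        mean='mean.txt',   # txt file path or list/numpy array
        var='var.txt')
    eval_env.deactivate_update()  # stop adapting the calibration during evaluation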

12.9. Logger Wrappers

These wrappers use the Sinergym LoggerStorage class functionalities to store information during environment interactions. For more details, see Logging System Overview.

The diagram below illustrates the relationship between the wrappers and the logger, with explanations provided in the following subsections.

Logger wrappers graph.

12.9.1. LoggerWrapper

BaseLoggerWrapper is the abstract class for logger wrappers. It stores all information during environment interactions. The environment gains a new attribute, data_logger, an instance of LoggerStorage containing all the information. You can change the logging backend by passing a custom LoggerStorage class to the constructor, for example to store the information in a different database.

Inherit from this class to create a new logger wrapper and implement abstract methods to define custom and episode summary metrics with the current data. Data is reset at the start of a new episode. Sinergym uses this base class to implement LoggerWrapper, the default logger, but custom loggers can be implemented easily following this abstract class (see Logger Wrapper personalization/configuration).

The current summary metrics for this default Sinergym wrapper are: episode_num, mean_reward, std_reward, mean_reward_comfort_term, std_reward_comfort_term, mean_reward_energy_term, std_reward_energy_term, mean_abs_comfort_penalty, std_abs_comfort_penalty, mean_abs_energy_penalty, std_abs_energy_penalty, mean_temperature_violation, std_temperature_violation, mean_power_demand, std_power_demand, cumulative_power_demand, comfort_violation_time(%), length(timesteps), time_elapsed(hours), terminated and truncated.

Although data is reset with each new episode, this wrapper can be combined with others to save all data and summaries in different ways and platforms. Sinergym implements CSVLogger and WandBLogger by default.
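
For instance, a common combination is to apply LoggerWrapper first and then one or more output wrappers on top of it:

    from sinergym.utils.wrappers import LoggerWrapper, CSVLogger, WandBLogger

    env = LoggerWrapper(env)   # collects interaction data in env.data_logger
    env = CSVLogger(env)       # dumps that data to CSV files per episode
    # env = WandBLogger(env, ...)  # optionally, also dump to Weights and Biases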

12.9.2. CSVLogger

This wrapper works with the LoggerWrapper data_logger instance to parse and save data in CSV files during simulations. A progress.csv file is generated in the root output directory, containing general simulation results, updated per episode. The structure of this file is defined by the LoggerWrapper class.

Each episode directory includes a monitor folder with several CSV files for data such as observations, actions, rewards, infos, and custom metrics. For more details, see Output format.

The observations and infos CSV files have one more row than the rest, since they are also saved at the beginning of the episode (reset). Thus, for a given row index, the files contain the observation and info for that state, the action taken in that state, and the reward obtained from taking that action in that state.

12.9.3. WandBLogger

This wrapper works with the LoggerWrapper data_logger instance to dump all information to the WandB platform in real-time. It is useful for real-time training process monitoring and is combinable with Stable Baselines 3 callbacks. The initialization allows definition of the project, entity, run groups, tags, and whether code or outputs are saved as platform artifacts, as well as dump frequency, excluded info keys, and excluded summary metric keys.

This wrapper can be used with a pre-existing WandB session, without the need to specify the entity or project (which, if provided, will be ignored), such as when using sweeps. It still allows specifying other parameters during construction, maintaining full functionality of the wrapper. If there is no pre-existing WandB session, the entity and project fields are required.

This wrapper only records episode summaries for episodes that are at least 90% complete. This threshold can be modified when creating the wrapper.
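
A construction sketch (the keyword names are assumptions based on the options described above; check the API reference for the exact signature):

    from sinergym.utils.wrappers import WandBLogger

    env = WandBLogger(
        env,
        entity='my_entity',          # required if no WandB session is already active
        project_name='my_project',   # required if no WandB session is already active
        run_name='experiment_1',
        group='DRL_experiments',
        tags=['PPO', '5zone'],
        dump_frequency=1000)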

Important

A Weights and Biases account is required to use this wrapper, with an environment variable containing the API key for login. For more information, visit Weights and Biases.

12.10. ReduceObservationWrapper

This wrapper starts from the original observation space and reduces it by removing the variables specified in a string list parameter. These removed variables are returned in the info dictionary (under the key removed_variables) and are not used in the agent optimization process.

If combined with the LoggerWrapper in subsequent layers, the removed variables will be saved in the output files, even if they are not “used”. This makes it perfect for monitoring simulation values that are not part of the problem to be solved.

Similarly, any other wrapper applied in layers prior to this one will affect the removed variables, which can be observed in the info dictionary.
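
A usage sketch (the parameter name obs_reduction and the variable names are assumptions):

    from sinergym.utils.wrappers import ReduceObservationWrapper

    env = ReduceObservationWrapper(
        env,
        obs_reduction=['outdoor_humidity', 'wind_speed'])

    obs, info = env.reset()
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    print(info['removed_variables'])  # values of the variables removed from the observation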

12.11. MultiObsWrapper

This stacks observations received in a history queue (the size can be customized).
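
For example (the parameter name n for the history size is an assumption):

    from sinergym.utils.wrappers import MultiObsWrapper

    # Each returned observation stacks the last 5 observations received
    env = MultiObsWrapper(env, n=5)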

12.12. WeatherForecastingWrapper

This wrapper adds weather forecast information to the current observation.

12.13. EnergyCostWrapper

This wrapper adds energy cost information to the current observation.

Warning

This wrapper internally uses the EnergyCostLinearReward reward function independently of the reward function set when creating the environment.

12.14. DeltaTempWrapper

This wrapper adds to the observation space the delta values of the specified zone air temperatures, that is, the difference between each zone air temperature and its setpoint value. It therefore requires the air temperature and setpoint variables to be defined in the wrapper constructor. If the environment uses a single setpoint variable for all zones, you can specify just that one variable in the setpoints field; otherwise, specify a list of variables, one per zone.

Important

The air temperature variables and setpoint variables must be specified in the same order. Unless you are using a single setpoint for all zones, both lists must have the same length.
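
A usage sketch (parameter names and variable names are illustrative; both lists must be aligned as stated above):

    from sinergym.utils.wrappers import DeltaTempWrapper

    env = DeltaTempWrapper(
        env,
        temperature_variables=['air_temperature_zone1', 'air_temperature_zone2'],
        setpoint_variables=['setpoint_zone1', 'setpoint_zone2'])

    # Or, with a single setpoint variable shared by all zones:
    # env = DeltaTempWrapper(env,
    #                        temperature_variables=['air_temperature_zone1',
    #                                               'air_temperature_zone2'],
    #                        setpoint_variables=['common_setpoint'])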