Wrappers example

In this notebook, we will explore Sinergym’s pre-defined wrappers and how to use them.

You can also create your own wrappers by inheriting from gym.Wrapper or any of its variants.

[ ]:
import gymnasium as gym
import numpy as np

import sinergym
from sinergym.utils.wrappers import *
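As noted above, a custom wrapper only needs to subclass gym.Wrapper (or one of its variants, such as gym.ObservationWrapper or gym.RewardWrapper). Below is a minimal, hypothetical sketch of a wrapper that clips the scalar reward, just to illustrate the pattern:

[ ]:
class ClipReward(gym.Wrapper):
    """Hypothetical example wrapper: clips the scalar reward to a fixed range."""

    def __init__(self, env, low=-1.0, high=1.0):
        super().__init__(env)
        self.low, self.high = low, high

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Clip the reward before returning it to the agent
        return obs, float(np.clip(reward, self.low, self.high)), terminated, truncated, info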

Multi-objective wrapper

MO-Gymnasium is an open-source Python library for developing and comparing multi-objective reinforcement learning algorithms.

Available MO-Gymnasium environments return a reward vector instead of a scalar value, one for each objective.

This wrapper enables Sinergym to return a reward vector. This way, Sinergym is made compatible with both multi-objective algorithms and algorithms that work with a traditional reward value.

We can transform the returned reward into a vector as follows:

[ ]:
env = gym.make('Eplus-5zone-hot-discrete-v1')
env = MultiObjectiveReward(env, reward_terms=['energy_term', 'comfort_term'])

Make sure that reward_terms are available in the info dict returned by the environment’s step method. Otherwise, an execution error will occur.

By default, Sinergym environments return all reward terms of the reward class in the info dict.

[ ]:
env.reset()
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
env.close()

print(reward)
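The reward is now a vector with one entry per selected term. If you want to feed it to a standard single-objective algorithm, a common approach (outside the wrapper itself) is a simple weighted sum; the weights below are arbitrary and only for illustration:

[ ]:
# Hypothetical scalarization of the reward vector printed above
# (the weights are arbitrary, chosen only for illustration).
weights = np.array([0.5, 0.5])
scalar_reward = float(np.dot(weights, np.array(reward)))
print('Scalarized reward: ', scalar_reward)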

Previous observation wrapper

This wrapper will add previous timestep observation values to the current environment observation.

You can select the variables whose previous observed values should be tracked. The observation space will be updated with the corresponding new dimension.

[ ]:
env = gym.make('Eplus-5zone-hot-discrete-v1')
env = PreviousObservationWrapper(env, previous_variables=[
    'htg_setpoint',
    'clg_setpoint',
    'air_temperature'])

You can see how the observation values have been updated:

[ ]:
env.reset()
obs, _, _, _, _ = env.step(env.action_space.sample())
obs_dict = dict(zip(env.get_wrapper_attr('observation_variables'), obs))
env.close()

print('NEW OBSERVATION: ', obs_dict)
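You can also filter the dictionary to display only the entries added by the wrapper. This assumes the new variables reuse the original names with a suffix such as '_previous'; check the names printed above and adjust the filter if needed:

[ ]:
# Show only the entries added by the wrapper (the 'previous' substring is an
# assumption; adjust it to the variable names actually printed above).
previous_entries = {k: v for k, v in obs_dict.items() if 'previous' in k}
print('PREVIOUS-VALUE ENTRIES: ', previous_entries)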

Datetime wrapper

This wrapper replaces the day value with an is_weekend flag, and the hour and month values with their sine and cosine encodings.

The observation space is also automatically updated.

[ ]:
env = gym.make('Eplus-5zone-hot-discrete-v1')
env = DatetimeWrapper(env)

This wrapper removes the observation variables month, day, and hour, and replaces them with month_sin, month_cos, is_weekend, hour_sin, and hour_cos:

[ ]:
env.reset()
obs, _, _, _, _ = env.step(env.action_space.sample())
obs_dict = dict(zip(env.get_wrapper_attr('observation_variables'), obs))
env.close()
print('NEW OBSERVATION: ', obs_dict)
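To make the encoding more concrete, here is a generic sketch of how an hour and a month can be mapped to cyclical sine/cosine features. This is illustrative only and not necessarily the wrapper's exact implementation:

[ ]:
# Generic cyclical encoding sketch (illustrative only): map hour [0, 24) and
# month [1, 12] onto the unit circle, so 23h stays close to 0h and December to January.
hour, month = 14, 7
hour_sin, hour_cos = np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24)
month_sin, month_cos = np.sin(2 * np.pi * (month - 1) / 12), np.cos(2 * np.pi * (month - 1) / 12)
print(hour_sin, hour_cos, month_sin, month_cos)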

Action normalization wrapper

Here’s an example of how to normalize a continuous action space using the NormalizeAction wrapper.

If the normalization range is not defined, it will be [-1,1] by default.

[ ]:
# Create a continuous environment
env = gym.make('Eplus-5zone-hot-continuous-v1')
print('ORIGINAL ACTION SPACE: ', env.get_wrapper_attr('action_space'))

# Apply the normalization wrapper
env = NormalizeAction(env, normalize_range=(-1.0, 1.0))
print('WRAPPED ACTION SPACE: ', env.get_wrapper_attr('action_space'))

env.reset()
for i in range(5):
    action = env.action_space.sample()
    print('Normalized action: ', action)
    _, _, _, _, info = env.step(action)
    print('Action performed in the simulator: ', info['action'])
env.close()
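Under the hood, the mapping between the normalized range and the original action bounds is a linear rescaling. The sketch below shows the idea with example bounds chosen only for illustration (the wrapper's internal implementation may differ slightly):

[ ]:
# Illustrative linear rescaling from [norm_low, norm_high] back to [low, high]
# (a plain affine mapping; the wrapper's internals may differ).
def denormalize(action, low, high, norm_low=-1.0, norm_high=1.0):
    return low + (action - norm_low) * (high - low) / (norm_high - norm_low)


# Example bounds, only for illustration; check env.action_space for the real ones.
low, high = np.array([15.0, 22.5]), np.array([22.5, 30.0])
print('Denormalized action: ', denormalize(np.array([0.0, 0.0]), low, high))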

Action discretization wrapper

Let’s see how to discretize a continuous action space. We will need to specify the new discrete action space and an action mapping function whose output matches the original unwrapped action space:

[ ]:
# We will create a continuous environment
env = gym.make('Eplus-5zone-hot-continuous-v1')
print('ORIGINAL ACTION SPACE: ', env.get_wrapper_attr('action_space'))
print('IS DISCRETE?: ', env.get_wrapper_attr('is_discrete'))

# Defining new discrete space and action mapping function
new_discrete_space = gym.spaces.Discrete(10)  # Action values [0,9]


def action_mapping_function(action):
    mapping = {
        0: [15, 30],  # These lists match with the original action space
        1: [16, 29],
        2: [17, 28],
        3: [18, 27],
        4: [19, 26],
        5: [20, 25],
        6: [21, 24],
        7: [22, 23],
        8: [22, 22.5],
        9: [21, 22.5]
    }

    return mapping[action]


# Apply the discretization wrapper
env = DiscretizeEnv(env, discrete_space=new_discrete_space,
                    action_mapping=action_mapping_function)
print('WRAPPED ACTION SPACE: ', env.get_wrapper_attr('action_space'))
print('IS DISCRETE?: ', env.get_wrapper_attr('is_discrete'))
env.reset()
for i in range(5):
    action = env.action_space.sample()
    print('ACTION DISCRETE: ', action)
    _, _, _, _, info = env.step(action)
    print('Action performed in the simulator: ', info['action'])
env.close()
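Before training, it can be useful to check that every mapped action actually falls inside the original continuous space. The following sketch reuses the objects defined in the cell above:

[ ]:
# Sanity check (illustrative): each mapped action should lie inside the
# original continuous Box space of the unwrapped environment.
original_space = env.unwrapped.action_space
for discrete_action in range(new_discrete_space.n):
    mapped = np.array(action_mapping_function(discrete_action), dtype=np.float32)
    print(discrete_action, '->', mapped,
          'inside original space?', original_space.contains(mapped))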

Discrete incremental wrapper

This wrapper updates an environment to use an incremental setpoint action space. It converts the environment into a discrete one, with an action mapping and an action space that depend on the specified step and delta values.

Each action is added to the current setpoint values instead of overwriting the latest action. Thus, the action sent to the simulator is the current setpoints with the increment or decrement applied, rather than the discrete action that selects the increment or decrement itself.

[ ]:
env = gym.make('Eplus-5zone-hot-continuous-v1')
print('ORIGINAL ACTION SPACE: ', env.get_wrapper_attr('action_space'))

env = DiscreteIncrementalWrapper(
    env, initial_values=[21.0, 25.0], delta_temp=2, step_temp=0.5)

print('WRAPPED ACTION SPACE: ', env.get_wrapper_attr('action_space'))
print('WRAPPED ACTION MAPPING: ', env.get_wrapper_attr('action_mapping'))

The maximum and minimum values defined when creating the action mapping are read from the environment action space, ensuring that the setpoint increments and decrements do not exceed the corresponding limits.

The delta and step values are used to determine how the discrete space of these increments and decrements will be constructed.
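For instance, with delta_temp=2 and step_temp=0.5, the available increments per setpoint would roughly be the values from -2 to +2 in steps of 0.5. The snippet below sketches that idea; it is not the wrapper's exact code:

[ ]:
# Sketch of how delta/step define the available increments (illustrative only):
# values from -delta_temp to +delta_temp in multiples of step_temp, including 0.
delta_temp, step_temp = 2.0, 0.5
increments = np.arange(-delta_temp, delta_temp + step_temp, step_temp)
print(increments)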

Here’s an example of how it works:

[ ]:
env.reset()
print('CURRENT SETPOINTS VALUES: ', env.get_wrapper_attr('current_setpoints'))

for i in range(5):
    action = env.action_space.sample()
    _, _, _, _, info = env.step(action)
    print('Action number ', i, ': ',
          env.get_wrapper_attr('action_mapping')(action))
    print('Setpoints update: ', info['action'])
env.close()

Normalization wrapper

This wrapper is used to transform the observations received from the simulator into values in the range [-1, 1].

It is based on Gymnasium’s dynamic normalization wrapper.

Until its running statistics are properly calibrated, the normalization may not be precise and values may fall outside the expected range, so use this wrapper with caution.
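The dynamic normalization this wrapper is based on keeps running estimates of the observation mean and variance. The following simplified sketch illustrates the idea; it is not the wrapper's actual code:

[ ]:
# Simplified running-statistics normalizer (illustrative only).
class RunningNormalizer:
    def __init__(self, shape, epsilon=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = epsilon

    def __call__(self, obs):
        # Update the running mean and variance, then normalize the observation
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)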

[ ]:
# Original env
env = gym.make('Eplus-5zone-hot-discrete-v1')

# Normalized env
env = NormalizeObservation(
    env=env)

You can check how the observation values have been normalized:

[ ]:
env.reset()

obs, _, _, _, _ = env.step(env.action_space.sample())
obs_dict = dict(zip(env.get_wrapper_attr('observation_variables'), obs))
env.close()

print('OBSERVATION WITH NORMALIZATION: ', obs_dict)

Logging and storing data with logger wrappers

LoggerWrapper layer

This wrapper uses Sinergym’s LoggerStorage class to properly capture the interaction flow with the environment.

The storage class used by the wrapper can be replaced with a different back-end. It can then be combined with other wrappers that save the stored data, such as CSVLogger or WandBLogger. For more information about Sinergym’s logger, see the Logging System Overview, the Logger Wrappers section, and the example about custom loggers.

[ ]:
env = gym.make('Eplus-5zone-hot-discrete-v1')
env = LoggerWrapper(env, storage_class=LoggerStorage)

This wrapper enables the use of a LoggerStorage instance within the environment class and automatically captures interaction data while actions are sent by an agent. At each reset, the data from this class is cleared to start the next episode. The idea is to combine it with other output loggers like those listed below:

CSVLogger layer

[ ]:
env = CSVLogger(env)

env.reset()
truncated = terminated = False
current_month = 0

while not (terminated or truncated):
    a = env.action_space.sample()
    _, _, terminated, truncated, _ = env.step(a)
env.close()

Once LoggerWrapper has been applied, this wrapper can be used to dump the interaction data of each episode, together with summary metrics, into CSV files in Sinergym’s output directory. More details on this structure can be found in OutputFormat.

Sinergym will raise an error if this wrapper is used without first enabling LoggerWrapper or a similar custom logger.

WandBLogger layer

[ ]:
# env = WandBLogger(env = env,
#                   entity = <wandb_account_entity>,
#                   project_name = <wandb_project_name>,
#                   run_name = <run_name>,
#                   group = 'Notebook_example',
#                   tags = ['tag1', 'tag2'],
#                   save_code = False,
#                   dump_frequency = 1000,
#                   artifact_save = True,
#                   artifact_type = 'output',
#                   excluded_info_keys = ['reward',
#                                         'action',
#                                         'timestep',
#                                         'month',
#                                         'day',
#                                         'hour',
#                                         'time_elapsed(hours)',
#                                         'reward_weight',
#                                         'is_raining'],
#                   excluded_episode_summary_keys = ['terminated',
#                                                    'truncated'])

# env.reset()
# truncated = terminated = False
# current_month = 0
# while not (terminated or truncated):
#     a = env.action_space.sample()
#     _,_,terminated,truncated,_=env.step(a)
# env.close()

Similar to CSVLogger, this wrapper requires the environment to have been previously encapsulated by a LoggerWrapper or any custom logger.

The user must have a pre-existing Weights and Biases account and have it correctly configured.

This wrapper does not override CSVLogger, so both can be applied simultaneously.

Multi-observation wrapper

This wrapper stacks observations in a history queue, whose size can be customized:

[ ]:
# Original environment
env = gym.make('Eplus-5zone-hot-discrete-v1')
obs, info = env.reset()
print('BEFORE MULTI OBSERVATION: ', obs)

# Multi-observation environment with a queue of size 5
env = MultiObsWrapper(env, n=5, flatten=True)
obs, info = env.reset()

The result is:

[ ]:
print('MULTI OBSERVATION: \n', obs)
env.close()
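With flatten=True, the stacked history is returned as a single flat vector, so its length should be n times the size of the base observation. A quick check, assuming the stacking is a simple concatenation:

[ ]:
# Quick sanity check (assumes flatten=True concatenates the n stacked
# observations into one flat vector).
base_size = env.unwrapped.observation_space.shape[0]
print('Base observation size: ', base_size)
print('Stacked observation size: ', len(obs), '(expected', 5 * base_size, ')')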

Weather forecasting wrapper

This wrapper adds weather forecast information to the current observation.

[ ]:
# Original environment
env = gym.make('Eplus-5zone-hot-discrete-v1')
obs, info = env.reset()
print('OBSERVATION VARIABLES BEFORE WEATHER FORECASTING: ',
      env.get_wrapper_attr('observation_variables'))
print('OBSERVATION BEFORE WEATHER FORECASTING: ', obs)

# Weather forecasting environment
env = WeatherForecastingWrapper(env, n=5, delta=1)
obs, info = env.reset()

We can observe the results:

[ ]:
print('OBSERVATION VARIABLES AFTER WEATHER FORECASTING: ',
      env.get_wrapper_attr('observation_variables'))

print('OBSERVATION AFTER WEATHER FORECASTING: ', obs)
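To see how many forecast values were appended, you can compare the length of the wrapped observation with the size of the underlying environment's observation space (a small illustrative check):

[ ]:
# Count how many forecast values the wrapper appended to the observation
# (illustrative; compares the wrapped observation length with the base space).
n_added = len(obs) - env.unwrapped.observation_space.shape[0]
print('Number of added forecast values: ', n_added)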

Energy cost wrapper

This wrapper adds energy cost information to the current observation:

[ ]:
# Original environment
env = gym.make('Eplus-5zone-hot-discrete-v1')
obs, info = env.reset()
print('OBSERVATION VARIABLES BEFORE ADDING ENERGY COST: \n',
      env.get_wrapper_attr('observation_variables'))
print('OBSERVATION VALUES BEFORE ADDING ENERGY COST: \n', obs)

# Energy Cost environment
env = EnergyCostWrapper(
    env, energy_cost_data_path='/workspaces/sinergym/sinergym/data/energy_cost/PVPC_active_energy_billing_Iberian_Peninsula_2023.csv')
obs, info = env.reset()

This is the result:

[ ]:
print('OBSERVATION VARIABLES AFTER ADDING ENERGY COST: \n',
      env.get_wrapper_attr('observation_variables'))
print('OBSERVATION VALUES AFTER ADDING ENERGY COST: \n', obs)

Nesting wrappers

All wrappers included in Sinergym are stackable and organized in layers. However, the order in which these layers are applied can affect the final result, depending on the wrappers being used.

For instance, applying the logger before normalizing differs from doing it in the reverse order. In the first case, the data will be logged without normalization, even though the agent will operate in a normalized environment. In the second case, the logger will capture the normalized values since it encapsulates the normalization applied by the previous layer.

An example of how to nest wrappers is shown below:

[ ]:
env = gym.make('Eplus-5zone-hot-continuous-v1')
env = MultiObjectiveReward(
    env=env,
    reward_terms=[
        'energy_term',
        'comfort_term'])
env = PreviousObservationWrapper(env, previous_variables=[
    'htg_setpoint',
    'clg_setpoint',
    'air_temperature'])
env = DatetimeWrapper(env)
env = DiscreteIncrementalWrapper(
    env, initial_values=[21.0, 25.0], delta_temp=2, step_temp=0.5)
env = NormalizeObservation(
    env=env)
env = LoggerWrapper(env=env)
env = MultiObsWrapper(env=env, n=5, flatten=True)

Now we can simply use the wrapped environment as follows:

[ ]:
for i in range(1):
    obs, info = env.reset()
    truncated = terminated = False
    current_month = 0
    while not (terminated or truncated):
        a = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(a)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', reward, info)
env.close()