Changing an environment registered in Sinergym
As discussed above, Sinergym has a list of available environments that we can call with gym.make(<environment_id>)
as long as the sinergym package is imported into the Python script.
[1]:
import gym
import numpy as np
import sinergym
env = gym.make('Eplus-5Zone-hot-continuous-stochastic-v1')
/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:73: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(
[2022-08-24 08:55:54,216] EPLUS_ENV_5Zone-hot-continuous-stochastic-v1_MainThread_ROOT INFO:Updating idf ExternalInterface object if it is not present...
[2022-08-24 08:55:54,217] EPLUS_ENV_5Zone-hot-continuous-stochastic-v1_MainThread_ROOT INFO:Updating idf Site:Location and SizingPeriod:DesignDay(s) to weather and ddy file...
[2022-08-24 08:55:54,219] EPLUS_ENV_5Zone-hot-continuous-stochastic-v1_MainThread_ROOT INFO:Updating idf OutPut:Variable and variables XML tree model for BVCTB connection.
[2022-08-24 08:55:54,220] EPLUS_ENV_5Zone-hot-continuous-stochastic-v1_MainThread_ROOT INFO:Setting up extra configuration in building model if exists...
/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:73: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(
These environment IDs have a number of components already defined: not only the building design (IDF), but also the reward function, the action and observation spaces, the variables involved, etc.
If you want a completely new environment, you can define it from scratch in our environment list and run it locally.
Another option (recommended) is to start from one of our environments and change only the components you need; of course, several changes can be combined at once.
The way to do that is to add to gym.make(<environment_id>)
the parameters of the Sinergym environment constructor that you want to override. Let's see what can be changed starting from any environment:
Adding a new reward
As mentioned above, simply add the appropriate parameters to gym.make()
after specifying the environment ID.
[2]:
from sinergym.utils.rewards import LinearReward, ExpReward

env = gym.make('Eplus-5Zone-hot-continuous-v1',
               reward=ExpReward,
               reward_kwargs={
                   'temperature_variable': 'Zone Air Temperature (SPACE1-1)',
                   'energy_variable': 'Facility Total HVAC Electricity Demand Rate (Whole Building)',
                   'range_comfort_winter': (20.0, 23.5),
                   'range_comfort_summer': (23.0, 26.0),
                   'energy_weight': 0.1})
[2022-08-24 08:55:55,088] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Updating idf ExternalInterface object if it is not present...
[2022-08-24 08:55:55,089] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Updating idf Site:Location and SizingPeriod:DesignDay(s) to weather and ddy file...
[2022-08-24 08:55:55,091] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Updating idf OutPut:Variable and variables XML tree model for BVCTB connection.
[2022-08-24 08:55:55,093] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Setting up extra configuration in building model if exists...
/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:73: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(
You have to specify the reward class that the environment will use. Each reward class has several parameters that the user can set (temperature variables, weights, etc.), depending on the reward function chosen; these parameters are passed through the reward_kwargs argument. See the reward documentation for more information about the reward classes and how to create a new one.
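For illustration, the following sketch outlines what a custom reward class might look like. It is only a sketch under the assumption that a reward class receives the environment plus the reward_kwargs in its constructor and that its __call__() method returns the reward value together with a dictionary of terms; the names used here (EnergyOnlyReward, obs_dict) are hypothetical, so check the reward documentation for the exact interface before using something like this.
[ ]:
# Hypothetical custom reward class (not part of Sinergym): it penalizes only
# the energy consumption and ignores comfort.
class EnergyOnlyReward():

    def __init__(self, env, energy_variable, lambda_energy=1e-4):
        self.env = env
        self.energy_name = energy_variable
        self.lambda_energy = lambda_energy

    def __call__(self):
        # Assumed: the environment exposes the last observation as a dict
        energy = self.env.obs_dict[self.energy_name]
        reward = - self.lambda_energy * energy
        return reward, {'total_energy': energy}

env = gym.make('Eplus-5Zone-hot-continuous-v1',
               reward=EnergyOnlyReward,
               reward_kwargs={
                   'energy_variable': 'Facility Total HVAC Electricity Demand Rate (Whole Building)'})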
Adding other new components to the environment
In the same way that we changed the default reward function in the previous example, it is possible to override other default values of the environment ID.
You can change the weather file, the number of timesteps each action is repeated (default 1), the number of last episodes stored in the Sinergym output folder (default 10), the name of the environment, or the weather variability in stochastic environments:
[3]:
env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
               weather_file='ESP_Granada.084190_SWEC.epw',
               weather_variability=(1.0, 0.0, 0.001),
               env_name='new_env_name',
               act_repeat=4,
               max_ep_data_store_num=20)
[2022-08-24 08:55:56,077] EPLUS_ENV_new_env_name_MainThread_ROOT INFO:Updating idf ExternalInterface object if it is not present...
[2022-08-24 08:55:56,078] EPLUS_ENV_new_env_name_MainThread_ROOT INFO:Updating idf Site:Location and SizingPeriod:DesignDay(s) to weather and ddy file...
[2022-08-24 08:55:56,080] EPLUS_ENV_new_env_name_MainThread_ROOT INFO:Updating idf OutPut:Variable and variables XML tree model for BVCTB connection.
[2022-08-24 08:55:56,082] EPLUS_ENV_new_env_name_MainThread_ROOT INFO:Setting up extra configuration in building model if exists...
/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:73: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(
Changing observation and action spaces
By default, each predefined environment ID in Sinergym already comes with an observation space and an action space.
However, they can be overwritten with new definitions. On the one hand, you have to define the names of the variables, and on the other hand, the spaces themselves (plus an action mapping if the environment is discrete).
[4]:
import gym
import numpy as np
import sinergym
new_observation_variables = [
    'Site Outdoor Air Drybulb Temperature(Environment)',
    'Site Outdoor Air Relative Humidity(Environment)',
    'Site Wind Speed(Environment)',
    'Zone Thermal Comfort Fanger Model PPD(East Zone PEOPLE)',
    'Zone People Occupant Count(East Zone)',
    'People Air Temperature(East Zone PEOPLE)',
    'Facility Total HVAC Electricity Demand Rate(Whole Building)'
]

new_action_variables = [
    'West-HtgSetP-RL',
    'West-ClgSetP-RL',
    'East-HtgSetP-RL',
    'East-ClgSetP-RL'
]

# The shape adds 4 to the number of variables because Sinergym appends the
# date/time elements (year, month, day, hour) to the observation
new_observation_space = gym.spaces.Box(
    low=-5e6,
    high=5e6,
    shape=(len(new_observation_variables) + 4,),
    dtype=np.float32)

# Each discrete action maps, in order, to the action variables defined above:
# (West heating, West cooling, East heating, East cooling) setpoints
new_action_mapping = {
    0: (15, 30, 15, 30),
    1: (16, 29, 16, 29),
    2: (17, 28, 17, 28),
    3: (18, 27, 18, 27),
    4: (19, 26, 19, 26),
    5: (20, 25, 20, 25),
    6: (21, 24, 21, 24),
    7: (22, 23, 22, 23),
    8: (22, 22, 22, 22),
    9: (21, 21, 21, 21)
}

new_action_space = gym.spaces.Discrete(10)

env = gym.make('Eplus-datacenter-cool-discrete-stochastic-v1',
               observation_variables=new_observation_variables,
               observation_space=new_observation_space,
               action_variables=new_action_variables,
               action_mapping=new_action_mapping,
               action_space=new_action_space)
for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
[2022-08-24 08:55:56,850] EPLUS_ENV_datacenter-cool-discrete-stochastic-v1_MainThread_ROOT INFO:Updating idf ExternalInterface object if it is not present...
[2022-08-24 08:55:56,850] EPLUS_ENV_datacenter-cool-discrete-stochastic-v1_MainThread_ROOT INFO:Updating idf Site:Location and SizingPeriod:DesignDay(s) to weather and ddy file...
[2022-08-24 08:55:56,852] EPLUS_ENV_datacenter-cool-discrete-stochastic-v1_MainThread_ROOT INFO:Updating idf OutPut:Variable and variables XML tree model for BVCTB connection.
[2022-08-24 08:55:56,853] EPLUS_ENV_datacenter-cool-discrete-stochastic-v1_MainThread_ROOT INFO:Setting up extra configuration in building model if exists...
[2022-08-24 08:55:56,854] EPLUS_ENV_datacenter-cool-discrete-stochastic-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 08:55:56,989] EPLUS_ENV_datacenter-cool-discrete-stochastic-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-datacenter-cool-discrete-stochastic-v1-res1/Eplus-env-sub_run1
Reward: -0.18562950733969957 {'timestep': 1, 'time_elapsed': 900, 'year': 1991, 'month': 1, 'day': 1, 'hour': 0, 'total_power': 3712.590146793991, 'total_power_no_units': -0.37125901467939915, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 1.495266997755121, 'action_': [16, 29, 16, 29]}
Reward: -676.8466963812559 {'timestep': 2976, 'time_elapsed': 2678400, 'year': 1991, 'month': 2, 'day': 1, 'hour': 0, 'total_power': 3962.959903637626, 'total_power_no_units': -0.3962959903637626, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 6.479715547683856, 'action_': [21, 21, 21, 21]}
Reward: -1251.9649146249287 {'timestep': 5664, 'time_elapsed': 5097600, 'year': 1991, 'month': 3, 'day': 1, 'hour': 0, 'total_power': 4984.806746157207, 'total_power_no_units': -0.49848067461572076, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 3.083224933271658, 'action_': [22, 23, 22, 23]}
Reward: -1856.1854810210355 {'timestep': 8640, 'time_elapsed': 7776000, 'year': 1991, 'month': 4, 'day': 1, 'hour': 0, 'total_power': 2542.049288786898, 'total_power_no_units': -0.25420492887868984, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 4.548597527242591, 'action_': [15, 30, 15, 30]}
Reward: -2639.3977262984713 {'timestep': 11520, 'time_elapsed': 10368000, 'year': 1991, 'month': 5, 'day': 1, 'hour': 0, 'total_power': 4389.048609776261, 'total_power_no_units': -0.43890486097762615, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 9.966961794846867, 'action_': [20, 25, 20, 25]}
Reward: -3540.0364337521905 {'timestep': 14496, 'time_elapsed': 13046400, 'year': 1991, 'month': 6, 'day': 1, 'hour': 0, 'total_power': 3962.959903637626, 'total_power_no_units': -0.3962959903637626, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 9.932837191979614, 'action_': [22, 23, 22, 23]}
Reward: -4966.8910724773705 {'timestep': 17376, 'time_elapsed': 15638400, 'year': 1991, 'month': 7, 'day': 1, 'hour': 0, 'total_power': 3641.533364330611, 'total_power_no_units': -0.36415333643306114, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 10.73759114382434, 'action_': [16, 29, 16, 29]}
Reward: -6970.456022185655 {'timestep': 20352, 'time_elapsed': 18316800, 'year': 1991, 'month': 8, 'day': 1, 'hour': 0, 'total_power': 7437.450634930504, 'total_power_no_units': -0.7437450634930505, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 12.01780444446879, 'action_': [21, 21, 21, 21]}
Reward: -9176.457187552995 {'timestep': 23328, 'time_elapsed': 20995200, 'year': 1991, 'month': 9, 'day': 1, 'hour': 0, 'total_power': 5205.826494047731, 'total_power_no_units': -0.5205826494047732, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 11.76682020349629, 'action_': [16, 29, 16, 29]}
Reward: -10831.128614483414 {'timestep': 26208, 'time_elapsed': 23587200, 'year': 1991, 'month': 10, 'day': 1, 'hour': 0, 'total_power': 3632.325833964732, 'total_power_no_units': -0.3632325833964732, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 7.408750045244985, 'action_': [16, 29, 16, 29]}
Reward: -11597.451824735388 {'timestep': 29184, 'time_elapsed': 26265600, 'year': 1991, 'month': 11, 'day': 1, 'hour': 0, 'total_power': 3545.041301484721, 'total_power_no_units': -0.3545041301484721, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 5.470256190305543, 'action_': [18, 27, 18, 27]}
Reward: -12205.38913036408 {'timestep': 32064, 'time_elapsed': 28857600, 'year': 1991, 'month': 12, 'day': 1, 'hour': 0, 'total_power': 4171.474946040738, 'total_power_no_units': -0.41714749460407385, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': 4.252032348441579, 'action_': [22, 23, 22, 23]}
Reward: -12852.455174221925 {'timestep': 35040, 'time_elapsed': 31536000, 'year': 1992, 'month': 1, 'day': 1, 'hour': 0, 'total_power': 5946.285403368743, 'total_power_no_units': -0.5946285403368743, 'comfort_penalty': -0.0, 'abs_comfort': 0.0, 'temperatures': [], 'out_temperature': -0.01419754187374211, 'action_': [21, 24, 21, 24]}
Episode 0 Mean reward: -0.36679381204974576 Cumulative reward: -12852.455174221925
[2022-08-24 08:56:26,964] EPLUS_ENV_datacenter-cool-discrete-stochastic-v1_MainThread_ROOT INFO:EnergyPlus simulation closed successfully.
If the definition contains any inconsistency (for example, the IDF has not been adapted to the new actions, the spaces do not fit the variables, or the observation variables do not exist), Sinergym will raise an error.
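For instance, one way to see how such an inconsistency surfaces is to build the environment with a deliberately wrong space and catch the exception. This is only a sketch; whether the error appears at gym.make() or at the first reset() depends on the check that fails:
[ ]:
# Sketch: an observation space whose shape ignores the variable list,
# reusing the definitions from the previous cell.
try:
    env = gym.make('Eplus-datacenter-cool-discrete-stochastic-v1',
                   observation_variables=new_observation_variables,
                   observation_space=gym.spaces.Box(
                       low=-5e6, high=5e6, shape=(3,), dtype=np.float32),
                   action_variables=new_action_variables,
                   action_mapping=new_action_mapping,
                   action_space=new_action_space)
    env.reset()
    env.close()
except Exception as error:
    print('Inconsistent environment definition:', error)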
Updating the action definition of the environment
As explained in the previous example, one of the problems that can arise when modifying the observation and action spaces is that the IDF is not adapted to the new action space.
We may also want to modify directly how actions affect the building without changing the action space at all; for example, changing the zones assigned to each thermostat or their setpoint values at the start of the simulation.
For this purpose, Sinergym provides the action definition. With a dictionary we can define what we want to control in the building and how to control it through the environment's action space:
[ ]:
import gym
import numpy as np
new_action_definition = {
    'ThermostatSetpoint:DualSetpoint': [{
        'name': 'West-DualSetP-RL',
        'heating_name': 'West-HtgSetP-RL',
        'cooling_name': 'West-ClgSetP-RL',
        'heating_initial_value': 21.0,
        'cooling_initial_value': 25.0,
        'zones': ['West Zone']
    },
        {
        'name': 'East-DualSetP-RL',
        'heating_name': 'East-HtgSetP-RL',
        'cooling_name': 'East-ClgSetP-RL',
        'heating_initial_value': 21.0,
        'cooling_initial_value': 25.0,
        'zones': ['East Zone']
    }]
}

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
               action_definition=new_action_definition)
for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
The heating_name and cooling_name values must match action variables defined in the environment; otherwise, Sinergym will report the inconsistency.
For more information about the format of the action definition dictionaries, visit the section called action definition.
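For instance, if you also rename the action variables, the action definition has to reference exactly the same names in heating_name and cooling_name. The following sketch (with hypothetical variable names ending in -custom) combines both parameters consistently; the action space bounds are illustrative and should match the setpoint ranges your IDF accepts:
[ ]:
# Hypothetical renamed action variables; the action definition below must
# reference exactly these names.
custom_action_variables = [
    'West-HtgSetP-custom',
    'West-ClgSetP-custom',
    'East-HtgSetP-custom',
    'East-ClgSetP-custom'
]
custom_action_definition = {
    'ThermostatSetpoint:DualSetpoint': [{
        'name': 'West-DualSetP-custom',
        'heating_name': 'West-HtgSetP-custom',
        'cooling_name': 'West-ClgSetP-custom',
        'heating_initial_value': 21.0,
        'cooling_initial_value': 25.0,
        'zones': ['West Zone']
    },
        {
        'name': 'East-DualSetP-custom',
        'heating_name': 'East-HtgSetP-custom',
        'cooling_name': 'East-ClgSetP-custom',
        'heating_initial_value': 21.0,
        'cooling_initial_value': 25.0,
        'zones': ['East Zone']
    }]
}
# Illustrative continuous bounds for (West htg, West clg, East htg, East clg)
custom_action_space = gym.spaces.Box(
    low=np.array([15.0, 22.5, 15.0, 22.5]),
    high=np.array([22.5, 30.0, 22.5, 30.0]),
    dtype=np.float32)

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
               action_variables=custom_action_variables,
               action_space=custom_action_space,
               action_definition=custom_action_definition)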
Adding more extra configuration
You can even add a dictionary with extra parameters that directly update the IDF with new simulation settings.
This new IDF version, which is also adapted to the weather file you specify, is saved in the Sinergym output folder, leaving the original intact:
[ ]:
extra_conf = {
    'timesteps_per_hour': 6,
    'runperiod': (1, 1, 1991, 2, 1, 1992),
}

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
               config_params=extra_conf)
For more information about extra configuration parameters, see our Extra configuration documentation.
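Finally, remember that all of these constructor parameters can be combined in a single gym.make() call, as noted at the beginning of this section. The following cell is simply a sketch that reuses values from the previous examples:
[ ]:
# Sketch combining several overrides (reward, weather, action repetition and
# extra configuration) in one call, reusing values from previous examples.
env = gym.make('Eplus-5Zone-hot-continuous-stochastic-v1',
               reward=ExpReward,
               reward_kwargs={
                   'temperature_variable': 'Zone Air Temperature (SPACE1-1)',
                   'energy_variable': 'Facility Total HVAC Electricity Demand Rate (Whole Building)',
                   'range_comfort_winter': (20.0, 23.5),
                   'range_comfort_summer': (23.0, 26.0),
                   'energy_weight': 0.1},
               weather_file='ESP_Granada.084190_SWEC.epw',
               weather_variability=(1.0, 0.0, 0.001),
               act_repeat=4,
               config_params={'timesteps_per_hour': 6})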