2. Usage example

If you used our Dockerfile during installation, the try_env.py file will already be in your workspace when you enter the container. If you installed everything directly on your local machine, place the script inside the cloned repository. Either way, we assume you have a terminal with the appropriate Python version and a working Sinergym installation.

At this point in the documentation, we have explained how to install Sinergym, the environments it includes, the reward functions that can be defined, and the available wrappers and controllers.

In this section we will walk through some usage examples to better understand how these components can be combined in practice.

If you want to execute our notebooks on your own, they are available in the examples folder of the repository.

2.1. Simplest example

Let’s start with the simplest use case for the Sinergym tool. In the root of the repository we have the script try_env.py:

import gym
import numpy as np

import sinergym
from sinergym.utils.wrappers import LoggerWrapper

env = gym.make('Eplus-demo-v1')
env = LoggerWrapper(env)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

Importing sinergym is essential: without it, the IDs of our environments are not registered in the gym module, and therefore we cannot use our buildings as gym environments.
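
If you want to check which environments Sinergym has registered, you can inspect the gym registry. The snippet below is only a sketch: the registry API differs between gym versions, and gym.envs.registry.all() corresponds to the classic gym API used throughout these examples:

import gym

import sinergym  # registers the 'Eplus-...' environment IDs in gym

# List every registered environment whose ID starts with 'Eplus'
available_envs = [spec.id for spec in gym.envs.registry.all()
                  if spec.id.startswith('Eplus')]
print(available_envs)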

We create the environment with gym.make and run the simulation for one episode (for i in range(1)). We collect the rewards returned by the environment, print the accumulated reward every simulated month, and report the mean and cumulative reward at the end of the episode.

The action taken at each step is sampled randomly from the environment's action space, which is defined following the gym standard. When the episode finishes and the results have been displayed, we close the environment with env.close().
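
You can inspect the spaces directly to see what the agent observes and controls; this short sketch only uses standard gym attributes:

import gym

import sinergym

env = gym.make('Eplus-demo-v1')
print(env.observation_space)      # the observation space of the environment
print(env.action_space)           # the space random actions are sampled from
print(env.action_space.sample())  # an example of a random action
env.close()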

Note

We will use this simple example as a basis and will add new elements in the following examples in this section.

2.2. Adding a reward

By default, all our environment IDs use a linear reward function. This reward can be changed by adding the corresponding parameters to the constructor of the environment:

import gym
import numpy as np

import sinergym
from sinergym.utils.rewards import LinearReward, ExpReward

env = gym.make('Eplus-5Zone-hot-continuous-v1', reward=ExpReward, reward_kwargs={
                                                                    'temperature_variable': 'Zone Air Temperature (SPACE1-1)',
                                                                    'energy_variable': 'Facility Total HVAC Electricity Demand Rate (Whole Building)',
                                                                    'range_comfort_winter': (20.0, 23.5),
                                                                    'range_comfort_summer': (23.0, 26.0),
                                                                    'energy_weight': 0.1})
for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

Warning

When specifying a reward other than the environment ID's default with gym.make, it is very important to set the reward_kwargs that are required and therefore have no default value. For the rewards we have defined, the required ones are: temperature_variable(s), energy_variable, range_comfort_winter and range_comfort_summer. The rest have default values and do not need to be specified.

This example is exactly the same as the previous one, except that it uses a different criterion to compute the reward at each step of the simulation. If you run both scripts (fixing a seed for the random actions), you can see the difference in the reward values obtained.
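
The imported LinearReward can be configured in the same way. The following is only a sketch, assuming it accepts the same required reward_kwargs listed in the warning above; the energy_weight used here is an illustrative value:

import gym

import sinergym
from sinergym.utils.rewards import LinearReward

env = gym.make('Eplus-5Zone-hot-continuous-v1', reward=LinearReward, reward_kwargs={
                                                    'temperature_variable': 'Zone Air Temperature (SPACE1-1)',
                                                    'energy_variable': 'Facility Total HVAC Electricity Demand Rate (Whole Building)',
                                                    'range_comfort_winter': (20.0, 23.5),
                                                    'range_comfort_summer': (23.0, 26.0),
                                                    'energy_weight': 0.5})
env.close()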

2.3. Adding wrappers

By default, our environment IDs do not include any wrapper, but we can add them after the environment is created:

import gym
import numpy as np

import sinergym
from sinergym.utils.wrappers import LoggerWrapper, NormalizeObservation

env = gym.make('Eplus-5Zone-hot-continuous-v1')
env = NormalizeObservation(env)
env = LoggerWrapper(env)
# ... other wrappers could be stacked here

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

With this, we have added normalization to the observations returned by the environment, and the LoggerWrapper will make Sinergym store the interaction data in CSV files. For more information about how Sinergym organizes its output, please visit the section Output format.
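
Several wrappers can be stacked on the same environment. For example, assuming your Sinergym version also provides MultiObsWrapper in sinergym.utils.wrappers (which stacks the last n observations into a single one), a possible sketch is:

import gym

import sinergym
from sinergym.utils.wrappers import LoggerWrapper, MultiObsWrapper, NormalizeObservation

env = gym.make('Eplus-5Zone-hot-continuous-v1')
env = NormalizeObservation(env)
env = LoggerWrapper(env)
env = MultiObsWrapper(env, n=5)  # observations now contain the last 5 timesteps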

2.4. Using a rule-based controller

You can replace the random actions used in the previous examples with one of our rule-based controllers designed for this type of environment (the 5Zone IDF):

import gym
import numpy as np

import sinergym
from sinergym.utils.controllers import RBC5Zone

env = gym.make('Eplus-5Zone-hot-continuous-v1')

agent = RBC5Zone(env)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = agent.act(obs)
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

Note

You can also use our rule-based controller for the Datacenter buildings, called RBCDatacenter, if the environment is of that type, or a random agent called RandomController in any environment.
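
As a sketch, and assuming RandomController exposes the same act(observation) interface as the rule-based controllers, it can be used like this:

import gym

import sinergym
from sinergym.utils.controllers import RandomController

env = gym.make('Eplus-5Zone-hot-continuous-v1')
agent = RandomController(env)

obs = env.reset()
a = agent.act(obs)  # equivalent to sampling from the action space
obs, reward, done, info = env.step(a)
env.close()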

2.5. Overwriting some default values of the environments

In the same way that we can change the default reward function, as we have done in the second example, it is possible to substitute other default values of the environment ID.

You can change the weather file, the number of timesteps an action is repeated for (default 1), the number of most recent episodes stored in the Sinergym output folder (default 10), the name of the environment, or the weather variability in stochastic environments:

import gym
import numpy as np

import sinergym
from sinergym.utils.controllers import RBCDatacenter

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
                weather_file='ESP_Granada.084190_SWEC.epw',
                weather_variability=(1.0,0.0,0.001),
                env_name='new_env_name',
                act_repeat=4,
                max_ep_data_store_num=20)

agent = RBCDatacenter(env)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = agent.act(obs)
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

2.6. Overwriting observation and action spaces

By default, the predefined environment IDs in Sinergym already have an observation space and an action space set.

However, they can be overwritten with a new definition. On the one hand, we have to define the names of the variables and, on the other, the definition of the spaces (plus an action mapping if the environment is discrete).

import gym
import numpy as np

import sinergym

new_observation_variables=[
    'Site Outdoor Air Drybulb Temperature(Environment)',
    'Site Outdoor Air Relative Humidity(Environment)',
    'Site Wind Speed(Environment)',
    'Zone Thermal Comfort Fanger Model PPD(East Zone PEOPLE)',
    'Zone People Occupant Count(East Zone)',
    'People Air Temperature(East Zone PEOPLE)',
    'Facility Total HVAC Electricity Demand Rate(Whole Building)'
]

new_action_variables = [
    'West-HtgSetP-RL',
    'West-ClgSetP-RL',
    'East-HtgSetP-RL',
    'East-ClgSetP-RL'
]

new_observation_space = gym.spaces.Box(
    low=-5e6,
    high=5e6,
    # + 4 accounts for the date/time variables (year, month, day and hour)
    # that Sinergym adds to every observation
    shape=(len(new_observation_variables) + 4,),
    dtype=np.float32)

new_action_mapping = {
    0: (15, 30, 15, 30),
    1: (16, 29, 16, 29),
    2: (17, 28, 17, 28),
    3: (18, 27, 18, 27),
    4: (19, 26, 19, 26),
    5: (20, 25, 20, 25),
    6: (21, 24, 21, 24),
    7: (22, 23, 22, 23),
    8: (22, 22, 22, 22),
    9: (21, 21, 21, 21)
}

new_action_space = gym.spaces.Discrete(10)

env = gym.make('Eplus-datacenter-cool-discrete-stochastic-v1',
               observation_variables=new_observation_variables,
               observation_space=new_observation_space,
               action_variables=new_action_variables,
               action_mapping=new_action_mapping,
               action_space=new_action_space
            )


for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

If the definition contains some inconsistency, such as an IDF that has not been adapted to the new actions, spaces that do not fit the variables, or observation variables that do not exist, Sinergym will raise an error.
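
To illustrate what kind of consistency is expected, the following checks on the definitions of the script above must hold (they are only illustrative; Sinergym performs its own validation):

# Each tuple in the discrete action mapping provides one value per action variable
assert all(len(values) == len(new_action_variables)
           for values in new_action_mapping.values())

# The discrete action space has one entry per mapped action
assert new_action_space.n == len(new_action_mapping)

# The observation space shape matches the observation variables
# (plus the 4 date/time variables added by Sinergym)
assert new_observation_space.shape[0] == len(new_observation_variables) + 4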

2.7. Adding a new action definition

As explained in the previous example, one of the problems that can arise when modifying the observation and action spaces is that the IDF is not adapted to the new action space.

We may also want to modify how actions affect the building directly, without changing the action space at all. For example, we may want to change the zones assigned to each thermostat or set their values at the start of the simulation.

For this purpose, the Sinergym action definition is available. With a dictionary we can build a definition of what we want to be controlled in the building and how to control it using the action space of the environment:

import gym
import numpy as np

import sinergym

new_action_definition={
    'ThermostatSetpoint:DualSetpoint': [{
        'name': 'West-DualSetP-RL',
        'heating_name': 'West-HtgSetP-RL',
        'cooling_name': 'West-ClgSetP-RL',
        'heating_initial_value':21.0,
        'cooling_initial_value':25.0,
        'zones': ['West Zone']
    },
        {
        'name': 'East-DualSetP-RL',
        'heating_name': 'East-HtgSetP-RL',
        'cooling_name': 'East-ClgSetP-RL',
        'heating_initial_value':21.0,
        'cooling_initial_value':25.0,
        'zones': ['East Zone']
    }]
}

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
                action_definition=new_action_definition
                )

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

The heating and cooling names must match action variables defined in the environment. Otherwise, Sinergym will report the inconsistency.
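
As an illustrative check (not part of the Sinergym API), the following sketch verifies that every name referenced in the definition appears among the action variables used in the previous example:

action_variables = ['West-HtgSetP-RL', 'West-ClgSetP-RL',
                    'East-HtgSetP-RL', 'East-ClgSetP-RL']

for setpoint in new_action_definition['ThermostatSetpoint:DualSetpoint']:
    assert setpoint['heating_name'] in action_variables
    assert setpoint['cooling_name'] in action_variables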

For more information about the format of the action definition dictionaries, visit the section Action definition.

2.8. Adding extra configuration definition

You can even add a dictionary with extra parameters that modify the IDF before it is used in the simulations (or overwrite existing settings).

This new IDF version, which is also adapted to the weather file you specify, is saved in the Sinergym output folder, leaving the original intact:

import gym
import numpy as np

import sinergym

extra_conf={
    'timesteps_per_hour': 6,  # simulation timesteps per hour (one control step every 10 minutes)
    'runperiod': (1, 1, 1991, 2, 1, 1992)  # custom run period; see the Extra configuration section for the tuple format
}

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
                config_params=extra_conf
                )

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

Note

For more information on how each of the elements explained here works, please see the appropriate section.

Note

To see how Sinergym can be combined with DRL algorithms, please visit the section Deep Reinforcement Learning Integration of our documentation (specifically, the DRL_battery.py script in the section How use).
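
As a very brief preview (this is not the DRL_battery.py script, just a minimal sketch assuming stable-baselines3 is installed), a Sinergym environment can be trained like any other gym environment:

import gym

import sinergym
from stable_baselines3 import PPO

env = gym.make('Eplus-demo-v1')
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=35040)  # roughly one simulated year at 4 timesteps per hour (assumption)
env.close()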

Note

Our team provides several notebooks with more functionality and examples; visit the examples section.