2. Usage example
If you used our Dockerfile during installation, the try_env.py file will already be in your workspace as soon as you enter the container. If you installed everything directly on your local machine, place it inside the cloned repository. In either case, we assume you have a terminal with the appropriate Python version and a working Sinergym installation.
At this point of the documentation, we have explained how to install Sinergym, the environments it includes, the reward functions we can define, wrappers and controllers.
In this section we will go through some usage examples to better understand how these pieces can be combined in practice.
If you want to run our notebooks on your own, they are available in the examples folder.
2.1. Simplest example
Let’s start with the simplest use case for the Sinergym tool. In the root of the repository we have the script try_env.py:
import gym
import numpy as np

import sinergym
from sinergym.utils.wrappers import LoggerWrapper

# Create the environment and wrap it with a logger that records the interaction
env = gym.make('Eplus-demo-v1')
env = LoggerWrapper(env)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
The sinergym import is essential: without it, our environment IDs are not registered in gym, and therefore we could not use our buildings as gym environments.
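If you want to check which IDs the import has registered, here is a quick, version-dependent sketch (the registry interface belongs to gym, not to Sinergym, so adapt it to the gym version you have installed):

import gym
import sinergym  # registers the 'Eplus-...' IDs as a side effect of the import

# Illustrative only: list the Sinergym environment IDs known to gym.
try:
    all_ids = [spec.id for spec in gym.envs.registry.all()]  # older gym releases
except AttributeError:
    all_ids = list(gym.envs.registry.keys())  # newer releases expose a dict registry

print([env_id for env_id in all_ids if env_id.startswith('Eplus')])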
We create our environment with gym.make and run the simulation for one episode (for i in range(1)). We collect the rewards returned by the environment and compute their average for each month of simulation.
The action taken at each step is sampled randomly from the action space, which is defined following the gym standard. Once the episode has finished and the results have been displayed, we close the environment with env.close().
Note
We will use this simple example as a basis and will add new elements in the following examples in this section.
2.2. Adding a reward
By default, all our environment IDs use a default (linear) reward function. This reward can be changed by adding the reward parameter to the constructor of our environment:
import gym
import numpy as np

import sinergym
from sinergym.utils.rewards import LinearReward, ExpReward

env = gym.make('Eplus-5Zone-hot-continuous-v1',
               reward=ExpReward,
               reward_kwargs={
                   'temperature_variable': 'Zone Air Temperature (SPACE1-1)',
                   'energy_variable': 'Facility Total HVAC Electricity Demand Rate (Whole Building)',
                   'range_comfort_winter': (20.0, 23.5),
                   'range_comfort_summer': (23.0, 26.0),
                   'energy_weight': 0.1})

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
Warning
When specifying a reward different from the environment ID's default with gym.make, it is very important to set the reward_kwargs that are required and therefore have no default value. For the rewards we have defined, these are: temperature_variable(s), energy_variable, range_comfort_winter and range_comfort_summer. The rest have default values and do not need to be specified.
This example is exactly the same as the previous one, except that it uses a different criterion to compute the reward at each step of the simulation. If you run both scripts (fixing a seed for the randomization), you can see the difference in the reward values obtained.
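For instance, if you prefer to keep the linear reward but weight energy consumption more heavily, a sketch along these lines should work (we assume here that LinearReward accepts the same required reward_kwargs plus the optional energy_weight):

import gym
import sinergym
from sinergym.utils.rewards import LinearReward

# Sketch: linear reward with a higher weight on energy consumption.
env = gym.make('Eplus-5Zone-hot-continuous-v1',
               reward=LinearReward,
               reward_kwargs={
                   'temperature_variable': 'Zone Air Temperature (SPACE1-1)',
                   'energy_variable': 'Facility Total HVAC Electricity Demand Rate (Whole Building)',
                   'range_comfort_winter': (20.0, 23.5),
                   'range_comfort_summer': (23.0, 26.0),
                   'energy_weight': 0.8})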
2.3. Adding wrappers
By default, the ID’s of our environments do not include any wrapper, but we can add them after the creation of the environment:
import gym
import numpy as np

import sinergym
from sinergym.utils.wrappers import LoggerWrapper, NormalizeObservation

env = gym.make('Eplus-5Zone-hot-continuous-v1')
env = NormalizeObservation(env)
env = LoggerWrapper(env)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
With this, the observations returned by the environment are normalized, and Sinergym also stores the interaction outputs in CSV files. For more information about how Sinergym organizes its output, please visit the section Output format.
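As a quick sanity check (illustrative only, assuming NormalizeObservation scales each variable using its known value range), you can inspect the observation returned right after a reset:

# The wrapped reset() should return values scaled by the wrapper
# (roughly into [0, 1] for each variable under the default ranges).
obs = env.reset()
print('obs min:', obs.min(), '- obs max:', obs.max())
env.close()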
2.4. Using a rule-based controller
You can replace the random actions used in the previous examples with one of our rule-based controllers for that type of environment (5Zone IDF):
import gym
import numpy as np

import sinergym
from sinergym.utils.controllers import RBC5Zone

env = gym.make('Eplus-5Zone-hot-continuous-v1')
agent = RBC5Zone(env)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = agent.act(obs)
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
Note
You can also use our rule-based controller for the Datacenter building, called RBCDatacenter, if the environment is of that type, or the random agent called RandomController in any environment (see the sketch below).
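A minimal sketch with the random agent, assuming RandomController exposes the same act(obs) interface as the rule-based controllers:

import gym
import sinergym
from sinergym.utils.controllers import RandomController

env = gym.make('Eplus-demo-v1')
agent = RandomController(env)

obs = env.reset()
# We assume act(obs) simply samples a random action from env.action_space.
a = agent.act(obs)
obs, reward, done, info = env.step(a)
env.close()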
2.5. Overwriting some default values of the environments
In the same way that we changed the default reward function in the second example, it is possible to overwrite other default values of the environment ID.
You can change the weather file, the number of timesteps each action is repeated (1 by default), the number of last episodes stored in the Sinergym output folder (10 by default), the name of the environment, or the weather variability in stochastic environments:
import gym
import numpy as np

import sinergym
from sinergym.utils.controllers import RBCDatacenter

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
               weather_file='ESP_Granada.084190_SWEC.epw',
               weather_variability=(1.0, 0.0, 0.001),
               env_name='new_env_name',
               act_repeat=4,
               max_ep_data_store_num=20)
agent = RBCDatacenter(env)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = agent.act(obs)
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
2.6. Overwriting observation and action spaces
By default, the predefined environment IDs in Sinergym already come with an observation space and an action space.
However, these can be overwritten with new definitions. On the one hand, you have to define the names of the variables, and on the other, the spaces themselves (plus an action mapping if the environment is discrete).
import gym
import numpy as np

import sinergym

new_observation_variables = [
    'Site Outdoor Air Drybulb Temperature(Environment)',
    'Site Outdoor Air Relative Humidity(Environment)',
    'Site Wind Speed(Environment)',
    'Zone Thermal Comfort Fanger Model PPD(East Zone PEOPLE)',
    'Zone People Occupant Count(East Zone)',
    'People Air Temperature(East Zone PEOPLE)',
    'Facility Total HVAC Electricity Demand Rate(Whole Building)'
]

new_action_variables = [
    'West-HtgSetP-RL',
    'West-ClgSetP-RL',
    'East-HtgSetP-RL',
    'East-ClgSetP-RL'
]

new_observation_space = gym.spaces.Box(
    low=-5e6,
    high=5e6,
    shape=(len(new_observation_variables) + 4,),
    dtype=np.float32)

new_action_mapping = {
    0: (15, 30, 15, 30),
    1: (16, 29, 16, 29),
    2: (17, 28, 17, 28),
    3: (18, 27, 18, 27),
    4: (19, 26, 19, 26),
    5: (20, 25, 20, 25),
    6: (21, 24, 21, 24),
    7: (22, 23, 22, 23),
    8: (22, 22, 22, 22),
    9: (21, 21, 21, 21)
}

new_action_space = gym.spaces.Discrete(10)

env = gym.make('Eplus-datacenter-cool-discrete-stochastic-v1',
               observation_variables=new_observation_variables,
               observation_space=new_observation_space,
               action_variables=new_action_variables,
               action_mapping=new_action_mapping,
               action_space=new_action_space)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
If the definition contains an inconsistency (for example, the IDF has not been adapted to the new actions, the spaces do not match the variables, or an observation variable does not exist), Sinergym will raise an error.
2.7. Adding a new action definition
As explained in the previous example, one of the problems that can arise when modifying the observation and action spaces is that the IDF has not been adapted to the new action space.
We may also want to modify how actions affect the building directly, without changing the action space at all. For example, we may want to change the zones assigned to each thermostat, or set their values at the start of the simulation.
For this purpose, the Sinergym action definition is available. With a dictionary we can define what we want to control in the building and how to control it using the environment's action space:
import gym
import numpy as np

import sinergym

new_action_definition = {
    'ThermostatSetpoint:DualSetpoint': [
        {
            'name': 'West-DualSetP-RL',
            'heating_name': 'West-HtgSetP-RL',
            'cooling_name': 'West-ClgSetP-RL',
            'heating_initial_value': 21.0,
            'cooling_initial_value': 25.0,
            'zones': ['West Zone']
        },
        {
            'name': 'East-DualSetP-RL',
            'heating_name': 'East-HtgSetP-RL',
            'cooling_name': 'East-ClgSetP-RL',
            'heating_initial_value': 21.0,
            'cooling_initial_value': 25.0,
            'zones': ['East Zone']
        }
    ]
}

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
               action_definition=new_action_definition)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
The heating and cooling names must correspond to action variables defined in the environment; otherwise, Sinergym will report the inconsistency.
For more information about the format of the action definition dictionaries, visit the section Action definition.
2.8. Adding extra configuration definition
You can also add a dictionary of extra parameters that modify the IDF before it is used in the simulations (or overwrite existing settings).
This new IDF version, which is also adapted to the weather file you specify, is saved in the Sinergym output folder, leaving the original intact:
import gym
import numpy as np

import sinergym

extra_conf = {
    'timesteps_per_hour': 6,
    'runperiod': (1, 1, 1991, 2, 1, 1992)
}

env = gym.make('Eplus-datacenter-cool-continuous-stochastic-v1',
               config_params=extra_conf)

for i in range(1):
    obs = env.reset()
    rewards = []
    done = False
    current_month = 0
    while not done:
        a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()
Note
For more information on how each of the elements explained here works, please see the appropriate section.
Note
To see how Sinergym can be combined with DRL algorithms, please visit the Deep Reinforcement Learning Integration section of our documentation (specifically the DRL_battery.py script in the How use section).
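As a small taste of that integration, here is a minimal sketch, assuming stable-baselines3 (the library used by DRL_battery.py) is installed; it is not a substitute for the full training workflow described there:

import gym
import sinergym
from stable_baselines3 import PPO

# Illustrative only: train a PPO agent on a Sinergym environment for a few steps.
env = gym.make('Eplus-5Zone-hot-continuous-v1')
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)
env.close()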
Note
Our team provides several notebooks with more functionality and examples; visit the examples section.