Rule-based controller example
Let’s try a simple rule-based controller (RBC) in a Sinergym environment.
First, we import all the necessary libraries. Remember to always import sinergym, even if it appears unused, since importing it is what registers the environments.
[8]:
from typing import List, Any, Sequence
from sinergym.utils.constants import YEAR
from datetime import datetime
import gymnasium as gym
import numpy as np
import sinergym
Now we can define the environment:
[9]:
env = gym.make('Eplus-5zone-hot-continuous-v1')
#==============================================================================================#
[ENVIRONMENT] (INFO) : Creating Gymnasium environment.
[ENVIRONMENT] (INFO) : Name: Eplus-5zone-hot-continuous-v1
#==============================================================================================#
[MODEL] (INFO) : Working directory created: /workspaces/sinergym/examples/Eplus-5zone-hot-continuous-v1-res1
[MODEL] (INFO) : Model Config is correct.
[MODEL] (INFO) : Building model Output:Variable updated with defined variable names.
[MODEL] (INFO) : Updated building model Output:Meter with meter names.
[MODEL] (INFO) : Runperiod established.
[MODEL] (INFO) : Episode length (seconds): 31536000.0
[MODEL] (INFO) : timestep size (seconds): 900.0
[MODEL] (INFO) : timesteps per episode: 35040
[REWARD] (INFO) : Reward function initialized.
[ENVIRONMENT] (INFO) : Environment created successfully.
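The controller defined below maps each flat observation onto the environment’s observation variable names and clips its setpoints to the action space bounds, so it can be useful to inspect both beforehand. A minimal sketch, using the same get_wrapper_attr call the controller relies on:

# Observation variable names (used to build obs_dict inside the controller below)
print(env.get_wrapper_attr('observation_variables'))
# Action space bounds used later for setpoint clipping, ordered (heating, cooling)
print(env.action_space.low, env.action_space.high)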
You should check out the available list of pre-defined RBCs.
In this example, we extend the pre-defined RBC5Zone controller by defining a custom act method:
[10]:
from sinergym.utils.controllers import RBC5Zone
class MyRuleBasedController(RBC5Zone):

    def act(self, observation: List[Any]) -> Sequence[Any]:
        """Select action based on indoor air temperature, season and time of day.

        Args:
            observation (List[Any]): Perceived observation.

        Returns:
            Sequence[Any]: Action chosen.
        """
        # Map the flat observation onto its variable names
        obs_dict = dict(zip(self.env.get_wrapper_attr(
            'observation_variables'), observation))

        day = int(obs_dict['day_of_month'])
        month = int(obs_dict['month'])
        hour = int(obs_dict['hour'])
        year = int(obs_dict['year'] if obs_dict.get('year', False) else YEAR)

        summer_start_date = datetime(year, 6, 1)
        summer_final_date = datetime(year, 9, 30)
        current_dt = datetime(year, month, day)

        # Get season comfort range
        if summer_start_date <= current_dt <= summer_final_date:
            season_comfort_range = self.setpoints_summer
        else:
            season_comfort_range = self.setpoints_winter

        # Update setpoints depending on the indoor air temperature
        in_temp = obs_dict['air_temperature']
        current_heat_setpoint = obs_dict['htg_setpoint']
        current_cool_setpoint = obs_dict['clg_setpoint']
        new_heat_setpoint = current_heat_setpoint
        new_cool_setpoint = current_cool_setpoint

        if in_temp < season_comfort_range[0]:
            new_heat_setpoint = current_heat_setpoint + 1
            new_cool_setpoint = current_cool_setpoint + 1
        elif in_temp > season_comfort_range[1]:
            new_cool_setpoint = current_cool_setpoint - 1
            new_heat_setpoint = current_heat_setpoint - 1

        # Clip setpoints to the action space
        if new_heat_setpoint > self.env.get_wrapper_attr('action_space').high[0]:
            new_heat_setpoint = self.env.get_wrapper_attr('action_space').high[0]
        if new_heat_setpoint < self.env.get_wrapper_attr('action_space').low[0]:
            new_heat_setpoint = self.env.get_wrapper_attr('action_space').low[0]
        if new_cool_setpoint > self.env.get_wrapper_attr('action_space').high[1]:
            new_cool_setpoint = self.env.get_wrapper_attr('action_space').high[1]
        if new_cool_setpoint < self.env.get_wrapper_attr('action_space').low[1]:
            new_cool_setpoint = self.env.get_wrapper_attr('action_space').low[1]

        action = (new_heat_setpoint, new_cool_setpoint)

        # Weekend (Saturday=5, Sunday=6) or night (22:00-06:00): fixed setback setpoints
        if current_dt.weekday() >= 5 or hour >= 22 or hour < 6:
            action = (18.33, 23.33)

        return np.array(action, dtype=np.float32)
Note: ``action`` must be a NumPy array for compatibility with the Gymnasium API.
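As a side note, the four clipping conditionals in act could be written more compactly with np.clip. This is only an equivalent sketch of that fragment, assuming the action space bounds are ordered (heating, cooling) as above:

# Equivalent clipping of both setpoints against the action space bounds (inside act)
low = self.env.get_wrapper_attr('action_space').low
high = self.env.get_wrapper_attr('action_space').high
action = np.clip(
    np.array([new_heat_setpoint, new_cool_setpoint], dtype=np.float32),
    low, high)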
Now that our controller is ready, we can use it as follows:
[11]:
# Create rule-based controller
agent = MyRuleBasedController(env)
for i in range(1):
    obs, info = env.reset()
    rewards = []
    truncated = terminated = False
    current_month = 0
    while not (terminated or truncated):
        action = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode.
[ENVIRONMENT] (INFO) : Episode 1: Eplus-5zone-hot-continuous-v1
#----------------------------------------------------------------------------------------------#
[MODEL] (INFO) : Episode directory created.
[MODEL] (INFO) : Weather file USA_AZ_Davis-Monthan.AFB.722745_TMY3.epw used.
[MODEL] (INFO) : Adapting weather to building model.
[ENVIRONMENT] (INFO) : Saving episode output path in /workspaces/sinergym/examples/Eplus-5zone-hot-continuous-v1-res1/episode-1/output.
[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[ENVIRONMENT] (INFO) : Episode 1 started.
Reward: -0.10041572388949849 {'time_elapsed(hours)': 0.5, 'month': 1, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [13.800000190734863, 30.0], 'timestep': 1, 'reward': -0.10041572388949849, 'energy_term': -0.005897377870609513, 'comfort_term': -0.09451834601888898, 'energy_penalty': -117.94755741219025, 'comfort_penalty': -0.18903669203777795, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0.18903669203777795, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 10%|█ | 10/100 [00:01<00:10, 8.40%/s, 10% completed] Reward: -358.955216712506 {'time_elapsed(hours)': 744.25, 'month': 2, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 2976, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 17%|█▋ | 17/100 [00:01<00:09, 8.48%/s, 17% completed]Reward: -575.246118212152 {'time_elapsed(hours)': 1416.25, 'month': 3, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 5664, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 26%|██▌ | 26/100 [00:02<00:08, 8.60%/s, 26% completed]Reward: -851.3357056834082 {'time_elapsed(hours)': 2160.25, 'month': 4, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 8640, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 34%|███▍ | 34/100 [00:04<00:09, 6.72%/s, 34% completed]Reward: -1189.0703800741016 {'time_elapsed(hours)': 2880.25, 'month': 5, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 11520, 'reward': -0.009862351778960955, 'energy_term': -0.009862351778960955, 'comfort_term': 0.0, 'energy_penalty': -197.24703557921907, 'comfort_penalty': 0, 'total_power_demand': 197.24703557921907, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 42%|████▏ | 42/100 [00:05<00:08, 6.71%/s, 42% completed]Reward: -1594.6394910495567 {'time_elapsed(hours)': 3624.25, 'month': 6, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 14496, 'reward': -0.44330201860495905, 'energy_term': -0.0331394509149559, 'comfort_term': -0.41016256769000314, 'energy_penalty': -662.7890182991179, 'comfort_penalty': -0.8203251353800063, 'total_power_demand': 662.7890182991179, 'total_temperature_violation': 0.8203251353800063, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 51%|█████ | 51/100 [00:06<00:07, 6.84%/s, 51% completed]Reward: -2738.7219838962233 {'time_elapsed(hours)': 4344.25, 'month': 7, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 17376, 'reward': -0.21678233374921216, 'energy_term': -0.033327249345369646, 'comfort_term': -0.1834550844038425, 'energy_penalty': -666.5449869073929, 'comfort_penalty': -0.366910168807685, 'total_power_demand': 666.5449869073929, 'total_temperature_violation': 0.366910168807685, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 59%|█████▉ | 59/100 [00:07<00:05, 6.93%/s, 59% completed]Reward: -3936.6529299828426 {'time_elapsed(hours)': 5088.25, 'month': 8, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 20352, 'reward': -0.44443299855469764, 'energy_term': -0.03655363190508797, 'comfort_term': -0.40787936664960966, 'energy_penalty': -731.0726381017595, 'comfort_penalty': -0.8157587332992193, 'total_power_demand': 731.0726381017595, 'total_temperature_violation': 0.8157587332992193, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 67%|██████▋ | 67/100 [00:09<00:04, 6.83%/s, 67% completed]Reward: -5124.55480713915 {'time_elapsed(hours)': 5832.25, 'month': 9, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 23328, 'reward': -0.29726607311779185, 'energy_term': -0.02991704399019792, 'comfort_term': -0.26734902912759395, 'energy_penalty': -598.3408798039584, 'comfort_penalty': -0.5346980582551879, 'total_power_demand': 598.3408798039584, 'total_temperature_violation': 0.5346980582551879, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 76%|███████▌ | 76/100 [00:10<00:03, 7.35%/s, 76% completed]Reward: -6370.638753750336 {'time_elapsed(hours)': 6552.25, 'month': 10, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 26208, 'reward': -0.03178420234234498, 'energy_term': -0.03178420234234498, 'comfort_term': 0.0, 'energy_penalty': -635.6840468468996, 'comfort_penalty': 0, 'total_power_demand': 635.6840468468996, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 84%|████████▍ | 84/100 [00:11<00:02, 7.74%/s, 84% completed]Reward: -6665.020396418986 {'time_elapsed(hours)': 7296.25, 'month': 11, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 29184, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 92%|█████████▏| 92/100 [00:12<00:01, 7.82%/s, 92% completed]Reward: -6932.478810099446 {'time_elapsed(hours)': 8016.25, 'month': 12, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 32064, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 100%|██████████| 100/100 [00:13<00:00, 6.41%/s, 100% completed]Episode 0 Mean reward: -0.20798321705377112 Cumulative reward: -7287.73192556414
Always remember to close the environment:
[7]:
env.close()
Simulation Progress [Episode 2]: 100%|██████████| 100/100 [00:19<00:00, 5.04%/s, 100% completed]
[ENVIRONMENT] (INFO) : Environment closed. [Eplus-5zone-hot-continuous-v1]
For more information about pre-defined controllers and how to create custom ones, visit the corresponding documentation.