Rule-based controller example
Let’s try a simple rule-based controller (RBC) in a Sinergym environment.
First, we import all the necessary libraries. Remember to always import sinergym, even if it appears unused, since importing it is what registers the environments.
[8]:
from typing import List, Any, Sequence
from sinergym.utils.constants import YEAR
from datetime import datetime
import gymnasium as gym
import numpy as np
import sinergym
Now we can define the environment:
[9]:
env = gym.make('Eplus-5zone-hot-continuous-v1')
#==============================================================================================#
[ENVIRONMENT] (INFO) : Creating Gymnasium environment.
[ENVIRONMENT] (INFO) : Name: Eplus-5zone-hot-continuous-v1
#==============================================================================================#
[MODEL] (INFO) : Working directory created: /workspaces/sinergym/examples/Eplus-5zone-hot-continuous-v1-res1
[MODEL] (INFO) : Model Config is correct.
[MODEL] (INFO) : Building model Output:Variable updated with defined variable names.
[MODEL] (INFO) : Updated building model Output:Meter with meter names.
[MODEL] (INFO) : Runperiod established.
[MODEL] (INFO) : Episode length (seconds): 31536000.0
[MODEL] (INFO) : timestep size (seconds): 900.0
[MODEL] (INFO) : timesteps per episode: 35040
[REWARD] (INFO) : Reward function initialized.
[ENVIRONMENT] (INFO) : Environment created successfully.
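The controller defined below maps each flat observation onto the environment’s observation variable names and clips its setpoints to the action space bounds, so it can be useful to inspect both beforehand. A minimal sketch, using the same get_wrapper_attr call the controller relies on:

# Observation variable names (used to build obs_dict inside the controller below)
print(env.get_wrapper_attr('observation_variables'))
# Action space bounds used later for setpoint clipping, ordered (heating, cooling)
print(env.action_space.low, env.action_space.high)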
You should check out the available list of pre-defined RBCs.
In this example, we extend the pre-defined RBC5Zone controller by defining a custom act method:
[10]:
from sinergym.utils.controllers import RBC5Zone
class MyRuleBasedController(RBC5Zone):

    def act(self, observation: List[Any]) -> Sequence[Any]:
        """Select action based on indoor air temperature, season and time of day.

        Args:
            observation (List[Any]): Perceived observation.

        Returns:
            Sequence[Any]: Action chosen.
        """
        # Map the flat observation onto its variable names
        obs_dict = dict(zip(self.env.get_wrapper_attr(
            'observation_variables'), observation))

        day = int(obs_dict['day_of_month'])
        month = int(obs_dict['month'])
        hour = int(obs_dict['hour'])
        year = int(obs_dict['year'] if obs_dict.get('year', False) else YEAR)

        summer_start_date = datetime(year, 6, 1)
        summer_final_date = datetime(year, 9, 30)
        current_dt = datetime(year, month, day)

        # Get season comfort range
        if summer_start_date <= current_dt <= summer_final_date:
            season_comfort_range = self.setpoints_summer
        else:
            season_comfort_range = self.setpoints_winter

        # Update setpoints depending on the indoor air temperature
        in_temp = obs_dict['air_temperature']
        current_heat_setpoint = obs_dict['htg_setpoint']
        current_cool_setpoint = obs_dict['clg_setpoint']
        new_heat_setpoint = current_heat_setpoint
        new_cool_setpoint = current_cool_setpoint

        if in_temp < season_comfort_range[0]:
            new_heat_setpoint = current_heat_setpoint + 1
            new_cool_setpoint = current_cool_setpoint + 1
        elif in_temp > season_comfort_range[1]:
            new_cool_setpoint = current_cool_setpoint - 1
            new_heat_setpoint = current_heat_setpoint - 1

        # Clip setpoints to the action space
        if new_heat_setpoint > self.env.get_wrapper_attr('action_space').high[0]:
            new_heat_setpoint = self.env.get_wrapper_attr('action_space').high[0]
        if new_heat_setpoint < self.env.get_wrapper_attr('action_space').low[0]:
            new_heat_setpoint = self.env.get_wrapper_attr('action_space').low[0]
        if new_cool_setpoint > self.env.get_wrapper_attr('action_space').high[1]:
            new_cool_setpoint = self.env.get_wrapper_attr('action_space').high[1]
        if new_cool_setpoint < self.env.get_wrapper_attr('action_space').low[1]:
            new_cool_setpoint = self.env.get_wrapper_attr('action_space').low[1]

        action = (new_heat_setpoint, new_cool_setpoint)

        # Weekend (Saturday=5, Sunday=6) or night (22:00-06:00): fixed setback setpoints
        if current_dt.weekday() >= 5 or hour >= 22 or hour < 6:
            action = (18.33, 23.33)

        return np.array(action, dtype=np.float32)
Note: ``action`` must be a NumPy array for compatibility with the Gymnasium API.
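As a side note, the four clipping conditionals in act could be written more compactly with np.clip. This is only an equivalent sketch of that fragment, assuming the action space bounds are ordered (heating, cooling) as above:

# Equivalent clipping of both setpoints against the action space bounds (inside act)
low = self.env.get_wrapper_attr('action_space').low
high = self.env.get_wrapper_attr('action_space').high
action = np.clip(
    np.array([new_heat_setpoint, new_cool_setpoint], dtype=np.float32),
    low, high)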
Now that our controller is ready, we can use it as follows:
[11]:
# Create rule-based controller
agent = MyRuleBasedController(env)
for i in range(1):
    obs, info = env.reset()
    rewards = []
    truncated = terminated = False
    current_month = 0
    while not (terminated or truncated):
        action = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode.
[ENVIRONMENT] (INFO) : Episode 1: Eplus-5zone-hot-continuous-v1
#----------------------------------------------------------------------------------------------#
[MODEL] (INFO) : Episode directory created.
[MODEL] (INFO) : Weather file USA_AZ_Davis-Monthan.AFB.722745_TMY3.epw used.
[MODEL] (INFO) : Adapting weather to building model.
[ENVIRONMENT] (INFO) : Saving episode output path in /workspaces/sinergym/examples/Eplus-5zone-hot-continuous-v1-res1/episode-1/output.
[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[ENVIRONMENT] (INFO) : Episode 1 started.
Reward: -0.10041572388949849 {'time_elapsed(hours)': 0.5, 'month': 1, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [13.800000190734863, 30.0], 'timestep': 1, 'reward': -0.10041572388949849, 'energy_term': -0.005897377870609513, 'comfort_term': -0.09451834601888898, 'energy_penalty': -117.94755741219025, 'comfort_penalty': -0.18903669203777795, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0.18903669203777795, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 10%|█ | 10/100 [00:01<00:10, 8.40%/s, 10% completed] Reward: -358.955216712506 {'time_elapsed(hours)': 744.25, 'month': 2, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 2976, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 17%|█▋ | 17/100 [00:01<00:09, 8.48%/s, 17% completed]Reward: -575.246118212152 {'time_elapsed(hours)': 1416.25, 'month': 3, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 5664, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 26%|██▌ | 26/100 [00:02<00:08, 8.60%/s, 26% completed]Reward: -851.3357056834082 {'time_elapsed(hours)': 2160.25, 'month': 4, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 8640, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 34%|███▍ | 34/100 [00:04<00:09, 6.72%/s, 34% completed]Reward: -1189.0703800741016 {'time_elapsed(hours)': 2880.25, 'month': 5, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 11520, 'reward': -0.009862351778960955, 'energy_term': -0.009862351778960955, 'comfort_term': 0.0, 'energy_penalty': -197.24703557921907, 'comfort_penalty': 0, 'total_power_demand': 197.24703557921907, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 42%|████▏ | 42/100 [00:05<00:08, 6.71%/s, 42% completed]Reward: -1594.6394910495567 {'time_elapsed(hours)': 3624.25, 'month': 6, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 14496, 'reward': -0.44330201860495905, 'energy_term': -0.0331394509149559, 'comfort_term': -0.41016256769000314, 'energy_penalty': -662.7890182991179, 'comfort_penalty': -0.8203251353800063, 'total_power_demand': 662.7890182991179, 'total_temperature_violation': 0.8203251353800063, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 51%|█████ | 51/100 [00:06<00:07, 6.84%/s, 51% completed]Reward: -2738.7219838962233 {'time_elapsed(hours)': 4344.25, 'month': 7, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 17376, 'reward': -0.21678233374921216, 'energy_term': -0.033327249345369646, 'comfort_term': -0.1834550844038425, 'energy_penalty': -666.5449869073929, 'comfort_penalty': -0.366910168807685, 'total_power_demand': 666.5449869073929, 'total_temperature_violation': 0.366910168807685, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 59%|█████▉ | 59/100 [00:07<00:05, 6.93%/s, 59% completed]Reward: -3936.6529299828426 {'time_elapsed(hours)': 5088.25, 'month': 8, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 20352, 'reward': -0.44443299855469764, 'energy_term': -0.03655363190508797, 'comfort_term': -0.40787936664960966, 'energy_penalty': -731.0726381017595, 'comfort_penalty': -0.8157587332992193, 'total_power_demand': 731.0726381017595, 'total_temperature_violation': 0.8157587332992193, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 67%|██████▋ | 67/100 [00:09<00:04, 6.83%/s, 67% completed]Reward: -5124.55480713915 {'time_elapsed(hours)': 5832.25, 'month': 9, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 23328, 'reward': -0.29726607311779185, 'energy_term': -0.02991704399019792, 'comfort_term': -0.26734902912759395, 'energy_penalty': -598.3408798039584, 'comfort_penalty': -0.5346980582551879, 'total_power_demand': 598.3408798039584, 'total_temperature_violation': 0.5346980582551879, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 76%|███████▌ | 76/100 [00:10<00:03, 7.35%/s, 76% completed]Reward: -6370.638753750336 {'time_elapsed(hours)': 6552.25, 'month': 10, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 26208, 'reward': -0.03178420234234498, 'energy_term': -0.03178420234234498, 'comfort_term': 0.0, 'energy_penalty': -635.6840468468996, 'comfort_penalty': 0, 'total_power_demand': 635.6840468468996, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 84%|████████▍ | 84/100 [00:11<00:02, 7.74%/s, 84% completed]Reward: -6665.020396418986 {'time_elapsed(hours)': 7296.25, 'month': 11, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 29184, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 92%|█████████▏| 92/100 [00:12<00:01, 7.82%/s, 92% completed]Reward: -6932.478810099446 {'time_elapsed(hours)': 8016.25, 'month': 12, 'day': 1, 'hour': 0, 'is_raining': False, 'action': [18.329999923706055, 23.329999923706055], 'timestep': 32064, 'reward': -0.005897377870609513, 'energy_term': -0.005897377870609513, 'comfort_term': 0.0, 'energy_penalty': -117.94755741219025, 'comfort_penalty': 0, 'total_power_demand': 117.94755741219025, 'total_temperature_violation': 0, 'reward_weight': 0.5}
Simulation Progress [Episode 1]: 100%|██████████| 100/100 [00:13<00:00, 6.41%/s, 100% completed]Episode 0 Mean reward: -0.20798321705377112 Cumulative reward: -7287.73192556414
Always remember to close the environment:
[7]:
env.close()
Simulation Progress [Episode 2]: 100%|██████████| 100/100 [00:19<00:00, 5.04%/s, 100% completed]
[ENVIRONMENT] (INFO) : Environment closed. [Eplus-5zone-hot-continuous-v1]
For more information about pre-defined controllers and how to create custom ones, visit the corresponding documentation.