Rule-based controller example

Let’s try a simple rule-based controller (RBC) in a Sinergym environment.

First, we import all the necessary libraries. Remember to always import sinergym, even if it appears unused, since importing it registers the environments in Gymnasium.

[ ]:
from typing import List, Any, Sequence
from sinergym.utils.constants import YEAR
from datetime import datetime

import gymnasium as gym
import numpy as np
import sinergym

Now we can define the environment:

[2]:
env = gym.make('Eplus-5zone-hot-continuous-v1')
#==============================================================================================#
[ENVIRONMENT] (INFO) : Creating Gymnasium environment.
[ENVIRONMENT] (INFO) : Name: 5zone-hot-continuous-v1
#==============================================================================================#
[MODELING] (INFO) : Experiment working directory created.
[MODELING] (INFO) : Working directory: /workspaces/sinergym/examples/Eplus-env-5zone-hot-continuous-v1-res1
[MODELING] (INFO) : Model Config is correct.
[MODELING] (INFO) : Update building model Output:Variable with variable names.
[MODELING] (INFO) : Update building model Output:Meter with meter names.
[MODELING] (INFO) : Runperiod established.
[MODELING] (INFO) : Episode length (seconds): 31536000.0
[MODELING] (INFO) : timestep size (seconds): 900.0
[MODELING] (INFO) : timesteps per episode: 35040
[REWARD] (INFO) : Reward function initialized.
[ENVIRONMENT] (INFO) : Environment created successfully.
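
Before writing the controller, it can help to inspect the observation variable names and the action space it will rely on (an optional check; `get_wrapper_attr` is used the same way inside the controller below):

[ ]:
# Optional: list the observation variable names and the action space
# bounds that the rule-based controller will work with.
print(env.get_wrapper_attr('observation_variables'))
print(env.action_space)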

You should check out the list of available pre-defined RBCs.
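
For instance, some of them can be imported directly (a sketch; the exact class list may vary across Sinergym versions, so check the documentation for your version):

[ ]:
# Some pre-defined controllers shipped with Sinergym (names may differ
# between versions); RBC5Zone is the one extended below.
from sinergym.utils.controllers import RandomController, RBC5Zone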

In this example, we extend the pre-defined RBC5Zone controller by overriding its act method:

[ ]:
from sinergym.utils.controllers import RBC5Zone


class MyRuleBasedController(RBC5Zone):

    def act(self, observation: List[Any]) -> Sequence[Any]:
        """Select an action based on the indoor air temperature, the current
        setpoints, and the date and time.

        Args:
            observation (List[Any]): Perceived observation.

        Returns:
            Sequence[Any]: Action chosen.
        """
        # Map observation values to their variable names
        obs_dict = dict(zip(self.env.get_wrapper_attr(
            'observation_variables'), observation))

        day = int(obs_dict['day_of_month'])
        month = int(obs_dict['month'])
        hour = int(obs_dict['hour'])
        # Fall back to the YEAR constant if the observation lacks the year
        year = int(obs_dict.get('year', YEAR))

        summer_start_date = datetime(year, 6, 1)
        summer_final_date = datetime(year, 9, 30)

        current_dt = datetime(year, month, day)

        # Select the comfort range for the current season
        if summer_start_date <= current_dt <= summer_final_date:
            season_comfort_range = self.setpoints_summer
        else:
            season_comfort_range = self.setpoints_winter
        # Update setpoints
        in_temp = obs_dict['air_temperature']

        current_heat_setpoint = obs_dict['htg_setpoint']
        current_cool_setpoint = obs_dict['clg_setpoint']

        new_heat_setpoint = current_heat_setpoint
        new_cool_setpoint = current_cool_setpoint

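        # If the zone is too cold, raise both setpoints; if too hot, lower both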
        if in_temp < season_comfort_range[0]:
            new_heat_setpoint = current_heat_setpoint + 1
            new_cool_setpoint = current_cool_setpoint + 1
        elif in_temp > season_comfort_range[1]:
            new_cool_setpoint = current_cool_setpoint - 1
            new_heat_setpoint = current_heat_setpoint - 1

        # Clip setpoints to the action space bounds
        action_space = self.env.get_wrapper_attr('action_space')
        new_heat_setpoint = float(np.clip(
            new_heat_setpoint, action_space.low[0], action_space.high[0]))
        new_cool_setpoint = float(np.clip(
            new_cool_setpoint, action_space.low[1], action_space.high[1]))

        action = (new_heat_setpoint, new_cool_setpoint)
        # Weekend (weekday 5 and 6) or night: use fixed setback setpoints
        if current_dt.weekday() >= 5 or hour >= 22 or hour < 6:
            action = (18.33, 23.33)

        return action
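
Optionally, we can sanity-check the controller with a single act() call before running a full episode (a hypothetical snippet, not part of the original example; note that it consumes one reset()):

[ ]:
# Hypothetical quick check: one act() call on a fresh observation should
# return an action inside the environment's action space.
agent = MyRuleBasedController(env)
obs, info = env.reset()
print('Sample action:', agent.act(obs))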

Now that our controller is ready, we can use it as follows:

[ ]:

# Create rule-based controller
agent = MyRuleBasedController(env)

for i in range(1):
    obs, info = env.reset()
    rewards = []
    truncated = terminated = False
    current_month = 0
    while not (terminated or truncated):
        action = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        rewards.append(reward)
        # Display accumulated results at the start of each month
        if info['month'] != current_month:
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
    print(
        'Episode ', i,
        'Mean reward: ', np.mean(rewards),
        'Cumulative reward: ', sum(rewards))
#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode.
[ENVIRONMENT] (INFO) : Episode 1: 5zone-hot-continuous-v1
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created.
[MODELING] (INFO) : Weather file USA_AZ_Davis-Monthan.AFB.722745_TMY3.epw used.
[MODELING] (INFO) : Adapting weather to building model.
[ENVIRONMENT] (INFO) : Saving episode output path.
[ENVIRONMENT] (INFO) : Episode 1 started.
[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
Reward:  -0.10122987987606541 {'time_elapsed(hours)': 0.5, 'month': 1, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(13.8), np.float32(30.0)), 'timestep': 1, 'reward': -0.10122987987606541, 'energy_term': -0.00589738497079933, 'comfort_term': -0.09533249490526607, 'reward_weight': 0.5, 'abs_energy_penalty': -117.9476994159866, 'abs_comfort_penalty': -0.19066498981053215, 'total_power_demand': 117.9476994159866, 'total_temperature_violation': 0.19066498981053215}
/usr/local/lib/python3.12/dist-packages/gymnasium/spaces/box.py:240: UserWarning: WARN: Casting input x to numpy array.
  gym.logger.warn("Casting input x to numpy array.")
Simulation Progress [Episode 1]:  10%|█         | 10/100 [00:00<00:09,  9.64%/s, 10% completed] Reward:  -500.8158604654543 {'time_elapsed(hours)': 744.25, 'month': 2, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 2976, 'reward': -0.00589738497079933, 'energy_term': -0.00589738497079933, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -117.9476994159866, 'abs_comfort_penalty': 0, 'total_power_demand': 117.9476994159866, 'total_temperature_violation': 0.0}
Simulation Progress [Episode 1]:  17%|█▋        | 17/100 [00:01<00:08,  9.88%/s, 17% completed]Reward:  -771.2477959705227 {'time_elapsed(hours)': 1416.25, 'month': 3, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 5664, 'reward': -0.00589738497079933, 'energy_term': -0.00589738497079933, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -117.9476994159866, 'abs_comfort_penalty': 0, 'total_power_demand': 117.9476994159866, 'total_temperature_violation': 0.0}
Simulation Progress [Episode 1]:  26%|██▌       | 26/100 [00:02<00:06, 11.30%/s, 26% completed]Reward:  -1099.6675505462101 {'time_elapsed(hours)': 2160.25, 'month': 4, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (18.33, 23.33), 'timestep': 8640, 'reward': -0.00589738497079933, 'energy_term': -0.00589738497079933, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -117.9476994159866, 'abs_comfort_penalty': 0, 'total_power_demand': 117.9476994159866, 'total_temperature_violation': 0.0}
Simulation Progress [Episode 1]:  34%|███▍      | 34/100 [00:03<00:06,  9.53%/s, 34% completed]Reward:  -1443.9892728462598 {'time_elapsed(hours)': 2880.25, 'month': 5, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 11520, 'reward': -0.00986244334034497, 'energy_term': -0.00986244334034497, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -197.24886680689937, 'abs_comfort_penalty': 0, 'total_power_demand': 197.24886680689937, 'total_temperature_violation': 0.0}
Simulation Progress [Episode 1]:  42%|████▏     | 42/100 [00:04<00:06,  9.08%/s, 42% completed]Reward:  -1850.3837512878513 {'time_elapsed(hours)': 3624.25, 'month': 6, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 14496, 'reward': -0.443418973577729, 'energy_term': -0.03306759297397215, 'comfort_term': -0.41035138060375687, 'reward_weight': 0.5, 'abs_energy_penalty': -661.351859479443, 'abs_comfort_penalty': -0.8207027612075137, 'total_power_demand': 661.351859479443, 'total_temperature_violation': 0.8207027612075137}
Simulation Progress [Episode 1]:  51%|█████     | 51/100 [00:05<00:04, 11.44%/s, 51% completed]Reward:  -2994.821084822717 {'time_elapsed(hours)': 4344.25, 'month': 7, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (18.33, 23.33), 'timestep': 17376, 'reward': -0.2168368480281406, 'energy_term': -0.03331450494007778, 'comfort_term': -0.1835223430880628, 'reward_weight': 0.5, 'abs_energy_penalty': -666.2900988015556, 'abs_comfort_penalty': -0.3670446861761256, 'total_power_demand': 666.2900988015556, 'total_temperature_violation': 0.3670446861761256}
Simulation Progress [Episode 1]:  59%|█████▉    | 59/100 [00:06<00:04,  8.89%/s, 59% completed]Reward:  -4192.843264700024 {'time_elapsed(hours)': 5088.25, 'month': 8, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 20352, 'reward': -0.4444265580218612, 'energy_term': -0.03655451169123719, 'comfort_term': -0.407872046330624, 'reward_weight': 0.5, 'abs_energy_penalty': -731.0902338247438, 'abs_comfort_penalty': -0.815744092661248, 'total_power_demand': 731.0902338247438, 'total_temperature_violation': 0.815744092661248}
Simulation Progress [Episode 1]:  67%|██████▋   | 67/100 [00:06<00:02, 11.54%/s, 67% completed]Reward:  -5380.941330999556 {'time_elapsed(hours)': 5832.25, 'month': 9, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 23328, 'reward': -0.2973068555116624, 'energy_term': -0.02993733348744081, 'comfort_term': -0.2673695220242216, 'reward_weight': 0.5, 'abs_energy_penalty': -598.7466697488162, 'abs_comfort_penalty': -0.5347390440484432, 'total_power_demand': 598.7466697488162, 'total_temperature_violation': 0.5347390440484432}
Simulation Progress [Episode 1]:  76%|███████▌  | 76/100 [00:07<00:02, 10.25%/s, 76% completed]Reward:  -6627.123778199568 {'time_elapsed(hours)': 6552.25, 'month': 10, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 26208, 'reward': -0.03177037793883851, 'energy_term': -0.03177037793883851, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -635.4075587767702, 'abs_comfort_penalty': 0, 'total_power_demand': 635.4075587767702, 'total_temperature_violation': 0.0}
Simulation Progress [Episode 1]:  84%|████████▍ | 84/100 [00:08<00:01, 11.03%/s, 84% completed]Reward:  -6944.464641871541 {'time_elapsed(hours)': 7296.25, 'month': 11, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 29184, 'reward': -0.00589738497079933, 'energy_term': -0.00589738497079933, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -117.9476994159866, 'abs_comfort_penalty': 0, 'total_power_demand': 117.9476994159866, 'total_temperature_violation': 0.0}
Simulation Progress [Episode 1]:  92%|█████████▏| 92/100 [00:09<00:00, 10.92%/s, 92% completed]Reward:  -7259.377698005483 {'time_elapsed(hours)': 8016.25, 'month': 12, 'day': 1, 'hour': 0, 'is_raining': False, 'action': (np.float32(18.33), np.float32(23.33)), 'timestep': 32064, 'reward': -0.018698130195533794, 'energy_term': -0.018698130195533794, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -373.96260391067585, 'abs_comfort_penalty': 0, 'total_power_demand': 373.96260391067585, 'total_temperature_violation': 0.0}
Simulation Progress [Episode 1]: 100%|██████████| 100/100 [00:09<00:00, 11.14%/s, 100% completed]Episode  0 Mean reward:  -0.21907941002188872 Cumulative reward:  -7676.542527166982

Always remember to close the environment:

[5]:
env.close()
Simulation Progress [Episode 1]: 100%|██████████| 100/100 [00:12<00:00,  8.33%/s, 100% completed]
[ENVIRONMENT] (INFO) : Environment closed. [5zone-hot-continuous-v1]

For more information about pre-defined controllers and how to create custom ones, visit the corresponding documentation.