Basic example
Sinergym adheres to the standard Farama Gymnasium API. Let’s explore how to create a basic control loop.
To begin, we need to import Sinergym and create an environment. In this example, we will use the Eplus-demo-v1
environment.
[1]:
import gymnasium as gym
import numpy as np
import sinergym
env = gym.make('Eplus-demo-v1')
#==============================================================================================#
[ENVIRONMENT] (INFO) : Creating Gymnasium environment.
[ENVIRONMENT] (INFO) : Name: demo-v1
#==============================================================================================#
[MODELING] (INFO) : Experiment working directory created.
[MODELING] (INFO) : Working directory: /workspaces/sinergym/examples/Eplus-env-demo-v1-res1
[MODELING] (INFO) : Model Config is correct.
[MODELING] (INFO) : Update building model Output:Variable with variable names.
[MODELING] (INFO) : Update building model Output:Meter with meter names.
[MODELING] (INFO) : Extra config: runperiod updated to {'apply_weekend_holiday_rule': 'No', 'begin_day_of_month': 1, 'begin_month': 1, 'begin_year': 1991, 'day_of_week_for_start_day': 'Monday', 'end_day_of_month': 1, 'end_month': 3, 'end_year': 1991, 'use_weather_file_daylight_saving_period': 'Yes', 'use_weather_file_holidays_and_special_days': 'Yes', 'use_weather_file_rain_indicators': 'Yes', 'use_weather_file_snow_indicators': 'Yes'}
[MODELING] (INFO) : Updated episode length (seconds): 5184000.0
[MODELING] (INFO) : Updated timestep size (seconds): 3600.0
[MODELING] (INFO) : Updated timesteps per episode: 1440
[MODELING] (INFO) : Runperiod established.
[MODELING] (INFO) : Episode length (seconds): 5184000.0
[MODELING] (INFO) : timestep size (seconds): 3600.0
[MODELING] (INFO) : timesteps per episode: 1440
[REWARD] (INFO) : Reward function initialized.
[ENVIRONMENT] (INFO) : Environment created successfully.
At first glance, Sinergym might seem to be only imported and never used. However, importing Sinergym registers all of its available environments with Gymnasium. In this case, Eplus-demo-v1
is a pre-defined environment that is ready to use out of the box.
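If you want to confirm this, a minimal sketch (not part of the original example) that lists the newly registered environment IDs could look like the cell below; it relies on Gymnasium’s standard `gymnasium.envs.registry` dictionary and assumes the `Eplus` ID prefix seen above.
[ ]:
# List the environment IDs made available by `import sinergym`.
# Assumption: Sinergym environment IDs start with the 'Eplus' prefix.
available_envs = [env_id for env_id in gym.envs.registry if env_id.startswith('Eplus')]
print(len(available_envs), 'Sinergym environments registered, for example:')
print(available_envs[:5])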
With this straightforward setup, we are prepared to iterate over episodes. For this basic example, we will consider a single episode.
[2]:
for i in range(1):
    obs, info = env.reset()
    rewards = []
    truncated = terminated = False
    current_month = 0
    while not (terminated or truncated):
        a = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:  # display results every month
            current_month = info['month']
            print('Reward: ', sum(rewards), info)
#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode.
[ENVIRONMENT] (INFO) : Episode 1: demo-v1
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created.
[MODELING] (INFO) : Weather file USA_PA_Pittsburgh-Allegheny.County.AP.725205_TMY3.epw used.
[MODELING] (INFO) : Adapting weather to building model.
[ENVIRONMENT] (INFO) : Saving episode output path.
[ENVIRONMENT] (INFO) : Episode 1 started.
[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
Reward: -43.96143518328036 {'time_elapsed(hours)': 2.5, 'month': 1, 'day': 1, 'hour': 1, 'is_raining': False, 'action': array([21.257265, 22.842768], dtype=float32), 'timestep': 1, 'reward': -43.96143518328036, 'energy_term': -43.67932315835093, 'comfort_term': -0.2821120249294271, 'reward_weight': 0.5, 'abs_energy_penalty': -87.35864631670186, 'abs_comfort_penalty': -0.5642240498588542, 'total_power_demand': 87.35864631670186, 'total_temperature_violation': 0.5642240498588542}
Simulation Progress [Episode 1]: 53%|█████▎ | 53/100 [00:00<00:00, 206.77%/s, 53% completed] Reward: -1654090.4591468547 {'time_elapsed(hours)': 745.0833333333334, 'month': 2, 'day': 1, 'hour': 0, 'is_raining': False, 'action': array([15.304309, 23.142662], dtype=float32), 'timestep': 744, 'reward': -10307.216587526324, 'energy_term': -10307.216587526324, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -20614.433175052647, 'abs_comfort_penalty': 0, 'total_power_demand': 20614.433175052647, 'total_temperature_violation': 0.0}
Simulation Progress [Episode 1]: 98%|█████████▊| 98/100 [00:00<00:00, 176.82%/s, 98% completed]Reward: -2817928.777706766 {'time_elapsed(hours)': 1417.25, 'month': 3, 'day': 1, 'hour': 0, 'is_raining': False, 'action': array([15.631611, 24.258501], dtype=float32), 'timestep': 1416, 'reward': -2181.3837787755947, 'energy_term': -2181.3837787755947, 'comfort_term': 0.0, 'reward_weight': 0.5, 'abs_energy_penalty': -4362.767557551189, 'abs_comfort_penalty': 0, 'total_power_demand': 4362.767557551189, 'total_temperature_violation': 0.0}
Remember to close the environment once the interaction is complete:
[3]:
env.close()
Simulation Progress [Episode 1]: 98%|█████████▊| 98/100 [00:02<00:00, 38.51%/s, 98% completed]
[ENVIRONMENT] (INFO) : Environment closed. [demo-v1]
Now, let’s examine the final rewards:
[4]:
print(
'Mean reward: ',
np.mean(rewards),
'Cumulative reward: ',
sum(rewards))
Mean reward: -1968.0675045384069 Cumulative reward: -2834017.206535306
Sinergym has an extensive list of registered environments. We utilize building models with varying characteristics, such as continuous or discrete action spaces, different weather types, weather noise, run periods, timesteps, reward functions, and more.
We will explore these features in the rest of the notebooks.
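As a quick preview, a short sketch like the one below can inspect the action and observation spaces of any registered environment before interacting with it; the exact spaces printed depend on the building model, so the output is only illustrative.
[ ]:
# Inspect the spaces of a registered environment (illustrative sketch).
env = gym.make('Eplus-demo-v1')
print('Action space: ', env.action_space)
print('Observation space: ', env.observation_space)
env.close()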