DRL usage example
We are going to rely on the script DRL_battery.py, available in the repository root. This script brings together everything Sinergym offers for working with deep reinforcement learning algorithms, and exposes every parameter so that the training options can be defined directly when the script is executed.
Note
For more information about how to run DRL_battery.py
, please see the DRL documentation.
[1]:
import sinergym
from sinergym.utils.callbacks import LoggerEvalCallback
from sinergym.utils.rewards import *
from sinergym.utils.wrappers import LoggerWrapper
from datetime import datetime
import gym
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import CallbackList
from stable_baselines3.common.vec_env import DummyVecEnv
/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:73: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(
First let’s define some strings and variables for the execution.
[2]:
environment = "Eplus-demo-v1"
episodes = 4
experiment_date = datetime.today().strftime('%Y-%m-%d %H:%M')
# Register the run name
name = f"DQN-{environment}-episodes_{episodes}({experiment_date})"
Now we are ready to create the Gym environment. Here we use the environment name defined before, as well as the reward type: in our case, the LinearReward defined by Sinergym
. You can define your own reward or use any of the others provided by Sinergym
; have a look at the Rewards section of the documentation for more information on that.
[3]:
env = gym.make(environment, reward=LinearReward)
[2022-08-24 09:07:06,521] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Updating idf ExternalInterface object if it is not present...
[2022-08-24 09:07:06,522] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Updating idf Site:Location and SizingPeriod:DesignDay(s) to weather and ddy file...
[2022-08-24 09:07:06,524] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Updating idf OutPut:Variable and variables XML tree model for BVCTB connection.
[2022-08-24 09:07:06,526] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Setting up extra configuration in building model if exists...
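As noted above, you can swap in a different reward function simply by passing another class to gym.make. As a purely illustrative sketch (this cell is not executed in this notebook, which continues with LinearReward), you could use Sinergym's ExpReward, which penalizes comfort violations exponentially rather than linearly:
[ ]:
# Alternative: an exponential comfort penalty instead of a linear one.
# This cell is illustrative only; the rest of the notebook keeps LinearReward.
from sinergym.utils.rewards import ExpReward

env_alternative = gym.make(environment, reward=ExpReward)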
We can also add a wrapper to the environment. We are going to use a LoggerWrapper (an extension of gym.Wrapper
), which monitors and logs the interactions with the environment and saves the data into a CSV.
[4]:
env = LoggerWrapper(env)
At this point, the environment is all set up and ready to be used to define and create our learning model. In this case it is going to be a DQN, but we could use any other algorithm (have a look at DRL_battery.py
and read the Deep Reinforcement Learning Integration section of the documentation
for more detailed information on the available DRL algorithms). Please feel free to change the values of the model's attributes (or even the model itself) to see the differences.
[5]:
model = DQN('MlpPolicy', env, verbose=1)
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
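If you want to experiment with the model, you can also set the DQN hyperparameters explicitly instead of relying on the defaults. The sketch below uses standard Stable Baselines3 DQN arguments; the values shown are illustrative, not tuned for this environment:
[ ]:
# Illustrative only: the same model with its main hyperparameters spelled out
model = DQN('MlpPolicy', env,
            learning_rate=1e-4,          # optimizer step size
            buffer_size=1000000,         # replay buffer capacity
            learning_starts=50000,       # steps collected before learning begins
            batch_size=32,               # minibatch size per gradient update
            gamma=0.99,                  # discount factor
            exploration_fraction=0.1,    # fraction of training spent annealing epsilon
            exploration_final_eps=0.05,  # final epsilon value
            verbose=1)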
Now we need to calculate the number of timesteps in each episode, which the evaluation callback will use.
[6]:
# Timesteps per episode: total episode length divided by the simulation step size
n_timesteps_episode = env.simulator._eplus_one_epi_len / \
    env.simulator._eplus_run_stepsize
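For the demo environment, this corresponds to a one-year run period sampled at four timesteps per hour, which matches the episode length reported in the training logs below (35,040 timesteps):
[ ]:
# One year at 4 timesteps per hour:
# 365 days * 24 hours * 4 timesteps = 35040 timesteps per episode
print(365 * 24 * 4)  # 35040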
Now we need to wrap the environment in a vectorized wrapper, because the callbacks we are going to use require a vectorized environment.
[7]:
env_vec = DummyVecEnv([lambda: env])
We are going to use the LoggerEvalCallback to print and save the best model evaluated during training.
[8]:
callbacks = []

# Set up evaluation and saving of the best model
eval_callback = LoggerEvalCallback(
    env_vec,
    best_model_save_path='best_model/' + name + '/',
    log_path='best_model/' + name + '/',
    eval_freq=n_timesteps_episode * 2,  # evaluate every two training episodes
    deterministic=True,
    render=False,
    n_eval_episodes=2)
callbacks.append(eval_callback)

callback = CallbackList(callbacks)
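Since callbacks is a plain list, other Stable Baselines3 callbacks can be appended alongside the evaluation one. For example, a CheckpointCallback (a standard SB3 utility, not something this notebook requires) would also save periodic snapshots of the model:
[ ]:
from stable_baselines3.common.callbacks import CheckpointCallback

# Optional: also save a model snapshot once per episode
checkpoint_callback = CheckpointCallback(
    save_freq=int(n_timesteps_episode),
    save_path='checkpoints/' + name + '/',
    name_prefix='dqn')
callbacks.append(checkpoint_callback)
callback = CallbackList(callbacks)  # rebuild the callback list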
This is the total number of timesteps for the training.
[9]:
timesteps = episodes * n_timesteps_episode
Now it is time to train the model with the callbacks defined earlier. This may take a few minutes, depending on your computer.
[10]:
model.learn(
    total_timesteps=timesteps,
    callback=callback,
    log_interval=1)
[2022-08-24 09:07:07,060] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:07:07,069] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run1
[2022-08-24 09:07:20,353] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:07:20,354] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:07:20,362] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run2
-----------------------------------
| rollout/ | |
| ep_len_mean | 3.5e+04 |
| ep_rew_mean | -2.24e+04 |
| exploration_rate | 0.05 |
| time/ | |
| episodes | 1 |
| fps | 2536 |
| time_elapsed | 13 |
| total_timesteps | 35040 |
-----------------------------------
[2022-08-24 09:07:42,057] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:07:42,058] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:07:42,075] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run3
/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:264: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:222: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:256: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
[2022-08-24 09:07:46,890] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:07:46,891] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:07:46,899] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run4
[2022-08-24 09:08:00,708] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:08:00,711] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:08:00,726] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run5
[2022-08-24 09:08:14,972] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:08:14,973] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:08:14,980] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run6
Eval num_timesteps=70080, episode_reward=-17522.75 +/- 0.00
Episode length: 35040.00 +/- 0.00
New best mean reward!
-----------------------------------------
| eval/ | |
| comfort_penalty | -1.67e+04 |
| comfort_violation(%) | 35.7 |
| mean_ep_length | 3.5e+04 |
| mean_power_consumption | 1.83e+08 |
| mean_rewards | -17522.75 |
| power_penalty | -1.83e+04 |
| std_rewards | 0.0 |
| rollout/ | |
| ep_len_mean | 3.5e+04 |
| ep_rew_mean | -2.06e+04 |
| exploration_rate | 0.05 |
| time/ | |
| episodes | 2 |
| fps | 1024 |
| time_elapsed | 68 |
| total_timesteps | 70080 |
| train/ | |
| learning_rate | 0.0001 |
| loss | 61.9 |
| n_updates | 5019 |
-----------------------------------------
[2022-08-24 09:08:42,072] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:08:42,073] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:08:42,080] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run7
-----------------------------------
| rollout/ | |
| ep_len_mean | 3.5e+04 |
| ep_rew_mean | -1.96e+04 |
| exploration_rate | 0.05 |
| time/ | |
| episodes | 3 |
| fps | 1100 |
| time_elapsed | 95 |
| total_timesteps | 105120 |
| train/ | |
| learning_rate | 0.0001 |
| loss | 15.4 |
| n_updates | 13779 |
-----------------------------------
[2022-08-24 09:09:09,600] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:09:09,602] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:09:09,617] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run8
[2022-08-24 09:09:14,436] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:09:14,436] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:09:14,444] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run9
[2022-08-24 09:09:28,929] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:09:28,929] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:09:28,936] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run10
[2022-08-24 09:09:43,141] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully.
[2022-08-24 09:09:43,143] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2022-08-24 09:09:43,157] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-demo-v1-res2/Eplus-env-sub_run11
Eval num_timesteps=140160, episode_reward=-21405.96 +/- 0.00
Episode length: 35040.00 +/- 0.00
------------------------------------------
| eval/ | |
| comfort_penalty | -2.55e+04 |
| comfort_violation(%) | 40.5 |
| mean_ep_length | 3.5e+04 |
| mean_power_consumption | 1.73e+08 |
| mean_rewards | -21405.959 |
| power_penalty | -1.73e+04 |
| std_rewards | 0.0 |
| rollout/ | |
| ep_len_mean | 3.5e+04 |
| ep_rew_mean | -1.94e+04 |
| exploration_rate | 0.05 |
| time/ | |
| episodes | 4 |
| fps | 895 |
| time_elapsed | 156 |
| total_timesteps | 140160 |
| train/ | |
| learning_rate | 0.0001 |
| loss | 28.4 |
| n_updates | 22539 |
------------------------------------------
[10]:
<stable_baselines3.dqn.dqn.DQN at 0x7fa912391f60>
Now we save the current model.
[11]:
model.save(env.simulator._env_working_dir_parent + '/' + name)
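The saved model can be restored later with DQN.load. Since the evaluation callback also stores the best-performing model under best_model_save_path, you can reload that one too; the sketch below assumes LoggerEvalCallback keeps the Stable Baselines3 EvalCallback convention of saving it as best_model.zip:
[ ]:
# Reload the final model saved above
loaded_model = DQN.load(env.simulator._env_working_dir_parent + '/' + name)

# Reload the best model found during evaluation (assuming the
# EvalCallback naming convention of 'best_model.zip')
best_model = DQN.load('best_model/' + name + '/best_model')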
And as always, remember to close the environment.
[12]:
env.close()
[2022-08-24 09:09:48,272] EPLUS_ENV_demo-v1_MainThread_ROOT INFO:EnergyPlus simulation closed successfully.
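Once the environment is closed, the data recorded by LoggerWrapper is available on disk as CSV files inside the experiment's working directory. As a sketch (the exact file layout may differ between Sinergym versions, and the 'progress.csv' filename is an assumption here), you could inspect it with pandas:
[ ]:
import pandas as pd

# Hypothetical path: LoggerWrapper output inside the experiment directory
progress = pd.read_csv(env.simulator._env_working_dir_parent + '/progress.csv')
print(progress.head())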