Output format
When a simulation is run, a directory called Eplus-env-<env_name>-res<num_simulation> is generated. Its content is the result of the simulation:
Eplus-env-<env_name>-res<num_simulation>
├── Eplus-env-sub_run1
├── Eplus-env-sub_run2
├── Eplus-env-sub_run3
├── ...
├── Eplus-env-sub_runN
│ ├── output/
│ ├── variables.cfg
│ ├── socket.cfg
│ ├── utilSocket.cfg
│ ├── environment.idf
│ ├── monitor.csv
│ └── monitor_normalized.csv (optional)
└── progress.csv
Eplus-env-sub_run<num_episode> records the results of each episode of the simulation. The number of these directories depends on the number of episodes. Within these directories, the structure is always the same:
- A copy of variables.cfg and environment.idf, which are used during the simulation.
- A copy of socket.cfg and utilSocket.idf, which are used to establish the communication interface with EnergyPlus during the simulation.
- monitor.csv: records all agent-environment interactions during the episode, timestep by timestep. The format is: timestep, observation_values, action_values, simulation_time (seconds), reward, done.
- monitor_normalized.csv: only generated when the environment is wrapped with normalization (see Wrappers). Its structure is the same as monitor.csv, but the observation_values are normalized.
- output/: this directory contains the EnergyPlus simulation output.
- progress.csv: contains general simulation results. There is one row per episode, recording the most important data. Currently, the format is: episode_num, cumulative_reward, mean_reward, cumulative_power_consumption, mean_power_consumption, cumulative_comfort_penalty, mean_comfort_penalty, cumulative_power_penalty, mean_power_penalty, comfort_violation (%), length(timesteps), time_elapsed(seconds).
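Both monitor.csv and progress.csv are plain CSV files with the headers described above, so they can be inspected with any standard tool. Below is a minimal sketch using pandas; the environment name and episode number in the paths are just examples:
import pandas as pd

# Hypothetical result directory following the layout described above
run_dir = 'Eplus-env-demo-v1-res1'

# One row per episode: cumulative/mean reward, power, penalties, etc.
progress = pd.read_csv(f'{run_dir}/progress.csv')
print(progress.head())

# Step-by-step agent-environment interaction of the first episode
monitor = pd.read_csv(f'{run_dir}/Eplus-env-sub_run1/monitor.csv')
print(monitor.head())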
Note
For more information about specific EnergyPlus output, visit EnergyPlus documentation.
Logger
The files monitor.csv, monitor_normalized.csv and progress.csv belong to the Sinergym logger, which is a wrapper for the environment (see Wrappers). This logger is responsible for recording all the interactions carried out in a simulation, regardless of the training technique being used or any other external factor.
Recording is managed by an instance of the class CSVLogger, which is present as an environment attribute and is called at each timestep and at the end of each episode:
import csv
import os

import numpy as np


class CSVLogger(object):
    """CSV logger for agent interaction with the environment.

    :param monitor_header: CSV header for sub_run_N/monitor.csv, which records the interaction step by step.
    :param progress_header: CSV header for res_N/progress.csv, which records the main data episode by episode.
    :param log_file: log_file path for monitor.csv; there will be one CSV per episode.
    :param log_progress_file: log_file path for progress.csv; there will be only one CSV for the whole simulation.
    :param flag: This flag is used to activate (True) or deactivate (False) the logger in real time.
    :param steps_data, rewards, powers, etc: These arrays are used to record step data from which the main data for progress.csv is built later.
    :param total_timesteps: Timesteps executed in the current episode.
    :param total_time_elapsed: Time elapsed in the current episode (simulation seconds).
    :param comfort_violation_timesteps: Timesteps of the current episode whose comfort_penalty != 0.
    :param steps_data: It is an array of strings; each element belongs to one step's data.
    """

    def __init__(
            self,
            monitor_header,
            progress_header,
            log_progress_file,
            log_file=None,
            flag=True):
        self.monitor_header = monitor_header
        self.progress_header = progress_header + '\n'
        self.log_file = log_file
        self.log_progress_file = log_progress_file
        self.flag = flag

        # Episode data
        self.steps_data = [self.monitor_header.split(',')]
        self.steps_data_normalized = [self.monitor_header.split(',')]
        self.rewards = []
        self.powers = []
        self.comfort_penalties = []
        self.power_penalties = []
        self.total_timesteps = 0
        self.total_time_elapsed = 0
        self.comfort_violation_timesteps = 0

    def log_step(
            self,
            timestep,
            date,
            observation,
            action,
            simulation_time,
            reward,
            total_power_no_units,
            comfort_penalty,
            power,
            done):
        """Log step information and store it in the steps_data attribute.

        Args:
            timestep (int): Current episode timestep in simulation.
            date (list): Current date [month, day, hour] in simulation.
            observation (list): Values that belong to the current observation.
            action (list): Values that belong to the current action.
            simulation_time (float): Total time elapsed in the current episode (seconds).
            reward (float): Current reward achieved.
            total_power_no_units (float): Power consumption penalty depending on the reward function.
            comfort_penalty (float): Temperature comfort penalty depending on the reward function.
            power (float): Power consumption in the current step (W).
            done (bool): Specifies whether this step terminates the episode.
        """
        if self.flag:
            row_contents = [timestep] + list(date) + list(observation) + \
                list(action) + [simulation_time, reward,
                                total_power_no_units, comfort_penalty, done]
            self.steps_data.append(row_contents)
            # Store step information for the episode summary
            self._store_step_information(
                reward,
                power,
                comfort_penalty,
                total_power_no_units,
                timestep,
                simulation_time)
        else:
            pass

    def log_step_normalize(
            self,
            timestep,
            date,
            observation,
            action,
            simulation_time,
            reward,
            total_power_no_units,
            comfort_penalty,
            done):
        """Log a step with normalized observation values and store it in the steps_data_normalized attribute."""
        if self.flag:
            row_contents = [timestep] + list(date) + list(observation) + \
                list(action) + [simulation_time, reward,
                                total_power_no_units, comfort_penalty, done]
            self.steps_data_normalized.append(row_contents)
        else:
            pass

    def log_episode(self, episode):
        """Log episode main information using the steps_data attribute.

        Args:
            episode (int): Current simulation episode number.
        """
        if self.flag:
            # Statistics metrics for the whole episode
            ep_mean_reward = np.mean(self.rewards)
            ep_cumulative_reward = np.sum(self.rewards)
            ep_cumulative_power = np.sum(self.powers)
            ep_mean_power = np.mean(self.powers)
            ep_cumulative_comfort_penalty = np.sum(self.comfort_penalties)
            ep_mean_comfort_penalty = np.mean(self.comfort_penalties)
            ep_cumulative_power_penalty = np.sum(self.power_penalties)
            ep_mean_power_penalty = np.mean(self.power_penalties)
            try:
                comfort_violation = (
                    self.comfort_violation_timesteps /
                    self.total_timesteps *
                    100)
            except ZeroDivisionError:
                comfort_violation = np.nan

            # Write steps_data in monitor.csv
            with open(self.log_file, 'w', newline='') as file_obj:
                # Create a writer object from the csv module
                csv_writer = csv.writer(file_obj)
                # Add contents of the list as rows of the CSV file
                csv_writer.writerows(self.steps_data)

            # Write normalized steps_data in monitor_normalized.csv
            if len(self.steps_data_normalized) > 1:
                with open(self.log_file[:-4] + '_normalized.csv', 'w', newline='') as file_obj:
                    # Create a writer object from the csv module
                    csv_writer = csv.writer(file_obj)
                    # Add contents of the list as rows of the CSV file
                    csv_writer.writerows(self.steps_data_normalized)

            # Create progress.csv with its header if it does not exist yet
            if not os.path.isfile(self.log_progress_file):
                with open(self.log_progress_file, 'a', newline='\n') as file_obj:
                    file_obj.write(self.progress_header)

            # Build the episode row
            row_contents = [
                episode,
                ep_cumulative_reward,
                ep_mean_reward,
                ep_cumulative_power,
                ep_mean_power,
                ep_cumulative_comfort_penalty,
                ep_mean_comfort_penalty,
                ep_cumulative_power_penalty,
                ep_mean_power_penalty,
                comfort_violation,
                self.total_timesteps,
                self.total_time_elapsed]
            with open(self.log_progress_file, 'a+', newline='') as file_obj:
                # Create a writer object from the csv module
                csv_writer = csv.writer(file_obj)
                # Add the episode row at the end of the CSV file
                csv_writer.writerow(row_contents)

            # Reset episode information
            self._reset_logger()
        else:
            pass

    def set_log_file(self, new_log_file):
        """Change the log_file path for monitor.csv when an episode ends.

        Args:
            new_log_file (str): New log path depending on simulation.
        """
        if self.flag:
            self.log_file = new_log_file
            if self.log_file:
                with open(self.log_file, 'a', newline='\n') as file_obj:
                    file_obj.write(self.monitor_header)
        else:
            pass

    def _store_step_information(
            self,
            reward,
            power,
            comfort_penalty,
            power_penalty,
            timestep,
            simulation_time):
        """Store the data relevant to the episode summary in progress.csv.

        Args:
            reward (float): Current reward achieved.
            power (float): Power consumption in the current step (W).
            comfort_penalty (float): Temperature comfort penalty depending on the reward function.
            power_penalty (float): Power consumption penalty depending on the reward function.
            timestep (int): Current episode timestep in simulation.
            simulation_time (float): Total time elapsed in the current episode (seconds).
        """
        if reward is not None:
            self.rewards.append(reward)
        if power is not None:
            self.powers.append(power)
        if comfort_penalty is not None:
            self.comfort_penalties.append(comfort_penalty)
        if power_penalty is not None:
            self.power_penalties.append(power_penalty)
        if comfort_penalty != 0:
            self.comfort_violation_timesteps += 1
        self.total_timesteps = timestep
        self.total_time_elapsed = simulation_time

    def _reset_logger(self):
        """Reset the data used to build the next episode summary in progress.csv."""
        self.steps_data = [self.monitor_header.split(',')]
        self.steps_data_normalized = [self.monitor_header.split(',')]
        self.rewards = []
        self.powers = []
        self.comfort_penalties = []
        self.power_penalties = []
        self.total_timesteps = 0
        self.total_time_elapsed = 0
        self.comfort_violation_timesteps = 0

    def activate_flag(self):
        """Activate the Sinergym CSV logger."""
        self.flag = True

    def deactivate_flag(self):
        """Deactivate the Sinergym CSV logger."""
        self.flag = False
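As an illustration of the calling order described above (log_step at every timestep, log_episode at the end of each episode), here is a minimal standalone sketch. The headers, paths and values are invented for the example and do not correspond to a real Sinergym run; in normal use you never build this object yourself, because the logger wrapper does it for you.
# CSVLogger is the class shown above; import it from wherever your Sinergym version defines it.
logger = CSVLogger(
    monitor_header='timestep,month,day,hour,obs1,obs2,action,simulation_time,'
                   'reward,power_penalty,comfort_penalty,done',
    progress_header='episode_num,cumulative_reward,mean_reward,cumulative_power_consumption,'
                    'mean_power_consumption,cumulative_comfort_penalty,mean_comfort_penalty,'
                    'cumulative_power_penalty,mean_power_penalty,comfort_violation (%),'
                    'length(timesteps),time_elapsed(seconds)',
    log_progress_file='progress.csv',
    log_file='monitor.csv')

# One (made-up) step of agent-environment interaction
logger.log_step(
    timestep=1, date=[1, 1, 0], observation=[21.5, 3.2], action=[0],
    simulation_time=900.0, reward=-0.5, total_power_no_units=-0.3,
    comfort_penalty=-0.2, power=3000.0, done=False)

# At the end of the episode: dump monitor.csv and append one row to progress.csv
logger.log_episode(episode=1)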
Note
The normalized observation methods are only used when the environment has previously been wrapped with normalization (see Wrappers).
Note
Note that you can activate and deactivate the logger from the environment whenever you want, using its activate and deactivate methods, so you do not need to unwrap the environment. A sketch is shown below.
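For instance, a sketch of toggling the logger without unwrapping; the environment id, wrapper class and exact method names below are assumptions for illustration (the note above only states that such activation/deactivation methods exist):
import gym

import sinergym
from sinergym.utils.wrappers import LoggerWrapper  # assumed wrapper class name

env = LoggerWrapper(gym.make('Eplus-demo-v1'))  # assumed demo environment id

env.deactivate_logger()  # assumed method name: stop recording interactions
# ... steps executed here are not written to monitor.csv ...
env.activate_logger()    # assumed method name: resume recording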