12. Sinergym with Google Cloud

In this project, we have defined some functionality based on the gcloud Python API in sinergym/utils/gcloud.py. Our team's aim is to make it easy to configure a Google Cloud account and combine it with Sinergym.

The main idea is to create a virtual machine (VM) using Google Compute Engine (GCE) in order to run our Sinergym container on it. If the experiment is configured with those options, this remote container will also upload the experiment results to a Google Cloud Storage bucket and send artifacts to an MLflow tracking server.
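To make the idea of "uploading experiment results to a bucket" concrete, here is a minimal sketch of how a local output directory could be mirrored into blob names inside a bucket. The helper `plan_bucket_upload` is hypothetical. It only computes the (local file, blob name) pairs and is not the implementation in sinergym/utils/gcloud.py.

```python
import os
from pathlib import Path
from typing import List, Tuple


def plan_bucket_upload(src_path: str, dest_path: str) -> List[Tuple[str, str]]:
    """Map every file under src_path to a blob name under dest_path.

    Hypothetical helper for illustration only: it does not talk to Google
    Cloud and is not the implementation in sinergym/utils/gcloud.py.
    """
    root = Path(src_path)
    prefix = dest_path.rstrip("/")
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            local_file = Path(dirpath) / filename
            # Blob names always use "/" as separator, whatever the local OS.
            relative = local_file.relative_to(root).as_posix()
            pairs.append((str(local_file), f"{prefix}/{relative}"))
    # Sort by blob name so the result is deterministic.
    return sorted(pairs, key=lambda pair: pair[1])
```

With the google-cloud-storage library, you could then upload each pair with `bucket.blob(blob_name).upload_from_filename(local_file)`, where `bucket = storage.Client().bucket(bucket_name)`.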

If the experiment has been configured with this option, the container automatically removes its host instance from Google Cloud Platform once the instance has finished its job.

Let’s see a detailed explanation below.

12.1. Preparing Google Cloud

12.1.1. 1. First steps (configuration)

First, you need a Google Cloud account set up and the Cloud SDK configured (authentication, billing, project ID, etc.). If you don’t have this, we recommend checking their documentation. Second, you need Docker installed in order to manage these containers in Google Cloud.

You can link gcloud with Docker using the following command (see authentication methods):

$ gcloud auth configure-docker
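`gcloud auth configure-docker` registers gcloud as Docker's credential helper for Google registries. It does this by adding entries under the `credHelpers` key of Docker's config file. The sketch below checks whether that link is in place. It assumes the default config location `~/.docker/config.json`, and the function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def registry_uses_gcloud(registry: str, config_path: Optional[Path] = None) -> bool:
    """Return True if Docker delegates authentication for `registry` to gcloud."""
    # Assumption: Docker's config lives at the default path unless given.
    path = config_path or Path.home() / ".docker" / "config.json"
    if not path.exists():
        return False
    config = json.loads(path.read_text())
    # configure-docker writes e.g. {"credHelpers": {"eu.gcr.io": "gcloud"}}.
    return config.get("credHelpers", {}).get(registry) == "gcloud"
```

For example, `registry_uses_gcloud("eu.gcr.io")` should be true after running the command above.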

To avoid problems later with the image build and with Google Cloud functionality in general, we recommend granting the required permissions to Cloud Build from the start (see this documentation).

Permissions required for Cloud Build.

Next, we are going to enable Google Cloud services in the API Library. These are the APIs we currently need:

  • Google Container Registry API

  • Artifact Registry API

  • Cloud Run API

  • Compute Engine API

  • Cloud Logging API

  • Cloud Monitoring API

  • Cloud Functions API

  • Cloud Pub/Sub API

  • Cloud SQL Admin API

  • Cloud Firestore API

  • Cloud Datastore API

  • Service Usage API

  • Cloud Storage API

  • Gmail API

Hence, you will have to enable these services in your Google Cloud project. You can do it using the gcloud CLI:

$ gcloud services list
$ gcloud services enable artifactregistry.googleapis.com \
                         cloudapis.googleapis.com \
                         cloudbuild.googleapis.com \
                         containerregistry.googleapis.com \
                         gmail.googleapis.com \
                         sql-component.googleapis.com \
                         sqladmin.googleapis.com \
                         storage-component.googleapis.com \
                         storage.googleapis.com \
                         cloudfunctions.googleapis.com \
                         pubsub.googleapis.com \
                         run.googleapis.com \
                         serviceusage.googleapis.com \
                         drive.googleapis.com \
                         appengine.googleapis.com
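If you prefer to drive this step from a Python script, the sketch below builds the same `gcloud services enable` call from a list of service IDs. The IDs are copied from the command above, and duplicates are removed while keeping the order. The function names are hypothetical, and running the command assumes an authenticated gcloud CLI on your PATH.

```python
import subprocess
from typing import List, Sequence

# Service IDs copied from the `gcloud services enable` command above.
SINERGYM_SERVICES = [
    "artifactregistry.googleapis.com", "cloudapis.googleapis.com",
    "cloudbuild.googleapis.com", "containerregistry.googleapis.com",
    "gmail.googleapis.com", "sql-component.googleapis.com",
    "sqladmin.googleapis.com", "storage-component.googleapis.com",
    "storage.googleapis.com", "cloudfunctions.googleapis.com",
    "pubsub.googleapis.com", "run.googleapis.com",
    "serviceusage.googleapis.com", "drive.googleapis.com",
    "appengine.googleapis.com",
]


def enable_services_cmd(services: Sequence[str]) -> List[str]:
    """Build the argv for `gcloud services enable` (deduplicated, order kept)."""
    unique = list(dict.fromkeys(services))
    return ["gcloud", "services", "enable", *unique]


def enable_services(services: Sequence[str]) -> None:
    # Assumes gcloud is installed and authenticated; raises on a non-zero exit.
    subprocess.run(enable_services_cmd(services), check=True)
```

Passing the list as an argv (instead of a shell string) avoids quoting issues and lets you add or remove services in one place.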

Alternatively, you can use the Google Cloud Platform Console:

APIs required for Cloud Build.

If you have installed Sinergym with its extras, the Google Cloud SDK must be linked with other Python modules so that some functionality (for example, TensorBoard) works later on. Please execute the following in your terminal:

$ gcloud auth application-default login
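This command writes Application Default Credentials (ADC) that Google client libraries pick up automatically. The sketch below shows a simplified version of where those libraries look. The `GOOGLE_APPLICATION_CREDENTIALS` environment variable is checked first, then the file written by the command above on Linux/macOS. Windows (`%APPDATA%`) and the `CLOUDSDK_CONFIG` override are deliberately not handled, and the function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def find_adc_file() -> Optional[Path]:
    """Return the ADC file Google client libraries would use, if any (simplified)."""
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # Returned even if missing: client libraries report an error in that case.
        return Path(explicit)
    # Location used by `gcloud auth application-default login` on Linux/macOS.
    default = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.exists() else None
```

If this returns None on a Linux/macOS machine, the login step above has probably not been run yet.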

12.1.2. 2. Use our container in Google Cloud Platform

Our Sinergym container is currently uploaded to Container Registry as a public image. You can use it locally:

$ docker run -it eu.gcr.io/sinergym/sinergym:latest

If you want to use it in a GCE VM, you can execute the following:

$ gcloud compute instances create-with-container sinergym \
    --container-image eu.gcr.io/sinergym/sinergym \
    --zone europe-west1-b \
    --container-privileged \
    --container-restart-policy never \
    --container-stdin \
    --container-tty \
    --boot-disk-size 20GB \
    --boot-disk-type pd-ssd \
    --machine-type n2-highcpu-8

Our containers are also available on Docker Hub. Please visit our repository.

Note

You can change these parameters to set up a VM that fits your preferences (see create-with-container).

Warning

--boot-disk-size is very important: by default, the VM gets a 10GB disk, which is not enough for the Sinergym container. This results in a silent error in Google Cloud Build (you would need to check the logs, and even there the cause is not clear).
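Because an undersized boot disk fails silently, it can help to build the command programmatically and reject small disks up front. In the sketch below, the 20GB floor is an assumption taken from the example command above (the text only states that 10GB is insufficient). The flags mirror that command, and the function name is hypothetical.

```python
from typing import List

# Assumption: 20GB (as in the example above) is used as a safe minimum.
MIN_BOOT_DISK_GB = 20


def create_with_container_cmd(name: str,
                              image: str = "eu.gcr.io/sinergym/sinergym",
                              zone: str = "europe-west1-b",
                              machine_type: str = "n2-highcpu-8",
                              boot_disk_gb: int = 20) -> List[str]:
    """Build the argv of the create-with-container command shown above."""
    if boot_disk_gb < MIN_BOOT_DISK_GB:
        raise ValueError(f"{boot_disk_gb}GB boot disk is too small; "
                         f"use at least {MIN_BOOT_DISK_GB}GB")
    return [
        "gcloud", "compute", "instances", "create-with-container", name,
        "--container-image", image,
        "--zone", zone,
        "--container-privileged",
        "--container-restart-policy", "never",
        "--container-stdin",
        "--container-tty",
        "--boot-disk-size", f"{boot_disk_gb}GB",
        "--boot-disk-type", "pd-ssd",
        "--machine-type", machine_type,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` when gcloud is installed.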

12.1.3. 3. Use your own container

Suppose you have forked this repository and want to upload your own container to Google Cloud and use it. You can use cloudbuild.yaml with our Dockerfile for this purpose:

steps:
  # Write in cache for quick updates
  - name: "eu.gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: "bash"
    args: ["-c", "docker pull eu.gcr.io/${PROJECT_ID}/sinergym:latest || exit 0"]
  # Build image (using cache if it's possible)
  - name: "eu.gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: "docker"
    args:
      [
        "build",
        "-t",
        "eu.gcr.io/${PROJECT_ID}/sinergym:latest",
        "--cache-from",
        "eu.gcr.io/${PROJECT_ID}/sinergym:latest",
        "--build-arg",
        "SINERGYM_EXTRAS=[DRL,gcloud]",
        ".",
      ]

  # Push image built to container registry
  - name: "eu.gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: "docker"
    args: ["push", "eu.gcr.io/${PROJECT_ID}/sinergym:latest"]

  # Make the container public (change this command otherwise)
  # - name: "gcr.io/cloud-builders/gsutil"
  #   args:
  #     [
  #       "iam",
  #       "ch",
  #       "AllUsers:objectViewer",
  #       "gs://artifacts.${PROJECT_ID}.appspot.com",
  #     ]
# Options for the build execution (not for the container)
options:
  diskSizeGb: "10"
  machineType: "E2_HIGHCPU_8"
timeout: 86400s
images: ["eu.gcr.io/${PROJECT_ID}/sinergym:latest"]

This file does the following:

  1. Pulls the latest image into the cache for quick updates (if an older container was already uploaded).

  2. Builds the image (using the cache if it’s available).

  3. Pushes the built image to Container Registry.

  4. Makes the container public within Container Registry (this step is commented out in the file above).

There is an options section at the end of the file. Do not confuse this part with the virtual machine configuration: Google Cloud uses a helper VM to run the build steps mentioned above, and these options configure that helper. We use this same YAML file to update our own container. It works for any project because the PROJECT_ID variable is provided by Google Cloud, so its value is the current project in your gcloud configuration.

Warning

Just as the VM needs more disk space, Google Cloud Build needs at least 10GB to work correctly. Otherwise, it may fail.

Warning

If your local computer doesn’t have enough free space, it might report the same error (Google Cloud’s error reporting does not distinguish between the two cases), so be careful.
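Since a full local disk produces the same unclear error, you could check free space before submitting a build. The sketch below uses the standard library's `shutil.disk_usage`. The 10GB threshold reuses the figure from the warning above (interpreted here as GiB), and the function name is hypothetical.

```python
import shutil

GIB = 1024 ** 3


def has_free_space(path: str = ".", required_gb: float = 10) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GIB
```

For example, call `has_free_space(".")` from the repository root right before `gcloud builds submit`.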

To execute cloudbuild.yaml, run the following:

$ gcloud builds submit --region europe-west1 \
    --config ./cloudbuild.yaml .

The --substitutions flag can be used to configure build parameters if needed.
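Cloud Build requires user-defined substitution keys to start with an underscore and contain only uppercase letters, digits and underscores. The pairs are passed as a comma-separated list. The sketch below builds the submit command with that validation. `_IMAGE_TAG` is a hypothetical key: the cloudbuild.yaml shown above does not reference it, so you would need to add `${_IMAGE_TAG}` to your own file for it to have any effect.

```python
import re
from typing import Dict, List, Optional

# User-defined substitution keys: leading underscore, then A-Z, 0-9 or "_".
_KEY_RE = re.compile(r"_[A-Z0-9_]+")


def builds_submit_cmd(config: str = "./cloudbuild.yaml",
                      region: str = "europe-west1",
                      substitutions: Optional[Dict[str, str]] = None,
                      source: str = ".") -> List[str]:
    """Build the argv for `gcloud builds submit` with optional substitutions."""
    cmd = ["gcloud", "builds", "submit", "--region", region, "--config", config]
    if substitutions:
        for key, value in substitutions.items():
            if not _KEY_RE.fullmatch(key):
                raise ValueError(f"invalid user-defined substitution key: {key}")
            if "," in value:
                # Commas separate pairs; such values need gcloud's escaping syntax.
                raise ValueError(f"value for {key} contains a comma")
        pairs = ",".join(f"{k}={v}" for k, v in substitutions.items())
        cmd.append(f"--substitutions={pairs}")
    cmd.append(source)  # build context: the directory containing the Dockerfile
    return cmd
```

For example, `builds_submit_cmd(substitutions={"_IMAGE_TAG": "dev"})` ends with `--substitutions=_IMAGE_TAG=dev .`.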

Note

The trailing “.” is not part of --config. It is the source argument, i.e. the current directory, which contains the Dockerfile needed to build the container image (see build-config).

Note

In cloudbuild.yaml there is a variable named PROJECT_ID, but it is not defined in substitutions. This is because it is a default substitution provided by Google Cloud: when the build begins, “$PROJECT_ID” is set to the current project in your gcloud configuration (see substitutions-variables).
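The effect of this default substitution can be imitated locally, for instance to check which image name a build will push. The sketch below uses Python's `string.Template`, which understands the same `${NAME}` syntax. It is only an approximation of Cloud Build's behaviour, not its actual mechanism, and the function names are hypothetical. `gcloud config get-value project` prints the project currently set in your gcloud configuration.

```python
import subprocess
from string import Template


def expand_build_vars(text: str, project_id: str) -> str:
    """Replace ${PROJECT_ID} in `text` (simplified imitation of Cloud Build)."""
    # safe_substitute leaves other variables, e.g. ${_TAG}, untouched.
    return Template(text).safe_substitute(PROJECT_ID=project_id)


def current_project() -> str:
    # Requires gcloud on PATH; prints the project of the active configuration.
    out = subprocess.run(["gcloud", "config", "get-value", "project"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()
```

For example, `expand_build_vars("eu.gcr.io/${PROJECT_ID}/sinergym:latest", current_project())` gives the image name that the build above would push.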

12.1.4. 4. Create your VM or MIG

Here is an example of creating a VM that uses this container:

$ gcloud compute instances create-with-container sinergym \
    --container-image eu.gcr.io/sinergym/sinergym \
    --zone europe-west1-b \
    --container-privileged \
    --container-restart-policy never \
    --container-stdin \
    --container-tty \
    --boot-disk-size 20GB \
    --boot-disk-type pd-ssd \
    --machine-type n2-highcpu-8

Note

--container-restart-policy never is essential for correct functionality (otherwise, the container would be restarted when its experiment finishes).

Warning

If you enter the VM immediately after creating it, the container may not have been created yet. This can look like an error, and Google Cloud gives no notification about it. If this happens, wait for several minutes.
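Instead of retrying by hand, you could poll until the container shows up. The sketch below pairs a generic polling loop with a check that runs `docker ps` over `gcloud compute ssh --command`, filtered by the image the container was created from. Both function names are hypothetical, and the check assumes the default image and zone used in the examples above.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 600,
               interval_s: float = 15) -> bool:
    """Call `check` every `interval_s` seconds until it returns True or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def container_running(vm: str, zone: str = "europe-west1-b",
                      image: str = "eu.gcr.io/sinergym/sinergym") -> bool:
    # Lists running containers created from `image` on the VM; any output counts.
    remote = f"docker ps -q --filter ancestor={image}"
    result = subprocess.run(
        ["gcloud", "compute", "ssh", vm, "--zone", zone, "--command", remote],
        capture_output=True, text=True)
    return result.returncode == 0 and bool(result.stdout.strip())
```

For example, `wait_until(lambda: container_running("sinergym"))` returns True as soon as the container is up, or False after ten minutes.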

To create a managed instance group (MIG), you first need to create an instance template, for example:

$ gcloud compute instance-templates create-with-container sinergym-template \
    --container-image eu.gcr.io/sinergym/sinergym \
    --container-privileged \
    --service-account storage-account@sinergym.iam.gserviceaccount.com \
    --scopes https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/devstorage.full_control \
    --container-env=gce_zone=europe-west1-b,gce_project_id=sinergym,MLFLOW_TRACKING_URI=http://$(gcloud compute addresses describe mlflow-ip --format='get(address)'):5000 \
    --container-restart-policy never \
    --container-stdin \
    --container-tty \
    --boot-disk-size 20GB \
    --boot-disk-type pd-ssd \
    --machine-type n2-highcpu-8

Note

The --service-account, --scopes and --container-env parameters are explained in Containers permission to bucket storage output. Please read that section before using these parameters, since they require prior configuration.
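The --container-env value is a comma-separated list of KEY=VALUE pairs, so a stray comma or "=" silently corrupts it. The sketch below renders a dictionary into that flag and rejects such characters. The helper name is hypothetical, and the MLflow IP used in the usage example is a placeholder.

```python
from typing import Dict


def container_env_flag(env: Dict[str, str]) -> str:
    """Render `env` as a --container-env flag (KEY=VALUE pairs joined by commas)."""
    for key, value in env.items():
        if "," in key or "=" in key or "," in value:
            raise ValueError(f"unsupported character in {key}={value}")
    # Dicts keep insertion order, so the flag lists variables as given.
    return "--container-env=" + ",".join(f"{k}={v}" for k, v in env.items())
```

For example, `container_env_flag({"gce_zone": "europe-west1-b", "gce_project_id": "sinergym", "MLFLOW_TRACKING_URI": "http://10.0.0.2:5000"})` reproduces the structure of the flag in the template above.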

Then, you can create an instance group as large as you want:

$ gcloud compute instance-groups managed create example-group \
    --base-instance-name sinergym-vm \
    --size 3 \
    --template sinergym-template

Warning

Your quota may not allow you to run more than one VM at the same time. In that case, the remaining VMs will probably stay initializing and never become ready. If this is your case, we recommend checking your quotas here.
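A quick way to see whether a group size fits is to divide the remaining regional CPU quota by the vCPUs per VM. An n2-highcpu-8 machine has 8 vCPUs. Which quota metric applies (for example CPUS or N2_CPUS) depends on your project, so treat the inputs as assumptions. You can read the limits and usage with `gcloud compute regions describe europe-west1`. The function name is hypothetical.

```python
def max_group_size(cpu_quota: int, cpus_in_use: int = 0, vcpus_per_vm: int = 8) -> int:
    """How many VMs of `vcpus_per_vm` vCPUs still fit into the regional CPU quota."""
    available = max(cpu_quota - cpus_in_use, 0)
    return available // vcpus_per_vm
```

For instance, with a quota of 24 vCPUs and nothing else running, three n2-highcpu-8 VMs fit, which matches `--size 3` above. With 8 vCPUs already in use, only two fit.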

12.1.5. 5. Initiate your VM

Your virtual machine is ready! To connect, you can use ssh (see gcloud-ssh):

$ gcloud compute ssh <machine-name>

Google Cloud uses Container-Optimized OS (see documentation) in the VM. This OS has Docker pre-installed, along with the Sinergym container.

GCE VM containers list

To use this container on your machine, you only have to run:

$ docker attach <container-name-or-ID>

GCE VM container usage.

Now you can execute your own experiments in Google Cloud! For example, you can enter the remote container with gcloud ssh and execute DRL_battery.py for the experiment you want.
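The flags accepted by DRL_battery.py are defined in the script shown in the next section. As a sketch, here is how a command line for it could be assembled from Python using a subset of those flags. The helper name is hypothetical, the script is assumed to be in the current directory of the container, and the environment ID in the test is only a placeholder.

```python
import shlex
from typing import List, Optional


def drl_battery_cmd(environment: str, algorithm: str = "PPO", episodes: int = 1,
                    seed: Optional[int] = None, remote_store: bool = False,
                    bucket_name: str = "experiments-storage",
                    mlflow_store: bool = False) -> List[str]:
    """Build an argv for DRL_battery.py using flags defined in that script."""
    cmd = ["python", "DRL_battery.py",
           "--environment", environment,
           "--algorithm", algorithm,
           "--episodes", str(episodes)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if remote_store:
        # Upload outputs to the given Google Cloud Storage bucket.
        cmd += ["--remote_store", "--bucket_name", bucket_name]
    if mlflow_store:
        cmd.append("--mlflow_store")
    return cmd
```

`shlex.join(drl_battery_cmd(...))` gives a string that can be pasted into the container's shell.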

12.2. Executing experiments in remote containers

This script, DRL_battery.py, is available in every remote container. It is used to execute experiments and combine them with Google Cloud Storage buckets, MLflow artifacts, auto-removal, etc.:

import argparse
import os
from datetime import datetime

import gym
import mlflow
import numpy as np
import tensorboard
from stable_baselines3 import A2C, DDPG, DQN, PPO, SAC, TD3
from stable_baselines3.common.callbacks import CallbackList
from stable_baselines3.common.logger import configure
from stable_baselines3.common.noise import NormalActionNoise
from stable_baselines3.common.vec_env import DummyVecEnv

import sinergym
import sinergym.utils.gcloud as gcloud
from sinergym.utils.callbacks import LoggerCallback, LoggerEvalCallback
from sinergym.utils.constants import RANGES_5ZONE, RANGES_DATACENTER, RANGES_IW
from sinergym.utils.rewards import *
from sinergym.utils.wrappers import (LoggerWrapper, MultiObsWrapper,
                                     NormalizeObservation)

# ---------------------------------------------------------------------------- #
#                             Parameters definition                            #
# ---------------------------------------------------------------------------- #
parser = argparse.ArgumentParser()
# common arguments for battery
parser.add_argument(
    '--environment',
    '-env',
    required=True,
    type=str,
    dest='environment',
    help='Environment name of simulation (see sinergym/__init__.py).')
parser.add_argument(
    '--model',
    '-mod',
    type=str,
    default=None,
    dest='model',
    help='Path where model is stored, only when you want a model to continue training it.')
parser.add_argument(
    '--episodes',
    '-ep',
    type=int,
    default=1,
    dest='episodes',
    help='Number of episodes for training.')
parser.add_argument(
    '--algorithm',
    '-alg',
    type=str,
    default='PPO',
    dest='algorithm',
    help='Algorithm used to train (possible values: PPO, A2C, DQN, DDPG, SAC, TD3).')
parser.add_argument(
    '--reward',
    '-rw',
    type=str,
    default='linear',
    dest='reward',
    help='Reward function used by model, by default is linear (possible values: linear, exponential).')
parser.add_argument(
    '--energy_weight',
    '-rew',
    type=float,
    dest='energy_weight',
    help='Reward energy weight with compatible rewards types.')
parser.add_argument(
    '--normalization',
    '-norm',
    action='store_true',
    dest='normalization',
    help='Apply normalization to observations if this flag is specified.')
parser.add_argument(
    '--multiobs',
    '-mobs',
    action='store_true',
    dest='multiobs',
    help='Apply Multi observations if this flag is specified.')
parser.add_argument(
    '--logger',
    '-log',
    action='store_true',
    dest='logger',
    help='Apply Sinergym CSVLogger class if this flag is specified.')
parser.add_argument(
    '--tensorboard',
    '-tens',
    type=str,
    default=None,
    dest='tensorboard',
    help='Tensorboard path for logging (if not specified, tensorboard log will not be stored).')
parser.add_argument(
    '--evaluation',
    '-eval',
    action='store_true',
    dest='evaluation',
    help='Evaluation is processed during training with this flag (save best model online).')
parser.add_argument(
    '--eval_freq',
    '-evalf',
    type=int,
    default=2,
    dest='eval_freq',
    help='Episodes executed before applying evaluation (if evaluation flag is not specified, this value is useless).')
parser.add_argument(
    '--eval_length',
    '-evall',
    type=int,
    default=2,
    dest='eval_length',
    help='Episodes executed during evaluation (if evaluation flag is not specified, this value is useless).')
parser.add_argument(
    '--log_interval',
    '-inter',
    type=int,
    default=1,
    dest='log_interval',
    help='model training log_interval parameter. See documentation since this value is different in every algorithm.')
parser.add_argument(
    '--seed',
    '-sd',
    type=int,
    default=None,
    dest='seed',
    help='Seed used to algorithm training.')
parser.add_argument(
    '--id',
    '-id',
    type=str,
    default=None,
    dest='id',
    help='Custom experiment identifier.')
parser.add_argument(
    '--remote_store',
    '-sto',
    action='store_true',
    dest='remote_store',
    help='Determine if sinergym output will be sent to a Google Cloud Storage Bucket.')
parser.add_argument(
    '--bucket_name',
    '-buc',
    type=str,
    default='experiments-storage',
    dest='bucket_name',
    help='Name of the bucket where experiments output will be stored if experiment is configured to that.')
parser.add_argument(
    '--mlflow_store',
    '-mlflow',
    action='store_true',
    dest='mlflow_store',
    help='Determine if sinergym output will be sent to a mlflow artifact storage')
parser.add_argument(
    '--group_name',
    '-group',
    type=str,
    dest='group_name',
    help='This field indicates the instance group name.')
parser.add_argument(
    '--auto_delete',
    '-del',
    action='store_true',
    dest='auto_delete',
    help='If this is a GCE instance and this flag is active, the instance will be removed from GCP.')

parser.add_argument('--learning_rate', '-lr', type=float, default=.0003)
parser.add_argument('--n_steps', '-n', type=int, default=2048)
parser.add_argument('--batch_size', '-bs', type=int, default=64)
parser.add_argument('--n_epochs', '-ne', type=int, default=10)
parser.add_argument('--gamma', '-g', type=float, default=.99)
parser.add_argument('--gae_lambda', '-gl', type=float, default=.95)
parser.add_argument('--ent_coef', '-ec', type=float, default=0)
parser.add_argument('--vf_coef', '-v', type=float, default=.5)
parser.add_argument('--max_grad_norm', '-m', type=float, default=.5)
parser.add_argument('--buffer_size', '-bfs', type=int, default=1000000)
parser.add_argument('--learning_starts', '-ls', type=int, default=100)
parser.add_argument('--tau', '-tu', type=float, default=0.005)
parser.add_argument('--gradient_steps', '-gs', type=int, default=1)
parser.add_argument('--clip_range', '-cr', type=float, default=.2)
parser.add_argument('--sigma', '-sig', type=float, default=0.1)
parser.add_argument('--rms_prop_eps', '-rpe', type=float, default=1e-5)

args = parser.parse_args()
#------------------------------------------------------------------------------#

# ---------------------------------------------------------------------------- #
#                               Register run name                              #
# ---------------------------------------------------------------------------- #
experiment_date = datetime.today().strftime('%Y-%m-%d_%H:%M')
name = args.algorithm + '-' + args.environment + \
    '-episodes-' + str(args.episodes)
if args.seed:
    name += '-seed-' + str(args.seed)
if args.id:
    name += '-id-' + str(args.id)
name += '_' + experiment_date

# ---------------------------------------------------------------------------- #
#                    Check if MLFLOW_TRACKING_URI is defined                   #
# ---------------------------------------------------------------------------- #
mlflow_tracking_uri = os.environ.get('MLFLOW_TRACKING_URI')
if mlflow_tracking_uri is not None:
    # Check ping to server
    mlflow_ip = mlflow_tracking_uri.split('/')[-1].split(':')[0]
    # If server is not valid, setting default local path to mlflow
    response = os.system("ping -c 1 " + mlflow_ip)
    if response != 0:
        mlflow.set_tracking_uri('file://' + os.getcwd() + '/mlruns')
# MLflow track
with mlflow.start_run(run_name=name):
    # Log experiment params
    mlflow.log_param('sinergym-version', sinergym.__version__)

    mlflow.log_param('env', args.environment)
    mlflow.log_param('episodes', args.episodes)
    mlflow.log_param('algorithm', args.algorithm)
    mlflow.log_param('reward', args.reward)
    mlflow.log_param('normalization', bool(args.normalization))
    mlflow.log_param('multi-observations', bool(args.multiobs))
    mlflow.log_param('logger', bool(args.logger))
    mlflow.log_param('tensorboard', args.tensorboard)
    mlflow.log_param('evaluation', bool(args.evaluation))
    mlflow.log_param('evaluation-frequency', args.eval_freq)
    mlflow.log_param('evaluation-length', args.eval_length)
    mlflow.log_param('log-interval', args.log_interval)
    mlflow.log_param('seed', args.seed)
    mlflow.log_param('remote-store', bool(args.remote_store))

    mlflow.log_param('learning-rate', args.learning_rate)
    mlflow.log_param('n-steps', args.n_steps)
    mlflow.log_param('batch-size', args.batch_size)
    mlflow.log_param('n-epochs', args.n_epochs)
    mlflow.log_param('gamma', args.gamma)
    mlflow.log_param('gae-lambda', args.gae_lambda)
    mlflow.log_param('ent-coef', args.ent_coef)
    mlflow.log_param('vf-coef', args.vf_coef)
    mlflow.log_param('max-grad-norm', args.max_grad_norm)
    mlflow.log_param('buffer-size', args.buffer_size)
    mlflow.log_param('learning-starts', args.learning_starts)
    mlflow.log_param('tau', args.tau)
    mlflow.log_param('gradient-steps', args.gradient_steps)
    mlflow.log_param('clip-range', args.clip_range)
    mlflow.log_param('sigma', args.sigma)
    mlflow.log_param('rms_prop_eps', args.rms_prop_eps)
    mlflow.log_param('id', args.id)

    # ---------------------------------------------------------------------------- #
    #               Environment construction (with reward specified)               #
    # ---------------------------------------------------------------------------- #
    if args.reward == 'linear':
        reward = LinearReward
    elif args.reward == 'exponential':
        reward = ExpReward
    else:
        raise RuntimeError(
            'Reward function [{}] specified is not registered.'.format(
                args.reward))

    env = gym.make(args.environment, reward=reward)
    if hasattr(env.reward_fn, 'W_energy') and args.energy_weight is not None:
        env.reward_fn.W_energy = args.energy_weight

    # env for evaluation if is enabled
    eval_env = None
    if args.evaluation:
        eval_env = gym.make(args.environment, reward=reward)
        if hasattr(eval_env.reward_fn, 'W_energy') and args.energy_weight is not None:
            eval_env.reward_fn.W_energy = args.energy_weight

    # ---------------------------------------------------------------------------- #
    #                                   Wrappers                                   #
    # ---------------------------------------------------------------------------- #
    if args.normalization:
        # dictionary ranges to use
        norm_range = None
        env_type = args.environment.split('-')[1]
        if env_type == 'datacenter':
            norm_range = RANGES_DATACENTER
        elif env_type == '5Zone':
            norm_range = RANGES_5ZONE
        elif env_type == 'IWMullion':
            norm_range = RANGES_IW
        else:
            raise NameError(
                'Normalization cannot be used on environment: "{}". Check the environment name or disable normalization.'.format(
                    args.environment))
        env = NormalizeObservation(env, ranges=norm_range)
        if eval_env is not None:
            eval_env = NormalizeObservation(eval_env, ranges=norm_range)
    if args.logger:
        env = LoggerWrapper(env)
        if eval_env is not None:
            eval_env = LoggerWrapper(eval_env)
    if args.multiobs:
        env = MultiObsWrapper(env)
        if eval_env is not None:
            eval_env = MultiObsWrapper(eval_env)

    # ---------------------------------------------------------------------------- #
    #                           Defining model (algorithm)                         #
    # ---------------------------------------------------------------------------- #

    if args.model is None:

        #--------------------------------------------------------#
        #                           DQN                          #
        #--------------------------------------------------------#
        if args.algorithm == 'DQN':
            model = DQN('MlpPolicy', env, verbose=1,
                        learning_rate=args.learning_rate,
                        buffer_size=args.buffer_size,
                        learning_starts=args.learning_starts,
                        batch_size=args.batch_size,
                        tau=args.tau,
                        gamma=args.gamma,
                        train_freq=4,
                        gradient_steps=args.gradient_steps,
                        target_update_interval=10000,
                        exploration_fraction=.1,
                        exploration_initial_eps=1.0,
                        exploration_final_eps=.05,
                        max_grad_norm=args.max_grad_norm,
                        seed=args.seed,
                        tensorboard_log=args.tensorboard)
        #--------------------------------------------------------#
        #                           DDPG                         #
        #--------------------------------------------------------#
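        elif args.algorithm == 'DDPG':
            # Without --sigma, DDPG runs without action noise (avoids a NameError)
            action_noise = None
            if args.sigma:
                # noise objects for DDPG, scaled by the --sigma argument
                n_actions = env.action_space.shape[-1]
                action_noise = NormalActionNoise(mean=np.zeros(
                    n_actions), sigma=args.sigma * np.ones(n_actions))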

            model = DDPG("MlpPolicy",
                         env,
                         action_noise=action_noise,
                         verbose=1,
                         seed=args.seed,
                         tensorboard_log=args.tensorboard)
        #--------------------------------------------------------#
        #                           A2C                          #
        #--------------------------------------------------------#
        elif args.algorithm == 'A2C':
            model = A2C('MlpPolicy', env, verbose=1,
                        learning_rate=args.learning_rate,
                        n_steps=args.n_steps,
                        gamma=args.gamma,
                        gae_lambda=args.gae_lambda,
                        ent_coef=args.ent_coef,
                        vf_coef=args.vf_coef,
                        max_grad_norm=args.max_grad_norm,
                        rms_prop_eps=args.rms_prop_eps,
                        seed=args.seed,
                        tensorboard_log=args.tensorboard)
        #--------------------------------------------------------#
        #                           PPO                          #
        #--------------------------------------------------------#
        elif args.algorithm == 'PPO':
            model = PPO('MlpPolicy', env, verbose=1,
                        learning_rate=args.learning_rate,
                        n_steps=args.n_steps,
                        batch_size=args.batch_size,
                        n_epochs=args.n_epochs,
                        gamma=args.gamma,
                        gae_lambda=args.gae_lambda,
                        clip_range=args.clip_range,
                        ent_coef=args.ent_coef,
                        vf_coef=args.vf_coef,
                        max_grad_norm=args.max_grad_norm,
                        seed=args.seed,
                        tensorboard_log=args.tensorboard)
        #--------------------------------------------------------#
        #                           SAC                          #
        #--------------------------------------------------------#
        elif args.algorithm == 'SAC':
            model = SAC(policy='MlpPolicy',
                        env=env,
                        seed=args.seed,
                        learning_rate=args.learning_rate,
                        buffer_size=args.buffer_size,
                        batch_size=args.batch_size,
                        tau=args.tau,
                        gamma=args.gamma,
                        tensorboard_log=args.tensorboard)
        #--------------------------------------------------------#
        #                           TD3                          #
        #--------------------------------------------------------#
        elif args.algorithm == 'TD3':
            model = TD3(policy='MlpPolicy',
                        env=env, seed=args.seed,
                        tensorboard_log=args.tensorboard,
                        learning_rate=args.learning_rate,
                        buffer_size=args.buffer_size,
                        batch_size=args.batch_size,
                        tau=args.tau,
                        gamma=args.gamma,
                        train_freq=(1, 'episode'),
                        action_noise=None,
                        replay_buffer_class=None,
                        replay_buffer_kwargs=None,
                        optimize_memory_usage=False,
                        policy_delay=2,
                        target_policy_noise=0.2,
                        target_noise_clip=0.5,
                        create_eval_env=False,
                        policy_kwargs=None,
                        verbose=0,
                        device='auto',
                        _init_setup_model=True)
        #--------------------------------------------------------#
        #                           Error                        #
        #--------------------------------------------------------#
        else:
            raise RuntimeError(
                F'Algorithm specified [{args.algorithm}] is not registered.')

    else:
        model_path = ''
        if 'gs://' in args.model:
            # Download from given bucket (gcloud configured with privileges)
            client = gcloud.init_storage_client()
            bucket_name = args.model.split('/')[2]
            model_path = args.model.split(bucket_name + '/')[-1]
            gcloud.read_from_bucket(client, bucket_name, model_path)
            model_path = './' + model_path
        else:
            model_path = args.model

        model = None
        if args.algorithm == 'DQN':
            model = DQN.load(model_path, tensorboard_log=args.tensorboard)
        elif args.algorithm == 'DDPG':
            model = DDPG.load(model_path, tensorboard_log=args.tensorboard)
        elif args.algorithm == 'A2C':
            model = A2C.load(model_path, tensorboard_log=args.tensorboard)
        elif args.algorithm == 'PPO':
            model = PPO.load(model_path, tensorboard_log=args.tensorboard)
        elif args.algorithm == 'SAC':
            model = SAC.load(model_path, tensorboard_log=args.tensorboard)
        elif args.algorithm == 'TD3':
            model = TD3.load(model_path, tensorboard_log=args.tensorboard)
        else:
            raise RuntimeError('Algorithm specified is not registered.')

        model.set_env(env)

    # ---------------------------------------------------------------------------- #
    #       Calculating total training timesteps based on number of episodes       #
    # ---------------------------------------------------------------------------- #
    n_timesteps_episode = env.simulator._eplus_one_epi_len / \
        env.simulator._eplus_run_stepsize
    timesteps = args.episodes * n_timesteps_episode - 1

    # ---------------------------------------------------------------------------- #
    #                                   CALLBACKS                                  #
    # ---------------------------------------------------------------------------- #
    callbacks = []

    # Set up Evaluation and saving best model
    if args.evaluation:
        eval_callback = LoggerEvalCallback(
            eval_env,
            best_model_save_path='best_model/' + name,
            log_path='best_model/' + name + '/',
            eval_freq=n_timesteps_episode *
            args.eval_freq,
            deterministic=True,
            render=False,
            n_eval_episodes=args.eval_length)
        callbacks.append(eval_callback)

    # Set up tensorboard logger
    if args.tensorboard:
        log_callback = LoggerCallback(sinergym_logger=bool(args.logger))
        callbacks.append(log_callback)
        # lets change default dir for TensorboardFormatLogger only
        tb_path = args.tensorboard + '/' + name
        new_logger = configure(tb_path, ["tensorboard"])
        model.set_logger(new_logger)

    callback = CallbackList(callbacks)

    # ---------------------------------------------------------------------------- #
    #                                   TRAINING                                   #
    # ---------------------------------------------------------------------------- #
    model.learn(
        total_timesteps=timesteps,
        callback=callback,
        log_interval=args.log_interval)
    model.save(env.simulator._env_working_dir_parent + '/' + name)

    # If the algorithm doesn't reset or close the environment, this script will do it in
    # order to correctly log all the simulation data (Energyplus + Sinergym
    # logs)
    if env.simulator._episode_existed:
        env.close()

    # ---------------------------------------------------------------------------- #
    #                           Mlflow artifacts storage                           #
    # ---------------------------------------------------------------------------- #
    if args.mlflow_store:
        # Code for send output and tensorboard to mlflow artifacts.
        mlflow.log_artifacts(
            local_dir=env.simulator._env_working_dir_parent,
            artifact_path=name)
        if args.evaluation:
            mlflow.log_artifacts(
                local_dir='best_model/' + name,
                artifact_path='best_model/' + name)
        # If tensorboard is active (in local) we should send to mlflow
        if args.tensorboard and 'gs://' + args.bucket_name not in args.tensorboard:
            mlflow.log_artifacts(
                local_dir=args.tensorboard + '/' + name,
                artifact_path=os.path.abspath(args.tensorboard).split('/')[-1] + '/' + name)

    # ---------------------------------------------------------------------------- #
    #                          Google Cloud Bucket Storage                         #
    # ---------------------------------------------------------------------------- #
    if args.remote_store:
        # Initiate Google Cloud client
        client = gcloud.init_storage_client()
        # Code for send output and tensorboard to common resource here.
        gcloud.upload_to_bucket(
            client,
            src_path=env.simulator._env_working_dir_parent,
            dest_bucket_name=args.bucket_name,
            dest_path=name)
        if args.evaluation:
            gcloud.upload_to_bucket(
                client,
                src_path='best_model/' + name + '/',
                dest_bucket_name=args.bucket_name,
                dest_path='best_model/' + name + '/')
        # If tensorboard is active (in local) we should send to bucket
        if args.tensorboard and 'gs://' + args.bucket_name not in args.tensorboard:
            gcloud.upload_to_bucket(
                client,
                src_path=args.tensorboard + '/' + name + '/',
                dest_bucket_name=args.bucket_name,
                dest_path=os.path.abspath(args.tensorboard).split('/')[-1] + '/' + name + '/')
        # gcloud.upload_to_bucket(
        #     client,
        #     src_path='mlruns/',
        #     dest_bucket_name=args.bucket_name,
        #     dest_path='mlruns/')

    # End mlflow run
    mlflow.end_run()

    # ---------------------------------------------------------------------------- #
    #                 Autodelete option if it is a cloud resource                  #
    # ---------------------------------------------------------------------------- #
    if args.group_name and args.auto_delete:
        token = gcloud.get_service_account_token()
        gcloud.delete_instance_MIG_from_container(args.group_name, token)
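The Google Cloud Bucket Storage step above delegates to gcloud.upload_to_bucket from sinergym/utils/gcloud.py, whose body is not shown here. To give an idea of what uploading a whole output directory to a bucket prefix involves, here is a minimal sketch built on the google-cloud-storage client (Client.bucket, Bucket.blob and Blob.upload_from_filename). The helper names plan_uploads and upload_directory are hypothetical, and this is not Sinergym's actual implementation.

```python
import os
from typing import Iterator, Tuple


def plan_uploads(src_path: str, dest_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (local_file, blob_name) pairs that mirror src_path under dest_path.

    Hypothetical helper for illustration; not Sinergym's implementation.
    """
    root_dir = os.path.abspath(src_path)
    prefix = dest_path.rstrip('/')
    for root, _dirs, files in os.walk(root_dir):
        for filename in sorted(files):
            local_file = os.path.join(root, filename)
            # Blob names always use '/' regardless of the local OS separator
            relative = os.path.relpath(local_file, root_dir).replace(os.sep, '/')
            yield local_file, prefix + '/' + relative


def upload_directory(client, src_path: str, bucket_name: str, dest_path: str) -> int:
    """Upload every file under src_path to gs://<bucket_name>/<dest_path>/.

    `client` is expected to be a google.cloud.storage.Client; returns the file count.
    """
    bucket = client.bucket(bucket_name)
    uploaded = 0
    for local_file, blob_name in plan_uploads(src_path, dest_path):
        bucket.blob(blob_name).upload_from_filename(local_file)
        uploaded += 1
    return uploaded
```

With a real client this would be called as upload_directory(storage.Client(), 'best_model/' + name, args.bucket_name, 'best_model/' + name). Keeping the path planning separate from the upload makes the directory-to-prefix mapping testable without network access.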

Note

DRL_battery.py can also be used for local experiments on a client computer. In that case, some parameters have no effect; for example, --auto_delete does nothing. The results of these experiments can still be sent to the bucket and to mlflow artifacts if specified, as we will see.

The list of parameters is quite long:

  • --environment or -env: Environment name you want to use (see Environments).

  • --episodes or -ep: Number of episodes to train the agent in simulation (episode length depends on the environment).

  • --algorithm or -alg: Algorithm used for training (currently, PPO, A2C, DQN, DDPG and SAC are available).

  • --reward or -rw: Reward class used for the reward function. Currently, possible values are “linear” and “exponential” (see Rewards).

  • --normalization or -norm: Apply the normalization wrapper to observations during training. If not specified, the wrapper is not applied (see Wrappers).

  • --multiobs or -mobs: Apply the Multi-Observation wrapper to observations during training. If not specified, the wrapper is not applied (see Wrappers).

  • --logger or -log: Apply the Sinergym logger wrapper during training. If not specified, the wrapper is not applied (see Wrappers and Logger).

  • --tensorboard or -tens: Local path or remote bucket path where tensorboard training logs are stored. If not specified, this log is deactivated (see Tensorboard structure).

  • --evaluation or -eval: If specified, the evaluation callback is activated; otherwise, model evaluation is deactivated during training (see Deep Reinforcement Learning Integration).

  • --eval_freq or -evalf: Only used if the --evaluation flag is set. Episode frequency for evaluation.

  • --eval_length or -evall: Only used if the --evaluation flag is set. Number of episodes for each evaluation.

  • --log_interval or -inter: Value passed as log_interval to the learn() method of each algorithm. It is important to specify a correct value.

  • --seed or -sd: Seed for training, so that the random components of the process can be reproduced.

  • --remote_store or -sto: Determines whether the Sinergym output and the tensorboard log (when a local path is specified instead of a remote bucket path) are sent to a common resource (Bucket); otherwise, they are kept only in the remote container's memory.

  • --mlflow_store or -mlflow: Determines whether the Sinergym output and the tensorboard log (when a local path is specified instead of a remote bucket path) are sent to Mlflow as artifacts; otherwise, they are kept only in the remote container's memory.

  • --group_name or -group: Specifies the MIG to which the host instance belongs. It is required if --auto_delete is activated.

  • --auto_delete or -del: If specified, the remote instance is automatically removed when its job has finished.

  • algorithm hyperparameters: Execute python DRL_battery.py --help for more information.
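To make the flags above concrete, the following sketch declares them with argparse. Only the flag names come from the list; types, defaults and choices are assumptions, as is the --bucket_name option (undocumented in the list, but read as args.bucket_name by the training code). The real DRL_battery.py also declares the algorithm hyperparameters, which are omitted here.

```python
import argparse

# Boolean switches: passing the flag sets the value to True.
SWITCHES = [
    ('--normalization', '-norm'), ('--multiobs', '-mobs'), ('--logger', '-log'),
    ('--evaluation', '-eval'), ('--remote_store', '-sto'),
    ('--mlflow_store', '-mlflow'), ('--auto_delete', '-del'),
]


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented DRL_battery.py flags (types and defaults assumed)."""
    p = argparse.ArgumentParser(description='Sinergym DRL training (sketch)')
    p.add_argument('--environment', '-env', required=True)
    p.add_argument('--episodes', '-ep', type=int, default=1)
    p.add_argument('--algorithm', '-alg', default='PPO',
                   choices=['PPO', 'A2C', 'DQN', 'DDPG', 'SAC'])
    p.add_argument('--reward', '-rw', default='linear',
                   choices=['linear', 'exponential'])
    for long_flag, short_flag in SWITCHES:
        p.add_argument(long_flag, short_flag, action='store_true')
    p.add_argument('--tensorboard', '-tens', default=None)
    p.add_argument('--eval_freq', '-evalf', type=int, default=2)
    p.add_argument('--eval_length', '-evall', type=int, default=2)
    p.add_argument('--log_interval', '-inter', type=int, default=1)
    p.add_argument('--seed', '-sd', type=int, default=None)
    p.add_argument('--group_name', '-group', default=None)
    # Assumed flag: not in the documented list, but the training code reads args.bucket_name
    p.add_argument('--bucket_name', default=None)
    return p


def parse(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Same condition the training code checks before deleting the instance
    args.will_auto_delete = bool(args.group_name and args.auto_delete)
    return args
```

The derived will_auto_delete attribute shows why --group_name matters: without it, --auto_delete is silently ignored.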

Warning

For auto_delete to work correctly, please use MIGs instead of individual instances.

This script does the following:

  1. Setting an appropriate name for the experiment, using the format <algorithm>-<environment_name>-episodes<episodes_int>-seed<seed_value>(<experiment_date>).

  2. Starting an Mlflow tracking experiment with that name. If the mlflow server is not available, a local path (./mlruns) in the remote container is used instead.

  3. Logging all Mlflow parameters (including sinergym.__version__).

  4. Setting the reward function specified in --reward.

  5. Applying the specified wrappers to the environment.

  6. Defining the model algorithm using the hyperparameters.

  7. Calculating the training timesteps from the number of episodes.

  8. Setting up the evaluation callback, if specified.

  9. Setting up the Tensorboard logger callback, if specified.

  10. Training the model in the environment.

  11. If --remote_store has been specified, saving all outputs in the Google Cloud Bucket. If --mlflow_store has been specified, saving all outputs as Mlflow run artifacts.

  12. Auto-deleting the remote instance (and therefore its container) from Google Cloud Platform when --auto_delete has been specified together with --group_name.
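Steps 1 and 7 can be sketched as follows. The name pattern is the one given in step 1, but the date format is an assumption. For the timesteps, the sketch assumes a fixed simulation step, so the number of steps per episode is the episode duration divided by the step size; how DRL_battery.py actually reads those durations from the environment is not shown here, and both function names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def experiment_name(algorithm: str, environment: str, episodes: int, seed,
                    date: Optional[datetime] = None) -> str:
    """<algorithm>-<environment_name>-episodes<episodes_int>-seed<seed_value>(<experiment_date>).

    The date format below is an assumption; only the overall pattern is documented.
    """
    stamp = (date or datetime.now()).strftime('%Y-%m-%d_%H-%M')
    return f'{algorithm}-{environment}-episodes{episodes}-seed{seed}({stamp})'


def training_timesteps(episodes: int, episode_seconds: int, timestep_seconds: int) -> int:
    """Total timesteps for `episodes` episodes, assuming a fixed simulation step."""
    steps_per_episode = episode_seconds // timestep_seconds
    return episodes * steps_per_episode
```

For example, a 365-day episode with 15-minute steps has 365 × 24 × 4 = 35040 steps, so training for 2 episodes means 70080 timesteps.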

12.2.1. Container permissions for bucket storage output

As you can see in the sinergym template explained in 4. Create your VM or MIG, the --scope, --service-account and --container-env options are specified. Their purpose is to make the remote_store option in DRL_battery.py work correctly: these parameters give each container permission to write to the bucket and to manage Google Cloud Platform (needed by the automatic instance removal). The container environment variables indicate the zone, the project_id and the mlflow tracking server URI, which is needed by the Mlflow tracking server set up.

Hence, it is necessary to set up this service account and grant it the privileges required for that purpose. Following the Google authentication documentation, we do the following:

$ gcloud iam service-accounts create storage-account
$ gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:storage-account@PROJECT_ID.iam.gserviceaccount.com" --role="roles/owner"
$ gcloud iam service-accounts keys create PROJECT_PATH/google-storage.json --iam-account=storage-account@PROJECT_ID.iam.gserviceaccount.com
$ export GOOGLE_CLOUD_CREDENTIALS=PROJECT_PATH/google-storage.json

In short, we create a new service account called storage-account. Then, we grant this account the roles/owner role. The next step is to create a JSON key file called google-storage.json in our project root (it is listed in .gitignore, so it is never pushed to the remote repository). Finally, we export the path to this file in GOOGLE_CLOUD_CREDENTIALS on our local computer so that the gcloud SDK knows it has to use that key to authenticate.
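A minimal sketch of how Python code could consume the exported key, assuming the path is read from GOOGLE_CLOUD_CREDENTIALS as above. Google client libraries look for GOOGLE_APPLICATION_CREDENTIALS when using Application Default Credentials, so this sketch passes the key file explicitly through storage.Client.from_service_account_json. Both function names are hypothetical.

```python
import os


def credentials_path(var: str = 'GOOGLE_CLOUD_CREDENTIALS') -> str:
    """Return the key-file path exported in `var`, failing with a clear message."""
    path = os.environ.get(var, '').strip()
    if not path:
        # Also catches `export VAR= path`, which leaves the variable empty
        raise RuntimeError(f'{var} is empty; export the path to google-storage.json')
    if not os.path.isfile(path):
        raise FileNotFoundError(f'{var} points to a missing file: {path}')
    return path


def storage_client_from_env(var: str = 'GOOGLE_CLOUD_CREDENTIALS'):
    """Build a Cloud Storage client authenticated with the exported key file."""
    from google.cloud import storage  # lazy import: pip install google-cloud-storage
    return storage.Client.from_service_account_json(credentials_path(var))
```

Validating the variable before building the client turns a common shell mistake (a space after the = in export) into an explicit error instead of a confusing authentication failure later.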

12.3. Load a trained model

For this purpose, we have a script called load_agent.py which can be used both on a remote machine and locally on our computer, just like DRL_battery.py.

The script takes as a parameter the path where our model is located, loads the model and performs the requested evaluation.

The list of parameters is:

  • --environment or -env: Environment name you want to use.

  • --model or -mod: Trained model (zip file) used to run the evaluation. This can be a local file path on your computer (remote or host) or a Google Cloud Storage resource (a bucket path like gs://<bucket_name>/<model_path>).

  • --episodes or -ep: Number of episodes to evaluate the agent in simulation (episode length depends on the environment).

  • --algorithm or -alg: Algorithm with which the model was trained (currently, PPO, A2C, DQN, DDPG and SAC are available).

  • --reward or -rw: Reward class used for the reward function (using the same reward as in training is recommended). Currently, possible values are “linear” and “exponential”.

  • --normalization or -norm: Apply the normalization wrapper to observations during evaluation. If not specified, the wrapper is not applied.

  • --logger or -log: Apply the Sinergym logger wrapper during evaluation. If not specified, the wrapper is not applied.

  • --seed or -sd: Seed for evaluation, so that the random components of the process can be reproduced.

  • --remote_store or -sto: Determines whether the Sinergym output is sent to a common resource (Bucket); otherwise, it is kept only in the container or host memory.

  • --group_name or -group: Specifies the MIG to which the host instance belongs. It is required if --auto_delete is activated.

  • --auto_delete or -del: If specified, the remote instance is automatically removed when its job has finished.
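Because --model accepts either a local file or a gs://<bucket_name>/<model_path> resource, remote models have to be fetched before they can be loaded. The sketch below shows one way to do it, assuming google-cloud-storage for the download (Blob.download_to_filename) and stable-baselines3 for loading (each of PPO, A2C, DQN, DDPG and SAC provides a load() class method). The helper names are hypothetical and are not load_agent.py's actual functions.

```python
import os
import tempfile
from typing import Tuple

ALGORITHMS = ('PPO', 'A2C', 'DQN', 'DDPG', 'SAC')


def parse_gs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://<bucket_name>/<model_path> into (bucket_name, model_path)."""
    if not uri.startswith('gs://'):
        raise ValueError(f'not a gs:// URI: {uri}')
    bucket, _, blob = uri[len('gs://'):].partition('/')
    if not bucket or not blob:
        raise ValueError(f'expected gs://<bucket_name>/<model_path>, got {uri}')
    return bucket, blob


def local_model_path(model: str, client=None) -> str:
    """Return a local file for --model, downloading gs:// resources first.

    `client` is expected to be a google.cloud.storage.Client when model is remote.
    """
    if not model.startswith('gs://'):
        return model
    bucket, blob = parse_gs_uri(model)
    dest = os.path.join(tempfile.mkdtemp(), os.path.basename(blob))
    client.bucket(bucket).blob(blob).download_to_filename(dest)
    return dest


def load_model(algorithm: str, path: str):
    """Load a trained model with stable-baselines3 (lazy import)."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f'unsupported algorithm: {algorithm}')
    import stable_baselines3 as sb3
    return getattr(sb3, algorithm).load(path)
```

A typical call chain would be load_model(args.algorithm, local_model_path(args.model, storage.Client())).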

This script loads the model. Once the model is loaded, it predicts actions from the observed states during the specified number of episodes. The collected information is sent to cloud storage if this has been specified; otherwise, it is stored locally.
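The evaluation loop described above can be sketched like this. It assumes the Gymnasium step API (reset() returning (obs, info) and step() returning a 5-tuple); older gym-based Sinergym releases return (obs, reward, done, info) instead, so the unpacking would need adapting. model.predict(obs, deterministic=True) is the stable-baselines3 prediction call. The function name evaluate is hypothetical.

```python
from typing import Any, List


def evaluate(model: Any, env: Any, episodes: int, deterministic: bool = True) -> List[float]:
    """Roll out `episodes` episodes with model.predict and return each episode's total reward.

    Assumes Gymnasium-style environments:
    reset() -> (obs, info), step(a) -> (obs, reward, terminated, truncated, info).
    """
    returns = []
    for _ in range(episodes):
        obs, _info = env.reset()
        total, done = 0.0, False
        while not done:
            # stable-baselines3 predict() returns (action, recurrent_state)
            action, _state = model.predict(obs, deterministic=deterministic)
            obs, reward, terminated, truncated, _info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return returns
```

After the loop, calling env.close() lets the Sinergym logger write all the simulation data, as the training script does.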

12.4. Remote Tensorboard log

The --tensorboard parameter accepts either a local path or a bucket path.

If we specify a local path, tensorboard logs are stored in the remote container's memory. If you have specified --remote_store or --mlflow_store, these logs are sent to that remote storage when the experiment finishes. One of the strengths of Tensorboard is the ability to view data in real time while training is running. Thus, it is recommended to set --tensorboard to the bucket path directly, so that the information is sent as training generates it (see this issue for more information). In our project we use gs://experiments-storage/tensorboard_log, but you can use any path you want.

Note

If you have specified a gs:// path in --tensorboard, the --remote_store and --mlflow_store parameters do not store tensorboard logs.

Warning

If you specify a bucket path, do not add a trailing / (for example, gs://experiments-storage/tensorboard_log/), because then real-time remote storage does not work correctly.

Warning

If the gs URI is not recognized, your tensorboard installation may not have access to your Google account. Try the gcloud auth application-default login command.
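The --tensorboard rules above can be summarized in code. This sketch mirrors the condition used in the training script ('gs://' + args.bucket_name not in args.tensorboard) to decide whether logs are uploaded at the end of training, and adds a check for the trailing-slash problem. Both function names are hypothetical.

```python
import os
from typing import Optional, Tuple


def check_tensorboard_path(path: str) -> str:
    """Reject a gs:// tensorboard path ending in '/', which breaks real-time storage."""
    if path.startswith('gs://') and path.endswith('/'):
        raise ValueError(f'remove the trailing "/" from {path}')
    return path


def tensorboard_upload_target(tensorboard: Optional[str], bucket_name: Optional[str],
                              name: str) -> Optional[Tuple[str, str]]:
    """Return (local source, bucket destination) for end-of-training uploads.

    Mirrors `args.tensorboard and 'gs://' + args.bucket_name not in args.tensorboard`.
    Returns None when tensorboard is off or logs already stream to the bucket.
    Unlike that expression, a missing bucket_name is treated as "not streaming"
    instead of raising a TypeError.
    """
    if not tensorboard:
        return None
    if bucket_name and 'gs://' + bucket_name in tensorboard:
        return None
    src = tensorboard + '/' + name + '/'
    dest = os.path.abspath(tensorboard).split('/')[-1] + '/' + name + '/'
    return src, dest
```

For a local path such as ./tensorboard_log, the logs of experiment exp end up under tensorboard_log/exp/ in the bucket once training finishes; for gs://experiments-storage/tensorboard_log nothing is uploaded at the end, because the logs were already written there during training.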

12.4.1. Visualize remote Tensorboard log in real-time

You have two options:

  1. Create a remote server with the tensorboard service deployed.

  2. Start that service on your local computer, reading from the bucket log, and access the visualization at http://localhost:6006.

The second option is enough, since we can read directly from the bucket whenever we need to and shut down the local service when we finish.

$ tensorboard --logdir gs://experiments-storage/tensorboard_log/

12.5. Mlflow tracking server set up

The Mlflow tracking server can be set up in your Google account in order to organize your own experiments (Mlflow). You can separate the back-end (SQL database) from the tracking server. In this way, you can shut down or delete the server instance without losing your experiment run data, since the SQL database is always up. Let's see how:

#!/bin/bash

# This script is used to build an mlflow server in Google Cloud. It is important
# to set up your account beforehand.

# Please, visit our documentation here --> https://ugr-sail.github.io/sinergymcompilationhtml/index.html

# Step 0 - Store all parameters

PROJECT_ID=$1
BUCKET_NAME=$2
ZONE=$3
REGION=${ZONE::-2}
DB_ROOT_PASSWORD=$4
MACHINE_TYPE=e2-medium
MLFLOW_IMAGE=kaysush/mlflow:1.14.1
CLOUD_SQL_PROXY_IMAGE=gcr.io/cloudsql-docker/gce-proxy:1.19.1
MYSQL_INSTANCE=${PROJECT_ID}:${REGION}:mlflow-backend

# Step 1 - Service account for mlflow service
echo "Creating Service account for mlflow service [mlflow-tracking-sa]..."
gcloud iam service-accounts create mlflow-tracking-sa --description="Service Account to run the MLFLow tracking server" --display-name="MLFlow tracking SA"

# Step 2 - Artifact used by mlflow to store all runs information
echo "Creating Back-end artifact bucket [$BUCKET_NAME]..."
gsutil mb -l ${REGION} gs://$BUCKET_NAME

# Step 3 - Cloud SQL instance with an inner "mlflow" database
echo "Creating sql instance with mlflow database [mlflow-backend]..."
gcloud sql instances create mlflow-backend --tier=db-f1-micro --region=${REGION} --root-password=${DB_ROOT_PASSWORD} --storage-type=SSD
gcloud sql databases create mlflow --instance=mlflow-backend

# Step 4 - IAM: Provisioning service account privileges in order to manipulate bucket and back-end
echo "Creating service account privileges to use Back-end [roles/cloudsql.editor]..."
gsutil iam ch "serviceAccount:mlflow-tracking-sa@${PROJECT_ID}.iam.gserviceaccount.com:roles/storage.admin" gs://${BUCKET_NAME}
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member="serviceAccount:mlflow-tracking-sa@${PROJECT_ID}.iam.gserviceaccount.com" --role=roles/cloudsql.editor

# Step 5 - Creating start_mlflow_tracking.sh to initialize instance
echo "Creating start_mlflow_tracking.sh to initialize instance..."
cat <<EOF >./start_mlflow_tracking.sh
echo "Starting Cloud SQL Proxy"
docker run -d --name mysql  --net host -p 3306:3306 $CLOUD_SQL_PROXY_IMAGE /cloud_sql_proxy -instances $MYSQL_INSTANCE=tcp:0.0.0.0:3306

echo "Starting mlflow-tracking server"
docker run -d --name mlflow-tracking --net host -p 5000:5000 $MLFLOW_IMAGE mlflow server --backend-store-uri mysql+pymysql://root:${DB_ROOT_PASSWORD}@localhost/mlflow --default-artifact-root gs://${BUCKET_NAME}/mlflow_artifacts/ --host 0.0.0.0

echo "Altering IPTables"
iptables -A INPUT -p tcp --dport 5000 -j ACCEPT
EOF

# Step 6 - Uploading start script and deleting from local
echo "Uploading start_mlflow_tracking.sh at gs://${BUCKET_NAME}/scripts/start_mlflow_tracking.sh..."
gsutil cp ./start_mlflow_tracking.sh gs://${BUCKET_NAME}/scripts/start_mlflow_tracking.sh
echo "Deleting temporal local script [start_mlflow_tracking.sh]"
rm ./start_mlflow_tracking.sh

#Step 7 - creating static external ip for mlflow server
echo "Creating static external ip for mlflow-tracking-server [mlflow-ip]"
gcloud compute addresses create mlflow-ip \
    --region ${REGION}

# Step 8 - Compute Instance
echo "Deploying remote server [mlflow-tracking-server]..."
gcloud compute --project=$PROJECT_ID instances create mlflow-tracking-server \
    --zone=$ZONE \
    --machine-type=$MACHINE_TYPE \
    --subnet=default \
    --network-tier=PREMIUM \
    --metadata=startup-script-url=gs://${BUCKET_NAME}/scripts/start_mlflow_tracking.sh \
    --maintenance-policy=MIGRATE \
    --service-account=mlflow-tracking-sa@${PROJECT_ID}.iam.gserviceaccount.com \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --tags=mlflow-tracking-server \
    --image=cos-77-12371-1109-0 \
    --image-project=cos-cloud \
    --boot-disk-size=10GB \
    --boot-disk-type=pd-balanced \
    --boot-disk-device-name=mlflow-tracking-server \
    --no-shielded-secure-boot \
    --shielded-vtpm \
    --shielded-integrity-monitoring \
    --reservation-affinity=any \
    --address $(gcloud compute addresses describe mlflow-ip --format='get(address)')

# Step 9 - Firewall
echo "Creating firewall rules [allow-mlflow-tracking]..."
gcloud compute firewall-rules create allow-mlflow-tracking --network default --priority 1000 --direction ingress --action allow --target-tags mlflow-tracking-server --source-ranges 0.0.0.0/0 --rules tcp:5000 --enable-logging

This bash script automates the whole configuration process (you only need to run it once). It takes four positional arguments, in this order: PROJECT_ID, BUCKET_NAME, ZONE and DB_ROOT_PASSWORD.
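For reference, these are the values the script derives in Step 0 and bakes into start_mlflow_tracking.sh, recomputed in Python to show how the four arguments are combined. The helper name and the example argument values are hypothetical. Note that the back-end store URI embeds the root password in plain text inside the generated start script, which is then uploaded to the bucket.

```python
def mlflow_setup_values(project_id: str, bucket_name: str, zone: str,
                        db_root_password: str) -> dict:
    """Recompute the values Step 0 derives and start_mlflow_tracking.sh uses."""
    # ${ZONE::-2} drops the zone letter: 'europe-west1-b' -> 'europe-west1'.
    # rsplit gives the same result for standard zone names.
    region = zone.rsplit('-', 1)[0]
    return {
        'REGION': region,
        'MYSQL_INSTANCE': f'{project_id}:{region}:mlflow-backend',
        'BACKEND_STORE_URI': f'mysql+pymysql://root:{db_root_password}@localhost/mlflow',
        'DEFAULT_ARTIFACT_ROOT': f'gs://{bucket_name}/mlflow_artifacts/',
    }
```

MYSQL_INSTANCE is the connection name the Cloud SQL proxy container receives, while the last two values are passed to mlflow server as --backend-store-uri and --default-artifact-root.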

The script does the following for you:

  1. Creating Service account for mlflow service [mlflow-tracking-sa].

  2. Creating Back-end artifact bucket.

  3. Creating SQL instance with root password specified in argument 4.

  4. Creating the mlflow database inside the SQL instance.

  5. Granting the service account privileges on the bucket [roles/storage.admin] and the back-end [roles/cloudsql.editor].

  6. Generating a script called start_mlflow_tracking.sh and uploading it to gs://<BUCKET_NAME>/scripts/.

  7. Deleting the local start_mlflow_tracking.sh file.

  8. Creating a static external IP for mlflow-tracking-server.

  9. Deploying the remote server [mlflow-tracking-server].

  10. Creating the firewall rule [allow-mlflow-tracking] that allows incoming TCP traffic on port 5000.

Step 8 is very important: it allows you to delete the server instance and create it again whenever you need it, without redefining the server IP in sinergym-template for remote container experiments. Notice that the server instance is created with the mlflow service account; with this configuration, mlflow can read from the SQL server. In 4. Create your VM or MIG, the MLFLOW_TRACKING_URI container environment variable is specified using that external static IP.
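Containers receive the server address through the MLFLOW_TRACKING_URI environment variable (for this setup, http://<static IP>:5000, port 5000 being the one opened by the firewall rule). Step 2 of the DRL_battery.py list falls back to a local ./mlruns store when the server is unavailable. A minimal sketch of such a fallback, assuming the tracking server answers MLflow's /health endpoint, could look like this; the function name is hypothetical and this is not necessarily how DRL_battery.py implements it.

```python
import http.client
import os
import urllib.request


def resolve_tracking_uri(default: str = './mlruns', timeout: float = 3.0) -> str:
    """Return MLFLOW_TRACKING_URI if its server answers /health, else a local store.

    Assumes the tracking server exposes MLflow's /health endpoint.
    """
    uri = os.environ.get('MLFLOW_TRACKING_URI', '').strip()
    if not uri:
        return default
    if not uri.startswith(('http://', 'https://')):
        return uri  # e.g. a local file store: nothing to probe
    try:
        with urllib.request.urlopen(uri.rstrip('/') + '/health', timeout=timeout) as resp:
            return uri if resp.status == 200 else default
    except (OSError, http.client.HTTPException, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error status or malformed URL
        return default
```

The result would then be passed to mlflow.set_tracking_uri(resolve_tracking_uri()) before starting the run.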

Warning

It is important to execute this script before creating sinergym-template instances, so that you can note down the mlflow server IP.

Note

If you want to change the back-end configuration, you can modify any parameter of the script above.

Note

If you have specified --mlflow_store, Sinergym outputs will be sent to the mlflow server as artifacts. These artifacts are stored in the same bucket, under gs://<BUCKET_NAME>/mlflow_artifacts/.

12.6. Google Cloud Alerts

Google Cloud Platform includes functionality to trigger events and generate alerts accordingly. A trigger has been created in our gcloud project to notify us when an experiment has finished. This alert can be delivered in several ways (Slack, SMS, Email, etc.). If you want to do the same, please check the Google Cloud Alerts documentation here.