12. Sinergym with Google Cloud
In this project, we have defined some functionality based on the Google Cloud Python API in sinergym/utils/gcloud.py. Our aim is to make it easy to configure a Google Cloud account and combine it with Sinergym.
The main idea is to construct a virtual machine (VM) using Google Compute Engine (GCE) in order to execute our Sinergym container on it. At the same time, this remote container will update the Weights and Biases tracking server with artifacts if the experiment is configured with those options.
When an instance has finished its job, the container will auto-remove its host instance from Google Cloud Platform if the experiment has been configured with this option.
Let's see a detailed explanation below.
12.1. Preparing Google Cloud
12.1.1. First steps (configuration)
Firstly, it is necessary that you have a Google Cloud account set up and the SDK configured (auth, invoicing, project ID, etc.). If you don't have this, it is recommended to check their documentation. Secondly, it is important to have Docker installed in order to be able to manage these containers in Google Cloud.
You can link your gcloud and docker accounts using the following command (see authentication methods):
$ gcloud auth configure-docker
To avoid problems later with the image build and Google Cloud functionality in general, we recommend that you allow permissions for Google Cloud Build from the beginning (see this documentation).
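As a sketch (PROJECT_ID and PROJECT_NUMBER are placeholders for your own values; the member shown is the default Cloud Build service account), granting a role to Cloud Build looks like this:
$ gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
    --role="roles/compute.admin"
The role to grant depends on what your build needs; roles/compute.admin here is only an example.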
On the other hand, we are going to enable Google Cloud services in the API library. These are the APIs we currently need:
Google Container Registry API
Artifact Registry API
Cloud Run API
Compute Engine API
Cloud Logging API
Cloud Monitoring API
Cloud Functions API
Cloud Pub/Sub API
Cloud SQL Admin API
Cloud Firestore API
Cloud Datastore API
Service Usage API
Cloud Storage
Gmail API
Hence, you will have to enable these services in your Google account. You can do it using the gcloud client SDK:
$ gcloud services list
$ gcloud services enable artifactregistry.googleapis.com \
cloudapis.googleapis.com \
cloudbuild.googleapis.com \
containerregistry.googleapis.com \
gmail.googleapis.com \
sql-component.googleapis.com \
sqladmin.googleapis.com \
storage-component.googleapis.com \
storage.googleapis.com \
cloudfunctions.googleapis.com \
pubsub.googleapis.com \
run.googleapis.com \
serviceusage.googleapis.com \
drive.googleapis.com \
appengine.googleapis.com
Alternatively, you can enable them from the Google Cloud Platform Console.
If you have installed Sinergym and the Sinergym extras, the Google Cloud SDK must be linked with other Python modules for some functionality to work. Please execute the following in your terminal:
$ gcloud auth application-default login
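Optionally, you can verify that the application-default credentials were stored correctly (this simply prints a token if everything went well):
$ gcloud auth application-default print-access-token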
12.1.2. Use our container in Google Cloud Platform
Our Sinergym container is currently uploaded to the Container Registry as a public image. You can use it locally:
$ docker run -it eu.gcr.io/sinergym/sinergym:latest
If you want to use it in a GCE VM, you can execute the following:
$ gcloud compute instances create-with-container sinergym \
--container-image eu.gcr.io/sinergym/sinergym \
--zone europe-west1-b \
--container-privileged \
--container-restart-policy never \
--container-stdin \
--container-tty \
--boot-disk-size 20GB \
--boot-disk-type pd-ssd \
--machine-type n2-highcpu-8
We have containers available on Docker Hub too. Please visit our repository.
Note
You can change these parameters in order to set up your own VM according to your preferences (see create-with-container).
Warning
--boot-disk-size is really important: by default, the VM is set to 10GB, which is not nearly enough for the Sinergym container. This results in a silent error for Google Cloud Build (you would need to check the logs, where the incident is not obvious).
12.1.3. Use your own container
Suppose you have forked this repository and want to upload your own container to Google Cloud and use it. You can use cloudbuild.yaml with our Dockerfile for this purpose:
steps:
  # Write in cache for quick updates
  - name: "eu.gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: "bash"
    args: ["-c", "docker pull eu.gcr.io/${PROJECT_ID}/sinergym:latest || exit 0"]
  # Build image (using cache if it's possible)
  - name: "eu.gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: "docker"
    args:
      [
        "build",
        "-t",
        "eu.gcr.io/${PROJECT_ID}/sinergym:latest",
        "--cache-from",
        "eu.gcr.io/${PROJECT_ID}/sinergym:latest",
        "--build-arg",
        "SINERGYM_EXTRAS=[DRL,gcloud]",
        ".",
      ]
  # Push image built to container registry
  - name: "eu.gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: "docker"
    args: ["push", "eu.gcr.io/${PROJECT_ID}/sinergym:latest"]
  # This container is going to be public (change this step otherwise)
  # - name: "gcr.io/cloud-builders/gsutil"
  #   args:
  #     [
  #       "iam",
  #       "ch",
  #       "AllUsers:objectViewer",
  #       "gs://artifacts.${PROJECT_ID}.appspot.com",
  #     ]
# Other options for the build execution (not the container)
options:
  diskSizeGb: "10"
  machineType: "E2_HIGHCPU_8"
timeout: 86400s
images: ["eu.gcr.io/${PROJECT_ID}/sinergym:latest"]
This file does the following:
Writes to the cache for quick updates (if an older container was already uploaded).
Builds the image (using the cache if it is available).
Pushes the built image to the Container Registry.
Makes the container public within the Container Registry.
There is an options section at the end of the file. Do not confuse this part with the virtual machine configuration: Google Cloud uses a helper VM to build everything mentioned above. At the same time, we can use this YAML file to upgrade our container, because the PROJECT_ID environment variable is defined by the Google Cloud SDK, so its value is your current project in the Google Cloud global configuration.
Warning
Just as the VM needs enough disk space, Google Cloud Build needs at least 10GB to work correctly. Otherwise, it may fail.
Warning
If your local computer doesn't have enough free space, it might report the same error (Google Cloud's error manager does not distinguish between the two), so be careful.
In order to execute cloudbuild.yaml, you have to run the following:
$ gcloud builds submit --region europe-west1 \
--config ./cloudbuild.yaml .
--substitutions can be used in order to configure build parameters if they are needed.
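For instance, a hypothetical user-defined substitution (user substitutions must start with an underscore, and the corresponding ${_TAG} variable would have to be referenced inside cloudbuild.yaml):
$ gcloud builds submit --region europe-west1 \
    --config ./cloudbuild.yaml \
    --substitutions _TAG=v1.0 .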
Note
The trailing “.” in the command is the build context: the directory containing the Dockerfile, which is necessary to build the container image (see build-config).
Note
In cloudbuild.yaml there is a variable named PROJECT_ID. However, it is not defined in substitutions. This is because it is a variable predefined by Google Cloud: when the build begins, $PROJECT_ID is set to the current value in your gcloud configuration (see substitutions-variables).
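Once the build has finished, you can verify that the image is available in the Container Registry (an optional check; replace PROJECT_ID with your own project):
$ gcloud container images list-tags eu.gcr.io/PROJECT_ID/sinergym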
12.1.4. Create your VM or MIG
To create a VM that uses this container, here is an example:
$ gcloud compute instances create-with-container sinergym \
--container-image eu.gcr.io/sinergym/sinergym \
--zone europe-west1-b \
--container-privileged \
--container-restart-policy never \
--container-stdin \
--container-tty \
--boot-disk-size 20GB \
--boot-disk-type pd-ssd \
--machine-type n2-highcpu-8
Note
--container-restart-policy never is really important for correct functionality.
Warning
If you enter the VM immediately after creating it, the container may not have been created yet. You may think this is an error, since Google Cloud does not notify you about it. If this happens, just wait a few minutes.
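A quick way to check whether the container is already running, without attaching to it (assuming the instance name and zone used in the example above):
$ gcloud compute ssh sinergym --zone europe-west1-b --command "docker ps"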
To create a MIG, you first need to create a machine template, for example:
$ gcloud compute instance-templates create-with-container sinergym-template \
--container-image eu.gcr.io/sinergym/sinergym \
--container-privileged \
--service-account storage-account@sinergym.iam.gserviceaccount.com \
--scopes https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/devstorage.full_control \
--container-env=gce_zone=europe-west1-b,gce_project_id=sinergym \
--container-restart-policy never \
--container-stdin \
--container-tty \
--boot-disk-size 20GB \
--boot-disk-type pd-ssd \
--machine-type n2-highcpu-8
Note
The --service-account, --scopes and --container-env parameters will be explained in Containers permission to bucket storage output. Please read that documentation before using them, since they require prior configuration.
Then, you can create a managed instance group as large as you want:
$ gcloud compute instance-groups managed create example-group \
--base-instance-name sinergym-vm \
--size 3 \
--template sinergym-template
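You can check the state of every instance in the group as they are created (the zone is whichever one your group was created in; europe-west1-b follows the examples above):
$ gcloud compute instance-groups managed list-instances example-group \
    --zone europe-west1-b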
Warning
It is possible that your quota doesn't let you have more than one VM at the same time. In that case, the remaining VMs will probably stay initializing forever and never become ready. If this is your case, we recommend checking your quotas here.
12.1.5. Initiate your VM
Your virtual machine is ready! To connect you can use ssh (see gcloud-ssh):
$ gcloud compute ssh <machine-name>
Google Cloud uses a Container-Optimized OS (see documentation) in the VM. This OS has Docker pre-installed together with the Sinergym container.
To use this container in your machine, you only have to run:
$ docker attach <container-name-or-ID>
And now you can execute your own experiments in Google Cloud! For example, you can enter the remote container with gcloud ssh and execute DRL_battery.py for the experiment you want.
12.2. Executing experiments in remote containers
DRL_battery.py and load_agent.py are allocated in every remote container and are used to execute experiments and evaluations; they can be combined with Google Cloud Buckets, Weights and Biases, auto-remove, etc.
Note
DRL_battery.py can also be used in local experiments and send output data and artifacts to remote storage, such as wandb, without configuring cloud computing.
The structure of the JSON used to configure the experiment or evaluation is specified in the How to use section.
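For example, a hypothetical invocation inside the remote container (the --configuration flag name is an assumption; check the script's argument parser for the exact option):
$ python DRL_battery.py --configuration /path/to/experiment_config.json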
Warning
For correct auto_delete functionality, please use MIGs instead of individual instances.
12.2.1. Containers permission to bucket storage output
As you can see in the Sinergym template explained in Create your VM or MIG, the --scopes, --service-account and --container-env parameters are specified. They are needed for the remote_store option of DRL_battery.py to work correctly: those parameters provide each container with permissions to write to the bucket and to manage Google Cloud Platform (the auto instance remove function). The container environment variables indicate the zone and the project_id.
Hence, it is necessary to set up this service account and grant it the privileges required for that objective. Following the Google authentication documentation, we will do the following:
$ gcloud iam service-accounts create storage-account
$ gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:storage-account@PROJECT_ID.iam.gserviceaccount.com" --role="roles/owner"
$ gcloud iam service-accounts keys create PROJECT_PATH/google-storage.json --iam-account=storage-account@PROJECT_ID.iam.gserviceaccount.com
$ export GOOGLE_CLOUD_CREDENTIALS=PROJECT_PATH/google-storage.json
In short, we create a new service account called storage-account. Then, we grant this account the roles/owner permission. The next step is to create a key file (JSON) called google-storage.json in our project root (gitignore will keep this file out of the remote repository). Finally, we export this file path in GOOGLE_CLOUD_CREDENTIALS on our local computer so that the gcloud SDK knows it has to use that token to authenticate.
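As an optional sanity check (using the key file created above), you can activate the service account locally and list the buckets it can access:
$ gcloud auth activate-service-account --key-file PROJECT_PATH/google-storage.json
$ gsutil ls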
12.2.2. Visualize remote wandb log in real-time
You only have to go to Weights & Biases and log in with your GitHub account.
12.3. Google Cloud Alerts
Google Cloud Platform includes functionality to trigger certain events and generate alerts accordingly. A trigger has been created in our gcloud project that notifies us when an experiment has finished. This alert can be received in several ways (Slack, SMS, Email, etc.). If you want to do the same, please check the Google Cloud Alerts documentation here.
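As a sketch (assuming the alpha monitoring commands shipped with recent gcloud SDK versions), an alert policy defined in a local JSON file could be registered like this:
$ gcloud alpha monitoring policies create --policy-from-file policy.json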