Google Cloud Compute Engine feature flags controlled by metadata

When you create a VM instance on Google Cloud, you can optionally specify instance metadata. Instance metadata is a list of key/value pairs, and its most common use is passing a startup or shutdown script to a VM.

But startup and shutdown scripts are not the only platform features that rely on metadata. Compute Engine also uses metadata as a vehicle to implement a range of feature flags, as the following table shows:

Metadata key                     Implemented by          Windows  Linux
enable-os-inventory              OS agent
enable-oslogin                   OS agent
enable-oslogin-2fa               OS agent
block-project-ssh-keys           OS agent
ssh-keys                         OS agent
windows-keys                     OS agent
disable-account-manager          OS agent
enable-diagnostics               OS agent
disable-address-manager          OS agent
enable-wsfc                      OS agent
wsfc-addrs                       OS agent
wsfc-agent-port                  OS agent
disable-agent-updates            Googet
google-logging-enable            Ops Logging agent
google-monitoring-enable         Ops Monitoring agent
serial-port-enable               GCE
enable-guest-attributes          GCE
VmDnsSetting                     GCE
sysprep-specialize-script-url    OS agent
sysprep-specialize-script-cmd    OS agent
sysprep-specialize-script-bat    OS agent
sysprep-specialize-script-ps1    OS agent
windows-startup-script-url       OS agent
windows-startup-script-cmd       OS agent
windows-startup-script-bat       OS agent
windows-startup-script-ps1       OS agent
startup-script                   OS agent
startup-script-url               OS agent
shutdown-script                  OS agent
shutdown-script-url              OS agent
windows-shutdown-script-cmd      OS agent
windows-shutdown-script-url      OS agent

(Note: These are the flags I was aware of at the time of writing; the list is not meant to be exhaustive and is subject to change.)

If you look at this list, you might be wondering why so many platform features are controlled by metadata keys – isn’t metadata meant to be used for user-defined configuration? Why aren’t there dedicated API attributes to control all these features?

To get an idea of why these feature flags might have been implemented based on metadata, let’s look at the requirements for storing feature flags. As an example, consider the enable-oslogin flag, which controls whether OS Login is enabled:

  1. The feature flag must be visible to the Compute Engine agent. The agent implements the bulk of the OS Login functionality, so it must know whether to engage or disengage this functionality. To make things a little more complicated, the agent must be able to read the value of the flag even if the VM does not have a service account attached.
  2. Only privileged users should be able to set the flag, as it is a security-sensitive setting.
  3. SSH clients and tools such as gcloud must be able to read the flag so that they can adjust their behavior: if OS Login is enabled, a user’s public key must be published to the OS Login API; if it’s disabled, public keys must be added to the ssh-keys metadata entry.
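
To make requirement (3) a little more concrete, here is a minimal Python sketch of the decision a client has to make. It is not how gcloud or any particular SSH client is implemented: the function names, the return values and the set of strings treated as "enabled" are assumptions made for illustration, and the sketch assumes that an instance-level value takes precedence over the project-wide one.

```python
from typing import Mapping, Optional

# Assumption: these strings (case-insensitive) count as "enabled".
TRUTHY = {"true", "1"}


def effective_flag(
    key: str,
    instance_md: Mapping[str, str],
    project_md: Mapping[str, str],
) -> Optional[str]:
    """Return the instance-level value if set, else the project-level value."""
    if key in instance_md:
        return instance_md[key]
    return project_md.get(key)


def key_destination(
    instance_md: Mapping[str, str],
    project_md: Mapping[str, str],
) -> str:
    """Decide where a client should put the user's public key.

    The return values are hypothetical labels, not API names.
    """
    value = effective_flag("enable-oslogin", instance_md, project_md)
    if value is not None and value.strip().lower() in TRUTHY:
        return "oslogin-api"        # publish the key to the OS Login API
    return "ssh-keys-metadata"      # append the key to the ssh-keys entry
```

The point of the sketch is that the client needs nothing more than read access to the two metadata maps to pick the right code path.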

As it turns out, these requirements are perfectly met by metadata:

  1. A VM instance can access its metadata by querying the metadata server. No authentication or authorization is required, so the absence of an associated service account does not matter.
  2. Changing an instance’s metadata requires the compute.instances.setMetadata permission. Similarly, changing a project’s common instance metadata requires the compute.projects.setCommonInstanceMetadata permission. Only Compute Admin, Compute Instance Admin and a few service agent roles have these permissions – so it’s fair to say that changing an instance’s metadata is a privileged operation.
  3. Reading metadata only requires the compute.instances.get permission. Many roles contain this permission, including the lowly Compute Viewer role.
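
To illustrate point (1), the following Python sketch reads a flag from the metadata server from inside a VM, using only the standard library and no credentials. It is not the guest agent's actual code: the helper names are hypothetical, and the fallback from instance-level to project-level metadata is an assumption about how a consumer might resolve the flag.

```python
import urllib.error
import urllib.request
from typing import Optional

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def build_request(scope: str, key: str) -> urllib.request.Request:
    """Build a metadata server request; scope is 'instance' or 'project'."""
    url = f"{METADATA_ROOT}/{scope}/attributes/{key}"
    # The metadata server requires this header but no access token.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def read_attribute(scope: str, key: str, timeout: float = 2.0) -> Optional[str]:
    """Return the attribute value, or None if the key is not set (HTTP 404)."""
    try:
        with urllib.request.urlopen(build_request(scope, key), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def read_flag(key: str) -> Optional[str]:
    """Assumed lookup order: instance metadata first, then project metadata."""
    value = read_attribute("instance", key)
    if value is None:
        value = read_attribute("project", key)
    return value
```

Called on a VM, read_flag("enable-oslogin") returns the raw string value of the flag, or None if the key is set at neither level. Outside of a VM, the host name does not resolve and the call fails.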

In contrast, simply adding an attribute to the Compute Engine instance API would fail the first requirement: without a service account, the agent would not be able to query the API. The attribute would therefore also have to be surfaced by the metadata server.

OS Login is no exception – if you look at other flags such as block-project-ssh-keys, disable-account-manager or enable-os-inventory, you will notice that they have very similar requirements.

However, for some feature flags things are less clear-cut: enable-wsfc, google-compute-engine-auto-updater and VmDnsSetting, for example, all require (1) and (2), but the flags are irrelevant to clients, so (3) does not apply to them.

Any opinions expressed on this blog are Johannes' own. Refer to the respective vendor’s product documentation for authoritative information.