Compute Engine feature flags controlled by metadata

When you create a VM instance on Google Cloud, you can optionally specify instance metadata. Instance metadata is a list of key/value pairs, and its most common use is passing a startup or shutdown script to a VM.

But startup and shutdown scripts are not the only platform features that rely on metadata. Compute Engine also uses metadata as a vehicle for a range of feature flags, as the following table shows:

Metadata key	Implemented by	Windows	Linux
enable-os-inventory	OS agent	✔	✔
enable-oslogin	OS agent		✔
enable-oslogin-2fa	OS agent		✔
block-project-ssh-keys	OS agent		✔
ssh-keys	OS agent		✔
windows-keys	OS agent	✔
disable-account-manager	OS agent	✔
enable-diagnostics	OS agent	✔
disable-address-manager	OS agent	✔
enable-wsfc	OS agent	✔
wsfc-addrs	OS agent	✔
wsfc-agent-port	OS agent	✔
disable-agent-updates	GooGet	✔	✔
google-logging-enable	Ops Logging agent	✔	✔
google-monitoring-enable	Ops Monitoring agent	✔	✔
serial-port-enable	GCE	✔	✔
enable-guest-attributes	GCE	✔	✔
VmDnsSetting	GCE	✔	✔
sysprep-specialize-script-url	OS agent	✔
sysprep-specialize-script-cmd	OS agent	✔
sysprep-specialize-script-bat	OS agent	✔
sysprep-specialize-script-ps1	OS agent	✔
windows-startup-script-url	OS agent	✔
windows-startup-script-cmd	OS agent	✔
windows-startup-script-bat	OS agent	✔
windows-startup-script-ps1	OS agent	✔
startup-script	OS agent		✔
startup-script-url	OS agent		✔
shutdown-script	OS agent		✔
shutdown-script-url	OS agent		✔
windows-shutdown-script-cmd	OS agent	✔
windows-shutdown-script-url	OS agent	✔

(Note: These are the flags I was aware of at the time of writing; the list is not exhaustive and is subject to change.)

If you look at this list, you might be wondering why so many platform features are controlled by metadata keys. Isn’t metadata meant to be used for user-defined configuration? Why aren’t there dedicated API attributes to control all these features?

To get an idea of why these feature flags might have been implemented on top of metadata, let us look at the requirements for storing feature flags. As an example, consider the enable-oslogin flag, which controls whether OS Login is enabled:

1. The feature flag must be visible to the Compute Engine agent. The agent implements the bulk of the OS Login functionality, so it must know whether to engage or disengage this functionality. To make things a little more complicated, the agent must be able to read the value of the flag even if the VM does not have a service account attached.
2. Only privileged users must be able to set the flag, as it is a security-sensitive setting.
3. SSH clients and tools such as gcloud must be able to read the flag so that they can adjust their behavior: if OS Login is enabled, a user’s public key must be published to the OS Login API; if it’s disabled, public keys must be added to the ssh-keys metadata entry.
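
The third requirement can be illustrated with a minimal Python sketch of the decision a client has to make. It assumes the instance and project are passed in as dictionaries shaped like Compute Engine API resources (metadata under `metadata.items` and `commonInstanceMetadata.items`), that an instance-level value takes precedence over a project-level one, and that only the string "true" (in any case) counts as enabled. The function names and the returned labels are hypothetical, and this is not how gcloud is actually implemented.

```python
from typing import Any, Mapping, Optional


def metadata_value(holder: Optional[Mapping[str, Any]], key: str) -> Optional[str]:
    """Look up `key` in a mapping shaped like {'items': [{'key': ..., 'value': ...}]}."""
    for item in (holder or {}).get("items", []):
        if item.get("key") == key:
            return item.get("value")
    return None


def os_login_enabled(instance: Mapping[str, Any], project: Mapping[str, Any]) -> bool:
    """Resolve enable-oslogin, letting the instance value win over the project value."""
    value = metadata_value(instance.get("metadata"), "enable-oslogin")
    if value is None:
        value = metadata_value(project.get("commonInstanceMetadata"), "enable-oslogin")
    # Assumption: only "true" (case-insensitive) enables the feature.
    return value is not None and value.strip().lower() == "true"


def key_destination(instance: Mapping[str, Any], project: Mapping[str, Any]) -> str:
    """Return a (hypothetical) label for where the client should put a public key."""
    if os_login_enabled(instance, project):
        return "oslogin-api"
    return "ssh-keys-metadata"
```

A client would call `key_destination` with the resources it has already fetched and then either publish the key to the OS Login API or append it to the ssh-keys metadata entry.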

As it turns out, these requirements are perfectly met by metadata:

1. A VM instance can access its metadata by querying the metadata server. No authentication or authorization is required, so the absence of an associated service account does not matter.
2. Changing an instance’s metadata requires the compute.instances.setMetadata permission. Similarly, changing a project’s common instance metadata requires the compute.projects.setCommonInstanceMetadata permission. Only Compute Admin, Compute Instance Admin and a few service agent roles have these permissions, so it’s fair to say that changing an instance’s metadata is a privileged operation.
3. Reading metadata only requires the compute.instances.get permission. Many roles contain this permission, including the lowly Compute Viewer role.
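
To make the first point concrete, here is a rough Python sketch of how a process running on the VM could read a flag from the metadata server. The only thing it sends besides the URL is the `Metadata-Flavor: Google` header; no credentials are involved. The helper names `build_request` and `read_attribute` are made up for this example, and the real guest agent is not written this way.

```python
import urllib.error
import urllib.request
from typing import Optional

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def build_request(scope: str, key: str) -> urllib.request.Request:
    """Build a metadata server request for an instance- or project-level attribute."""
    if scope not in ("instance", "project"):
        raise ValueError("scope must be 'instance' or 'project'")
    url = f"{METADATA_ROOT}/{scope}/attributes/{key}"
    # The metadata server expects this header; it is not a credential.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def read_attribute(scope: str, key: str, timeout: float = 2.0) -> Optional[str]:
    """Return the attribute value, or None if the key is not set (HTTP 404)."""
    try:
        with urllib.request.urlopen(build_request(scope, key), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

On a VM, `read_attribute("instance", "enable-oslogin")` would return the instance-level value if one is set; outside of Compute Engine the host name does not resolve and the call fails.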

In contrast, simply adding an attribute to the Compute Engine instance API would fail the first requirement: without a service account, the agent would not be able to query the API. So the attribute would additionally have to be surfaced by the metadata server.

OS Login is not a special case: if you look at other flags such as block-project-ssh-keys, disable-account-manager or enable-os-inventory, you will notice that they have very similar requirements.

There are some feature flags, however, for which things are less clear-cut. For example, enable-wsfc, google-compute-engine-auto-updater and VmDnsSetting all require (1) and (2), but the flags are irrelevant to clients, so (3) does not apply to them.

Any opinions expressed on this blog are Johannes' own. Refer to the respective vendor’s product documentation for authoritative information.
