Compute Engine feature flags controlled by metadata
When you create a VM instance on Google Cloud, you can optionally specify instance metadata. Instance metadata is a list of key/value pairs, and its most common use is passing a startup or shutdown script to a VM.
But startup and shutdown scripts are not the only platform features that rely on metadata. Compute Engine also uses metadata as a vehicle to implement a range of feature flags as the following list shows:
(Note: These are the flags I was aware of at the time of writing; the list is not meant to be exhaustive and is subject to change)
If you look at this list, you might be wondering why so many platform features are controlled by metadata keys. Isn't metadata meant to be used for user-defined configuration? Why aren't there dedicated API attributes to control all these features?
To get an idea why these feature flags might have been implemented based on metadata, let us see what the requirements for storing feature flags are. As an example, let us consider the enable-oslogin flag, which controls whether OS Login should be enabled or not:
- The feature flag must be visible to the Compute Engine agent. The agent implements the bulk of the OS Login functionality, so it must know whether to engage or disengage this functionality. To make things a little more complicated, the agent must be able to read the value of the flag even if the VM does not have a service account attached.
- Only privileged users must be able to set the flag as it is a security-sensitive setting.
- SSH clients and tools such as gcloud must be able to read the flag so that they can adjust their behavior: if OS Login is enabled, a user’s public key must be published to the OS Login API; if it’s disabled, public keys must be added to the ssh-keys metadata entry.
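The third requirement can be sketched as a small decision function. This is only an illustration of the behavior described above, not how gcloud or any real SSH client is implemented: the function names and return labels are hypothetical, the metadata is assumed to be already available as plain dictionaries, and the sketch assumes that a value set on the instance takes precedence over one set on the project.

```python
def oslogin_enabled(instance_metadata, project_metadata):
    """Return True if the enable-oslogin flag is set to a true value.

    Assumption: an instance-level value overrides the project-level one.
    """
    for scope in (instance_metadata, project_metadata):
        value = scope.get("enable-oslogin")
        if value is not None:
            return value.strip().lower() == "true"
    # Flag not set anywhere: treat OS Login as disabled.
    return False


def key_destination(instance_metadata, project_metadata):
    """Decide where a client would publish the user's public key.

    The returned labels are hypothetical names used only in this sketch.
    """
    if oslogin_enabled(instance_metadata, project_metadata):
        return "oslogin-api"
    return "ssh-keys-metadata"
```

A client that cannot read the flag has no way to choose between the two branches, which is why read access for unprivileged users matters here.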
As it turns out, these requirements are perfectly met by metadata:
- A VM instance can access its metadata by querying the metadata server. No authentication or authorization required, so the absence of an associated service account does not matter.
- Changing an instance’s metadata requires the compute.instances.setMetadata permission. Similarly, changing a project’s common instance metadata requires the compute.projects.setCommonInstanceMetadata permission. Only Compute Admin, Compute Instance Admin and a few service agent roles have these permissions, so it’s fair to say that changing an instance’s metadata is a privileged operation.
- Reading metadata only requires the compute.instances.get permission. Many roles contain this permission, including the lowly Compute Viewer role.
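To make the first point concrete, here is a minimal sketch of how code running inside a VM might read a flag from the metadata server. It uses the documented metadata endpoint and the Metadata-Flavor header; the helper names build_request and read_flag are made up for this example, and the timeout and the handling of a missing key are assumptions rather than a description of what the Compute Engine agent actually does.

```python
import urllib.error
import urllib.request

METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1"


def build_request(key, scope="instance"):
    """Build a request for a metadata attribute (scope: 'instance' or 'project')."""
    url = f"{METADATA_ROOT}/{scope}/attributes/{key}"
    # The metadata server requires this header; no credentials are sent.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def read_flag(key, scope="instance", timeout=2.0):
    """Return the flag value as a string, or None if the key is not set.

    Only works from inside a Compute Engine VM.
    """
    try:
        with urllib.request.urlopen(build_request(key, scope), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: a missing key is reported as 404
            return None
        raise
```

Calling read_flag("enable-oslogin") from inside a VM would return the flag's value without any access token being involved, so it makes no difference whether a service account is attached.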
In contrast, simply adding an attribute to the Compute Engine instance API would fail the first requirement: Without a service account, the agent would not be able to query the API. So the attribute would have to additionally be surfaced by the metadata server.
OS Login is no exception. If you look at other flags such as block-project-ssh-keys, disable-account-manager or enable-os-inventory, you will notice that they have very similar requirements.
There are some feature flags, however, for which things are less clear-cut. For example, enable-wsfc, google-compute-engine-auto-updater and VmDnsSetting all require (1) and (2), but the flags are irrelevant to clients, so (3) does not apply to them.