Enforcing the use of OS Login for accessing Linux VMs on Google Cloud
Google Cloud lets us enable OS Login for a project by adding an entry to the project’s metadata:
gcloud compute project-info add-metadata \
--metadata enable-oslogin=TRUE
Project metadata “inherits down” to all VM instances, so the command above enables OS Login for all VMs in that project. But is that sufficient to enforce the use of OS Login? Or does it leave loopholes where users can still use metadata-based SSH keys to log in?
To find out, we need to take a look at how Compute Engine handles metadata.
Managing metadata
Compute Engine lets us specify two sets of metadata:
- Project metadata, also called common instance metadata, which applies to all instances in a project.
- Instance metadata, which is specific to a VM instance.
It’s tempting to assume that Compute Engine somehow merges these two sets automatically, but that’s not actually what’s happening. Instead, both sets are kept separate:
- The metadata server provides two separate endpoints, one for each set of metadata: /computeMetadata/v1/project/attributes/ serves project metadata while /computeMetadata/v1/instance/attributes/ serves instance metadata.
- The Compute Engine API lets us query project metadata by calling projects.get and includes instance metadata in the response of instances.get.
It’s up to the application to either ignore one set of metadata, or to merge the two. When merging project and instance metadata, the convention is that instance metadata takes priority. So any metadata specified on the project level can be “overridden” on the instance level.
Crucially, this overriding behavior also applies to flags: if we add the entry enable-oslogin=TRUE to project metadata, we can “opt out” individual VMs by adding enable-oslogin=FALSE to their instance metadata.
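The precedence convention can be sketched as a small shell function. This is only an illustration of the "instance metadata takes priority" rule, not the logic of any particular guest agent; the function name effective_metadata is hypothetical, and an empty string is assumed to mean "key not set".

```shell
# Hypothetical helper: print the effective value of a metadata key.
#   $1 = value from project metadata  (empty if the key is not set)
#   $2 = value from instance metadata (empty if the key is not set)
# Convention: a value set on the instance overrides the project value.
effective_metadata() {
  if [ -n "$2" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$1"
  fi
}

# Project enables OS Login, instance does not set the key.
effective_metadata "TRUE" ""        # prints TRUE

# Project enables OS Login, instance opts out.
effective_metadata "TRUE" "FALSE"   # prints FALSE
```

On a VM, the two inputs could be read from the two metadata server endpoints listed above, for example with curl and the Metadata-Flavor: Google request header.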
Changing instance metadata is considered a privileged operation by Compute Engine. That’s not only because metadata is used to control a range of features and is used for resetting Windows passwords, but also because metadata can contain startup scripts. Startup scripts run on the VM, and can use the VM’s attached service account, so changing a startup script effectively gives us the privilege to impersonate that service account.
Because modifying instance metadata is such a powerful operation, we need the Compute Instance Admin role and the Service Account User role to do it.
Still, it’s not uncommon for at least some users to have these roles on some VM instances.
And nothing stops these users from adding enable-oslogin=FALSE to the instance metadata, effectively undermining OS Login and creating a persistence threat.
Using org policy constraints
A better way to manage OS Login is to use an organizational policy constraint:
gcloud resource-manager org-policies enable-enforce \
compute.requireOsLogin --project PROJECT_ID
This constraint automatically adds an entry enable-oslogin=TRUE to the project metadata.
This entry looks exactly like a manually-added metadata entry. But there’s a key difference – this entry is now read-only! Even as a project owner, trying to change the entry now fails:
$ gcloud compute project-info add-metadata \
--metadata enable-oslogin=FALSE
ERROR: (gcloud.compute.project-info.add-metadata) Could not fetch resource:
- Constraint constraints/compute.requireOsLogin violated for project 98….
Similarly, trying to override the entry for a specific instance fails, despite having all the necessary roles:
$ gcloud compute instances add-metadata instance-1 \
--metadata enable-oslogin=FALSE
ERROR: (gcloud.compute.instances.add-metadata) Could not fetch resource:
- Constraint constraints/compute.requireOsLogin violated for project 9899….
Clearly, that’s much better and safer than relying on metadata alone. If our goal is to enforce the use of OS Login, then the compute.requireOsLogin organizational policy constraint is the way to go.