Enforcing the use of OS Login for accessing Linux VMs on Google Cloud

Google Cloud lets us enable OS Login for a project by adding an entry to the project’s metadata:

gcloud compute project-info add-metadata \
  --metadata enable-oslogin=TRUE

Project metadata “inherits down” to all VM instances, so the command above enables OS Login for all VMs in that project. But is that sufficient to enforce the use of OS Login? Or does it leave loopholes that let users keep logging in with metadata-based SSH keys?

To find out, we need to take a look at how Compute Engine handles metadata.

Managing metadata

Compute Engine lets us specify two sets of metadata:

  • Project metadata, also called common instance metadata, which applies to all instances in a project.
  • Instance metadata, which is specific to a VM instance.

It’s tempting to assume that Compute Engine somehow merges these two sets automatically, but that’s not actually what’s happening. Instead, both sets are kept separate:

  • The metadata server provides two separate endpoints, one for each set of metadata: /computeMetadata/v1/project/attributes/ serves project metadata while /computeMetadata/v1/instance/attributes/ serves instance metadata.
  • The Compute Engine API lets us query project metadata by calling projects.get and includes instance metadata in the response of instances.get.
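To see that the two sets really are separate, we can read the same key from both endpoints on a VM. The following is a minimal sketch, not an official tool: the helper names metadata_url and fetch_metadata are made up for illustration, while the metadata.google.internal host name and the Metadata-Flavor: Google header are what the metadata server expects.

```shell
# Hypothetical helpers for illustration; only the URL layout and the
# Metadata-Flavor header come from the metadata server itself.

# Build the URL of a metadata key. $1 is the scope ("project" or
# "instance"), $2 is the key, for example enable-oslogin.
metadata_url() {
  printf 'http://metadata.google.internal/computeMetadata/v1/%s/attributes/%s\n' "$1" "$2"
}

# Fetch a key from one scope. This only works on a Compute Engine VM.
# curl -f makes the call fail (non-zero exit) if the key isn't set
# in that scope.
fetch_metadata() {
  curl -sf -H 'Metadata-Flavor: Google' "$(metadata_url "$1" "$2")"
}
```

On a VM, running fetch_metadata project enable-oslogin and fetch_metadata instance enable-oslogin would query the two sets independently. Neither call consults the other set.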

It’s up to the application to either ignore one set of metadata, or to merge the two. When merging project and instance metadata, the convention is that instance metadata takes priority. So any metadata specified on the project level can be “overridden” on the instance level.

Crucially, this overriding behavior also applies to flags: if we add the entry enable-oslogin=TRUE to project metadata, we can “opt out” individual VMs by adding enable-oslogin=FALSE to their instance metadata.
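The precedence convention can be sketched as a small shell function. This illustrates the rule only and is not the code that the guest environment actually runs: the function name effective_value is hypothetical, and an empty string stands for “key not set”.

```shell
# Hypothetical helper: resolve a metadata key the way a merging
# application would. $1 is the instance-level value, $2 the
# project-level value; an empty string means the key isn't set.
effective_value() {
  if [ -n "$1" ]; then
    # Instance metadata takes priority when present.
    printf '%s\n' "$1"
  else
    # Otherwise fall back to project metadata.
    printf '%s\n' "$2"
  fi
}

effective_value ""    TRUE   # prints TRUE: the project setting applies
effective_value FALSE TRUE   # prints FALSE: the instance opts out
```

The second call is the interesting one: the project says TRUE, yet the effective value for that VM is FALSE.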

Changing instance metadata is considered a privileged operation by Compute Engine. That’s not only because metadata controls a range of features and is used for resetting Windows passwords, but also because metadata can contain startup scripts. Startup scripts run on the VM and can use the VM’s attached service account, so changing a startup script effectively gives us the privilege to impersonate that service account.

Because modifying instance metadata is such a powerful operation, we need the Compute Instance Admin role and the Service Account User role to do it.

Still, it’s not uncommon for some users to have these roles on at least some VM instances. And there’s nothing that stops these users from adding enable-oslogin=FALSE to the instance metadata, effectively undermining OS Login and creating a persistence threat.

Using org policy constraints

A better way to manage OS Login is to use an organization policy constraint:

gcloud resource-manager org-policies enable-enforce \
  compute.requireOsLogin --project PROJECT_ID

This constraint automatically adds an entry enable-oslogin=TRUE to the project metadata:

[Screenshot: Metadata]

This entry looks exactly like a manually added metadata entry. But there’s a key difference – this entry is now read-only! Even as a project owner, trying to change the entry now fails:

$ gcloud compute project-info add-metadata \
  --metadata enable-oslogin=FALSE

ERROR: (gcloud.compute.project-info.add-metadata) Could not fetch resource:
 - Constraint constraints/compute.requireOsLogin violated for project 98….

Similarly, trying to override the entry for a specific instance fails, despite having all the necessary roles:

$ gcloud compute instances add-metadata instance-1 \
  --metadata enable-oslogin=FALSE

ERROR: (gcloud.compute.instances.add-metadata) Could not fetch resource:
 - Constraint constraints/compute.requireOsLogin violated for project 9899….

Clearly, that’s much better and safer than relying on metadata alone. If our goal is to enforce the use of OS Login, then using the compute.requireOsLogin organization policy constraint is the way to go.

Any opinions expressed on this blog are Johannes' own. Refer to the respective vendor’s product documentation for authoritative information.