Using IAM to control SSH and RDP access

In an ideal world, all our VM instances are immutable, their management is fully automated, and nobody ever has to access a VM by using SSH or RDP. Reality, however, typically looks a little different, and SSH or RDP access is difficult to avoid.

When we give users SSH or RDP access to VM instances, there is more than immutability we have to worry about – we also have to ensure that access governance concerns are addressed:

How do we grant users access to VMs?
How do we ensure access is checked every time the user logs in?
How do we ensure access is revoked when the employee leaves or changes teams?
How do we avoid users delegating their own access to other users in an uncontrolled manner?

SSH and RDP are services provided by the guest operating system and not by the underlying platform. It’s therefore the operating system’s job to manage access and enforce access checks – which creates yet another challenge: how do you keep the configuration consistent across VMs?

Centralizing access management

In practice, there are two options to handle access management for VM instances. The first option is to deploy a custom or third-party solution to centralize access management. For example:

You can deploy a Linux pluggable authentication module (PAM) that uses a central LDAP server for authentication and authorization.
You can join VMs to an Active Directory domain and use Active Directory groups and group policies to control who is allowed to log in to which VM.

The second option is to defer access management to the underlying platform, which, in the case of Google Cloud, is Cloud IAM:

For Linux you can choose between OS Login and managing access by using metadata-based keys.

If you use metadata-based keys, you restrict SSH access by controlling who is allowed to publish their SSH public keys to VM metadata. Once published, the SSH key stays associated with the VM. There is no connection between the key and the user who published it.

If you use OS Login, a user’s SSH key is attached to their user profile, and you use IAM roles to define which VMs the user should be allowed to log in to.
For Windows, you can indirectly manage access by controlling who is allowed to request Windows credentials. Because the mechanism to request credentials is based on metadata, the mechanism is quite similar to using metadata-based keys for SSH.
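
To make the point about metadata-based keys concrete, here is a minimal sketch that parses the value of an `ssh-keys` metadata entry. It assumes the common one-entry-per-line form `USERNAME:KEY_TYPE KEY_BLOB [COMMENT]`; the function name and the sample value are made up for illustration. Notice that an entry carries a Linux username and a public key, but nothing that identifies the Google user who published it.

```python
def parse_ssh_keys_metadata(value: str) -> list[dict[str, str]]:
    """Split an `ssh-keys` metadata value into its per-line entries."""
    entries = []
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed line format: "<linux-username>:<key-type> <key-blob> [comment]"
        username, _, key = line.partition(":")
        parts = key.split(None, 2)
        if len(parts) < 2:
            continue  # skip lines that do not look like a key entry
        entries.append({
            "username": username,
            "key_type": parts[0],
            "key_blob": parts[1],
            "comment": parts[2] if len(parts) > 2 else "",
        })
    return entries


# Hypothetical metadata value; the key blobs are shortened placeholders.
SAMPLE = (
    "alice:ssh-ed25519 AAAAC3placeholder1 alice@laptop\n"
    "bob:ssh-rsa AAAAB3placeholder2"
)
```

Whoever holds the matching private key can log in as `alice`, regardless of which Google identity originally wrote the entry.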

Regardless of whether you use a custom access management solution or Cloud IAM, you can implement an additional layer of defense and access control by configuring VM instances to only be accessible over IAP TCP forwarding.
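
Making VMs accessible only over IAP TCP forwarding comes down to firewall rules: ingress to the SSH and RDP ports should be allowed only from the range that IAP uses for TCP forwarding, `35.235.240.0/20`. The following sketch checks whether a list of source ranges stays within that range. The function name is hypothetical, and the check looks at source ranges only, not at ports, priorities, or overlapping rules.

```python
import ipaddress

# Source range that IAP TCP forwarding connections originate from.
IAP_TCP_FORWARDING_RANGE = ipaddress.ip_network("35.235.240.0/20")


def permits_only_iap(source_ranges: list[str]) -> bool:
    """Return True if every source range lies inside the IAP forwarding range."""
    if not source_ranges:
        # No ranges given means no evidence of a restriction to IAP.
        return False
    networks = [ipaddress.ip_network(r) for r in source_ranges]
    return all(
        n.version == 4 and n.subnet_of(IAP_TCP_FORWARDING_RANGE)
        for n in networks
    )
```

A rule set that also allows `0.0.0.0/0` or an internal range such as `10.0.0.0/8` fails the check, which corresponds to the "direct access permitted" case in the comparison further down.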

It’s great to have so many choices, but how do they stack up in addressing the four governance concerns outlined in the beginning?

Comparison

To see how the different options address the four governance concerns, let’s compare:

Which IAM permissions are required for an initial sign-in?
Which IAM permissions are required for subsequent sign-ins?
Is external access revoked when the corresponding Google user is suspended or deleted?
Is VPC-internal access revoked when the corresponding Google user is suspended or deleted?
Are users prevented from delegating their access to others?

First, let’s see how things look without IAP TCP forwarding:

	IAM permissions required for initial sign-in	IAM permissions required for subsequent sign-in	External access revoked?	Internal access revoked?	Prevents delegation?
Linux OS Login	`compute.instances.os[Admin]Login` on VM and `compute.oslogin.updateExternalUser` on VM¹ and `iam.serviceAccounts.actAs` on SA²	`compute.instances.os[Admin]Login` on VM and `compute.oslogin.updateExternalUser` on VM¹ and `iam.serviceAccounts.actAs` on SA²	✅	✅	✅
Linux w/ metadata-based keys	`compute.instances.setMetadata` on VM or `compute.projects.setCommonInstanceMetadata` on project and `iam.serviceAccounts.actAs` on SA²	none	❌	❌	❌
Linux w/ custom PAM	none	none	?⁵	?⁵	?⁵
Windows	`compute.instances.setMetadata` on VM and `iam.serviceAccounts.actAs` on SA²	none	❌	❌	❌
Windows w/ AD	none	none	?⁵	?⁵	?⁵

There aren’t too many green check marks in this table, and that’s worrying:

When you use a custom access management solution, then no IAM permission checks are performed at all. That might be ok if the access management solution is properly configured and integrates with your organization’s joiner-mover-leaver processes.
When you rely on metadata-based keys to control SSH access, you really only control initial access. Once a user’s key has been published, the user is not subject to any further checks and can even freely delegate their own access to others. And that’s clearly a risk.
Similarly, if you allow users to request Windows credentials, you also only control initial access. Once a user has generated credentials, they are not subject to any further checks and can create additional local user accounts for others. Again, that’s a risk.

Now let’s see how things look if we only permit SSH and RDP access over IAP TCP forwarding:

	IAM permissions required for initial sign-in	IAM permissions required for subsequent sign-in	External access revoked?	Internal access revoked?	Prevents delegation?
Linux OS Login	`iap.tunnelInstances.accessViaIAP` on VM/zone/project and `compute.instances.os[Admin]Login` on VM and `compute.oslogin.updateExternalUser` on VM¹ and `iam.serviceAccounts.actAs` on SA²	`iap.tunnelInstances.accessViaIAP` on VM/zone/project and `compute.instances.os[Admin]Login` on VM and `compute.oslogin.updateExternalUser` on VM¹ and `iam.serviceAccounts.actAs` on SA²	✅	✅	✅
Linux w/ metadata-based keys	`iap.tunnelInstances.accessViaIAP` on VM/zone/project and `compute.instances.setMetadata` on VM or `compute.projects.setCommonInstanceMetadata` on project and `iam.serviceAccounts.actAs` on SA²	`iap.tunnelInstances.accessViaIAP` on VM/zone/project	✅	✅³ ❌⁴	✅
Linux w/ custom PAM	`iap.tunnelInstances.accessViaIAP` on VM/zone/project	`iap.tunnelInstances.accessViaIAP` on VM/zone/project	✅	✅³ ?⁴	✅
Windows	`iap.tunnelInstances.accessViaIAP` on VM/zone/project and `compute.instances.setMetadata` on VM and `iam.serviceAccounts.actAs` on SA²	`iap.tunnelInstances.accessViaIAP` on VM/zone/project	✅	✅³ ❌⁴	✅
Windows w/ AD	`iap.tunnelInstances.accessViaIAP` on VM/zone/project	`iap.tunnelInstances.accessViaIAP` on VM/zone/project	✅	✅³ ?⁴	✅

These results look better: by requiring all RDP and SSH access to go through IAP TCP forwarding, we can ensure that at least some IAM permission checks are performed on each access.
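
The difference between the two tables can be captured in a small model of the "subsequent sign-in" column. This is a simplified sketch, not how IAM evaluates access: the method labels and the function name are made up, and the conditional permissions from footnotes 1 and 2 are left out. It only answers whether IAM would stand in the way of a sign-in given the set of permissions a user currently holds.

```python
IAP_PERMISSION = "iap.tunnelInstances.accessViaIAP"
OS_LOGIN_PERMISSIONS = {
    "compute.instances.osLogin",
    "compute.instances.osAdminLogin",
}

# Hypothetical labels for the access methods whose subsequent sign-ins
# require no IAM permission of their own.
NO_IAM_CHECK_METHODS = {
    "metadata-keys",
    "custom-pam",
    "windows-credentials",
    "windows-ad",
}


def iam_allows_subsequent_sign_in(
    method: str, granted: set[str], via_iap_only: bool
) -> bool:
    """Return True if no IAM check blocks a subsequent sign-in."""
    if via_iap_only and IAP_PERMISSION not in granted:
        return False  # the tunnel itself is refused
    if method == "os-login":
        return bool(OS_LOGIN_PERMISSIONS & granted)
    if method in NO_IAM_CHECK_METHODS:
        return True  # the guest OS decides; IAM is not consulted again
    raise ValueError(f"unknown access method: {method}")
```

For a user whose permissions have all been revoked (an empty `granted` set), the model lets a metadata-based key through when direct access is possible and refuses it once access is only possible via IAP.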

Takeaway

When we grant users SSH or RDP access to VM instances, we have to ensure that users can’t easily delegate their own access to others, and that access is revoked when the user changes teams or leaves the organization. This task gets easier if we allow Cloud IAM to help us:

Use OS Login when you can.
Use IAP TCP Forwarding, particularly in cases where you cannot use OS Login.
When you use a custom access management solution like Active Directory or a custom Linux PAM, make sure the user account lifecycle is kept in sync with that of Google users.
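
Keeping lifecycles in sync can be as simple as a periodic reconciliation. The sketch below assumes you can export two things: a mapping from directory or local account names to Google user emails, and the set of Google users that are currently active. Both inputs and the function name are hypothetical; how you obtain them depends on your directory and your tooling.

```python
def accounts_to_disable(
    directory_accounts: dict[str, str], active_google_users: set[str]
) -> list[str]:
    """Return directory accounts whose Google user is no longer active.

    directory_accounts maps an account name (AD or Linux) to the email
    of the Google user it belongs to.
    """
    active = {email.lower() for email in active_google_users}
    return sorted(
        name
        for name, email in directory_accounts.items()
        if email.lower() not in active
    )
```

Any account the function returns belongs to a suspended or deleted Google user and is a candidate for being disabled in the custom solution as well.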

Footnotes:

¹ Only required if the user belongs to a different Cloud Identity/Workspace account
² Only required if the VM has an attached service account
³ If firewall rules are set up to only permit access via IAP
⁴ If firewall rules permit direct access
⁵ Depends on the third-party solution

Thanks to Marco Ferrari for reviewing this blog post.

Any opinions expressed on this blog are Johannes' own. Refer to the respective vendor’s product documentation for authoritative information.
