Automatically joining a VM to Active Directory on Google Cloud
Cloud computing is all about being able to dynamically scale, provision, and decommission resources or entire environments on demand. But the idea that infrastructure is dynamic clashes with some assumptions Active Directory is built around, and creates a challenge if you run Windows workloads in the cloud.
When Active Directory was designed and developed 20 years ago, virtual machines, virtual storage, and virtualized networks were not a thing – infrastructure was physical, and therefore static. In a static environment, joining new computers to the domain and removing old computers is an infrequent activity.
Today, automating the process of joining new VMs to a domain is key if you want to take full advantage of features like auto-scaling but also want to keep Active Directory for managing access and configuration of your Windows machines.
In the last post, we looked at how VM initialization works on Compute Engine and how Compute Engine lets you customize the initialization process by using specialize scripts.
At first glance, it seems like it should be fairly straightforward to use a specialize script to automate the domain joining process: Initiating a domain join is a one-liner in PowerShell (Add-Computer -Domain example.com). Because executing the specialize script is followed by a reboot anyway, it seems like we would not even have to worry about the fact that domain joining requires a reboot.
Unfortunately, there are some challenges associated with this approach.
Challenge #1: Securing domain credentials
The first challenge is that Add-Computer requires credentials of a domain user. This user needs to have permission to create computer objects in the respective OU, and also needs to hold the "Add workstations to domain" user right. The issue with these permissions is that they give the user the ability to join any computer to the domain: If the credentials are leaked, they could be used to join a rogue computer to the domain. A rogue computer could then be used for launching various other attacks to steal credentials and ultimately escalate privileges.
So where do you store the credentials of the domain user?
- You could simply put the credentials into the specialize script. But that means any user with read access to the GCP project and (because the specialize script is part of the VM metadata) any Windows user on the VM could see them.
- You could put the credentials into Secret Manager and grant the VM’s service account read access to the secret. But that means any code running on the VM can access the secret as well, which in turn means any Windows user could steal it.
- You could bake a custom image that contains the credentials. If you encrypt the credentials by using DPAPI so that only SYSTEM can read them, this approach makes the credentials inaccessible to any other Windows users on the VM. Alas, any GCP user with read access to the VM image can still access and decrypt the secret.
Clearly, none of these options are very good.
Challenge #2: Scavenging stale computer accounts
Each VM instance that is automatically joined to Active Directory also needs to be removed from Active Directory when it is terminated/deleted – otherwise, your domain will soon be littered with stale computer objects.
While specialize scripts provide a convenient way to hook into the initialization process of a VM instance, there is no such hook for VM termination. That means the only way to keep the domain clean is to periodically scan for stale computer objects.
But how can we tell if a computer account is actually stale?
- You could check the last logon date, but that is not as easy or reliable as it might seem.
- You could compare the list of computer objects in Active Directory with the list of VMs in Compute Engine. But the computer object does not tell you which GCP project to find the VM instance in, and to make things more complicated, the name of the computer object might differ from the name of the corresponding VM instance in Compute Engine.
Introducing a trusted third party
One way to overcome the first challenge is to introduce a trusted third party as a mediator:
- Rather than trying to initiate a domain join by itself, a new VM instance requests the mediator to initiate a join. To authenticate to the mediator, the VM uses its machine identity instead of domain credentials.
- The mediator authenticates the request, checks whether the VM is eligible to join the domain, creates a computer account in the Active Directory domain, and passes the computer credentials back to the VM.
- The VM uses the computer credentials to complete the domain join.
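The three steps above can be sketched from the mediator's side as follows. This is a minimal illustration, not the code from the linked article: VmIdentity, JoinDenied, handle_join_request, the allowed_projects rule, and the create_computer_account callback are all hypothetical names, and both verifying the VM's identity and talking to Active Directory are left to the caller.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class VmIdentity:
    """Claims taken from a machine identity that has already been verified."""
    project_id: str
    zone: str
    instance_name: str


class JoinDenied(Exception):
    """Raised when a VM is not allowed to join the domain."""


def handle_join_request(
    identity: Optional[VmIdentity],
    allowed_projects: Set[str],
    create_computer_account: Callable[[VmIdentity, str], None],
) -> str:
    """Process one join request and return the computer account password."""
    # Authenticate: the caller verifies the VM's identity beforehand and
    # passes None if that verification failed.
    if identity is None:
        raise JoinDenied("request is not authenticated")

    # Authorize: a simple project allow-list stands in for whatever
    # eligibility rule you need (assumption for illustration).
    if identity.project_id not in allowed_projects:
        raise JoinDenied(f"project {identity.project_id} may not join the domain")

    # Create the computer account with a random password. Only this callback
    # needs domain credentials; the VM never sees them.
    password = secrets.token_urlsafe(32)
    create_computer_account(identity, password)

    # Hand the computer credentials back so the VM can complete the join.
    return password
```

Passing the Active Directory operation in as a callback keeps the decision logic testable without a domain controller, and makes it obvious that the domain credentials live in exactly one place.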
In this process, only the mediator needs domain credentials, which makes securing the credentials fairly trivial.
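What the VM sends instead of domain credentials depends on how you implement machine identity. One option on Compute Engine, assumed here rather than taken from the text above, is an instance identity token that the VM requests from the metadata server. The sketch below only builds and sends that request; the audience value and the function names are placeholders.

```python
import urllib.parse
import urllib.request

# Metadata server endpoint that issues identity tokens for the VM's
# default service account.
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def build_identity_token_request(audience: str) -> urllib.request.Request:
    """Build the metadata server request for an identity token."""
    # format=full asks for instance details (such as project and instance
    # name) to be included in the token's claims.
    query = urllib.parse.urlencode({"audience": audience, "format": "full"})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        # The metadata server rejects requests that lack this header.
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_identity_token(audience: str) -> str:
    """Fetch the token. This only works when run on a Compute Engine VM."""
    request = build_identity_token_request(audience)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")
```

The token is a signed JWT, so the mediator can verify it without sharing a secret with the VM. The audience should identify the mediator (for example, its URL) so that a token issued for a different service cannot be replayed against it.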
Introducing a mediator has another advantage that lets you address the second challenge as well: Having the mediator create the computer accounts in Active Directory gives you the opportunity to add additional LDAP attributes to the computer object on creation. If you add additional attributes to track the VM instance name, project, and zone, then you can use that information to reliably track stale computer objects later.
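With those attributes in place, finding stale computer objects reduces to a set lookup. The sketch below assumes that you have already read the computer objects from Active Directory and listed the existing VM instances through the Compute Engine API; the ComputerObject type and its field names are hypothetical and stand in for whatever LDAP attributes you choose.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# A VM instance is uniquely identified by project, zone, and instance name.
VmKey = Tuple[str, str, str]


@dataclass(frozen=True)
class ComputerObject:
    """A computer object plus the tracking attributes set by the mediator."""
    distinguished_name: str
    project_id: str
    zone: str
    instance_name: str


def find_stale_computers(
    computers: Iterable[ComputerObject],
    existing_vms: Set[VmKey],
) -> List[ComputerObject]:
    """Return the computer objects whose VM instance no longer exists."""
    return [
        computer
        for computer in computers
        if (computer.project_id, computer.zone, computer.instance_name)
        not in existing_vms
    ]


# Example: web-2 has been deleted, so only its computer object is stale.
computers = [
    ComputerObject("CN=web-1,OU=Cloud,DC=example,DC=com", "project-1", "us-central1-a", "web-1"),
    ComputerObject("CN=web-2,OU=Cloud,DC=example,DC=com", "project-1", "us-central1-a", "web-2"),
]
existing_vms = {("project-1", "us-central1-a", "web-1")}
stale = find_stale_computers(computers, existing_vms)
```

Because the lookup uses the tracked project, zone, and instance name rather than the computer name, it still works when the computer object and the VM instance are named differently.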
To see how you can implement this approach by using a custom Cloud Function as trusted third party, check out my new article Configuring Active Directory for VMs to automatically join a domain on the Google Cloud website. You can find the associated code on GitHub.