VMworld 2017: Virtualizing AD

Session: VIRT1374BU: Matt Liebowitz

AD Replication
-Update sequence number (USN) tracks updates; USNs are local to each DC, not globally unique
-InvocationID – Identifies the DC’s instance of the AD database
-USN + InvocationID = Replicable transaction

Why Virtualize AD?
-Fully supported by Microsoft
-AD is friendly towards virtualization (low I/O, low resource)
-Physical DCs waste resources

Common objections to virtualizing DCs
-Fear of stolen vmdk
-Privilege escalation – VC admins do not need to be domain admins and vice versa
-Must keep xx role physical – Myth; there is no technical or support reason
-Timekeeping is hard in VMs

Time Sync
-A VM guest’s time can be reset by vMotion and by resuming from suspend. If there’s an ESXi host with a bad time/date, it can cause weird “random” problems when DRS moves DCs around.
-There’s a set of ~8 advanced VMX settings to completely disable time sync between the guest and the ESXi host. Recommended for AD servers. See screenshot below.
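As a rough sketch of what those settings look like, the Python snippet below renders them as .vmx lines. The option names are my recollection of VMware KB 1189 ("Disabling Time Synchronization"), not something taken from the session, so treat them as an assumption and verify them against the KB for your vSphere version before touching a DC.

```python
from typing import Sequence, List

# ASSUMPTION: option names as recalled from VMware KB 1189
# ("Disabling Time Synchronization"); verify before use.
TIME_SYNC_OPTIONS = (
    "tools.syncTime",
    "time.synchronize.continue",
    "time.synchronize.restore",
    "time.synchronize.resume.disk",
    "time.synchronize.shrink",
    "time.synchronize.tools.startup",
    "time.synchronize.tools.enable",
    "time.synchronize.resume.host",
)


def vmx_lines(options: Sequence[str] = TIME_SYNC_OPTIONS) -> List[str]:
    """Render each option as a .vmx line set to "0" (disabled)."""
    return [f'{name} = "0"' for name in options]


if __name__ == "__main__":
    print("\n".join(vmx_lines()))
```

Each resulting line corresponds to one entry in the VM's advanced configuration parameters.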

Virtual machine security and Encryption
-vSphere supports VMDK encryption
-Virtualization-based security – WS2016 feature – support expected in a future vSphere version

Best Practices

Domain Controller Sizing
USN Rollback
-Happens when a DC is sent back in time (e.g., a snapshot rollback)
-DCs can get orphaned if this happens since replication is broken
-If this happens, it’s a support call to MS and a very long, long process to fix it

VM Generation ID
-A way for the hypervisor to expose a 128-bit generation ID to the VM guest
-Need vSphere 5.0 U2 or later
-Active Directory tracks this number and prevents USN rollback
-Can be used for safety and VM cloning

Domain Controller Cloning
-Microsoft has an established process to do this, using hypervisor snapshots.
-Do NOT hot clone your DCs! Totally unsupported and will cause a huge mess.

vSphere 5.5 Install Pt. 12: Configure SSO

Now that the SSO service and web client are installed, it’s time to do a little SSO configuration. In this installment we will configure the SSO STS certificate chain, add an Active Directory identity source, and delegate SSO administrative rights to an AD group.

If you recall the vCenter 5.1 installation order, you will realize they’ve now moved up the web client install. This was done consciously so you could troubleshoot/configure the SSO service prior to vCenter being installed. Great idea VMware!

Blog Series

SQL 2012 AlwaysOn Failover Cluster for vCenter
vSphere 5.5 Install Pt. 1: Introduction
vSphere 5.5 Install Pt. 2: SSO 5.5 Reborn 
vSphere 5.5 Install Pt. 3: vCenter Upgrade Best Practices and Tips
vSphere 5.5 Install Pt. 4: ESXi 5.5 Upgrade Best Practices and Tips 
vSphere 5.5 Install Pt. 5: SSL Deep Dive
vSphere 5.5 Install Pt. 6: SSL Certificate Template
vSphere 5.5 Install Pt. 7: Install SSO
vSphere 5.5 Install Pt. 8: Online SSL Minting
vSphere 5.5 Install Pt. 9: Offline SSL Minting
vSphere 5.5 Install Pt. 10: Update SSO Certificate
vSphere 5.5 Install Pt. 11: Install Web Client
vSphere 5.5 Install Pt. 12: Configure SSO
vSphere 5.5 Install Pt. 13: Install Inventory Service
vSphere 5.5 Install Pt. 14: Create Databases
vSphere 5.5 Install Pt. 15: Install vCenter
vSphere 5.5 Install Pt. 16: vCenter SSL
vSphere 5.5 Install Pt. 17: Install VUM
vSphere 5.5 Install Pt. 18: VUM SSL
vSphere 5.5 Install Pt. 19: ESXi SSL Certificate

Permalink to this series: vexpert.me/Derek55
Permalink to the Toolkit script: vexpert.me/toolkit55

Configure SSO STS Chain

For some reason the VMware certificate tool does not automatically import the trusted CA chain into the SSO STS store, so we need to do that manually. Building the required Java keystore file by hand is quite tedious, but my Toolkit script creates it for you (see Part 8 for the lowdown on my vCenter 5.5 Toolkit script). So all we need to do here is import the Java keystore file. I’m opting to leave the default self-signed chain in place, just in case there is a dependency.

1. Login to the vSphere web client with the administrator@vsphere.local account. In the left pane click Administration.

2. Under Single Sign-On click Configuration. Then click on the Certificates tab and then STS Signing.

3. Click on the green Plus sign and navigate to the vCenterSSO certificate directory the Toolkit script created. Select the server-identity.jks file. When prompted for a password enter testpassword.

4. Depending on your CA configuration you should see two or three certificates listed. In my case I have three, since I have a root and intermediate CA. Click on the ssoserver line and then click OK. Enter testpassword again.

If the import is successful you should see two certificate chains.

5. Reboot your vCenter server so that all the services are refreshed and pick up the new certificate chain.

Add Identity Source

In vSphere 5.5 your Active Directory identity source is not automatically added. So we will need to add AD as a source so you can authenticate with domain-based accounts.

1. Login to the vSphere web client, in the left pane click on Administration. Under Single Sign-On click Configuration. Click on Identity Sources in the middle pane.

2. Click on the green plus sign. If you want rich Active Directory support, choose Active Directory (Integrated Windows Authentication). Choosing Active Directory as LDAP Server is for 5.1 backwards compatibility and should NOT be used; you will have issues with domain trusts, etc.

3. After the source is added you should see three Identity Sources.

Delegate SSO Admin Rights

1. Create a group in Active Directory that you want to delegate SSO administrator rights to. In my case the group is called APP_VCTR_SSO_Admin, but you can use whatever name you wish. Put your account into that group.

2. In the web client go to Administration, then under Single Sign-On click Users and Groups. On the Groups tab click on Administrators, then in the lower Group Members pane click on the Blue Man Group person.

3. Change the domain to your AD domain, then find your group. Highlight the group, click Add, then click OK to add the group.

4. If you log out of Windows and log back in (to refresh your group membership), you should now be able to use the Windows credential option to access the vSphere web client. The first time you try it a warning message will likely appear. I would uncheck the Always Ask box unless you like exercising your fingers.


Configuring some basic SSO settings is not rocket science, but it is common to many environments. At a minimum you need to import the SSO STS certificate chain. Nearly everyone has AD, so adding the more intelligent SSO 5.5 AD identity source will be on everyone’s agenda. Shared accounts are never a good idea, so setting up a group for SSO admin delegation is a great idea.

Next up in lucky Part 13 we install the Inventory Service and secure it with trusted SSL certificates.

Safely virtualizing Windows Server 2012 Active Directory via Generation-ID

Windows Server 2012 generation ID is a great new feature that allows us to safely virtualize a domain controller on specific hypervisors. One of the really great features hypervisors have had for ages is the ability to take snapshots, then roll back to a prior state with the click of a mouse. It is an invaluable feature both in the lab and in production.

I know during all my (failed) vSphere 5.1 installs I practically wore out the revert to snapshot button in vCenter. But there is at least one class of VMs that you almost NEVER want to roll back to a snapshot: those running vector-clock synchronized software such as Active Directory.

Why is rolling back AD bad? I mean why is rolling back AD *REALLY* bad? Microsoft has these little things called USNs, or Update Sequence Numbers. A USN is an Active Directory database instance counter which gets incremented each time an update to AD is made. USNs are unique to each DC, and use a monotonically increasing value. USNs are used to determine what changes need to be replicated to other DCs.

When you revert to a snapshot, a USN rollback occurs. What can happen if a USN rollback occurs? Lots of bad things, such as missing AD objects, wrong security group memberships, passwords being reset, and reappearing AD objects. Also, DCs that are rolled back may accumulate many changes which never get replicated to other DCs. In short, the AD consistency of your forest is SHOT. Starting with Windows Server 2003 SP1, event ID 2095 is logged if a USN rollback is detected, but it’s up to you to fix the mess. Microsoft has a great KB article here that goes into a lot more detail.
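To see why a rollback is so damaging, here is a deliberately simplified Python model (not how AD is actually implemented): each DC stamps its writes with its InvocationID and an ever-increasing USN, and a partner remembers the highest USN it has already pulled per InvocationID. Reverting a snapshot rewinds the USN, so new writes reuse numbers the partner believes it already has. Giving the reverted DC a fresh InvocationID, one of the Generation-ID safeguards, makes the partner treat those writes as new. The class and function names are hypothetical.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyDC:
    """Toy DC: an InvocationID, a USN counter, and a log of originating writes."""
    invocation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    usn: int = 0
    log: List[Tuple[int, str]] = field(default_factory=list)

    def write(self, description: str) -> None:
        self.usn += 1  # every local update bumps this DC's USN
        self.log.append((self.usn, description))


class ToyPartner:
    """Toy partner: remembers the highest USN already pulled per InvocationID."""

    def __init__(self) -> None:
        self.high_watermark: Dict[str, int] = {}
        self.received: List[str] = []

    def pull(self, source: ToyDC) -> None:
        seen = self.high_watermark.get(source.invocation_id, 0)
        for usn, description in source.log:
            if usn > seen:  # anything at or below the watermark is assumed known
                self.received.append(description)
                self.high_watermark[source.invocation_id] = usn


def demo(reset_invocation_id: bool) -> bool:
    dc, partner = ToyDC(), ToyPartner()
    dc.write("create user alice")             # USN 1
    snapshot = copy.deepcopy(dc)              # hypervisor snapshot at USN 1
    dc.write("create group sales")            # USN 2
    partner.pull(dc)                          # partner watermark for this InvocationID: 2
    dc = snapshot                             # revert to snapshot: USN back to 1
    if reset_invocation_id:
        dc.invocation_id = str(uuid.uuid4())  # the Generation-ID safeguard
    dc.write("reset bob's password")          # reuses USN 2
    partner.pull(dc)
    return "reset bob's password" in partner.received


print(demo(reset_invocation_id=False))  # False: the new change is never replicated
print(demo(reset_invocation_id=True))   # True
```

Real AD tracks replication state with up-to-dateness vectors and per-attribute metadata rather than a single watermark, but the failure mode is the same: a reused USN under an unchanged InvocationID looks like old news.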

What has Microsoft done in Windows Server 2012 (and Windows 8) to address this problem? They’ve introduced a safeguard called a VM-Generation ID, which can be implemented by any hypervisor. This generation ID can be used by applications and operating systems to detect if a virtual machine has been rolled back in time, and take appropriate measures.

So what happens when AD detects that the Generation ID has changed? First, it dumps the RID pool and resets its InvocationID, then does a non-authoritative synchronization of the SYSVOL folder. AD replication is then re-established with other DCs, bringing the reverted DC back into a consistent state with the rest of the forest.
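To make the sequence concrete, here is a toy Python model of the startup check. DcState and check_generation_id are hypothetical names; Windows performs this check internally (reading the ID the hypervisor exposes through the VM's ACPI table), and none of this is a public API. The returned action list simply mirrors the steps described above, including the InvocationID reset that Microsoft lists among the safeguards.

```python
import secrets
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DcState:
    """Hypothetical model of what a DC remembers between boots."""
    stored_generation_id: int  # 128-bit VM-Generation ID seen at the last boot
    invocation_id: str
    rid_pool: Optional[range]


def check_generation_id(state: DcState, current_generation_id: int) -> List[str]:
    """Apply the safeguards if the hypervisor reports a different Generation ID."""
    if state.stored_generation_id == current_generation_id:
        return []  # paused/resumed, rebooted, vMotioned: nothing to do
    state.rid_pool = None                    # discard the possibly reused RID pool
    state.invocation_id = str(uuid.uuid4())  # new InvocationID for future writes
    state.stored_generation_id = current_generation_id
    return [
        "discard RID pool",
        "reset InvocationID",
        "non-authoritative SYSVOL sync",
        "re-replicate with partner DCs",
    ]


gen_id = secrets.randbits(128)
dc = DcState(gen_id, str(uuid.uuid4()), range(1100, 1600))
print(check_generation_id(dc, gen_id))                 # [] (e.g. after a vMotion)
print(check_generation_id(dc, secrets.randbits(128)))  # safeguards (e.g. after a revert)
```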

Sounds great, right? Well it is, but only a very limited number of hypervisors support VM-Generation ID. As of this writing the hypervisors are Hyper-V 3.0, vSphere 5.0 U2, and vSphere 5.1. Since a USN rollback is quite unpleasant, you of course want to verify that WS2012 and your hypervisor are playing nice and using the Generation-ID feature. If you look in the Directory Service event log, you will see event IDs 2168 and 2172. In the screenshots below they show the same Generation-ID, since the VM was not reverted to a previous snapshot.

To test out this new feature I fired up my vCenter 5.1 web console and took a snapshot of my WS2012 domain controller. After the snapshot completed, I created a new group on another DC, then reverted the WS2012 DC back to my snapshot. Let’s look in the event viewer and see what happened:

Yes, AD realized it was reverted back to a prior snapshot…

Microsoft even tells you that snapshots are not backups, silly, and to use an AD-aware backup program to restore AD.

And now life is almost good…

Let’s freshen up FRS a little bit while we are at it…

Nothing like a new database to start off the day with…

A touch of USN cleanup…

And a few minutes later, everything is back in sync! As you can see from the screenshots, Microsoft is very verbose in the logs on exactly what is happening and why. In a very large forest with a lot of DCs the recovery process could take longer.

So under what circumstances does the Generation-ID change and not change? Here’s a list:

Generation-ID NOT changed when:
VM is paused or resumed
VM is rebooted
VM host reboots
VM is vMotioned/Live Migrated

Generation-ID IS changed when:
VM starts executing a snapshot
VM is recovered from a backup
VM is failed over in a disaster recovery environment
VM is imported, copied, or cloned

This feature alone should be a huge driver for deploying WS2012-based DCs on all of your hypervisors. Never thought I’d say this, but happy snapshotting your domain controllers! For even more detailed information on virtualized domain controllers, Microsoft has a great series of articles here you can read.

P.S. This feature does NOT work with array-based snapshots. The hypervisor tracks and creates the new Generation-IDs, so DO NOT revert a domain controller to a prior state using a snapshot that your array created rather than your hypervisor. With the forthcoming VVols in vSphere .Next, Generation-ID could be supported with hardware snapshot offloads, but we will have to wait and see if that’s the case.

SIA312: What’s new in Active Directory in Windows Server 2012

Dean Wells, Active Directory Product Group, Microsoft

This was another killer session, with a super dynamic speaker rivaled only by Mark Minasi in presentation and content. Dean could double as a stand-up IT comedian. Although it may not have gotten a lot of press, there are a number of enhancements to Windows Server 2012 Active Directory. The session was highly technical and fast paced, so I didn’t get everything down. If you went to TechEd and can watch the video of this session, it is a must-see if you have anything to do with AD in your job.

Brace yourself for a fire hose:

  • High Level Areas of Investment
    • Simplified deployment of AD
    • Optimal deployment experiences in both private and public clouds
    • Increase consistency throughout the management experience
    • Accommodate business-driven security requirements through the integration of file-classification and claims-based authorization (dynamic access controls)
  • Broad Goals
    • Virtualization that just works
    • Simplified deployment of AD – No more separate adprep /forestprep steps
    • Simplify Management of AD – GUI, PowerShell, etc.
  • New Features and Enhancements
    • Simplified Deployment
      • Background – Adding DCs were too hard and too error prone
      • Solution
        • Integrate preparation steps into the promotion process
        • Validates environment-wide pre-reqs
        • Integrated with server manager and remotable
        • Built on Windows PowerShell for GUI and CLI consistency
        • Only one set of credentials needed (enterprise admin)
        • Note: Starting with Windows Server 2003 you can completely back out a schema change.
      • Requirements
        • Windows Server 2012
      • Dcpromo will now retry forever until you cancel it, in case of network issues. Fixed a newly discovered bug that’s existed in AD for 12 years.
      • Enhanced IFM (install from media) options: offline defrag is no longer required prior to preparing for IFM. This is an option you must choose, as it’s not the default.
      • ADFS 2.1 is now in the box
    • Virtualization safe
      • DCs can detect when snapshots are taken
      • DCs can detect when they are copied
      • Built on a generation ID that is changed when VM-snapshots are used
      • Generation ID is exposed to the OS through the VM’s BIOS ACPI table
      • Windows Server 2012 virtual DCs track the VM-generation ID to detect changes and protect AD.
        • Discard RID pool
        • Resetting InvocationID – Used when DCs write data
        • Re-asserting INITSYNC requirements for FSMOs
      • Requires a hypervisor that supports it. Only Hyper-V supports it today, but other vendors have been given the specification. Expect VMware and XenServer to support it in coming releases.
    • Rapid Deployment
      • Deploy a DC that is running as a VM and you can just copy it.
      • PowerShell is used to prepare an existing VM, and it creates a DCCloneConfig.xml file
      • Note: No need to use ntdsutil to clean up dead DCs; you have been able to do that in ADUC for a number of years now.
      • Doesn’t let you clone DCs with certain software (like certificate services). Built-in whitelist.
    • RID usage is now exposed and queryable (max 1 billion per domain)
    • RID Improvements
      • Background: Appended to the end of a SID. 30 bits.
      • Account creation failure could cause the loss of a RID
      • Prevent RID allocation through failed domain joins
      • Log events when RID pools are invalidated (e.g. malicious code)
      • Enforced a cap on RID block size (was unlimited), new max is 15,000
      • Periodic RID consumption warning. Events become more frequent as the pool depletes
      • RID artificial ceiling of 90%, which is a soft limit. Flip a bit and you can use the remaining ~100 million
      • Unlocked the 31st bit in the global RID space. Address space now doubled from 1B to 2B. 31st bit was reserved to flag Novell migrated accounts.
    • Deferred index creation – Too geeky to explain here
    • Expose DNTs on RootDSE – Too geeky to explain here
    • Off-premises domain join
      • Extends offline domain join by allowing the blob to accommodate DirectAccess prerequisites
        • Certs
        • Group policies
      • Download a base64 blob from the web, then completely join your computer to the domain and set up DirectAccess without ever touching the corporate network
    • Enhanced LDAP logging
    • New LDAP controls and behaviors
    • Recycle Bin GUI
    • Dynamic Access Control
    • Kerberos claims can be shoved into an ADFS claim token
    • Active Directory-based Windows OS activation
      • Requires Windows 8 and Server 2012
    • Active Directory PowerShell History Viewer
      • Shows PowerShell cmdlet history like Exchange tools do
    • Fine-grained password policy GUI
    • Kerberos armoring  – Flexible Authentication Secure Tunneling (FAST)
    • Kerberos constrained delegation now works across domains and forests. Huge for some customers.
    • Managed service accounts – Now old technology. New technology is Group Managed Service Accounts (gMSA).
      • Scheduled tasks can also use gMSAs
      • Need Server 2012 schema and one 2012 DC. Only works on Win8 and Server 2012.
      • Multiple computers can now utilize the gMSA unlike the legacy MSAs
    • AD replication and topology PowerShell cmdlets
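The RID numbers in these notes follow from simple bit arithmetic. A quick Python check, assuming the figures as noted in the session (a 30-bit RID space, a 90% soft ceiling, and a 15,000-RID block cap):

```python
RID_BITS_DEFAULT = 30     # RID portion of a SID, per the notes
RID_BITS_UNLOCKED = 31    # after unlocking the 31st bit
MAX_RID_BLOCK = 15_000    # new cap on a single RID block allocation

default_space = 2 ** RID_BITS_DEFAULT    # 1,073,741,824 (~1 billion)
unlocked_space = 2 ** RID_BITS_UNLOCKED  # 2,147,483,648 (~2 billion)

# 90% soft ceiling, using integer math to avoid float rounding
soft_ceiling = default_space * 9 // 10          # 966,367,641
held_in_reserve = default_space - soft_ceiling  # 107,374,183 (the "remaining ~100 million")

# How many maximum-size blocks fit in the default space
full_blocks = default_space // MAX_RID_BLOCK    # 71,582

for label, value in [
    ("default RID space", default_space),
    ("unlocked RID space", unlocked_space),
    ("soft ceiling (90%)", soft_ceiling),
    ("held in reserve", held_in_reserve),
    ("full 15,000-RID blocks", full_blocks),
]:
    print(f"{label}: {value:,}")
```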

GBing! (Inside joke for those that attended the session!)

Alka-Seltzer for your Windows Token Bloat

As most Windows administrators know, when you log on to any system locally or remotely, Windows generates a token that contains the security identifiers (SIDs) of all the groups the user belongs to. In large environments, or where you have implemented granular role-based security, top-tier users could be members of hundreds of groups. At some point you will exceed the default token size and experience some problems. Token bloat has struck!

The exact nature of the problem could be minor, or relatively major. You may get weird access denied messages, applications crashing, or strange entries in your event logs. Or worse yet a SID for a group that has a ‘deny permission’ on an object could be dropped into the virtual bit bucket, allowing a user to access a resource they are not supposed to access. Not good! Get ready to grab some Alka-Seltzer and your resume.

Thankfully there are several ways to combat this problem and make it almost irrelevant for 99.99% of the organizations out there. R-e-l-i-e-f is close at hand. Starting with Windows 2000 SP4, the maximum token size was increased from 8,000 bytes to 12,000 bytes. Domain local groups consume 40 bytes per SID, while global and universal groups only consume 8 bytes per SID. There are approximately 400-1,200 bytes of ticket overhead, so in the worst case tokens will start to break at around 270 domain local groups ((12,000 - 1,200) / 40 = 270). That can be a low ceiling in large environments.
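The back-of-the-envelope math above is easy to script when you want to check a particular user's mix of groups. A minimal Python sketch using only the per-SID sizes and overhead range quoted in the text; the function names are my own:

```python
DEFAULT_MAX_TOKEN_SIZE = 12_000   # bytes, Windows 2000 SP4 and later
DOMAIN_LOCAL_SID_BYTES = 40
GLOBAL_UNIVERSAL_SID_BYTES = 8
WORST_CASE_OVERHEAD = 1_200       # upper end of the ~400-1,200 byte overhead


def estimate_token_size(domain_local: int, global_universal: int,
                        overhead: int = WORST_CASE_OVERHEAD) -> int:
    """Rough token size in bytes for a user's group memberships."""
    return (overhead
            + DOMAIN_LOCAL_SID_BYTES * domain_local
            + GLOBAL_UNIVERSAL_SID_BYTES * global_universal)


def max_domain_local_groups(limit: int = DEFAULT_MAX_TOKEN_SIZE,
                            overhead: int = WORST_CASE_OVERHEAD) -> int:
    """How many domain local groups fit before the token exceeds `limit`."""
    return (limit - overhead) // DOMAIN_LOCAL_SID_BYTES


print(max_domain_local_groups())      # 270
print(estimate_token_size(100, 500))  # 9200
```

Swapping 100 domain local groups for global groups in that example would save 3,200 bytes, which is why the group-scope advice below matters.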

Are you thinking what I’m thinking? Let’s dispense with domain local groups and use global or universal groups for everything. Sure, that’s an option, but it may not work so well in multi-domain or multi-forest environments. But you can probably use some combination of domain local and global/universal groups to help limit token sizes. If you are a single forest/domain, then domain local groups could likely be dispensed with.

How about a registry hack to increase the 12,000 byte limit to something larger? Sure! That’s a possibility too. If you navigate to HKLM\System\CurrentControlSet\Control\LSA\Kerberos\Parameters you can configure a REG_DWORD value named MaxTokenSize that can go up to 65,535 (decimal). But the trick is that every machine in your forest needs to have this registry value updated, a perfect situation for a GPO or computer startup script. Before you make this system-wide change, do VERY VERY thorough testing with all of your applications.
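If you push the change by importing a .reg file (for example from a startup script), the file only needs a few lines. A minimal Python sketch that builds one; the function name is hypothetical, and a GPO registry preference is usually the cleaner route for forest-wide rollout:

```python
# Registry path and value name from the text above; key names are case-insensitive.
KERBEROS_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"
)
MAX_TOKEN_SIZE_LIMIT = 65_535


def max_token_size_reg(value: int) -> str:
    """Return the text of a .reg file that sets MaxTokenSize as a REG_DWORD."""
    if not 0 < value <= MAX_TOKEN_SIZE_LIMIT:
        raise ValueError(f"MaxTokenSize must be between 1 and {MAX_TOKEN_SIZE_LIMIT}")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KERBEROS_PARAMETERS_KEY}]\r\n"
        f'"MaxTokenSize"=dword:{value:08x}\r\n'  # .reg DWORDs are 8 hex digits
    )


print(max_token_size_reg(65_535))
```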

Finally, a little-known fact is that distribution groups (as opposed to security groups) do not add to a user’s token bloat. So if you have email-enabled security groups that are only used for email and not in ACLs on any resources, you can convert them to distribution groups.
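When hunting for conversion candidates, it helps to know how AD encodes a group's type. The groupType attribute is a bit field: 0x80000000 marks a security group, while 0x2, 0x4, and 0x8 mark global, domain local, and universal scope. A small Python decoder, assuming you have already read the raw groupType value from LDAP with whatever tool you prefer:

```python
from typing import Tuple

# ADS_GROUP_TYPE flags used in the groupType attribute
SECURITY_ENABLED = 0x80000000
SCOPE_FLAGS = {0x2: "global", 0x4: "domain local", 0x8: "universal"}


def describe_group_type(group_type: int) -> Tuple[str, str, bool]:
    """Return (kind, scope, counts_toward_token) for a raw groupType value."""
    value = group_type & 0xFFFFFFFF  # LDAP returns groupType as a signed 32-bit int
    kind = "security" if value & SECURITY_ENABLED else "distribution"
    scope = next((name for bit, name in SCOPE_FLAGS.items() if value & bit), "unknown")
    return kind, scope, kind == "security"


print(describe_group_type(-2147483646))  # ('security', 'global', True)
print(describe_group_type(8))            # ('distribution', 'universal', False)
```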

Summary of fixes for token bloat:
1) Use global or universal groups instead of domain local.
2) Increase the MaxTokenSize on all computers
3) Convert security groups to distribution groups if they are only used for email lists.

But wait, it’s not all sunshine and roses… more heartburn is on the way. There’s another Windows limitation that you will hit long before you are a member of 8,000+ groups. There is a hard-coded limit of 1,024 SIDs in the Kerberos PAC (privilege attribute certificate). Taking into account the nine default SIDs for any domain user (Authenticated Users, Everyone, etc.), the real limit is 1,015 groups, of any type. If you go over this limit you may see a logon error stating “the system cannot log you on due to the following error: during a logon attempt the user acquired too many security identifiers.” Oops!!

So the bottom line is that the largest your token could be is approximately 42,160 bytes (1,024 x 40 + 1,200). This falls under the 65,535-byte maximum, but far above the 12,000-byte default limit. So if you want to protect yourself against any possible token logon problems, increase MaxTokenSize to 65,535 and keep group membership to 1,015 groups or fewer. This impacts both the Kerberos and NTLM authentication protocols.
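Both limits can be checked together. A minimal Python sketch under a conservative assumption of my own: every SID, including the nine default ones, is costed at the 40-byte domain local rate, which reproduces the 1,024 x 40 + 1,200 worst case above. The function names are hypothetical.

```python
from typing import List

KERBEROS_PAC_SID_LIMIT = 1_024   # hard-coded SID limit
DEFAULT_USER_SIDS = 9            # default SIDs every domain user carries
BYTES_PER_SID = 40               # ASSUMPTION: worst case, every SID at the domain local rate
TICKET_OVERHEAD = 1_200          # upper end of the overhead estimate
MAX_TOKEN_SIZE_CEILING = 65_535  # largest allowed MaxTokenSize


def usable_group_limit() -> int:
    return KERBEROS_PAC_SID_LIMIT - DEFAULT_USER_SIDS


def worst_case_token_bytes(sid_count: int = KERBEROS_PAC_SID_LIMIT) -> int:
    return sid_count * BYTES_PER_SID + TICKET_OVERHEAD


def check_user(group_count: int, max_token_size: int = 12_000) -> List[str]:
    """Flag the PAC group limit and the MaxTokenSize limit for one user."""
    warnings = []
    if group_count > usable_group_limit():
        warnings.append(f"{group_count:,} groups exceeds the {usable_group_limit():,}-group limit")
    size = worst_case_token_bytes(group_count + DEFAULT_USER_SIDS)
    if size > max_token_size:
        warnings.append(f"worst-case token of {size:,} bytes exceeds MaxTokenSize {max_token_size:,}")
    return warnings


assert worst_case_token_bytes() <= MAX_TOKEN_SIZE_CEILING  # 42,160 fits under 65,535
print(check_user(1_100))  # two warnings
```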

There are some good Microsoft KB articles that talk about this problem which are worth checking out. They are: 906208, 263693, and 327825. Microsoft also wrote a very detailed white paper on access token limitations you can download here. Microsoft also has a token size troubleshooting utility (tokensz) you can download here. Before you go changing any registry keys thoroughly read all of these resources.

SIA306: Server 2008 R2 Active Directory Recycle Bin

This session, Active Directory Recycle Bin, was presented by Mark Minasi, who is always a riot to listen to. In addition to really knowing his stuff, he’s probably in the top two TechEd presenters for style. Guaranteed laughs!

Prior to Windows Server 2008 R2, when you delete an object it gets stripped of most of its attributes and is put in a special hidden container called “Deleted Objects.” For example, if you delete a user then virtually every property except the SAM account name is removed. Password, title, office, name… all gone! If you restore the object, you need to re-populate the attributes. Yes, you could do an authoritative restore on the object, but in large environments this can take a significant amount of time and requires taking one DC offline.

Starting with Windows Server 2008 R2, if your entire forest is at the Windows Server 2008 R2 functional level, there’s a new feature called the Active Directory Recycle Bin. Unlike previous versions of the operating system, all attributes on deleted objects are preserved. Group membership, name, previous OU location, etc. are all retained. Nifty, eh?

But the kicker is that this new feature is not enabled by default, and only objects deleted after you enable this feature can be restored. So as soon as your forest is in 2008 R2 functional mode, turn on this feature.

How does object deletion work at the 2008 R2 forest functional level (FFL)? For the first 180 days after an object is deleted it sits in the Recycle Bin with all of its attributes intact, and you can easily restore it. After 180 days it becomes a recycled object: it is stripped of most attributes, much like a tombstone, and can no longer be restored from the Recycle Bin. It is permanently deleted after another 180 days. So any deleted object is retained in AD for a total of about 360 days.
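The lifecycle above boils down to two windows. A small Python sketch that maps days-since-deletion to a stage, assuming the 180-day values quoted in the text; in a real forest these come from the msDS-deletedObjectLifetime and tombstoneLifetime attributes, and garbage collection runs periodically, so the boundaries are approximate. The function name is hypothetical.

```python
DELETED_OBJECT_LIFETIME_DAYS = 180   # restorable window, per the text
RECYCLED_OBJECT_LIFETIME_DAYS = 180  # time as a recycled object before purge


def object_state(days_since_deletion: int) -> str:
    """Rough lifecycle stage of an object deleted after the Recycle Bin was enabled."""
    if days_since_deletion < 0:
        raise ValueError("days_since_deletion must be non-negative")
    if days_since_deletion < DELETED_OBJECT_LIFETIME_DAYS:
        return "deleted: restorable with all attributes"
    if days_since_deletion < DELETED_OBJECT_LIFETIME_DAYS + RECYCLED_OBJECT_LIFETIME_DAYS:
        return "recycled: most attributes stripped, not restorable from the Recycle Bin"
    return "purged: removed by garbage collection"


for days in (30, 200, 400):
    print(days, object_state(days))
```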

Mark covered several methods to restore the objects, using PowerShell and Ldp. Given those methods are a bit tedious, there’s a GUI way to do it. If you download PowerGUI then download the Active Directory Recycle Bin Powerpack, you can now do several tasks from a friendly GUI:

– Restore a deleted object (original location)
– Restore a deleted object to a different location
– Permanently delete an object
– Empty the recycle bin
– Enable the recycle bin

Happy undeleting!