Archives for 2010

vCenter 4.0/4.1 VUM SSL Certificate How-To

Update 2/11/2011: VMware has re-published the article and limited the applicability to 4.1 U1 (released 2/10/2011), since it directs you to use the new VMware Update Manager Utility. The new procedure is easier to follow and uses a new tool that makes its debut in 4.1 U1. However, IMHO, it’s still inadequate. So I wrote up the full procedure for VUM 4.1 U1 here.

Update 1/6/2011: VMware has retracted the public KB article that I referenced. There is no new ETA on a revised public version. However, the VMware techie said the basic steps should not change, so you can still follow the steps below.

A little over a year ago I posted a “hack” to reconfigure vCenter VUM 4.0 for a trusted SSL certificate. At that time VMware had no official guidance, and only a couple of days ago did VMware release an official KB article. In addition, “Abe” left some good comments a couple of months ago on my old blog post that came from an internal VMware KB article. The official article closely mirrors Abe’s steps.

I have vCenter and VUM running on Server 2008 R2 and on the D drive, so just to come full circle I’ll pull from Abe’s comments and the KB article, substituting the different paths for my environment. It’s really mind boggling that VMware doesn’t develop some simple GUI program that would create the certificate requests, then import them to ESXi hosts, vCenter, and VUM. The very complicated and time consuming effort to update all of the SSL certificates is really frustrating. Microsoft and HP make it vastly easier to use trusted SSL certificates. VMware’s process is the most convoluted and complicated that I know of.

These instructions work for vCenter 4.0 and 4.1 GA, BTW. For 4.1 U1, see my blog post here.

1. First you need to generate the trusted SSL certificates. To do this, follow steps 1 – 9 in my blogpost here.
2. Stop the VMware vCenter Update Manager service.
3. On your VUM server, back up all the files in D:\Program Files (x86)\VMware\Infrastructure\Update Manager\SSL.
4. Copy rui.key, rui.crt, and rui.pfx to the SSL directory from the previous step.
5. Open an elevated command prompt and CD to D:\Program Files (x86)\VMware\Infrastructure\Update Manager.
6. On one VERY long line type:

vciInstallUtils.exe -v localhost -p 80 -U {username} -P {password} -C "d:\Program Files (x86)\VMware\Infrastructure\Update Manager" -L "C:\Users\All Users\VMware\VMware Update Manager\Logs" -I "d:\Program Files (x86)\VMware\Infrastructure\Update Manager" --op install-keystore

7. Verify “Import and generation of certificate worked, install-keystore successful” is shown.
8. In the same command prompt type (as one line):

vciInstallUtils.exe -v localhost -p 80 -U {username} -P {password} -S "d:\Program Files (x86)\VMware\Infrastructure\Update Manager\extension.xml" -C "d:\Program Files (x86)\VMware\Infrastructure\Update Manager" -L "C:\Users\All Users\VMware\VMware Update Manager\Logs" --op extupdate

9. Verify “The extension registration succeeded” is shown.
10. Start the VMware vCenter Update Manager Service.
11. Close the vSphere client, if open. Launch the vSphere client and connect to vCenter.
12. From the home page click on vCenter Service Status and verify it is healthy.
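
If you would rather not type those two VERY long lines by hand, the arguments can be assembled by a script. Below is a minimal Python sketch, assuming the install and log paths from my environment above and assuming the operation flag is a double-dash --op. The function names build_vum_command and run_vum_command are my own inventions for illustration, not part of any VMware tool.

```python
import subprocess

# Paths from this post; adjust them for your own install locations.
VUM_DIR = r"D:\Program Files (x86)\VMware\Infrastructure\Update Manager"
LOG_DIR = r"C:\Users\All Users\VMware\VMware Update Manager\Logs"


def build_vum_command(operation, username, password, vcenter="localhost", port=80):
    """Return the vciInstallUtils.exe argument list for one operation.

    operation is "install-keystore" (step 6) or "extupdate" (step 8).
    """
    args = [
        VUM_DIR + r"\vciInstallUtils.exe",
        "-v", vcenter,
        "-p", str(port),
        "-U", username,
        "-P", password,
    ]
    if operation == "extupdate":
        # Only the extension update takes the extension.xml path.
        args += ["-S", VUM_DIR + r"\extension.xml"]
    args += ["-C", VUM_DIR, "-L", LOG_DIR]
    if operation == "install-keystore":
        # Only the keystore install takes the -I directory.
        args += ["-I", VUM_DIR]
    args += ["--op", operation]
    return args


def run_vum_command(operation, username, password):
    """Run the utility from an elevated prompt on the VUM server (Windows only)."""
    return subprocess.run(
        build_vum_command(operation, username, password),
        cwd=VUM_DIR, capture_output=True, text=True,
    )
```

Passing the arguments as a list means the paths with spaces need no manual quoting. You still have to read the output yourself and confirm the success messages from steps 7 and 9.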

And there you have it! The official method to update your VUM SSL certificates. Again, why it took VMware this frigging long to tell customers how to do this is mind blowing. In the DoD using trusted SSL certificates is a requirement, so the lack of official VMware guidance was a real problem. Now VMware needs to make it 10x easier and GUI driven. Maybe in vSphere 7.0.

XenDesktop 5.0: Tips for a smooth installation

Last week Citrix XenDesktop 5.0 was released, which I was eagerly anticipating since XD 4.0 had a number of warts which made it unusable for the environment I support. XD5 was supposed to fix all of the show stoppers, so the day it was released I downloaded the HUGE 18GB installation package.

This weekend I finally got time to do the installation and ran across a few hurdles. In the end I was successful, but I wasted a number of hours I could have spent outdoors in our lovely 83 degree December weather. Here are some installation tips so you can skip the hours I spent troubleshooting.

1. It’s extremely important if you use VMware vSphere to have a trusted SSL certificate installed on your vCenter server. Nearly a year ago I wrote a blog on exactly how to do this. You can check it out here, and as a side note the instructions work for vCenter 4.1 too. The SSL certificate should use the FQDN of the vCenter server. When selecting VMware as the host type, use the following address format:

https://vCenter.domain.com/sdk

Symptoms of this error include a message in Desktop Studio when entering the VMware credentials:

The hypervisor was not contactable at the supplied address.

This error message is nearly completely useless and doesn’t give you any good details on what’s really wrong. It should pop up with a warning that the SSL certificate is not trusted and then point you to a KB article that describes how to install a trusted certificate. Or it could even give you the (less secure) option of ignoring the certificate error, which would be great for quick POC deployments and less frustration.
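
Since Desktop Studio won’t tell you the certificate is the problem, a quick check from the XenDesktop server can. Here is a rough Python sketch that builds the address in the format above and attempts a verified TLS handshake. It assumes vCenter listens on port 443 and that Python’s default trust store matches what the server trusts, which may not hold in every environment. The function names are hypothetical.

```python
import socket
import ssl


def sdk_url(vcenter_fqdn):
    """Build the address format to enter when selecting VMware as the host type."""
    return "https://" + vcenter_fqdn + "/sdk"


def certificate_is_trusted(vcenter_fqdn, port=443, timeout=5.0):
    """Return True if a verified TLS handshake with the host succeeds."""
    # The default context checks both the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((vcenter_fqdn, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=vcenter_fqdn):
                return True
    except ssl.SSLError:
        # Covers ssl.SSLCertVerificationError: untrusted chain or name mismatch.
        return False
```

A False result for the FQDN you plan to enter suggests the certificate is not trusted or does not match that name. Plain connection failures such as a refused connection are left to raise, so they are not confused with a certificate problem.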

2. If the server VM you are using to install XD5 in has FIPS encryption mode enabled make sure that in IE under Advanced settings you have at least TLS 1.0 selected. If you only have TLS 1.1 and 1.2 selected then you get different errors depending on the deployment scenario you choose. If you chose “Desktop deployment” you will get the following errors:

In XenDesktop 5.0 Desktop Studio:
——
GUI Popup: Unknown error occurred.

Detailed logs show:

Reset-BrokerServiceGroupMembership : Unknown error occurred

+ CategoryInfo : InvalidOperation: (:) [Reset-BrokerServiceGroupMembership], SdkOperationException
+ FullyQualifiedErrorId : Citrix.XDPowerShell.Broker.UnknownError,Citrix.Broker.Admin.SDK.ResetBrokerServiceGroupMembershipCommand
——
In the Windows Application Event logs you may see:
——
Log Name: Application

Source: Citrix Broker Service
Date: 12/12/2010 3:16:25 PM
Event ID: 1005
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: Q001XD01.contoso.net
Description:
The Citrix Broker Service failed to connect to the XenDesktop database.

Please check that the database is configured correctly.

Error details:
Exception ‘NewIcon: Unhandled Error’ of type ‘Citrix.Cds.DAL.DALDataStoreException’.
——
 
If you use the “Quick Deploy” wizard then you will get the following error:
 
Exception has been thrown by the target of an invocation.
 
You will also get the same event ID 1005 in the application logs as described above. Thanks Citrix for giving us consistent errors in the Desktop Studio GUI for the same problem…not!
 
3. If you are using vDS (virtual distributed switch) like the Cisco Nexus 1000v or the built-in VMware one, beware! If you configure your master template VM with a network port on a vDS, but configure your vCenter host connection options to deploy VMs to a non-vDS network you will be unable to create a new VM pool.

4. If you want to use webcams with XenDesktop, yes they do work! I tried my Logitech 9000, and made it work in “small” resolution, which is 320 x 180. Anything higher and I got no video image. The small size gave real-time video that was extremely smooth with a high quality image. Looked like native performance to me..on the LAN.

In conclusion, XD5 is vastly easier to install and configure than previous versions. But there are still some rough edges. As I do more testing I’ll post periodic updates.

Update 1: Found another bug in MCS (machine creation services). You can read about it, and a workaround, here.

Update 2: If you want to use Citrix Provisioning Services (PVS) and use the VMware VMXNET3 driver, you must apply Citrix hotfix CTX128160 found here, or your VMs will crash during boot.

Update 3: I updated the FIPS encryption issue since the problem is caused by a combination of enabling FIPS encryption WITH a non-standard IE setting (TLS 1.0 is unchecked under advanced settings). You can safely have FIPS encryption enabled with a default IE configuration.

Update 4: Citrix released XD5 SP1, which fixes a few of these issues. Check out my post here.

Free Veeam licenses for the Holidays

Veeam is offering free, permanent, licenses for Backup and Replication v5 and Veeam Monitor Plus with Business View. These are NFR (not for resale) licenses, and limited to two sockets. But it’s great for a home lab, demo, training, etc. I don’t know how long the deal is good for, so I’d immediately request your gift.

http://www.veeam.com/nfr/free-nfr-license

I requested mine, and literally within seconds I got an email with the three free license keys. Veeam Backup is a GREAT product for virtual environments. I highly encourage everyone to get the gift licenses and try them out.

Two-Factor Authentication for Exchange 2010 is now possible

Back in 2009 I wrote a blog about the possibility of Microsoft supporting two-factor or multi-factor authentication for some Exchange services. For organizations which require high security, such as the DoD, allowing external access to email requires additional protection. With Exchange 2007 and prior versions there was no easy way (or any way!) to natively support certificate based two-factor authentication for services like Exchange ActiveSync. 

To my surprise and great delight, Microsoft just released a lengthy whitepaper on how to enable certificate based two-factor authentication with Exchange 2010 and Microsoft Forefront TMG or Microsoft Forefront UAG. The table below is directly from their whitepaper and shows you the different authentication scenarios and which product(s) support that scenario.

You will notice though that Outlook Anywhere is missing from this list. So that’s a major bummer! But all is not lost. Microsoft released another whitepaper, Using IPsec to Secure Access to Exchange. By using IPsec you can enforce that only trusted computers can establish a secure connection to your Exchange servers. The whitepaper further states you could consider this a two-factor authentication solution since the certificate is something you have, and you need your password (something you know) to logon to the computer. This also has the added benefit that it works with AutoDiscover, Exchange Web Services, Outlook Anywhere and Outlook Web App.

3PAR vSphere VAAI "XCOPY" Test Results: More efficient but not faster

In my previous blog I discussed how the VMware 4.1 VAAI ‘write same’ implementation in a 3PAR T400 showed a dramatic 20x increase in performance, creating an eager zeroed thick VMDK at 10GB/sec (yes, GigaBYTES a second). The other major SCSI primitive that VAAI 4.1 leverages is XCOPY (SCSI opcode 0x83). What this does is basically offload the copy process of a VMDK to the array, so all of the data does not need to traverse your SAN or bog down your ESX host.

In this test I used the same configuration as described in my previous blog entry. I decided to perform a storage vMotion of a large VM. This VM had three VMDKs attached, to simulate real world usage. The first VMDK was 60GB and had about 5GB of operating system data on it. The next two VMDKs were eager zeroed thick disks, 70GB and 240GB, and had no user data written to them. Total VMDK size is 370GB. I initiated a storage vMotion process from vCenter 4.1 to start the copy process.

“XCOPY” without VAAI:
Host CPU Utilization: ~3916 MHz
Read/write latency: 3-4ms
3PAR host facing port aggregate throughput: 616MB/sec
3PAR back-end disk port aggregate throughput: ~0MB/sec
Time to complete: 20 minutes

These results are very reasonable, and quite expected. Since VAAI was not used, the ESXi host has to read 370GB of data, then turn it right around and write 370GB of data to the disk. So in reality over 740GB of data traversed the SAN during the 20 minute storage vMotion process. Since the VMDKs only contained 1% written data, back-end disk throughput was nearly zero because of the ASIC zero detection feature. If the VMDKs were fully populated then the back-end ports would be going crazy and the copy would be slower since all I/Os would be hitting physical disks.
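
The arithmetic behind that 740GB figure is easy to sanity check. This little Python sketch uses only the numbers from this test and assumes decimal units (1GB = 1000MB). The function names are just for illustration.

```python
def san_traffic_gb(vmdk_total_gb):
    """Without VAAI the host reads every block and then writes it back out."""
    return 2 * vmdk_total_gb


def average_throughput_mb_s(total_gb, minutes):
    """Average throughput in MB/sec, using decimal units (1GB = 1000MB)."""
    return total_gb * 1000 / (minutes * 60)


total_gb = 60 + 70 + 240                 # the three VMDKs on the test VM
traffic_gb = san_traffic_gb(total_gb)    # read plus write
print(total_gb, traffic_gb, round(average_throughput_mb_s(traffic_gb, 20)))
# prints: 370 740 617
```

That works out to roughly 617MB/sec, which lines up nicely with the 616MB/sec I observed on the host facing ports.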

“XCOPY” with VAAI:
Host CPU Utilization: ~3674 MHz
Read/write latency: 3-4ms
3PAR host facing port aggregate throughput: ~0MB/sec
3PAR back-end disk port aggregate throughput: ~0MB/sec
Time to complete: 20 minutes

Now I’m pretty surprised at these results, and not in a positive fashion. First, it’s good to see nearly zero disk I/O on the host facing ports and the back-end ports. This means in fact VAAI commands were being used, and that the VMDKs were nearly all zeros. However what has me very puzzled is that the copy process took exactly the same amount of time to complete, and used nearly the same amount of host CPU. I repeated the tests several times, and each time I got the exact same results…20 minutes.

Since there’s virtually no physical disk I/O going on here, I would expect a dramatic increase in storage vMotion performance. Because these results are very surprising and unexpected, I contacted 3PAR and I will see if engineering can shed some light on this situation. Other vendors claim a 25% increase in storage vMotion performance when using VAAI. Clearly 0% is less than 25%. When I get clarification on what’s going on here, I will be sure to follow up.

Update: 3PAR got back to me about my observations, and confirmed what I’m seeing is correct. With firmware 2.3.1 MU2 XCOPY doesn’t reduce the observed wall clock time to “copy” empty space in a thinly provisioned volume. But as I noted, XCOPY does leverage the zero detection feature of their ASIC so there’s very little back-end I/O occurring for non-allocated chunklets.

So yes the current VAAI implementation reduces the I/O strain on the SAN and disk array, but doesn’t reduce the observed time to move the empty chunklets. In my environment the I/O loads are pretty darn low, so I’d prefer the best of both worlds…efficient copies and reduced observed copy times. If 3PAR could make the same dramatic performance gains of the ‘write same’ command for the XCOPY command, that would really be a big win for customers.

3PAR vSphere VAAI "Write Same" Test Results: 20x performance boost

So in my previous blog entry I wrote about how I upgraded a 3PAR T400 to support the new VMware vSphere 4.1 VAAI extensions. I did some quick tests just to confirm the array was responding to the three new SCSI primitives, and all was a go. But to better quantify the effects of VAAI I wanted to perform more controlled tests and share the results.

Environment
First let me give you a top level view of the test environment. The host is an 8 core HP ProLiant blade server with a dual port 8Gb HBA, dual 8Gb SAN switches, and two quad port 4Gb FC host facing cards in the 3PAR (one per controller). The ESXi server was only zoned to two ports on each of the 4Gb 3PAR cards, for a total of four paths. The ESXi 4.1 Build 320092 server was configured with native round robin multi-pathing. The presented LUNs were 2TB in size, zero detect enabled, and formatted with VMFS 3.46 and using an 8MB block size.

Testing Methodology
My testing goal was to exercise the XCOPY (SCSI opcode 0x83) and write same (SCSI opcode 0x93). To test the write same extension, I wanted to create large eager zeroed disks, which forces ESXi to write all zeros to the entire VMDK. Normally this would take a lot of SAN bandwidth and time to transfer all of those zeros. Unfortunately I can’t provide screen shots because the system is in production, so you will have to take my word for the results.

“Write Same” Without VAAI:
70GB VMDK 2 minutes 20 seconds (500MB/sec)
240GB VMDK 8 minutes 1 second (498MB/sec)
1TB VMDK 33 minutes 10 seconds (502MB/sec)

Without VAAI the ESXi 4.1 host is sending a total 500MB/sec of data through the SAN and into the 4 ports on the 3PAR. Because the T400 is an active/active concurrent controller design, both controllers can own the same LUN and distribute the I/O load. In the 3PAR IMC (InForm Management console) I monitored the host ports and all four were equally loaded around 125MB/sec.

This shows that round-robin was functioning, and highlights the very well balanced design of the T400. But this configuration is what everyone has been using the last 10 years..nothing exciting here except if you want to weigh down your SAN and disk array with processing zeros. Boorrrringgg!!

Now what is interesting, and very few arrays support, is a ‘zero detect’ feature where the array is smart enough on thin provisioned LUNs to not write data if the entire block is all zeros. So in the 3PAR IMC I was monitoring the back-end disk facing ports and sure enough, virtually zero I/O. This means the controllers were accepting 500MB/sec of incoming zeros, and writing practically nothing to disk. Pretty cool!

“Write Same” With VAAI: 20x Improvement
70GB VMDK 7 seconds (10GB/sec) 
240GB VMDK 24 seconds (10GB/sec)
1TB VMDK 1 minute 23 seconds (12GB/sec)

Now here’s where your juices might start flowing if you are a storage and VMware geek at heart. When performing the exact same VMDK create functions on the same host using the same LUNs, performance was increased 20x!! Again I monitored the host facing ports on the 3PAR, and this time I/O was virtually zero, and thanks to zero detection within the array, almost zero disk I/O. Talk about a major performance increase. Instead of waiting over 30 minutes to create a 1TB VMDK, you can create one in less than 90 seconds and place no load on your SAN or disk array. Most other vendors are only claiming up to 10x boost, so I was pretty shocked to see a consistent 20x increase in performance.
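
If you want to check my math, here is a short Python sketch that recomputes the throughput and speedup from the timings in the two result lists. It assumes decimal units and treats the 1TB VMDK as 1000GB, which is my assumption. The names RESULTS, throughput_mb_s, and speedup are made up for this example.

```python
# name: (size in GB, seconds without VAAI, seconds with VAAI)
RESULTS = {
    "70GB": (70, 2 * 60 + 20, 7),
    "240GB": (240, 8 * 60 + 1, 24),
    "1TB": (1000, 33 * 60 + 10, 83),   # assumes 1TB = 1000GB
}


def throughput_mb_s(size_gb, seconds):
    """Effective throughput in MB/sec (1GB = 1000MB)."""
    return size_gb * 1000 / seconds


def speedup(seconds_without, seconds_with):
    """How many times faster the VAAI run finished."""
    return seconds_without / seconds_with


for name, (size_gb, without_vaai, with_vaai) in RESULTS.items():
    print(
        name,
        round(throughput_mb_s(size_gb, without_vaai)), "MB/sec ->",
        round(throughput_mb_s(size_gb, with_vaai)), "MB/sec,",
        round(speedup(without_vaai, with_vaai), 1), "x faster",
    )
```

The two smaller disks come out at almost exactly 20x, and the 1TB disk at closer to 24x.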

In conclusion I satisfied myself that 3PAR’s implementation of the “write same” command coupled with their ASIC based zero detection feature drastically increases creation performance of eager zeroed VMDK files. Next up will be my analysis of the XCOPY command, which has some interesting results that surprised me.

Update: I saw on the vStorage blog they did a similar comparison on the HP P4000 G2 iSCSI array. Of course the array configuration can dramatically affect performance, so this is not an apples to apples comparison. But nevertheless, I think the raw data is interesting to look at. For the P4000 the VAAI performance increase was only 4.4x, not the 20x of the 3PAR. In addition, the VMDK creation throughput is drastically slower on the P4000.

Without VAAI:
T400 500MB/sec vs P4000 104MB/sec (T400 4.8x faster)

With VAAI:
T400 10GB/sec vs P4000 458MB/sec (T400 22x faster)

3PAR VAAI Upgrade is cakewalk

For those of you using vSphere 4.1, one of the cool new features is VAAI support. What is VAAI? VAAI is a deep level of integration between select storage arrays and the ESX kernel. The three VAAI functions released in 4.1 are:

•Atomic Test & Set (ATS), which is used during creation of files on the VMFS volume
•Clone Blocks/Full Copy/XCOPY, which is used to copy data
•Zero Blocks/Write Same, which is used to zero-out disk regions

Arrays need firmware updates to support these enhanced SCSI commands. Since vSphere 4.1 was released, storage vendors have been releasing firmware updates for their arrays. Today I upgraded our 3PAR T400 to their 2.3.1 MU2 code base, which has VAAI support. Like I blogged about back in February, the 3PAR upgrades are fully non-disruptive, fairly straightforward, and not so complicated they need professional services.

I found a script which makes the verification, enabling, and disabling of the features a simple one liner, and it can be found here. For a little trivia, there was supposed to be a fourth VAAI SCSI primitive, ‘thin provision stun’. I bet a Star Trek fan came up with that feature name. Basically this feature enabled the array to tell a VM that it ran out of physical disk space on the LUN and ESX would ‘stun’ the VM so it wouldn’t crash or corrupt data. But as the rumor goes, there was some miscommunication between VMware and various partners so not all partners implemented or certified the stun primitive. To put everyone on a level playing field the fourth primitive was dropped. I would expect it to make an appearance in a future release.

Due to time constraints and the approaching weekend, I didn’t have time to run any vSphere tests and look at SCSI stats to verify the VAAI commands are working. That will come over the next week or two, and I plan to blog on the results.

For those of you looking at buying new storage arrays and using them with VMware, one of the basic checklist features you should use as screening criteria is VAAI support. Finally, NetApp has a great PDF that goes into good details on how VAAI works and the use cases. While it contains some NetApp specific information, the majority of the document is a good read for anyone interested in VAAI.

Build a home ESX 4.x Server for $1,000

Update: I came up with a new parts list based on Sandy Bridge parts. You can check it out here.

Currently I dual-boot my home computer with ESXi 4.1, but that’s getting old. Yes, you can run ESXi 4.1 inside of VMware Workstation 7.1 (and I do), but you can’t run any 64-bit guest operating systems within that ESXi instance. Since Server 2008 R2 is 64-bit only, along with many other applications like Exchange 2010, that really limits what you can do with nested VMs.

So I broke down today and put together a white box micro-ATX computer for home use that should scream. Total cost was about $1,000, which I don’t think is too shabby for the hardware specs. I ordered most of the parts through Directron.com, since they don’t charge CA tax and have reasonable shipping costs.

Important factors for me were size (small case), noise level (very quiet), 16GB RAM, and dual NICs that were on the ESXi 4.1 HCL. I looked at Intel Clarkdale motherboards with on-package graphics, so I could eliminate the graphics card, but from what I read their memory performance is in the toilet due to an off-die memory controller. So I opted for a separate graphics card and a 45nm Intel Lynnfield processor. Next year when Intel Sandy Bridge processors are released, the new ring-bus microarchitecture should fix the terrible memory performance.

You could certainly shave off a few dollars by getting a cheaper case, slower hard drive, less full-featured MB, and a single port NIC.

Antec MicroATX Minuet 350 case $98.99
Western Digital 1TB 7200 RPM 6Gb/s SATA $86.99
Asus Maximus III GENE MicroATX MB $136.98 (after rebate)
Intel Core i5-760 2.8GHz Quad-Core $204.99
Qty 2 Mushkin 8GB DDR3 PC3-10666 kit $129 x 2
Sapphire Radeon HD5450, low profile, fanless $29.99 (after rebate)
SuperMicro AOC-SG-i2 dual-port GiGE NIC $82
NetGear GS088T-200NAS Managed 8-port GigE switch $105.98

Total comes to about $1004.00, plus a few dollars shipping. I already had a DVD drive, so I didn’t need to get one. The NetGear switch supports VLANs, jumbo packets, and other features that got my interest. While doing some research, I also found a web site that has a pretty long list of whitebox hardware and an unofficial ESX(i) 4.x HCL. You can check it out here.  I can’t wait to get all of the parts and put the server together.

Change your VMware VM UUIDs to be Unique

UUIDs (Universally Unique IDentifiers) are also known as GUIDs (Globally Unique IDentifiers). A UUID is 128 bits long, and is designed to be unique across space and time. Why do I care about UUIDs? Well, VMware attempts to assign a unique UUID to every VM. Generally they succeed, but sometimes you can end up with VMs with duplicate UUIDs. UUIDs are present in physical hardware and also manifest themselves in Windows.

Recently I discovered this was happening for our VM templates, as we were using a unique method to transport and upload large templates to standalone ESX hosts before they were managed by vCenter. Basically we were using Veeam Backup to quickly restore templates to a host prior to shipping them to remote locations. What we didn’t realize was that all the templates would have the same UUID. Our backup software is now VMware aware, detected all of these duplicate UUIDs, and threw some errors. So I needed a way to make sure our standalone VMs get built with unique UUIDs.

A VM’s .VMX file contains the UUID, and there are several methods to change it. First, you could just locate the uuid.bios value and change a few digits at random with Wordpad. This is fine, but prone to human error and not time efficient. So I cobbled together a little PowerShell script that attaches to an ESX host, enumerates all VMs, and tweaks the last eight digits in the UUID to make them all pseudo-unique.

All you need to do is run the PowerShell script via PowerCLI and give the IP address of the ESX host as an argument. A window will pop up asking for credentials, then it will enumerate all VMs and modify the UUID value. There’s nothing magical about the static portion of the UUID string in the script, so you can change that to a value from your environment if you wish.

The script pauses two seconds between each VM, so that the time stamp is unique for all VMs.


# Usage: .\changeUUID.ps1 <IP address of ESX host>

if ($args[0].length -gt 0) {
    # Connect to the ESX host; a window pops up asking for credentials
    Connect-VIServer $args[0]
    $VMs = Get-VM
    foreach ($vm in $VMs) {
        # The last eight digits come from the current day, hour, minute and second
        $date = Get-Date -Format "dd hh mm ss"
        $newUuid = "56 4d 50 2e 9e df e5 e4-a7 f4 21 3b " + $date
        echo "VM: " $vm.Name "New UUID: " $newUuid
        $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
        $spec.uuid = $newUuid
        $vm.ExtensionData.ReconfigVM_Task($spec)
        # Pause so the next VM gets a different time stamp
        Start-Sleep -s 2
    }
}
else {
    echo "Must supply IP address of ESX host. e.g. .\changeUUID.ps1 192.168.0.10"
}

So there you go, a way to easily change the UUID of your VMware virtual machines using PowerCLI.
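
To see what the generated value looks like without touching a host, here is a small offline Python sketch of the same string-building logic. It is only an illustration, not a replacement for the PowerCLI script, and the function name make_uuid is my own.

```python
from datetime import datetime

# Static portion of the UUID string, copied from the script above.
STATIC_PREFIX = "56 4d 50 2e 9e df e5 e4-a7 f4 21 3b "


def make_uuid(now):
    """Append day, hour (12-hour clock), minute and second to the prefix."""
    # %d %I %M %S mirrors the "dd hh mm ss" format string in the script.
    return STATIC_PREFIX + now.strftime("%d %I %M %S")


example = make_uuid(datetime(2010, 12, 12, 15, 16, 25))
print(example)
# prints: 56 4d 50 2e 9e df e5 e4-a7 f4 21 3b 12 03 16 25

# Sanity check: 16 bytes means 32 hex digits once separators are removed.
hex_digits = example.replace(" ", "").replace("-", "")
print(len(hex_digits))
# prints: 32
```

Note the hour uses a 12-hour clock and the value carries no month or year, so two runs can produce the same suffix. That is why I call the result pseudo-unique.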

Alka-Seltzer for your Windows Token Bloat

As most Windows administrators know, when you log on to any system locally or remotely, Windows generates a token that contains a list of security identifiers of all the groups the user belongs to. In large environments or where you have implemented granular role-based security, top-tier users could be a member of hundreds of groups. At some point you will exceed the default token size and experience some problems. Token bloat has struck!

The exact nature of the problem could be minor, or relatively major. You may get weird access denied messages, applications crashing, or strange entries in your event logs. Or worse yet a SID for a group that has a ‘deny permission’ on an object could be dropped into the virtual bit bucket, allowing a user to access a resource they are not supposed to access. Not good! Get ready to grab some Alka-Seltzer and your resume.

Thankfully there are several ways to combat this problem, and make it almost irrelevant for 99.99% of the organizations out there. R-e-l-i-e-f is close at hand. Starting in Windows 2000 SP4 and later the maximum token size was increased from 8,000 bytes to 12,000 bytes. Domain local groups consume 40 bytes per SID, while global and universal groups only consume 8 bytes per SID. There are approximately 400-1,200 bytes of ticket overhead, so worst case tokens will start to break around 270 domain-local groups. 270 can be low in large environments.
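
Here is that arithmetic as a quick Python sketch, using only the byte counts quoted above and the worst-case 1,200 bytes of overhead. The function names are made up for illustration.

```python
OVERHEAD_BYTES = 1200             # upper end of the 400-1,200 byte overhead
DOMAIN_LOCAL_SID_BYTES = 40
GLOBAL_UNIVERSAL_SID_BYTES = 8
DEFAULT_MAX_TOKEN_SIZE = 12000


def estimate_token_bytes(domain_local, global_universal):
    """Worst-case token size for a given mix of group memberships."""
    return (OVERHEAD_BYTES
            + domain_local * DOMAIN_LOCAL_SID_BYTES
            + global_universal * GLOBAL_UNIVERSAL_SID_BYTES)


def max_groups(bytes_per_sid, max_token_size=DEFAULT_MAX_TOKEN_SIZE):
    """How many groups of one type fit before the token limit is hit."""
    return (max_token_size - OVERHEAD_BYTES) // bytes_per_sid


print(max_groups(DOMAIN_LOCAL_SID_BYTES))       # prints: 270
print(max_groups(GLOBAL_UNIVERSAL_SID_BYTES))   # prints: 1350
print(estimate_token_bytes(100, 300))           # prints: 7600
```

The same 12,000 bytes that break at 270 domain local groups would hold 1,350 global or universal groups, which is why the group type matters so much.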

Are you thinking what I’m thinking? Let’s dispense with domain local groups and use global or universal groups for everything. Sure that’s an option, but it may not work so well in multi-domain or multi-forest environments. But you can probably do some combination of domain local and global/universal groups to help limit token sizes. If you are a single forest/domain, then domain local groups could likely be dispensed with.

How about a registry hack to increase the 12,000 byte limit to something larger? Sure! That’s a possibility too. If you navigate to HKLM\System\CurrentControlSet\Control\Lsa\Kerberos\Parameters you can configure a REG_DWORD value for MaxTokenSize that can go up to 65535, decimal. But the trick is every machine in your forest needs to have this registry key updated, a perfect situation to use a GPO or computer start up script. Before you make this system wide change, do VERY VERY thorough testing with all of your applications.

Finally, a little known fact is that distribution groups (vice security groups) do not add to a user’s token bloat. So if you have email enabled groups that are only used for email and not ACLs on any resources, you can convert those security groups to distribution groups.

Summary of fixes for token bloat:
1) Use global or universal groups instead of domain local.
2) Increase the MaxTokenSize on all computers
3) Convert security groups to distribution groups if they are only used for email lists.

But wait, it’s not all sunshine and roses…more heartburn is on the way. There’s another Windows limitation that you will hit long before you are a member of 8,000+ groups. There is a hard-coded limit of 1,024 SIDs for the Kerberos PAC (privilege attribute certificate). Taking into account the nine default SIDs for any domain user (authenticated users, everyone, etc.) the real limit is 1,015 groups..of any type. If you go over this limit you may see a logon error stating “the system cannot log you on due to the following error: during a logon attempt the user acquired too many security identifiers.” Oops!!

So the bottom line is the largest value your token size could be is approximately 42,160 bytes (1024 x 40 + 1200). This falls under the 65,535 byte maximum, but is far above the 12,000 byte default limit. So if you want to protect yourself against any possible token logon problems, increase MaxTokenSize to 65,535 and keep group membership to 1,015 groups or fewer. This impacts both Kerberos and NTLM authentication protocols.
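
Putting both limits together, here is a rough Python sketch that flags a group count likely to cause trouble. It assumes the worst case where every group costs 40 bytes, plus 1,200 bytes of overhead, exactly as in the math above. The function names and the warning strings are hypothetical.

```python
MAX_PAC_SIDS = 1024          # hard-coded SID limit for the Kerberos PAC
DEFAULT_SIDS = 9             # default SIDs every domain user carries
WORST_CASE_SID_BYTES = 40    # assume every group is domain local
OVERHEAD_BYTES = 1200


def worst_case_token_bytes():
    """Largest token the SID limit allows: 1024 x 40 + 1200."""
    return MAX_PAC_SIDS * WORST_CASE_SID_BYTES + OVERHEAD_BYTES


def membership_problems(group_count, max_token_size):
    """Return a list of warnings for a user in group_count groups."""
    problems = []
    if group_count > MAX_PAC_SIDS - DEFAULT_SIDS:
        problems.append("too many SIDs for the Kerberos PAC")
    if group_count * WORST_CASE_SID_BYTES + OVERHEAD_BYTES > max_token_size:
        problems.append("token may exceed MaxTokenSize")
    return problems


print(worst_case_token_bytes())            # prints: 42160
print(membership_problems(300, 12000))     # prints: ['token may exceed MaxTokenSize']
print(membership_problems(1015, 65535))    # prints: []
```

With MaxTokenSize at 65,535, the token size check can never fail before the SID limit does, so the group count is the only number left to watch.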

There are some good Microsoft KB articles that talk about this problem which are worth checking out. They are: 906208, 263693, and 327825. Microsoft also wrote a very detailed white paper on access token limitations you can download here. Microsoft also has a token size troubleshooting utility (tokensz) you can download here. Before you go changing any registry keys thoroughly read all of these resources.