VMworld 2013: Exchange on VMware Best Practices

Twitter: #VAPP5613, Alex Fontana (VMware)

This session was skillfully presented and jam-packed with Exchange on VMware best practices for architects and Exchange administrators. Can you use Exchange VMDKs on NFS storage? Can you use vSphere HA and DRS? How can you avoid DAG failover with vMotion? What’s the number one cause of Exchange performance problems? All of these questions and more were answered in this session. If you think a “click next” install of Exchange is adequate for an enterprise deployment, then you need to find a new job. Period.

Agenda

  • Exchange on VMware vSphere overview
  • VMware vSphere Best Practices
  • Availability and Recovery Options
  • Q&A

Continued Trend Towards Virtualization

  • Move to 64-bit architecture
  • Exchange 2013 delivers a 50% I/O reduction compared to Exchange 2010
  • Rewritten store process
  • Full virtualization support at RTM for Exchange 2013

Support Considerations

  • You can virtualize all roles
  • You can use DAGs together with vSphere HA and vMotion
  • Fibre Channel, FCoE and iSCSI (native and in-guest)
  • What is NOT supported? VMDKs on NFS, thin disks, VM snapshots

Best Practices for vCPUs

  • CPU over-commitment is possible and supported but approach conservatively
  • Enable hyper-threading at the host level and VM (HT sharing: Any)
  • Enable NUMA (non-uniform memory access). Exchange is not NUMA-aware, but ESXi is and will schedule an SMP VM’s vCPUs onto a single NUMA node
  • Size the VM to fit within a NUMA node – e.g., if the NUMA node has 8 cores, keep the VM at or below 8 vCPUs (see the sketch after this list)
  • Use vSockets to assign vCPUs and leave “cores per socket” at 1
  • What about vNUMA in vSphere 5.0? Does not apply to Exchange since it is not NUMA aware
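
To make the NUMA-fit rule concrete, here is a minimal Python sketch of the check; the host geometry is a hypothetical example, so substitute your own core counts.

```python
# Minimal NUMA-fit check: keep a VM's vCPU count within one NUMA node.
# The host geometry below is hypothetical; check your own hardware.

def fits_in_numa_node(vm_vcpus: int, cores_per_numa_node: int) -> bool:
    """True if ESXi can schedule all of the VM's vCPUs on one NUMA node."""
    return vm_vcpus <= cores_per_numa_node

# Example: a 2-socket host with 8 cores per socket = 8 cores per NUMA node.
cores_per_node = 8
for vcpus in (4, 8, 12):
    if fits_in_numa_node(vcpus, cores_per_node):
        print(f"{vcpus}-vCPU VM: fits in one NUMA node")
    else:
        print(f"{vcpus}-vCPU VM: spans nodes (remote memory access)")
```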

CPU Over-Commitment

  • Allocating 2 vCPUs to every physical core is supported, but don’t do it. Keep 1:1 until a steady workload is achieved
  • 1 physical core = 2,400 megacycles = 375 users at 100% utilization
  • 2 vCPUs to 1 core (2:1 over-commitment) = 1,200 megacycles per VM = 187 users per VM at 100% utilization
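
The session’s sizing math is easy to reproduce. The quick Python calculation below uses only the numbers from the slides (2,400 megacycles and 375 users per physical core); everything else follows from them.

```python
# Reproduce the session's CPU sizing math. The 2,400-megacycle core and
# the 375-user figure come from the slides; the rest is derived.

MEGACYCLES_PER_CORE = 2400            # one physical core
USERS_PER_CORE = 375                  # at 100% utilization
MEGACYCLES_PER_USER = MEGACYCLES_PER_CORE / USERS_PER_CORE  # 6.4

# At 2:1 vCPU over-commitment, two vCPUs share one core, so each VM
# effectively receives half of the core's cycles.
overcommit = 2
megacycles_per_vm = MEGACYCLES_PER_CORE / overcommit          # 1,200
users_per_vm = int(megacycles_per_vm / MEGACYCLES_PER_USER)   # 187

print(f"{megacycles_per_vm:.0f} megacycles per VM -> {users_per_vm} users per VM")
```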

Best Practices for Virtual Memory

  • No memory over-commitment. None. Zero.
  • Do not disable the balloon driver
  • If you can’t guarantee memory, use reservations
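
To illustrate the reservation advice, here is a minimal pyVmomi sketch that locks a VM’s full memory reservation. The vCenter address, credentials, and VM name are placeholders, and this is one way to do it, not the presenter’s tooling.

```python
# Sketch: reserve all of an Exchange VM's configured memory (pyVmomi).
# Requires "pip install pyvmomi"; hostnames and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; validate certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)

# Look up the VM by name (simplified; production code should scope the view).
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "exchange-mbx-01")

# Lock the reservation to the full configured memory so the balloon
# driver never has to reclaim pages from the VM.
spec = vim.vm.ConfigSpec()
spec.memoryReservationLockedToMax = True
vm.ReconfigVM_Task(spec)

Disconnect(si)
```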

Storage Best Practices

  • Use multiple vSCSI adapters
  • Use eager-zeroed thick virtual disks (a provisioning sketch follows this list)
  • Use 64KB allocation unit size when formatting NTFS
  • Follow storage vendor recommendations for path policy
  • Set power policy to high performance
  • Don’t confuse DAG and MSCS when it comes to storage requirements
  • Microsoft does NOT support VMDKs on NFS storage for any Exchange data, including the OS and binaries. See Microsoft’s full virtualization support statement.
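
As a concrete example of the eager-zeroed thick recommendation, the pyVmomi sketch below adds such a disk to an existing VM. The helper name is mine, and it assumes you already hold the vm and controller objects (looked up as in the earlier connection sketch).

```python
# Sketch: add an eager-zeroed thick VMDK to an existing VM (pyVmomi).
# "vm" and "controller" are assumed to be already-retrieved managed objects.
from pyVmomi import vim

def add_eager_zeroed_disk(vm, controller, size_gb, unit_number):
    disk = vim.vm.device.VirtualDisk()
    disk.controllerKey = controller.key
    disk.unitNumber = unit_number            # avoid 7 (reserved by the adapter)
    disk.capacityInKB = size_gb * 1024 * 1024

    backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
    backing.diskMode = "persistent"
    backing.thinProvisioned = False          # thick...
    backing.eagerlyScrub = True              # ...and zeroed up front
    disk.backing = backing

    dev_spec = vim.vm.device.VirtualDeviceSpec()
    dev_spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    dev_spec.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
    dev_spec.device = disk

    return vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=[dev_spec]))
```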

Why multiple vSCSI adapters?

  • Avoid inducing queue depth saturation within the guest OS
  • Default device queue depth is 32 for the LSI Logic adapter and 64 for PVSCSI
  • Add all four SCSI controllers to the VM
  • Spread disks across all four controllers – a configuration sketch follows the charts below

The two charts below show the test results when using one vSCSI adapter versus four. With just one adapter, performance was unacceptable and the database was stalling. Simply redistributing the VMDKs across multiple vSCSI adapters vastly improved performance and eliminated the stalls.

[Slide photos: test results with one vSCSI adapter vs. four vSCSI adapters]
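
A pyVmomi sketch of the four-controller layout follows. It assumes the VM already boots from a controller on bus 0 and simply adds PVSCSI controllers on buses 1 through 3; the function name and bus layout are assumptions, not the presenter’s script.

```python
# Sketch: add PVSCSI controllers so VMDKs can be spread across four buses.
# Assumes "vm" was looked up as in the earlier sketch and that bus 0
# already carries the boot disk's controller.
from pyVmomi import vim

def add_pvscsi_controllers(vm, bus_numbers=(1, 2, 3)):
    changes = []
    for bus in bus_numbers:
        ctrl = vim.vm.device.ParaVirtualSCSIController()
        ctrl.busNumber = bus
        ctrl.sharedBus = vim.vm.device.VirtualSCSIController.Sharing.noSharing
        spec = vim.vm.device.VirtualDeviceSpec()
        spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
        spec.device = ctrl
        changes.append(spec)
    return vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=changes))
```

Once the controllers exist, distribute the database and log VMDKs evenly across all four so no single adapter queue becomes the bottleneck.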

When to use RDMs?

  • Don’t do RDMs – no performance gain
  • Capacity is not a problem with vSphere 5.5 – 62TB VMDKs
  • Backup solution may require RDMs if hardware array snapshots needed for VSS
  • Consider – Large Exchange deployments may use a lot of LUNs and ESXi hosts are limited to 255 LUNs (per cluster effectively)
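
Some quick arithmetic shows how fast an RDM-heavy design hits that ceiling. The deployment shape below is hypothetical; plug in your own design.

```python
# Back-of-the-envelope LUN math against the 255-LUN-per-host limit.
# The deployment shape is hypothetical.
dag_members = 16
luns_per_member = 20             # e.g. RDMs for databases and logs
total_luns = dag_members * luns_per_member

ESXI_LUN_LIMIT = 255             # per host, so effectively per cluster
status = "OK" if total_luns <= ESXI_LUN_LIMIT else "over the limit"
print(f"{total_luns} LUNs needed vs. {ESXI_LUN_LIMIT} visible per host: {status}")
```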

What about NFS and In-Guest iSCSI?

  • NFS – Explicitly not supported for Exchange data by Microsoft
  • In-guest iSCSI – Supported for DAG storage

Networking Best Practices

  • Configure vMotion to use multiple NICs (multi-NIC vMotion)
  • Use VMXNET3 NIC
  • Allocate multiple NICs to participate in the DAG
  • Can use standard or distributed virtual switch

Avoid Database Failover during vSphere vMotion

  • Enable jumbo frames on all vMotion vmkernel ports to reduce the number of frames generated – helped A LOT (see the sketch after this list)
  • Modify the cluster heartbeat setting (SameSubnetDelay) to 2000 ms
  • Always dedicate vSphere vMotion interfaces
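
For the jumbo-frames item, here is a hedged pyVmomi sketch that raises a vMotion vmkernel port’s MTU. The vmk device name is an assumption, the vSwitch and physical network must also carry 9,000-byte frames end to end, and the SameSubnetDelay change is made inside the guest cluster rather than through vSphere.

```python
# Sketch: set a vMotion vmkernel port's MTU to 9000 (pyVmomi).
# "host" is a vim.HostSystem looked up like the VM earlier; the vmk
# device name below is an assumption.
from pyVmomi import vim

def enable_jumbo_frames(host, vmk_name="vmk1"):
    net_sys = host.configManager.networkSystem
    for vnic in host.config.network.vnic:
        if vnic.device == vmk_name:
            spec = vnic.spec
            spec.mtu = 9000          # vSwitch and physical switches must match
            net_sys.UpdateVirtualNic(vmk_name, spec)
            return
    raise ValueError(f"{vmk_name} not found on {host.name}")
```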

High Availability with vSphere HA

  • App HA in vSphere 5.5 can monitor/restart Exchange services
  • vSphere HA restarts failed DAG members, allowing the DAG to maintain protection after a failure
  • Supports vSphere vMotion and DRS

DAG Recommendations

  • One DAG member per host; if there are multiple DAGs, members of different DAGs can be co-located on the same host
  • Create an anti-affinity rule for each DAG (see the sketch after this list)
  • Enable DRS fully automated mode
  • HA will evaluate DRS rules in vSphere 5.5
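
Here is a minimal pyVmomi sketch of the per-DAG anti-affinity rule. The cluster and VM lookups follow the earlier pattern, and the rule name is a placeholder.

```python
# Sketch: keep one DAG's members on separate hosts via a DRS
# anti-affinity rule (pyVmomi). "cluster" is a vim.ClusterComputeResource
# and "dag_vms" a list of vim.VirtualMachine objects, looked up as before.
from pyVmomi import vim

def add_dag_anti_affinity(cluster, dag_vms, rule_name="dag1-anti-affinity"):
    rule = vim.cluster.AntiAffinityRuleSpec()
    rule.name = rule_name
    rule.enabled = True
    rule.vm = dag_vms                 # one rule covering all DAG members

    rule_spec = vim.cluster.RuleSpec()
    rule_spec.operation = vim.option.ArrayUpdateSpec.Operation.add
    rule_spec.info = rule

    spec = vim.cluster.ConfigSpecEx(rulesSpec=[rule_spec])
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)
```

With DRS in fully automated mode, vSphere 5.5 HA also evaluates these rules on restart, which is exactly the behavior the last bullet calls out.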

vCenter Site Recovery Manager + DAG

  • Fully supported
  • The presenter showed a scripted workflow that fails over the DAG

And finally, the key takeaways from the session…

[Slide photo: key takeaways from the session]
