VSP3116: Resource Management Deep Dive

I finally managed to get into a session by one of the VMware rockstars, Frank Denneman, who has co-authored several books that I highly recommend. Frank stated this topic could be a four-day class on its own, and this was just an hour, so it went quickly and only scratched the surface. Nonetheless, it was informative.


  • Resource entitlement
    • Dynamic: CPU and memory
    • Static: Shares, reservations, limits
  • Short term contention
    • Load correlation – Where two servers ramp up/down together (e.g. web and SQL)
    • Load synchronicity – All servers hammered at once (user logon storm at 8am)
    • Brown outs – System wide virus scanning at the same time
  • Long term contention
    • Ultra high consolidation ratios
    • Hardware limits exceeded
    • Massive overcommitment
  • VM-Level shares
    • Low (1), Normal (2), High (4)
  • VM CPU Reservation
    • Guarantees resources
    • Influences admission control
    • Reserved CPU is not consumed when the VM doesn’t need processing time, so other VMs can use it (fully refundable)
    • CPU reservation does not equate to priority
  • VM Memory Reservation
    • Guarantees a level of resources
    • Influences admission control
    • Non-refundable. Once allocated it remains allocated.
    • Will reduce consolidation ratios
  • VM Limits
    • Applies even when there are enough resources
    • Often more harmful than helpful (don’t use them often unless you like a hole in your foot)
    • Can very likely lead to negative impacts since the guest OS is not aware of the limits
    • Any extra memory the guest OS wants comes from swap (after TPS, memory compression), which is very slow.
    • De-schedules the CPU even if there are resources available and the VM wants them
  • DRS treats a cluster as one large host
  • Resource pools – Do not place VMs at the same level in the vCenter hierarchy as a resource pool. Always put VMs inside the appropriate resource pool.
  • Simple method to estimate resource pool shares
    • Step 1: Match defined SLA to pool (e.g. 70 to production, 20 to test, 10 to dev)
    • Step 2: Make up shares per VM (e.g. 70/Prod, 20/test, 10/dev).
    • Step 3: Based on the number of vCPUs per pool, multiply shares per VM * vCPUs
      • E.g. 10 vCPUs for Prod = 700 shares; 5 vCPUs for test = 100 shares; 20 vCPUs for dev = 200 shares.
    • Schedule a task to do these calculations and set the shares per pool on a nightly basis. As you add VMs and change vCPUs new calculations are needed. Check out Frank’s blog for an example script.
  • When you configure pool limits remember that each VM has overhead, which is between 5 and 10% of the total memory. VM overhead is lower in ESXi 5.0 than in previous versions.
  • Use resource pool limits with care as they can do more harm than good.
  • DRS affinity rules
    • Must run on – Cannot be violated under any circumstances. You cannot even power on the VM if it’s on the wrong host. Always honored, even through HA events like host failures.
    • Should run on – Can be violated as needed, such as during HA events.
    • NOTE: You must disable Must Run On or Should Run On rules BEFORE you disable DRS, as those settings are honored even when DRS is disabled and you can’t change the rules when DRS is disabled.
  • Distributed Power Management (DPM)
    • Frank did a poll of the room and hardly anyone is using this feature.
    • vCenter looks at the last 40 minutes and the host must be completely idle to be suspended.
    • If vCenter senses a ramp up in resource requirements in the last 5 minutes it will take the server out of standby.
    • DPM will NOT degrade system performance to save power
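
The three-step share estimate above is simple arithmetic, so here is a minimal Python sketch of it. The pool names and the `pool_shares` helper are my own placeholders, not anything from Frank’s script; the numbers are the ones from the example in the notes.

```python
def pool_shares(shares_per_vm, vcpus_per_pool):
    """Step 3: total shares per pool = shares per VM * number of vCPUs in the pool."""
    return {
        pool: shares_per_vm[pool] * vcpus
        for pool, vcpus in vcpus_per_pool.items()
    }


# Steps 1 and 2: SLA weighting expressed as shares per VM (values from the notes).
shares_per_vm = {"prod": 70, "test": 20, "dev": 10}

# Current vCPU count per pool (hypothetical inventory; in practice pulled from vCenter).
vcpus_per_pool = {"prod": 10, "test": 5, "dev": 20}

print(pool_shares(shares_per_vm, vcpus_per_pool))
# {'prod': 700, 'test': 100, 'dev': 200}
```

A nightly task would re-run this against the live vCPU counts and write the results back to the pools; that part depends on your tooling and is not shown.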

Resource pools, shares, limits and reservations can be quite complicated. I strongly recommend checking out Frank’s books for a lot more details.
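
To make the Low/Normal/High share ratios concrete, here is a rough sketch of how a contended resource is split in proportion to shares. It deliberately ignores reservations, limits and actual VM demand, so treat it as an illustration of the 1:2:4 ratio only; the VM names and the 7000 MHz figure are made up.

```python
# Relative share weights from the session notes: Low (1), Normal (2), High (4).
SHARE_RATIO = {"low": 1, "normal": 2, "high": 4}


def divide_by_shares(capacity, vm_levels):
    """Split a contended capacity between VMs in proportion to their share level."""
    total = sum(SHARE_RATIO[level] for level in vm_levels.values())
    return {
        vm: capacity * SHARE_RATIO[level] / total
        for vm, level in vm_levels.items()
    }


# Three hypothetical VMs all demanding more than the 7000 MHz available.
print(divide_by_shares(7000, {"vm-low": "low", "vm-normal": "normal", "vm-high": "high"}))
# {'vm-low': 1000.0, 'vm-normal': 2000.0, 'vm-high': 4000.0}
```

Shares only matter under contention. When there is enough capacity for everyone, each VM simply gets what it asks for.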

VSP3111: Nexus 1000v Architecture, Deployment, Management

This session focused on the Cisco distributed virtual switch, the Nexus 1000v. The speaker was very knowledgeable and a great presenter. Lots of great details, but as fast as he was going I didn’t get all of them. You can check out his blog at jasonnash.com.


  • The VSM is a virtual supervisor module, which acts as the brains of the switch just like a physical switch.
  • The VEM is a virtual ethernet module, which is in essence, a virtual line card that resides on each ESXi host.
  • VSM to VEM communications are critical and you have various deployment options
    • Layer 2 only: Uses two to three VLANs and is the default option, and the most commonly deployed architecture.
    • Layer 3: Utilizes UDP communications over port 4785, so it can be routed
  • When in layer 2 mode you need to configure the control, management and packet networks
    • Management: End point that you SSH into to manage the VSM and maintains contact to vCenter. Needs to be routable.
    • Control: VSM to VEM communications (This is where most problems occur.)
    • Packet: Used for CDP and ERSPAN traffic
  • Nexus 1000v deployment best practices
    • Locate each VSM on different datastores
    • You CAN run vCenter on a host that utilizes the N1K DVS
    • ALWAYS, ALWAYS run the very latest code. Latest as of Sept 1, 2011 is 1.4a, which does work with vSphere 5.0.
    • Don’t clone or snapshot the VSM, but DO use regular Cisco config backup commands
    • Always, always deploy VSMs in pairs (no extra licensing cost, so you are dumb not to do it).
  • Port profile types
    • Ethernet profile: Used for physical NICs and are used as uplinks out of the server. These use uplink profiles.
    • vEthernet profile: Exposed as port groups in vCenter and is the most common type of administrative change made in the VSM.
  • Uplink teaming
    • N1Kv supports LACP, but the physical switch must support it as well.
    • vPC-HM – Requires hardware support from the switch and more complex to troubleshoot
    • vPC-HM w/ MAC pinning – Most common configuration and easy to setup/troubleshoot.
  • On Cisco switches enable BPDU filter and BPDU guard on physical switch ports that connect to N1K uplinks.
  • Configure VSM management, control, packet, Fault Tolerance, vMotion as “system” VLANs in the N1K so they are available at ESXi host boot time and don’t wait on the VSM to come up.
  • For excellent troubleshooting information check out Cisco DOC 26204.
  • You can also check out the N1KV v1.4a troubleshooting guide here.
  • The network team may prefer to use the Nexus 1010, which is a hardware appliance that runs the VSMs. This removes the VSM from the ESXi hosts, and could be better for availability, plus the network guys can use a serial cable into the 1010. You would deploy 1010s in pairs, and they have bundles that really bring down the price.
  • You can deploy multiple VSMs on the same VLANs, but just be sure to assign each VSM pair a different “DOMAIN” ID.

Not mentioned in this session are additional Cisco products that layer on top of the 1000v, such as the forthcoming Virtual ASA (firewall), a virtual NAM, and the virtual secure gateway. The ASA is used for edge protection while the VSG would be used for internal VM protection.

Did you know about Cisco UCS Express?

Today while I was walking around the vendor expo at VMworld 2011, I saw a very interesting product from Cisco. I was familiar with the datacenter UCS product, but a mini version caught my attention. Called UCS Express, this is a micro ESXi server that slides directly into their ISR G2 chassis, which is a branch office router.

This is a micro server that currently supports dual cores and up to 8GB of RAM with two 1TB HDs. With RAID 1 you get about 500GB of usable space. It will run ESXi, so you could put very lightweight services like AD/DNS, DHCP or print server out at your branch office without deploying a full rack mount server. List price is around $4K for just the mini server, which isn’t bad. On the road map is a double wide server which will support more cores and up to 48GB of RAM. That should be coming in 2012.

They will also be working on a centralized management console, so if you have a lot of these micro servers on your network you have a single pane of glass to manage them through.

If your business has remote offices with limited space, and you only need very minimal Windows services, then this could be a great option for you. I don’t think this product gets much press, as I had never seen it before.

BCO1946: Making vCenter highly available

vCenter is a business critical service that when it goes down can cause substantial chaos, although VMs will happily keep running while it is down. Using VDI? Forget spinning up new VMs, or rebooting VMs. Using vCD? Forget doing anything while it’s down. HA? Yes that will keep working (for one failure in 4.x, and indefinitely in 5.0). So this session focused on various means to make vCenter highly available since it has no built-in means to be HA, so a little help is needed.

  • Linked mode does NOTHING for HA. A little data is replicated between the various instances, but if one of the vCenters goes offline you can’t manage the hosts it was servicing.
  • You really need to establish RTO and RPOs for vCenter so you know what to design for.
  • Other infrastructure like AD, DNS and SQL are critical and must be available. Also remember that network connectivity must be maintained.
  • The main options the speaker offered up as HA solutions are:
    • Traditional backup and restore: Does your backup solution need vCenter to do restores? This is a manual recovery process and doesn’t help with planned downtime like OS patching. You need a DR plan in place. See VMware KB 1023985 for some tips.
    • Cold Standby: Easy if SQL DB is local, RTO can be shorter, but a manual recovery process. Harder to do if physical.
    • Windows Clustering: Not supported by VMware as vCenter is not cluster aware.
    • VMware HA and APIs: Neverfail and Symantec offer clustering/HA products. These are incomplete as their process monitoring is very basic and does not cover all scenarios. Better than nothing, but far from complete. Fairly automated, complements HA, and fairly easy.
    • VMware vCenter Heartbeat:
      • Not the cheapest solution, but it is the most comprehensive
      • Active/passive configuration, shared-nothing model
      • Protects against OS, HW, application and network failures
      • Can be triggered on performance degradation of vCenter
      • Protects against planned and unplanned downtime
      • Protects vCenter plug-ins, SQL databases (even if on separate server) and VUM
      • Works across the LAN or WAN
      • Limited to a 1:1 topology

The bottom line is if you want an automated and comprehensive vCenter protection mechanism you are really left with one option, vCenter Heartbeat. I did a quick evaluation of it a couple of years ago before it supported 64-bit operating systems and the GUI/installation had a lot of room for improvement. I haven’t tried newer releases, so I hope it feels more like an integrated product than the Neverfail engine bolted on to some VMware customizations.

VSP1999: Advanced esxtop usage

This session was quite advanced and had a lot of troubleshooting examples which are hard to adequately capture without the slides, so I’ll just touch on some of the counters he used during some troubleshooting examples. Maybe in future posts I’ll focus on one subject like storage stats and recreate a couple of the presented examples since I thought he did a very good job of showing you what to look for when problems rear their ugly heads.

  • There are a variety of management tools for ESXi
    • vCenter Alarms
    • vCenter Operations
    • vCenter Charts
    • esxtop – Live stats
    • esxplot (free utility)
  • Interpreting esxtop stats manual here
  • New counters in ESXi 5.0
    • CPU screen now shows the number of VMs and total vCPUs
    • %VMWAIT stat
    • Power: CPU P-states in MHz (BIOS must be in OS controlled power mode)
    • Failed disk I/Os
    • VAAI block delete
    • Low latency swap – LLSWR, LLSWW (broke in 5.0, look at stats in vCenter GUI)
    • NHN – Wide NUMA indication
  • CPU counters are misunderstood
    • RDY – VM wants to run but can’t due to scheduling issues (bad)
    • CSTP – Co-stopped state. Co-scheduling overhead due to multiple vCPUs
    • Run – VM is using processor time
    • (Note: At this point the speaker went into great depth about CPU utilization and even more states, so the stats above just scratch the surface.)
  • Storage stats
    • DAVG – Most important disk stat to monitor
    • QAVG – Should be at or near zero all the time
    • DQLEN – Driver queue length

So there you go: some very low-level counters that you can look at to start troubleshooting performance problems. If you use the vMA you can use resxtop to monitor real-time stats from an ESXi host, so you don’t need to SSH in and do it locally. Better for security, and easier to grab stats from multiple ESXi hosts at once. The esxplot utility is great for analyzing a lot of captured data and easily graphing/searching it for what you want.
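
If you capture esxtop in batch mode to a CSV file, a few lines of Python can do a first pass before you load it into esxplot. This is only a sketch: the column names in the sample (the `\% Ready` suffix in particular) are my assumption of what the batch export looks like, and the threshold of 10 is an arbitrary placeholder, not a number from the session.

```python
import csv
import io


def columns_over_threshold(csv_text, counter_suffix, threshold):
    """Return {column_name: peak_value} for counters whose peak exceeds the threshold."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Pick the columns whose name ends with the counter we care about.
    wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    peaks = {}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or malformed samples
            name = header[i]
            peaks[name] = max(peaks.get(name, value), value)
    return {name: peak for name, peak in peaks.items() if peak > threshold}


# Tiny hand-made sample; host and VM names are hypothetical.
sample = "\n".join([
    r'"Time","\\esx01\Group Cpu(1:web01)\% Ready","\\esx01\Group Cpu(2:sql01)\% Ready"',
    '"09/01/2011 08:00:00","2.5","14.0"',
    '"09/01/2011 08:00:05","3.0","11.5"',
])

for name, peak in columns_over_threshold(sample, r"\% Ready", 10).items():
    print(name, peak)
```

The same function works for the storage counters if you pass whatever suffix your capture uses for DAVG or QAVG.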

VSP3864: Best practices for virtualized networking

This session was a bit more high level and basic than I had hoped for, but here are the highlights:

  • Virtual Port ID load balancing is the default option and the least complicated option.
  • IP hashing is more advanced and requires Etherchannel to be configured on the switch
  • VST (virtual switch tagging) is the most common vSwitch configuration
  • Private VLANs provide for L2 isolation. Really good for DMZs.
  • If you use IP hashing on the Cisco switch side you must configure Etherchannel for IP-SRC-DST, which is a global policy on the switch. The default mode on older IOS versions was MAC hashing which is not compatible.
  • If you use beacon probing (not recommended) it really needs three or more NICs to work properly.
  • Enable PortFast and use BPDU Guard to enforce STP boundaries
  • The VMware dVS has smarter load balancing
  • General tips:
    • How to change the VM MAC: KB 1008473
    • Using MS NLB Multicast? KB 1006525
    • Enabling CDP KB 1007069
    • Beacon probing and IP hashing do not mix KB 1017612 and 1012819
    • Check drivers and firmware against the HCL (very important)
    • Use VLAN 4095 on the switches for promiscuous mode
    • In ESXi you can use tcpdump-uw for packet captures. KB 1031186

Nothing earth shattering, but a few good tidbits of information.
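
For anyone wondering why IP hashing needs many source/destination pairs before it spreads load, here is a simplified Python model of the idea: XOR the source and destination addresses and take the result modulo the number of uplinks. This is my own illustration of the general technique, not the exact ESXi implementation, and the addresses are made up.

```python
import ipaddress


def ip_hash_uplink(src_ip, dst_ip, uplink_count):
    """Pick an uplink index from the XOR of source and destination IPv4 addresses."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count


# One VM talking to two different destinations over a two-NIC team.
print(ip_hash_uplink("10.0.0.10", "10.0.0.20", 2))  # 0
print(ip_hash_uplink("10.0.0.10", "10.0.0.21", 2))  # 1
```

A single conversation always lands on the same uplink, so one VM talking to one destination never uses more than one NIC’s worth of bandwidth.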

CIM1264: Private VMware vCloud Architecture Deep Dive

This was a pretty advanced session about the vCloud Director, which is a complex product. The speakers were very, very good, but given the advanced nature of the discussion it will be hard to recap the session in full and some of the concepts need more explanation than I can provide here. But that being said, here are some of the highlights:

  • Value of chargeback for an organization: accurate TCO/ROI analysis, accountability
  • vCloud architecture
    • Horizontal scaling
    • Multi-tenancy
    • Limit single points of failure in the architecture
    • Leverages load balancers
    • You must make the database highly available or the whole vCD management subsystem goes offline, although existing VMs will continue to run.
    • A vCD architecture is pretty complex and hard to wrap your head around
  • Typically you setup a dedicated management cluster that runs the vCD infrastructure like vCD, AD/DNS, vCenter, SQL, etc.
  • Resource groups are compute resources
  • A virtual datacenter is typically divided into a provider vDC which has a single type of compute and storage resource (single tier of storage).
  • An organization vDC is an allocation from the PvDC
  • vCD has various allocation models which cannot be changed once you instantiate it
    • Pay as you go – Dynamic, unpredictable
    • Allocation Pool – % of resources, can burst, but pretty predictable. Most common type.
    • Reservation pool – Hard caps, not dynamic, cannot burst
  • Networking has three layers
    • External – Internet access, IP storage, backup servers, etc.
    • Organization – Allows vApps to communicate with each other
    • vApp – Private network for communications within the vApp
  • You can define network pools of various types
    • Portgroup backed – Manually create with vCenter
    • VLAN backed – Uses the vDS and you give it a range of VLANs to use. v1.5 supports the N1K
    • vCloud Network isolation (VCD-NI) – Creates networks on the fly and uses MAC-in-MAC encapsulation. Need to increase your MTU to 1524. VMware’s secret sauce for multi-tenant isolation.
  • New features in vCD 1.5 include:
    • Microsoft SQL server (no more Oracle requirements!!)
    • vSphere 5.0 support
    • Custom guest properties
    • Much faster VM provisioning

The takeaway from this session is that vCD is very powerful, but also very complex. Today most use cases are test/dev and NOT production. The speakers said possibly next year they will see more production usage. vCD is the replacement for Lab Manager, which was discontinued last year.

VSP2884: vSphere 5.0 Performance enhancements

This session covered some of the major performance enhancements in vSphere 5.0. The presenter flew through the slides at 100 MPH and didn’t spend much time on the bullets so I wasn’t able to capture all of the highlights. But here’s what I did capture:

  • 32-way vCPUs with 92-97% of native performance
  • CPU scheduler improvements for up to 30% performance increase
  • vNUMA for NUMA aware applications (mostly for HPC). By default it is turned off for VMs with 8 or fewer vCPUs and turned on for VMs with more than 8 vCPUs.
  • vCenter can now process double the number of concurrent operations
  • 9x faster HA reconfiguration time
  • 60% more VMs failover in the same time period with the new HA engine
  • NetIOC – True QoS tagging at the MAC layer and user defined network pools
  • Splitrxmode can reduce packet loss dramatically under specific circumstances (30K packets per second, more than 24 VMs on a host)
  • TCP/IP optimizations that boost iSCSI performance
  • NetFlow v5 support in the dVS
  • Multi-NIC vMotion enablement
  • Storage migration with write mirroring
  • Host cache – SSDs for swap. Memory hierarchy is: Transparent page sharing, ballooning, compression, then host cache.
    • 30% performance improvement over spinning disk swap
  • Storage is the root cause for most virtualization performance problems. !!!
  • (Note, presenter covered many new storage enhancements that I wrote about in previous blogs so I stopped taking notes.)
  • Software FCoE initiator has nearly the same performance as traditional FC HBAs
  • An example vMotion improvement for a 28GB Exchange 2010 VM was from 71 seconds on 4.1 to 47 seconds on 5.0 using 10GbE.
  • VDI workload density has also been increased more than 25%

There was a whole bunch of other tidbits that I just couldn’t write down fast enough, but the list above is a good start. vSphere 5.0 has over 200 new features, so clearly this list is far from complete.

BCO1269: SRM 5.0 What’s new and Recommendations

Unlike the last session I attended, this one was very good and had quite a bit of great technical material that went beyond the common sense most IT guys have. SRM 5.0 is a major release that addresses many of the shortcomings of the 4.x releases. Highlights include:

  • Customers need simple, reliable DR
  • vSphere Replication is storage array independent and happens at the ESXi host level
  • RPO is 15 minutes to 24 hours. You cannot do less than 15 minutes, so this is not suited for applications that must use synchronous replication and have zero data loss like the finance sector.
  • You can take snapshots of the source VM, but they are collapsed at the recovery site
  • Recommend that you don’t use snapshots
  • The replication engine dynamically changes block sizes and speed to ensure the RPO is met. If your RPO is 1 hour it doesn’t wait for 59 minutes then try to burst the data in 60 seconds.
  • Replication is a property of the VM
  • Array-based replication can offer much shorter RPOs and compression, and is WAN optimizer friendly. vSphere Replication has none of these features.
  • v1.0 limitations include:
    • ISOs and floppies are not replicated
    • Powered off or suspended VMs are not replicated
    • Non-critical files like swap, stats and dumps are not replicated
    • pRDMs, FT, linked clones and VM templates are not replicated
    • Requires VM HW version 7 or 8
    • A vCenter server is required at both sites
  • Scalability enhancements for SRM include
    • Protection of up to 1000 VMs
    • 10 parallel recovery plans
    • 150 protection groups
    • 500 VMs per protection group
    • These limits are not enforced, just the most VMware has tested and approved
  • Planned migration is a new workflow
    • Used when a controlled migration can be used instead of the smoking hole scenario
    • Will stop if any errors are encountered
    • Shuts down the VMs gracefully
    • Very orderly failover process
    • Application consistent recovery
  • Failback is a new workflow
    • Failback in 4.x was in the nicest sense extremely scary and very manual with high risk
    • Replays existing recovery plan in reverse
    • “Reprotect” feature in the GUI
    • Only supported by SRA/LUN-level replication
    • Failback only supports original VMs, not any new VMs stood up after the original failover
  • Brand new GUI
    • Both sites visible in a single pane of glass
    • Still strongly recommend that customers use Linked Mode
    • Able to set IPs for both sites in the GUI
    • IPv6 support
    • No more sysprep or customization specs needed
    • Huge Re-IP performance increase (huge!)
    • Supports in-guest callouts and scripts for custom app control
    • Extended the APIs for better integration
  • Dependencies
    • Increased to 5 priority groups
    • All actions are parallel within a group unless otherwise specified
    • Able to now craft more elaborate and controlled dependencies but in a far easier manner
    • Tip: Don’t get TOO creative or it will extend your recovery time and may miss your RTO
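
To see why over-engineered dependencies hurt your RTO, here is a back-of-the-envelope Python sketch. It assumes priority groups run one after another and that VMs inside a group start in parallel with no resource contention; the per-VM start times are invented numbers.

```python
def estimated_recovery_minutes(priority_groups):
    """Sum the slowest VM of each group: groups run in order, VMs within a group in parallel."""
    return sum(max(group) for group in priority_groups if group)


# Hypothetical per-VM start times in minutes, grouped by priority.
grouped = [[5, 8], [4], [6, 6, 3]]
print(estimated_recovery_minutes(grouped))  # 18

# The same six VMs chained one after another (every VM in its own step).
serialized = [[t] for group in grouped for t in group]
print(estimated_recovery_minutes(serialized))  # 32
```

Every extra dependency you add pushes the plan away from the first number and toward the second.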

For customers that have previously used or looked at SRM and didn’t bite, it’s time to look at it again. The failback support with array-based replication is huge! You may still find you need a different product, like InMage, but the SRM of yesterday is not the SRM of tomorrow. SRM 5.0 is due out “soon” according to VMware. I did a hands-on lab of SRM, and I will say it was quite slick and I came away very impressed with the changes.
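
If you are sizing an SRM 5.0 design, the tested maximums and the vSphere Replication RPO range from the notes above are easy to sanity check in a script. The sketch below is mine; the dictionary keys and the `check_design` helper are hypothetical names, and since the limits are not enforced by the product it only returns warnings.

```python
# Tested (not enforced) maximums from the session notes.
TESTED_LIMITS = {
    "protected_vms": 1000,
    "protection_groups": 150,
    "vms_per_protection_group": 500,
    "parallel_recovery_plans": 10,
}

# vSphere Replication RPO range from the notes: 15 minutes to 24 hours.
RPO_MIN_MINUTES = 15
RPO_MAX_MINUTES = 24 * 60


def check_design(design):
    """Return a list of warnings for values outside the tested limits or RPO range."""
    warnings = []
    for key, limit in TESTED_LIMITS.items():
        value = design.get(key, 0)
        if value > limit:
            warnings.append(f"{key}={value} exceeds tested limit {limit}")
    rpo = design.get("vr_rpo_minutes")
    if rpo is not None and not RPO_MIN_MINUTES <= rpo <= RPO_MAX_MINUTES:
        warnings.append(f"vr_rpo_minutes={rpo} is outside {RPO_MIN_MINUTES}-{RPO_MAX_MINUTES}")
    return warnings


# Hypothetical design: too many VMs and an RPO that vSphere Replication cannot meet.
print(check_design({"protected_vms": 1200, "protection_groups": 40, "vr_rpo_minutes": 5}))
```

An empty list means the design sits inside what VMware says it has tested.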

SPO3990: Best Practices for Storage Tiering and Replication

This session was a bit higher level and more common sense tips, so I split after 30 minutes and went to another session. The speakers had 8 best practice tips, and both of them were from Dell/Compellent. So they had a few minutes sales pitch before they started the meat of the session. Highlights included:

  • Select an array built for virtualization
    • Dynamic storage with wide striping
    • Ability to change RAID levels on the fly, extend LUNs
    • Rely on metadata to intelligently manage storage
  • Let tiered storage do the heavy lifting
    • Sub-LUN data tiering
    • vSphere 5.0 storage DRS is not sub-LUN tiering aware so it may not behave as expected when measuring latency
  • Use thin provisioning on the array and with VMDKs
    • I would actually disagree here and say use array-based thin provisioning and EZT VMDKs if your array supports VAAI, zero detect and thin provisioning.
  • Leverage storage snapshots
    • Protect your data
    • Deploy new VMs (who would really do this?) by cloning a LUN then running sysprep on the cloned VMs. Really?!?

After those four I lost interest and ran over to another session: SRM 5.0 What’s New.