VSP3116: Resource Management Deep Dive

I finally managed to get into a session by one of the VMware rockstars, Frank Denneman, who has co-authored several books that I highly recommend. Frank said this topic could fill a four-day class on its own, and this session was just an hour, so it moved quickly and only scratched the surface. Nonetheless, it was informative.

Highlights:

Resource entitlement

Dynamic: CPU and memory
Static: Shares, reservations, limits

Short term contention

Load correlation – Where two servers ramp up/down together (e.g. web and SQL)
Load synchronicity – All servers hammered at once (user logon storm at 8am)
Brownouts – System-wide virus scanning at the same time

Long term contention

Ultra high consolidation ratios
Hardware limits exceeded
Massive overcommitment

VM-Level shares

Low (1), Normal (2), High (4)
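
To make the 1:2:4 ratio concrete, here is a minimal sketch of how contended capacity would divide between three VMs. It assumes every VM wants more than it gets and that no reservations or limits apply; the function name, VM names, and the 7000 MHz figure are made up for illustration and are not from the session.

```python
# Relative share weights from the notes above: Low (1), Normal (2), High (4).
SHARE_WEIGHTS = {"low": 1, "normal": 2, "high": 4}


def contended_split(vm_levels, capacity_mhz):
    """Divide capacity among VMs in proportion to their share weights.

    Simplifying assumptions: all VMs are fully contending, and there are
    no reservations or limits in play.
    """
    total_weight = sum(SHARE_WEIGHTS[level] for level in vm_levels.values())
    return {
        name: capacity_mhz * SHARE_WEIGHTS[level] / total_weight
        for name, level in vm_levels.items()
    }


# Hypothetical VMs and capacity, purely for illustration.
split = contended_split({"vm-a": "low", "vm-b": "normal", "vm-c": "high"}, 7000)
print(split)  # {'vm-a': 1000.0, 'vm-b': 2000.0, 'vm-c': 4000.0}
```

Shares only matter under contention. When there is spare capacity, each VM simply gets what it asks for.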

VM CPU Reservation

Guarantees resources
Influences admission control
Reserved CPU is not consumed when the VM doesn’t need processing time (fully refundable)
CPU reservation does not equate to priority

VM Memory Reservation

Guarantees a level of resources
Influences admission control
Non-refundable. Once allocated it remains allocated.
Will reduce consolidation ratios

VM Limits

Applies even when there are enough resources
Often more harmful than helpful (don’t use them often unless you like a hole in your foot)
Can very likely lead to negative impacts since the guest OS is not aware of the limits
Any extra memory the guest OS wants comes from swap (after TPS, memory compression), which is very slow.
De-schedules the CPU even if there are resources available and the VM wants them

DRS treats a cluster as one large host
Resource pools – Do not place VMs at the same level in the vCenter hierarchy as a resource pool. Always put VMs inside the appropriate resource pool.
Simple method to estimate resource pool shares

Step 1: Match defined SLA to pool (e.g. 70 to production, 20 to test, 10 to dev)
Step 2: Make up shares per VM (e.g. 70/Prod, 20/test, 10/dev).
Step 3: For each pool, multiply the shares per VM by the number of vCPUs in that pool

E.g. 10 vCPUs for Prod = 700 shares; 5 vCPUs for test = 100 shares; 20 vCPUs for dev = 200 shares.
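
The three steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Frank's script. The function and variable names are my own, and it treats the per-VM value as a per-vCPU multiplier, which is how the example numbers work out.

```python
def pool_shares(shares_per_vcpu, vcpus_per_pool):
    """Step 3: multiply each pool's share value by its vCPU count."""
    return {
        pool: shares_per_vcpu[pool] * vcpus_per_pool[pool]
        for pool in shares_per_vcpu
    }


# Steps 1 and 2: SLA weighting per pool, taken from the example above.
sla_shares = {"prod": 70, "test": 20, "dev": 10}
# Current vCPU count in each pool, also from the example above.
vcpu_counts = {"prod": 10, "test": 5, "dev": 20}

shares = pool_shares(sla_shares, vcpu_counts)
print(shares)  # {'prod': 700, 'test': 100, 'dev': 200}
```

Note that the resulting pool-level split is 700:100:200, so dev ends up with twice the pool shares of test because it has four times as many vCPUs. That is why the numbers need recalculating whenever VMs or vCPUs change.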

Schedule a task to do these calculations and set the shares per pool on a nightly basis. As you add VMs and change vCPUs, new calculations are needed. Check out Frank’s blog for an example script.

When you configure pool limits, remember that each VM has memory overhead, which is between 5% and 10% of the total memory. VM overhead is lower in ESXi 5.0 than in previous versions.
Use resource pool limits with care as they can do more harm than good.
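
As a rough sketch of what the overhead means for a pool memory limit, the snippet below pads the sum of configured VM memory by the 5% and 10% figures from the session. Treating overhead as a flat percentage is a simplification on my part, and the function name and VM sizes are hypothetical.

```python
import math


def pool_memory_limit_mb(vm_memory_mb, overhead_percent):
    """Sum configured VM memory and pad it by a flat overhead percentage.

    Assumption: overhead is modelled as a single flat percentage of
    configured memory, rounded up to the next whole MB.
    """
    configured = sum(vm_memory_mb)
    return math.ceil(configured * (100 + overhead_percent) / 100)


# Hypothetical pool with three VMs (configured memory in MB).
vms = [4096, 8192, 4096]
low_estimate = pool_memory_limit_mb(vms, 5)
high_estimate = pool_memory_limit_mb(vms, 10)
print(low_estimate, high_estimate)  # 17204 18023
```

A limit set to the bare 16384 MB total would leave no room for overhead, so the pool would hit its limit before the VMs reach their configured memory.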
DRS affinity rules

Must run on – Cannot be violated under any circumstances. You cannot even power on the VM if it’s on the wrong host. Always honored, even through HA events like host failures.
Should run on – Can be violated as needed, such as during HA events.
NOTE: You must disable Must Run On or Should Run On rules BEFORE you disable DRS, as those settings are honored even when DRS is disabled and you can’t change the rules when DRS is disabled.

Distributed Power Management (DPM)

Frank did a poll of the room and hardly anyone is using this feature.
vCenter looks at the last 40 minutes and the host must be completely idle to be suspended.
If vCenter senses a ramp-up in resource requirements in the last 5 minutes, it will take the server out of standby.
DPM will NOT degrade system performance to save power

Resource pools, shares, limits and reservations can be quite complicated. I strongly recommend checking out Frank’s books for a lot more details.

1 Comment

Casper42

September 2, 2011 12:47 pm

Something else I caught was DPM is enabled by peer hosts in the cluster, not by vCenter. So if you have 2 hosts and at least 1 VM with HA enabled, the secondary host will never go to sleep because if the primary was to fail, there will be no mechanism to then wake up the secondary and get the VMs back online. Which IMHO makes no sense as I know DPM with HP Servers talks to the iLO, so why couldn’t the wake signal be sent from vCenter as opposed to a peer host (often the peer host and…
