VMworld 2015: DRS Advancements in vSphere 6.0

Session INF5306

DRS is the #1 scheduler in the datacenter today

92% of clusters have DRS enabled. 79% are in fully automated mode. 87% have affinity and anti-affinity rules.

43% of clusters use resource pools

99.8% of clusters use maintenance mode

Bottom line: DRS is popular

DRS collects innumerable stats every 20 seconds for its calculations

  • CPU reserved
  • Memory reserved
  • CPU active, run, and peak
  • Memory overhead and growth rate
  • Active, consumed, and idle memory
  • Shared memory pages, balloon, swapped, etc.
  • VM happiness is the most important metric (if demand/entitlement is always met, the VM is ‘happy’)
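The ‘happiness’ idea above can be sketched as a simple demand-vs-entitlement ratio — a minimal illustration only, not the actual DRS formula, which is internal to vSphere:

```python
def vm_happiness(demand, entitlement):
    """Toy happiness score: 1.0 when the VM's entitlement fully
    covers its demand, lower under contention. Illustrative only --
    not the real DRS metric."""
    if demand <= 0:
        return 1.0  # an idle VM is trivially happy
    return min(1.0, entitlement / demand)
```

For example, a VM demanding 2000 MHz of CPU but entitled to only 1000 MHz would score 0.5 — unhappy, and a candidate for rebalancing.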

Constraints for initial placement and load balancing

  • Constraints are a big part of decision making
  • HA admission control policies
  • Affinity and anti-affinity rules
  • # concurrent vMotions
  • Time to complete vMotion
  • Datastore connectivity
  • vCPU to pCPU ratio
  • Reservations, limits and share settings
  • Agent VMs
  • Special VMs (SMP-FT, vFlash, etc.)

Cost Benefit and minGoodness

  • Cost-benefit analysis – VM happiness is evaluated against the cost of a migration
  • Cost considerations: each vMotion consumes roughly 30% of a CPU core on a 1 GbE network and 100% of a core on 10 GbE, plus the memory consumed by the ‘shadow VM’ on the destination host
  • Benefit considerations: a positive performance benefit for the VMs on the source host, and a meaningfully better overall workload distribution
  • Each analysis results in a rating from -2 to +2
  • MinGoodness (set via the migration threshold slider) also ranges from -2 to +2 and is user-configurable; moves rated below it are not recommended
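The rating-and-threshold mechanics can be sketched as follows — a toy model that assumes a move’s rating is simply its benefit minus its cost, clamped to the -2..+2 scale (function names are hypothetical):

```python
def rate_move(benefit, cost):
    """Net cost-benefit rating, clamped to the -2..+2 scale."""
    return max(-2, min(2, benefit - cost))

def recommend(moves, min_goodness):
    """moves: list of (vm_name, benefit, cost) tuples. Keep only
    migrations whose rating meets the minGoodness threshold."""
    return [vm for vm, benefit, cost in moves
            if rate_move(benefit, cost) >= min_goodness]
```

Sliding minGoodness toward +2 makes DRS more conservative: only migrations with a clearly positive net benefit survive the filter.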

Takeaway

  • VM happiness is the #1 influence
  • Influenced by real time stats, constraints and cost/benefit analysis
  • A small imbalance should not be a concern
  • Default setting of DRS aggressiveness is best

New Features in vSphere 6.0

  • Network-aware DRS – ability to specify bandwidth reservation for important VMs
  • Initial placement based on VM bandwidth reservation
  • Automatic remediation in response to reservation violations due to pNIC saturation, pNIC failure
  • Tight integration with vMotion – DRS issues a unified recommendation for cross-vCenter vMotion
  • Runs a combined DRS and SDRS algorithm to generate a (host, datastore) tuple
  • CPU, memory, and network reservations are considered as part of admission control
  • All the constraints are respected as part of the placement
  • VM-to-VM affinity and anti-affinity rules are carried over during cross-cluster and cross-vCenter migration
  • Initial placement enforces the affinity and anti-affinity constraints
  • Improved overhead computation – greatly improves the consolidation during power-on
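The bandwidth-reservation placement described above amounts to a filter-then-rank step. Here is a minimal sketch under an assumed data model (this is not the vSphere API):

```python
def place_vm(vm_bw_mbps, hosts):
    """Network-aware initial placement sketch: admit only hosts whose
    free pNIC bandwidth covers the VM's reservation, then pick the one
    with the most headroom.
    hosts: name -> (capacity_mbps, reserved_mbps). Hypothetical model."""
    headroom = {name: cap - reserved
                for name, (cap, reserved) in hosts.items()
                if cap - reserved >= vm_bw_mbps}
    if not headroom:
        return None  # admission control fails: no host can honor the reservation
    return max(headroom, key=headroom.get)
```

Returning `None` models the admission-control failure case: if no host can honor the reservation, the power-on is rejected rather than placed on a saturated pNIC.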

Cluster Scale and Performance Improvements

  • Increased cluster capacity to 64 hosts and 8K VMs
  • DRS and HA extensively tested at maximum scale for VCSA and Windows
  • Up to 66% performance increase in vCenter (power on, DRS calcs, etc.)
  • VM power-on latency has been reduced by 25%
  • vMotion operation is 60% faster
  • Faster host maintenance mode

Extensive Algorithm Usage

  • DRS is the lynchpin of the SDDC vision
  • vSphere HA
  • VUM
  • vCloud Director
  • vCloud Air
  • Fault Tolerance
  • ESX Agent Manager

Best Practices

  • Tip #1: Full storage connectivity
  • Tip #2: Power management settings – Set BIOS to OS control and vSphere to balanced.
  • Tip #3: Threshold setting – Default of 3 works great.
  • Tip #4: Automation level – Fully automated is best choice
  • Tip #5: Beware of resource pool priority inversion – make sure that cramming more VMs into a pool won’t dilute each VM’s shares
  • Tip #6: Avoid setting CPU-affinity

Future Directions

Proactive HA

  • Proactive evacuation of VMs based on hardware health metrics
  • Partnering with hardware vendors to integrate and certify
  • Moderately degraded mode and severely degraded modes
  • VI admin can configure the DRS action for each health state event
  • Host maintenance mode and host quarantine mode
  • VI admin can filter events
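A per-health-state configuration like the one described might look like this sketch (the state and action names are assumptions, not the shipping API):

```python
# Hypothetical mapping of hardware-health states to DRS actions,
# as a VI admin might configure them.
HEALTH_POLICY = {
    "moderately_degraded": "quarantine",   # no new placements on the host
    "severely_degraded": "maintenance",    # evacuate all VMs from the host
}

def drs_action(health_state):
    """Return the configured DRS action, or None for a healthy host."""
    return HEALTH_POLICY.get(health_state)
```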

Network DRS v2

  • Take pNIC saturation into account
  • Tighter integration with NSX
  • Ensure mice and elephant flows don’t share the same network path
  • Network layout topology – leverage topology for availability and performance optimizations

Proactive DRS

  • Tighter integration with the vROps analytics engine
  • Periodic and seasonality demands incorporated into decision making

What-if Analysis

  • A sandbox tab in UI to run ‘what if’ analysis
  • VM availability assessment by simulating host failures
  • Cluster over commitment during maintenance window

Auto-scale of VMs

  • Horizontal and vertical scaling to maintain end-to-end SLA guarantees
  • Spin-up and spin-down VMs based on workload
  • Will first be offered as a service in vCloud Air
  • Increase CPU and memory resources to meet performance goals
  • CPU/memory hot add is an additional option for DB tier
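The spin-up/spin-down decision can be sketched as proportional sizing against a target utilization — the 70% target and the VM-count bounds below are assumptions for illustration:

```python
import math

def autoscale(current_vms, avg_cpu_util, target=0.7, min_vms=1, max_vms=10):
    """Horizontal-scaling sketch: size the tier so average CPU
    utilization lands near `target`. Returns the new VM count,
    clamped to [min_vms, max_vms]."""
    desired = math.ceil(current_vms * avg_cpu_util / target)
    return max(min_vms, min(max_vms, desired))
```

For example, a 4-VM tier running at 90% average CPU would grow to 6 VMs, while the same tier at 20% would shrink to 2.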

Hybrid DRS

  • Make vCloud Air a seamless extension of enterprise datacenter capacity through policy-based scheduling
