Speaker: Pete Boone
This session covered the aspects of a virtualized infrastructure that need to be examined when optimizing performance. The four basic food groups are memory, CPU, network, and storage.
- Benchmarking and Tools
- Consistent and reproducible results
- Important to have a baseline of acceptable performance
- Determine baseline of performance prior to deployment
- Avoid subjective metrics, stay quantitative
- Benchmarking should be done at the application layer
- Use application-specific benchmarking tools and load generators
- Isolate variables, benchmark optimum situation before introducing load
- Understand dependencies (human interaction, compare apples-to-apples)
- Tools – vCenter Operations, ESXtop
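
To illustrate the "stay quantitative" and "baseline first" points above, here is a minimal sketch that compares repeated application-level measurements against a recorded baseline. The sample values, the 10% tolerance, and the function names are illustrative assumptions, not figures from the session.

```python
from statistics import mean, stdev

def summarize(samples):
    """Return (mean, sample standard deviation) for a list of measurements."""
    return mean(samples), stdev(samples)

def within_baseline(baseline, current, tolerance=0.10):
    """True if the current mean is at most `tolerance` (fractional) worse than baseline.

    Assumes lower is better, e.g. response time in milliseconds.
    """
    base_mean, _ = summarize(baseline)
    cur_mean, _ = summarize(current)
    return cur_mean <= base_mean * (1 + tolerance)

# Hypothetical response times (ms) from repeated runs of the same application test
baseline_ms = [100, 102, 98, 101, 99]
after_change_ms = [118, 121, 119, 120, 122]

print("baseline mean/stdev:", summarize(baseline_ms))
print("acceptable:", within_baseline(baseline_ms, after_change_ms))
```

Running the same test several times and reporting the spread alongside the mean is one way to check that results are consistent and reproducible before trusting a comparison.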
- Memory
- vRAM + overhead = maximum physical memory
- Transparent page sharing
- Ballooning
- Compression
- Swapping
- Right sizing – Better to over-commit than under-commit
- Don’t use memory limits!
- Ballooning is a warning sign, but not a problem
- Swapping is a problem if over an extended period
- Swapping/paging at the guest level – Under-provisioned guest memory
- Missing balloon driver (VMware Tools)
- Best practices
- Avoid high active host memory over-commitment
- Right-size guest memory
- Ensure there is enough vRAM to cover demand peaks
- Use fully automated DRS cluster
- Use resource pools with high/normal/low shares
- Avoid custom shares setting
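
The "vRAM + overhead" relationship above can be turned into a rough host-level commitment check. This is only a sketch: the VM names, vRAM sizes, and per-VM overhead figures below are hypothetical, and real overhead depends on the VM's configuration.

```python
def overcommit_ratio(vms, host_physical_mb):
    """Configured vRAM plus per-VM overhead, divided by host physical memory.

    A value above 1.0 means the host is over-committed on configured memory.
    """
    demand_mb = sum(vm["vram_mb"] + vm["overhead_mb"] for vm in vms)
    return demand_mb / host_physical_mb

# Hypothetical inventory; overhead values are placeholders, not measured figures
vms = [
    {"name": "app01", "vram_mb": 8192, "overhead_mb": 128},
    {"name": "db01", "vram_mb": 16384, "overhead_mb": 256},
    {"name": "web01", "vram_mb": 4096, "overhead_mb": 64},
]

ratio = overcommit_ratio(vms, host_physical_mb=32768)
print(f"configured memory commitment: {ratio:.2f}x")
```

Note that this counts configured memory, while the best practice above concerns active memory, so a ratio above 1.0 is not necessarily a problem on its own.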
- CPU
- CPU cores/threads have to be shared among all VMs
- ESXtop
- %USED – Physical CPU usage
- %SYS – Percentage of time in the VMkernel
- %RUN – Percentage of total scheduled time
- %WAIT
- %IDLE (%WAIT minus %IDLE can be used to estimate I/O wait time)
- vCPUs
- Relaxed co-scheduling in vSphere 4.x and higher
- Idle vCPUs incur a scheduling penalty
- Configure only as many vCPUs as needed
- Use uniprocessor VMs for single-threaded applications
- CPU Ready Time
- Does not necessarily indicate a problem
- vCPU to pCPU allocation – Hyper-threading adds about 30% performance
- Don’t set too many limits or reservations
- Right sizing vSMP VMs
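
Two of the CPU notes above are simple arithmetic and can be sketched in a few lines: the I/O wait estimate from %WAIT and %IDLE (esxtop's %WAIT includes idle time, which is why the subtraction works), and a vCPU-to-pCPU ratio that credits hyper-threading with roughly 30% extra capacity. The function names and sample numbers are assumptions for illustration.

```python
def estimated_io_wait(pct_wait, pct_idle):
    """Estimate I/O wait as %WAIT minus %IDLE, floored at zero."""
    return max(pct_wait - pct_idle, 0.0)

def vcpu_to_pcpu_ratio(total_vcpus, physical_cores, hyperthreading=False, ht_gain=0.30):
    """Ratio of allocated vCPUs to effective physical CPU capacity.

    With hyper-threading enabled, capacity is scaled by the ~30% gain quoted
    in the session rather than doubled.
    """
    effective = physical_cores * (1 + ht_gain) if hyperthreading else physical_cores
    return total_vcpus / effective

# Hypothetical esxtop readings for one VM, and a hypothetical host
print(estimated_io_wait(pct_wait=95.0, pct_idle=80.0))
print(vcpu_to_pcpu_ratio(total_vcpus=52, physical_cores=20, hyperthreading=True))
```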
- Storage
- ESXtop views – Adapter (d), VM (v), Disk device (u)
- High DAVG – Issue beyond the adapter
- High KAVG – Issue is in the kernel storage stack (driver issue, queuing)
- Use Storage DRS
- Snapshots – cause extra load to locate blocks
- Excessive traffic down one HBA/switch/storage processor can cause latency
- Use paravirtual SCSI adapter
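
The DAVG/KAVG distinction above amounts to a small triage rule. The sketch below encodes it; the 25 ms and 2 ms limits are commonly used rules of thumb and an assumption here, since the session notes do not give thresholds.

```python
def triage_latency(davg_ms, kavg_ms, davg_limit_ms=25.0, kavg_limit_ms=2.0):
    """Return a list of findings based on esxtop DAVG and KAVG readings.

    The default limits are assumed rules of thumb; tune them to your baseline.
    """
    findings = []
    if davg_ms > davg_limit_ms:
        findings.append("high DAVG: look beyond the adapter (fabric, array)")
    if kavg_ms > kavg_limit_ms:
        findings.append("high KAVG: look at the kernel storage stack (driver, queues)")
    return findings

# Hypothetical readings taken from the esxtop adapter view
for finding in triage_latency(davg_ms=30.0, kavg_ms=0.5):
    print(finding)
```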
- Networking
- Load balancing on Port ID is the most compatible
- Check counters for NICs and VMs
- 10Gbps NICs can incur significant CPU load when running at 100%
- If using jumbo frames, ensure they are enabled end to end
- Use VMXNET3 adapter
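
The end-to-end jumbo frame requirement can be expressed as a check over every hop in the path: a single hop with a smaller MTU breaks it. The hop names and MTU values below are hypothetical, and the 9000-byte target is an assumed jumbo frame size.

```python
def hops_below_mtu(path_mtus, required_mtu=9000):
    """Return the hops whose configured MTU is below the required value."""
    return [hop for hop, mtu in path_mtus.items() if mtu < required_mtu]

# Hypothetical path from guest to storage target
path = {
    "guest vNIC": 9000,
    "vSwitch": 9000,
    "physical switch": 1500,
    "storage target": 9000,
}

offenders = hops_below_mtu(path)
print(offenders if offenders else "jumbo frames enabled end to end")
```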