San Diego VMUG: The Power of Server Side Caching

This was a good session presented by Proximal Data, focusing on using flash-based cache in your VMware environment. They have a product called AutoCache that boosts storage performance for VMware environments. It sounds like an interesting product. The session went by very fast, but here are some of the notes I took:

Proximal Data Company Profile

  • Vision: I/O Intelligence in the hypervisor is a universal need
  • Near term value is in making use of flash in virtualization

Overview: Proximal Data AutoCache

  • I/O caching software for ESXi 4.x to 5.x
  • Up to 2-3x VM density improvement
  • Business critical apps accelerated
  • Transparent to ESXi features like vMotion, DRS, etc.
  • Converts a modest amount of flash into a large performance boost
  • Simple to deploy: Single “VIB” installed on each ESXi host
  • vCenter plug-in: Caching effectiveness, cache utilization by guest VM

Case Study

  • Month end processing report now takes 6.5 hours instead of 36.5 hours
  • Eliminated need to vMotion other guests off during month end processing
  • Tripled VM density on database servers
  • Decreased SAS analytics report time by 85%

Flash – The Good

  • Much faster than disks for random I/O – Sequential I/O performance difference is not as dramatic
  • Cheaper than RAM

Flash – The Bad

  • More expensive than spinning disks
  • Slower than RAM
  • Asymmetric read/write characteristics – Reads are much faster, writes cause a lot of wear
  • Wears out/limited lifespan

Flash – The Ugly

  • Must be erased to be written
  • Erase granularity is not the write granularity
  • Typical write granularity is 512 bytes, typical erase granularity is 32K, 64K or 128K
  • Write/erase characteristics have led to complexity (flash translation layers, fragmentation, garbage collection, write amplification) – see the quick calculation after this list
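
To make the write-amplification point concrete, here is a quick back-of-the-envelope calculation in Python. The block sizes are just the typical figures quoted in the bullets above, not measurements of any particular drive:

    # Worst-case write amplification when a small logical write forces a full
    # erase-block rewrite (read-modify-erase-write). Sizes are the typical
    # figures from the session notes, not specs of any particular SSD.
    WRITE_GRANULARITY = 512                      # bytes the host writes
    ERASE_BLOCK_SIZES = [32 * 1024, 64 * 1024, 128 * 1024]

    for erase_block in ERASE_BLOCK_SIZES:
        # Worst case: the 512-byte update lands in a block with no free pages,
        # so the controller must rewrite the whole erase block.
        amplification = erase_block / WRITE_GRANULARITY
        print(f"{erase_block // 1024:>4} KB erase block -> "
              f"up to {amplification:.0f}x write amplification")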

Flash – Not all are equal

  • Steady state performance of controllers – as much as 50% performance loss in steady state vs new (stay with Intel, Micron, LSI, Sandforce, not third-tier)
  • MLC is much cheaper and higher density and is the future, but it is not as robust and wears out faster than SLC

Flash – Ideal Usage

  • Random I/O requests – greatest performance gains
  • A lot more reads than writes
  • Write in large chunks – see the coalescing sketch after this list
  • Avoid small writes to the same logical locations
  • If data is critical use SLC
  • Read caching is an ideal use of flash
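
One common way to honor the “write in large chunks” advice is to coalesce small writes in memory and issue them as big pieces. Here is a minimal sketch of that idea; flash_write is a hypothetical placeholder for whatever actually sends data to the device, and the 128K chunk size is just an assumption matching the erase sizes above:

    import io

    CHUNK_SIZE = 128 * 1024   # flush in erase-block-sized pieces (assumption)

    class WriteCoalescer:
        """Buffers small writes and issues them as large sequential chunks."""

        def __init__(self, flash_write):
            self._flash_write = flash_write   # callable that takes bytes
            self._buffer = io.BytesIO()

        def write(self, data: bytes) -> None:
            self._buffer.write(data)
            if self._buffer.tell() >= CHUNK_SIZE:
                self.flush()

        def flush(self) -> None:
            data = self._buffer.getvalue()
            if data:
                self._flash_write(data)   # one large write instead of many small ones
            self._buffer = io.BytesIO()

This is only the buffering idea, not how AutoCache or any particular product implements it.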

Caching is Everywhere

  • Disks have caches, and so do array/RAID controllers, HBAs, the OS, and applications

Caching Basics

  • The working set of data is typically a small subset of the total data
  • Caches are used to manage the “working set” in a resource that is smaller, faster and more costly than the main storage resource – a minimal read-cache sketch follows this list
  • A cache works best when data flows from a slower device to a faster one
  • Read caches primarily help read-bound systems
  • Write-back caches primarily help bursty environments
  • Caches will continue to exist in all layers of the infrastructure
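
As a concrete (and very simplified) illustration of keeping a working set in a smaller, faster tier, here is a minimal LRU read cache in Python. slow_read is a hypothetical placeholder for the backing storage; none of this is specific to AutoCache:

    from collections import OrderedDict

    class ReadCache:
        """Keeps the hot working set in a small fast tier in front of slow storage."""

        def __init__(self, slow_read, capacity_blocks: int):
            self._slow_read = slow_read          # callable: block id -> bytes
            self._capacity = capacity_blocks
            self._blocks = OrderedDict()         # block id -> data, in LRU order

        def read(self, block_id: int) -> bytes:
            if block_id in self._blocks:
                self._blocks.move_to_end(block_id)   # hit: refresh recency
                return self._blocks[block_id]
            data = self._slow_read(block_id)         # miss: go to the slow tier
            self._blocks[block_id] = data
            if len(self._blocks) > self._capacity:
                self._blocks.popitem(last=False)     # evict least recently used
            return data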

Flash in a Hypervisor

  • Most caching algorithms were developed for RAM caches – no consideration for device asymmetry
  • Hypervisors have very dynamic I/O patterns
  • Hypervisors are I/O blenders – see the toy illustration after this list
  • Must consider shared environment (latency, allocations, etc.)
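
The “I/O blender” effect is easy to picture: each VM issues a nicely sequential stream, but interleaved at the hypervisor the shared datastore sees a jumpy, effectively random pattern. A toy illustration (the offsets are invented for the demo):

    # Each VM writes sequential block offsets within its own region.
    vm_streams = {
        "vm1": list(range(0, 8)),
        "vm2": list(range(1000, 1008)),
        "vm3": list(range(5000, 5008)),
    }

    blended = []
    for requests in zip(*vm_streams.values()):   # simple round-robin interleave
        blended.extend(requests)

    print(blended)
    # 0, 1000, 5000, 1, 1001, 5001, ...  large seeks between consecutive I/Os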

Complications of Write-Back Caching

  • Writes from VMs fill the cache
  • Cache ultimately flushes to disk
  • The cache overruns when disk flushes can’t keep up – see the sketch after this list
  • If you are truly write-bound, a cache will not help
  • Write-back cache handles write bursts and benchmarks well but is not a panacea
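
A tiny model makes the overrun point obvious: if the VMs sustain writes faster than the backing disks can absorb the flush, the cache fills and you end up running at disk speed anyway. The rates below are made-up numbers purely to show the effect:

    CACHE_CAPACITY_GB = 200   # a modest flash device
    INCOMING_GB_PER_S = 1.0   # sustained writes from the VMs (assumed)
    FLUSH_GB_PER_S = 0.7      # what the backing disks can absorb (assumed)

    # Assumes a genuinely write-bound workload, i.e. incoming > flush rate.
    dirty_gb = 0.0
    seconds = 0
    while dirty_gb < CACHE_CAPACITY_GB:
        dirty_gb += INCOMING_GB_PER_S - FLUSH_GB_PER_S   # net fill per second
        seconds += 1

    print(f"Cache overruns after {seconds} s ({seconds / 60:.1f} min); "
          "after that, writes are limited by disk speed, not flash speed")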

Disk Coherency

  • Cache flushes MUST preserve write ordering to preserve disk coherency
  • A hardware copy must flush the cache first
  • Hardware snapshots do not reflect current system state without a cache flush
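
Here is a minimal sketch of the ordered-flush idea: dirty blocks drain to disk in the order their (final) writes arrived, and only then is the hardware snapshot taken. disk_write and snapshot are hypothetical placeholders, and real products track ordering far more carefully than this:

    from collections import OrderedDict

    dirty = OrderedDict()            # block id -> data, kept in arrival order

    def cache_write(block_id, data):
        dirty.pop(block_id, None)    # re-insert so the latest write moves to the tail
        dirty[block_id] = data

    def flush_then_snapshot(disk_write, snapshot):
        while dirty:
            block_id, data = dirty.popitem(last=False)   # oldest write first
            disk_write(block_id, data)
        snapshot()   # only now does the snapshot reflect what the VMs have written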

Evaluating Caching

  • Results are entirely workload dependent
  • Benchmarks are terrible for characterizing devices. You can make IOmeter say anything you want.
  • Run your real storage configuration for meaningful results
  • Beware of caching claims of 100x or 1000x improvements
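
Because results are entirely workload dependent, even a trivially simple model shows how the same cache gives wildly different numbers depending on the working set you benchmark with. The latencies and sizes are illustrative assumptions, not measurements:

    def average_latency_us(cache_gb, working_set_gb,
                           flash_latency_us=100, disk_latency_us=5000):
        # Naive model: requests are uniform over the working set, so the hit
        # rate is simply the fraction of the working set that fits in cache.
        hit_rate = min(1.0, cache_gb / working_set_gb)
        return hit_rate * flash_latency_us + (1 - hit_rate) * disk_latency_us

    for ws in (100, 400, 2000):      # GB of "hot" data
        print(f"working set {ws:>4} GB -> "
              f"avg latency {average_latency_us(200, ws):.0f} us")

With a 200 GB cache that works out to roughly 100 µs, 2,550 µs and 4,510 µs respectively – same cache, very different “improvement” depending on the workload.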

Flash Caching Perspective

  • Flash will be pervasive in the enterprise
  • Choose the right amount (as little as 200GB can provide a large boost)
  • The closer the cache is to the processors, the better the performance