San Diego VMUG: The Power of Server Side Caching

This was a good session presented by Proximal Data, focusing on using flash-based cache in your VMware environment. They have a product called AutoCache that boosts storage performance for VMware environments. It sounds like an interesting product. The session went by very fast, but here are some of the notes I took:

Proximal Data Company Profile

  • Vision: I/O Intelligence in the hypervisor is a universal need
  • Near term value is in making use of flash in virtualization

Overview: Proximal Data AutoCache

  • I/O caching software for ESXi 4.x to 5.x
  • Up to 2-3x VM density improvement
  • Business critical apps accelerated
  • Transparent to ESXi features like vMotion, DRS, etc.
  • Converts a modest amount of flash into a large performance boost
  • Simple to deploy: Single “VIB” installed on each ESXi host
  • vCenter plug-in: Caching effectiveness, cache utilization by guest VM

Case Study

  • Month end processing report now takes 6.5 hours instead of 36.5 hours
  • Eliminated need to vMotion other guests off during month end processing
  • Tripled VM density on database servers
  • Decreased SAS analytics report time by 85%

Flash – The Good

  • Much faster than disks for random I/O – Sequential I/O performance difference is not as dramatic
  • Cheaper than RAM

Flash – The Bad

  • More expensive than spinning disks
  • Slower than RAM
  • Asymmetric read/write characteristics – Reads are much faster, writes cause a lot of wear
  • Wears out/limited lifespan

Flash – The Ugly

  • Must be erased to be written
  • Erase granularity is not the write granularity
  • Typical write granularity is 512 bytes, typical erase granularity is 32K, 64K or 128K
  • Write/erase characteristics have led to complexity (flash translation layers, fragmentation, garbage collection, write amplification) – see the quick calculation after this list
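
To make the write-amplification point concrete, here is a quick back-of-the-envelope calculation in Python. The block sizes are just the typical figures quoted in the bullets above, not measurements of any particular drive:

    # Worst-case write amplification when a small logical write forces a full
    # erase-block rewrite (read-modify-erase-write). Sizes are the typical
    # figures from the session notes, not specs of any particular SSD.
    WRITE_GRANULARITY = 512                      # bytes the host writes
    ERASE_BLOCK_SIZES = [32 * 1024, 64 * 1024, 128 * 1024]

    for erase_block in ERASE_BLOCK_SIZES:
        # Worst case: the 512-byte update lands in a block with no free pages,
        # so the controller must rewrite the whole erase block.
        amplification = erase_block / WRITE_GRANULARITY
        print(f"{erase_block // 1024:>4} KB erase block -> "
              f"up to {amplification:.0f}x write amplification")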

Flash – Not all are equal

  • Steady state performance of controllers – as much as 50% performance loss in steady state vs new (stay with Intel, Micron, LSI, Sandforce, not third-tier)
  • MLC is much cheaper and higher density and is the future, but it is not as robust and wears out faster than SLC

Flash – Ideal Usage

  • Random I/O requests – greatest performance gains
  • A lot more reads than writes
  • Write in large chunks – see the coalescing sketch after this list
  • Avoid small writes to the same logical locations
  • If data is critical use SLC
  • Read caching is an ideal use of flash
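
One common way to honor the “write in large chunks” advice is to coalesce small writes in memory and issue them as big pieces. Here is a minimal sketch of that idea; flash_write is a hypothetical placeholder for whatever actually sends data to the device, and the 128K chunk size is just an assumption matching the erase sizes above:

    import io

    CHUNK_SIZE = 128 * 1024   # flush in erase-block-sized pieces (assumption)

    class WriteCoalescer:
        """Buffers small writes and issues them as large sequential chunks."""

        def __init__(self, flash_write):
            self._flash_write = flash_write   # callable that takes bytes
            self._buffer = io.BytesIO()

        def write(self, data: bytes) -> None:
            self._buffer.write(data)
            if self._buffer.tell() >= CHUNK_SIZE:
                self.flush()

        def flush(self) -> None:
            data = self._buffer.getvalue()
            if data:
                self._flash_write(data)   # one large write instead of many small ones
            self._buffer = io.BytesIO()

This is only the buffering idea, not how AutoCache or any particular product implements it.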

Caching is Everywhere

  • Disks have caches, and so do array/RAID controllers, HBAs, the OS, and applications

Caching Basics

  • The working set of data is typically a small subset of the total data
  • Caches are used to manage the “working set” in a resource that is smaller, faster and more costly than the main storage resource – a minimal read-cache sketch follows this list
  • A cache works best when data flows from a slower device to a faster one
  • Read caches primarily help read-bound systems
  • Write-back caches primarily help bursty environments
  • Caches will continue to exist in all layers of the infrastructure
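
As a concrete (and very simplified) illustration of keeping a working set in a smaller, faster tier, here is a minimal LRU read cache in Python. slow_read is a hypothetical placeholder for the backing storage; none of this is specific to AutoCache:

    from collections import OrderedDict

    class ReadCache:
        """Keeps the hot working set in a small fast tier in front of slow storage."""

        def __init__(self, slow_read, capacity_blocks: int):
            self._slow_read = slow_read          # callable: block id -> bytes
            self._capacity = capacity_blocks
            self._blocks = OrderedDict()         # block id -> data, in LRU order

        def read(self, block_id: int) -> bytes:
            if block_id in self._blocks:
                self._blocks.move_to_end(block_id)   # hit: refresh recency
                return self._blocks[block_id]
            data = self._slow_read(block_id)         # miss: go to the slow tier
            self._blocks[block_id] = data
            if len(self._blocks) > self._capacity:
                self._blocks.popitem(last=False)     # evict least recently used
            return data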

Flash in a Hypervisor

  • Most caching algorithms were developed for RAM caches – no consideration for device asymmetry
  • Hypervisors have very dynamic I/O patterns
  • Hypervisors are I/O blenders – see the toy illustration after this list
  • Must consider shared environment (latency, allocations, etc.)
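
The “I/O blender” effect is easy to picture: each VM issues a nicely sequential stream, but interleaved at the hypervisor the shared datastore sees a jumpy, effectively random pattern. A toy illustration (the offsets are invented for the demo):

    # Each VM writes sequential block offsets within its own region.
    vm_streams = {
        "vm1": list(range(0, 8)),
        "vm2": list(range(1000, 1008)),
        "vm3": list(range(5000, 5008)),
    }

    blended = []
    for requests in zip(*vm_streams.values()):   # simple round-robin interleave
        blended.extend(requests)

    print(blended)
    # 0, 1000, 5000, 1, 1001, 5001, ...  large seeks between consecutive I/Os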

Complications of Write-Back Caching

  • Writes from VMs fill the cache
  • Cache ultimately flushes to disk
  • The cache overruns when disk flushes can’t keep up – see the sketch after this list
  • If you are truly write-bound, a cache will not help
  • Write-back cache handles write bursts and benchmarks well but is not a panacea
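
A tiny model makes the overrun point obvious: if the VMs sustain writes faster than the backing disks can absorb the flush, the cache fills and you end up running at disk speed anyway. The rates below are made-up numbers purely to show the effect:

    CACHE_CAPACITY_GB = 200   # a modest flash device
    INCOMING_GB_PER_S = 1.0   # sustained writes from the VMs (assumed)
    FLUSH_GB_PER_S = 0.7      # what the backing disks can absorb (assumed)

    # Assumes a genuinely write-bound workload, i.e. incoming > flush rate.
    dirty_gb = 0.0
    seconds = 0
    while dirty_gb < CACHE_CAPACITY_GB:
        dirty_gb += INCOMING_GB_PER_S - FLUSH_GB_PER_S   # net fill per second
        seconds += 1

    print(f"Cache overruns after {seconds} s ({seconds / 60:.1f} min); "
          "after that, writes are limited by disk speed, not flash speed")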

Disk Coherency

  • Cache flushes MUST preserve write ordering to preserve disk coherency
  • A hardware copy must flush the cache first
  • Hardware snapshots do not reflect current system state without a cache flush
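
Here is a minimal sketch of the ordered-flush idea: dirty blocks drain to disk in the order their (final) writes arrived, and only then is the hardware snapshot taken. disk_write and snapshot are hypothetical placeholders, and real products track ordering far more carefully than this:

    from collections import OrderedDict

    dirty = OrderedDict()            # block id -> data, kept in arrival order

    def cache_write(block_id, data):
        dirty.pop(block_id, None)    # re-insert so the latest write moves to the tail
        dirty[block_id] = data

    def flush_then_snapshot(disk_write, snapshot):
        while dirty:
            block_id, data = dirty.popitem(last=False)   # oldest write first
            disk_write(block_id, data)
        snapshot()   # only now does the snapshot reflect what the VMs have written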

Evaluating Caching

  • Results are entirely workload dependent
  • Benchmarks are terrible for characterizing devices. You can make IOmeter say anything you want.
  • Run your real storage configuration for meaningful results
  • Beware of caching claims of 100x or 1000x improvements
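
Because results are entirely workload dependent, even a trivially simple model shows how the same cache gives wildly different numbers depending on the working set you benchmark with. The latencies and sizes are illustrative assumptions, not measurements:

    def average_latency_us(cache_gb, working_set_gb,
                           flash_latency_us=100, disk_latency_us=5000):
        # Naive model: requests are uniform over the working set, so the hit
        # rate is simply the fraction of the working set that fits in cache.
        hit_rate = min(1.0, cache_gb / working_set_gb)
        return hit_rate * flash_latency_us + (1 - hit_rate) * disk_latency_us

    for ws in (100, 400, 2000):      # GB of "hot" data
        print(f"working set {ws:>4} GB -> "
              f"avg latency {average_latency_us(200, ws):.0f} us")

With a 200 GB cache that works out to roughly 100 µs, 2,550 µs and 4,510 µs respectively – same cache, very different “improvement” depending on the workload.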

Flash Caching Perspective

  • Flash will be pervasive in the enterprise
  • Choose the right amount (as little as 200GB can provide a large boost)
  • The closer the cache is to the processors, the better the performance