VMworld 2012: vSphere HA and Datastore Access Outages INF-BCO2807

This session was extremely technical and walked through the inner workings of HA. For more in-depth detail, I would strongly suggest getting the VMware vSphere 5.1 Clustering Deepdive book.

  • HA protects against three classes of failure: host and VM failures; host network isolation and datastore PDL; guest OS hangs and application crashes
  • Datastore accessibility outages occur infrequently but have a large cost
  • vSphere 5.0 introduced FDM, or Fault Domain Manager, which completely replaces the 4.x HA agent and software.
  • Datastores are used for two purposes by HA: as a communications channel between FDMs and as persistent storage for configuration information
  • Heartbeat datastores – two are selected per host; they enable the master to detect VM power states.
  • Best practice: Use the “leave powered on” host isolation response option
  • In 5.0 U1, when a datastore enters Permanent Device Loss (PDL), guest I/O to that datastore triggers the VM to be killed, and HA restarts it on a host that can access the datastore.
  • Futures for HA
    • Add support for All Paths Down (APD)
    • Triggered by PDL/APD declaration rather than guest I/Os
    • Full customization of responses
    • Full user interface and detailed reporting
    • VM placement sensitive to accessibility
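
To make the role of heartbeat datastores more concrete, here is a minimal Python sketch of how a master might combine network and datastore heartbeats to classify a host. This is a simplified illustration and not VMware's FDM code: the `HostState` enum and the `classify_host` function are hypothetical names, and the real agent takes more signals into account than the two booleans used here.

```python
from enum import Enum


class HostState(Enum):
    # Hypothetical state names, chosen for illustration only.
    LIVE = "live"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Classify a host from the master's point of view (simplified assumption)."""
    if network_heartbeat:
        # The master still hears the host over the management network.
        return HostState.LIVE
    if datastore_heartbeat:
        # Network heartbeats are gone, but the host is still updating its
        # heartbeat datastores, so it is running yet unreachable.
        return HostState.ISOLATED_OR_PARTITIONED
    # Neither channel shows signs of life, so the host is treated as failed
    # and its VMs become candidates for restart elsewhere.
    return HostState.FAILED


if __name__ == "__main__":
    for net, ds in [(True, True), (False, True), (False, False)]:
        print(net, ds, classify_host(net, ds).value)
```

The point of the second channel is that a missing network heartbeat alone does not tell the master whether a host is down or merely cut off from the network; the datastore heartbeat lets it tell these cases apart before restarting any VMs.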
