VMworld 2016: VM and App Protection

Session: INF8939

4 Step Program for Success

  • Define – Gather requirements – RPO/RTO
  • Research and design – look at various technologies
  • Acquire and implement – Document
  • Test and operate – Continuous testing, continuous research

Disaster recovery and business continuity

  • DR is recovery of data
  • BC is the full business process of recovering

Define Requirements

  • What are you trying to protect? apps, VMs, DBs, etc.
  • What are you protecting against? data loss, data corruption, disaster, etc.
  • What is your RPO? zero, minutes, hours, days
  • What is your RTO?
  • How long do I need to keep data? retention policy, archiving, etc.
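
One way to make these questions concrete is to record the answers per workload and check candidate solutions against them. The sketch below is illustrative only: the `ProtectionRequirement` schema, the `meets_requirement` helper, and the ERP figures are assumptions, not something shown in the session.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionRequirement:
    """Answers to the requirement questions for one workload (hypothetical schema)."""
    workload: str          # what is being protected (app, VM, DB)
    threats: tuple         # what it is being protected against
    rpo: timedelta         # maximum tolerable data loss
    rto: timedelta         # maximum tolerable downtime
    retention: timedelta   # how long copies must be kept


def meets_requirement(req, achievable_rpo, achievable_rto):
    """Return True when a candidate solution is at least as good as required."""
    return achievable_rpo <= req.rpo and achievable_rto <= req.rto


# Hypothetical example workload; the numbers are made up for illustration.
erp = ProtectionRequirement(
    workload="ERP database",
    threats=("data loss", "data corruption", "disaster"),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
    retention=timedelta(days=365),
)

# A nightly backup (24 h RPO) cannot satisfy a 15-minute RPO.
print(meets_requirement(erp, timedelta(hours=24), timedelta(hours=4)))      # False
print(meets_requirement(erp, timedelta(minutes=5), timedelta(minutes=30)))  # True
```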

Protection Tiers

  • Tier 1 – mission critical
  • Tier 2 – Required for longer term business continuity
  • Tier 3 – Nice to have but not required
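
The tiers can be used to drive recovery order. A minimal sketch, assuming lower tier numbers are recovered first (the session did not spell this out) and using made-up workload names:

```python
from enum import IntEnum


class Tier(IntEnum):
    MISSION_CRITICAL = 1      # Tier 1
    LONG_TERM_CONTINUITY = 2  # Tier 2
    NICE_TO_HAVE = 3          # Tier 3


def recovery_order(workloads):
    """Sort (name, tier) pairs by tier; sorted() is stable, so ties keep input order."""
    return [name for name, tier in sorted(workloads, key=lambda w: w[1])]


# Hypothetical inventory for illustration.
inventory = [
    ("wiki", Tier.NICE_TO_HAVE),
    ("erp", Tier.MISSION_CRITICAL),
    ("reporting", Tier.LONG_TERM_CONTINUITY),
    ("email", Tier.MISSION_CRITICAL),
]
print(recovery_order(inventory))  # ['erp', 'email', 'reporting', 'wiki']
```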

Tape Backup

  • Cheapest medium
  • RPO of hours to days
  • RTO – depends on how much data
  • Good for archival/long term retention
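
Because RTO here scales with data volume, a rough estimate is data size divided by sustained restore throughput. The sketch below is back-of-the-envelope arithmetic only; the 10 TB and 300 MB/s figures are assumptions, and real restores add tape mount, seek, and verification time.

```python
def restore_hours(data_tb, throughput_mb_per_s):
    """Hours needed to stream data_tb terabytes at a sustained rate (decimal units)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = data_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return total_mb / throughput_mb_per_s / 3600


# Hypothetical figures: 10 TB restored at a sustained 300 MB/s.
print(round(restore_hours(10, 300), 1))  # 9.3
```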

Hardware Snapshots

  • Snap/restore data in seconds from GB to TB
  • Application consistent storage snapshots – not needed for all VMs
  • Data on primary storage can be expensive

Array Replication

  • Async or sync
  • Only changed data sent
  • Flexible RPO options
  • RTO is based on how data is restored

Site Recovery Manager

  • Integrates with vSphere for site failover
  • Able to test and re-test
  • Requires array integration

Continuous Data Protection

  • Flexible RPO options
  • RTO based on amount of data
  • Only changed data sent


  • May be appliance or software based
  • Most integrate with traditional backup
  • Integrates with CDP


  • Typically continuous backup so low RPO
  • Backup and recovery limited by bandwidth
  • May have longer recovery times
  • Can take a long time to seed backups

vSphere Metro Cluster

  • Zero RPO/RTO (time to restart apps is not zero)
  • Great for site protection
  • Layer 2 stretching
  • No application specific backup/restore

Fault Tolerance

  • Limited in supported vCPUs
  • Requires high bandwidth between hosts
  • Does not protect against OS/App failures


.Next 2016: Business Continuity for Tier-1 Apps

Presenters: Partha R (NTX), Mike McGhee (NTX), Ryan Sheldon (AMGEN)

Tier-1 apps are different for different customers. Could be ERP, EPOS, email, etc.

Need to be prepared for the worst: IT Productivity, recovery, end-user productivity, business disruption, lost revenue, etc.

Data protection requirements: RPO, RTO

Understand consistency: Application consistency (e.g. VSS). Most beneficial in the context of backup and restore.

Technologies: Application (Exchange DAGs, SQL AAGs, Oracle Data Guard), hypervisor (Hyper-V Replica, vSphere Replication), storage (infrastructure centric, Nutanix Data Protection)

What am I protecting? Application dependencies? One or multiple methods? How frequently do you back up and replicate? Sync vs. async?

Where does Nutanix native replication fit in?

Time Stream – RTO of minutes, RPO of minutes

Cloud Connect – RTO hours, RPO hours

Async replication – RTO minutes, RPO minutes

Sync replication – RTO near-zero, RPO zero
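
The four options above can be treated as a small lookup table. This is a sketch, not a Nutanix API: the `Level` scale and the `candidates` helper are assumptions, and the table only encodes the RPO/RTO classes quoted in the session. It ignores what each option protects against; a local snapshot, for instance, does not help if the whole site is lost.

```python
from enum import IntEnum


class Level(IntEnum):
    """Coarse, ordered RPO/RTO classes (hypothetical scale)."""
    ZERO = 0
    NEAR_ZERO = 1
    MINUTES = 2
    HOURS = 3


# name -> (RPO, RTO) as quoted in the session
OPTIONS = {
    "Time Stream": (Level.MINUTES, Level.MINUTES),
    "Cloud Connect": (Level.HOURS, Level.HOURS),
    "Async replication": (Level.MINUTES, Level.MINUTES),
    "Sync replication": (Level.ZERO, Level.NEAR_ZERO),
}


def candidates(max_rpo, max_rto):
    """Return the options whose RPO and RTO classes are within the stated limits."""
    return [
        name
        for name, (rpo, rto) in OPTIONS.items()
        if rpo <= max_rpo and rto <= max_rto
    ]


print(candidates(Level.MINUTES, Level.MINUTES))
# ['Time Stream', 'Async replication', 'Sync replication']
print(candidates(Level.ZERO, Level.MINUTES))
# ['Sync replication']
```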

Nutanix Local Snapshot (Time stream)

  • Protection against guest OS corruption
  • Snapshot of VM environments
  • VM or vdisk granularity
  • Low performance impact
  • VM and application level consistency

Nutanix Async Replication

  • Delta changes
  • Dedupe on the wire
  • Compression on the wire
  • Flexible topologies
  • Bandwidth schedules

Nutanix Cloud Connect

  • Hybrid cloud solution from Nutanix
  • Integration with Azure and AWS
  • For archiving and backup
  • Easy to set up and manage
  • WAN optimized replication
  • Interop with Nutanix DR

Volume Groups with Async Replication

  • Exchange + iSCSI
  • MS SQL Clustering
  • Oracle RAC
  • Bare Metal

Consistency options: VSS

  • VMware Tools or Nutanix guest tools
  • Pre and post scripts for Linux

Consistency options: Consistency Groups

  • For a group of VMs and a consistent restart point


Manufacturing is key. They leverage Metro Availability to provide high uptime. The customer set it up in about 10 minutes. With MA they can converge services like DNS, DHCP, etc. and save on licensing costs.


What’s next: Metro Availability Witness coming out in the Asterix release, which will automate site failover.

VMworld 2015: What’s new in SRM

  • Application uptime is key for businesses
  • 40% of companies still use tape for DR purposes
  • Legacy DR solutions can lead to extended periods of downtime
  • Announcing SRM 6.1 and Site Recovery Manager Air
  • From private cloud to public cloud – the hybrid cloud
  • SRM automates every workflow of DR orchestration
  • Non-disruptive testing, automated failback, automated failover, planned migrations
  • Introduced in 2008

What’s new in SRM 6.1

  • Policy-based management – New protection groups using vSphere policies & vRA integration
  • Integration with VMware NSX – Automated network mapping
  • Zero-downtime application mobility – Orchestrated cross-vCenter vMotion using recovery plans

Policy Based DR

  • Association of new datastores with SPPG (storage-profile protection group)
  • Protection of VMs on replicated datastores within SPPG
  • Removal of VMs from SPPG when datastore is removed

NSX Integration

  • Network virtualization reduces OpEx and accelerates recovery
  • SRM 6.1 supports NSX 6.2 cross-vCenter logical switches
  • Automatic mapping of networks
  • Federated NSX security rules on recovered VMs
  • Faster recovery time by 40%

SRM support for Active-Active Datacenters

  • New support!
  • Production apps at both sites
  • Zero downtime for planned events
  • Typically limited to metro distances
  • Uses cross-vCenter vMotion for planned events
  • Day 0 support for EMC VPLEX, IBM SVC, HDS VSP
  • Can enable zero RTO/RPO

SRM Family enables hybrid cloud availability and mobility

  • DRaaS
  • Cloud on-ramp
  • Fast time-to-market
  • Site Recovery Manager Air automates vCloud Air disaster recovery
  • Deployed and managed as DRaaS
  • Not available today, but coming in the future
  • Provides detailed reports of recovery execution plans
  • Demo of SRM Air

VMworld 2014: BC/DR: Solution Overview

Session: BCO2410

43% of data center outages are due to power, 31% to IT hardware failure

A comprehensive portfolio for cost-effective IT resilience: Ranges from Oracle RAC and Microsoft MSCS to replication and backup

The hypervisor opens up new opportunities: Knows the needs of all apps in real time, sits directly in the I/O path, global view of underlying infrastructure, hardware agnostic

VMware software-defined storage has BC/DR services such as data protection and data replication

Local application availability: vMotion, storage vMotion, fault tolerance, high availability, and app HA

Data protection: vSphere Replication, vSphere Data Protection Advanced

Site application availability: vCenter Site Recovery Manager, vCloud Air Disaster Recovery

  • Array-based replication for zero data loss, available through storage partners
  • vSphere Replication: RPO from 15 minutes to 24 hours
  • vSphere Data Protection Advanced: One day minimum RPO and backup data replication

vSphere Data Protection (VDP)

  • Agentless product
  • Based on EMC Avamar
  • End-to-end integration with vSphere
  • Simplifies backup and recovery of VMs
  • 4x more efficient through dedupe and 6x faster recovery
  • Comes in two editions: VDP and VDP Advanced. VDP is appropriate for 50 VMs; VDP Advanced for 200 VMs

What’s new in VDP

  • App-aware agents for SQL clusters and Exchange DAGs are improved in 5.8
  • Backup proxies and enhanced restore from replicated backups are new in 5.8
  • Configurable parallel backups (up to 24 VMs at a time)
  • Replicate and restore anywhere
  • Support for Linux LVM and EXT4
  • Support up to 20 VDP appliances per vCenter
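
The parallel-backup limit gives a quick way to estimate how many backup waves a job needs. This is plain arithmetic, not a VDP interface: `backup_waves` is a hypothetical helper, and the sketch assumes every wave is filled to the configured limit.

```python
import math

MAX_PARALLEL_BACKUPS = 24  # from the session: up to 24 VMs at a time


def backup_waves(vm_count, parallel=MAX_PARALLEL_BACKUPS):
    """Number of sequential waves needed to back up vm_count VMs."""
    if not 1 <= parallel <= MAX_PARALLEL_BACKUPS:
        raise ValueError("parallel must be between 1 and 24")
    return math.ceil(vm_count / parallel)


print(backup_waves(200))              # 9
print(backup_waves(200, parallel=8))  # 25
```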

vSphere Replication

  • Hypervisor-based replication
  • VM-centric and storage array agnostic
  • Flexible RPO (15 minutes to 24 hours)
  • Network efficient

Building Blocks for disaster recovery

  • DR Orchestration
  • Replication
  • Backup and Recovery
  • Compute
  • Storage

Site Recovery Manager (SRM)

  • Introduced in 2008
  • 14,000 Customers
  • 2.4 million VMs protected
  • Centralized recovery plans for thousands of VMs
  • Non-disruptive recovery testing
  • Automated DR workflows
  • Integrated with the VMware product stack
  • Lowers the cost of DR by 50%
  • Eliminates complexity and risk of manual processes
  • Enables predictable RTOs
  • Provides policy driven DR control for any app
  • 1-click DR
  • Test to rest
  • For disaster recovery, disaster avoidance, and planned migration

What’s new in SRM 5.8?

  • 5x scale of the protection – up to 5,000 VMs
  • 2x scale of recovery – to 2,000 VMs
  • vSphere web client plug-in

vCloud Air Disaster Recovery

  • Warm standby VM in the cloud
  • 15 minutes to 24 hours RPO
  • Terms are 1m, 12m, 24m, 36m subscriptions