VMworld 2013: Software Defined Storage the VCDX Way

This was a great “put your architecture cap on” session by two well-known VCDXs, Wade Holmes and Rawlinson Rivera. Software defined <insert virtualization food group here> is all the rage these days, be it SDN (networking), SDS (storage), SDDC (datacenter) or software-defined people. Well, maybe we're not quite at the people stage yet, but some startup is probably working on that.

Given the explosion of SDS solutions, both shipping and on the near horizon, you can’t just put on your geek hat, throw some new software storage product at the problem, and expect good results. As an engineer myself, I find that “cool” new products always get my attention. But an IT architect has to look at SDS from a very different perspective.

This session gave an overview of the VCDX Way for SDS. I took a different approach to this session’s blog post from most other ‘quick publish’ VMworld session notes. Given the importance of SDS and the new VMware products, I’ve made this post a lot longer and tried to really capture the full breadth of the information presented by Rawlinson and Wade.

Session Introduction

How do you break down the silos in an organization? How do you align application and business requirements to storage capabilities? In the “old” days you matched up physical server attributes such as performance and high availability to a specific workload. Big honking database servers, scale-out web servers, or high-IOPS email systems.

In the virtual era you gained flexibility, and can better match up workloads to pools of compute resources. Now it is much easier to implement various forms of high availability, scale out performance, and greatly increase provisioning speed. But some subsystems like storage, even with tools like storage IO control, VASA, and storage DRS, were blunt instruments trying to solve complex problems. Did they help? Absolutely. Are they ideal? Not at all.

The final destination on the journey in this session is software defined storage (SDS). The remainder of this session covered the “VCDX Way” to SDS. This methodology enables efficient technology solution design, implementation and adoption to meet business requirements. I’ve heard from several people this week that the array of storage solutions is nearly bewildering, so following the methodology can help you make your way through the SDS maze and ultimately be very successful in delivering solid solutions.

VCDX Way

  • Gather business requirements
  • Solution Architecture
  • Engineering specifications
  • Features: Availability, Manageability, Performance, Recoverability, Security

Software-Defined Storage

Software defined storage is all about automation with policy-driven storage provisioning backed by SLAs. To achieve this, storage control logic is abstracted into the software layer. No longer are you tied to physical RAID sets, or using blunt instruments like a VMFS datastore to quasi match up application requirements with performance, availability, and recovery requirements.

The control plane needs to be flexible, easy to use and automatable like crazy. The presenters showed a slide of storage management with the SDS of “tomorrow”. At the top level is the policy-based management engine, better known as the control plane. Various data services are then offered, such as replication, deduplication, security, performance, and availability. In the data plane you have the physical hardware, which could be a traditional external storage array or the newfangled JBOD scale-out storage tier.


Three Characteristics of SDS

  • Policy-Driven control plane – Automated placement, balancing, data services, provisioning
  • App-centric data services – Performance SLAs, recoverability, snapshots, clones, replication
  • Virtualized data plane – Hypervisor-based pooling of physical storage resources

Solution Areas – Availability

Availability is probably one of the first storage properties that pops to mind for the average IT professional when you think about storage. RAID level and looking at the fault domain within an array (such as shelf/cage/magazine availability) are simple concepts. But those are pre-SDS concepts that force VMs to inherit the underlying datastore and physical storage characteristics. The LUN-centric model is an operational nightmare and the old way of attempting to meet business requirements.

If you are a vSphere administrator then technologies such as VAAI, storage IO control, storage DRS, and storage vMotion are tools in your toolbox to enable meeting application availability and performance requirements. Those tools are there today for you to take advantage of, but were only the first steps VMware took to provide a robust storage platform for vSphere. You also need to fully understand the fault domains for your storage.

Take into account node failures, disk failures, network failures, and storage processor failures. You can be assured that at some point you will have a failure and your design must accommodate it while maintaining SLAs. SDS allows the defining of fault domains on a per-VM basis. Policy based management is what makes VM-centric solutions possible.

Instead of having to define characteristics at the hardware level, you can define them in software. VM storage profiles (available today) are an example of a VM-centric QoS capability, but they are not widely used. Think about how you scale a solution and what it costs. Cost constraints are huge, and limit selection. Almost nobody has an unlimited budget, so you need to carefully weigh initial capital costs as well as future expansion and operational costs.

Solution Areas – Management

Agility and simplified management are a hallmark of SDS, enabling easy management of large scale-out solutions. The more complex a solution is, the more costly it will be over the long term to maintain. In each release of vSphere VMware has been introducing building blocks for simplified storage management.

The presenters polled the audience and asked how many were using VASA. Only a couple of people raised their hands. They acknowledged that VASA has not seen wide adoption. They then showed VMware’s progression from a basic set of tools (e.g. VASA 1.0), to the upcoming VSAN product (VASA 1.5), to the radically new storage model of vVols (VASA 2.0). No release date for vVols was mentioned, but I would hope they come to fruition in the next major vSphere release. VSAN is a major progression in the SDS road map, and should be GA in 1H 2014.


The speakers ran through the VSAN VM provisioning process, and highlighted the simple interface and the ability to define on a per-VM level the availability, performance and recoverability characteristics you require. As stated earlier, we are now at the stage where we can provide VM-centric, not datastore or LUN centric, solutions. Each VM maintains its own unique policies in the clustered VSAN datastore.
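
To make the per-VM policy idea a bit more concrete, here is a minimal Python sketch of the placement math behind such a policy. The capability names mirror the ones VSAN exposes (number of failures to tolerate, stripe width), but the classes and helpers are purely illustrative and not VMware code.

```python
# Illustrative only: models how a per-VM storage policy might drive placement math.
# The capability names echo VSAN's (failures to tolerate, stripe width), but these
# classes and helpers are hypothetical, not a VMware API.
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    name: str
    failures_to_tolerate: int   # host/disk failures the VM's objects must survive
    stripe_width: int           # disks each object is striped across
    ssd_read_cache_pct: int     # % of the VMDK reserved in SSD read cache

def replicas_required(policy: StoragePolicy) -> int:
    # Tolerating N failures requires N + 1 copies of the data.
    return policy.failures_to_tolerate + 1

def hosts_required(policy: StoragePolicy) -> int:
    # Mirrored copies plus witnesses land in separate fault domains: 2N + 1 hosts.
    return 2 * policy.failures_to_tolerate + 1

gold = StoragePolicy("gold-db", failures_to_tolerate=2, stripe_width=2, ssd_read_cache_pct=10)
print(f"{gold.name}: {replicas_required(gold)} replicas across at least {hosts_required(gold)} hosts")
```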

Management is not just about storage, but about the entire cloud. Think about cloud service provisioning which is policy-based management for compute, networking and storage resources. Too many options can become complex and difficult to manage. Personally, I think VMware still has room for improvement in this area. VSAN, Virsto, vVOLS, plus the myriad of third-party SDS solutions like PernixData, give customers a lot of options but can also be confusing.

Solution Areas – Performance

Clearly storage performance is a big concern, and probably the most common reason for slow application performance in a virtualized environment. Be it VDI, databases, or any other application, the key performance indicators are IOPS, latency and throughput. Applications have widely varying characteristics, and understanding them is critical to matching up technologies with applications. For example, is your workload read or write intensive? What is the working set size of the data? Are the IOs random or sequential? Do you have bursty activity like VDI boot storms?

With VMware VSAN you can reserve SSD cache on a per-VM basis and tune the cache segment size to match that of the workload. These parameters are defined at the VM layer, not a lower layer, so they are matched to the specific VM workload at hand. VMware has recently introduced new technologies such as Virsto and Flash Read Cache to help address storage performance pain points. Virsto helps address the IO blender effect by serializing writes to the back-end storage, and removes the performance penalty of snapshots, among other features. The VMware VSAN solution is a scale-out solution which lets you add compute and storage nodes in blocks. There were several sessions at VMworld on VSAN, so I won’t go into more detail here.
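
Since working set size kept coming up, here is a back-of-the-envelope sketch of why it matters when sizing a per-VM read cache. The latency figures and the uniform-access assumption are mine, not the presenters’, so treat it as illustrative arithmetic only.

```python
# Back-of-the-envelope only: rough effect of a per-VM SSD read cache on average
# read latency, assuming a uniformly accessed working set. The latency numbers
# are made-up placeholders, not measurements or VMware sizing guidance.
def avg_read_latency_ms(working_set_gb: float, cache_gb: float,
                        ssd_ms: float = 0.2, hdd_ms: float = 8.0) -> float:
    hit_ratio = min(1.0, cache_gb / working_set_gb)   # crude uniform-access model
    return hit_ratio * ssd_ms + (1.0 - hit_ratio) * hdd_ms

for cache_gb in (10, 25, 50, 100):
    print(f"{cache_gb:>3} GB cache for a 100 GB working set -> "
          f"~{avg_read_latency_ms(100, cache_gb):.1f} ms average read latency")
```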

Solution Area – Disaster Recovery

Disaster recovery is extremely important to most businesses, but is often complex to configure, test, and maintain. Solutions like SRM, which use array-based replication, are not very granular. All VMs on a particular datastore have the same recovery profile. This LUN-centric method is not flexible, and complex to manage. In contrast, future solutions based on vVOLS or other technologies enable VM-level recovery profile assignment. Technologies such as VMware NSX could enable pre-provisioning of entire networks at a DR site, to exactly match those of the production site. The combination of NSX and VM-level recovery profiles will truly revolutionize how you do DR and disaster avoidance.


Solution Area – Security

Security should be of concern in a virtual environment. One often overlooked area is security starting at the platform level by using a TPM (trusted platform module). TPM enables trusted and measured booting of ESXi. Third party solutions such as Hytrust can provide an intuitive interface to platform security and validate that ESXi servers only boot using known binaries and trusted hardware.

I make it a standard practice to always order a TPM module for every server, as they only cost a few dollars. How does this relate to SDS? Well, if you use VSAN or other scale-out storage solutions, then you can use the TPM to ensure the platform security of all unified compute and storage blocks. On the policy side, think about defining security options, such as encryption, on a per-VM basis when using vVols. The speakers recommended that if you work on air-gapped networks, then looking at fully converged solutions such as Nutanix or SimpliVity can increase security and simplify management.


Example Scenario

At the end of this session Wade and Rawlinson quickly went through a sample SDS design scenario. In this scenario they have a rapidly growing software company, PunchingClouds Inc. It has different application tiers, some regulatory compliance requirements, and is short-staffed with a single storage admin.


The current storage design looks like the classic Fibre Channel SAN with redundant components. The administrator has to manage VMs at the LUN/datastore level.


At this point you need to do a full assessment of the environment. Specifications such as capacity, I/O profiles, SLAs, budget and a number of other factors need to be thoroughly documented and agreed upon by the stakeholders. Do you have databases that need high I/O? Or VDI workloads with high write/read ratios? What backup solution are they currently using?


After assessing the environment you need to work with the project stakeholders and define the business requirements and constraints. Do you need charge back? Is cost the primary constraint? Can you hire more staff to manage the solution? How much of the existing storage infrastructure must you re-use? All of these questions and more need to be thoroughly vetted.


After a thorough evaluation of all available storage options, they came up with a solution design consisting of a policy-based management framework using two isolated VSAN data tiers, while also incorporating the existing Fibre Channel storage array.


Summary

SDS offers a plethora of new ways to tackle difficult application and business requirements. There are several VMware and third-party solutions on the market, with many more on the horizon. In order to select the proper technologies, you need a methodical and repeatable process, “The VCDX Way”, to act as your guide along the SDS path. Don’t just run to the nearest and shiniest product on the market and hope that it works. That’s not how an enterprise architect should approach the problem, and your customers deserve the best-matched solution possible so that you become a trusted solution provider solving business-critical needs.

VMworld 2013: Virtualizing HA SQL Servers

Twitter #VAPP5932; Presenter: Scott Salyer (VMware)

This session was focused on the various Microsoft SQL server high availability options and how they mesh with vSphere and its HA options. Unlike Exchange 2013, SQL 2012 has several HA architectures to choose from. Some applications may only support one or two SQL HA models, so don’t jump on a particular bandwagon without doing a requirements analysis and product compatibility research. For example, only recently have applications started to support SQL 2012 AlwaysOn AGs, and even then, they may not support them using SSL encryption. Also, don’t just build a SQL cluster for the hell of it. Carefully consider your requirements, since clustering is somewhat complex. Do you really need 99.999% availability?  Do you have the skillset to manage it?

Finally, some DBAs may be stuck in a rut and think that physical SQL clusters are better than virtualizing them. With today’s hypervisors and best practices, there’s no reason why tier-1 SQL databases can’t be fully virtualized. However, that requires careful planning, sizing, and following best practices. Don’t assume that SQL will run inherently slower on vSphere, because it’s not vSphere that may be impacting performance; it’s one or more subsystems that were not properly configured or tuned. As we move towards the fully software-defined datacenter (SDDC), virtualizing all workloads is important to realizing all of the benefits of moving away from physical instances of services.

Agenda

  • Why virtualize
  • Causes of downtime and planning
  • Baseline HA
  • AlwaysOn AGs
  • SQL Server failover cluster
  • Rolling Upgrades
  • DR and Backup

 

Causes of Downtime

  • Planned downtime – Software upgrade, HW/BIOS upgrades
  • Unplanned downtime – Datacenter failure, server failure, I/O failure, software data corruption, user error

Native Availability Features

  • Failover Clustering – Local redundancy, instance failover, zero data loss. Requires RDMs; can’t use VMDKs. Not the current preferred option.
  • Database mirroring – Local server and storage redundancy, DR, DB failover, zero data loss with high safety mode
  • Log Shipping – Multiple DR sites, manual failover required, app/user error recovery
  • AlwaysOn – New in 2012, multiple secondary copies
  • DB mirroring, log shipping and AlwaysOn are fully supported by HA, DRS

Planning a Strategy

  • Requirements – RTO and RPOs
  • Evaluating a technology
  • What’s the cost for implementing and expertise?
  • What’s the downtime potential?
  • What’s the data loss exposure?

VMware Availability Features

  • HA protects against host or OS failure
  • What is your SLA for hardware failures? Re-host your cluster on VMware for faster node recovery
  • VM Mobility (DRS) – Valid for all SQL HA options except failover clustering
  • Storage vMotion

20130828_124317

20130828_124229

AlwaysOn High Availability

  • No shared storage required
  • Database replication over IP
  • Leverage ALL vSphere HA features, including DRS and HA
  • Readable secondary
  • Compatible with SRM
  • Protects against HW, SW and DB corruption
  • Compatible with FC, iSCSI, NFS, FCoE
  • RTO in a few seconds

Deploying AlwaysOn

  • Ensure disks are eager-zeroed thick disks
  • Create DRS anti-affinity rules to avoid running replica VMs on the same host (see the sketch after this list)
  • Create Windows Failover cluster – use node and file share majority
  • Create AG for database
  • Create database listener for the AG
  • Monitor AG on the SQL dashboard
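
The anti-affinity step lends itself to scripting. Below is a rough, untested pyVmomi sketch, assuming hypothetical vCenter, cluster and VM names; adapt the lookups and credentials to your own environment before trusting it.

```python
# Rough, untested sketch: add a DRS anti-affinity rule so two AlwaysOn replicas
# never run on the same host. Assumes pyVmomi is installed and that the vCenter
# hostname, credentials, cluster and VM names below (all hypothetical) exist.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_by_name(content, vimtype, name):
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

ctx = ssl._create_unverified_context()   # lab use only; don't skip cert checks in prod
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
content = si.RetrieveContent()

cluster = find_by_name(content, vim.ClusterComputeResource, "SQL-Cluster")
replicas = [find_by_name(content, vim.VirtualMachine, n) for n in ("sql-ag-01", "sql-ag-02")]

rule = vim.cluster.AntiAffinityRuleSpec(name="sql-ag-separate-hosts",
                                        enabled=True, mandatory=True, vm=replicas)
spec = vim.cluster.ConfigSpecEx(rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])
cluster.ReconfigureComputeResource_Task(spec, modify=True)   # returns a Task object
Disconnect(si)
```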

SQL Server Failover Clustering

  • Provides application high-availability through a shared disk architecture
  • Must use RDMs and FC or iSCSI
  • No protection from database corruption
  • Good for supporting legacy apps that are not mirror-aware
  • DRS and vMotion are not available
  • KB article 1037959 for support matrix

Rolling Upgrades

  • Build up a standby SQL server, patch, then move DBs to standby server and change VM name/IP to production name
  • Think about using vCenter orchestrator for automating the rolling patch upgrade process

Disaster Recovery and Backup

  • VMware vCenter SRM
  • Use AlwaysOn to provide local recovery
  • Use SRM to replicate to a recovery site
  • Backup – In guest backup can increase CPU utilization

20130827_172303

VMworld 2013: Exchange on VMware Best Practices

Twitter: #VAPP5613, Alex Fontana (VMware)

This session was skillfully presented and was jam packed with Exchange on VMware best practices for architects and Exchange administrators. Can you use Exchange VMDKs on NFS storage? Can you use vSphere HA and DRS? How can you avoid DAG failover with vMotion? What’s the number one cause of Exchange performance problems? All of these questions and more were answered in this session. If you just think a “click next” install of Exchange is adequate for an enterprise deployment then you need to find a new job. Period.

Agenda

  • Exchange on VMware vSphere overview
  • VMware vSphere Best Practices
  • Availability and Recovery Options
  • Q&A

Continued Trend Towards Virtualization

  • Move to 64-bit architecture
  • 2013 has 50% I/O reduction from 2010
  • Rewritten store process
  • Full virtualization support at RTM for Exchange 2013

Support Considerations

  • You can virtualize all roles
  • You can use DAGs and vSphere HA and vMotion
  • Fibre Channel, FCoE and iSCSI (native and in-guest)
  • What is NOT supported? VMDKs on NFS, thin disks, VM snapshots

Best Practices for vCPUs

  • CPU over-commitment is possible and supported but approach conservatively
  • Enable hyper-threading at the host level and VM (HT sharing: Any)
  • Enable non-uniform memory access. Exchange is not NUMA-aware but ESXi is and will schedule SMP VM vCPUs onto a single NUMA node
  • Size the VM to fit within a NUMA node – E.g. if the NUMA node is 8 cores, keep the VM at or less than 8 vCPUs
  • Use vSockets to assign vCPUs and leave “cores per socket” at 1
  • What about vNUMA in vSphere 5.0? Does not apply to Exchange since it is not NUMA aware

CPU Over-Commitment

  • Allocating 2 vCPUs to every physical core is supported, but don’t do it. Keep 1:1 until a steady workload is achieved
  • 1 physical core = 2400 Megacycles = 375 users at 100% utilization
  • 2 vCPU VM to 1 core = 1200 megacycles per VM = 187 users per VM @ 100% utilization
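
For what it’s worth, this sizing guidance is simple arithmetic. The sketch below just reproduces the session’s numbers (2,400 megacycles and 375 users per physical core) and adds a trivial check against the NUMA-node guidance from the previous list; everything beyond those figures is illustrative.

```python
# Reproduces the session's rule of thumb: one physical core ~= 2,400 megacycles
# ~= 375 Exchange users at 100% utilization. Anything beyond those two figures
# is illustrative.
MEGACYCLES_PER_CORE = 2400.0
USERS_PER_CORE = 375.0

def users_supported(megacycles: float) -> float:
    return megacycles / MEGACYCLES_PER_CORE * USERS_PER_CORE

def fits_numa_node(vcpus: int, cores_per_node: int = 8) -> bool:
    # Keep the VM at or below the NUMA node size, e.g. <= 8 vCPUs on an 8-core node.
    return vcpus <= cores_per_node

print(users_supported(2400))    # 375.0 -> one dedicated physical core
print(users_supported(1200))    # 187.5 -> the 2:1 over-committed case from the notes
print(fits_numa_node(vcpus=8))  # True  -> sized to the 8-core NUMA node example
```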

Best Practices for Virtual Memory

  • No memory over-commitment. None. Zero.
  • Do not disable the balloon driver
  • If you can’t guarantee memory then use reservations

Storage Best Practices

  • Use multiple vSCSI adapters
  • Use eager-zeroed thick virtual disks
  • Use 64KB allocation unit size when formatting NTFS
  • Follow storage vendor recommendations for path policy
  • Set power policy to high performance
  • Don’t confuse DAG and MSCS when it comes to storage requirements
  • Microsoft does NOT support VMDKs on NFS storage for any Exchange data including OS and binaries. See their full virtualization support statement here.

Why multiple vSCSI adapters?

  • Avoid inducing queue depth saturation within the guest OS
  • Queue depth is 32 for LSI, 64 for PVSCSI
  • Add all four SCSI controllers to the VM
  • Spread disks across all four controllers
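
The “spread disks across all four controllers” advice is easy to encode. Here is a small illustrative sketch (not VMware tooling) that round-robins a VM’s data disks across four vSCSI buses so no single adapter queue gets saturated.

```python
# Illustrative only: round-robin a VM's data disks across four vSCSI controllers
# so no single adapter queue (32 for LSI Logic, 64 for PVSCSI) becomes the bottleneck.
from collections import defaultdict

def spread_disks(disks, controllers=4):
    layout = defaultdict(list)
    for i, disk in enumerate(disks):
        bus = i % controllers                  # SCSI bus 0..3
        unit = len(layout[bus])                # next free unit on that bus
        layout[bus].append((f"SCSI({bus}:{unit})", disk))   # real configs skip unit 7
    return layout

disks = [f"db-data-{n}.vmdk" for n in range(8)] + ["db-logs.vmdk", "tempdb.vmdk"]
for bus, assignments in sorted(spread_disks(disks).items()):
    for address, disk in assignments:
        print(address, disk)
```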

The presenters showed test results comparing one vSCSI adapter versus four. When using just one adapter the performance was unacceptable, and the database was stalling. By simply changing the distribution of the VMDKs across multiple vSCSI adapters, performance vastly increased and there were no stalls.


When to use RDMs?

  • Don’t do RDMs – no performance gain
  • Capacity is not a problem with vSphere 5.5 – 62TB VMDKs
  • Backup solution may require RDMs if hardware array snapshots needed for VSS
  • Consider – Large Exchange deployments may use a lot of LUNs and ESXi hosts are limited to 255 LUNs (per cluster effectively)

What about NFS and In-Guest iSCSI?

  • NFS – Explicitly not supported for Exchange data by Microsoft
  • In-guest iSCSI – Supported for DAG storage

Networking Best Practices

  • Configure vMotion to use multiple NICs
  • Use VMXNET3 NIC
  • Allocate multiple NICs to participate in the DAG
  • Can use standard or distributed virtual switch

Avoid Database Failover during vSphere Motion

  • Enable jumbo frames on all vmkernel ports to reduce frames generated – helped A LOT
  • Modify cluster heartbeat setting to 2000ms (samesubnetdelay)
  • Always dedicate vSphere vMotion interfaces

High Availability with vSphere HA

  • App HA in vSphere 5.5 can monitor/restart Exchange services
  • vSphere HA allows the DAG to maintain protection after a host failure
  • Supports vSphere vMotion and DRS

DAG Recommendations

  • One DAG member per host; if you have multiple DAGs, members of different DAGs can be co-located on the same host
  • Create an anti-affinity rule for each DAG
  • Enable DRS fully automated mode
  • HA will evaluate DRS rules in vSphere 5.5

vCenter Site Recovery Manager + DAG

  • Fully supported
  • Showed a scripted workflow that fails over the DAG

And finally, the session wrapped up with a slide of key takeaways.


VMworld 2013: Distributed Switch Deep Dive

Twitter: #VSVC4699, Jason Nash (Varrow)

Jason Nash is always a good speaker, and keeps the presentations interesting with live demos instead of death by PowerPoint. This was a repeat session from last year, with a few new vSphere 5.5 networking enhancements sprinkled in. vSphere 5.5 does not have any major new networking features (NSX is a totally different product), but as you will see from the notes it gets some “enhancements”. This session does not cover NSX at all; it is just about the vSphere Distributed Switch. I always try and attend a session by Jason each year, and in the past he’s had Nexus 1000v sessions which I found very helpful for real-world deployment.

Standard vSwitches

  • They are not all bad
  • Easy to troubleshoot
  • Not many advanced features
  • Not much development going into them

Why bother with the VDS?

  • Easier to administer for medium to large environments
  • New features: NIOC, port mirroring, NetFlow, security (private VLANs), ingress and egress traffic shaping, LACP

Compared to Others?

  • VDS (vSphere Distributed Switch)
  • Cisco Nexus 1000v
  • IBM 5000v (little usage)
  • VDS competes very well in all areas
  • Significant advancements in 5.1 and minor updates in 5.5

vSphere 5.5 New Features

  • Enhanced LACP – Multiple LAGs per ESXi host
  • Enhanced SR-IOV – Most of the software stack is now bypassed
  • Support for 40g Ethernet
  • DSCP Marking (QoS)
  • Host level packet capture
  • Basic ACLs in the VDS
  • pktcap

Why should you deploy it?

  • Innovative features: Network I/O control, load-based teaming
  • Low complexity
  • Included in Enterprise Plus licensing
  • No special hardware required
  • Bit of a learning curve, but not much

Architecture

  • VDS architecture has two main components
  • The management/control plane is integrated into vCenter
  • Data plane is made up of hidden vSwitches on the vSphere host
  • Can use physical or virtual vCenters
  • vCenter is key and holds the configuration

Traffic Separation with VDS

  • A single VDS can only have one uplink configuration
  • Two options: Active/Standby/Unused or multiple VDS
  • Usually prefer a single VDS
  • Kendrickcoleman.com

Lab Walk Through

  • If using LACP/LAG, make sure one side is active, one is passive
  • LACP/LAG hashing algorithms must match on BOTH sides otherwise weird problems can happen
  • When using LAG groups, the end state must have all NICs active (can’t use active/standby)
  • Private VLAN config requires physical switch configuration and support
  • Netflow switch IP is just the IP address shown in the logs to correlate the data to a switch. The traffic will not be coming from that IP.
  • Encapsulated remote mirroring (L3) source is the most common spanning config
  • Switch health check runs once per minute – Checks things such as jumbo frames and switch VLAN configuration
  • Don’t use ephemeral binding if you want to track net stats (could be used for VDI)
  • Use static port binding for most server workloads

VMworld 2013: vSphere 5.5 Web Client Walkthrough

Twitter: #vsvc5436; Ammet Jani (VMware), Justin King (VMware)

This was a great session by Justin King where he conveyed a logical and compelling story why users should migrate to the web client for managing their vSphere infrastructure. Yes, vSphere 5.5 is REALLY the last version to have a Windows C# client. In vSphere v.Next, it shall go the way of the dodo bird. The tweaks in the vSphere 5.5 web client should ease some of the pain points in 5.1, such as slow context menus. Bottom line is: Start learning the web client. Do I hear you asking..what about VUM and SRM in the web client? Those questions are answered in my session notes. Oh and using Linux and want to access the web client? That little nugget is below as well.

Agenda

  • Where the desktop client fell short
  • New face of vSphere administration
  • Multi tiered architecture
  • workflows
  • vSphere web client plug-ins
  • SDK
  • Summary

Web Client

  • Last client release for VI Client (5.5)
  • Why did VMware keep it around? VUM and Host Client
  • There will be a VUM successor that will have a full web interface

Where the Desktop Client Fell Short

  • Single Platform (Windows) – Customers really want Mac access
  • Scalability Limits – Can become very slow in large environments
  • Inconsistent look and feel across VMware solutions
  • Workflow lock – No “pause and resume” (Tivo-like) functionality as in the web client
  • Upgrades – Client is huge, and requires constant upgrades for new releases

Enhanced vSphere Web Client

  • Primary client for administering vSphere 5.1 and later
  • All new 5.1 and later features are web client only
  • In vSphere 5.5 all desktop functionality is in the web client
  • Browser based (IE, FF, Chrome)
  • If you use Linux, check out Chromium which has built-in Flash support. Not officially supported by VMware, but give it a whirl.

Multi-Tiered Architecture

  • Inventory service obtains optimized data live from the vCenter server
  • Web server and vCenter components
  • VI client: 100 sessions = 50% CPU
  • Web client: 200 connections = 25% CPU

vSphere Web Client – Availability

  • A single instance of the vSphere web client can be seen as a single point of failure
  • Make vSphere web client highly available
  • Run web client in a separate VM with HA enabled

Workflows

  • Shows how the web client shows relationships and not the legacy hierarchy view
  • No more scrolling through a long row of tabs
  • Right clicking on objects is now faster in vSphere 5.5 (unlike vSphere 5.1)
  • “Work in progress” state is a paused task in case you find you need to perform another action during a wizard
  • Search is drastically improved – saved searches
  • Tag – Can apply to any object and searchable
  • Tags are stored in the inventory service file system, NOT in the vCenter database
  • Objects can have multiple tags

Web Client Plug-Ins

  • vcOPS
  • vSphere Data Protection
  • Horizon
  • VUM to scan, create baseline, compliance, etc. – Cannot patch
  • No SRM plug-in support
  • HP, EMC, Dell, Cisco, VCE, etc. all have plug-ins
  • Log browser viewer is built-in – Rich user interface for search

VMworld 2013: General Session Day 2

Today is the second full day of VMworld 2013, and the second keynote of the week. To start off the 0900 keynote Carl Eschenbach took the stage. A few minutes into the presentation they brought out Kit Colbert, a VMware engineer.

Background

  • Business relies on IT
  • Focus on innovation
  • Increasing velocity in IT
  • Deliver IT-as-a-Service – Bringing to life at VMworld 2013

Three Imperatives

  • Must virtualize all of IT
  • IT management gives way to automation
  • Compatible hybrid cloud will be ubiquitous

Architectural Foundation – Software defined Datacenter (SDDC)

vCloud Automation Center

  • IT-as-a-Service
  • Service Catalog for multiple types of services
  • Hybrid cloud support
  • Breaks down costs for an app into OS licensing cost, labor, etc.
  • Shows the ability to configure autoscale for an application
  • Rolls up application health into the portal
  • Application owner can self-service and provision applications either on-prem or in the cloud

vCloud Application Director

  • Creates an execution plan that understands dependencies of VMs
  • Integrates with existing automation tools like Puppet
  • Provisions a multi-tier application
  • This is not a vApp – it’s a full application deployment solution
  • Takes care of infrastructure configuration
  • Decouples the application from the infrastructure configuration

Networking with NSX

  • L2 switching, L3 routing, firewall, load balancing is built-in
  • When provisioning an app, it deploys L2-L7 services along with it
  • Moving the switching intelligence to the hypervisor
  • Routes on the existing physical network without changes
  • Moves routing into the hypervisor – no more hair pinning for VMs talking to each other on different subnets
  • Router is no longer a choke point on the network
  • Up to 70% of traffic in a datacenter is between VMs
  • Moves firewall intelligence into the hypervisor – Can enforce security at the VM layer
  • Ability to provision networking config in minutes
  • Showed off vMotioning a VM to an NSX switch with zero downtime

NSX Delivers

  • Speed and efficiency
  • Same operating model as compute virtualization
  • Extends value of existing network infrastructure

VMware VSAN

  • Allows you to attach a storage performance policy to a VM and it follows the VM across datastores
  • Enables you to dynamically extend VSAN datastore space without downtime
  • Ability to define a policy that requires 2 copies of VM data, for example
  • Auto re-builds any failed disks, seamlessly, and without the VM being aware a failure occurred

IT Management

  • Introducing policy based automation
  • Shows off vCloud Director with auto-scaling out configured and automated
  • Proactive response
  • Intelligent analytics
  • Visibility into application health for the app owner
  • vCOPS can pull in data from partners (HP, NetApp, EMC, etc.) and make intelligent recommendations for performance remediation

Big Data Analytics

  • VMware is shipping Log Insight for IT analytics
  • Log Insight can sift through millions and millions of data points

Hybrid Cloud

  • vSphere Web Client 5.5 has a button for the VMware Public Cloud
  • Seamless view into vCloud Hybrid Service (e.g. looking at VM templates)

VMworld: What’s new in vSphere 5.5 Storage

Twitter: #VSVC5005; Kyle Gleed, VMware; Cormac Hogan, VMware

This session was a bit of a bust. For the first 20 minutes storage wasn’t even mentioned; it was a recap of vSphere 5.5 platform features. The next 20 minutes were a super high-level storage feature overview, and the session ended 20 minutes early. It really didn’t say much more than the keynote sessions. The session title was misleading and I would have skipped it if I had known the agenda. But for what it’s worth, here are my session notes.

Agenda

  • vSphere 5.5 Platform Features
  • vCenter 5.5 Server Features
  • vSphere 5.5 Storage Features

vSphere 5.5 Platform Features

  • Scalability – Doubled several config maximums, HW version 10
  • Hardware version 10: LSI SAS for Solaris 11, new SATA controller, AHCI support, support latest CPU architectures
  • vGPU Support: Expanded to support AMD (in addition to NVIDIA). vMotion between GPU vendors
  • Hot-Pluggable SSD PCIe Devices – Supports orderly and surprise hot-plug operations
  • Reliable Memory – Runs ESXi kernel in the more reliable memory areas (as surfaced by the HW server vendor)
  • CPU C-States – Deep C-states in default balanced policy;

vCenter Server Features

  • Completely new SSO service
  • Supports one-way, and two-way trusts
  • Built-in HA (multi-master)
  • Continued support for local authentication (in all scenarios)
  • No database needed
  • Web client: Supports OS X (VM console, OVF templates, attach client devices)

vCenter Application HA

  • Protects apps running inside the VM
  • Automates recovery from host failure, guest OS crash, app failure
  • Supports: Tomcat 6/7; IIS 6.0-8.0; SQL 2005-2012, and others
  • HA is now aware of DRS affinity rules

Storage

  • 62TB VMDK maximum size
  • Large VMDKs do NOT support: Online/hot extension, VSAN, FT, VI client, MBR-partitioned disks
  • MSCS: Supports 2012, iSCSI, FC, FCoE, and round-robin multipathing

PDL AutoRemove

  • PDL (permanent device loss) – based on SCSI sense codes
  • PDL AutoRemove removes devices in a PDL state from the host
  • I/Os are now not sent to dead devices

VAAI UNMAP

  • New simpler VAAI/UNMAP command via ESXCLI
  • Still not automated (maybe in the future)

VMFS Heap Improvements

  • In the past there were issues with more than 30TB of open storage per ESXi host
  • Can now address the full 64TB of a VMFS

vSphere Flash Read Cache

  • Read-only cache, write through
  • Pool resources, then carve up on a per-VM basis
  • Only one flash resource per vSphere host
  • New Filesystem called VFFS
  • Can also be used for host swap cache
  • On a per-VM basis you configure cache reservation and block size

VSAN

  • Policy driven per-VM SLA
  • vSphere & vCenter Integration
  • Scale-out storage
  • Built-in resiliency
  • SSD caching
  • converged compute & storage

VMworld 2013: vMotion over the WAN (Futures)

This was a pretty short (30 minute) session on possible futures of vMotion, which focused on vmotion between datacenters and the cloud. To be clear, these features are not in vSphere 5.5, and may never see the light of day. But maybe in vSphere 6.0 or beyond we will see them in some form or shape. Some of the advanced scenarios that could be enabled with these technologies are live disaster avoidance, active/active datacenters, and follow the sun workloads. Integration with SRM and NSX are particularly cool, and automate tasks such as pre-configuring the network parameters for an entire datacenter. Or how about vMotioning live workloads to the cloud?

vMotion Recent Past

  • 5.0: Multi-NIC vmotion; Stun during page send (for VMs that dirty pages at a high rate)
  • 5.1: vMotion without shared storage

vMotion Demo

In this demo VMware showed a VM being vMotioned from Palo Alto to Bangalore. The migration of the VM took about 3 minutes and featured a bunch of forward-looking technology including:

  • Cross-vCenter migration
  • L3 routing of vMotion traffic
  • Cross vSwitch VM migration
  • VM history and task history are preserved and migrated to target vCenter

Futures Highlights:

  • vMotion across vCenters (LAN or WAN)
  • vMotion across vSwitches (standard or distributed)
  • Long Distance vMotion (think 5,000 miles or more and 200+ms latency)

Cross-vCenter Details

  • vMotion now allows you to pick a new vSwitch during the migration process. Supports vSS and vDS.
  • You can migrate VMs between vCenters, be they LAN or WAN connected
  • VM UUID maintained
  • DRS/HA affinity rules apply and maintained during/post migration
  • VM historical data preserved (Events, Alarms, Task history)
  • Must be in the same SSO domain

Long-Distance vMotion Mechanics

  • No WAN acceleration needed
  • VM is always running, either on the source or destination (no downtime)
  • Maintain standard vMotion guarantees
  • vMotion traffic can cross L3 boundaries
  • Can configure a default gateway specifically for vMotion
  • NFC network (network file copy) lets you configure which vmkernel port it flows over (usually flows over management network)
  • Requirements: L3 connection for vMotion network, L2 connection for VM network, 250 Mbps bandwidth per vMotion (see the rough arithmetic after this list), same IP at destination
  • Future integration with NSX, for pre-vMotion network configuration
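
To put the 250 Mbps per-vMotion figure in perspective, here is some rough arithmetic for how many concurrent long-distance vMotions a WAN link could carry, and a best-case memory copy time. It ignores page dirtying, compression, and protocol overhead, so treat it as a lower bound at best.

```python
# Rough arithmetic only: ignores page dirtying, compression and protocol overhead.
REQUIRED_MBPS_PER_VMOTION = 250   # per-vMotion bandwidth figure from the session

def max_concurrent_vmotions(link_mbps: float) -> int:
    return int(link_mbps // REQUIRED_MBPS_PER_VMOTION)

def best_case_copy_minutes(vm_memory_gb: float, mbps: float = REQUIRED_MBPS_PER_VMOTION) -> float:
    return (vm_memory_gb * 8 * 1024) / mbps / 60   # GB -> megabits, then minutes

print(max_concurrent_vmotions(1000))           # 4 vMotions on a 1 Gbps WAN link
print(round(best_case_copy_minutes(16), 1))    # ~8.7 minutes for 16 GB of memory
```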

SRM Integration

  • SRM could issue long distance vMotion command to live migrate workloads to DR site
  • Orchestrate live migrations of business critical VMs in disaster avoidance scenarios
  • Integrate with NSX to pre-configure and on-demand network configs at the destination site

vMotion w/Replicated Storage

  • vMotion is very difficult over array-based replicated LUNs
  • Leverage VVol (virtual volumes) technology in the future to provide VM-level replication and consistency granularity
  • vVols would allow VMs to be replicated at the object level
  • VMware is looking at all forms for synchronous and asynchronous storage replication and will likely enable vMotion for such scenarios

Long Distance vMotion to the Hybrid Cloud

  • Support per-VM EVC mode to allow for flexible migration
  • At 10ms, 100ms, or 200ms of latency, vMotion times are the same, given the same bandwidth

VMworld 2013: Advanced VMware NSX Architecture

Twitter: #NET5716; Bruce Davie, VMware

This was by far the best session of the day. While my background is not in networking, even I got excited about what software defined networking can do for an enterprise. The session was also a fire hose of NSX advanced details, and the standby line was huge. Even if you aren’t a networking professional, this will have a big impact on how server and virtualization administrators consume network services. According to the speaker, SDN is the biggest change to networking in a generation. NSX is a shipping product, so this wasn’t some pie in the sky PowerPoint slide deck about what may be possible in the future. Per the keynote this morning, eBay, GE, and Citi are using NSX on a massive scale in production.

Why we need network virtualization

  • Provisioning is slow
  • Placement is limited
  • Mobility is limited
  • Hardware dependent
  • Operationally intensive
  • A VM is chained to the network infrastructure

Network Virtualization Abstraction Layer

  • Programmatic provisioning
  • Place any workload anywhere
  • Move any workload anywhere
  • Decoupled from hardware
  • Operationally efficient
  • eBay shrunk its network change window from 7 days to a matter of minutes (and they were highly automated to begin with)

What is network virtualization?

  • Provides full L2, L3, L4-7 Network services
  • Requirement: IP transport
  • Starting point is the virtual switch
  • NSX works on vSphere, KVM, Xen Server
  • Controller cluster maintains state
  • The NSX API is how a cloud management platform programmatically creates and manages virtual networks
  • The local vSwitches are programmed with forwarding rules to provide L2, firewall functionality, etc.
  • Packets on the wire are tunneled between hypervisors. You just need IP connectivity.
  • When you change the virtual networks, the underlying physical switches won’t know the difference.
  • NSX Gateway: Connects to physical hosts and across the WAN. It is an ISO image that can run in a VM or on bare metal
  • Big announcement: Hardware partner program

VMware NSX Controller Scale out

  • Controller Cluster
  • Logically centralized, but physically a distributed, highly available scale-out cluster of x86 servers
  • All nodes are active
  • Start out with three nodes
  • Live software upgrades – Virtual networks stay up, packets keep flowing
  • Workload sliced among nodes
  • Each logical network has a primary and backup node
  • Biggest deployment has 5 nodes and supports 5K hypervisors and 100K ports
  • Fault tolerant

Tunnels

  • STT for hypervisor to hypervisor comms
  • VXLAN for third party networking devices (chip level support)

Visibility and Virtual Networks

  • You can monitor networks via the NSX API
  • You can see health and a whole slew of state for the entire virtual network from a single point, in software
  • Hyper visibility into the network state
  • All from a single controller API
  • Can synthetically insert traffic as if the VM sent it

Hardware VTEPs

  • Benefits: Fine-grained access and connect bare metal workloads with higher performance/throughput
  • Same operational model (provisioning, monitoring) as virtual networks
  • Consistent model (HUGE) regardless of VM or non-VM workloads
  • Partners: Arista, HP, Brocade, Dell, Juniper, Cumulus Networks

Connecting the Physical to the Virtual

  • Physical switch connects to NSX controller cluster API
  • Shares VM MAC and physical MAC databases
  • No multicast requirement for underlay network
  • State sharing to avoid MAC learn flooding
  • Physical ports are treated just like virtual ports

Distributed Services

  • NSX architecture allows many services to be implemented in a fully distributed way
  • Examples include firewalls (stateful or stateless), logical routing, and load balancing
  • Scale: No central bottleneck, no hairpinning
  • Ensure all packets get appropriate services applied (e.g. firewall)
  • Distributed L3 Forwarding
  • vSwitch does 2/3 of the work – L2 and L3
  • Controller cluster calculates all needed info and pushes the config to each hypervisor host virtual switches

Connecting across the WAN

  • Option A: Map logical networks to VLANs. Manual process (creating VRFs, etc.)
  • Future: Will have a much more automated solution- NSX gateway will label MPLS packets

What’s Next?

  • Snapshot, rollback, what if testing
  • Federation Multi-DC use cases
  • Physical/Virtual Integration
  • Advanced L4-L7 services
  • Use business rules to define compliant networks (e.g. HIPAA, PCI, etc.) and make them cookie cutter

VMworld 2013: What’s new in vSphere 5.5

Twitter:#VSVC4605

This session was a fire hose of the top vSphere 5.5 features. There’s a lot that’s new in this release, and they’ve addressed many of the vSphere 5.1 SSO headaches. So if you skipped vSphere 5.1 (like I did) for production environments, then get ready for the vSphere 5.5 train and jump on board. This is a release that you won’t want to miss. Also learn why vCloud Director will be going the way of the Windows c# vSphere client (hint, think dodo bird).

Cloud Management Offerings

  • vSphere with Operations Management – New SKU in March 2013; vSOM Enterprise Plus is $4,245 per socket
  • vCloud Suite per CPU: Enterprise Plus is $11,495
  • Operations management – A large customer found 90% of VMs were over provisioned

What’s new in vSphere 5.5

Applications

  • vSphere Big Data Extensions – Optimize Hadoop workloads and extend project Serengeti
  • Pivotal and VMware vSphere – Building PaaS on-Prem
  • Latest chip set support – Intel E5 V2, Intel Atom C2000
  • OpenStack – Delivering architecture choices

Performance and Scale

  • 2x in configuration maximums
  • Up to 62TB VMDKs
  • Low-latency application configuration – 31% latency improvement
  • 320 pCPUs, 4TB RAM, 16 NUMA nodes, 4096 vCPUs
  • 4GB ESXi minimum RAM (e.g. for labs)

vSphere App HA

  • Detect and recover from application or OS failure
  • Supports most common packaged apps (Exchange, SQL, Oracle, SharePoint, etc.)
  • vCloud Extensibility – APIs and ecosystem
  • Deployed as two virtual appliances
  • Tier 1 application protection at scale

vSphere Flash Read Cache

  • Virtualized flash resource managed just like CPU and memory
  • Per-VM hypervisor based read caching using server flash
  • Compatible with vMotion, DRS and HA
  • Accelerates performance for mission critical apps by up to 2x
  • Enables efficient use of server flash in virtual environments
  • Fully transparent to VMs

vSphere Big Data Extensions

  • Elastic scaling
  • Easy to use interface
  • Enhanced HA/FT leveraging vSphere
  • Higher cluster utilization

vSphere Replication

  • Still 15 minute RPO
  • Multiple point in time copies
  • Multiple replication appliances per vCenter
  • Support storage vMotion and storage DRS

vSphere Data Protection

  • 4x greater scalability – Advanced SKU (more $$)
  • Agent-based application awareness of Exchange and SQL – Advanced SKU only (extra $$)
  • Direct recovery – can recover VMs without vCenter
  • Restore individual VMDKs
  • Can restore with a different VDP appliance
  • 6x faster recovery
  • 4x more storage efficient
  • Managed from vSphere web client

vCenter Server 5.5

  • SSO: Improved user experience. SSO no longer requires SQL database.
  • vCenter Appliance supports 500 vSphere hosts and 5000 VMs
  • vCenter Databases – Official support for database clustering – Oracle RAC, SQL cluster
  • Added support for OS X vSphere web client
  • VM console access, deploy OVF templates
  • Drag and drop

Best of the Rest

  • Hardware version 10
  • MSCS support enhancements
  • VMFS heap enhancements
  • Enhanced LACP support
  • Enhanced SR-IOV
  • QoS tagging
  • Packet capture
  • 40G support
  • Support “reliable memory”
  • Hot-plug SSD PCIe devices
  • Expanded vGPU and GP-GPU support

License SKUs

  • Enterprise: Adds big data extensions and reliable memory
  • Enterprise Plus: Flash read cache and App HA

vSphere 5.5 Support Lifecycle

  • Normal 5 year support would end 2016 (based on vSphere 5 starting in 2011)
  • Support will be extended to 2018
  • Only applies to ESXi and vCenter 5.5

Reduce Complexity

  • vCloud Director is GOING AWAY post vSphere 5.5. Functionality migrated to vCAC and the virtualization platform
  • vCloud Automation Center – vCAC
  • vCloud Director will also have an extended support period, like vSphere 5.5