VMworld 2017: DR with VMware on AWS

Session: MMC2455BU, GS Khalsa

Legacy (physical) DR solutions are not adequate – Long RTOs, lots of surprises, unreliable
vSphere is an enabler for DR – consolidation, hardware independence, encapsulation (VM is a file)

Long distance DR solutions with async replication
-Active/passive
-Active/Active
-bi-directional failover
-Shared site recovery

Metro DR Solutions with sync replication
-Availability – Zero RPO/RTO
-Mobility – active/active datacenters
-Disaster avoidance

DR to the cloud with AWS
-Co-located DR costs are high
-DR to the cloud is less expensive

VMware Cloud on AWS
-Managed SDDC stack running on AWS
-Consistent operational model enables hybrid cloud
-Leverage cloud economics
-Goals of DR: Deliver as a service, build on VMware (SRM, vSphere replication, etc.)
-Working on flexible SRM pairing – Decouple on-site upgrade from VMC/AWS
-Loosening version dependencies across vCenter, SRM & vSphere Replication releases
-Working on major UI improvements – HTML5 and “clarity” UI standard
-NEW: SRM appliance based on Photon OS

GS then shows a number of video demos showing the full SRM configuration, setup, and failover process. Anyone familiar with SRM will be accustomed to the same workflow, but with a nice new coat of paint on the GUI.

 

 

 

VMworld 2017: PowerCLI What’s New

Session: SER2529BU Alan Renouf

PowerCLI Overview
-623 cmdlets and counting
-PowerCLI is 10 years old
-Name change – VMware PowerCLI
-Move-VM now includes cross vCenter vMotion
-Automate everything with VSAN
-Independent disk management cmdlets – new-vdisk, get-vdisk, copy-vdisk, move-vdisk
-VVOL replication cmdlets
-New Horizon View module
-SPBM cmdlets
-More inventory parameters
-DRS cluster groups and VM/host rule cmdlets

Install: install-module VMware.PowerCLI

Release Frequency
-Fewer features, but more often
-Less wait on bug fixes
-Focused on your input

PowerCLI 6.5.2
-New ‘inventoryLocation’ parameter – move-vm, import-vapp, new-vapp
-Mount a content library ISO with new-CDDrive
-Fixes and enhancements

Multiplatform Fling
-Photon OS, Mac OS, Linux, Docker

VMware Cloud on AWS?
-Works exactly the same as on-site vCenter

Endless Possibilities
-Content library – more cmdlets to come
-Parameter auto-complete
-vSphere REST API high-level cmdlets
-PowerShell DSC (desired state config) – Chef, Puppet, Ansible, SaltStack
-New vSphere Client and Rest API support for Onyx (automated code generator)
-PowerCLI multiplatform 6.0

Community Projects
(FREE) OpBot – Connects vCenter to slack. Download: http://try.opvizor.com/opbot
(NEW!!) PowerCLI Feature request page: https://vmwa.re/powercli

 

VMworld 2017: Architecting Horizon 7 & Apps

Session: ADV1588BU

Note: This session had a multitude of complex architecture diagrams which I did not capture. See the session slide deck, after VMworld, for all the details.

Why? – Business objectives/drivers
How? – Meet requirements
What? – Design and build
Deliver – Build and integrate
Validate – Met requirements?

Design Steps

  1. Business drivers & use case definition
  2. Services definition
  3. Architecture principles and concept
  4. Horizon 7 component design
  5. vSphere 6 design
  6. Physical environment design
  7. Services integration
  8. User experience design

Use a repeatable model when scaling up:

 

Physical Environment Considerations
-AD
-GPO
-DHCP
-Licensing

Identity Management

Profiles and User Data
-Folder redirection
-Mandatory profile
-User environment manager

AppVolumes
-AppStack replication
-Single site or multiple site
-Use writeable volumes very sparingly

VMware Horizon Apps

The speaker walks through a highly detailed reference architecture with lots of complex slides. He also covers the LoginVSI test setup, both hardware and software.

VMworld 2017: vSphere 6.5 Upgrade Customer Perspective

Session: SER2508BU

Note: The session slides have a lot more details, KB links, etc., so grab the slides if you want more details.

High Level Plan
-Enablement
-Workshop
-Test Environment
-Design
-Migration

Enablement

-Product landing page and KBs
-Product documentation
-Whitepapers such as “What’s New”
-Check the readme for bugs/issues
-Check blogs (Emad Younis)
-Hands-on labs

Workshop

-Migration timeline
-Stakeholder involvement/support
-Scope for the deployment features
-Scope for the migration environment
-Ask: Greenfield or brownfield?

Test Environment (Lab)

-Learn new features
-Test/validate features
-Determine deployment considerations
-Document your design
-Physical, nested, or home lab options
-Test plan –  PSC HA, vCenter HA, VM encryption, etc.
-Determine features to implement, feature configuration, runbook

Design

-Topology – PSC – embedded or external?
-Hardware – EVC mode, VMFS version, networking
-Document features – Predictive DRS, etc.
-Migration plan – The what, who and when (maintenance windows, etc.)
-Output: Design docs, run books, migration plan

Migration

-Use GSS – Basic, production, business critical, mission critical
-Consider VMware Professional services
-Output: Complete environment, updated design doc, updated run books, stakeholder sign-off

 

VMworld 2017: vSphere 6.5 Host Resources Deep Dive Pt. 2

Session: SER1872BU Frank Denneman, Niels Hagoort

Note: This was a highly technical session with lots of diagrams. Best bet is to get Frank and Niels’ book for all the details.

Compute Architecture: The speakers show a diagram of a two-NUMA-node server. Prior to Skylake processors, two DIMMs per memory channel was optimal. Skylake processors increase the number of memory channels and support a maximum of two DIMMs per channel.

QPI Memory performance: 75ns local latency, but 132ns latency to other NUMA node

Quad channel local memory access: 76GB/s. Remote access will be noticeably slower.
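To get a feel for what the two latency figures mean for a VM that is only partly NUMA-local, here is a rough back-of-the-envelope model. It is my own simplification (a plain weighted average), not something shown in the session; the function name and the sample locality percentages are illustrative.

```python
LOCAL_NS = 75.0    # local NUMA node latency quoted in the session
REMOTE_NS = 132.0  # latency to the other NUMA node over QPI, as quoted

def average_latency_ns(local_fraction: float) -> float:
    """Weighted average memory latency for a given share of local accesses."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS

if __name__ == "__main__":
    for pct in (100, 80, 50):
        print(f"{pct}% local -> {average_latency_ns(pct / 100):.1f} ns")
```

Even at 80% locality the blended latency is already about 15% worse than the all-local case, which is why the NUMA locality counter in esxtop is worth watching.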

vNUMA exposes the physical NUMA architecture to a VM. vNUMA ‘kicks in’ when a VM has more than 8 vCPUs and its vCPU count exceeds the core count of a physical CPU package. ESXi will then evenly split the vCPUs across the two physical CPU packages.
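The rule of thumb above can be written down as a small helper. This is only a sketch of the rule as stated in the session, not the actual ESXi scheduler logic; the function name is made up, and the handling of a vCPU count that does not divide evenly is my own assumption.

```python
def vnuma_layout(vcpus: int, cores_per_package: int, packages: int = 2):
    """Return vCPUs per virtual NUMA node, or None if vNUMA is not exposed.

    Encodes only the session's rule of thumb: more than 8 vCPUs AND more
    vCPUs than a single physical package has cores.
    """
    if vcpus <= 8 or vcpus <= cores_per_package:
        return None  # VM is small enough to stay on one package
    base, extra = divmod(vcpus, packages)
    # Assumption: any remainder is spread one vCPU at a time across nodes.
    return [base + (1 if i < extra else 0) for i in range(packages)]

if __name__ == "__main__":
    print(vnuma_layout(8, 10))    # None: not more than 8 vCPUs
    print(vnuma_layout(10, 12))   # None: fits inside one 12-core package
    print(vnuma_layout(16, 10))   # [8, 8]: split evenly over two packages
```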

If you use virtual sockets, mimic the physical CPU package layout as much as possible. This allows the OS to optimally manage memory and the cache.

“PreferHT” can be useful, see KB 2003582. This forces the NUMA scheduler to count hyperthreads as cores. Use this setting when a VM is more memory intensive vs. CPU intensive.

What if the vCPUs can fit in a socket, but VM memory cannot? numa.consolidate=FALSE can be useful.

One AHCI storage I/O needs 27K CPU cycles. If you want a VM to do 1M IOPS, you need 27GHz of CPU power.

One NVMe I/O needs 9.1K CPU cycles, roughly a third of the AHCI cost.
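The arithmetic behind those numbers is simply cycles per I/O multiplied by I/Os per second. A quick sketch using the per-I/O figures quoted in the session (the function name is my own):

```python
AHCI_CYCLES_PER_IO = 27_000  # figure quoted in the session
NVME_CYCLES_PER_IO = 9_100   # figure quoted in the session

def ghz_needed(iops: int, cycles_per_io: int) -> float:
    """CPU clock capacity (in GHz) consumed by storage I/O at a given IOPS rate."""
    return iops * cycles_per_io / 1e9

if __name__ == "__main__":
    print(ghz_needed(1_000_000, AHCI_CYCLES_PER_IO))  # 27.0
    print(ghz_needed(1_000_000, NVME_CYCLES_PER_IO))  # 9.1
```

So the same 1M IOPS target costs about 27GHz of CPU over AHCI but about 9.1GHz over NVMe.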

3D XPoint (“crosspoint”) storage can reach maximum I/O performance at a very low queue depth. This makes it quite useful as a caching tier in vSAN.

CPU Utilization vs. Latency

Workload latency sensitive? No, then tune CPU for power savings. Yes, then tune for lowest latency. SAP HANA, for example, could benefit from low latency.

Interrupt coalescing is enabled by default on all modern NICs. This can increase packet latency. You can increase ring buffers by following KB 2039495, which can help with dropped packets.

Polling vs. interrupts

A poll mode driver (DPDK) can optimize network I/O performance.

Low CPU utilization = higher latency
Higher CPU utilization = lower latency

vSphere 6.5 has vRDMA, which can significantly boost network throughput.

VMworld 2017: Virtualizing AD

Session: VIRT1374BU: Matt Liebowitz

AD Replication
-Update sequence numbers (USNs) track updates on each DC
-InvocationID – Identifies DC’s instance in the AD database
-USN + InvocationID = Replicable transaction

Why Virtualize AD?
-Fully supported by Microsoft
-AD is friendly towards virtualization (low I/O, low resource)
-Physical DCs waste resources

Common objections to virtualizing DCs
-Fear of stolen vmdk
-Privilege escalation – VC admins do not need to be domain admins and vice versa
-Must keep xx role physical – no technical or support reason. Myth
-Timekeeping is hard in VMs

Time Sync
-VM guest will get its time reset on vMotion and when resuming from suspend. If there’s an ESXi host with a bad time/date, it can cause weird “random” problems when DRS moves DCs around.
-There’s a set of ~8 advanced VMX settings to totally disable time sync from guest to ESXi host. Recommended for AD servers. See screenshot below.
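For reference, here is a small sketch that prints the eight settings as .vmx lines. The setting names are the ones I recall from VMware KB 1189 (disabling time synchronization); they were not captured in my notes from the session, so treat the list as an assumption and verify it against the current KB before using it.

```python
# Assumption: names as listed in VMware KB 1189; verify before use.
TIME_SYNC_SETTINGS = [
    "tools.syncTime",
    "time.synchronize.continue",
    "time.synchronize.restore",
    "time.synchronize.resume.disk",
    "time.synchronize.shrink",
    "time.synchronize.tools.startup",
    "time.synchronize.tools.enable",
    "time.synchronize.resume.host",
]

def vmx_lines(settings=TIME_SYNC_SETTINGS):
    """Render each setting as a .vmx line with the value "0" (disabled)."""
    return [f'{name} = "0"' for name in settings]

if __name__ == "__main__":
    print("\n".join(vmx_lines()))
```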

Virtual machine security and Encryption
-vSphere supports VMDK encryption
-Virtualization based security – WS2016 feature – supported in future vSphere version

Best Practices

Domain Controller Sizing

USN Rollback
-Happens when a DC is sent back in time (e.g. snapshot rollback)
-DCs can get orphaned if this happens since replication is broken
-If this happens, it’s a support call to MS and a very long, long process to fix it

VM Generation ID
-A way for the hypervisor to expose a 128-bit generation ID to the VM guest
-Need vSphere 5.0 U2 or later
-Active Directory tracks this number and prevents USN rollback
-Can be used for safety and VM cloning
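To make the USN rollback problem and the generation ID fix concrete, here is a toy model. It is a heavy simplification of my own and not Microsoft's implementation: the class names, the string generation IDs, and the single replication partner are all illustrative. The idea it shows is that a partner remembers the highest USN it has seen per InvocationID, so a reverted DC that reuses old USNs gets ignored unless it takes a new InvocationID.

```python
import uuid

class DC:
    """Toy domain controller: a USN counter plus an InvocationID."""
    def __init__(self, gen_id):
        self.invocation_id = uuid.uuid4()
        self.usn = 0
        self.saved_gen_id = gen_id  # generation ID last seen from the hypervisor

    def write(self):
        self.usn += 1
        return (self.invocation_id, self.usn)

    def check_generation(self, current_gen_id):
        # A changed generation ID means the VM was reverted or copied,
        # so take a fresh InvocationID before originating new changes.
        if current_gen_id != self.saved_gen_id:
            self.invocation_id = uuid.uuid4()
            self.saved_gen_id = current_gen_id

class Partner:
    """Replication partner tracking the highest USN seen per InvocationID."""
    def __init__(self):
        self.high_water = {}

    def replicate(self, change):
        inv, usn = change
        if usn <= self.high_water.get(inv, 0):
            return False  # looks already replicated, so it is skipped
        self.high_water[inv] = usn
        return True

def simulate(gen_id_aware: bool) -> bool:
    dc, partner = DC(gen_id="A"), Partner()
    for _ in range(5):
        partner.replicate(dc.write())
    dc.usn = 3                      # snapshot revert: DC goes back to USN 3
    if gen_id_aware:
        dc.check_generation("B")    # hypervisor now exposes a new generation ID
    return partner.replicate(dc.write())  # does the next change replicate?

if __name__ == "__main__":
    print(simulate(False))  # False: USN 4 was already seen, the change is lost
    print(simulate(True))   # True: new InvocationID, the change replicates
```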

Domain Controller Cloning
-Microsoft has an established process to do this, using hypervisor snapshots.
-Do NOT hot clone your DCs! Totally unsupported and will cause a huge mess.

VMworld 2017: Extreme Performance

Session: SER2724BU

The Performance Best Practices guide for vSphere 6.5 is now out. Download now!

Baseline best practices
-Use the most current release
-HW selection makes a difference
-Refer to best practice guides
-Evaluate power management
-Rightsize your workloads
-Keep hyperthreading enabled
-Use DRS to manage contention
-Do NOT use resource pools – more harm than good
-Monitor oversubscription
-Use paravirtualized drivers

Monitoring
-Compute: Contention – CPU ready, co-stop
-Memory: Oversubscription – balloon, swap
-Storage: Service time – device and kernel latency
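One practical note on CPU ready: vCenter performance charts report it as a summation in milliseconds per sample interval, so it helps to convert it to a percentage. The conversion below is the commonly used one (real-time charts use a 20 second interval); it was not shown in the session, and the function name and sample values are my own.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms) into an average percentage per vCPU."""
    if interval_s <= 0 or vcpus < 1:
        raise ValueError("interval_s must be positive and vcpus at least 1")
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)

if __name__ == "__main__":
    # 1000 ms of ready time in a 20 s real-time sample on a 1-vCPU VM
    print(cpu_ready_percent(1000))            # 5.0
    # the same summation spread over 4 vCPUs
    print(cpu_ready_percent(1000, vcpus=4))   # 1.25
```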

vNUMA
-Poor NUMA locality (N%L)
-pNUMA does not match vNUMA
-VM config should match physical topology (don’t make wide VMs)
-Don’t create a VM with a larger vCore count than pCores

Keep things up to date
-Virtual hardware can make a performance difference
-38 changes were made in vHW 11 alone
-Use latest vHW

Power Management
-New in 6.5 is %A/MPERF in esxtop to see power management. Over 100% means turbo mode.
-“Balanced” mode allows turbomode
-Always set BIOS to “os controlled”
-High performance caps turbo opportunity – good for large VMs – required for latency sensitive workloads
-“high performance mode” should be used for benchmarking since it results in the most stable results
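A quick way to read the %A/MPERF counter: it is the ratio of actual to nominal clock, so multiplying it by the rated frequency gives a rough effective clock speed. This is my own interpretation sketch, with made-up function names and sample values.

```python
def effective_ghz(nominal_ghz: float, aperf_mperf_pct: float) -> float:
    """Approximate effective clock speed from the %A/MPERF value."""
    return nominal_ghz * aperf_mperf_pct / 100.0

def in_turbo(aperf_mperf_pct: float) -> bool:
    """Over 100% means the core is running above its nominal frequency."""
    return aperf_mperf_pct > 100.0

if __name__ == "__main__":
    for pct in (80.0, 100.0, 115.0):
        print(f"{pct:.0f}% -> {effective_ghz(2.6, pct):.2f} GHz, turbo={in_turbo(pct)}")
```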

Hyper-threading
-25% more performance, approximately
-Latest processors may deliver a larger gain

VMworld 2017: vSphere SSO Architecture

Session: SER2940BU. Speakers: Emad Younis, Adam Eckerle

Embedded PSC: Totally supported for production usage. It’s not just test/dev. Use this model if you don’t need enhanced linked mode. This is a simple model, and use it if it supports your needs.

External PSC: Allows linking of vCenters via linked mode. Tags, roles, global permissions, licensing all replicate throughout the entire SSO domain. Up to 15 vCenters can point to a single PSC in 6.5 U1. Not recommended, but you can do it.

In vSphere 5.5 you can consolidate SSO domains. So consolidate BEFORE you deploy any 6.x versions. After you deploy any 6.x component, you are locked into your SSO domains. If doing this merge, make sure you un-install/remove the embedded SSO component before you upgrade to vSphere 6.x.

Within an SSO domain, you can’t mix versions of products. So if you have islands of vCenters, you may NOT want them linked together. This will require that you upgrade everything together. Very applicable to vBlock environments and their islands of vCenters.

A site is a logical grouping of PSCs. PSCs are multi-master and replicate every 30 seconds.

Recommendation: If you have multiple PSCs spread across multiple sites, you can optionally use “vdcrepadmin” to add more replication agreements. Do NOT add just for the sake of adding. Only add agreements if absolutely needed.

In vSphere 6.5 you can only repoint a vCenter to another PSC within the same site (intrasite), not across sites. Refer to “cmsso-util”. Cross-site repointing is not allowed because of the added latency and the performance issues it causes.

VMware recommends a max of 100ms between PSCs in the “same” logical site. VMware will support all PSCs in the same site, but it’s not recommended. VMware does not want vCenters talking to remote PSCs.

There’s no current method to migrate from a Windows vCenter with an external PSC to the VCSA with an embedded PSC. VMware said in the future this scenario may be possible.

You can NOT move a vCenter from one SSO domain to another (today).

Built-in SSO load balancing is possibly coming in a future vSphere release. That would remove the need for a third party LB, such as F5 or NetScaler.

If you want to deploy multiple vCenters globally, don’t do a global SSO domain. It can be a disaster. Set up regional SSO domains for best performance.

VMworld 2017: Predictive DRS Best Practices

Session: SER2849BU

Case 1: VM performance can suffer due to resource constraints/surges

Case 2: Inefficient usage of resources due to reserving capacity for peak loads.

Reactive
-Move VMs after contention occurs

Proactive
-Statically reserve more resource
-Learn workload pattern, and move before VMs spike

What is the best solution? Predictive DRS

What is Predictive DRS?
-DRS enabled with predictions
-DRS scheduling + vROps analytics

How does it work?
-Resource usage from vCenter
-vROps consumes the data
-Predictions are made
-DRS invoked to perform optimizations

vROps Dynamic Thresholds (DT)
-Sophisticated analytics – 10 algorithms
-Learns normal behavior
-Detects hourly, daily, monthly patterns
-Generates upper and lower dynamic thresholds
-Predictions are then sent to vCenter
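To illustrate the general idea of a dynamic threshold, here is a deliberately naive sketch: group historical samples by hour of day and build a band of mean plus or minus k standard deviations. This is NOT one of the vROps algorithms (those were not described in any detail); the function name, the value of k, and the sample data are all my own assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def hourly_thresholds(samples, k=2.0):
    """samples: iterable of (hour_of_day, value). Returns {hour: (lower, upper)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        mu, sigma = mean(values), pstdev(values)
        # Clamp the lower bound at zero since usage cannot be negative.
        bands[hour] = (max(0.0, mu - k * sigma), mu + k * sigma)
    return bands

if __name__ == "__main__":
    history = [(3, 10.0), (3, 14.0), (9, 78.0), (9, 82.0)]  # hypothetical CPU %
    for hour, (low, high) in sorted(hourly_thresholds(history).items()):
        print(f"{hour:02d}:00 normal range {low:.1f} to {high:.1f}")
```

A value outside the band for its hour would count as abnormal, and the band itself is the kind of per-period expectation that a prediction can be built from.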

Software Requirements
-vSphere 6.5 Enterprise Plus
-vROps 6.4 or 6.5
-Time skew between vCenter and vROps needs to be less than 5 minutes

Speaker shows a demo of a ‘follow the sun’ scenario with workloads spiking at different times on a regular pattern. pDRS learned the pattern, and vMotioned VMs to make sure VMs had enough resources. He shows a performance graph, where pDRS headed off performance issues and it resulted in consistent VM performance.

DPM with Predictions
Speaker asks audience to raise hands if anyone is using DPM. Two people raise their hands.
-Predictions can proactively power up ESXi hosts to absorb the workload demand

FAQ
-Workloads it can predict: Periodic usage pattern
-Short spikes of a few minutes will not be predicted
-The more consistent the workload, the more accurate it will be

Learning Period
-Set to 14 days by default
-The longer the period, the better the accuracy
-Predictions only happen after 14 days

Tuning
-Compute dynamic thresholds – Calculated once a day, or push a button to force a new calculation.
-Lookahead interval – Amount of time DRS looks ahead while accounting for predictions – default is 1 hour
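Here is how I picture the lookahead interval working. The assumption, which is mine and was not spelled out in the session, is that DRS sizes a VM by the larger of its current demand and any predicted demand that falls inside the lookahead window. Names and numbers are hypothetical.

```python
def demand_for_drs(current, predictions, now_min, lookahead_min=60):
    """Pick the demand value to plan with.

    predictions: list of (minute, predicted_demand) pairs.
    Assumption: use the max of current demand and predictions in the window.
    """
    upcoming = [value for minute, value in predictions
                if now_min <= minute <= now_min + lookahead_min]
    return max([current] + upcoming)

if __name__ == "__main__":
    forecast = [(30, 70), (90, 95)]  # hypothetical CPU % at minute 30 and 90
    print(demand_for_drs(20, forecast, now_min=0))                     # 70
    print(demand_for_drs(20, forecast, now_min=0, lookahead_min=120))  # 95
```

With the default one hour window the spike at minute 90 is not yet visible, which matches the note that the lookahead controls how far ahead DRS accounts for predictions.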

Identify vMotions due to Predictions
-Not a clear answer as there can be a mix of VMs with predictions and those without
-pDRS moves are only in logs

 

VMworld 2017: Day 2 Keynote

Pat Gelsinger walks on stage and welcomes Michael Dell to the stage. Pat and Michael are doing a prepared Q&A.

First question is regarding lackluster support, such as quality of people and hold times. Pat says he is disappointed in hearing such feedback, as he thinks they have good NPS scores. But Pat said they are very focused, and will have some internal followup. VMware is also introducing Skyline, and proactive support. They want to be your best technology partner.

Second question is about AI, machine learning, and future topics. Michael says we are in the most exciting times in human history. The cost of making something intelligent is almost zero. It’s game on! The enormous amount of data created by IoT is amazing. They overlay interesting computer science on top of this data, and the possibilities are endless. Dell thinks a lot about data and AI. Dell is seeing lots of new use cases. If you are not thinking about how you will be using this new data, you are doing it wrong. Pat speaks up and says that in 1984 he was architecting the 486 and thinking about how they could use it for AI back then. But real AI wasn’t possible until today’s scale of compute and storage. Michael says there is a coming boom in edge data, and new requirements in how you deal with that data.

Next question is in the area of SMB, and “don’t forget about us.” Michael says most new jobs are created in small and medium size businesses. Dell has added tens of thousands of new small and medium customers. Dell is reimagining their products to support the SMB. Pat states they have 500,000 customers and most of them are SMBs. Pat commits to make sure SMB is kept in mind.

Next question is about ecosystem, HCI, and breadth of partnerships. Michael says he is committed to a thriving ecosystem, which is a key part of VMware’s success. He says the partner ecosystem is as strong as ever. Amazing things going on with NSX. VMworld has 400+ partners here. Pat says he’s not so excited about some of the partnerships.

Synergy. Creating value together. How are Dell and VMware innovating together? Michael points to cross-selling and a deep level of technical integration between their stacks, while retaining an open ecosystem. Pat also sees strong synergy, such as VxRail. Michael says customers are doing lots of innovation around containers and new applications.

CEO of Pivotal (Rob) comes on stage.

Pivotal says they are now working with world class companies, and are in use at 50% of the Fortune 500. Michael says all customers are facing similar challenges, such as finding new value in their apps. Rob says customers have thousands of legacy applications. Pivotal has been working with containers for a very long time. Pivotal Container (with a K) Service (PKS) is a new offering! It uses Google cloud engine, NSX, and comes out of the box with full integration and Kubernetes. Sam from Google Cloud comes on stage. Sam says containers are coming at warp speed. Google has 10 years of container orchestration experience and knows how to run billions of containers. They’ve poured that into Kubernetes and GKE (Google Kubernetes Engine). Sam says customers want to run compute wherever they want. PKS is built on Cloud Foundry. Michael says customers wanted partnership and integration.

VMware CTO Ray O’Farrell comes on stage. Ray says many of the new services are delivered as SaaS. They are also aiming at developers to help make your company unique. Ray will now do demos of a variety of VMware products. Ray plays a video about a fictitious company to help illustrate problems that companies face today. They then run through a few VMware products and how they solve the customer problems. They give a demo of AppDefense and various other products.