CIM225: Automated Infrastructure using VMware vCenter Operations

This session covered vCenter Operations, a product that came out of an acquisition last year and has since been significantly enhanced. vCenter Operations provides operations intelligence for your virtual infrastructure. Major points from the session:

The speaker stated that 90% of performance problems with virtualized applications get blamed on the virtualization team, even though the problem typically lies elsewhere.
Infrastructure and operations teams can be at odds with each other, pointing fingers over the root cause of issues. Even with advanced and costly performance tools, it is often a user, not a tool, that first reports a problem.
Application and infrastructure changes are the number one cause of downtime. Layer 8 (humans), not hardware, causes most of your unplanned downtime.
vCenter Operations addresses three major areas: performance, capacity, and configuration.
It lets operators easily pinpoint the root causes of issues and provides detailed capacity planning (CapacityIQ is integrated).
The speaker stated that cross-silo tools (spanning network, storage, compute, and applications) are typically lacking in the enterprise, so correlations are hard to make, which in turn makes it very hard to find the true root causes of problems.
Optimizing IT is very difficult, if not impossible, due to these silos and disparate tools.
VMware’s approach is to use analytics, covered by several patents, that distill data from many sources into actionable information displayed in very eye-pleasing graphics.
The visualizations shown looked very professional, using heat maps, performance scores, and other business intelligence (BI)-inspired layouts that let you see what is going on at a glance. It’s not just a simple up/down dashboard.
The product will have progressive integration with third-party monitoring tools such as Microsoft Operations Manager, Tivoli, HP OpenView, and many others.
It features self-learning algorithms that don’t require you to manually configure thresholds; it alerts on abnormal conditions and provides technical details.
It will ship in several editions. Enterprise is the most full-featured SKU and is required to interoperate with third-party tools for managing heterogeneous systems such as Windows, Linux, and storage.
The many dashboards provide advanced BI features such as health, risk, efficiency, and performance stats. For example, they can show a ‘stressed’ VM and what is causing the stress. Heat maps let you quickly spot trends.
You can define KPIs (key performance indicators) whose “bound by” conditions identify your root causes.
It also offers guided remediation, which helps you determine the root cause, suggests ways to fix the problem, and then carries out the remediation.
It can coordinate configuration changes to the environment and then roll them back if they cause performance issues.
Smart alerts include a root cause pane to help the operator troubleshoot.
CapacityIQ has been integrated into the product.
The tool shows resource waste and ways to remediate it.
There is also a built-in orchestration/automation engine.
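The self-learning thresholds are the most algorithmic claim in the session, but the speaker did not explain how vCenter Operations actually computes them. What follows is only a minimal sketch of the general idea. It assumes a simple trailing-window baseline: a sample counts as abnormal when it falls outside the mean plus or minus k standard deviations of recent history. The class name `DynamicThreshold` and its parameters are hypothetical and are not part of any VMware API.

```python
from collections import deque
from statistics import mean, stdev


class DynamicThreshold:
    """Learns a 'normal' band for one metric from its own recent history.

    Hypothetical illustration of self-learning thresholds: a trailing-window
    mean +/- k standard deviations. This is NOT VMware's patented analytics.
    """

    def __init__(self, window=60, k=3.0, min_samples=10):
        self.history = deque(maxlen=window)  # trailing window of observations
        self.k = k                           # band width in standard deviations
        self.min_samples = min_samples       # samples required before alerting

    def observe(self, value):
        """Record a sample; return an alert dict if it falls outside the learned band."""
        alert = None
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            low, high = mu - self.k * sigma, mu + self.k * sigma
            if not (low <= value <= high):
                alert = {"value": value, "expected_low": low, "expected_high": high}
        # Every sample, normal or not, feeds the baseline, so the band adapts over time.
        self.history.append(value)
        return alert


if __name__ == "__main__":
    # Hypothetical CPU utilization samples (percent) for one VM.
    cpu = DynamicThreshold(window=60, k=3.0)
    for sample in [40, 42, 41, 39, 40, 43, 41, 40, 42, 41, 95]:
        alert = cpu.observe(sample)
        if alert:
            print("abnormal CPU reading:", alert)
```

With these sample values, only the final reading of 95 is flagged, and no one had to configure a fixed threshold for it. A production system would also need to handle seasonality (for example, a nightly backup spike), which this sketch deliberately ignores.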

Overall I was very impressed with the tool, the analytics engine, and the dashboards. In fact, I think they are more informative than, say, System Center Operations Manager, offering more “BI”-type intelligence with scorecards, risk analysis, and root cause details. It will be interesting to see what type of integration it has with SCOM and other third-party tools. This is one tool that organizations should certainly take a good look at, as I think its capabilities are pretty unique.