Nutanix NPX Architecture Guide How-To (Part 2)

This post is part 2 of 2, of my NPX Architecture Guide how-to series. In this post we will cover sections 9 through 14, from the outline below. You can check out the first part of this series here. At the end I will also give you some more tips on various standard tables that I used throughout the document. 

The major sections in the architecture guide are:

  1. Overview
  2. Current State and Operational Assessment
  3. Design Overview
  4. Nutanix Capacity and Sizing
  5. Nutanix Cluster Design
  6. Host Design
  7. Network Design
  8. Storage Design
  9. Security and Compliance
  10. Management Components
  11. Virtual Machine Design
  12. Data Protection and Recoverability
  13. Datacenter Infrastructure
  14. Third-Party Integration

9.0 Security and Compliance

NPX architectuire guide security and compliance

Security is often a weak point for many architects, so be sure to not skimp on this section. If it's light on details, be prepared for defense questions. Topics to cover here include, but are not limited to: RBAC design (Prism/hypervisor/applications); SSL certificates (Prism/NGT/Hypervisor); system hardening (STIG, PCI/DSS, etc.); network security (microsegmentation, VLANs, ACLs, etc.); patching/compliance reporting; use of SSH and hardening (e.g. SSH keys); syslog configuration (TCP/UDP); PulseHD; use of two factor authentication; Nutanix password complexity settings. 

And depending on your technical and business requirements, there very well could be additional security areas you need to cover. Have you read the Nutanix security guide? Make sure every "i" is dotted and every "t" crossed. 

10.0 Management Components

NPX architecture management components

The control plane for your solution is very important. Don't make management overly complex, as one of the beauties of a Nutanix solution is simplicity. Things to consider here include: Prism configuration; Nutanix patches and AOS upgrades; how to monitor Nutanix, what about OS patches?; Hypervisor patching; what tools are you using to monitor the network?; What about monitoring the hypervisor?; You of course have VMs, so what tools are monitoring them?  

In the management arena don't forget about advanced automation tools such as Puppet, Chef, PowerShell, and Nutanix Calm. What are you using for Syslog? Splunk? Are you using Prism Central or Prism Pro? If so, why, or if not, why not? 

11.0 Virtual Machine Design

NPX Architecture Virtual Machine Design

As called out in the NPX blueprint, you must include your virtual machine design. What is that? Well it should cover topics such as VM templates, VM virtual hardware, are you making use of SCSI unmap or not? And what are the implications of using or not using unmap? What's the difference between Linux/Windows unmap? How about VM affinity or anti-affinity rules? What is the lifecycle of your VMs (cradle to grave)? Do you have monster VMs? What is your NUMA boundary and do any VMs cross it? What are the implications of NUMA? 

12.0 Data Protection and Recoverability

NPX architecture data protection and recoverability

What good is your solution if it's not protecting your business critical data, and ensure that you can recover from a disaster? Here think about covering: Backup software (Netbackup, Veeam, HYCU, etc.); Nutanix data protection (Protection domains, replication, snapshot frequency, sync/async replication, etc.); network configuration protection (e.g. nightly switch config backups); storage protection; hypervisor control plane backups; how to protect critical infrastructure services like AD/DNS; what is your VM backup frequency?; What are your operational procedures for areas such as change control, patch management, and business continuity? 

13.0 Datacenter Infrastructure

NPX architecture datacenter infrastructure

To many people the datacenter infrastructure can be a scary topic, and lightly covered in the architecture guide. But it's critical. What if you miscalulate (or don't calculate at all) the heat load for your proposed solution and it melts down in the rack or causes a fire? What does your rack elevation look like? Did you allow space in the rack for logically placing new nodes? What is the datacenter rating for the maximum heat load of an individual rack? What types of PDUs are you using and how many? Did you adjust your estimated amps usage for the powerfactor? Are you running your PDUs at more than 80%, sustained during a failure situation? What is your datacenter facility rated for in terms of downtime a year? And is that downtime planned or unplanned? How many more nodes can you add to the rack before you exceed the rated limits (cooling, power, or weight)? Is your solution going to fall through the floor because you didn't validate your assumption about maximum load rating (did you even ask?)? 

14.0 Third-Party Integration

NPX architecture third-party integration

Another scoring area for the NPX is your ability to cover third-party integrations. What does that mean? For the NPX, that's any non-Nutanix product which you include in your solution. I recommend a separate section, even if you have touched on these solutions throughout your guide. Why? Makes finding it easier, and your panelists will like that. The areas you will cover here a highly solution dependent, so you may have fewer, more, and likely different products to cover than I did. For my solution it used Splunk, NetBackup, VMware vSphere, and also VDI. 

Sample Design Decision Table

NPX architecture design decision table

Throughout your architecture guide you absolutely must thoroughly document your major design decisions. How many design decisions you will have totally depends on your solution, and how thorough you want to be. In my case I had 60 design decisions, and each one was captured using the template above. The I placed full design decision table in the appropriate major section of the guide (e.g. networking, security, etc.). At the front of my architecture guide I had another table, consisting of one line per design decision, for easy reference. 

Now this design decision table is not "perfect" and in fact I would argue needs supplementation. I could, and should, have done it better. But first let's start with what's in the able, and what I left out which I think you should have in it. First, you need to label and capture a one sentence description of the decision. For example, are you using the VMware standard switch or distributed switch? Next, every single decision has an impact. What is that impact? Describe it. Nearly every decision has a risk....what is it? And every risk needs a mitigation, so what is that? 

Now, what do I think you should include that I didn't? "Alternative design decisions". Why do I think you need alternative design decisions for nearly EVERY decision? Because you will likely be asked about it during your defense. For example, let's say Design Decision 40 was to use LACP. Ok that's fine and dandy, but what are the alternative(s) to using LACP and why didn't you use them? Or, what if you chose the NX-3060-G6 node for your baseline node type. What would be an alternative node type that could also work? These are EXACTLY the types of questions you need to be prepared for during your live defense. But thinking about them before AND documenting them in your guide, you are that much closer to successfully defending your NPX design. 

So yes, IMHO, I think every single design decision you have should be documented with: Impact, risks, risk mitigation and alternatives. 

Sample Assumptions Table

NPX architecture assumptions

You might be thinking, what's so special about an assumption table? Don't you just capture all of your assumptions and call it a day? NOPE! Epic fail! For each assumption you need to validate, if at all possible, your assumption. Document how you will validate it. 

Sample Summary Table

NPX architecture design summary table

Now this table I think is optional, but I included it for both my VCDX and NPX designs. At the end of each major tech section (e.g. 4.0 - 14.0) I have a section called "Summary and Design Decisions". The summary is the table above, which captures and quickly displays all of the referenced requirements, constraints, assumptions, risks, and design qualities covered in that major section. I think of the table as a 'double checking' that I've covered all of the requirements, constraints, assumptions and risks applicable to that major section. Is this table required? Nope. Do I like it? Yup. Should you use it? Totally up to you.

Additional Architecture Guide Tips

One of the hallmarks for a "X" level (VCDX/NPX) level architecture guide is traceability. What is that? It means for every labeled item (requirements, constraints, assumptions, risks, design decisions) it needs to be called out at least ONCE in the main body of your document. NPX examiners DO use the search functionality quite often to see, for example, if risk RS05 is actually addressed in your document. As you write your guide, and as a final QA, take an afternoon and search for every single labeled item and MAKE DARN SURE it's referenced in the body of your guide. 

Another tip that I find exceptionally helpful for starting a new architecture guide is this: First construct the outline of all the areas in the NPX (or VCDX) blueprint as major sections (e.g. network design, storage design, security and compliance, etc.). The under each major heading, just like I have, construct your sub-headings of conceptual, logical and physical. Then at the end of each major section have a design justification summary, and then your summary table and design decision tables. After you do all of this 'pre' work, you will have a nicely outlined guide that you can now start filling in the details. Easy, right? 

Applicable to VCDX?

So you may be thinking, well thanks for all of the tips for a successful NPX architecture guide, but does this apply to the VMware VCDX certification or other enterprise architect certs? And the answer is ABSOLUTELY! In fact, I used the *exact* same format for my VCDX certification and it was accepted and successfully defended on my first attempt. If it's good enough for NPX/VCDX, it is good enough for customer facing docs? The answer here is also a resounding yes. 

The tips I've provided here are for an enterprise level architecture guide, and "X" level certifications like VCDX and NPX are very similar in the skillset they attempt to asses. So can you take your NPX architecture guide, if based on vSphere, and submit it for VCDX? With only minor modifications to ensure you cover VCDX blueprint areas, the answer is yes! I did the reverse....started out with a VCDX-level design, added Nutanix blueprint areas, and submitted it for the NPX. 


As you can see through these two long blog posts, a "X" (expert) level architecture guide can be a monster. It covers a lot of areas, needs full traceability, must be concise and easy to read, organized so that the examiners can easily score it, and also be technically accurate. It also needs enough depth to be considered "X" (expert) level. Although I will emphasize that there's no minimum page count, I would find it very hard to believe that something as short as, for example, 30 pages for an enterprise architecture guide would pass muster. 

If you follow my advice in these two posts, it should get you well on your way to having a well organized, detailed, and easy to read/score architecture guide for your NPX (or VCDX) defense. 


Sample VCDX-DCV Architecture Outline

This post covers my approach to writing my VCDX-DCV Architecture Guide. I’ve been debating in my mind for a while whether I should write this post or not. I hesitated for a few reasons. First, I’m just a regular guy that happened to jump through the VCDX hoops and have no “insider” information on how they score. Those that do know the scoring rubric can’t disclose it anyway. Second, there are 1000 different ways to write your VCDX-DCV architecture document. Third, there’s no “magic template” or “sure fire” outline that ensures your design gets accepted. Do not view this post as shortcut or cheat sheet.

What matters is your content, how it aligns to the VCDX blueprint, and that you convey expert level knowledge to the reader. It’s NOT about speeds and feeds, but rather the full traceability of customer requirements, constraints,  assumptions and risks throughout your design. Who cares if you’ve thrown every VMware product and feature at a solution if you haven’t met the business requirements? #Fail

So why did I publish this article? I know when I started the VCDX process it was a bit daunting to read the DCV blueprint and try to come up with an architecture guide that hit all the areas in a logical manner. I’ve heard from other candidates they experienced the same “VCDX writer’s block.” In fact several of us have scrapped our first attempts, and started over. Bottom line is you need to do what feels right to YOU, and what works for YOUR design while covering all the blueprint areas. You may not like my methodology or outline, which is perfectly fine and a valid way to feel.

I’ve also heard comments from VMware customers (like myself when I went through the process) that think since they aren’t a partner and don’t have access to the VMware SET templates that they are at a disadvantage. That’s not true,  IMHO. Yes the VMware SET docs are structured and may help you, but they aren’t directly aligned to the VCDX blueprint and need augmentation.

With all those caveats, I wanted to share my DCV architecture guide outline. Maybe it will help someone with writer’s block, or enable you to see some the areas that a VCDX design could cover. Your design may need additional areas, or less coverage. This is certainly not all inclusive, and it’s guaranteed your outline will be different. It is your responsibility to ensure your documents cover all blueprint areas, makes sense for your design, and something you feel comfortable with. Own your documentation.

Before I go any further, let me state that how I chose to incorporate the specific VCDX bootcamp book recommendations is somewhat unique to my style. Of the submissions I’ve seen none did it exactly this way, which proves that there is no “magic” template or style for VCDX submissions. I just felt it gave a better overall flow to the document.

You will see some common sub-sections in all design areas (e.g. cluster, storage, compute, etc.). For example, in most areas I had specific conceptual, logical and physical sections. This helped me show the traceability of customer input through the entire design process. Each major section also concludes with a Design Justification which is a summary of how I met the customer requirements and sites all of the applicable requirements, assumptions, constraints, and risks.

At the end of the Design Justification section I had two tables to help distill down the critical information. First, I had a summary table, shown below. All of the design quality items (e.g. C02) were referenced elsewhere in that section as applicable. Possibly overkill, but I liked the compact summary.


The second table was that of the applicable design decisions, each with the decision, impact, decision risks (after all, nearly every decision has a risk), and risk mitigation. A sample design decision is below.


WordPress was not cooperating with me for a clean outline format, so I’ve inserted a series of screen captures to maintain formatting.

Sample VCDX-DCV Architecture Outline


2014-10-06_16-50-08  2014-10-06_16-52-34