My Journey to Nutanix Platform Expert (NPX) #014

Almost four years ago to the month, my career took a major turn. I had just passed the VMware VCDX datacenter virtualization certification, becoming #125 in the world. You can read about my 180-day VCP5-to-VCDX journey here. The following week I joined a little-known, scrappy startup called Nutanix.

Back then HCI (hyperconverged infrastructure) was a newfangled technology that many, many people were quite skeptical of. Was it enterprise ready? Was it good for anything more than VDI? Could it run SQL, Oracle, Exchange? Could it compete toe-to-toe with Vblock? Would Nutanix go belly up or be acquired? Betting my career on HCI was risky in 2014, but it has paid off in more ways than I could ever have imagined.

Having the VCDX certification under my belt prepared me well for my dual role at Nutanix as both a Solutions Architect in engineering and a Consulting Architect in our sales organization. As a Solutions Architect I wrote a number of published, customer-facing Nutanix Best Practice Guides, covering SQL, Veeam, Lync, Microsoft DFS, and others. As a field Consulting Architect I worked with dozens of customers over the years on projects of all shapes and sizes. Both roles helped refine my enterprise IT architecture skills and gave me hands-on experience with our own products, including AHV (Nutanix's hypervisor).

NPX (Nutanix Platform Expert)

Just about a year into my career at Nutanix, in March 2015, Nutanix announced the NPX certification. You can read my blog post about it here. I was honored to be part of the team that helped develop NPX and define the criteria for what it means to be an NPX. The bar we set was even higher than that of other defense-based certifications, such as VCDX. Why? You have to know two hypervisors at the "X" level as well as demonstrate enterprise-grade IT architect skills. Our MQC (minimally qualified candidate) bar is high, and the first-time pass rate is far from 100%.

You may wonder why, in 2018, three years after NPX 'went live', I am only now defending. To be frank, holding dual roles at Nutanix for over 3 years left me little to no time for blogging, let alone the hundreds and hundreds of hours it takes to prepare for NPX. For my VCDX I estimated I spent over 1,000 hours of preparation and wrote 250+ pages of documentation, so I knew NPX would be even harder.

I am a competitive person, and I also like proving to myself that I can stand on a similar footing to colleagues I have immense respect for. They were getting their NPXs and kept badgering me to get mine. Plus, I want to be the best customer-facing consultant I can be, and I knew doing NPX would take my VCDX skills to an even higher level. I also very recently shifted roles a bit within Nutanix to focus on our largest global accounts, and the job description for that role requires NPX-level skills. Immense pressure was building on me to successfully defend NPX.

The NPX Design

In early 2017 I decided to start putting time into my NPX preparation. NPX requires a real-world design that you've done, so I thought there was no better choice than taking my UCS/HP 3PAR VCDX design and migrating it to a Nutanix-based solution. So I dredged up all my VCDX documentation from 3 years ago and read it over. I was shocked to be reminded how complex 3-tier solutions are, in particular the SAN/RAID/LUN configuration.

Going through my VCDX design, I ripped out page after page of complexity. LUNs? Gone. SAN? Gone. Fibre Channel switches? Gone. Boot-from-SAN? Gone. Cisco service profiles? Gone. You get the idea. And the best part? I was actively involved in migrating the very environment my VCDX was based on to an almost entirely Nutanix platform. So my NPX served a dual purpose: defending, and transforming a real Nutanix customer from 3-tier complexity to Nutanix simplicity. Win-win!

NPX Preparation

For anyone starting down the NPX path, the freely available NPX Blueprint is your bible. It covers all the topics you need to properly submit and successfully defend for NPX. To get a copy, email npx@nutanix.com. It is absolutely critical that you follow the Blueprint to the letter and cover everything, including all of the required documents. Although all of the documents are important, to me the Architecture Guide is where you will spend the majority of your time. My VCDX Architecture Guide was 185 pages, and my NPX version was 134 pages. That's 51 fewer pages, nearly all due to removing complexity, while covering more topics for the same environment.

After you get all of your documentation in order, next comes submission time. The NPX application is quite detailed and requires things such as a resume, 3 professional references, a web-scale essay, plus all of the documents you've probably spent 6-9 months working on. After submission your documentation is scored, and if it scores high enough, you are invited to an in-person defense. The submission deadline is roughly 3 weeks prior to the published defense dates.

If you are accepted, it's time to start working on the PowerPoint slide deck for your defense. You will use this deck to walk the panelists through your 90-minute defense, where you will be asked questions about your design, the alternatives, and why you did what you did.

Pro Tip: Take all the Blueprint topics and create one slide for each topic. Fill each slide with what you think are the top items to cover. Even if you don't verbally cover every bullet on a slide, have the content there so the panelists can ask questions. I had approximately 23 content slides in my presentation, plus a number of indexed backup slides.

I've included my TOC for my NPX deck below. This is not a magical slide...it's all directly from the blueprint. This is just one way to do it...do what feels right to you. 

Now that your slide deck is ready, you need to mock, mock, mock! Don't talk to a potted plant... use social media and your contacts to find other NPXs or people working on their NPX. Do a WebEx, Zoom, etc. Practice, practice! Heck, if your design is based on vSphere, hit up VCDXs.

But don't forget to mock the troubleshooting and design scenarios. Those two areas are also key for scoring, so don't wing them during your defense. Aim for multiple mocks of each of the three areas: defense, troubleshooting, and design scenario.

My personal goal was to get through the slide presentation, uninterrupted, in 30 minutes. That leaves 60 minutes for panelists to ask questions. YMMV, but I'd advise not going much longer or you jeopardize your scoring chances.

Dooms (I mean Defense) Day

By now you should be comfortable with your design, have mocked each of the three major sections of the defense, and probably didn't sleep too well the night before. But try to be rested! Also, if you are traveling across time zones, try to arrive a couple of days early to help adjust. You don't want to be a jet-lagged zombie during your defense.

When you step into the defense room, everything will look familiar if you have done your VCDX: three panelists, a moderator, a whiteboard, and a projector. The moderator will give you the rules of the road, then you start your presentation. Panelists can interrupt at any time to ask questions. Questions are not bad! In fact, they are asked to help improve your score and make sure you know your design. After the 90 minutes you get a 15-minute break.

Next up is the 30-minute troubleshooting scenario. You will be shown a few slides, then the timer starts. The panelists are looking for a methodical approach to solving the problem, not a scattershot process of asking random questions or throwing out guesses at the root cause. The goal is not to solve the problem, but to show how you would solve it. Curve balls can be thrown if you get close to the 'real' answer. At the end of the 30 minutes you get a 5-minute break.

Finally comes the 60-minute design scenario. Just like the VCDX, you are shown slides for a fictitious customer. The panelists then act as the customer, and you ask them questions about requirements, constraints, assumptions, and risks. You then start down the design path, answering questions as you go. And before you know it, the 60 minutes are up!

Now that you are totally mentally drained, the waiting game begins. Thankfully, you won't have to wait long. My results came in about 90 minutes after I was done. I was on the London Underground, which has quite spotty cell service. I got the results via Slack and email, but then cell coverage dropped for a few tube stops, so I couldn't tell anyone I had passed! LOL. I did shed a couple of tears of joy, and a couple of passengers were looking at me oddly.

Final Thoughts

Is the whole process worth it? Yes! Even if you don't successfully defend, the learning process alone makes you a better enterprise architect. Passing is just icing on the cake. Just like VCDX, the first-attempt pass rate is fairly low, so don't be discouraged if you don't pass the first time around. Think of it as a chance to make yourself even better and really kick butt the next time!

I want to give a huge shout-out to my NPX partner in crime, Bruno Sousa. We collaborated on the entire design and split up the documentation work. His insight and knowledge were impeccable.

As a side note, pair/group submissions are allowed, but each contributor will defend individually.

I also want to thank the numerous people who supported mock sessions, reviewed documents, and pushed me to keep my head down and succeed in becoming NPX #014.

Resetting lost ESXi root password with Nutanix

The other day I was at a customer for a fresh installation of Nutanix using vSphere 6.5. For whatever reason, when they reset the ESXi root password to their default, it was fat-fingered. When they went to add the hosts to vCenter, they couldn't, since the password was wrong. So what to do? In a non-Nutanix environment, the only supported method of recovering a lost ESXi root password is re-imaging the server. But Nutanix has a CVM running on each node that is configured with SSH keys to access its local ESXi host. We can use the internal private IP address and those embedded SSH keys to reset the password.

The full process to reset a lost ESXi root password on Nutanix is:

1. SSH into the CVM on the host that has the lost ESXi root password, using the nutanix account.

2. From the CVM, enter: ssh root@192.168.5.1

3. At the ESXi shell prompt, enter: passwd root

4. If the account is locked out, also run: pam_tally2 --user root --reset

If you then run the Add Host wizard in vCenter and the password still doesn't work, try rebooting the ESXi host. This procedure saved us from re-phoenixing the ESXi host.
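For repeated use, the steps above can be wrapped in a small script. This is a minimal sketch, not an official Nutanix tool. It assumes you run it on the CVM of the affected host as the nutanix user, that the CVM's pre-installed SSH key is accepted by root on ESXi at the internal 192.168.5.1 address, and that the ESXi build ships pam_tally2 (ESXi 6.x does). The reset_esxi_root function and the DRY_RUN variable are my own hypothetical names. By default it only prints the commands, so nothing touches the host until you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the manual steps above (hypothetical helper, not a Nutanix tool).
# Run ON the CVM of the affected host, logged in as the "nutanix" user.
# Assumes the CVM's built-in SSH key is accepted by ESXi root at 192.168.5.1.
# Defaults to a dry run that only prints commands; set DRY_RUN=0 to execute.
set -eu

ESXI_INTERNAL_IP="192.168.5.1"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

reset_esxi_root() {
  # Step 3: set a new root password. -t allocates a TTY so passwd can prompt.
  run ssh -t "root@${ESXI_INTERNAL_IP}" passwd root
  # Step 4: clear the failed-login counter (only matters if root is locked out).
  run ssh "root@${ESXI_INTERNAL_IP}" pam_tally2 --user root --reset
}

reset_esxi_root
```

Run it once as-is to review the commands, then rerun it with DRY_RUN=0 and enter the new password when passwd prompts for it. The -t flag matters here: without a TTY, passwd on the ESXi side has nowhere to prompt.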

Nutanix Veeam Backup & Replication v9.5 U3 BPG Update

I’m proud to announce that the Nutanix Veeam Backup & Replication v9.5 guide has been updated with U3 details. Specifically, U3 adds a new registry key, “EnableSameHostDirectNFSMode”, that is active for vSphere customers using DirectNFS. In Nutanix environments, if you are using DirectNFS, you should configure this key with a value of “2”. See the updated guide here for the full registry path and explanation. Do take note that this key is only honored starting with U3, so if you are on an earlier Veeam Backup & Replication v9.5 build, update to at least U3.

Update: Fixed link to the new guide, which can be found here. Or, if you are a Nutanix customer, the direct link to our support site here.

New Nutanix Community Edition Release (2017.07.20)

Hot off the press is Nutanix Community Edition (CE) 2017.07.20, based on AOS 5.1.1.1. For what’s new in AOS 5.1.1.1, see my post here. You can find all the direct download links at the end of this post in the Next community. As always, you can perform a simple 1-click upgrade via our Prism GUI.

Nutanix AFS 2.1.1 Released

Hot off the press is Nutanix AFS 2.1.1 (Nutanix File Services). In case you don’t know, AFS is a web-scale “NAS” that runs in a highly available configuration on Nutanix clusters. This release has several important new features, plus a number of resolved issues. New features include:

  • AFS Sizing workflow at time of deployment
  • Ability to rename AFS clusters
  • Ability to clone AFS on AHV (used for backups, DR testing, recovery, etc.)
  • Microsoft Management console support for AFS management
  • Ability to manage AFS permissions via the file server administrator role (which can be linked to an AD user or group)

The full release notes can be found here. You can download the new package here.

Nutanix AOS 5.1.1.1 Released

Today I am glad to announce the general availability of Nutanix AOS 5.1.1.1. This is a patch release, but it also has a few new features. Of interest to you will be the security patches (11 total), and a good sized list of resolved issues. You can find the full AOS 5.1.1.1 release notes here and download the package here. Before any upgrade, do thoroughly read the release notes and make sure any prerequisites are met. There’s also a good Installation and Upgrades document here, which is a must-read before you upgrade.

New Features include:

  • Nutanix API v3 tech preview
  • GA of software-only support on Cisco UCS-B series servers
  • Expanded support for vSphere 6.5 (e.g. Dell XC)
  • Full support for ESXi 6.5a and vCenter 6.5

As always, this AOS update can be done via Prism and our 1-click upgrade process, with zero downtime and zero vMotions needed. Customers often do AOS upgrades during the daytime. This release hasn’t yet been enabled for automatic download (it will be in the coming weeks), so if you want it before automated downloads are enabled, just grab the gz package from our portal. If you are brand new to Nutanix and have never done an AOS upgrade, feel free to call support. It’s dead easy and 100% GUI driven, but help is here if you want it.

If you haven’t yet upgraded to the 5.1 release train, now is a great time to do so.

Nutanix VirtIO 1.1 Drivers Released

If you are a Nutanix customer using AHV (Acropolis Hypervisor), I have some great news for you. We have released VirtIO 1.1 drivers, and the big deal is that they are now Microsoft WHQL signed. I know the lack of signing was holding some customers back from using them for production workloads. No longer!

And there’s a nice Nutanix install wizard.

Please do note, this signed driver package is for 64-bit operating systems only. The supported operating systems are:

  • Windows 7
  • Windows 8.x
  • Windows 10
  • Windows Server 2008 R2
  • Windows Server 2012
  • Windows Server 2012 R2
  • Windows Server 2016

You can directly download the drivers from here. We have them in both .MSI format and ISO, depending on your needs. All of the related documentation can be found here, including the special case of 32-bit operating systems. And don’t forget, if you want to inject the VirtIO drivers into your Windows ISO, check out my blog post here.

Nutanix AOS 5.1 & Companions are now GA

For the second time this year, Nutanix has released a major feature upgrade to AOS and companion software. Now available is AOS 5.1! Top of the list of new features is vSphere 6.5 support for NX platforms (Nutanix-branded gear), with vSphere 6.5 support for OEM platforms coming soon. But that’s not the only new feature. Here’s a rundown of some (not all) of the new features:

  • 1-click controller VM (CVM) memory upgrade
  • XenServer support on NX-1065-G5, NX-3060-G5, NX-3175-G5 (optionally with NVIDIA M60)
  • All-flash clusters now support adding hybrid nodes (e.g. cold storage only nodes). Minimum 2 AF nodes.
  • Automatic “admin” account password sync across all CVMs, Prism Web console, and SSH interfaces.
  • Docker container management through self-service portal.
  • Prism 1-click feature to install Docker host VM
  • Post-process compression is enabled by default on all new containers with Pro and Ultimate licenses
  • 1-click centralized upgrades from Prism Central
  • 1-click Prism Central cluster registration and Prism Central deployment
  • Pulse (telemetry) enabled for Prism Central
  • Auto-resolved alerts
  • User defined alerts
  • Graphics and compute mode for NVIDIA M60 GPU
  • CHAP authentication for Acropolis Block Services
  • Hot-plug CPU and memory on AHV VMs
  • Metro availability and synchronous replication supported across hardware vendors (NX, Dell, Lenovo). Async support continues.
  • VirtIO drivers updated to v1.1
  • Dynamically increase EC-X stripe size as the cluster is expanded
  • Much improved storage efficiency reporting in Prism (compression, dedupe, EC-X, etc.)
  • Disk rebuild time estimation
  • AFS supports Mac OS v10.10, v10.11, v10.12
  • Acropolis Block Service enhanced OS support (Solaris 11, RHEL 6, 7, 6.8)

Tech Preview Features include:

  • Software only support for UCS B-series blades
  • GPU pass-through for AHV guest VMs
  • Support 3rd-party network function VMs (e.g. load balancer, firewall, etc.) routed through Open vSwitch (OVS).

Companion Software Updates

  • Prism Central 5.1
  • Acropolis File Services (AFS) 2.1
  • Acropolis Container Services (ACS) 1.0
  • Foundation 3.7.2

Helpful Links

As of 5/1/2017, AOS 5.1 has not been enabled for automatic download and 1-click upgrades. If you don’t want to wait for the automatic download switch to be flipped (in the near future), you can grab the AOS binary from the support portal and use our 1-click upgrade process. As always, thoroughly read the full release notes on the support portal before attempting an upgrade.

Nutanix AHV (Hypervisor) Update Released

Fresh off the press is an updated version of the Nutanix hypervisor, AHV. Today we released AHV-20160925.43. Features include broader support for the AOS 5.0.x family and a tech preview of hot-plugging memory and CPU on VMs. A number of bug fixes were also incorporated. Noteworthy is a fix for a network stack issue that could affect any customer, so upgrading is encouraged.

If you are using AHV, you are encouraged to check out the release notes here. AHV is 1-click upgrade enabled, so you can easily update your cluster in an automated fashion with no VM downtime.

Critical Zerto 5.0 update for Nutanix customers

If you are a Nutanix customer using Zerto, Zerto has released a very critical patch. Unpatched, the bug could result in data integrity issues. This has been seen in the field, particularly with Microsoft SQL. So it’s imperative that you review the Zerto release notes and upgrade to Zerto 5.0 U2 ASAP.

The issue manifests itself during CVM failover operations and results in lost writes. These failover operations are normally transparent to the guest and in no way interrupt the normal I/O data path. The Nutanix business critical apps team was able to reproduce the issue in house at will, and confirmed Zerto 5.0 U2 has resolved the issue. Zerto is a great partner, and they were extremely helpful in resolving this issue. The issue is not exclusive to SQL, but it is more readily apparent there, since DBAs run routine integrity checks that will fail if the bug has been triggered.

I do not know if versions prior to 5.0 U2 (such as 4.x) will receive the patch, so check with your Zerto account team. If the patch won’t be backported, you need to plan on upgrading to 5.0 U2 ASAP. We have many joint customers who absolutely love the Zerto features coupled with Nutanix simplicity.

My boss, Michael Webster, also has a write up about this issue which you can find here.