A blog by Steven Johnston

August 18, 2014
by Steve

How VSANs Are Like Kit Cars (in light of recent VSAN HCL Update)

Sometimes the two things I love most, technology and cars, overlap; or maybe I just want an excuse to talk about them both. Either way, people seem to understand analogies relating cars to technology, so I am going to use one here to describe something that has happened recently affecting one of the emerging trends in technology, ‘Software Defined Storage’ (SDS).

Every now and again, car manufacturers who provide pre-built cars recall certain models due to issues. In the UK we even have a government website to check for recalls.

So this brings me to what has happened recently. VMware announced they were removing support for a dozen Storage Controllers that were previously on the VSAN Hardware Compatibility List (see the VMware blog post here).

I wasn’t that surprised to see this, especially as two months ago a colleague drew my attention to a post on Reddit, under the VMware sub-Reddit (VSAN Outage Root Cause). The post describes a VSAN outage where the conclusion from VMware Support was that, after experiencing the outage and then needing to rebuild, their Storage Controller, a Dell H310 (with a very low queue depth of 25), simply wasn’t able to support high enough IO throughput to perform the rebuild whilst still running production.
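As an aside, it is worth checking what your own hosts report here. On an ESXi 5.x host the adapter queue depth is visible in esxtop (press “d” for the disk adapter view and look at the AQLEN column), and per-device limits can be listed from the shell; for example:

esxcli storage core device list

Look for the “Device Max Queue Depth” value against each device (field names may vary slightly between ESXi versions).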

The amendments to the VSAN HCL are no doubt related to support cases like this, or perhaps to Storage Controllers that share the same family or characteristics, which prevent them from being able to deal with worst-case scenarios such as the one which happened to this unfortunate person.

It isn’t really surprising to see updates to Hardware Compatibility Lists (HCLs), although it is less usual to see amendments removing items that were already on there. This brings me to the following conclusion: in the Software Defined world, where VSAN in this case is the Software Defined Storage (SDS) product, it isn’t just about the software. The hardware is even more important.

So back to my car analogy. This time I am contrasting the differences between a pre-built car and a kit car.

When you buy Off-The-Shelf storage arrays from EMC/NetApp/HDS/3PAR etc., these are designed specifically to meet a certain standard of performance, just like a pre-built car. When you buy one of the Porsche 911 models it comes with a distinctive look, feel and performance characteristic, just like the Off-The-Shelf storage array.

When you buy a kit car you will get a frame and other parts that define the look and feel of the car (like the software in SDS), but what you put underneath it, the engine and the drivetrain (the part that delivers the engine’s power to the wheels), determines the performance of that vehicle. It is the same with a Software Defined Storage approach: in this analogy, the engine is like the disks and the Storage Controller is like the drivetrain.

There is another point that this analogy can explain. When you buy a pre-built car, you typically get a manufacturer’s warranty which covers everything for the first 3 years, whereas if you buy a kit car then the individual components have their own warranties.
In a pre-built car, the sum of the parts is guaranteed to function as expected for the term of the warranty, whereas in the kit car the individual components are covered; or, in the case of Software Defined Storage, the software and the hardware are covered independently. The software is assured to function with the hardware designated as supported, but the performance derived from the sum of the components is not covered. So remember, the burden of ownership with regards to performance is on the customer, not the supplier.

Kit cars don’t get recalled in their entirety; perhaps a component like a seat belt holder will get recalled, just as a power supply from a server may be recalled. If a seat belt in a pre-built car was found to be faulty then the whole car would be recalled so it could be addressed, but the reason a kit car isn’t recalled in its entirety is because it is not the whole product.
Like a pre-built car manufacturer, Off-The-Shelf storage array manufacturers will normally distribute details of any known issues and have a fix available for them. They may even just contact you to arrange a time to apply the hotfix, included under your maintenance. In the case of a kit car or a Software Defined Storage array, the ownership of maintaining the setup is on the customer.

So if you are running VSAN, the burden is on you, the customer, to ensure you are not let down by either performance or any issues that have become known. Don’t cheap out on the components, neither the disks nor the Storage Controller. After the disks, the Storage Controller is probably the most important component in ensuring VSAN performs as you expect it to.
For known issues, I would suggest frequently checking for updates from VMware to see if there are any Knowledge Base articles specific to your hardware configuration, and then arranging to upgrade, patch or amend your setup to address the issues as soon as possible. To assist in keeping track of known issues use the VMware KB Digest; you can subscribe via RSS or follow their Twitter account @vmwarekb.

I think Software Defined everything is great; it will give us many flexibility benefits just as virtualisation has already done. I think VMware VSAN is great, a real step forward and perhaps something which will ultimately lower the entry-level cost of shared storage whilst also increasing the mobility of data.
However, compared to an Off-The-Shelf storage array, VSAN has slightly higher maintenance needs. It has many positive aspects too, such as its simplicity of operations and tremendous flexibility.
So if planning a VSAN deployment it is worth being equipped with this knowledge beforehand. Also check the latest VMware pre-validated configurations, named ‘Virtual SAN Ready Nodes’, over here. It may be best to form your solution matching, or close to, one of these.

Now I just need to get a home that has a garage so I can build a kit car or maybe I will just get saving for that Porsche 911 Carrera. Ha, even if I had the cash, my Wife would have something to say about either option!

July 2, 2014
by Steve

VPLEX Virtual Edition (VPLEX/VE) In The Real World

It seems so long ago now when I reviewed EMC World 2014. One of the things I wanted to learn more about while I was there was VPLEX/VE.
So far, everything I have found out makes me wonder, “Which type of customer is it designed to fit?”. To explain what I am talking about, you need to know the architecture.

VPLEX/VE Architectural Features:

  • Uses vApps and runs on ESXi.
  • Requires 4 vDirectors per site, each statically bound to an ESXi host.
  • Has a virtual management server per site that can reside on any of the ESXi hosts.
  • Has an optional cluster witness feature which, for ideal circumstances, needs a third site.
  • Will only operate between 2 sites synchronously (VPLEX Metro).
  • 2x WAN links are preferred, with latency up to 10ms round trip between the replicated sites and up to 1000ms between the replicated sites and the witness site.
  • Is iSCSI only (FC is only in the bigger brother, the full VPLEX).
  • Supports VNXe arrays only.
  • Is limited to 80K IOPS.
  • Is managed via the vSphere Web Client.
  • Needs distributed switches for operation. [Edit: Correction, spotted by Leonard McCluskey and supported in EMC documentation: vDS or vSS are both supported. Thanks!]

The Argument:

Having covered the above points, let’s extract what this really means. VPLEX/VE is an amazing feat of engineering. I welcome the software-only version of the brilliant VPLEX hardware, but I find its use may be somewhat limited currently. Perhaps I am not thinking openly enough?

My argument is based on the fact that VPLEX/VE supports VNXe and iSCSI only, so it can only appeal to companies who would use this combination of storage array and protocol for production storage, i.e. small businesses.

I find the following areas conflict with the typical profile of small businesses:

  • 4 ESXi Hosts per site are required as a minimum. Due to needing distributed switches, these hosts will require Enterprise Plus licensing. Many small businesses aren’t likely to have as many as 8 hosts and usually license vSphere at lower editions due to cost. [Edit: Based on the above correction]
  • The witness should reside on a third site. Many small businesses are lucky to have somewhere suitable to run their server hardware at 1 site, let alone 3.
  • Having 2 WAN links between Site 1 and Site 2 with less than 10ms round trip time is a big ask for a small business. Even 2 WAN links between Sites 1 & 3 and Sites 2 & 3 with 1000ms round trip time could be challenging for some small businesses. I appreciate, however, that it will work with 1 WAN link between each site.
  • Implementing a stretched vSphere cluster doesn’t stop once compute resource and active/active multi-site storage have been provided. It requires networking configuration providing a stretched layer 2 subnet, and this is again something a small business is less likely to have.

Many of these requirements are easily met in larger companies: multiple sites with facilities to run hardware, 4 hosts per site across 2 sites with a third to run the witness, low latency WAN links.
These are all pretty trivial for larger customers, but a VNXe as the main production storage array, running a workload important enough to justify a multi-site stretched vSphere cluster, is something I think is unlikely to be present in those customers.

I appreciate that the VNXe is frequently used in larger companies (e.g. for branch or departmental use, or as backup targets) but those same companies are much more likely to run the full-blown VPLEX with a high-end VNX or VMAX, especially for very important workloads.

In my experience, a VNXe as a production storage array is primarily found in small businesses, whereas the environment required to support VPLEX/VE is rarely found in companies of that size. There are always exceptions but, to put it bluntly, if a company can afford the environment required to run VPLEX/VE, they are likely to use a higher calibre storage array (not putting the VNXe down, it is a great product).

Disagree? Let me know in the comments.

May 15, 2014
by Steve

My Review of EMC World 2014

This year, thanks to my employer Novosco (an EMC Partner), I got the chance to return state-side to EMC World in Las Vegas. I had an amazing time and am now only recovering properly from the jet lag and the madness that is Vegas.


From the conference, there were a number of big announcements, those that excited me most were:

VNXe 3200 - Includes a pretty impressive MCx overhaul as well as new features such as FAST Cache, FAST VP and Fibre Channel connectivity, all firsts for the VNXe family, making it nearly as feature-rich as its bigger brother the VNX.

ViPR 2.0 - With the introduction of data services rather than just operating in the control plane, this is something truly to get excited about. It’s got Object, but now it also has ScaleIO baked in to provide Block services (how cool is that!). They also announced that the Controller now supports HDS arrays and Vblocks.

XtremIO Snapshots - XtremIO gets an upgrade to include snapshots with zero overhead, meaning you can go nuts. I should also mention there is a Flash Rescue Program for customers who purchased a competitor’s All-Flash Array (AFA), are disappointed with how it performs or behaves, and need an AFA that lives up to the hype.

There was also mention of a software-only version of OneFS (from Isilon, dubbed vOneFS), which will be a supported version of the virtual Isilon (which I have running on my home ESXi server and which is well worth checking out!), and a software-only version of VNX (Project Liberty) targeted at test/dev.

One of my goals before heading to Vegas was to find out more about VPLEX/VE. As a huge fan of VPLEX I wanted to see just what the Virtual Edition could offer, so I attended a session covering the architecture. It gave me greater insight into the product and its current abilities (VNXe only today), and it merits further comment, but that is too much for here so I will follow up in another post.

My personal highlight of this year’s conference was Chad Sakac’s Area 52. The session was held in the largest of the rooms (a definite requirement, as Chad always draws a crowd). It’s hard to describe the scale, but a large room in Vegas is an insanely massive room anywhere else; think of a football field or maybe even more, where the back row of seats would only just see Chad on stage, and I’m not referring to his height either (for those who don’t know, Chad is not a tall man). The room was rigged up with multiple screens to display Chad, the other presenters on stage and the slide contents. I had 3rd row seats.
Having followed Chad’s Virtual Geek posts on storage classification (which he likened to phylum), I found the session engaging and thoroughly enjoyable, covering the same key points as his blog posts (check out the first one here). The difference was that on top of his blog content there were awesome demos of what EMC are doing in each of these areas, with not every feature being completely ready for release but well on the way. They demoed Unisphere Central, EMC Storage Integrator (free), ProtectPoint (VMAX / Data Domain integration featuring a lower backup load going over the network with a massive reduction in backup time), XtremIO snapshots and the impact (or lack of it) of their use, OneFS CloudPools (basically Isilon spanning cloud services) and ScaleIO running on cloud services and being moved, with a live workload against it, to the EMC Hybrid Cloud.
As if those demos weren’t awesome enough, he then went on to cover what he describes as the new 5th storage type, thanks to their acquisition of DSSD (really excited to hear more on this – Chad has a post on it here).

After leaving Vegas, my lasting impression is very much that EMC are going full throttle at creating software-only versions of their products, and to make things easier they are integrating these into ViPR. Many of these software-only versions are available from their support site to get them out there, get the customer/partner familiar and make them easier to adopt. On that note I highly recommend the virtual Isilon; go build yourself a cluster, it’s excellent. ViPR is also freely available and, since it is becoming the key management interface (with much more), is very much worth the download.
Continuing on that thought, it is clear EMC recognised a while back that the era of every product having tie-ins between tin and software is over; it has never been more clear, as we are now seeing the fruits of their pursuit of providing software-only versions of everything. I make this comment knowing full well that tin/software tie-ins will probably always have their place in this industry for guaranteed results: think hardware VNX/VMAX/XtremIO, Vblocks, or, for a non-storage example, Apple products. It is very much about the synergy between the defined stack and the software running on it, but in today’s world people sometimes prefer mobility and flexibility, which has its cost; as always, it is about weighing up the pros and cons of the differing solutions.
It all makes me realise that we have an exciting period ahead, where the need for customers to be aligned to knowledgeable partner companies is, if anything, increasing: there are so many more choices today than just going with a hardware scale-up storage array, and keeping on top of the options is a mammoth task which the typical customer doesn’t always have time for.

January 24, 2014
by Steve

Configuring vSphere App HA 1.0

When vSphere 5.5 was released, there were a number of features listed that I wanted to check out, App HA 1.0 being one of them. Recently I have been working with App HA in my homelab and realised there were a number of components which are probably new to many VMware Administrators.

This blog post explains what App HA does and gives an overview of the install, guided by what I carried out in my homelab.
I focus on the parts of the configuration which will be new to most people, such as the Hyperic interface and the configuration required within vSphere to enable the functionality. I gloss over common things like deploying vApps as I expect most people will have done so at some point, but if they haven’t there is a ton of material out there covering that.
I hope others will find my post useful to quickly gain an understanding of the requirements to deploy it and to comprehend its functionality.

What does App HA do?
App HA will assist the VMware Admin in controlling application availability by displaying the status of applications, triggering notifications when services are unavailable and performing remedial actions such as restarting the services and resetting the virtual machine.

At the time of writing, App HA 1.0 supports the following Services:

  • Apache Tomcat 6.0 and 7.0 on Windows and Linux
  • IIS 6, 7 and 8 on Windows
  • Microsoft SQL 2005, 2008, 2008R2 and 2012 on Windows
  • Apache HTTP Server 2.2 on Windows and Linux
  • SharePoint 2007 and 2010 on Windows with Hyperic 5.7.1 SharePoint plug-in
  • SpringSource tc Runtime 6.0 and 7.0 on Windows and Linux

This list of Services is provided by VMware here.

The Installation and Configuration Steps
For the installation of App HA, the first thing of note is that vSphere App HA 1.0 has a number of components that need to be installed prior to enabling it, including vCenter Hyperic, the App HA vApp and the Hyperic Agents. App HA functionality is not present in the vSphere Client, only within the vSphere Web Client.

The Installation and Configuration Steps for App HA are:

  • Deploy vCenter Hyperic 5.7x
  • Deploy App HA virtual Appliance
  • Hyperic Agent Installation (First of all on vCenter to allow the next step) (*)
  • Setup vFabric Hyperic vCenter Server Plug-in (enables alarms to be sent from vCenter Server)
  • Use Web Client to Configure App HA to see Hyperic
  • Set Cluster Configuration for “VM & Application Monitoring” (**)
  • (Optional) Configure Hyperic Files to Enable Automation and Mass Deployment
  • Configure and Assign App HA Policies (*)
  • Test Your Policies (*)

N.B. Steps with (*) need to be repeated for each VM.
Steps with (**) need to be repeated for each Cluster.

Deploy vCenter Hyperic 5.7x –

A pre-requisite for App HA is Hyperic, which is a product in its own right and came from one of VMware’s acquisitions. Hyperic is provided as either a vApp or an installer for selected Linux and Windows OSes. One important point to note if you intend on using Hyperic outside of the App HA use case: the vApp is recommended for large environments greater than 1000 managed platforms, while the other configurations are for “medium scale environments”.
The Hyperic deployment type isn’t so much an issue when used with App HA, as App HA has a scale limitation of 400 agents (as mentioned in the App HA 1.0 release notes), but the vApp is easier to deploy.
The vApp is deployed via OVF and actually contains 2 VMs, one for the application and the other for the database (Remember to configure an IP Pool prior to deployment). The full deployment instructions can be found here.
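If you prefer PowerCLI to the Web Client OVF wizard, something along these lines should also work. This is a rough sketch rather than the documented route: the file name, host and datastore names are hypothetical, and any OVF properties or IP Pool settings may still need attention afterwards.

# Hypothetical names/paths - adjust for your environment
Connect-VIServer "NEO-VC01"
$vmhost = Get-VMHost "esxi01.lab.local"
Import-VApp -Source "C:\Downloads\hyperic-vapp.ovf" -VMHost $vmhost -Datastore (Get-Datastore "Datastore01")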


The Hyperic Dashboard looks like this:

Deploy App HA virtual Appliance –

This is the other vApp required for enabling App HA, this one being the actual App HA vApp. It is deployed in the same way as Hyperic, using an OVF template. This one consists of a single VM. Full vApp deployment details are here.

Hyperic Agent Installation –

Each Server you intend to leverage App HA with will require the Hyperic Agent to be deployed, and the Hyperic Agent will also need to be installed on the vCenter Server to allow the creation of the vFabric Hyperic vCenter Server Plug-in (the next step).

On the vCenter server follow the below instructions (they are the same for the deployment of the Hyperic Agent for all Windows Servers):

  • Download the zip archive
  • Extract it to a directory (e.g. "c:\hyperic-hqee-agent-5.7.1")
  • Navigate to the “bin” sub-directory (e.g. "c:\hyperic-hqee-agent-5.7.1\bin")
  • Execute “hq-agent.bat install” and wait until it finishes.
  • Execute “hq-agent.bat start”. Respond to the series of questions it will ask, they will be specific to your environment. Defaults will be correct in most cases except the final question for which the response must be changed to “yes”

At the command prompt, the sequence boils down to the following:
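(Paths here assume the example extract directory from the steps above.)

cd c:\hyperic-hqee-agent-5.7.1\bin
hq-agent.bat install
hq-agent.bat start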

The Platform (Server) will appear in the Hyperic Dashboard under the Auto-Discovery section. From there you can add it to the inventory by clicking the Add to Inventory button.

Perform the above Hyperic Agent install steps again for any other servers you wish to configure with App HA.
Look to the “(Optional) Configure Hyperic Files to Enable Automation and Mass Deployment” section for details on making this easier.

Setup vFabric Hyperic vCenter Server Plug-in –

To enable alarms from vCenter Server when remediation actions are triggered, the vFabric Hyperic vCenter Server Plug-in needs to be created.
The Hyperic Agent needs to be installed on the vCenter Server for this. After the agent is deployed, follow the steps below.

  • Go to Resource > Browse

  • Click the vCenter server from the Platforms list. In my case this is NEO-VC01
  • Select Tools Menu > New Server
  • Enter Name as “VC”, select “VMware vCenter” from the list and Install Path “*”
  • Click on Configuration Properties
  • Fill in the URL, replacing “localhost” with your vCenter name or IP. Complete the username and password, and leave the process query as it is. Finally, ensure Auto-Discover Services? is unticked.
  • On the vCenter Server ensure the VC Resource has a green tick beside it (note this might take a short while) and that’s the VC Plug-in sorted.

VMware documentation for this is here.

Use Web Client to Configure App HA to see Hyperic –

The next step is to configure App HA from within the Web Client to see the Hyperic Server. This is done from the Inventory under Administration > vSphere App HA on the Settings tab.

Complete the details giving IP or Hostname of vFabric Hyperic Server, Port and Username/Password as shown below.
Full documentation on this is here.

Set Cluster Configuration for “VM & Application Monitoring” –

Each Cluster in which you wish to use App HA needs to have that functionality enabled.
To do this, select the cluster then go to Manage > Settings > vSphere HA. From there click Edit.

In this menu ensure Turn ON vSphere HA has the check box ticked if it isn’t already and under VM Monitoring select VM & Application Monitoring then click OK. The final step of this is shown in the image below.
Ensure you repeat these steps for each Cluster you wish to use App HA with.
VMware documentation covering this section is here.

(Optional) Configure Hyperic Files to Enable Automation and Mass Deployment –

When I came across this topic within the App HA documentation, its title was not the one I have used for this section, and I was a little confused and unsure why it was required. The reason for my confusion was that the title and the description in the first line of the docs mentioned nothing of automation or enabling mass deployment. The description on the first line was:
“To trigger vSphere App HA alarms on vCenter, you must configure certain properties in the relevant vFabric Hyperic file.”.

What it doesn’t explain here is that these steps are covered when you first run “hq-agent.bat start” and are asked a series of questions regarding the Hyperic IP Address, Username, Password etc. (this is described in the deploying Hyperic Agents section above).

We already know that in order to use App HA the Hyperic Agent needs to be installed, so if you have deployed and started the Hyperic Agent service then you have already run through the series of questions covering these options and this isn’t required.

So what this section within the VMware documentation is actually describing is how to automate the answers to the Hyperic Agent deploy process. This became obvious when I looked at what it was trying to achieve, and it was confirmed by the description within the commented section of the file, which states:
“Use the following if you’d like to have the agent setup automatically from these properties.”
I highly recommend using these steps as they make deploying the agent quicker and easier and, most importantly, enable mass deployment.

If you have already deployed the Hyperic Agent to a server and the service is running, but you wish to remove the configuration to try again (such as for testing), then follow the clean-up steps below.
If the Hyperic Agent has never been deployed then you will not need to perform these clean-up steps; skip to the section “Configure the File”.

Clean-up Steps (Only required if Agent already deployed):

  • Stop the vFabric Hyperic agent which is achieved by running the “hq-agent.bat stop” from a command prompt open at the agent bin directory. In my example from the "c:\hyperic-hqee-agent-5.7.1\bin\" directory.
  • Delete the “data” sub-folder from within the agent folder, in my example this is the "c:\hyperic-hqee-agent-5.7.1\data\" directory.
  • Next delete the agent Platform from the Hyperic server inventory (i.e. Delete the Server entry within the Hyperic Inventory)

The following section describes the steps to edit the file and includes the properties the VMware documentation mentions, as well as two other properties I found necessary to have the deployment fully automated. I found that if I did not include these properties, the agent would still ask for a response when it was started for the first time. The extra properties are “agent.setup.camIP=” and “agent.setup.unidirectional=”. The first is the Hyperic Server IP address and the second is the option for whether the agent should communicate in a unidirectional manner.

Configure the File:

  • If the Hyperic Agent is not already on the Server, extract it to a local directory. Open the properties file found within the “conf” sub-directory (in my example the "c:\hyperic-hqee-agent-5.7.1\conf\" directory) and edit the following values, replacing the entries between the asterisks (N.B. the Username is normally hqadmin) and ensuring the “#” at the beginning of each line is deleted so the option is not commented out:
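    (For reference, the file in question is agent.properties. A sketch of the relevant entries is below; the values between asterisks are placeholders for your environment, and the exact set of properties worth setting should be checked against the VMware documentation.)

    agent.setup.camIP=*Hyperic server IP*
    agent.setup.camPort=*7080*
    agent.setup.camLogin=*hqadmin*
    agent.setup.camPword=*password*
    agent.setup.unidirectional=*no*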


    Then save the file.

  • Start the Hyperic agent by running “hq-agent.bat start” from the command prompt opened at the agent “bin” directory.
    If it is configured correctly it should not ask any questions as it did prior to using the file.

To use this for other Servers you intend to use with App HA:
Extract the Hyperic Agent, copy the file with the modified properties above into the “conf” sub-directory, install with “hq-agent.bat install” and start the service with “hq-agent.bat start”.
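As a rough illustration of what bulk deployment could look like in PowerShell (the server names and share path are hypothetical, and this assumes PowerShell remoting and admin shares are available on the targets):

# Hypothetical values - adjust for your environment
$servers = "NEO-APP01", "NEO-APP02"
$source  = "\\fileserver\share\hyperic-hqee-agent-5.7.1"   # agent extract including the modified properties file

foreach ($server in $servers) {
    # Copy the pre-configured agent to the target server
    Copy-Item -Path $source -Destination "\\$server\c$\" -Recurse
    # Install and start the agent service remotely
    Invoke-Command -ComputerName $server -ScriptBlock {
        Set-Location "c:\hyperic-hqee-agent-5.7.1\bin"
        .\hq-agent.bat install
        .\hq-agent.bat start
    }
}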

The VMware documentation covering this section can be found here.

Configure and Assign App HA Policies –

The next step for configuring App HA is to look at Policies.
Remember any servers you wish to monitor need the Hyperic Agent installed so ensure the agent is deployed before proceeding (Steps above).

Policies are created from within the vSphere Web Client at Administration > vSphere App HA on the Policies tab.

For my testing I created a SQL Server policy, which can be seen in the Policies list.

To create a Policy click the green plus icon. In the wizard, assign a Policy Name, choose an Application Service with any further service-specific options, choose the Remediation for that service (service restart and VM reset configuration) and whether you want to create an Alarm Definition. Configured policies can be used across multiple VMs; there is no need to create duplicate policies unless the configuration options are different.

Now you have created a policy, it just needs to be assigned to whatever machines you wish to provide App HA for. This is done at the cluster level.

On the cluster which contains the VM you wish to enable the policy with, navigate to Monitor then the Application Availability tab. This will list all machines with services Hyperic is aware of. This is shown in the image below.

From the Application Availability screen right click or use the Actions menu to Assign or Unassign a Policy.

Full details on Assigning a Policy to Application Services can be found here.

Test Your Policies –
As with anything you configure, it is prudent to test it to see if it reacts as you would expect. In my homelab I had configured a policy for SQL Server, so I used that as my test.
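If your test VM runs a default SQL Server instance, simulating a failure is a one-liner from an elevated PowerShell prompt on the guest (MSSQLSERVER is the service name for the default instance; named instances will differ):

Stop-Service -Name MSSQLSERVER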
I stopped the SQL Service on my NEO-TOOLS01 server and within about 60 seconds the following was visible from the vSphere Web Client:

And nearly immediately after this alert came through, the service was restarted and the Availability Status returned to Available.

Wrap Up
This concludes the steps I performed to get App HA 1.0 up and running in my homelab environment. If you look back at the list of tasks at the start, you’ll notice the steps marked with (*), which need to be performed for each new VM/application to be monitored by App HA, and the steps marked with (**), which need to be completed for each new Cluster configuration.
Once it is up and running you’ll find it extremely easy to add more, and once you are familiar with the process I recommend looking at methods of deploying the Hyperic Agent with the modified properties file in bulk.

Thanks for reading.

October 13, 2013
by Steve

What has changed in vCenter Server Appliance 5.5 (vCSA)?

Strategically it makes a lot of sense for VMware to decouple themselves from the Windows Server OS for management machines. There are many advantages to doing this which benefit SMBs and enterprises, such as a simplified install and no Windows or SQL licenses being required. These advantages are also valid for Service Providers, along with the automation possibilities it brings.

So let’s look briefly at the history of the vCSA:

In vSphere 5.0 along came the vCenter Server Appliance (vCSA). DB support was either embedded (DB2 or Postgres) or external on Oracle. The single biggest problem was that with the embedded database it was severely limited, so much so that it was relegated to test labs.

vCSA with vSphere 5.0/5.1 – Supports Max 5 Hosts and 50 virtual machines.

The new version with vSphere 5.5 has the same DB options (with embedded on Postgres) but has removed the embedded DB limitation, taking it up to well past what most customers are ever likely to use on a single vCenter.

vCSA with vSphere 5.5 – Supports Max 100 Hosts and 3000 virtual machines.

There are however still other things holding the vCSA back from dominance as the preferred method of vCenter deployment. vCenter matured on the Windows platform and gained capabilities along the way:

  • Linked Mode (uses Microsoft ADAM)
  • Heartbeat (only works with SQL)

These capabilities are to this day still missing from the virtual appliance and are some of the reasons the vCSA cannot be used in certain production environments. For the vCSA to get these features, they need to be made Linux friendly, and that probably means using different methods rather than a straight port of the existing software.

Even for those that can live without Linked Mode and Heartbeat, the single biggest reason people will stick with a Windows vCenter is that Update Manager (VUM) does not come integrated in the vCSA, nor is it available as a standalone virtual appliance. There is also a 1 to 1 relationship between vCenter and Update Manager, meaning for every deployment, assuming you want to use VUM, you will need a Windows Server and a SQL database anyway; so why not use them for vCenter too?

Continuing to deploy vCenter on Windows is probably the route many people will still take, as it takes no real extra thought. Components like SSO have made that install more complicated, so it makes real sense to want to simplify this, and the vCSA is definitely the way to do it.

Until at least the Update Manager component has its Windows dependencies removed, is made into a virtual appliance or is integrated into the vCSA, we will continue to see the vCSA reign as the quick and easy pop-up VM in the lab but not in production. That said, let me not play down what an important milestone this is towards the removal of Windows OSes from vSphere administration; the core vCenter functionality is now here in the vCSA and this is a great win for VMware at this stage. Addressing the VUM component next is likely to swing a sufficient number of customers over to using the vCSA, and it would likely snowball from there.

April 8, 2013
by Steve

VNXe 3100 with ESXi using iSCSI, lets clear up the confusion.

[Edit - I wrote this blog post a very long time ago but never published it, as I felt I needed to address the addition of the SLIC cards to the VNXe before I released it into the wild. I have decided to open it up anyway as it is still relevant today, for the VNXe or even just general iSCSI configurations.]

I do a lot of VNX work but until recently didn’t get an opportunity to deploy a VNXe. As with any product for the first time, I did some research into how it should be configured. It was a VNXe 3100 to be used with ESXi 5.0 over iSCSI. I searched the web, asked colleagues, and the result was that every configuration seemed to be slightly different. Part of the reason behind this is connectivity options: it comes with 2 on-board NICs per Storage Processor, some are also sold with an additional 4 port SLIC card in each SP, and it supports link aggregation.

The VNXe is designed to be simple to implement and it is exactly that. The VNXe itself is a very clever box, and because of this it is hard to implement it in a way that will not work under most failure situations (thanks in part to FSN). But even if most configurations will allow the VNXe to work, there are still better ways to do things.

Part of the issue with the VNXe is that it is so simple that people who don’t normally dabble with storage solutions implement it, and that’s when you see unnecessarily elaborate configurations, because they aren’t familiar with multipathing, either with VMware ESXi or in general. This inspired me to write this blog post in the hope I can help steer people to the relevant documentation before they deploy their VNXe 3100.

A prerequisite before even trying to deploy any iSCSI solution with ESXi is to read the VMware guidelines.

Software iSCSI Adapter configuration can be found for ESXi 4.1 and 5.0 in the following places.


The part I see people getting wrong time and time again with iSCSI configurations is the VMkernel ports and physical NICs. There should always be a 1 to 1 relationship between these: 1 VMkernel port with an IP to 1 physical vmnic port. The guides above both show this in two specific configurations.

  • Option 1 – One vSwitch, multiple VMkernel ports and multiple NICs (uses port binding).
  • Option 2 – A separate vSwitch for each VMkernel/NIC (easiest configuration, no port binding required).

There are excellent videos showing Option 1 and Option 2 above. The videos also show the MTU being set for jumbo frames, which is important to help increase performance.
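If you prefer the command line, on ESXi 5.0 the MTU can be set as follows (a sketch; vSwitch1 and vmk1 are example names, and the physical switch ports must also support jumbo frames):

esxcli network vswitch standard set -v vSwitch1 -m 9000
esxcli network ip interface set -i vmk1 -m 9000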

So after grasping the concept of 1-1 vmk to NIC, the next question is, “Is one NIC per subnet enough? Shouldn’t we have redundancy at this level?”

If you have the NICs then sure, why not, but whatever you do, DO NOT just add these as Standby NICs. This is not a supported configuration from VMware.

The correct way to do this is to maintain the 1-1 vmk to NIC relationship and, if using Option 1, add a NIC for each additional vmk and use port binding as you will already have done. In this setup you can add vmk ports which are in the same subnets as the existing two, i.e. add an additional port in each.
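For reference, the port binding itself can also be done from the ESXi 5.0 command line (a sketch; vmhba33 is an example software iSCSI adapter name and vmk1/vmk2 are the iSCSI VMkernel ports):

esxcli iscsi networkportal add -A vmhba33 -n vmk1
esxcli iscsi networkportal add -A vmhba33 -n vmk2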

For Option 2 I am led to believe that when two subnets are used, going beyond two vSwitches would be unsupported, since a subnet would be required per switch.
The reason behind my opinion on this is the following line from the SAN Config guide on page 77:
“NOTE: If you use separate vSphere switches, you must connect them to different IP subnets. Otherwise, VMkernel adapters might experience connectivity problems and the host will fail to discover iSCSI LUNs.”
I guess a hybrid approach where you have two vmk ports in two vSwitches using port binding would still meet the requirements and be supported, as long as the same subnets are used on the two vmk ports in the same vSwitch.

Ultimately the simplest configurations are normally the best and easiest to troubleshoot. The diagram below shows the same config as described in the “EMC VNXe High Availability Overview” white paper. The white paper can be viewed here
[Edit 14 May 2014 – This white paper is no longer available but a newer one covering the same topic and more can be found here.]

The configuration below provides two paths to each Datastore and uses the simplest vSphere configuration (Option 2 above). In my opinion, if you are using the VNXe 3100 without the add-on SLIC cards then it’s best to adhere to the document and have a deployment as below.


May 6, 2012
by Steve
1 Comment

Zerto Virtual Replication the PoSh way.

Recently I have had the pleasure of deploying Zerto Virtual Replication. My first impressions are very good; it’s a well polished product. The install and configuration were very straightforward and it worked without glitches. ZVR deals with all the replication and even reconfigures the networking, i.e. re-IP’ing at the DR site; this is all made effortless. The replication is asynchronous, and during our testing over a 100Mbit link we set the RPO to 2 minutes but were seeing actual RPOs of around 12 seconds, which is impressive.

The architecture hinges around having a management server at each site called the Zerto Virtual Manager (ZVM) and Virtual Replication Appliances (VRAs) on each ESXi host. The product uses the concept of Virtual Protection Groups (VPGs) to collate machines together with common configuration such as RPO and Journal CDP History, which affects the point to which you can roll back. When VPGs are protected you have the option to Move, Failover or Test the Failover. Among the configuration options on VPGs there are Pre and Post Scripts, and this is where PowerShell (PoSh) comes in very handy. This post was inspired by a requirement at a customer’s site and focuses on a DNS record update script you may need to finish off your deployment.

The Admin Guide is well put together, provides good guidance on the configuration and suggests some additional functionality you may wish to provide by using scripts. Suggestions include updating DNS records at the DR site after failover and recording failover testing in text files on the Zerto Virtual Manager servers. I decided to put all this functionality into one PowerShell script, since both of these suggestions are great: updating DNS is an absolute must and recording Test Failovers seems sensible. I used the DNS update PowerShell script from the Admin Guide as the basis for my script and added extra bits so only a single script is required for all VPGs and so the failover test history is recorded by the same script (the Admin Guide has a separate batch file to do this).

I realised DNS should only be updated if a Move or an actual Failover occurred. Failover Testing brings VMs up in your VM Port Group of choice, but it’s very likely you’ll use an isolated network, so you definitely don’t want to change production DNS records. Zerto make achieving these goals easier by providing environment variables such as
%ZertoOperation% with values of Move, Failover or Test and
%ZertoVPGName% which contains the VPG name.
These Environment variables are explained in the Admin Guide and I have used them within my script.

The Post Script uses a PowerShell ps1 file, dnscmd.exe and a subfolder per VPG which contains 4 csv files. dnscmd.exe is wrapped in the PowerShell script and updates DNS records using the relevant csv files.
dnscmd.exe can be installed on Windows Server 2008 R2 from the Features menu, and for this script to run, the executable needs to be copied to the script directory “C:\ZertoScripts\”.

The script is run from the site you are failing over to; a common path on both ZVM servers should be used for scripts to ensure they run when you failover and failback again. Also, the service account you use for Zerto will require local admin privileges and needs to be added to the DNSAdmins group so the DNS records can be updated. Note that the csv file contents will be different for the VPGs at each site, as you will want to change DNS to use the IPs for whichever site you are moving the VMs to. The script deletes old DNS entries and imports new ones to replace them.

The script directory “C:\ZertoScripts\” should contain the following files. Note VPGName1 is an example subdirectory, named exactly the same as the VPG, which will have the 4 csv files within it.
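Based on the script and the csv file names it reads, the layout is along these lines (VPGName1 being the example VPG subdirectory and Results the folder where test failovers get logged):

C:\ZertoScripts\
    DNS-Change.ps1
    dnscmd.exe
    Results\
    VPGName1\DNS-OldA.csv
    VPGName1\DNS-NewA.csv
    VPGName1\DNS-OldPTR.csv
    VPGName1\DNS-NewPTR.csv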


In the Zerto tab within vCenter under the VPG’s options you’ll find the Post Script options. For this script they need to be configured as follows:
Command: %SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe
Params: C:\ZertoScripts\DNS-Change.ps1 %ZertoOperation% %ZertoVPGName%

DNS-Change.ps1 contains the following:

#### Zerto Failover Script

## Timestamp
$timestamp = get-date -format "dd-MM-yy_HHmm"

## Get Environment Variables
$ZertoOperation = $args[0]
$ZertoVPGName = $args[1]

if ($ZertoOperation -ne "Test") {

	## Set DNS servers (enter your DNS server IPs here)
	$DNSservers = @("", "")
	## Filepath to script and CSV files
	$FP = "C:\ZertoScripts\"
	## Run from the script directory so the relative paths below resolve
	Set-Location $FP
	Foreach ($DNSserver in $DNSservers) {
		Import-CSV .\$ZertoVPGName\DNS-OldA.csv | foreach {
			.\dnscmd $DNSserver /RecordDelete $ $_.hostname A $_.ip /f }
		Import-CSV .\$ZertoVPGName\DNS-NewA.csv | foreach {
			.\dnscmd $DNSserver /RecordAdd $ $_.hostname A $_.ip }
		Import-CSV .\$ZertoVPGName\DNS-OldPTR.csv | foreach {
			.\dnscmd $DNSserver /RecordDelete $_.reversezone $_.lowip PTR $_.fqdn /f }
		Import-CSV .\$ZertoVPGName\DNS-NewPTR.csv | foreach {
			.\dnscmd $DNSserver /RecordAdd $_.reversezone $_.lowip PTR $_.fqdn }
	}
}
Else {
	## Record Failover Tests in a text file (the Results folder must exist)
	$LogContent = $ZertoVPGName + "   " + $timestamp
	Add-Content c:\ZertoScripts\Results\ListOfTestedVPGs.txt -value $LogContent
}


Example contents of the csv files are shown below. Remember these are within subfolders which are named exactly the same as the VPG names. Also note the header line in each file is required:


The PTR files (DNS-OldPTR.csv and DNS-NewPTR.csv) take the form below, where the first value should be your own reverse lookup zone, with the old and new files pointing at the respective site’s records:

reversezone,lowip,fqdn
*your reverse zone*,10,server01.addomain.local

The A record files (DNS-OldA.csv and DNS-NewA.csv) follow the same pattern with the header name,hostname,ip, matching the properties the script reads.

For convenience you can download the PowerShell script and example csv files in a zip (dnscmd.exe is not included!).
If you extract this to the “C:\ZertoScripts\” directory and copy dnscmd.exe to this folder on each ZVM server you will be able to run the script.

I must caveat this post by saying that although my initial revision of this script worked, and the changes since then are minor, I currently don’t have access to an environment where I can test it. I have requested a trial license; when I get it I will test this script thoroughly and then remove this comment from my post.

March 2, 2012
by Steve

FAST Cache… Its simple but think first.

I love finding out how technology features work; it’s like finding out the secrets behind a magic trick. When you realise it’s not magic you can break it down, get a good understanding and learn how to maximise the benefits whilst negotiating the pitfalls.

Recently I have been working on getting an environment ready to run a VDI workload. The customer uses EMC products extensively and wants to do VDI, but wants it isolated from their other storage. A VNX with FAST Cache and FAST VP is the solution. It’s whilst working with this that I have come to some conclusions, and I wanted to share them.

Most in this industry are aware of the high IOPS which can be achieved through the use of flash storage (EFD). This presents some challenges to storage vendors because, in the architecture, the sets of disks are limited by the backend bus on which they sit. It doesn’t take much to realise that spreading this load around different buses is beneficial and will help maximise the benefit of the flash you have in your array.

There are a number of EMC Primus articles covering FAST Cache and the location of the flash drives used for it. If you work in this space, I recommend you read them: emc251589 and emc285141 (a Powerlink account is required). I will share my experiences in the hope it can help others.

Note I have gathered the points for this blog post from information within the above Primus articles; they are much more verbose and should be consulted before making any concrete decisions on your storage array layout. Also use your TCs (EMC Technology Consultants); they are a great source of information.

Since FAST Cache utilises EFD RAID1 Mirrored pairs, the same rules apply and the considerations are as follows –

  • Spread the drives throughout different buses; avoid putting all the EFD drives on Bus0 (VNX OE runs here).
  • A point to make about this is that depending on the model you are limited in the number and size of FAST Cache drives you can use (see emc251589 for full details). With models like the VNX5100 you won’t have a choice but to use Bus0, since that’s all it has, but you also can’t use more than 2x 100GB drives, so this limits the impact. A VNX5300 has Bus0 and Bus1, so you will likely want to spread the load between these, which leads on to the next consideration.

  • Avoid using a mixture of drives which reside in the vault enclosure (the DPE) and drives elsewhere.
  • FAST Cache uses mirrored pairs, and this recommendation exists because drives in the DPE are protected by the SPS whilst the other drive in the pair will not be. In the case of failures your mirror will become degraded, whereas if they are both unprotected by the SPS they will power off at the same time.

  • Spread the primary drives in each mirrored pair among buses; this helps with availability.
  • This is done using the CLI. Note that the order of drives specified dictates which are primary and secondary: when executing the command the first drive will be primary, the next secondary, then primary, secondary, etc. So Primary1 Secondary1 Primary2 Secondary2 Primary3 Secondary3 and so on.

    The command is:
    naviseccli -address [SP_IPAddress] cache -fast -create -disks [Disk_list] -mode rw -rtype r_1

    [SP_IPAddress] is the IP address of the SP and [Disk_list] is the disks that will be used for FAST Cache. The format of the disk list, for example, would be
    0_1_0 1_1_0 1_1_1 0_1_1
    where the notation is “Bus”_”Enclosure”_”Disk” and the entries are separated by spaces. In my example Bus0 Enclosure1 Disk0 is the primary, paired with Bus1 Enclosure1 Disk0, and the other pair has primary Bus1 Enclosure1 Disk1 and secondary Bus0 Enclosure1 Disk1.
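    Putting that together, the full command for my example pairs would be (with [SP_IPAddress] as your SP's address):

    naviseccli -address [SP_IPAddress] cache -fast -create -disks 0_1_0 1_1_0 1_1_1 0_1_1 -mode rw -rtype r_1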

Something worth considering is that you will likely need to move the disks around from the configuration in which they were shipped. Typically a lot of EFD is placed in the DPE and, as mentioned above, you won’t want to mix these with disks in other enclosures.

In my case I am working with VNX 5300s. As I mentioned, they have two buses, Bus0 and Bus1. The array I am working with has the DPE and 3 additional DAEs, with 4 disks to use for FAST Cache and a hot spare. After a chat with a very helpful TC about the Primus articles above, I relocated all the EFD to the top two DAEs. This means they are spread across Bus0 and Bus1 whilst not being protected by the SPS. I also set it so that for FAST Cache one primary disk is in Bus0 with its secondary in Bus1, and the other primary is in Bus1 with its secondary in Bus0. This part I decided based on common sense, wanting to spread the load so the two primaries were not on the same bus (comments on this welcome).

March 1, 2012
by Steve

Started this blog with good intentions…

Recently I had an illustrator create a character like me that I could use on this blog; you’ve probably noticed. If you know me, the peaked haircut and sharp appearance are a giveaway (maybe just the first point :p).

I design and implement solutions based on Cisco UCS, EMC VNX and VMware vSphere. This is what I do each week; I love it and find it very fulfilling. I also recently got my VCP5, like the rest of the masses before the February deadline.

I am very busy but see the value in maintaining an online record of interesting things I come across in work. I would love it to spawn some discussion on the topics I cover as well, so I want to use this blog not just as a tool to share but to build on my own knowledge and skills too. Mutually beneficial is the goal of this blog.

I bought this domain a year ago (I know because I just got the renewal through) and I want to make it work. Let’s see how I go.