Classification and Detection of Zombie Virtual Machines … Which tools should I use?
In this post I want to discuss what makes an Orphan or a Zombie in a VMware environment. I also want to introduce some of my favourite tools for detecting these malodorous fiendish creatures which can wreak havoc within your virtual world.
From time to time you might come across Orphaned or Zombie virtual machine files that you need to deal with before being able to complete a task.
An orphaned virtual machine is one where the relationship between a vCenter Inventory object and associated objects living on datastores has become inconsistent. Think of it as a lack of synchronization between the vCenter view of the world, and what’s actually out there.
It is possible to have entire orphaned virtual machines, or just a subset of files, such as a single VMDK disk. You might not be able to tell which parent virtual machine that subset of files belongs to, hence the use of the word orphaned.
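To make that mismatch concrete, here is a minimal Python sketch that treats the two views as plain sets of file paths and compares them. The paths and the function name `compare_views` are hypothetical, and the sketch never talks to vCenter; it only illustrates the kind of comparison involved, not how any particular tool implements it.

```python
def compare_views(datastore_files, inventory_files):
    """Compare what is on disk with what the inventory references.

    Returns a pair of sorted lists:
      - files on the datastore that no inventory object refers to
      - files the inventory refers to that are missing from the datastore
    """
    on_disk = set(datastore_files)
    referenced = set(inventory_files)
    return sorted(on_disk - referenced), sorted(referenced - on_disk)


# Hypothetical example data: what is physically on the datastore...
datastore_files = [
    "[datastore1] web01/web01.vmx",
    "[datastore1] web01/web01.vmdk",
    "[datastore1] old-test/old-test.vmdk",
]
# ...and what the inventory believes it owns.
inventory_files = [
    "[datastore1] web01/web01.vmx",
    "[datastore1] web01/web01.vmdk",
    "[datastore1] db01/db01.vmx",
]

unreferenced, missing = compare_views(datastore_files, inventory_files)
print("On disk but not in inventory:", unreferenced)
print("In inventory but not on disk:", missing)
```

Anything in the first list is a candidate zombie file; anything in the second list points at an inventory object whose files have gone astray.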
Where do Zombies come from?
A VMware Knowledge Base article lists some of the reasons why Orphaned objects can occur:
- After a vMotion or VMware DRS migration
- After a VMware HA host failure occurs, or after the ESX host comes out of maintenance mode
- vCenter Server is restarted while a migration is in progress
- Too many virtual machines are scheduled to be relocated at the same time
- Attempting to delete virtual machines when an ESX/ESXi host local disk (particularly the root partition) has become full
- Rebooting the host within 1 hour of moving or powering on virtual machines
- A .vmx file contains special characters or incomplete line item entries
This is not an exhaustive list and sometimes it’s hard to pin down why something has occurred. In my experience, most zombies are caused by either:
- Operational issues or incomplete operations leading to stranded objects.
- Snapshot related issues.
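The last cause in the Knowledge Base list, a .vmx file with special characters or incomplete entries, is one of the few you can check for mechanically. The Python sketch below flags lines that do not look like `key = "value"` or that contain non-ASCII characters. The pattern is my own assumption of what a well-formed entry looks like, not a VMware specification, and the sample content and the function name `suspicious_vmx_lines` are made up for illustration. Treat a hit as "worth a second look" rather than proof of a problem.

```python
import re

# Assumed shape of a well-formed entry: key = "value"
ENTRY = re.compile(r'^[A-Za-z0-9_.:\-]+\s*=\s*".*"$')


def suspicious_vmx_lines(text):
    """Return (line_number, line) pairs that deserve a closer look."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Flag entries that are incomplete or contain non-ASCII characters.
        if not ENTRY.match(line) or not line.isascii():
            problems.append((number, line))
    return problems


# Hypothetical .vmx content with two damaged entries at the end.
sample = '''.encoding = "UTF-8"
displayName = "web01"
scsi0:0.fileName = "web01.vmdk"
memsize = "4096
guestOS
'''

for number, line in suspicious_vmx_lines(sample):
    print(f"line {number}: {line}")
```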
You can simulate a scenario in a lab to become more familiar with resolving these issues. I will be doing that in subsequent posts, to illustrate how you can create a zombie yourself, and detect and remediate that situation.
Before we do that, what tools are people using today in the VMware Community to find these objects?
Free Tools to check for Zombies
Two of the most popular free toolsets within the VMware community are:
RVtools by Robware, which you can download from here http://bit.ly/1d6xHoj.
- This is a freeware tool created by Rob de Veij. It is widely used within VMware environments, and performs read-only queries against a vCenter instance.
- I suggest you deploy it in a test environment first, to be sure you’re happy to use it in production.
Point it at your vCenter server and it will pull information into a nice tabular format which you can filter on, and export to Excel format for further manipulation.
Quest PowerGUI and the VMware Powerpacks.
- From within the PowerGUI Console, you can run powerpacks, which are bundled sets of PowerCLI/PowerShell scripts, to query vCenter via an intuitive interface which, behind the scenes, translates into PowerShell code. You’ll see below how you can get at that code and modify it to meet your requirements.
- First download PowerGUI from Quest @ http://dell.to/1gVTn9w
The standard vSphere Management powerpack is included with the installation.
- To check out some of the other useful powerpacks you can add to PowerGUI look at this page: http://bit.ly/18RZR9M.
- Don’t forget the really great VMware Community powerpack developed initially by Alan Renouf and now under the auspices of the VMware Community.
- You need to download it separately and import the powerpack into the PowerGUI Administrative Console, which is where all the powerpacks run from. You can get that powerpack here http://bit.ly/18Sg1Qv
You import the Community powerpack by running the PowerGUI Administrative Console. Go to File -> Powerpack Management, click the Import option and select the Community powerpack.
Used together, the VMware powerpacks are very powerful, both for reporting and diagnostic purposes.
For both RVtools and PowerGUI, you need to connect to vCenter to be able to execute queries. You can access individual hosts too but that is less useful than a consolidated view.
Log in to vCenter with the required credentials:
At this point RVtools will execute scripts which collect information from vCenter and populate its many tabs:
You need to add a managed host to PowerGUI in the vSphere Management Powerpack:
Now add your credentials, which will be subsequently cached. You can disconnect and clear the cache if you need to for security reasons.
Once that’s done you can click on queries in the Explorer view, which will cause real-time PowerShell queries to be issued against the vSphere API:
If you navigate down the left-hand Explorer view you will find a query that shows you all the zombie and orphaned objects within your environment:
And here’s another tip. If you want to modify the code, just select Properties for the object in question and the PowerShell code being used to execute the query will be shown:
For other PowerShell code available for download I suggest you check out the blogs of Alan Renouf, Luc Dekens and Jonathan Medd.
Now … To establish the health of your environment using RVtools just click on the vHealth tab which will show you the results of a quick health check against your environment:
There are 11 queries which are run, as follows:
- VM has a CDROM device connected
- VM has a Floppy device connected
- VM has an active snapshot
- VMware tools are out of date, not running or not installed
- On disk xx is yy% disk space available! The threshold value is zz%
- On datastore xx is yy% disk space available! The threshold value is zz%
- There are xx virtual CPUs active per core on this host. The threshold value is zz
- There are xx VMs active on this datastore. The threshold value is zz
- Possibly a zombie VMDK file.
- Possibly a zombie VM.
- Inconsistent Folder Names
To set the threshold values above, just select the menu Health -> Properties and you can tailor to your organization’s thresholds and policies:
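If you want a feel for how a threshold-based check like the datastore one behaves, here is a small Python sketch. It is not RVtools code: the function name `datastore_space_warning`, the sample figures and the exact comparison (strictly below the threshold) are my assumptions, and only the message wording is borrowed from the list above.

```python
def datastore_space_warning(name, capacity_gb, free_gb, threshold_pct):
    """Return a warning string when free space is below the threshold, else None."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    free_pct = 100 * free_gb / capacity_gb
    # Assumption: warn only when free space is strictly below the threshold.
    if free_pct < threshold_pct:
        return (f"On datastore {name} is {round(free_pct)}% disk space available! "
                f"The threshold value is {threshold_pct}%")
    return None


# Hypothetical datastores checked against a 15% threshold.
print(datastore_space_warning("datastore1", 1000, 80, 15))
print(datastore_space_warning("datastore2", 1000, 400, 15))
```

Raising or lowering the threshold changes which datastores get flagged, which is why it pays to tune these values to your own policies rather than relying on defaults.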
Until next time…..