1. Assumptions and conventions used in this guide
  2. System requirements
  3. What SnapVMX does and how
  4. SnapVMX Modules
  5. How to load SnapVMX
  6. Examples of usage
  7. Frequently asked questions (FAQ)
  8. Common errors
  9. About the author
  10. References
  11. Source Code


Creative Commons License
SnapVMX Documentation by Ruben Miguelez Garcia is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

When troubleshooting Virtual Machine (VM) snapshot problems, it is sometimes important to gather a lot of information in order to make the most appropriate decision for the situation. Collecting and arranging that information can take a long time, especially if the VM has many snapshots.

SnapVMX was created to speed up the troubleshooting process by instantly bringing you all the information you need to evaluate the situation and make the correct decision, reducing downtime to the bare minimum needed to solve the problem. After all, "time is money".



1. Assumptions and conventions used in this guide

1.1. Assumptions

I assume that you are used to working in a Linux shell and that you already know what the VM files are for and how to troubleshoot VM snapshot issues.

1.2. Conventions

Snapshot structure:
Graphical representation of snapshots taken on a VM. You can see it on the snapshot manager [2].
Commit snapshot:
Operation performed when you click Delete on the snapshot manager. In this guide the word commit will be used instead of Delete to avoid misunderstanding.


2. System requirements

This program is intended to be used on an ESX/ESXi Server version 3.5/4.x. It may be possible to use it in other environments as well, but I haven't tested it.



3. What SnapVMX does and how

This program gathers information from the .vmx and .vmdk files of a VM and performs the following operations:

What information does it gather from each file?:

Note: SnapVMX assumes that the content of these configuration files is valid and readable. It also assumes that all the snapshot files are in the same directory as the .vmx file.
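
Judging by the helper functions in the Source Code section (CID, ParentCID, ParentFile and DeltaOrFlat), the fields read from each .vmdk descriptor are CID, parentCID, parentFileNameHint and the file named on the RW extent line. The Python 3 sketch below shows that extraction on a sample descriptor. It is meant for experimenting on a workstation, not for the ESX console; the sample text and the name parse_descriptor are illustrative assumptions, not part of SnapVMX.

```python
import re

# Illustrative descriptor of a snapshot disk (made-up values).
SAMPLE_DESCRIPTOR = '''\
# Disk DescriptorFile
version=1
CID=a1b2c3d4
parentCID=fffffffe
createType="vmfsSparse"
parentFileNameHint="VMName.vmdk"

# Extent description
RW 16777216 VMFSSPARSE "VMName-000001-delta.vmdk"
'''

# Canonical spelling of the keys we care about, looked up case-insensitively.
KEYS = {"cid": "CID", "parentcid": "parentCID",
        "parentfilenamehint": "parentFileNameHint"}


def parse_descriptor(text):
    """Return the descriptor fields that the SnapVMX helpers look at."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        key_value = re.match(r'(\w+)\s*=\s*(.*)$', line)
        if key_value and key_value.group(1).lower() in KEYS:
            name = KEYS[key_value.group(1).lower()]
            fields[name] = key_value.group(2).strip().strip('"')
        elif line.upper().startswith("RW "):
            # The extent line names the -delta or -flat file in quotes.
            quoted = re.search(r'"([^"]+)"', line)
            if quoted:
                fields["extent"] = quoted.group(1)
    return fields


print(parse_descriptor(SAMPLE_DESCRIPTOR))
```

A base disk is recognised by parentCID being ffffffff, which is the stop condition used by SnapTree.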

Note: SnapVMX assumes that the VM is using SCSI disks. This program was created when IDE disks were not an option, and so far I haven't found anyone who needs to use SnapVMX on a VM with IDE disks. However, you can still grep the .vmx to obtain the .vmdk descriptors and then run SnapTree against them.

The theory behind SnapVMX can be found in the document "Troubleshooting Virtual Machine snapshot problems" [1].
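
As a rough illustration of the disk lookup, the Python 3 sketch below lists the descriptors of the disks marked as present in a .vmx, following the same two steps as the SnapVMX function (find the scsiN:M.present = "true" entries, then read the matching fileName). The sample .vmx text, the name attached_disks and the bus parameter are assumptions made for this example; the bus="ide" variant is untested guesswork and may also pick up CD-ROM entries.

```python
import re

# Illustrative .vmx fragment (made-up values).
SAMPLE_VMX = '''\
scsi0.present = "TRUE"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "VMname-000001.vmdk"
scsi0:1.present = "FALSE"
scsi0:1.fileName = "VMname_1.vmdk"
ide0:0.present = "TRUE"
ide0:0.fileName = "VMname_2.vmdk"
'''


def attached_disks(vmx_text, bus="scsi"):
    """Return the fileName of every <bus>N:M device marked as present."""
    pattern = r'(%s\d+:\d+)\.(present|fileName)\s*=\s*"([^"]*)"' % re.escape(bus)
    present = []   # devices with present = "true", in order of appearance
    files = {}     # device -> descriptor name
    for line in vmx_text.splitlines():
        match = re.match(pattern, line.strip(), re.IGNORECASE)
        if not match:
            continue
        device = match.group(1).lower()
        key = match.group(2).lower()
        value = match.group(3)
        if key == "present" and value.lower() == "true":
            present.append(device)
        elif key == "filename":
            files[device] = value
    return [files[device] for device in present if device in files]


print(attached_disks(SAMPLE_VMX))
print(attached_disks(SAMPLE_VMX, bus="ide"))
```

Each name returned is what you would then pass to SnapTree.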

This program was designed as a decision-making tool for the following cases:

When the snapshot structure is no longer present or valid and the VM has many snapshots, you need to carefully analyse and evaluate the situation before you start committing the snapshots. Otherwise you may end up with a completely full datastore and a failed 'commit snapshots' operation, which will only make your problem bigger. These kinds of errors are not acceptable in a production environment where the clock is ticking, and that is why I created this program: to assist you with your decisions.

The theory behind those problems and their solutions is outside the scope of this guide. I recommend that you contact VMware Technical Support if you have one of those problems and don't know how to solve it. Alternatively, you can use the VMware Knowledge Base or the VMware Communities to learn more about these topics.



4. SnapVMX Modules

The whole program has 3 main parts. They use each other in reverse order:

As you will see in the Examples of usage section, all these functions can be used independently once all their dependencies have been declared.

I have tried to find the best balance between functionality and size, keeping the code robust and simple to understand and modify.



5. How to load SnapVMX

SnapVMX, SnapTree and their helper functions are simple bash functions [3].

A function is a subroutine, a code block that implements a set of operations, a "black box" that performs a specified task. Functions are called, triggered, simply by invoking their names. A function call is equivalent to a command. The function definition must precede the first call to it.

5.1. ESX Instructions

You can load the SnapVMX code into the ESX Service Console in two different ways:

5.1.1. Pasting the code into the console

You can paste the code into your bash shell and use it directly.

Now, when I say paste you have to be careful about a couple of things:

My recommendation to be safe is:

  1. Copy all the code into a plain text editor and verify that it looks the same as on the web page.
  2. Copy the code from the text editor and paste it into the console in two shots, without trimming the definition of the functions. If you remove comments and blank lines you may be able to paste it all in one shot.

Now you are ready to start using it.

5.1.2. Copying the file with the source code

Here you will find the SnapVMX source code.

If your ESX Service Console has an Internet connection, you can download and load the code in one shot with this command:

# wget http://geosub.es/vmutils/SnapVMX.Documentation/SnapVMX.source.code.txt && source SnapVMX.source.code.txt

If not, you can download the SnapVMX source code to your Desktop and then use the VI Client/vSphere Client to copy it to a datastore.

  1. Open your VI Client/vSphere Client and connect it to your ESX/VC/vCenter
  2. Go to the Datastores view.
  3. Right click on a Datastore and select "Browse Datastore"
  4. Click on "Upload files to this datastore" button and select the file SnapVMX.source.code.txt that you have on your Desktop.

Then, using the Service Console, go to the place where you uploaded the file and run this command:

# source SnapVMX.source.code.txt

Now you are ready to start using it.

5.2. ESXi Instructions

ESXi uses a slightly different version of the source code and requires two extra steps.

If your ESXi console has an Internet connection, you can download the code with this command:

# wget http://geosub.es/vmutils/SnapVMX.Documentation/SnapVMX.source.code.ESXi_version.txt

If not, you can download the ESXi version of the SnapVMX source code to your Desktop and then use the VI Client/vSphere Client to copy it to a datastore.

  1. Open your VI Client/vSphere Client and connect it to your ESX/VC/vCenter
  2. Go to the Datastores view.
  3. Right click on a Datastore and select "Browse Datastore"
  4. Click on "Upload files to this datastore" button and select the file SnapVMX.source.code.ESXi_version.txt that you have on your Desktop.

Then, using the console, move or copy the file to /tmp/:

# mv <path to file SnapVMX.source.code.ESXi_version.txt>   /tmp/

After that, copy and paste these two lines into the ESXi console (you can change the function names at the start of each line to something shorter if you want):

SnapVMX ()  { sh /tmp/SnapVMX.source.code.ESXi_version.txt "SnapVMX \"$1\""; };
SnapTree () { sh /tmp/SnapVMX.source.code.ESXi_version.txt "SnapTree \"$1\""; };

Now you are ready to use both functions as you would do with ESX classic.

For your convenience these instructions will appear if you run:

# head /tmp/SnapVMX.source.code.ESXi_version.txt

6. Examples of usage

6.1. SnapVMX

After you have defined/loaded the functions, you just need to go to the VM directory and type:

# SnapVMX  <VMName.vmx>

It does not matter if the VM that the .vmx file represents is running and/or registered.

Let's see some examples.

If you point SnapVMX to a VM with no snapshots, you will get only the list of disks with their sizes:

# SnapVMX bf_Ubuntu_nfs.vmx
 Base Disk: Ubuntu-nfs.vmdk  Size: 3.0G
----------------
 Base Disk: Ubuntu-nfs_1.vmdk  Size: 10G
----------------
 Base Disk: Ubuntu-nfs_2.vmdk  Size: 4.0G
----------------

If the VM has snapshots, they will be displayed together with their sizes and the space needed to commit them all in the worst-case scenario.

# SnapVMX PH-RHEL.vmx
PH-RHEL_1-000001.vmdk  Size: 656M
  PH-RHEL_1-000003.vmdk  Size: 2.1G
    PH-RHEL_1-000002.vmdk  Size: 48M
       Base Disk: PH-RHEL_1.vmdk  Size: 8.0G
Space needed on this Datastore to delete all the snapshots of this disk is:  3472.05 Megabytes = 3.39 Gigabytes
----------------
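
One way to act on that worst-case figure is to compare it with the free space on the datastore before committing anything. The Python 3 sketch below does that by reading the "Space needed ..." summary lines from saved SnapVMX output. The function names are hypothetical, and summing the figures of all disks assumes, as this guide does, that every snapshot file lives on the same datastore; whether shutil.disk_usage reports VMFS volumes correctly is also an assumption, so it is kept separate from the comparison.

```python
import re
import shutil

# Matches the summary line printed by SnapTree for each disk with snapshots.
SUMMARY = re.compile(
    r"delete all the snapshots of this disk is:\s*([\d.]+) Megabytes")

SAMPLE_OUTPUT = """\
PH-RHEL_1-000001.vmdk  Size: 656M
  PH-RHEL_1-000003.vmdk  Size: 2.1G
    PH-RHEL_1-000002.vmdk  Size: 48M
       Base Disk: PH-RHEL_1.vmdk  Size: 8.0G
Space needed on this Datastore to delete all the snapshots of this disk is:  3472.05 Megabytes = 3.39 Gigabytes
----------------
"""


def space_needed_mb(snapvmx_output):
    """Return the worst-case figure (in MB) reported for each disk."""
    return [float(m.group(1)) for m in SUMMARY.finditer(snapvmx_output)]


def fits_on_datastore(snapvmx_output, free_mb):
    """True if the sum of the worst-case figures fits in free_mb."""
    return sum(space_needed_mb(snapvmx_output)) <= free_mb


def datastore_free_mb(path):
    """Free space, in MB, on the file system that holds path."""
    return shutil.disk_usage(path).free / (1024.0 * 1024)


print(space_needed_mb(SAMPLE_OUTPUT))
print(fits_on_datastore(SAMPLE_OUTPUT, free_mb=2048))
```

With only 2048 MB free, the check fails for the PH-RHEL example, which is the situation where a 'commit all' could fill the datastore.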

Now let's see an especially ugly case where I have manually broken a few things in the files of the VM. Here the VM has 3 disks with 8 snapshots. One of the chains is broken because of a CID mismatch; another is missing a .vmdk descriptor.

# SnapVMX VMName.vmx
VMName_1-000008.vmdk  Size: 2.5K
  VMName_1-000007.vmdk  Size: 2.5K
    VMName_1-000006.vmdk  Size: 2.5K
      VMName_1-000005.vmdk  Size: 20M
        VMName_1-000004.vmdk  Size: 30M
          VMName_1-000003.vmdk  Size: 2.5K
            VMName_1-000002.vmdk  Size: 50M
              VMName_1-000001.vmdk  Size: 2.5K
                 Base Disk: /vmfs/volumes/482c6a32-da3cdd8a-646a-001a4baf5986/VMName/VMName_1.vmdk  Size: 98M
Space needed on this Datastore to delete all the snapshots of this disk is:  216.03 Megabytes = 0.21 Gigabytes
----------------
VMName_2-000008.vmdk  Size: 2.5K
  VMName_2-000007.vmdk  Size: 2.5K
    VMName_2-000006.vmdk  Size: 2.5K
      VMName_2-000005.vmdk  Size: 20M
        VMName_2-000004.vmdk  Size: 30M -- Warning: CID chain mismatch between "VMName_2-000004.vmdk" and "VMName_2-000003.vmdk"
          VMName_2-000003.vmdk  Size: 2.5K
            VMName_2-000002.vmdk  Size: 50M
              VMName_2-000001.vmdk  Size: 2.5K
                 Base Disk: /vmfs/volumes/4688d0e7-7b5c822c-61d7-00145e808070/VMName/VMName_2.vmdk  Size: 128M
Space needed on this Datastore to delete all the snapshots of this disk is:  220.07 Megabytes = 0.21 Gigabytes
----------------
VMName_3-000008.vmdk  Size: 2.5K
  VMName_3-000007.vmdk  Size: 2.5K
    VMName_3-000006.vmdk  Size: 2.5K
      VMName_3-000005.vmdk  Size: 20M
        VMName_3-000004.vmdk  Size: 30M -- Warning: Parent file ("VMName_3-000003.vmdk") not found. Unable to continue checking the chain of snapshots. Exiting.
----------------

As the next example shows, the numbers on the snapshot files mean nothing: they may or may not be in order. Here SnapVMX detects a missing delta file, but since that file is not required to follow the chain, the program continues. There is no requirement for the Base Disks to be in the VM directory. They can be anywhere, and the program will display them as long as they are reachable.

# SnapVMX Amstrad_8086.vmx
Amstrad_8086-000006.vmdk  Size: 16M
  Amstrad_8086-000003.vmdk  Size: 16M
    Amstrad_8086-000001.vmdk  Size: 16M
      Amstrad_8086-000008.vmdk  Size: 16M
         Base Disk: /vmfs/volumes/4688d0e7-7b5c822c-61d7-00145e808070/Amstrad_8086/Amstrad_8086.vmdk  Size: 250M
Space needed on this Datastore to delete all the snapshots of this disk is:  96.00 Megabytes = 0.09 Gigabytes
----------------
Amstrad_8086-000011.vmdk  Size: 16M
  Amstrad_8086-000004.vmdk  Size: 16M
    Amstrad_8086-000002.vmdk  Size: 16M
      Amstrad_8086-000009.vmdk  Size: 16M
         Base Disk: /vmfs/volumes/47b31f65-74a19d98-e78b-001a4bb24256/Amstrad_8086/Amstrad_8086.vmdk  Size: 30M
Space needed on this Datastore to delete all the snapshots of this disk is:  42.00 Megabytes = 0.04 Gigabytes
----------------
Amstrad_8086_1-000004.vmdk  Size: 16M
  Amstrad_8086_1-000003.vmdk  Size: 16M
    Amstrad_8086_1-000001.vmdk  Size: 16M
      Amstrad_8086_1-000002.vmdk  Size: 16M
         Base Disk: /vmfs/volumes/4688d0e7-7b5c822c-61d7-00145e808070/Amstrad_8086/Amstrad_8086_1.vmdk  Size: 40M
Space needed on this Datastore to delete all the snapshots of this disk is:  64.00 Megabytes = 0.06 Gigabytes
----------------
Amstrad_8086-000012.vmdk  Size: 16M
  Amstrad_8086-000005.vmdk  Size: 16M
    Amstrad_8086-000007.vmdk  Size: Unknown -- Warning: Delta file ("./Amstrad_8086-000007-delta.vmdk") not found
      Amstrad_8086-000010.vmdk  Size: 16M
         Base Disk: /vmfs/volumes/482c6a32-da3cdd8a-646a-001a4baf5986/Amstrad_8086/Amstrad_8086.vmdk  Size: 50M
Space needed on this Datastore to delete all the snapshots of this disk is:  80.00 Megabytes = 0.08 Gigabytes
----------------

6.2. SnapTree

SnapTree analyses a single chain of snapshots. You can point SnapTree to any .vmdk descriptor, and it will follow the chain down to the Base Disk.

# SnapTree PH-RHEL_1-000001.vmdk
PH-RHEL_1-000001.vmdk  Size: 656M
  PH-RHEL_1-000003.vmdk  Size: 2.1G
    PH-RHEL_1-000002.vmdk  Size: 48M
       Base Disk: PH-RHEL_1.vmdk  Size: 8.0G
Space needed on this Datastore to delete all the snapshots of this disk is:  3472.05 Megabytes = 3.39 Gigabytes
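
The chain walk itself can be sketched in a few lines. The Python 3 code below follows parentFileNameHint from one descriptor to the next until it reaches a descriptor whose parentCID is ffffffff, flagging CID mismatches and missing parents along the way, as the SnapTree function does. It is an illustration for a workstation, not a replacement for SnapTree: the names field and walk_chain, the read callable and the in-memory DESCRIPTORS are all assumptions made so that the example runs without real files.

```python
BASE_DISK_PARENT_CID = "ffffffff"


def field(text, name):
    """Return the value of name=value in a descriptor, unquoted, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == name.lower():
            return value.strip().strip('"')
    return None


def walk_chain(start, read):
    """Yield (descriptor, note) pairs from start down to the base disk.

    read maps a descriptor name to its text, or returns None if missing.
    """
    name, text = start, read(start)
    if text is None:
        raise FileNotFoundError(start)
    while field(text, "parentCID") != BASE_DISK_PARENT_CID:
        parent = field(text, "parentFileNameHint")
        parent_text = read(parent) if parent else None
        if parent_text is None:
            yield name, 'Parent file ("%s") not found' % parent
            return
        note = ""
        if field(text, "parentCID") != field(parent_text, "CID"):
            note = "CID chain mismatch with %s" % parent
        yield name, note
        name, text = parent, parent_text
    yield name, "Base Disk"


# Made-up chain: the middle snapshot has a parentCID that does not match.
DESCRIPTORS = {
    "vm-000002.vmdk": 'CID=cccccccc\nparentCID=bbbbbbbb\n'
                      'parentFileNameHint="vm-000001.vmdk"\n',
    "vm-000001.vmdk": 'CID=bbbbbbbb\nparentCID=00000000\n'
                      'parentFileNameHint="vm.vmdk"\n',
    "vm.vmdk": 'CID=aaaaaaaa\nparentCID=ffffffff\n',
}

for descriptor, note in walk_chain("vm-000002.vmdk", DESCRIPTORS.get):
    print(descriptor, note)
```

Passing the reader in as a callable keeps the walk independent of where the descriptors come from; to run it against real files you would supply a function that opens the named descriptor and returns its contents.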

6.3. SizeNeeded

In SizeNeeded I deliberately left one line of code commented out. If you uncomment it and comment out the next one, you can use the function to practise calculating on the fly the space needed to commit all the snapshots. The sizes can be in whatever unit you like (GB, MB, ...), as long as you use the same one throughout.

First, paste this into the bash shell:

function SizeNeeded () { 
# Usage: SizeNeeded [SnapN, SnapN-1,....,Snap2,Snap1,BaseDisk]
# Example: SizeNeeded [1,2,1,2,20,50]
python -c "R=$1; BD=R[-1]; R=R[:-1]; L=len(R)-1; SU=SN=0; 
for i in range(L): 
        SU=min(((SU or R[i])+R[i+1]),BD);
        SN=(SU-R[i+1])*((SU-R[i+1])>0)+SN;
print 'Array of snapshots: ', R,'-- Base Disk: ',BD,' -- Space Needed: ', SN;
#R_MB=SN/(1024.0*1024);R_GB=R_MB/1024; print '%.2f Megabytes = %.2f Gigabytes' % (R_MB,R_GB) ;" ;
}

And now use it:

# SizeNeeded [2,8,1,5,100]
Array of snapshots:  [2, 8, 1, 5] -- Base Disk:  100  -- Space Needed:  23

# SizeNeeded [20,1,1,1,1,1,1,200]
Array of snapshots:  [20, 1, 1, 1, 1, 1, 1] -- Base Disk:  200  -- Space Needed:  135

As the last example demonstrates, a big snapshot high up in the chain makes the 'commit all' operation require considerably more space than you would probably expect ("He who fails to plan, plans to fail").
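
For readers who find the one-liner hard to follow, here is the same arithmetic written out as a Python 3 function. It is a line-by-line transcription of the loop in SizeNeeded, with SU and SN renamed; the descriptive names and the comments reflect my reading of the code and are not the author's wording.

```python
def size_needed(sizes):
    """Worst-case extra space needed to commit all the snapshots.

    sizes = [SnapN, ..., Snap2, Snap1, BaseDisk], newest snapshot first,
    all expressed in the same unit.
    """
    *snapshots, base_disk = sizes
    merged = 0   # SU in the original: size of the data merged so far
    needed = 0   # SN in the original: extra space accumulated so far
    for i in range(len(snapshots) - 1):
        # Merging into the next snapshot can grow it by the merged size,
        # but never beyond the size of the base disk.
        merged = min((merged or snapshots[i]) + snapshots[i + 1], base_disk)
        # Only the growth beyond the snapshot's current size costs space.
        needed += max(merged - snapshots[i + 1], 0)
    return needed


print(size_needed([2, 8, 1, 5, 100]))               # 23
print(size_needed([20, 1, 1, 1, 1, 1, 1, 200]))     # 135
print(size_needed([1, 1, 1, 1, 1, 1, 20, 200]))     # 21
```

The last two calls use the same snapshot sizes in a different order: with the 20-unit snapshot at the top of the chain the figure is 135, and with it next to the base disk it drops to 21.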



7. Frequently asked questions (FAQ)

7.1. Can I use SnapVMX over the ESX logs?

Partially. It will display the chain as long as all the files are in the same directory, and it will perform the CID chain check, but it won't calculate the space needed to commit because the delta and flat files aren't there.

Use the command below to remove the error messages caused by the missing delta files:

# SnapVMX <VMName.vmx>   2> /dev/null  | sed 's/-- Warning: Delta.*not found//g'



8. Common errors

8.1. bash: SnapVMX: command not found

You get any of these error messages:

bash: SnapVMX: command not found
bash: SnapTree: command not found

Reason: You haven't loaded the functions into the shell.

Solution: Run:

# source /<path_To>/SnapVMX.source.code.txt

8.2. -ash: SnapVMX: not found

You get any of these error messages:

-ash: SnapVMX: not found
-ash: SnapTree: not found

Reason: You haven't pasted the two required lines before using the functions.

Solution: Follow the ESXi Instructions.

8.3. egrep: VMname.vmx: Device or resource busy

You get something similar to

egrep: VMname.vmx: Device or resource busy

Reason: Since ESX/ESXi 4.0, the .vmx is also locked while the VM is running.

Solution: Go to the host that has that VM running and repeat the process there.

Alternatively, if you just want to take a look at the snapshot structure, you can:

  1. Go to the host that has that VM running
  2. Run the command below and take note of the vmdks your VM is using (replace <VMname.vmx>)
    # egrep -i "scsi[0-9]+:[0-9]+.present|scsi.*filename|vmdk"  <VMname.vmx>
    
  3. Go back to the host where you were initially and run SnapTree against the vmdks obtained in step 2.

You can do this because only the .vmx is locked; the .vmdk descriptors are not.

Example:

### On host A
# pwd
/vmfs/volumes/4802890b-9131cf02-afb9-001f29e9ca56/VMname
# SnapVMX VMname.vmx
egrep: VMname.vmx: Device or resource busy

### On host B (the VM is running here)
# cd /vmfs/volumes/4802890b-9131cf02-afb9-001f29e9ca56/VMname
# egrep -i "scsi[0-9]+:[0-9]+.present|scsi.*filename|vmdk" VMname.vmx
scsi0:0.present = "TRUE"
scsi0:0.fileName = "VMname-000001.vmdk"

### On host A
# SnapTree VMname-000001.vmdk
VMname-000001.vmdk  Size: 2.5G
   Base Disk: VMname.vmdk  Size: 8.0G
Space needed on this Datastore to delete all the snapshots of this disk is:  0.00 Megabytes = 0.00 Gigabytes

8.4. -bash: wget: command not found

While trying to download the code you get

-bash: wget: command not found

Reason: wget is not installed by default in some ESX versions.

Solution: Use "curl -O" instead.

Example (if you are downloading the ESX version):

# curl -O http://geosub.es/vmutils/SnapVMX.Documentation/SnapVMX.source.code.txt



9. About the author

This program was developed by Ruben Miguelez Garcia (also known as Ruben Garcia) in 2009.

You can reach Ruben at Mr.Ruben.Garcia@gmail.com

Donations and gifts accepted :o)



10. References



11. Source Code

#-------------------------------------------------------------------------
#   Copyright (C) 2009, 2010 Ruben Miguelez Garcia
#
#   This program is free software; you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation, version 3.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
#-------------------------------------------------------------------------
#-------- Helper functions --------------------
function CID () { grep -i '^CID' "$1"  |  awk -F'=' '{ print  $2   }'; };
function ParentCID () { grep -i '^parentCID' "$1"  |  awk -F'=' '{ print  $2   }'; };
function ParentFile () { grep -i '^parentFileNameHint' "$1"  |  awk -F'"' '{ print  $2   }'; };
function Size () { ls -l "$1" | awk  '{ print  $5   }' ; };
function SizeH () { ls -lh "$1" | awk  '{ print  $5   }'; };
function DeltaOrFlat () { FPATH=`dirname "$1"`; echo -n "$FPATH/" ;  grep -i '^RW' "$1"  |  awk -F'"' '{ print  $2   }'; };
function FileExists () { if [ -s "$1" ] ;  then   echo "true"; else echo "false" ; fi; };
function SizeNeeded () { 
# Usage: SizeNeeded [SnapN, SnapN-1,....,Snap2,Snap1,BaseDisk]
# Example: SizeNeeded [1,2,1,2,20,50]
python -c "R=$1; BD=R[-1]; R=R[:-1]; L=len(R)-1; SU=SN=0; 
for i in range(L): 
        SU=min(((SU or R[i])+R[i+1]),BD);
        SN=(SU-R[i+1])*((SU-R[i+1])>0)+SN;
#print 'Array of snapshots: ', R,'-- Base Disk: ',BD,' -- Space Needed: ', SN;
R_MB=SN/(1024.0*1024);R_GB=R_MB/1024; print '%.2f Megabytes = %.2f Gigabytes' % (R_MB,R_GB) ;" ;}
function CID_chain_check () {
# Usage: CID_chain_check  FILE.vmdk
# Check the CID chain between FILE.vmdk and its parent and display the result (null if no mismatch) 
FILE="$1";
FILE_PARENT=`ParentFile "$FILE"`;
if [ `ParentCID "$FILE"` != `CID "$FILE_PARENT"` ]; then echo -n "-- Warning: CID chain mismatch between \"$FILE\" and \"$FILE_PARENT\""; fi; }

#------- SnapTree --------------------------------------------------------
function SnapTree () {
# Usage: SnapTree  FILE.vmdk
# Follow the chain of files from FILE.vmdk down to the BaseDisk displaying them.
        FILE=$1;
        TAB="";
        SIZES="[ ";
        HAS_SNAPSHOTS="0";
        if  [ `FileExists "$FILE"` = "false" ]; then  echo "File \"$FILE\" not found. Exiting."; return -1 ; fi;
        # While the file is not the base disk
        while [ `ParentCID "$FILE"` != "ffffffff" ]; do
                HAS_SNAPSHOTS="1";
                # Get size of delta file
                DELTA=`DeltaOrFlat "$FILE"`;
                if  [ `FileExists "$DELTA"` = "true" ]; then
                        SIZE_H=`SizeH "$DELTA"`;
                        SIZES=$SIZES`Size "$DELTA"`",";
                else
                        SIZE_H="Unknown -- Warning: Delta file (\"$DELTA\") not found";
                        SIZES=$SIZES"0,";
                fi;
                # Check parent file                
                PARENT=`ParentFile "$FILE"`;
                if  [ `FileExists "$PARENT"` = "true" ]; then
                        # Check CID chain between this file and its parent
                        RESULT_CID_chain_check=`CID_chain_check "$FILE"`;
                        # Display file name and size in human format. If the CID is broken say it as well.
                        echo -e "$TAB$FILE  Size: $SIZE_H $RESULT_CID_chain_check" 
                        FILE=`ParentFile "$FILE"`;
                        TAB=$TAB"  ";
                else
                        echo -e "$TAB$FILE  Size: $SIZE_H -- Warning: Parent file (\"$PARENT\") not found. Unable to continue checking the chain of snapshots. Exiting.";
                        return -1;
                fi;
        done;
        FLAT=`DeltaOrFlat "$FILE"`;
        SIZE_H=`SizeH "$FLAT"`;
        SIZES=$SIZES`Size "$FLAT"`"]";
        echo -e "$TAB Base Disk: $FILE  Size: $SIZE_H" 
        if [ "$HAS_SNAPSHOTS" = "1" ] ; then echo 'Space needed on this Datastore to delete all the snapshots of this disk is: ' `SizeNeeded "$SIZES"`; fi; }

#------- SnapVMX ---------------------------------------------------------
function SnapVMX () {
# Usage: SnapVMX  FILE.vmx
# Extract the list of disks attached to the VM and pass them to the SnapTree function.
        if [ `FileExists "$1"` = "false" ]; then echo "VM Configuration file \"$1\" not found. Exiting." ; return -1; fi;
        # Get list of true SCSI
        SCSI_TRUE_LIST=`egrep -i '^scsi[0-9]+:[0-9]+.present = "true"' "$1" | awk -F'.' '{ print  $1   }' `;
        # Go through the list of disks on the VM
        for SCSI in $SCSI_TRUE_LIST ; do
                # Get Disk name
                VMDK_VMX=`grep -i "^$SCSI.fileName" "$1" |  awk -F'"' '{ print  $2   }'` ;
                SnapTree "$VMDK_VMX";
                echo "----------------";
         done;}
#-------------------------------------------------------------------------