VMware: "To Quiesce or not to Quiesce?"

Back Up Your Virtual Machine the Right Way

Many articles have been written about creating virtual machine snapshots, but most of them are too conceptual and high-level. This post covers the practical side of creating virtual machine (VM) snapshots on the VMware vSphere® platform for use as backups.

We will answer the questions:

  • What is a quiesced snapshot?
  • Why use it?
  • What problems can be encountered when you quiesce?

Using Snapshots for Backup

In a VMware vSphere environment, you can create a snapshot in two ways:

  • A snapshot that includes the VM memory state
  • A snapshot with guest file system quiescing

When backing up a virtual machine with the VMware vStorage APIs for Data Protection (VADP), never use the first option, which includes the VM memory state. If the virtual machine has 8 to 16 GB of RAM or more, every incremental backup takes a long time because it is large: each incremental backup also includes the contents of the VM's RAM. The memory option can also lead to other technical complications.
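To put rough numbers on this, here is a small Python sketch that estimates the size of one incremental backup with and without the memory state. The 16 GB of RAM and 2 GB of daily changed data are hypothetical figures, and treating the memory image as roughly the size of the configured RAM is a simplifying assumption.

```python
# Rough estimate only. Assumption: when the memory state is included, each
# incremental backup grows by about the VM's configured RAM.
def incremental_backup_gb(changed_data_gb: float, ram_gb: float, include_memory: bool) -> float:
    """Approximate size of a single incremental backup, in GB."""
    return changed_data_gb + (ram_gb if include_memory else 0.0)


if __name__ == "__main__":
    # Hypothetical VM: 16 GB of RAM, 2 GB of changed disk data per day.
    for include_memory in (True, False):
        size = incremental_backup_gb(2.0, 16.0, include_memory)
        print(f"memory state included={include_memory}: ~{size:.0f} GB per incremental")
```

Under these assumptions, including the memory state makes each daily incremental about nine times larger (18 GB instead of 2 GB).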

The alternative is quiescing, which is the more viable option because it prepares the guest operating system (primarily the file system) for a backup.

What is Quiescing?

According to a VMware® Knowledge Base article, “Quiescing a file system is a process of bringing the on-disk data of a physical or virtual computer into a state suitable for backups. This process might include such operations as flushing dirty buffers from the operating system's in-memory cache to disk, or other higher-level, application-specific tasks.”

Unfortunately, this description does not clarify what happens with the virtual machine during the process. This is what we want to explore.

First, the VMware Snapshot Provider service in VMware Tools starts the creation of a new Volume Shadow Copy Service (VSS) snapshot inside the guest operating system (OS). All registered VSS writers (you can list them with the "vssadmin list writers" command) receive the request and prepare their applications for backup by flushing in-memory transactions to disk. When the VSS writers finish, they report through the VMware Snapshot Provider to the VMware Tools service that the job is complete and the system is ready for a snapshot.
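To see which writers would take part, you can capture the output of "vssadmin list writers" and scan it for writers that are not stable or report an error. The following is a minimal sketch, assuming the usual English-language output layout (one "Writer name", "State", and "Last error" line per writer); the sample text and GUIDs are illustrative, not captured from a real system.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VssWriter:
    name: str
    state: str
    last_error: str


NAME_RE = re.compile(r"Writer name:\s*'(.*)'")
STATE_RE = re.compile(r"State:\s*\[\d+\]\s*(.*)")


def parse_vssadmin_writers(text: str) -> List[VssWriter]:
    """Turn 'vssadmin list writers' text into one record per writer."""
    writers: List[VssWriter] = []
    name: Optional[str] = None
    state: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        name_match = NAME_RE.match(line)
        state_match = STATE_RE.match(line)
        if name_match:
            name, state = name_match.group(1), None
        elif state_match and name is not None:
            state = state_match.group(1).strip()
        elif line.startswith("Last error:") and name is not None and state is not None:
            # "Last error:" closes the block for the current writer.
            writers.append(VssWriter(name, state, line.split(":", 1)[1].strip()))
            name, state = None, None
    return writers


def unhealthy_writers(writers: List[VssWriter]) -> List[VssWriter]:
    """Writers that are not Stable or that report an error."""
    return [w for w in writers if w.state != "Stable" or w.last_error != "No error"]


# Illustrative output with made-up GUIDs.
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   Writer Id: {00000000-0000-0000-0000-000000000001}
   Writer Instance Id: {00000000-0000-0000-0000-000000000002}
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   Writer Id: {00000000-0000-0000-0000-000000000003}
   Writer Instance Id: {00000000-0000-0000-0000-000000000004}
   State: [8] Failed
   Last error: Non-retryable error
"""

if __name__ == "__main__":
    for w in unhealthy_writers(parse_vssadmin_writers(SAMPLE_OUTPUT)):
        print(f"{w.name}: {w.state} ({w.last_error})")
```

Run against the sample, it reports only the SQL Server writer, which is the kind of writer that would keep a quiesced snapshot from being application-consistent.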

When preparing a VMware snapshot, backup software for VMware vSphere uses one of two combinations of settings:

Quiesced = ON, Memory = OFF
Quiesced = OFF, Memory = OFF
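One way a backup tool could use these two combinations is to try the quiesced snapshot first and fall back to the non-quiesced one. That ordering is an assumption for illustration, as is the create_snapshot callable: in a real tool it might wrap pyVmomi's vm.CreateSnapshot_Task(name, description, memory, quiesce) and wait for the task to finish.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: create_snapshot(memory=..., quiesce=...) -> True on success.
SnapshotFn = Callable[..., bool]


def take_backup_snapshot(create_snapshot: SnapshotFn) -> Optional[bool]:
    """Try Quiesced = ON, then Quiesced = OFF. Memory is always OFF.

    Returns True for a quiesced snapshot, False for the non-quiesced fallback,
    and None if neither attempt succeeded.
    """
    for quiesce in (True, False):
        if create_snapshot(memory=False, quiesce=quiesce):
            return quiesce
    return None


if __name__ == "__main__":
    calls: List[Tuple[bool, bool]] = []

    def fake_snapshot(memory: bool, quiesce: bool) -> bool:
        # Simulates a guest where quiescing fails but a plain snapshot works.
        calls.append((memory, quiesce))
        return not quiesce

    print(take_backup_snapshot(fake_snapshot))  # False: fell back to Quiesced = OFF
    print(calls)  # [(False, True), (False, False)]
```

A True result only means vSphere accepted the quiesced request; as the sections that follow show, it does not by itself prove that VSS was used.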

Note that VMware fully controls the process of creating the snapshot itself.
We will review the first option, with Quiesced set to ON.

Why Do We Need Quiescing?

There are many reasons for setting Quiesced to ON. For example, you avoid the update sequence number (USN) rollback issue when restoring Active Directory®: after a domain controller is recovered from a VSS-enabled backup, its invocationID is correctly reset and the Event Log shows a healthy entry:

Event ID 1109: Active Directory has been restored from backup media, or has been configured to host an application partition. The invocationID attribute for this domain controller has been changed.

You also avoid problems when recovering SQL Server® and other applications.

Acronis backup software, such as Acronis Backup 12, performs these backup operations correctly for all types of operating systems, servers, and applications running inside a virtual machine.

How Can You Ensure the Snapshot Was Created Correctly Using VSS?

There are several ways to determine whether a snapshot was created correctly, all the way down to the application level.

First, check the Event Viewer. When a snapshot is created with the Quiesced = ON, Memory = OFF options (see the screenshot at the beginning of this post), the Application log shows the following event from the VSS writers:

Note: The VSS error with Event ID 12289 visible in the screenshot is harmless. It is caused by the 3.5-inch floppy drive; to get rid of it, simply remove the floppy drive from the VM configuration.

An alternative way to determine whether the snapshot was created correctly is to use the Datastore Browser in the vSphere Client. After a quiesced snapshot is created, you should see a *vss_manifests*.zip file in the VM folder on the datastore.

Inside the archive, there is a backup.xml file describing all the VSS writers found on the guest system, plus metadata for each writer in a separate writerX.xml file.

Note that if the vss_manifests.zip file contains only a backup.xml file, the snapshot was typically created using VSS, but it is still a problem snapshot: no writer metadata was recorded. Failed snapshots are easy to detect; the harder part is recognizing a snapshot that VMware reports as successful but that actually is not.
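Once the archive has been downloaded from the datastore, these checks can be scripted. This is a minimal sketch, assuming the archive uses the backup.xml and writerX.xml names described here, with X being a number (that numbering is an assumption); the category names are made up for illustration. It also covers two cases described later in this post: an empty archive and an empty backup.xml.

```python
import io
import re
import zipfile
from typing import Dict


def classify_vss_manifest(zip_bytes: bytes) -> str:
    """Classify the contents of a downloaded vss_manifests zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Map lowercase file names (without folders) to their full archive paths.
        members = {name.rsplit("/", 1)[-1].lower(): name for name in zf.namelist()}
        if "backup.xml" not in members:
            return "no-vss"  # empty archive: VSS did not run
        if not zf.read(members["backup.xml"]).strip():
            return "empty-backup-xml"  # VSS activity not recorded
        writers = [n for n in members if re.fullmatch(r"writer\d+\.xml", n)]
        return "ok" if writers else "backup-xml-only"  # no writer metadata


def make_zip(files: Dict[str, str]) -> bytes:
    """Build an in-memory zip archive, used here only for the example."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    print(classify_vss_manifest(make_zip({"backup.xml": "<backup/>"})))  # backup-xml-only
    print(classify_vss_manifest(make_zip({"backup.xml": "<backup/>", "writer0.xml": "<w/>"})))  # ok
```

Only the "ok" result matches a healthy quiesced snapshot; the other three all correspond to snapshots that vSphere may still report as successful.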

In the following sections, we will discuss what causes a snapshot to fail.

Environment Requirements

Quiescing is clearly beneficial, but in practice you often run into problems caused by an incorrectly configured environment. VMware documents the official configuration requirements.

Let’s discuss what to look for to determine if you have these problems.

First, be sure that your system supports application-consistent snapshots.

Second, for quiescing to work, you must install the VSS components of VMware Tools and keep VMware Tools updated to the latest version.

vSphere versions 3.5 and earlier used the Legato Sync Driver for quiescing. It guaranteed consistency at the file system level but not at the application level, which is exactly why we need the VSS components. Legato has been replaced by the VMware Snapshot Provider. You can verify that it is installed by checking for the VMware Snapshot Provider service and its COM+ components inside the virtual machine.

What Problems Can You Face at This Stage?

If the VMware Snapshot Provider service is switched off or not installed, VMware will still report success when taking a snapshot with Quiesced = ON, Memory = OFF. However, the snapshot will be taken without VSS, using the Legato Sync Driver instead.

The behavior is different in Windows® 2008 and later: no event appears in the logs. Instead, the VSS service starts and then stops again.

Third, a typical quiescing setup problem involves the disk.EnableUUID=true parameter in the virtual machine's .vmx configuration file.

This parameter only matters for guest systems based on Windows 2008 and later (Windows 2003 ignores it), and it only exists in vSphere 4.1 and later. In other words, a virtual machine created on an older version and migrated to a newer host may not have this setting.

When this parameter is missing or set to false, the snapshot is created successfully but without VSS, which may result in an inconsistent backup. If the backup.xml file in vss_manifests.zip is empty (it normally contains a record of VSS activity), this parameter is most likely switched off.
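To check a virtual machine without waiting for a snapshot, you can read its .vmx file (for example, a copy downloaded from the VM folder with the Datastore Browser). The sketch below assumes the usual .vmx layout of one key = "value" pair per line and treats keys and the TRUE/FALSE value as case-insensitive; that case handling is an assumption, not something this post states.

```python
from typing import Dict


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse key = "value" lines from a .vmx file. Keys are lowercased."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and anything that is not a setting
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def enable_uuid_is_on(vmx_text: str) -> bool:
    """True only if disk.EnableUUID is present and set to true."""
    return parse_vmx(vmx_text).get("disk.enableuuid", "").lower() == "true"


if __name__ == "__main__":
    sample = '.encoding = "UTF-8"\ndisplayName = "dc01"\ndisk.EnableUUID = "TRUE"\n'
    print(enable_uuid_is_on(sample))  # True
```

A missing key and an explicit false are treated the same way here, because both lead to a snapshot taken without VSS.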

Fourth, make sure there are no dynamic disks inside the virtual machine. VSS will not run with dynamic disks, whether it is a system drive or a data drive. The snapshot will still be created, but the vss_manifests.zip file will be empty, and no related events will appear in the event logs inside the guest OS. This happens with Windows 2008 and later.

The same applies to IDE disks (IDE CD-ROM drives are the exception; they do not affect snapshots). Also make sure the virtual machine has at least as many free SCSI slots as it has disks. For example, if there are already 8 SCSI disks on SCSI1, there are not enough free slots.
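The disk rules above can be expressed as a quick pre-flight check. This is a sketch over a hypothetical list of disks rather than a real vSphere API, and it assumes 15 usable slots per SCSI controller (unit numbers 0 to 15, with unit 7 reserved for the controller); other controller types may allow more. CD-ROM drives are left out of the list, since they do not affect snapshots.

```python
from dataclasses import dataclass
from typing import List

# Assumption: 15 usable unit numbers per SCSI controller (unit 7 is reserved).
SLOTS_PER_SCSI_CONTROLLER = 15


@dataclass
class VirtualDisk:
    label: str
    bus: str               # "scsi" or "ide"; CD-ROM drives are not listed
    dynamic: bool = False  # True if the guest OS uses it as a dynamic disk


def quiescing_disk_problems(disks: List[VirtualDisk], scsi_controllers: int) -> List[str]:
    """List disk-layout issues that would prevent VSS quiescing."""
    problems = []
    for d in disks:
        if d.bus == "ide":
            problems.append(f"{d.label}: IDE disk")
        if d.dynamic:
            problems.append(f"{d.label}: dynamic disk")
    scsi_disks = sum(1 for d in disks if d.bus == "scsi")
    free_slots = scsi_controllers * SLOTS_PER_SCSI_CONTROLLER - scsi_disks
    if free_slots < len(disks):
        problems.append(f"only {free_slots} free SCSI slots for {len(disks)} disks")
    return problems


if __name__ == "__main__":
    # The example above: 8 SCSI disks on a single controller (SCSI1).
    eight = [VirtualDisk(f"disk{i}", "scsi") for i in range(8)]
    print(quiescing_disk_problems(eight, scsi_controllers=1))
```

With one controller, 8 disks leave 15 - 8 = 7 free slots, one short of the 8 required, while 7 disks leave 8 free slots and pass.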

Fifth, a broken VSS inside the guest machine is behind many user complaints to VMware support. These users assume that VMware caused the snapshot failure when the problem was actually in the guest operating system. Here is a screenshot of what happens when you attempt to create a quiesced snapshot after an unsuccessful installation of a new SQL database: the virtual .iso drive was unmounted during the installation, and the installer did not handle that.

This particular problem is resolved by simply restarting the virtual machine.

Restarting may help in other cases too, but sometimes VSS is damaged beyond repair and a reboot won't help. To check VSS, run Windows Backup and try to back up the System State. If that fails, the problem is with VSS; if it works, the problem is most likely on the hypervisor side.

VMware has published several articles on this subject in its Knowledge Base, including "Troubleshooting Volume Shadow Copy (VSS) Quiesce-Related Issues" and "Failed to Quiesce Snapshot of the Windows 2008 R2 Virtual Machine." One of these articles even suggests setting disk.EnableUUID to false, which effectively disables VSS when taking a quiesced snapshot. This is not an ideal solution, but you can use it as a temporary measure. Be careful, however: it may cause problems when restoring systems that require application consistency, such as a USN rollback on a restored domain controller.

Let's Summarize

The second, third, and fifth issues (missing VSS components in VMware Tools, the disk.EnableUUID parameter, and a broken VSS inside the guest) cause most of the problems related to snapshot consistency. In addition, sometimes a snapshot is not created at all. Whatever the challenges with snapshots, remember one important thing: it's not enough to back up; you need to test that you can recover. We have several blog posts that explain why backup matters and the best practices for backing up your business.

Check your servers today. Make sure your virtual machines are backed up. Review your disaster recovery plan. If you need help, give us a call or try Acronis Backup 12. It’s the world’s fastest, most complete and easiest-to-use backup solution on the market today. With Acronis Backup 12, “To quiesce or not to quiesce?” will no longer be a question.

(This article was originally published in Russian at https://habrahabr.ru/company/acronis/blog/207472/ )
