KVM Live Migration with mixed local and remote storage

Taming Live Migration

Live migration is an important technology that allows a virtual machine to move from one host server to another with minimal impact to the end user. During the live migration, the virtual machine continues to operate and respond to requests. The source and destination hypervisors coordinate the move, copying dirty memory pages from one host to the other. This is a critical capability to have when running large-scale clusters.

It is typically recommended that virtual machines which will be live migrated use a shared storage backend such as NFS, iSCSI, or Ceph. The reason is that storage volumes are generally quite large, and if they are remote you have far less data to move from host to host.

QEMU and virsh have long supported migrating local disks as well, but they generally recommend against it, because disk data tends to dwarf everything else. If your VM has 2 CPUs, 16 GB of memory, and a 500 GB disk, it's much easier to move just the 16 GB of memory than the 16 GB of memory plus the 500 GB disk. Migration becomes trickier to manage with that much data - you would need large pipes to support the movement.

However, there are times when perhaps you need to support a mix of local and remote storage. And in doing so, you do not want to give up the operational benefits of live migration.

A classic example would be a virtual machine that uses local storage to back a page file. Say 2 CPUs, 16 GB memory, 64 GB of local NVMe storage, and a 500 GB NFS boot volume. The local NVMe storage could be used for a page file, a local cache, or some other ephemeral use case, while the remote volume holds the actual persistent data.

When it comes to live migration, you still need to carry that local NVMe data across the wire. This is all possible, but the documentation for mixing and matching local and remote disks can be a bit confusing. This post will show you how to do it.

Understanding the Migration Command

A basic live migration command may look like the following:

virsh migrate --live myGuest qemu+ssh://192.168.1.2/system

I’m going to assume you’ve figured out all the other intricacies of live migration. But this command makes a big assumption: that the source and destination host configurations are identical. Meaning, if you’re using NFS, it’s configured identically on both sides - the NFS mount on the target had better be on the same path as on the source, with the same permissions, and so on.

Often, though, the source and destination hosts are not configured identically, even with remote storage. For instance, if you’re using iSCSI, the target system has to negotiate its connection with the iSCSI storage server ahead of the migration command, and the underlying disk that the target sees may end up with a different device name.
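
As a rough sketch, preparing an iSCSI-backed disk on the destination host might look something like the following (the portal address and IQN here are placeholders for your own environment):

# discover the targets exposed by the iSCSI portal
iscsiadm -m discovery -t sendtargets -p 192.168.1.50
# log in so the destination host sees the backing LUN as a local block device
iscsiadm -m node -T iqn.2024-01.com.example:storage.lun1 -p 192.168.1.50 --login
# confirm what device name the LUN received on this host
lsblk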

The migration command can deal with the underlying disks changing their names. It becomes a bit more complicated and looks like the following:

virsh migrate --live myGuest qemu+ssh://192.168.1.2/system --xml targetHost-myGuest.xml

The targetHost-myGuest.xml file (it can be named whatever you want) describes the configuration of the VM as it will exist on the destination host. To get a starting point, run virsh dumpxml myGuest > targetHost-myGuest.xml. This dumps the current configuration, which you then edit to match the configuration you expect on the destination host.
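
Before editing, it can also help to see how libvirt currently maps each disk on the source. For example, with the myGuest guest used throughout this post:

# dump the running definition to a file you can edit for the destination
virsh dumpxml myGuest > targetHost-myGuest.xml
# list each disk's target device alongside its backing source on this host
virsh domblklist myGuest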

Typically, you’re changing the disk source devices. So the original source may be the following:

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' />
      <source dev='/dev/sdb2'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

Then the target may be:

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' />
      <source dev='/dev/sdb5'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

Here you can see that we’ve tweaked just the source device. The source is the host-side device backing the disk itself. We don’t change the target because we are not changing how the device is presented to the guest.

Mixing and matching local and remote sources

Let’s say you’ve got an NFS boot disk and an NVMe namespace. This can be easily supported.

Let’s say your source disk devices from your virsh VM definition look like the following:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/mnt/nfs/test_img.img'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' />
      <source dev='/dev/nvme2n2'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

Let’s assume that on the target the NFS mount point is different, though the backing NFS export itself is the same, so that data does not need to move. Let’s also assume that we need to move the data in our NVMe namespace to the destination system, and that the namespace will likely have a different path there.
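
Before writing the destination XML, it’s worth confirming what the namespace is actually called on the target host. A quick check, assuming the nvme-cli package is available there, might be:

# list the NVMe namespaces visible on the destination host
nvme list
# or fall back to the generic block device listing
lsblk -o NAME,SIZE,TYPE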

The destination XML may have a disk section like the following:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/mnt/nfs_second_mount/test_img.img'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' />
      <source dev='/dev/nvme1n5'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

You can see here that only the <source> fields changed - not the target, driver, or address.
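
A quick way to sanity-check the edit is to diff the live definition against your destination XML and confirm that only the <source> lines differ, for example with bash process substitution:

# compare the running definition with the edited destination definition
diff -u <(virsh dumpxml myGuest) targetHost-myGuest.xml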

Next up, we need to issue the command to virsh to migrate the VM. However, we need to explicitly tell it to move the data for the disk backed by the NVMe namespace but not the NFS disk.

The command to do so is:

virsh migrate --live myGuest qemu+ssh://192.168.1.2/system \
  --xml ./targetHost-myGuest.xml \
  --abort-on-error \
  --copy-storage-all \
  --migrate-disks vdb

Key things to note here: the --xml option points at the updated XML file with the new disk source locations. The --copy-storage-all option tells virsh to copy disk data to the destination system; by default it copies all of the disks. Lastly, the --migrate-disks option is what scopes the copy down to just the vdb device. You identify the disks to migrate by the dev name in their <target> element in the XML.
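
Once the migration completes, you can verify from the destination side that the guest landed with the expected disk mapping (the connection URI matches the earlier examples):

# confirm the guest is now running on the destination host
virsh -c qemu+ssh://192.168.1.2/system list
# confirm each disk target points at the destination-side source
virsh -c qemu+ssh://192.168.1.2/system domblklist myGuest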

Going further

This solves the basic problem of live migration with a mix of local and remote storage. If you’re using passthrough devices, there doesn’t yet seem to be a way to support them - which makes sense, since the VM has direct access to the hardware.

That said, this method can be enhanced with other live migration capabilities such as QoS and tunnelling. Critical to all this working smoothly is a system on top of live migration that can identify the target, prepare it, and then orchestrate the migration pattern.
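
As one example of layering QoS on top, the storage copy can be rate limited so it doesn’t saturate the link, and the job can be watched from another terminal. This is just a sketch, and the 500 MiB/s cap is an arbitrary value for illustration:

# cap migration bandwidth (value is in MiB/s)
virsh migrate --live myGuest qemu+ssh://192.168.1.2/system \
  --xml ./targetHost-myGuest.xml \
  --abort-on-error \
  --copy-storage-all \
  --migrate-disks vdb \
  --bandwidth 500

# from another shell on the source host, watch migration progress
virsh domjobinfo myGuest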