ESXi Backups with ZFS and XSIBackup
ZFS Setup
I had two zpools in my setup:
root@zfs:~# zpool list
NAME    SIZE  ALLOC  FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
data    928G   238G  690G         -  25%  1.00x  ONLINE  -
other   928G   248G  680G         -  26%  1.00x  ONLINE  -
The data pool had a ComStar ZVol presented to an ESXi host and another ZFS Volume used for ISOs shared over NFS:
root@zfs:~# zfs list -r data
NAME        USED  AVAIL  REFER  MOUNTPOINT
data        551G   363G   184K  /data
data/isos  37.7G   363G  37.7G  /data/isos
data/vm     503G   665G   200G  -
The other zpool wasn't used for anything but was the same size, so I decided to use it for backups.
ESXi VM Configuration
I only had 4 VMs running on the host, and 3 of them were running on local storage. The remaining VM was using the ComStar LUN to store a bunch of data, so I decided to use ZFS capabilities to do a LUN-level backup of the ComStar ZVol.
Send/Receive a ZFS Volume
I found a couple of sites that had instructions on how to copy ZFS Volumes across Zpools:
- Migrating Data With ZFS Send and Receive
- zfs incremental recursive replication to usb drive
- ZFS snapshots and clones
The gist of the commands would be like this:
-
Take Snapshot of Source ZFS Volume
zfs snapshot data/vm@1
-
Send the snapshot over to the other pool and receive it
zfs send -R data/vm@1 | zfs receive -Fu other/vm-backup
-
If you want to do incremental backups, take another snapshot
zfs snapshot data/vm@2
-
Send an Incremental Snapshot to the Backup Zpool
zfs send -Ri data/vm@1 data/vm@2 | zfs receive -Fu other/vm-backup
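To sanity-check that the destination now mirrors the source, the snapshot lists on both sides can be compared (a rough sketch; the names and sizes below are illustrative, following the example above):
zfs list -r -t snapshot data/vm other/vm-backup
NAME                 USED  AVAIL  REFER  MOUNTPOINT
data/vm@1               0      -   200G  -
data/vm@2               0      -   200G  -
other/vm-backup@1       0      -   200G  -
other/vm-backup@2       0      -   200G  -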
The -R option of zfs send together with the -F option of zfs receive cleans up snapshots on the destination side. The source and destination then look the same, instead of the destination volume continuing to accumulate ZFS volume snapshots. To automate the process I wrote a little bash script:
root@zfs:~# cat vm-backup
#!/bin/bash
ZFS=/usr/sbin/zfs
ZFS_VOL_SRC=data/vm
ZFS_VOL_DST=other/vm-backup
# Assumes exactly one existing snapshot on the source volume
CUR_SNAP=$($ZFS list -r -H -o name -t snapshot $ZFS_VOL_SRC)
DATE=/usr/gnu/bin/date
SNAP_DATE=$($DATE +%m-%d)
NEW_SNAP="$ZFS_VOL_SRC@$SNAP_DATE"

function take_new_snapshot(){
    $ZFS snapshot $NEW_SNAP
}

take_new_snapshot
RC=$?
if [ $RC -ne 0 ]; then
    echo "snapshot creation failed with status $RC"
    exit 1
fi

function zfs_send_inc(){
    # Incremental send from the old snapshot to the new one;
    # -F cleans up stale snapshots on the destination
    $ZFS send -Ri $CUR_SNAP $NEW_SNAP | $ZFS receive -Fu $ZFS_VOL_DST
}

zfs_send_inc
RC=$?
if [ $RC -ne 0 ]; then
    echo "snapshot send failed with status $RC"
    exit 1
fi

function zfs_cleanup(){
    # Remove the old source snapshot; the new one becomes the base
    # for the next incremental run
    $ZFS destroy $CUR_SNAP
}

zfs_cleanup
RC=$?
if [ $RC -ne 0 ]; then
    echo "snapshot deletion failed with status $RC"
    exit 1
fi

echo "ZFS Backup Successful on $($DATE)"
The script performs incremental backups. It assumes that you have already sent the first full backup and that there is only one snapshot on the source ZFS Volume (which will be deleted after the copy is done). Adding the script to crontab (an example entry is sketched below) automates the backup process for the LUN presented to the ESXi host.
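A crontab entry on the ZFS host could look something like this (the schedule, script path, and log location are placeholders I chose, not from the original setup):
# Run the incremental vm-backup script every night at 03:30 and keep a log
30 3 * * * /root/vm-backup >> /var/log/vm-backup.log 2>&1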
ZFS refreservation Property
My ZFS Volume was 300GB in size and was using 200GB of space; after I took a snapshot of that Volume, it ended up consuming 500GB of space:
root@zfs:~# zfs list -r -t all -o space data/vm
NAME           AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
data/vm         665G  503G      176K    200G           302G          0
data/vm@02-03      -  176K         -       -              -          -
This is because the refreservation property is set on the volume (and this is preferable when using ZFS together with iSCSI):
root@zfs:~# zfs get refreservation data/vm
NAME     PROPERTY        VALUE  SOURCE
data/vm  refreservation  302G   local
There is a great description in Setting ZFS Quotas and Reservations:
If refreservation is set, a snapshot is only allowed if sufficient unreserved pool space exists outside of this reservation to accommodate the current number of referenced bytes in the dataset.
Basically it ensures you don't over-provision space, and it requires a total of reserved + used space when a snapshot is taken. In my scenario that made sense: I had a 300GB LUN which I was presenting over iSCSI, and I was using 200GB of that LUN (the referenced bytes, shown as USEDDS). In total that adds up to roughly 500GB.
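A quick way to pull out the two numbers that get added together (the values shown are the ones from the output above):
root@zfs:~# zfs get -o property,value refreservation,referenced data/vm
PROPERTY        VALUE
refreservation  302G
referenced      200G
The refreservation (302G) plus the referenced bytes (200G) is what the pool has to be able to cover the moment the snapshot is taken, which matches the roughly 500GB of USED shown earlier.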
On the Source Zpool I didn’t mind, but on the backup Zpool, I only wanted to reserve the used space. So I went ahead and unset that property on the Destination ZFS volume:
root@zfs:~# zfs set refreservation=none other/vm-backup
and then the reserved space was released:
root@zfs:~# zfs list -r -t all -o space other/vm-backup
NAME                   AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
other/vm-backup         499G  200G      136K    200G              0          0
other/vm-backup@02-02      -  136K         -       -              -          -
other/vm-backup@02-03      -     0         -       -              -          -
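To confirm the property stuck on the backup volume (the output shown is what I'd expect, not a capture):
root@zfs:~# zfs get refreservation other/vm-backup
NAME             PROPERTY        VALUE  SOURCE
other/vm-backup  refreservation  none   local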
VM Backups with XSIBackup
There is a pretty cool script called xsibackup. It supports ESXi 5.1 and above, and luckily I was on 5.1. The script basically takes a snapshot of the VM(s) and then uses vmkfstools to clone the vmdk(s) to a destination VMFS volume.
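To give a rough idea of what the tool does under the hood, a manual equivalent for a single disk might look like the following (this is my approximation, not xsibackup's actual code; the VM id, datastore names, and paths are placeholders):
# Find the VM id, then take a snapshot so the base vmdk stops changing
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/snapshot.create 2 xsi-backup "pre-clone snapshot" 0 0
# Clone the (now quiesced) base disk to the backup datastore
vmkfstools -i /vmfs/volumes/VMs/VM2/VM2.vmdk /vmfs/volumes/backups/VM2/VM2.vmdk -d thin
# Remove the snapshot once the clone finishes
vim-cmd vmsvc/snapshot.removeall 2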
Presenting a ComSTAR ZFS Volume to ESXi
If you want, you can do this through the napp-it web interface instead. This time around I decided to go command line. Here is the process I followed:
-
Create a ZFS Volume
zfs create -V 200G other/backups
-
Associate volume with ComSTAR
sbdadm create-lu /dev/zvol/rdsk/other/backups
-
Create Target group, if it doesn’t exist (I already had one from the napp-it config)
stmfadm create-tg tg1
-
Add Member (Target System) to Target group (I didn’t have to do this either, since I was using an existing Target Group). To get a list of targets on the system run the following:
stmfadm list-target
stmfadm add-tg-member -g tg1 iqn.2010-09.org.na
-
Create a View for LUN to be accessible from the Target Group
stmfadm add-view -t tg1 600144F0876A
-
If you don’t care about Target Groups, you can just add a view and allow everyone to access the LUN
stmfadm add-view 600144F0876A
-
Enable Write Back Cache on LUN
stmfadm modify-lu -p wcd=false 600144F0876A
-
Do a rescan on the ESXi side to see the new LUN:
esxcli storage core adapter rescan -A vmhba32
After the LUN is visible, go ahead and put a VMFS volume on the iSCSI LUN (I called it backups).
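To double-check that the host really sees the new device after the rescan, something like the following works (the naa identifier is just a placeholder for the LUN GUID):
# List all SCSI devices and look for the new ComSTAR LUN
esxcli storage core device list | grep -i naa.600144f0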
Install XSIBackup
The install process is described on the main download page. Here is a quick summary:
-
Copy the Script to a local datastore
scp xsibackup esx:/vmfs/volumes/datastore1
-
Set the script as executable
~ # chmod 700 /vmfs/volumes/datastore1/xsibackup
-
Run a test backup
~ # /vmfs/volumes/datastore1/xsibackup --backup-point=/vmfs/volumes/backups --backup-type=custom --backup-vms=VM2,VM5,VM7 --mail-from=admin@mail.com --mail-to=elatov@mail.com --smtp-srv=mail.domain.com --smtp-port=25 --smtp-auth=none
-
The output will look something like this:
Found --backup-point at /vmfs/volumes/backups
Getting list of all VMs...
VM: Vmid Name File Guest OS Version Annotation
VM: 1 VM1 [VMs] VM1/VM1.vmx rhel6_64Guest vmx-08
VM: 2 VM2 [VMs] VM2/VM2.vmx freebsd64Guest vmx-08
VM: 3 VM3 [VMs] VM3/VM3.vmx rhel6Guest vmx-08
VM: 4 VM4 [VMs] VM4/VM4.vmx rhel5Guest vmx-08
VM: 5 VM5 [VMs] VM5/VM5.vmx debian6_64Guest vmx-08
VM: 6 VM6 [VMs] VM6/VM6.vmx windows7Server64Guest vmx-08
VM: 7 VM7 [VMs] VM7/VM7.vmx rhel6_64Guest vmx-08
VMs to backup:
2 VM2 [VMs] VM2/VM2.vmx freebsd64Guest vmx-08 17507 /vmfs/volumes/529a6c18-ae956476-0872-0030489f1401/VM2
5 VM5 [VMs] VM5/VM5.vmx debian6_64Guest vmx-08 17507 /vmfs/volumes/529a6c18-ae956476-0872-0030489f1401/VM5
6 VM7 [VMs] VM7/VM7.vmx windows7Server64Guest vmx-08 45174 /vmfs/volumes/529a6c18-ae956476-0872-0030489f1401/VM7
Needed room: 78 Gb. Available room: 190 Gb.
Hot backup selected for VM: VM2 will not be switched off
Hot backup selected for VM: VM5 will not be switched off
Hot backup selected for VM: VM7 will not be switched off
220 mail.domain.com ESMTP Exim 4.80 Sun, 02 Feb 2014 14:13:13 -0700
250 mail.domain.com Hello esx.domain.com [192.168.1.111]
250-mail.domain.com Hello esx.domain.com [192.168.1.111]
250-SIZE 52428800
250-8BITMIME
250-PIPELINING
250 HELP
250 OK
250 Accepted
354 Enter message, ending with "." on a line by itself
250 OK id=1WA4Lp-0001I4-6K
221 mail.domain.com closing connection
-
Add the XSIBackup script to crontab. First stop the crond process
/bin/kill $(cat /var/run/crond.pid)
-
Add the following to the /var/spool/cron/crontabs/root file
30 5 9,18 * * /vmfs/volumes/datastore1/xsibackup --backup-point=/vmfs/volumes/backups --backup-type=custom --backup-vms=VM2,VM5,VM7 --mail-from=admin@mail.com --mail-to=elatov@mail.com --smtp-srv=mail.domain.com --smtp-port=25 --smtp-auth=none > /vmfs/volumes/datastore1/backup.log 2>&1
-
Restart the crond process
/usr/lib/vmware/busybox/bin/busybox crond
-
The above changes are not permanent, and will be removed upon a reboot. To apply the changes on boot, add the following to the /etc/rc.local.d/local.sh file:
~ # cat /etc/rc.local.d/local.sh
#!/bin/sh

# local configuration options

# Note: modify at your own risk!  If you do/use anything in this
# script that is not part of a stable API (relying on files to be in
# specific places, specific tools, specific output, etc) there is a
# possibility you will end up with a broken system after patching or
# upgrading.  Changes are not supported unless under direction of
# VMware support.

/bin/kill $(cat /var/run/crond.pid)
/bin/echo "30 5 9,18 * * /vmfs/volumes/datastore1/xsibackup --backup-point=/vmfs/volumes/backups --backup-type=custom --backup-vms=VM2,VM5,VM7 --mail-from=admin@mail.com --mail-to=elatov@mail.com --smtp-srv=mail.domain.com --smtp-port=25 --smtp-auth=none > /vmfs/volumes/datastore1/backup.log 2>&1" >> /var/spool/cron/crontabs/root
/usr/lib/vmware/busybox/bin/busybox crond

exit 0
SMTP Synchronization Error
At first my emails would fail with the following message:
exim SMTP protocol synchronization error (input sent without waiting for greeting)
I was using Exim on my Linux box; the fix for the issue is described here. I just needed to add the following option to my Exim configuration:
smtp_enforce_sync = false
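As a rough sketch (the file path is a Debian-style assumption and the exact location varies by distribution), applying and verifying the setting could look like this:
# Append the option to Exim's main configuration and reload it
echo 'smtp_enforce_sync = false' >> /etc/exim4/conf.d/main/99_local_options
update-exim4.conf && service exim4 restart
# Ask Exim for the effective value; it should report no_smtp_enforce_sync
exim -bP smtp_enforce_sync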
After that was in place, I received the backup report email from XSIBackup.
The speeds were really good. I also checked out esxtop, and latency was pretty good as well.