SSD Performance Degradation and SCSI UNMAP Command
So I had two sets of SSDs in my VMware setup: one in the local Mac mini:
[root@macm:~] esxcli storage core device list -d t10.ATA_APPLE_SSD_SM256E
t10.ATA_APPLE_SSD_SM256E
   Display Name: Local ATA Disk (t10.ATA_APPLE_SSD_SM256E)
   Has Settable Display Name: true
   Size: 239372
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/t10.ATA_APPLE_SSD_SM256E
   Vendor: ATA
   Model: APPLE SSD SM256E
   Revision: 2A0Q
   SCSI Level: 5
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: yes
   Attached Filters:
   VAAI Status: unknown
   Other UIDs: vml.0100000000533141414e594e463330323932342020202020204150504c4520
   Is Shared Clusterwide: false
   Is Local SAS Device: false
   Is SAS: false
   Is USB: false
   Is Boot USB Device: false
   Is Boot Device: true
   Device Max Queue Depth: 31
   No of outstanding IOs with competing worlds: 32
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
And a couple on my OmniOS ZFS storage:
<> sg_vpd -p ai /dev/rdsk/c2t4d0
ATA information VPD page:
  SAT Vendor identification: ATA
  SAT Product identification: Samsung SSD 840
  SAT Product revision level: BB6Q
  Device signature indicates PATA transport
  ATA command IDENTIFY DEVICE response summary:
    model: Samsung SSD 840 EVO 250GB
    serial number: ftrt
    firmware revision: EXT0BB6Q
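As a side note, you can also double-check that a drive actually advertises UNMAP support by querying the Logical Block Provisioning VPD page with sg_vpd. This is a sketch (I'm going from memory on the sg3_utils page acronym, and reusing the same c2t4d0 device); look for "Unmap command supported (LBPU): 1" in the output:

# Sketch: query the Logical Block Provisioning VPD page (0xb2)
<> sg_vpd -p lbpv /dev/rdsk/c2t4d0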
SSD Performance Degradation
Over a period of 4 years, the SSDs became slower and slower, and I noticed that backups were taking longer and longer to complete. Reading over a couple of pages, it seems that SSD degradation is pretty normal: over time, as the drive fills up, it becomes slow, and actually issuing an UNMAP on SSD drives can help out:
- FAQ: Using SSDs with ESXi (Updated)
- Exploring the Relationship Between Spare Area and Performance Consistency in Modern SSDs
- The Myth of SSD Performance Degradation
Some notes, from the above sites:
Over time, our daily production traffic caused the SSDs to become fuller from their perspective. As a result, garbage collection was triggered more and more often until disk performance reached unacceptable levels. One natural way to solve the issue is to tell SSDs which data are deleted. Modern operating systems support an instruction called TRIM to allow file systems to pass the information to the underlying disks.
From another site:
As the number of known free Flash cells decreases the write performance of the SSD also decreases because it heavily depends on the number of cells that can be simultaneously written to.
To address this issue the ATA TRIM command was introduced many years ago. Modern Operating Systems use the TRIM command to inform the SSD controller when they delete a block so that it can add the associated Flash cell to its free list and knows that it can be overwritten.
And here is the last one:
It’s because of this relationship between write amplification and spare area that we’ve always recommended setting aside 10 - 20% of your SSD and not filling it up entirely. Most modern controllers will do just fine if you partition the drive and leave the last 10 - 20% untouched. With TRIM support even the partitioning step isn’t really necessary, but it does help from a data management standpoint.
SCSI UNMAP with VMware
On the local drive I confirmed that UNMAP is supported (Delete Status):
[root@macm:~] esxcli storage core device vaai status get -d t10.ATA_APPLE_SSD_SM256E
t10.ATA_APPLE_SSD_SM256E
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: supported
Then I ran the following to send an UNMAP to the VMFS Datastore:
[root@macm:~] esxcli storage vmfs unmap -l datastore1
That took about 2-3 minutes to complete, and in the logs I saw the following:
[root@macm:~] tail -f /var/log/vmkernel
2017-02-19T03:57:25.503Z cpu2:72098 opID=62289008)vmw_ahci[0000001f]: scsiUnmapCommand:Unmap transfer 0x18 byte
2017-02-19T03:57:26.019Z cpu2:72098 opID=62289008)vmw_ahci[0000001f]: scsiUnmapCommand:Unmap transfer 0x18 byte
2017-02-19T03:57:26.537Z cpu5:72098 opID=62289008)vmw_ahci[0000001f]: scsiUnmapCommand:Unmap transfer 0x48 byte
2017-02-19T03:57:27.052Z cpu5:72098 opID=62289008)vmw_ahci[0000001f]: scsiUnmapCommand:Unmap transfer 0x18 byte
Then my performance on the local drive came back. I was able to write a file pretty quickly:
[root@macm:/vmfs/volumes/533e29ae-e243ce90-39a5-685b35c99610] time vmkfstools -c 15G -d eagerzeroedthick test.vmdk
Creating disk 'test.vmdk' and zeroing it out...
Create: 100% done.
real    0m 34.59s
user    0m 4.75s
sys     0m 0.00s
Before running the UNMAP command, the above took 3 minutes.
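Since nothing reclaims automatically on VMFS-5, this has to be re-run every so often. Here is a rough sketch of looping over every VMFS datastore on a host; the awk column parsing and the reclaim-unit count are my own assumptions (volume names containing spaces would break it):

# Sketch: send UNMAP to each mounted VMFS datastore on this host.
# Column 2 of 'esxcli storage filesystem list' is the volume name,
# column 5 is the filesystem type (e.g. VMFS-5).
for ds in $(esxcli storage filesystem list | awk '$5 ~ /^VMFS/ {print $2}'); do
  echo "Reclaiming free blocks on ${ds}"
  esxcli storage vmfs unmap -l "${ds}" -n 200   # -n: reclaim units per iteration
done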
SCSI UNMAP with ZFS/Solaris
Reading over the OmniOS r151014 release notes, it looks like that release can handle UNMAP, which is great. It also exposes a few related tunables (a sketch of setting these follows the list):
- zfs_free_max_blocks (can reduce for less free blocks per transaction)
- zvol_unmap_enabled (can set to 0 to ignore UNMAP requests which can be slow)
- metaslabs_per_vdev (an upper limit per vdev, currently 200, now tunable)
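A minimal sketch of what adjusting those might look like in /etc/system (the values here are placeholders rather than recommendations, and I'm assuming all three variables live in the zfs module; a reboot is needed for the changes to take effect):

* /etc/system -- example values only
set zfs:zfs_free_max_blocks = 100000
set zfs:zvol_unmap_enabled = 0
set zfs:metaslabs_per_vdev = 200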
And I did confirm that the ESXi host can see the UNMAP primitive (the Delete Status) for the volume/LUN presented over iSCSI with COMSTAR:
[root@macm:~] esxcli storage core device vaai status get -d naa.600144f070cc4400
naa.600144f070cc44000
   VAAI Plugin Name:
   ATS Status: unsupported
   Clone Status: unsupported
   Zero Status: supported
   Delete Status: supported
However, when I sent the UNMAP command to the LUN, it ran really fast and it felt like it didn't do anything. I then ran into this interesting page: ZFS and Intelligent Storage, which talks about how UNMAP with ZFS doesn't work very well:
By sending UNMAP/TRIM commands, ZFS can notify the array that a particular block of storage is no longer required, which on most arrays will trigger the array to re-thin-provision that block of storage, freeing the space it was using.
Unfortunately that’s where the good news ends. Solaris/ZFS added support for UNMAP in the Solaris 11.1 release, however their implementation was so horribly broken that they recommended disabling it in the release notes for the very same version! In a release soon after they disabled it by default, and despite it now being almost 4 years and 2 Solaris release later, they still do not recommend ever turning it on.
Re-formatting the LUN and starting all over again gave exactly the same results - a few minutes into the random workload the pool once again reported data corruption - confirming Oracle’s advice that turning on UNMAP for ZFS is not a good idea!
I then remembered a recommendation from napp-it: Tuning/ best use
Do not fill up SSDs as performance degrades. Use reservations or do a “secure erase” on SSDs that are not new, followed by overprovision SSDs with Host protected Areas,
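One way to act on the reservation part of that advice is to create a dataset whose only job is to hold space back from the rest of the pool. A quick sketch, assuming a hypothetical pool called ssdpool and roughly 20% held back on a 250GB-class drive (both names and sizes are mine):

# Hold ~46G back so the pool can never fill up completely
zfs create ssdpool/reserved
zfs set reservation=46G ssdpool/reserved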
Formatting SSD Drives in Solaris
I removed the LUN, I removed the volume, and I deleted the pool. Then I formatted the disks:
<> format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c2t0d0 <ATA-ST3808110AS-H cyl 9726 alt 2 hd 255 sec 63>
          /pci@0,0/pci1043,8534@1f,2/disk@0,0
       1. c2t1d0 <ATA-ST1000DM003-1CH162-CC47-931.51GB>
          /pci@0,0/pci1043,8534@1f,2/disk@1,0
       2. c2t2d0 <ATA-ST1000DM003-1CH162-CC47-931.51GB>
          /pci@0,0/pci1043,8534@1f,2/disk@2,0
       3. c2t3d0 <ATA-ST1000DM003-1ER162-CC45-931.51GB>
          /pci@0,0/pci1043,8534@1f,2/disk@3,0
       4. c2t4d0 <Samsung-SSD 840 EVO 250GB-EXT0BB6Q-232.89GB>
          /pci@0,0/pci1043,8534@1f,2/disk@4,0
       5. c2t5d0 <Samsung-SSD 840 EVO 250GB-EXT0BB6Q-232.89GB>
          /pci@0,0/pci1043,8534@1f,2/disk@5,0
Specify disk (enter its number): 4
selecting c2t4d0
[disk formatted]

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        fdisk      - run the fdisk program
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> analyze

ANALYZE MENU:
        read     - read only test (doesn't harm SunOS)
        refresh  - read then write (doesn't harm data)
        test     - pattern testing (doesn't harm data)
        write    - write then read (corrupts data)
        compare  - write, read, compare (corrupts data)
        purge    - write, read, write (corrupts data)
        verify   - write entire disk, then verify (corrupts data)
        print    - display data buffer
        setup    - set analysis parameters
        config   - show analysis parameters
        !<cmd>   - execute <cmd> , then return
        quit
analyze> purge
The purge command runs for a minimum of 4 passes plus a last pass
if the first 4 passes were successful.
Ready to purge (will corrupt data). This takes a long time,
but is interruptible with CTRL-C. Continue? y
        pass 0 - pattern = 0xaaaaaaaa
   488397042

        pass 1 - pattern = 0x55555555
   488397042

        pass 2 - pattern = 0xaaaaaaaa
   488397042

        pass 3 - pattern = 0xaaaaaaaa
   488397042

The last 4 passes were successful, running alpha pattern pass
        pass 4 - pattern = 0x40404040
   488397042

Total of 0 defective blocks repaired.
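As an aside, format(1M) can also be driven non-interactively with a command file via its -f option. I haven't tested this for the purge sequence (the confirmation prompt may still need answering), so treat it purely as a sketch:

# Untested sketch: scripted surface purge of c2t4d0
cat > /tmp/purge.cmds <<'EOF'
analyze
purge
y
quit
quit
EOF
format -d c2t4d0 -f /tmp/purge.cmds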
Then I re-created the pool, volume, and LUN (for more on the process, check out the ZFS iSCSI Benchmark Tests on ESX post). Before re-adding it to the ESXi host, I ran a quick bonnie++ to check performance.
Before running the purge with the format utility, I was getting 60MB/s for writes and 700MB/s for reads. I am glad to see my write performance back.
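For reference, the kind of bonnie++ run I mean would look something like this (a sketch: the mountpoint, file size, and user are my assumptions):

# Hypothetical bonnie++ invocation against the pool's filesystem
# -d: directory to test in, -s: file size (~2x RAM), -u: user to run as
bonnie++ -d /tank/bench -s 16g -u root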
Automatic UNMAP with vSphere 6.5
Then I read that with vSphere 6.5 and VMFS-6, UNMAP is automatic. From What’s new in vSphere 6.5 Core Storage:
This is a feature I know that many of you have been waiting for. There is now automatic UNMAP with VMFS-6 and vSphere 6.5. This automated UNMAP crawler mechanism will reclaim what is termed “dead” or “stranded” space on VMFS-6 datastores. Blocks that have been freed will be reclaimed within 12 hours by the crawler.
But unfortunately you can't do an in-place upgrade: Migrating VMFS 5 datastore to VMFS 6 datastore. So I moved everything off the local datastore and formatted it with VMFS-6. I also re-added the ZFS LUN, formatted it with VMFS-6 from the get-go, and I was able to confirm that it's enabled:
[root@core:~] esxcli storage vmfs reclaim config get -l datastore1
   Reclaim Granularity: 1048576 Bytes
   Reclaim Priority: low
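If you ever want to turn automatic reclaim off (or back on) for a datastore, the same namespace has a set verb; as far as I can tell, -p accepts none and low on 6.5:

# Disable automatic UNMAP on the datastore ...
esxcli storage vmfs reclaim config set -l datastore1 -p none
# ... and turn it back on
esxcli storage vmfs reclaim config set -l datastore1 -p low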
BTW more information on that feature is here:
- Automatic space reclamation (UNMAP) is back in vSphere 6.5
- Using the esxcli storage vmfs unmap command to reclaim VMFS deleted blocks on thin-provisioned LUNs
If you run that on a VMFS-5 datastore you will see the following:
[root@macm:~] esxcli storage vmfs reclaim config get -l datastore1
Failed to retrieve unmap property for filesystem, VMkernel log may contain more details.
Reason: VMFS with version 5 does not support unmap property
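An easy way to see which datastores are VMFS-5 versus VMFS-6 (and so which ones the reclaim settings apply to) is the filesystem listing; the Type column shows the VMFS version for each volume:

[root@macm:~] esxcli storage filesystem list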
Now hopefully, with that enabled, the performance won't degrade as badly (but still, 4 years on an SSD is great).