When all is said and done, do IOPS really matter?

by Chris Midgley on Wed, Apr 13, 2011 at 9:13 AM

For some time now, IOPS has been all the rage with VDI, and for good reason.  Understanding how many IOPS your desktops require is a critical part of sizing your VDI infrastructure.  And when using a traditional storage system based on rotating disks, it is easy to see why IOPS matter.  At some number of IOPS - typically around 2,000-4,000 - the storage array will begin to bog down.  Queue depth increases as the array waits on the disks to respond to prior requests.  This is easily seen when monitoring latency on the array - response times start at an acceptable 5-20ms, then skyrocket as load increases and users complain!
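The link between load, queue depth and latency can be roughed out with Little's Law: the average number of I/Os in flight equals the I/O rate multiplied by the average response time.  Here is a minimal sketch, assuming a steady-state load; the function name is made up for illustration, and the 2,000 IOPS and 5/20/40ms figures are just the ballpark numbers discussed in this post.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average I/Os in flight = arrival rate x time in system."""
    return iops * (latency_ms / 1000.0)


if __name__ == "__main__":
    # Same I/O rate, three different response times.
    for latency_ms in (5, 20, 40):
        in_flight = outstanding_ios(2000, latency_ms)
        print(f"2000 IOPS at {latency_ms} ms -> about {in_flight:.0f} I/Os in flight")
```

The point of the sketch is that the same IOPS number can correspond to very different queue depths, depending entirely on how fast the array answers.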

But the problem with IOPS is that it is a measure of how much I/O the desktop workload is creating, and not how well the storage subsystem is handling it.  Does the IOPS number really matter as long as, somewhere in the storage stack, the I/O is handled quickly and efficiently, so that users are not waiting for their desktops to respond?

For example, imagine having a screaming fast, pure solid state (SSD) SAN.  These puppies can handle staggeringly heavy I/O loads without skipping a beat.  Does it really matter if IOPS are sky high if the SAN is able to deliver the blocks blazingly fast?  Sure - higher IOPS could mean higher infrastructure load, and therefore more CPU, more fabric, more everything.  But at the end of the day, as long as the I/O is being handled in a reliable and cost efficient manner with low latency, who cares?

But there's the rub ... "cost efficient".  SSD is anything but cost efficient for VDI.  The devices are expensive, and the capacity is small.  At least today - at current trends, that story will likely change in the not too distant future.

So until then, why not blend SSD and HDD?  These solutions are available today from multiple vendors.  A "hybrid array" is able to balance the disk blocks between the smaller capacity, but super fast SSD and the slower, but much higher capacity HDD (SAS/SATA disks).  The array watches the block activity, and tries to determine which data deserves the SSD (because it is used a lot), versus which data is rarely accessed and therefore should be over in HDD.  This means you can get amazing performance and capacity given the right data pattern ... but you can also suffer on performance with the wrong one.  The key is a data pattern that can be easily optimized by the array - keeping a small number of hot blocks in SSD and the majority of the warm/cold blocks over in HDD.

We ran a series of load tests to see how a hybrid array responds to different I/O patterns, measuring latency to understand overall performance.  For the tests, we used the exact same hosts, networking and storage.  For storage, we used a Dell EqualLogic PS6000XVS hybrid array.  LoginVSI Pro 3 was used for load generation, using a Medium workload on three hosts running 225 virtual desktops (on our hosts, 75 desktops per host was the limit before ESX CPU ready exceeded 10%).  We ran a series of tests, comparing two main configurations - regular persistent desktops (thin full clones), and Unidesk persistent desktops.  

The IOPS in these tests were nearly identical (a tad under 2,000 IOPS) - which makes sense, as we were using the same load generation tool and would expect about the same IOPS.  For the regular persistent desktops, the latency became unacceptable (20-40ms) as we reached around 200-225 desktops.  But for Unidesk desktops, with the same load and desktop count, the array was screaming fast (at only 2-3ms latency).
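To put those totals in per-desktop terms, here is a quick back-of-the-envelope calculation.  It treats 2,000 IOPS as an upper bound (the measured figure was a tad under that), so the per-desktop result is a ceiling rather than a measurement.

```python
total_iops = 2000   # upper bound; the tests measured slightly less than this
desktops = 225      # virtual desktops under load
hosts = 3           # hosts used in the tests

desktops_per_host = desktops // hosts
iops_per_desktop = total_iops / desktops

print(f"{desktops_per_host} desktops per host")
print(f"at most about {iops_per_desktop:.1f} IOPS per desktop")
```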

The reason Unidesk outperformed regular desktops on the hybrid array is the data pattern.  Each operating system and application is in a layer that is stored once, and shared across many desktops.  When it is assigned to a user, the user directly accesses (reads) that layer - without copying or duplicating the data.  This allows the hybrid array to easily identify the hot blocks that should be retained in SSD, and push the rarely accessed desktop content into the HDD.

Sharing data and avoiding duplicate blocks is the "magic" that makes hybrid arrays perform.  Each time data is duplicated, the array is challenged to determine if one block is more valuable than another.  Instead of a block being shared across hundreds of desktops, it is duplicated hundreds of times in the array - each copy looks just as important as every other, and the array has no way to identify any of them as especially "hot".
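The effect of sharing versus duplication can be illustrated with a toy model.  This is only a sketch under loud assumptions: it is not how the EqualLogic (or any real hybrid array) decides placement, the function and parameter names are invented for the example, the block popularity is an assumed Zipf-like skew, and the tiering is idealized as "put the most-read physical blocks in SSD".  Every desktop reads the same logical image blocks; the only difference between the two runs is whether those blocks are stored once (shared) or once per desktop (cloned).

```python
import random
from collections import Counter


def ssd_hit_rate(desktops, image_blocks, ssd_blocks, reads_per_desktop,
                 shared, seed=0):
    """Fraction of reads served from SSD under an idealized tiering policy."""
    rng = random.Random(seed)
    # Assumed skew: the block at rank r is read with weight 1 / (r + 1).
    weights = [1.0 / (rank + 1) for rank in range(image_blocks)]
    access_counts = Counter()
    for desktop in range(desktops):
        reads = rng.choices(range(image_blocks), weights=weights,
                            k=reads_per_desktop)
        for block in reads:
            # Shared: one physical copy for everyone.
            # Cloned: each desktop has its own physical copy.
            physical = block if shared else (desktop, block)
            access_counts[physical] += 1
    # Idealized tiering: the most-read physical blocks fill the SSD tier.
    hottest = access_counts.most_common(ssd_blocks)
    ssd_reads = sum(count for _, count in hottest)
    return ssd_reads / sum(access_counts.values())


if __name__ == "__main__":
    params = dict(desktops=50, image_blocks=1000, ssd_blocks=1000,
                  reads_per_desktop=2000)
    print("shared layer:", ssd_hit_rate(shared=True, **params))
    print("full clones: ", ssd_hit_rate(shared=False, **params))
```

With these made-up numbers the whole shared image fits in the SSD tier, so every read is an SSD read.  The cloned run spreads the same reads over up to 50 times as many physical blocks, so the same SSD capacity covers a much smaller share of them.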

This is why the regular persistent desktops test failed to see significant performance benefits with the hybrid array.  Each desktop is a clone - a duplicate of the entire image.  It's a worst case scenario for a hybrid array.  No block is shared, making it nearly impossible to identify hot blocks and achieve scale across a large number of desktops.

I'd expect non-persistent desktops (those using shared images, such as Linked Clones) will also greatly benefit from hybrid storage arrays.  We have yet to run detailed non-persistent tests on hybrid arrays to understand the I/O load of this configuration, but it logically makes sense because shared images are the key to making hybrid successful.  But do be careful about the other software components that are typically used in the non-persistent VDI stack, as these may cause increased data duplication, making it challenging for the hybrid.  For example, application virtualization / streaming applications may duplicate application data blocks to each desktop, and profile management may copy profile data blocks for each user.  These have the potential to flood the limited SSD space on the hybrid, causing latency to degrade.

So while understanding IOPS is an important part of a good VDI design, the key to understanding storage performance is understanding latency ... especially when using hybrid arrays.  When given the correct data pattern, hybrid arrays have the potential to substantially reduce your cost per desktop (and reduce the number of arrays, management points and rack space).  But you should test your desktops, including all software and tool sets, to ensure the resulting data pattern delivers acceptable latency under load.

Comments

Posted on June 17, 2011
Chris Midgley
Unidesk employee
Joined: December 15, 2007
Points: 1055

Chris - excellent! Let us know how it goes, and let us know if we can be of any assistance - talk through your deployment goals, help with the design, etc.

Posted on June 14, 2011
Chris Mertens
Unverified user

Good information....we have purchased the Dell unit you reference for our VDI project.
