Making sense of VDI SAN performance

by Chris Midgley on Wed, Mar 2, 2011 at 1:09 PM

VDI storage is a complex beast.  It's hard enough to figure out how much capacity you need, but figuring out performance is a whole different story.  It starts by determining performance requirements, measured in IOPS (learn more about IOPS and VDI performance).  Once you have a baseline, you then enter a staggeringly complex and obfuscated world of storage vendors and storage architectures.

For example, let's say you've determined you need to deliver a peak load of 6,000 IOPS.  How are you best to deliver that I/O, and in the most cost effective and manageable way?  Most storage vendors won't quote an IOPS number, and for a very good reason.  The number of IOPS an array can deliver varies (and not by a small amount) based on your use case.  The VDI use case is a bit unusual - it has a very high write to read ratio (typically between 50%-70% writes), with a large amount of small block writes (which are more "expensive" than large block writes).
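To make that write-heavy mix concrete, here is a small sketch that splits a peak load into reads and writes.  The function name and the use of whole percentages are my own choices for illustration; only the 6,000 IOPS peak and the 50%-70% write range come from the text above.

```python
def split_peak_load(peak_iops: int, write_pct: int) -> tuple[float, float]:
    """Split a peak IOPS figure into (read_iops, write_iops).

    write_pct is the share of I/O that is writes, as a whole percentage.
    """
    write_iops = peak_iops * write_pct / 100
    read_iops = peak_iops - write_iops
    return read_iops, write_iops


# The two ends of the VDI write range quoted above, at a 6,000 IOPS peak.
for pct in (50, 70):
    reads, writes = split_peak_load(6000, pct)
    print(f"{pct}% writes: {reads:.0f} read IOPS, {writes:.0f} write IOPS")
```

At the 70% end, that is 4,200 small block writes per second the array has to absorb, which is the part of the load that hurts most.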

A typical SAN storage array can deliver a massive amount of IOPS, well in excess of 6,000 IOPS on a single tray of storage (and some will claim 10 times that) ... when the I/O patterns are well optimized for the array.  For example, the array can write an amazing amount of data if the write block size is well aligned with the RAID stripe size.  And when reading from cache, the performance can be simply amazing.  That's one of the great things about a SAN - it can leverage the large cache that sits in front of storage to significantly improve the performance of the native disk drives.  But there are some data access patterns (such as high volume non-sequential small block writes) that can degrade the performance of the SAN significantly - in a worst case scenario, performance can rapidly degrade to nearly that of the native disk drives themselves.

To try to make sense of storage, let's run a little bit of basic I/O math.  First, let's look at some disk performance numbers (source: wikipedia.org):

  • Desktop/laptop disk: ~90 IOPS (7,200 RPM)
  • Basic SAN storage: ~130 IOPS (10K RPM)
  • High performance SAN storage: ~180 IOPS (15K RPM)

As you can see, a high performance SAN drive can outperform a desktop disk 2:1.  When you then add multiple drives (via RAID), the read performance can get much better.  For example, a 16 drive array (14 drives after RAID 5 w/spare or RAID 6 w/o spare) can potentially deliver read traffic up to 14 times faster than a single disk.  With the disk itself being 2 times faster, that's 28 times!  That's a big performance boost, and before taking cache into account.
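The read arithmetic can be sketched in a few lines.  This is a best-case illustration only: it assumes reads spread perfectly across the data drives, and the dictionary keys and function name are hypothetical labels of mine, not anything from a vendor tool.

```python
# Approximate per-drive IOPS from the list above.
DRIVE_IOPS = {"desktop_7200": 90, "san_10k": 130, "san_15k": 180}


def read_speedup(data_drives: int, drive_iops: int, baseline_iops: int) -> float:
    """Best-case read speedup of an array over one baseline disk.

    Assumes reads are spread evenly over all data drives and ignores cache.
    """
    return data_drives * drive_iops / baseline_iops


# 14 data drives of 15K RPM storage versus a single 7,200 RPM desktop disk.
speedup = read_speedup(14, DRIVE_IOPS["san_15k"], DRIVE_IOPS["desktop_7200"])
print(speedup)  # 28.0
```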

But writes are a different story.  RAID 5 has a substantial performance impact on writes ... up to 10 times slower than RAID 0 (source: wikipedia.org).  And RAID 6 can be even worse.  To make matters worse still, small block writes (smaller than the RAID stripe size) force the SAN to do a "pre-read" - where it first reads the stripe into cache, then updates it with the small block change, and writes it back again.  But once again, the cache kicks in to save the day.  Writes are stored in the RAM cache and then committed as the drives have time.  The result is a perceived high performance write.
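One common way to put a number on the write problem is a "write penalty" rule of thumb: each small host write costs several back-end disk operations (for RAID 5, typically read data, read parity, write data, write parity).  The sketch below uses the widely quoted penalties of 4 for RAID 5 and 6 for RAID 6.  Those values, the function name, and the formula are assumptions I am bringing in for illustration; they are not measured figures and they ignore cache entirely.

```python
# Rule-of-thumb back-end operations per small host write (assumed values).
WRITE_PENALTY = {"raid0": 1, "raid5": 4, "raid6": 6}


def uncached_host_iops(raw_iops: float, write_fraction: float, raid: str) -> float:
    """Host-visible IOPS the disks can sustain with no help from cache.

    raw_iops:       combined IOPS of the drives in the set
    write_fraction: share of host I/O that is writes (0.0 to 1.0)
    """
    penalty = WRITE_PENALTY[raid]
    # Each read costs one back-end operation, each write costs `penalty`.
    backend_ops_per_host_io = (1 - write_fraction) + write_fraction * penalty
    return raw_iops / backend_ops_per_host_io


raw = 14 * 180  # 14 drives at roughly 180 IOPS each
for level in ("raid0", "raid5", "raid6"):
    print(level, round(uncached_host_iops(raw, 0.6, level)))
```

With a 60% write mix, the same set of drives delivers far fewer host IOPS under RAID 5 or RAID 6 than the raw drive count suggests, which is exactly the gap the cache is being asked to cover.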

As long as the SAN is not overloaded, the cache can really soften the blow.  But what happens when you start to push the limits of the performance?  The cache becomes saturated, writes get stalled until the cache can be flushed, read data has a short life in cache ... and performance tanks.

So how does VDI impact a SAN?  Remember that VDI can have a 50%-70% write I/O pattern, with mostly small blocks.  That makes the SAN work really hard.  But when you have an I/O storm, such as the 9 AM login window, the cache gets pushed to its limits.  If the SAN array is not sized to handle this load, the performance will rapidly go from good to bad.  The latency (time to deliver a disk block) will increase, making every desktop feel sluggish (or worse).
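Here is a toy model of what a storm does to a write cache.  It is deliberately crude: one-second ticks, a cache measured in number of writes, and disks that commit a fixed number of writes per second.  Every number in it is made up for illustration, and real arrays behave in far more complicated ways.

```python
def simulate_write_cache(arrivals, drain_per_sec, cache_capacity):
    """Return a list of (cached, stalled) write counts, one pair per second.

    arrivals:       host writes arriving in each second
    drain_per_sec:  writes the disks can commit from cache per second
    cache_capacity: maximum writes the cache can hold
    """
    cached = 0   # writes sitting in cache, waiting to be committed
    stalled = 0  # writes held back at the host because the cache is full
    history = []
    for arriving in arrivals:
        # The disks commit as much cached data as they can this second.
        cached -= min(cached, drain_per_sec)
        # New writes, plus any stalled earlier, try to enter the cache.
        waiting = stalled + arriving
        accepted = min(waiting, cache_capacity - cached)
        cached += accepted
        stalled = waiting - accepted
        history.append((cached, stalled))
    return history


# Quiet period, then a five second "login storm", then quiet again.
load = [1000] * 3 + [4000] * 5 + [1000] * 3
for second, (cached, stalled) in enumerate(simulate_write_cache(load, 2000, 5000)):
    print(f"t={second:2d}s  cached={cached:5d}  stalled={stalled:5d}")
```

In this run the cache absorbs the first second of the storm without trouble, fills on the second, and from then on stalled writes pile up faster than they drain.  They are still queued well after the storm has passed, which is why desktops stay sluggish after the login rush is over.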

The trick therefore is to correctly size your SAN.  It's important to understand your IOPS needs, per community of users, and then to size your RAID sets to be supportive of the I/O load during a reasonable storm. 

Which leads to the question ... how much I/O can a SAN deliver?  As mentioned earlier, it depends.  The VDI use case is pretty harsh on a SAN, so you should generally assume fewer IOPS than any vendor data sheet or performance test claims.  I've read vendor case studies that claim massive loads on a single RAID set, ten times what we have seen viable in production, because their assumptions are based on IOPS rates per desktop that are below what an idle desktop generates!  So don't base your assumptions on a paper that quotes number of VDI users per array.  Base it instead on your IOPS needs, and work with the vendors to optimize your storage requirements - making sure to take into account the I/O patterns of VDI (high write, small block).

To help get a ball-park estimate on the I/O capabilities of a basic SAN array, here is some very simplistic math (that doesn't take into account anything unique or special about each vendor's solutions).  Take the number of drives after RAID and spares (so for a 16 drive array with RAID 5 and one spare, it is 14 drives) and multiply that by the approximate IOPS per device (for example, a 15K RPM SAS drive is about 200 IOPS, so 14 devices * 200 IOPS = 2,800 IOPS).  Now apply a cache performance improvement factor (for read and write) - which we find varies based on load to somewhere between 20% and 70% (the more overloaded the array, the worse the cache factor).  So using 50%, it would be 2,800 * 150% = 4,200 IOPS.
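The same ball-park math as a small function, so you can plug in your own drive counts and cache factor.  The function and parameter names are mine; the formula is just the simplistic one described above and carries all the same caveats.

```python
def ballpark_array_iops(total_drives: int, parity_and_spares: int,
                        iops_per_drive: int, cache_pct: int) -> float:
    """Rough IOPS estimate for one RAID set.

    total_drives:      drives in the tray
    parity_and_spares: drives lost to RAID parity and hot spares
    iops_per_drive:    approximate IOPS of one drive
    cache_pct:         cache improvement as a whole percentage (20 to 70)
    """
    usable_drives = total_drives - parity_and_spares
    raw_iops = usable_drives * iops_per_drive
    return raw_iops * (100 + cache_pct) / 100


# 16 drives, RAID 5 plus one spare, 15K RPM SAS at about 200 IOPS each.
for pct in (20, 50, 70):
    print(f"cache factor {pct}%: {ballpark_array_iops(16, 2, 200, pct):.0f} IOPS")
```

The spread between the 20% and 70% cases (3,360 versus 4,760 IOPS) shows how much the answer depends on how hard the array is being pushed.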

But don't push that limit too hard!  Design your infrastructure with at least 25% headroom (sized based on a common high I/O load event like a login storm) to avoid the SAN cache cliff.  In this case that means planning for about 3,150 IOPS per RAID set rather than the full 4,200.  So for our example of 6,000 IOPS, two 16 drive RAID sets would likely be a good fit.
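Turning the headroom rule into a RAID set count is one more short step.  This sketch takes the 4,200 IOPS per-set estimate from the example above as its input; the function name and the whole-percentage headroom parameter are illustrative choices of mine.

```python
import math


def raid_sets_needed(peak_iops: int, per_set_iops: float,
                     headroom_pct: int = 25) -> tuple[int, float]:
    """Return (number_of_sets, usable_iops_per_set) for a peak load.

    headroom_pct is the share of each set's estimated IOPS kept in reserve.
    """
    usable_per_set = per_set_iops * (100 - headroom_pct) / 100
    sets = math.ceil(peak_iops / usable_per_set)
    return sets, usable_per_set


sets, usable = raid_sets_needed(6000, 4200)
print(f"{sets} RAID sets at {usable:.0f} usable IOPS each")
# prints: 2 RAID sets at 3150 usable IOPS each
```

Note that without the headroom a 6,000 IOPS peak would still need two sets (6,000 is more than 4,200), but the second set would be running much closer to the cliff.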

Now this does not take into account the next generation of storage solutions based on SSD (solid-state drives, or memory-based storage, rather than rotating disks with flying heads).  SSD has amazing IOPS - 10,000+ is not uncommon per device, and they don't have the cost of waiting for the disk to rotate into position (much lower latency).  However, pure SSD is very expensive today, making it less attractive for VDI.  Filling the gap is a new breed of storage arrays, sometimes called hybrids, that blend the high performance of SSD with the low cost per TB of rotating disk.  Hybrid arrays offer a great price-performance ratio, but not all I/O patterns work well with them (including not all VDI approaches).  I'll be writing some more on hybrid shortly, but the short version is that the SSD tier in a hybrid is smaller (and therefore requires some data reduction technique to be viable for VDI), and is very sensitive to the data write patterns (if data is copied or duplicated a lot, the SSD will be unable to optimize the blocks, resulting in poor performance).  Also check out Ron's article on hybrid storage arrays.

In summary, the key to success with VDI storage is to make sure you understand your IOPS needs - especially during a major I/O event like the login storm - and then ensure your array is able to support that load without falling off the SAN cache cliff.  Consider optimizing your storage architecture for these loads, such as using more RAID sets with smaller disk drives, rather than larger drives and fewer sets.  And make sure you look at hybrid and SSD solutions - I'm convinced they are going to be a big part of the VDI future.

Tagged cache, iops, san, sas, sata
