Using SSD in VDI, how are you handling the write IO?
One of the more common discussions I have with customers is around the topic of VDI disk I/O and how to design to handle the I/O load from their desktops. The funny thing about this topic is that we’re not even a storage vendor or an I/O optimization company. I think it's just that the information is so confusing and customers are so new to the topic that they're looking for any port in a storm and someone they can trust that's not selling them a solution. What's interesting about these discussions is that some companies will fail to inform the customer about the impact of write I/O on the disk.
The thing to understand when it comes to I/O hitting the disk is that not all disk I/O is created equal. Write I/O impacts the disk disproportionately. Depending on the RAID configuration of your disk and the type of storage system and software involved, a single write I/O to a RAID 5 volume can wind up as 3 to 4 I/O operations against the physical disks underneath the logical volume (typically reading the old data and parity, then writing the new data and parity). This is lovingly known as the RAID write penalty.
Different RAID levels and configurations carry different write penalties. Throw this penalty on top of VDI's workload mix (VDI often has a much higher write:read ratio than servers) and you can wind up with read/write ratios of 50/50 or worse hitting the disk. Add the fact that the average desktop can require more I/O than a standard Windows server, and you have an environment where typical rotating disks often will not keep up with the I/O requests.
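To put rough numbers on that, here is a minimal sizing sketch. The function names, the 10 IOPS per desktop figure, and the RAID 5 penalty of 4 are assumptions for illustration only; the 175 IOPS per disk is simply the midpoint of the 150 to 200 range quoted for a 15K disk. Measure your own desktops before trusting any of it.

```python
import math


def backend_iops(frontend_iops, write_fraction, write_penalty):
    """Physical disk I/O generated by a given front-end (VM-side) load.

    Reads pass through one-for-one; each write is multiplied by the
    RAID write penalty (assumed value, depends on RAID level and array).
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    writes = frontend_iops * write_fraction
    reads = frontend_iops - writes
    return reads + writes * write_penalty


def spindles_needed(frontend_iops, write_fraction, write_penalty, iops_per_disk):
    """Rotating disks required to absorb the resulting back-end load."""
    load = backend_iops(frontend_iops, write_fraction, write_penalty)
    return math.ceil(load / iops_per_disk)


# Hypothetical environment: 1,000 desktops at 10 IOPS each (assumed figure).
load = 1000 * 10

# Server-like mix (20% writes) versus a 50/50 VDI mix, both on RAID 5.
print(spindles_needed(load, 0.2, 4, 175))
print(spindles_needed(load, 0.5, 4, 175))
```

With the same front-end load, moving from a 20% write mix to a 50/50 mix raises the back-end load from 16,000 to 25,000 physical I/O operations in this model, which is why the write ratio matters more than the headline IOPS number.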
Enter SSD, stage right. The benefit of SSD is that a single SSD can handle thousands of I/O operations per second, where you may only see 150 to 200 from a 15K RPM rotating disk. In a straight change from all rotating disk to all SSD (such as a Whiptail SSD array), how many I/O operations the array can handle and how it handles writes is less of a concern, because all the reads and writes are served by SSD. Instead, the real confusion or problem comes from a mixed environment that uses SSD for some caching and rotating disk for most data storage.
As an example, let's use a model that puts local SSD storage in the servers and rotating disk in a centralized array. In this model, the local SSD often lets the customer cache the shared gold image locally on that server and serve all the reads from that gold image via high-I/O, high-speed SSD. In this configuration, the delta disk where writes land for each VM, along with other types of data (user-installed apps, user data disks, etc.), often still goes across a Fibre Channel or Ethernet network to rotating disk on the back end. In an environment where you may have just as many writes as reads, and that number may be measured in the thousands per server, your rotating disk may become the bottleneck for the hundreds or thousands of writes being thrown at it that are not being handled by the SSD. This is something often ignored or unmentioned by those who sell this type of caching model.
Another model is what we call hybrid arrays. These arrays mix rotating disk and SSD in the same chassis or behind the same controllers. Software in the array moves hot data from slow rotating disk to the faster SSD. It also tracks data usage in the SSD, and when segments on the rotating disk start receiving more I/O requests than segments in the SSD, the data gets swapped so that hot data stays in SSD and cold data moves to slower disk. The problem here is that in order to move data into the SSD, there has to be a history of I/O requests against the data. That history reflects only reads against specific disk segments, so this mechanism still does not handle the writes. So, the key question you need to ask of any vendor that has a hybrid array using SSD or EFD or some caching mechanism is:
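The local read cache model above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any vendor's product: the function name, the 2,000 IOPS per server, and the RAID 5 penalty of 4 are all hypothetical.

```python
def array_load(server_iops, write_fraction, read_hit_rate, write_penalty):
    """Load one server places on the central rotating-disk array.

    Reads that hit the local SSD cache never leave the server; every
    write still crosses the network and pays the RAID write penalty
    (assumed value). Returns (uncached_reads, writes, physical_iops).
    """
    writes = server_iops * write_fraction
    reads = server_iops - writes
    uncached_reads = reads * (1.0 - read_hit_rate)
    physical_iops = uncached_reads + writes * write_penalty
    return uncached_reads, writes, physical_iops


# Hypothetical server: 2,000 IOPS, 50/50 read/write, RAID 5 behind it.
no_cache = array_load(2000, 0.5, 0.0, 4)
perfect_read_cache = array_load(2000, 0.5, 1.0, 4)

print(no_cache[2])            # physical I/O on the array with no cache
print(perfect_read_cache[2])  # physical I/O even if every read hits SSD
```

In this model, even a read cache that never misses only takes the array from 5,000 to 4,000 physical I/O operations per server, because the writes and their penalty are untouched.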
“How are you handling the write I/O?”
The answer you want to hear is that they are carving out a portion of the SSD for write operations. The idea is that as writes come into the array, chassis, or controllers, they are first written to the SSD so the VMs see no backup or bottleneck, and then those writes are flushed to the rotating disk as necessary. You would be surprised at the number of vendors that don't do this, or that require a software upsell just to enable it.
As you can see, buying SSD can be a confusing topic, which is why we spend a lot of time speaking to customers about their different storage options. So when looking at SSD options and thinking about architectures that can use this high-speed SSD to handle the I/O, you have to think about not only the read I/O operations but also the write operations. Any caching mechanism that only handles your reads will probably wind up with a heavy bottleneck on writes, and you will feel it as you scale up your VDI environment.
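A rough way to picture what that SSD write region buys you is a second-by-second model of a write burst. Everything here is an assumption for illustration: the function name, the burst shape, the drain rate of the rotating disk, and the buffer sizes. Writes that overflow the buffer are counted as stalled and dropped from the model rather than retried, which a real array would not do.

```python
def simulate_write_buffer(incoming_per_sec, drain_per_sec, buffer_capacity):
    """Model an SSD write buffer in front of rotating disk.

    incoming_per_sec: list of write counts arriving each second.
    drain_per_sec:    writes the rotating disk can absorb each second.
    buffer_capacity:  writes the SSD region can hold while waiting.
    Returns (peak_backlog, stalled_writes).
    """
    backlog = 0
    peak = 0
    stalled = 0
    for arriving in incoming_per_sec:
        backlog += arriving
        # Flush as much as the rotating disk can take this second.
        backlog -= min(backlog, drain_per_sec)
        # Anything beyond the buffer would have to wait on the slow disk.
        if backlog > buffer_capacity:
            stalled += backlog - buffer_capacity
            backlog = buffer_capacity
        peak = max(peak, backlog)
    return peak, stalled


# Hypothetical login storm: 5 seconds of 3,000 writes/s, then 10 quiet seconds.
burst = [3000] * 5 + [500] * 10

print(simulate_write_buffer(burst, 1000, 20000))  # generous SSD write region
print(simulate_write_buffer(burst, 1000, 5000))   # undersized SSD write region
print(simulate_write_buffer(burst, 1000, 0))      # no write buffering at all
```

With a large enough buffer the burst is absorbed and flushed during the quiet period; with no buffer, every write above the disk's drain rate stalls, which is the bottleneck a read-only cache leaves in place.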
Ron’s Politically Incorrect VDI Blog

Too many organizations are out there trying to implement VDI and failing. Whether you like what Ron has to say or not, he is here to say what others won’t about VDI and help you get it right in your environment. Get Ron’s advice… raw…unfiltered… without the sugar coating.
Comments
Ron,
Very good analysis. We make hybrid arrays but have taken a different approach than what you suggest for handling random writes. Instead of using SSDs as a cache, we designed a data layout that serializes random writes, so we always write to disk in full stripes. This allows just a dozen 7.2K RPM spindles to support thousands of random writes, and allows us to use our SSDs even more efficiently to accelerate reads. We think VDI is a key reason why there will be a growing demand for write-optimized storage over time. Our CTO, Umesh Maheshwari, offers more details here:
http://www.nimblestorage.com/blog/the-growing-need-for-write-optimized-storage/
Very nice article, Ron. At Starboard Storage Systems we have the exact architecture that you have discussed. In fact, we go a step further by having dedicated SSDs for both read and write, so you do not share those resources. This enables us to offer the data protection you need on writes while giving efficient expansion without a RAID utilization penalty for reads. As you stated, having dedicated SSD read and write-back cache resources is ideal for VDI. You can read more in my blog here: http://blog.starboardstorage.com/blog/bid/225642/Using-SSD-for-VDI
A great blog post, however I would add that a storage system's ability to deliver a large number of IOPS while satisfying each I/O request with "minimal latency" is very often overlooked.
While hybrid storage systems that mix flash and spinning disks may serve 90% of data access at high IOPS and relatively low latency, your application's apparent performance may be driven as much by the latency of the 10% of transactions whose data isn't entirely in flash as by the much lower latency of the 90% that are.
An interesting method to increase the IOPS and reduce the latency is to look at PCIe NAND solutions within the VDI server. For example, these can provide mixed workloads with 15-microsecond access latency, 3GB/s bandwidth, over 700,000 read IOPS and over 1,100,000 write IOPS ( http://www.fusionio.com/products/iodrive2/ )
In my mind Nutanix style solutions are well worth a look, as they provide VDI server side PCIe NAND performance, with array style management / redundancy ( www.nutanix.com )