As storage systems move away from the typical RAID/LUN model for storage assignment, IOPS for a specific application are tougher to nail down, and that can mask a misbehaving application.
With the newer storage systems, when a LUN/volume is created it is striped across all the elements that make up the storage tier. So essentially you have access to the full I/O capability of the entire storage tier, not just the subset of disks that make up a RAID group. Unfortunately, providing a higher number of IOPS is useless unless they are delivered at predictably low latency.
This leads to the question: what good are IOPS figures, and why does the storage industry talk about them all the time? Personally I think it’s a hang-up from the days of disk, when IOPS were such a limiting factor… and partly a marketing thing, because multi-million IOPS results sound impressive. I’m more concerned with what we should be asking about than what we should not, so what does matter?
Set aside IOPS as a factor for now. The whole point of a flash array is that IOPS effectively become an unlimited resource. Sure, there is always a real limit – but it’s so high that it’s no longer necessary to worry about it.
Latency is now the critical factor that should be focused on because this is what injects delay into your system. Latency means lost time; time that could have been spent busily producing results, but is instead spent waiting for I/O resources.
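To make that concrete, here is a rough sketch of why an IOPS figure on its own says nothing about latency. It leans on Little's Law (throughput = outstanding requests / latency), which holds for any steady-state queueing system. The queue depths and latencies are made-up numbers for illustration, not measurements from any real array.

```python
def iops(outstanding_ios: int, latency_ms: float) -> float:
    """Little's Law: throughput = outstanding requests / latency.

    Latency is given in milliseconds, so scale by 1000 to get I/Os per second.
    """
    return outstanding_ios * 1000.0 / latency_ms


# Hypothetical workloads: same IOPS, very different per-request wait.
fast = iops(outstanding_ios=32, latency_ms=0.5)
slow = iops(outstanding_ios=64, latency_ms=1.0)

print(f"32 outstanding I/Os at 0.5 ms -> {fast:,.0f} IOPS")
print(f"64 outstanding I/Os at 1.0 ms -> {slow:,.0f} IOPS")
```

Both cases report the same IOPS, but every request in the second one waits twice as long. An application that can't queue up more work in parallel simply runs slower, and the IOPS graph won't show it.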
Business requirements tend to be along the lines of needing to supply trading reports faster, or reduce the time spent by call center operatives waiting for their CRM screens to refresh. These almost always translate back into latency requirements. After all, the key to solving any performance issue is always to follow the time and find out where it is being spent. Have you noticed that latency is the only one of our three fundamental characteristics which is expressed solely in units of time?
I recently ran into this specific issue. A small change introduced a minor latency increase to a claims processing system, and as a result each claims representative went from processing 40 claims a day to 38. That may not seem like a lot, but multiplied across 400 reps over a week, that translated to 4,000 fewer claims processed a week, and since they get paid per claim processed it was a significant hit to profit. When looking at the system from a strictly IOPS perspective, the numbers were the same before and after the change. So yes, IOPS do matter, but don’t get distracted by them… it’s all about latency.
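For anyone who wants to check the math, here is the back-of-the-envelope version. The 40, 38, and 400 figures come from the story above. The five-day work week is my assumption, though it is the only value that makes the weekly total come out to 4,000.

```python
claims_before = 40      # claims per rep per day before the change
claims_after = 38       # claims per rep per day after the change
reps = 400              # number of claims representatives
workdays_per_week = 5   # assumption: standard five-day work week

lost_per_rep_per_day = claims_before - claims_after
lost_per_week = lost_per_rep_per_day * reps * workdays_per_week
pct_drop = 100 * lost_per_rep_per_day / claims_before

print(f"Claims lost per week: {lost_per_week}")   # 4000
print(f"Productivity drop: {pct_drop:.1f}%")      # 5.0%
```

A 5% drop per person looks small until it is multiplied across the whole floor.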
by Mike Kelly