I often get involved in sizing Violin arrays for particular requirements.
We get requirements that are written in many different ways, some make sense and others don’t.
It probably does not help that so many arrays are marketed on headline IOPS figures; "1 Million IOPS" is perhaps the most frustrating requirement I hear.
We have the three traditional metrics: IOPS, bandwidth and latency.
They are all closely related. Bandwidth is simply IOPS multiplied by IO size. IOPS are limited by how many operations your application submits in parallel and by how long your storage takes to respond, and latency is the measure of how long that response takes. FlashDBA has written about these metrics in depth here.
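The relationship between the three metrics can be sketched with a little arithmetic. This is a generic back-of-the-envelope model (Little's law: outstanding IOs = IOPS x latency), not a Violin-specific formula, and the function names and numbers are illustrative:

```python
# Rough relationship between the three traditional metrics.
# Illustrative only: names and values are assumptions, not vendor figures.

def bandwidth_mb_s(iops, io_size_kb):
    """Bandwidth is just IOPS multiplied by the IO size."""
    return iops * io_size_kb / 1024.0

def achievable_iops(queue_depth, latency_ms):
    """With a fixed number of IOs in flight, latency caps IOPS
    (Little's law: concurrency = IOPS x latency)."""
    return queue_depth / (latency_ms / 1000.0)

# An application keeping 32 IOs in flight against a 0.5 ms array:
iops = achievable_iops(32, 0.5)   # 64,000 IOPS
mb_s = bandwidth_mb_s(iops, 8)    # 500 MB/s at an 8k IO size
print(f"{iops:,.0f} IOPS, {mb_s:.0f} MB/s")
```

Notice that the same array delivers half the IOPS to an application that only keeps 16 IOs in flight: parallelism and latency together set the ceiling.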
So the most interesting number to look at is latency. But at what load should you measure it?
If you consider the graph above, the two systems being measured have similar minimum latency and both scale to good IOPS rates. However, the area in the middle is where they diverge, and that divergence can translate into a factor of ten difference in application performance.
As these differences between systems become better known, we are getting fewer requests for minimum latency numbers and maximum IOPS rates. Instead, we are increasingly asked excellent questions like: what latency will I see if I drive this many IOPS?
To answer this question sensibly for flash arrays, you unfortunately have to ask more questions. You need to know:
- What size are these IOs on average? On flash a single 16k write takes as much backend work as four 4k writes.
- Are the IOs random or sequential? Random operations work better on some systems than others.
- How much capacity will these IOs be spread over? That is the active working set size, not just the total usable capacity required.
- What is the read / write mix of the IOs? Flash can perform very differently for a 90% write workload compared to a 10% write workload.
- How many hours a day does this rate of IOPS and bandwidth need to be sustained? Background operations can kick in after several hours of operation.
- What protocol will be used to connect the server(s) to the storage? Different protocols have different limits and different latencies.
- Is de-duplication planned to be switched on?
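The first two questions above combine in a way that is easy to quantify. Taking the point that a single 16k write costs roughly as much backend work as four 4k writes, a mixed workload can be normalised into 4k-equivalent backend writes. The linear scaling assumption and the function below are my own illustrative sketch, not vendor-published ratios:

```python
# Sizing sketch: normalise a workload into 4k-equivalent backend writes,
# assuming backend effort per write scales roughly linearly with IO size.
# Illustrative assumption, not a published vendor model.

def backend_4k_write_equiv(iops, io_size_kb, write_fraction):
    """4k-equivalent backend write operations per second."""
    return iops * write_fraction * (io_size_kb / 4.0)

# 100,000 IOPS at 16k with a 50% write mix generates the same backend
# write load as 200,000 pure 4k writes:
print(backend_4k_write_equiv(100_000, 16, 0.5))  # 200000.0
```

This is why "how many IOPS?" on its own is unanswerable: the same headline number can demand wildly different amounts of backend work.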
Arrays that have de-duplication turned on by default add further complexity to the specification. If you are asking one of these vendors about IOPS levels at a certain latency, you need to add one more caveat:
How cryptographically unique is the working data set?
Why ask this? Because if a test harness constantly sends the same data content in its read and write requests, that data ends up permanently resident in DRAM cache; in fact, no IO request ever reaches flash. This gives an artificially high IOPS figure and an artificially low latency figure that will never be achieved with real data.
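The effect is easy to demonstrate. Modelling a hypothetical dedup layer as nothing more than a set of block contents (my simplification, not any vendor's implementation), a harness that rewrites the same buffer produces one unique block, while random data produces as many unique blocks as writes:

```python
import os

# A toy model of a dedup layer: the set of distinct block contents.
# Hypothetical simplification for illustration only.
def unique_blocks(buffers):
    return len({bytes(b) for b in buffers})

repeated = [b"\x00" * 4096 for _ in range(1000)]    # same content every write
random_  = [os.urandom(4096) for _ in range(1000)]  # cryptographically unique

print(unique_blocks(repeated))  # 1    -> served entirely from cache
print(unique_blocks(random_))   # 1000 -> every write must reach flash
```

A benchmark using the repeated pattern is effectively measuring the array's DRAM, not its flash.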
We are finding that customers now ask a selection of vendors questions like: how many random 8k IOPS can I achieve at less than 0.5ms response time, with a 50/50 read/write mix, spread over 10TB of usable capacity? Often we are the only vendor able to provide accurate answers to such requests.
So what is the best way to confirm a system is fit for your requirement?
Try to understand what your systems need. What performance are they currently driving, and what are they capable of driving? At Violin we have some excellent tools for analysing SQL Server or Oracle stats and producing a report on the benefit of moving to flash. Feel free to take advantage of them…
Be wary of IOMeter, FIO or VDbench type demonstrations: they have so many parameters that they can be tuned to show nearly anything, and the result is probably not relevant to you.
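To see how easily a demonstration can be steered, consider this fio job sketch. The option names are standard fio parameters; the values are deliberately chosen (by me, for illustration) to flatter an array rather than characterise it:

```ini
; Hypothetical fio job: every knob here changes what is being measured.
[sizing-demo]
ioengine=libaio
direct=1
; random vs sequential, and the read/write mix
rw=randrw
rwmixread=50
; IO size and the parallelism the "application" drives
bs=8k
iodepth=32
; active working set: small enough to sit entirely in DRAM cache
size=10g
; 60 seconds: far too short to surface garbage collection
; or other sustained background behaviour
runtime=60
time_based
```

Shrink `size` and `runtime` and almost any array looks heroic; the questions earlier in this post exist precisely to pin these parameters to your real workload.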
I would look for results from running your database or application, with your dataset size or bigger, for days or weeks on end. Some published results, such as VMmark or Oracle benchmarks, are good for this.
Better still, find a reference customer with an environment similar to yours. Ask that existing customer what happens to performance when a module fails… but that's a subject for another blog entry.