There are two hats we wear when creating an IT service offering. The first is the customer-facing marketing hat, where customer needs are defined and service offerings are advertised at various price levels in a catalog. The second is the backend product management hat, where the customer-facing service catalog is translated into various reference architectures at well-defined cost levels.
The goal when running IT as a business (aka: ITaaS) is that the fully burdened costs associated with the reference architecture are recouped over a reasonable period of time by the pricing set in the service catalog.
A few rules to live by in this world view related to enterprise storage:
- The customer (business user) only has visibility to the Service Catalog. All communications with the customer are based on the language used in the Catalog.
- The customer (business user) never sees the Reference Architecture that supports each of the Services listed in the Catalog. Never have customer communications about arrays, drive mix, RAID type, methods, processes, etc.
- If the customer is unhappy with the service they are receiving, an evaluation must be made as to whether or not they are receiving the expected Service Level Objective as defined in the Service Catalog. If not… it’s IT’s problem. If they are and are still unhappy, then a discussion must follow about moving them to a different Tier of service as defined in the Service Catalog.
- If a differentiated Tier of service does not exist in the Service Catalog, and there is a defined customer need for that class of service, then an end-to-end service definition must occur. IT must create and price a new offering in the Service Catalog. IT must also create a corresponding new definition in the Reference Architecture to explain how that service will be accomplished and at what cost.
What does this have to do with sizing a storage array?
Whether or not IT is living out the utopian vision of ITaaS, some concrete realities emerge from the ITaaS vision that show up in array sizing, namely: Subscribed vs. Consumed capacity planning, and performance planning measured in IOPS/GB.
Warning: There are lots of different considerations when it comes to array performance: Write%, Skew, Cache Hit Rate, Sequentiality, Block Size, app sensitivity to Latency vs. Throughput, advanced data protection services etc. Let’s take these for granted for now and focus on IOPS/GB, because frankly that’s how the Service Providers do it.
How much capacity do I need to purchase?
Customer facing Service Catalog capacity planning should start with the Subscribed capacity. This defines the promise IT makes to the user about how much capacity is available to store data and at what tier. This is the advertised capacity to which pricing is set.
IT internal Reference Architecture capacity planning should worry more about the Consumed capacity after any available data reduction technologies do their work (thin, de-dupe, compress, etc). This is the actual capacity (plus buffer) to be purchased to which costs are allocated.
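A minimal sketch of that relationship, assuming (purely for illustration) a 40% thin-consumption ratio and a 20% purchase buffer:

```python
# Illustrative sketch: Subscribed (catalog) capacity vs. Consumed (purchased) capacity.
# The data-reduction ratio and buffer below are assumptions, not fixed rules.

def consumed_capacity_gb(subscribed_gb: float,
                         data_reduction_ratio: float = 0.40,  # assume thin/de-dupe/compress leaves 40% on disk
                         buffer_pct: float = 0.20) -> float:   # assumed safety buffer on top of consumed
    """Capacity IT should plan to purchase for a given catalog promise."""
    consumed = subscribed_gb * data_reduction_ratio
    return consumed * (1 + buffer_pct)

print(consumed_capacity_gb(40_000))  # 40TB subscribed -> ~19.2TB purchased under these assumptions
```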
What kind of performance do I need?
Customer facing Service Catalog performance planning can be done by SLA or SLO. Do you want to establish a worst-case floor that you promise to exceed every time (SLA)? Or do you establish a best-case performance target ceiling you promise to try to deliver (SLO)? The world is moving away from SLAs toward SLOs. The IOPS/GB performance level is defined as the IOPS to the Host divided by the Subscribed GB (IOPS / GBs).
IT internal Reference Architecture performance planning should be done against IOPS to the Host divided by the Thin Consumed GB (IOPS / GBc).
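A minimal sketch of the two densities, using illustrative numbers (the 40% consumption ratio is an assumption):

```python
# Two views of IO density: catalog-facing (per Subscribed GB) vs. internal (per Consumed GB).
host_iops = 4_000
subscribed_gb = 40_000
consumed_gb = subscribed_gb * 0.40                   # assumed thin-consumption ratio

iops_per_gb_subscribed = host_iops / subscribed_gb   # 0.10 IOPS/GBs -> what the catalog promises
iops_per_gb_consumed = host_iops / consumed_gb       # 0.25 IOPS/GBc -> what the array must actually deliver
print(iops_per_gb_subscribed, iops_per_gb_consumed)
```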
An IOPS / GB sizing example that leads to poor performance
A Business customer has an existing application in a legacy storage environment. They’ve got 40TB of capacity assigned and are doing about 4,000 IOPS to the array.
That’s 4,000 / 40,000 = .1 IOPS/GBs (Host IOPS per Subscribed GigaByte).
A purchase is made for 56 x 1TB SATA drives RAID 6 6+2 providing ~40TB capacity, and 4,480 IOPS (56 * 80 IOPS) to fit the .1 IOPS/GB workload profile rather well (ignoring too many factors).
But … after Thin Provisioning, only 40% of the capacity is consumed (16TB), and the IT Finance Dept says, wow, I’ve still got another 24TB of capacity available. Let’s use it!! The problem is, they’ve already consumed essentially all of their IOPS, leaving a lot of capacity stranded and unable to be used.
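Roughly, the arithmetic behind that stranded capacity (a sketch using the numbers above; the 40% consumption figure is the assumption from earlier):

```python
# Sketch of why the .1 IOPS/GBs sizing strands capacity once Thin Provisioning kicks in.
purchased_gb = 40_000            # 56 x 1TB SATA, RAID 6 6+2, ~40TB usable
array_iops = 56 * 80             # 4,480 back-end IOPS available
host_iops = 4_000                # the workload already running
consumed_gb = 16_000             # only 40% of the 40TB is actually written

iops_headroom = array_iops - host_iops        # ~480 IOPS left to give
stranded_gb = purchased_gb - consumed_gb      # ~24TB of capacity with almost no IOPS behind it
print(iops_headroom, stranded_gb)
```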
An IOPS / GB sizing example that leads to good performance
A Business customer has an existing application in a legacy storage environment. They’ve got 40TB of capacity assigned and are doing about 4,000 IOPS to the array. Planning for Thin Provisioning, we expect only the first 40% (16TB) will actually be Consumed.
That’s 4,000 / (40,000 * 40%) = .25 IOPSh/GBc (Host IOPS per Consumed GigaByte).
A purchase is made for 42 x 600GB 10K drives RAID 5 3+1 providing ~17TB capacity, and ~5,000 IOPS (42 x 120) to fit the workload profile rather well (ignoring many factors, see “Disk IOPS” discussion further down). This also assumes the risk that Thin Provisioning or other data reduction technologies are able to do their job effectively and allow you to purchase less capacity than in a “thick” world.
Over time, the app performs well, and is more properly balanced in both the IOPS and TB dimensions.
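The same sizing, sketched the thin-aware way (drive IOPS and capacity figures come from the spindle table further down; the raw capacity here lands a bit above the ~17TB quoted because formatting overhead is ignored):

```python
# Thin-aware sizing sketch: size performance against Consumed GB, not Subscribed GB.
host_iops = 4_000
consumed_gb = 40_000 * 0.40                  # assume only 40% of the subscribed 40TB gets written
target_density = host_iops / consumed_gb     # 0.25 Host IOPS per Consumed GB

drive_iops, drive_gb = 120, 600              # 10K RPM spindle figures (see the table further down)
spindles = 42                                # RAID 5 3+1
array_iops = spindles * drive_iops           # ~5,040 IOPS
usable_gb = spindles * drive_gb * 3 / 4      # ~18.9TB raw RAID-usable, before formatting overhead

print(target_density, array_iops, usable_gb)
```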
What if your boss says you must purchase the Thick capacity?
Some pointy-haired bosses who don’t trust that Thin Provisioning will provide real benefit will say that you must purchase the “Thick” capacity amount due to various sociological / political / economic conditions within your organization.
In this case, make two assumptions:
- Your goal is to reach 80% capacity consumed over the life of your array
- Thin will cause 40% Consumption of every Subscribed GB
Use the formula Host IOPS / ( Thick GB * 40% ) to arrive at your target IOPS/GB for the entire array’s worth of capacity! This will be a much bigger number than before, and will equate to a much higher performance solution. The trick is, when Thin Provisioning (or other data reduction) kicks in, you’ll have so much headroom that someone will want to stack additional applications onto the array to consume all the capacity.
You’ll need the additional performance to accommodate this inevitable consolidation.
Use the above calculated IOPS/GB * (Thick GB * 80%) to reveal the target Host IOPS needed to achieve full efficient use of the installed capacity. It’s never a good idea to consume more than 80% of a presumably over-subscribed resource. It’s ok to target your performance to equate to 80% capacity consumption.
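As a sketch, the “buy it Thick anyway” math with the two assumptions above plugged in:

```python
# Sketch of the "must buy Thick" sizing rules described above (40% and 80% are the stated assumptions).
host_iops = 4_000
thick_gb = 40_000

target_density = host_iops / (thick_gb * 0.40)           # 0.25 IOPS/GB, assuming thin consumes only 40%
target_array_iops = target_density * (thick_gb * 0.80)   # size performance for 80% eventual consumption

print(target_density, target_array_iops)                  # 0.25, 8,000 Host IOPS over the array's life
```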
Let’s add RAID Type and Read/Write % into the mix
So far, I’ve been focused on “Host IOPS” when discussing IO Density. Now that we’ve calculated the Host IOPS and Capacity properly, how do we configure an array to achieve a certain Host IOPS Density? We need to translate Host IOPS down into “Disk IOPS.”
By Disk IOPS, I mean the amount of work the underlying Flash or HDD Spindles must do to accomplish the Host IOPS goal. I won’t have room to discuss all of the potential factors that go into a model like this, but we’ll get started down the path.
FYI, some of those factors are: block size, cache benefits, data reduction (benefit or overhead), replication, etc.
Most implementations of traditional RAID types RAID-1, RAID-5, and RAID-6 carry with them some form of write IO penalty. When a host writes to a RAID-1 mirrored device, 2 IOs are generated by the backend storage: the write to the first spindle, and the second write to its mirror partner. So we say that the write penalty for RAID-1 is 2. When a host writes to a RAID-5 protected device and the write I/O is not the full stripe width, then potentially 4 IOs must happen on the backend of the array: one read I/O of the parity portion, a second read I/O of the data stripe, parity calculations are done in memory, then a third IO is issued to write the parity portion, and the fourth and last IO is issued to write the data stripe. So we say that RAID-5 has a 4 IO write penalty. For RAID-6, there must be a read of the first parity, a read of the second parity, and a read of the data stripe, and after the parity calculations are done, three writes must occur to finish the IO. So we say that RAID-6 carries a 6 IO write penalty.
Coupled with RAID type, we must know the read/write percentage of our workload. We assume that reads do not carry any penalty to the backend spindles, so for every host read there is a single disk I/O. Obviously at this point we are not factoring in any cache benefits. We must multiply every host write by the write IO penalty previously described. So knowing that we need to support a 10,000 IO workload which is 60% writes, we can calculate how many IOPS the backend disks must be able to provide.
IOPS to Disk = %READS * Host-IOPS + %WRITES * Host-IOPS * RAID-Penalty
Where the RAID-Penalty is 2, 4, or 6 as described above.
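Here’s that translation as a small sketch, finishing the 10,000 IOPS / 60% write example set up above (RAID penalties of 2, 4, and 6 as described; cache benefit ignored):

```python
# Translate Host IOPS into back-end Disk IOPS using the RAID write penalty.
RAID_PENALTY = {"RAID-1": 2, "RAID-5": 4, "RAID-6": 6}

def disk_iops(host_iops: float, write_pct: float, raid_type: str) -> float:
    """Reads hit one spindle each; every write is multiplied by the RAID penalty."""
    reads = (1 - write_pct) * host_iops
    writes = write_pct * host_iops * RAID_PENALTY[raid_type]
    return reads + writes

print(disk_iops(10_000, 0.60, "RAID-5"))   # 4,000 + 24,000 = 28,000 IOPS to Disk
```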
Now that we know how many disk IOPS our spindles must be able to produce, how many spindles do we need? Use the following chart to reference how many IOPS per spindle technology type can be provided with no more than one outstanding queued I/O to the spindle (aka: good response times).
- 2,500 IOPS per EFD (200GB EFD = 12.5 IOPS/GB)
- 180 IOPS per 15K RPM spindle (2.5″ or 3.5″ doesn’t matter) (300GB = .6 IOPS/GB)
- 120 IOPS per 10K RPM spindle (2.5″ or 3.5″ doesn’t matter) (600GB = .2 IOPS/GB)
- 80 IOPS per 7.2K RPM spindle (typically SATA) (2TB = .04 IOPS/GB)
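A quick sketch that turns a Disk IOPS target into spindle counts using the figures above (the 28,000 Disk IOPS input is the RAID-5 example from the previous sketch):

```python
import math

# IOPS per spindle at ~1 outstanding IO, per the table above.
SPINDLE_IOPS = {"EFD": 2_500, "15K": 180, "10K": 120, "7.2K": 80}

def spindles_needed(disk_iops: float, drive_type: str) -> int:
    """Minimum spindle count to satisfy a back-end Disk IOPS requirement."""
    return math.ceil(disk_iops / SPINDLE_IOPS[drive_type])

for drive in SPINDLE_IOPS:
    print(drive, spindles_needed(28_000, drive))
```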
A Final Example
A Business customer has an existing application in a legacy storage environment. They’ve got 40TB of capacity assigned and are doing about 4,000 IOPS to the array. Planning for Thin Provisioning, we expect only the first 40% (16TB) will actually be Consumed. Our pointy haired boss is going to require that we purchase all 40TB of capacity… hmph.
That’s 4,000 / (40,000 * 40%) = .25 IOPSh/GBc (Host IOPS per Consumed GigaByte).
Now factoring in the inevitable growth that we know will happen due to all of the free space that will remain in the array, we say:
.25 IOPS/GB * (40,000 * 80%) = 8,000 Host IOPS that the array solution will need to support over its lifespan
Ignoring the complexity of multi-tiered solutions, let’s look for a single spindle type that will meet our needs. Our IOPS/GB target is .25. From the previous table, it looks like a 600GB 10K is .2 and a 300GB 15K is .6… Hmmm, what about a 600GB 15K drive? That would be:
180 IOPS / 600GB = .3 IOPS/GB. It looks like a match!
Since this is a consolidated general purpose VMware farm that we’re supporting, let’s assume the Read/Write ratio will be 40% Reads and 60% Writes (this is VERY consistent across VMware clusters, FYI). Also, since we’re using fast FC spindles, let’s see what a RAID-5 3+1 solution will look like, shall we?
Here’s our formula repeated from above:
IOPS to Disk = %READS * Host-IOPS + %WRITES * Host-IOPS * RAID-Penalty
40% * 8,000 + 60% * 8,000 * 4 = 22,400 IOPS to Disk
Wow, that’s a lot of write penalty, considering we only needed 8,000 Host IOPS. Our 600GB 15K RPM drive can do 180 IOPS per spindle, so 22,400 / 180 = ~124 spindles… let’s round up to 128, since that’s a nice power of 2, to achieve the performance needed to support our hosts.
Now, does this provide enough capacity? Well, 128 * 600GB * 3/4 (RAID Overhead) = 57,600 GB. It looks like we’ve over-shot our capacity numbers a little. Perhaps a 300GB drive would be more efficient? Maybe a 300GB 10K? Maybe we should look at a 3 tier approach with a little EFD, some FC, and most of the capacity in SATA?
And this, dear reader, is an exercise for you 😉
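If you want to work the exercise, here’s a small sketch that checks any candidate configuration against both the performance and capacity targets. The call shown reproduces the 600GB 15K RAID-5 3+1 solution above; every argument is an assumption you can swap out:

```python
def evaluate_config(host_iops, write_pct, raid_penalty, raid_usable_fraction,
                    spindle_iops, spindle_gb, spindle_count):
    """Return (Disk IOPS required, Disk IOPS available, usable GB) for a candidate config."""
    required = (1 - write_pct) * host_iops + write_pct * host_iops * raid_penalty
    available = spindle_iops * spindle_count
    usable_gb = spindle_gb * spindle_count * raid_usable_fraction
    return required, available, usable_gb

# The 600GB 15K RAID-5 3+1 configuration worked through above:
print(evaluate_config(8_000, 0.60, 4, 3 / 4, 180, 600, 128))
# -> (22,400 required, 23,040 available, 57,600 GB usable): fast enough, but over-provisioned on capacity.
```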
Take this to the extreme with all flash arrays
Always-on in-line deduplication, as provided by quality All Flash Arrays, takes this notion to the extreme. These arrays don’t allocate anything unless it’s globally unique data. A single EMC XtremIO X-brick provides 7.5TB of usable capacity with 70TB of logical de-duped capacity. This roughly 10X increase in potential IOPS/GB density requires Flash at the core and innovative workload-management algorithms to enable extreme performance against a very limited set of allocated data regions.
What about the benefits of cache?
It is a dangerous and complex game to try to figure out the benefit of cache in any enterprise storage array. Cache serves two purposes. First, cache provides a buffer to accept spikes in host-generated workload for asynchronous destage to the backend media after acknowledgment to the host. This function provides good latency to the host while leveling out the workload characteristics seen by the backend spindles. It does not necessarily reduce the amount of work the spindles must do overall.

The second function of cache is to legitimately reduce the amount of work the backend spindles must accomplish. This is done by caching host writes for subsequent reuse, and by acting as a prefetch buffer for reads that are predicted for near-future use. In my experience, prefetched reads reduce backend spindle work to about 30% of what it would normally be. Absorbing host rewrites to active cached data before it is destaged to the backend saves the spindles from having to do any of that work. The question is: how do you predict the amount of read cache hits, read cache misses, prefetched reads, writes to backend, and re-writes from any given host workload? These are array-specific implementation details that depend on your particular host workload profile, and they make it very dangerous to assume much cache benefit. Just let it come naturally and be the icing on the cake of your good-performing solution.