
Measuring the Value of Corporate Data

Increasingly, smart people are taking up the mantle of assigning economic value to the data or information assets within organizations. Steve Todd is producing a blog series on the topic.  Dave McCrory has begun looking at gravity theory applied to data (Data Gravity), with the eventual aim of aiding valuation or at least guiding investments.  Various scholarly papers have been written since the 1990s discussing methods of economic valuation.

First, who cares?  We are fully involved in the digital economy. How do we estimate the value of an information-based company?  Why isn’t the very information asset on which many digital businesses are built represented on the balance sheet?  For how much should we insure our data assets against loss?  What is a fair price to charge for access to information?  What value can we assign to information used to collateralize a loan?

And on the flip side, what is the negative value of leakage of information into the hands of bad guys?  What about the tax implications of an information economy?  What value is being traded by an information currency every time I barter my personal data for online services or discounts at the local grocery?

Economists, venture capitalists, insurers, CFOs, and shareholders all care how valuable a company’s information assets are, but how can they assign fair value?  That’s the topic of an EMC research study in partnership with University of California San Diego researcher Dr. Jim Short to explore “all things data value”. Jim is the Research Director for the Global Information Industry Center (GIIC).

To further this discussion, I’d like to comment on a paper I read several years ago on this topic titled “Measuring The Value Of Information: An Asset Valuation Approach” presented at the European Conference on Information Systems (ECIS ’99) by Daniel Moody and Peter Walsh.

In short, information is an asset in the technical economic sense, and it has measurable value.  Only the method of measurement is in question.  Moody and Walsh identify seven “laws” of information, on which I will comment further:

  1. “Information is infinitely shareable.”  I agree, information is not generally “appropriable” in the sense of exclusive possession.  Anyone can make a perfectly valid copy or share access to the original if they are within the network universe.  In this sense the value can be cumulative of all shared points of use, and is more valuable the more it is shared.  This is the primary driver of “value in use” of information.  On the other hand, what about copies of data?  Is duplicated information more valuable?  What if it is duplicated for DR purposes?  Perhaps only if it is still “owned” by the original party.  Data piracy is an example of copies of information returning no or negative value to the original owner.
  2. “Value of Information increases with use.”  Yep, absolutely, as opposed to other more tangible assets that depreciate over time.  This law also leads to one of the most important methods of valuation: measuring the frequency of access of information, and by whom (a toy sketch of such a model follows this list).
  3. “Information is Perishable.” They are suggesting that information’s value decreases over time.  I would tend to agree, but with the caveat that Big+Fast Data analytics are extracting value from old data in ways never imagined before.  There needs to be a method of predicting future value of presently unused information.  This type of future value speculation may be very hard to do.  I wonder if there are economic forces similar to holding land for long periods, in the hopes that one day gold may be discovered.  There may be categories of information holdings that are assumed to be more valuable due to past discoveries of value in similar types of datasets.
  4. “The value of information increases with accuracy.”  So I tend to think accuracy is overrated.  I mean, have you seen all the hoax articles on Facebook recently?  Fabrications and lies are valuable too.  But I get it.  Generally you want accuracy in your datasets.  This builds trust, which in turn builds value.
  5. “The value of information increases when combined with other information.”  Absolutely a gold star on this one.  This is much of the premise of Data Gravity.  Information pulls other information to it, and data sets become ever larger.  This accumulation of valuable information increases its pull of other information and applications.  Another name for this is “context.”  Information is much more valuable in context, and the buying and selling of information is a huge business today.  This law drives the “value in exchange” portion of information valuation as companies buy and sell data.
  6. “More is not necessarily better.”  Hmmm.  This law is beginning to seem a little outdated.  The authors discuss human psychology and information overload.  These days the machines are doing the analytics on our behalf, and more of this good thing does seem to be more of a good thing.
  7. “Information is not depletable.”  You don’t reduce the quantity of information as it is used.  In fact the opposite is true.  More information is created through the use of information.  This “metadata” (information about information) is often as valuable or more valuable than original content (just ask the NSA).  In fact, it is through this metadata on information usage that many aspects of value themselves are determined.
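
To make laws 2 and 3 a little more concrete, here is a toy “value in use” heuristic of my own: weight a dataset by how often it is actually accessed, and discount that weight as the data ages.  This is purely an illustrative sketch, not a method from Moody and Walsh, and the per-access value and half-life parameters are invented.

#!/usr/bin/python
# Toy "value in use" heuristic combining laws 2 and 3:
# value grows with access frequency and decays with age.
# All parameters are illustrative assumptions, not a real valuation model.

import math

def value_in_use(accesses_per_month, age_months,
                 value_per_access=0.10, half_life_months=24.0):
    # law 2: value scales with how often the data is actually used
    # law 3: value decays as the data ages (perishability)
    decay = math.exp(-math.log(2) * age_months / half_life_months)
    return accesses_per_month * value_per_access * decay

# a heavily used, fresh dataset vs. a stale archive
print "active CRM extract: $%.2f per month" % value_in_use(50000, 3)
print "stale log archive:  $%.2f per month" % value_in_use(200, 60)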

Since information is shareable, imperishable, and nondepletable, we can look at summing both the utility-based “value in use” and the market-based “value in exchange” to find the true value of a dataset.  I wonder if any of the following industries have valuation models that are similar:

  • Research libraries and their value to a community
  • Land holdings and their value for mineral rights
  • Methods of appraisal of antiques or other goods of subjective worth

It’s an interesting topic to begin a dialogue about.  Much more needs to be discussed.  What do you think?

 

EMC VMAX3 is a Data Services Platform

VMAX3 is now Generally Available and represents the next generation of market-leading platforms from EMC that have established themselves as the most reliable and highest performing data storage platforms on the planet. Let’s examine what makes VMAX3 the first Data Services Platform in the industry and why VMAX3 is an even more revolutionary step from VMAX than VMAX was from DMX. Along the way, let’s dispel some myths about the product.

At launch, VMAX3 is ready for the most mission critical workloads.  At its introduction, the original VMAX was a revolutionary platform, virtualizing the matrix interconnects between internal components and allowing for both scale-up and scale-out expansion based entirely on Virtual Provisioning.  These were enormous leaps forward in the simplicity and scalability of Symmetrix.  EMC has developed rigorous testing, manufacturing processes, and incredible architecture designs that gave VMAX and then VMAX2 the title of “Most Reliable Symmetrix” ever produced.

VMAX3 stands on the shoulders of VMAX2 and takes Reliability, Availability, and Serviceability to the next level.  Additional redundancy has been added on the back-end DA connections to the DAEs.  More upgrades and service events leave the director online.  Many serviceability design changes for CS have been added, such as rear-facing light bars (ever try to find one of these in a datacenter?) and a work tray in the cabinets, that set VMAX3 as the RAS benchmark in the industry.

A Data Services Platform must be available, but it must also take functionality to the next level.  EMC’s goal for the VMAX3 HyperMax OS is to make it the foundational component in our customers’ data centers, providing access to all of our best capabilities (FAST, SLO, FTS, Cloud, SRDF, TimeFinder, ProtectPoint, etc.).

HyperMax is our most comprehensive OS rewrite to date. It takes the massively parallel preemptive multitasking kernel of VMAX3 and opens the door to running other data services apps inside the array.  The VMAX3 Data Services Platform can be extended with additional software functionality online as it becomes available.  Most critically for our customers, code upgrades are done without taking a single component offline.  There is no failover/failback, one-at-a-time reboots, ports offline, etc. No other array can do this!

VMAX3 is designed for Always On operations.  Combined with SRDF, VPLEX + RecoverPoint, or ProtectPoint, customer operations are protected from component failure, site failure, or any other outage situation.

We talk to our customers about “always on” Platform 3 infrastructure, and now we’ve built VMAX3 into an “always on” Data Services Platform.

VMAX3 Always On

Our phased release schedule enables us to get products to market faster.  We tell our customers how Agile development has revolutionized the way apps are brought to market and how the new normal is “fast.”  Get products out fast and iterate fast, providing the most important features first, prioritized by customer demand.  It’s one thing to do this in the Platform 3 web space, because the service is designed to be always on and upgradable without degradation to the end user experience.  To do this in the hardware market, the platform must provide the same Always On availability to allow upgrades that enable the additional functionality.

In exactly this way, VMAX3 provides us a platform that is always available, and through HyperMax, additional data services can be added later online. Getting the base platform right is critical.   As previously discussed the upgradeability of VMAX3 allows us to introduce this revolutionary product now, which is crucial to maintaining market leadership position and building momentum while simultaneously creating a revolutionary new data services platform.

VMAX3 will carry customers toward Storage as a Service.  Fundamental to the value proposition of VMAX3, like VMAX Cloud Edition before it, is the idea that purchase decisions and provisioning are based on Service Level Objectives as opposed to rotational speed of the drives.  This outcome-based thinking is a wave carrying our customers toward the beachhead of Storage aaS.

Three years ago, EMC’s message for introducing ITaaS Transformation was 1) Have C-level sponsorship, 2) Pick a project and grow it, 3) Let the technology get you there.  Well, VMAX3 is getting our customers there.  Whether or not they have financial models that equate to the service levels inside VMAX3, they gain the simplicity and ability to automate processes that Service Level based provisioning provides.

The VMAX3 Data Services Platform can handle any performance Service Level Objective required by the customer.  Designed as an all-flash capable array, any member of the VMAX3 family can support all-flash configurations.  Should the customer choose a higher capacity or lower cost design, spinning disk can be used in combination to provide various SLOs of performance and capacity within the array.  Front-end and back-end CPU cores are now pooled, giving any port full line rate capability and doubling the IOPS possible on VMAX2.  PCIe Gen 3 and 6Gb SAS connections to the DAEs deliver incredible bandwidth for DSS workloads, tripling the throughput of VMAX2.  VMAX3 provides both the rich data services functionality and the performance required to process the avalanche of new data in the datacenter.

In summary, VMAX3 is an “always-on” scale-up-and-out Data Services Platform that can be extended online via HyperMax OS to provide additional advanced software functionality within the array over its lifespan, with incredibly easy to use SLO-based provisioning that meets any performance requirement of SAN- or NAS-attached hosts. This redefinition of the storage array makes VMAX3 a larger step forward from VMAX than VMAX was from DMX.

 

How I installed OpenStack at a Service Provider in 20 mins

I’ve been playing around with my skillz in Python, Puppet, OpenStack, deployment methods, etc. for a few weeks.  Here’s a small example of deploying OpenStack in a non-production environment (no hardening or customization of any kind) in 20 mins.  Maybe it’ll help you in your self-education.

For testing purposes I use a service provider called Digital Ocean. I’m sure what they do is competitive with all things EMC, so this is not an endorsement. They do, however, have VERY CHEAP costs if you’re just doing testing (on the order of < 1 penny per hour) with Linux-based services (again YMMV, I have no idea how good they are).

Now when I say 20 mins, that’s how long it takes for the procedure to run once you figure out what to do. I’ve spent a couple of days playing with the site, VMs, Python, devstack, etc. to get the procedure in place. That said, it took 20 mins of wall clock time to get OpenStack running from the devstack.org package on a single machine deployment (i.e., not production ready).

At the end of this post is the python code to talk to the Digital Ocean API and deploy the instance. Since it’s not very sophisticated code, I just edit it directly to do what I want to do as opposed to passing command line arguments (list, deploy, destroy).

So, to get a VM I called ‘devstack’ I executed my little script with settings hard-coded:

./go.py

The script returned the following output, parsed from the JSON response:

107.170.89.157 devstack 1790811

Then I did the following to install OpenStack:

$ echo "107.170.89.157 devstack" | sudo tee -a /etc/hosts
$ ssh root@devstack
# adduser stack
# echo "stack ALL=(ALL:ALL) ALL" >> /etc/sudoers
# apt-get install git
# su - stack
$ git clone https://github.com/openstack-dev/devstack.git
$ cd devstack && ./stack.sh
Output: lots of output from stack.sh...
Horizon is now available at http://107.170.89.157/
Keystone is serving at http://107.170.89.157:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: NOTFORYOUREYESTOSEE
This is your host ip: 107.170.89.157
stack.sh completed in 794 seconds.

Here is the Python script I mentioned above.  It’s not great, but maybe it’ll give you some idea of how to talk JSON to an online API and work with the results.

#!/usr/bin/python

import requests, json, pprint, time, socket, sys

################
## make the json call to the public api
################
def jsonRequest(targetUrl):
    # set the headers for how we want the response
    headers = {'content-type': 'application/json', 'accept': 'application/json'}

    # make the actual request
    r = requests.get(targetUrl, headers=headers, verify=False)

    # take the raw response text and deserialize it into a python object.
    try:
        responseObj = json.loads(r.text)
    except:
        print "Exception"
        print r.text
        responseObj = None
    #print json.dumps(responseObj, sort_keys=False, indent=2)
    return responseObj

################
## return list of instances
################
def getDroplets(apiKey, clientId):
    # set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/?client_id=%s&api_key=%s" % (clientId, apiKey)

    # make the actual request
    resultList = jsonRequest(targetUrl)
    return resultList['droplets']

################
## Deploy a new instance
################
def deployDroplet(apiKey, clientId, hostName):
    sizeId = '66'                # smallest; 62 = 2GB Ram
    imageId = '3240036'          # ubuntu 64bit
    regionId = '4'               # NY region
    sshKey1 = '153816'           # brighs ssh key
    sshKey2 = ''
    privateNetworking = 'true'   # create private network

    # set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/new?client_id=%s&api_key=%s&name=%s&size_id=%s&image_id=%s&region_id=%s&ssh_key_ids=%s,%s&private_networking=%s" % (clientId, apiKey, hostName, sizeId, imageId, regionId, sshKey1, sshKey2, privateNetworking)

    # make the actual request
    responseObj = jsonRequest(targetUrl)
    return responseObj['droplet']

################
## destroy an instance
################
def destroyDroplet(apiKey, clientId, dropletId):
    # set the target url for the query
    targetUrl = "https://api.digitalocean.com/v1/droplets/%s/destroy/?client_id=%s&api_key=%s" % (dropletId, clientId, apiKey)

    # make the actual request
    responseObj = jsonRequest(targetUrl)
    return responseObj

######################################

apiKey = "GOGETYOUROWNAPIKEY"
clientId = "GOGETYOUROWNCLIENTID"

droplets = getDroplets(apiKey, clientId)
for droplet in droplets:
    #for key in droplet:
    #    print key
    print "%s %s %s" % (droplet['ip_address'], droplet['name'], droplet['id'])
    #print "destroying droplet %s" % (droplet['name'])
    #destroyResult = destroyDroplet(apiKey, clientId, droplet['id'])
    #print destroyResult['status']

#droplet = deployDroplet(apiKey, clientId, 'devstack')
#print "%s %s" % (droplet['name'], droplet['id'])
#deployDroplet(apiKey, clientId, 'tester4')
#deployDroplet(apiKey, clientId, 'tester5')
#destroyDroplet(apiKey, clientId, '1786240')
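
As a follow-up to the note above about editing the script instead of passing command line arguments, here’s a rough sketch of how an argparse front end to the same functions might look.  It assumes the getDroplets, deployDroplet, and destroyDroplet functions defined above and is untested glue, not part of the procedure I actually ran.

#!/usr/bin/python
# Rough sketch: drive the droplet helpers above from the command line
# (list, deploy, destroy) instead of editing the script each time.
# Assumes getDroplets, deployDroplet, and destroyDroplet from above.

import argparse

def main():
    parser = argparse.ArgumentParser(description="Tiny Digital Ocean helper")
    parser.add_argument("action", choices=["list", "deploy", "destroy"])
    parser.add_argument("target", nargs="?",
                        help="droplet name (deploy) or droplet id (destroy)")
    args = parser.parse_args()

    if args.action != "list" and not args.target:
        parser.error("deploy and destroy require a target")

    apiKey = "GOGETYOUROWNAPIKEY"
    clientId = "GOGETYOUROWNCLIENTID"

    if args.action == "list":
        for droplet in getDroplets(apiKey, clientId):
            print "%s %s %s" % (droplet['ip_address'], droplet['name'], droplet['id'])
    elif args.action == "deploy":
        droplet = deployDroplet(apiKey, clientId, args.target)
        print "%s %s" % (droplet['name'], droplet['id'])
    else:
        print destroyDroplet(apiKey, clientId, args.target)

if __name__ == "__main__":
    main()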

EMC ViPR Leverages 3rd Party OpenStack Cinder Plugins

EMC ViPR 2.0 was just announced today at EMC World.  It brings a distinct level of maturity to a product that has only just been brought to market, but which is already the most advanced storage controller for heterogeneous arrays.  One of the most interesting aspects of the announcement was leveraging Cinder to give ViPR much wider support for heterogeneous arrays!

If you haven’t heard of ViPR, it’s not “storage virtualization” in the sense that you’re familiar with.  It’s a control-plane product like Virtual Center rather than a virtualization product like the hypervisor ESXi.  ViPR discovers storage, SAN, and hosts and controls the provisioning and connectivity of those assets together.  It virtualizes (in a sense) heterogeneous storage “southbound” and provides the common higher function API “northbound” to any framework product that wants an easy way to communicate with the Software Defined Storage layer.

Enough similar products and management frameworks have been created over the years for the SNIA storage standards organization to create an API standard called SMI-S.  EMC VMAX and VNX have SMI-S providers, for instance, that ViPR can use to manage provisioning on these arrays.  Other vendors do not so easily provide or so fully support the SMI-S standard, and as such ViPR can’t use a standards-based approach to managing them.  Thus the EMC ViPR team was left with creating plugins for each and every array to provide provisioning support, a daunting task indeed.

Enter the OpenStack community.  I’m sure as we were throwing our weight into the OpenStack community, some smart-gal-or-guy in our Advanced Software Division said, “Hang on a sec.  All these vendors are producing Cinder plugins for OpenStack.  Why can’t ViPR just leverage them?”  And where SMI-S couldn’t deliver, a viable open source community has created a widely adopted API for OpenStack.

Let’s make sure you understand this.  We’ve had “northbound” Cinder API support into OpenStack.  This provides OpenStack the ability to provision block storage against ViPR-as-a-virtual-storage-array.  This new announcement leverages every other vendor’s Cinder support to give ViPR the ability to communicate “southbound” into the array and manage it with ViPR-as-an-SDS-controller.  This is a VERY different animal and instantly provides ViPR a wide range of heterogeneous storage management and automation capabilities.
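
To make the northbound direction concrete, here’s a minimal sketch of provisioning a volume through the Cinder API with python-cinderclient, which is roughly what a framework sitting on top of ViPR-as-a-virtual-array would be doing.  The endpoint, credentials, and volume parameters are placeholders, and the exact client call varies by OpenStack release.

#!/usr/bin/python
# Minimal sketch: "northbound" block provisioning through the Cinder API
# using python-cinderclient. Credentials and endpoint are placeholders.

from cinderclient import client

cinder = client.Client('2',                      # Cinder API version
                       'demo',                   # username (placeholder)
                       'NOTFORYOUREYESTOSEE',    # password (placeholder)
                       'demo',                   # tenant/project (placeholder)
                       'http://keystone.example.com:5000/v2.0')

# Ask Cinder (and whatever backend sits behind it, ViPR in this case)
# for a new 10 GB volume.
vol = cinder.volumes.create(size=10, name='vipr-demo-vol')
print "created volume %s (%s)" % (vol.name, vol.id)

# List what this tenant can see.
for v in cinder.volumes.list():
    print v.id, v.name, v.status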

Added to this common industry storage support, the ViPR team has now announced plugin support (with rich data services) for Hitachi arrays, with more on the way.  Great stuff from the EMC ViPR team.

Size Does Matter

A colleague of mine, Sean Cummins, recently posted a very good summary of VMAX FAST VP best practices, which got a little derision from some of the competition.

Much of the competitive argument is from those who sell an all flash array that can store about 100TB after deduplication. That’s a single RAID type.  That’s a single tier.  That’s a single price per GB.  That’s a single IO per GB. That’s great, but what about the other 90%+ of the capacity of the VMAX, and all of the other workloads that don’t require the highest of IOPS and lowest of latencies? EMC marketing has focused on the “optimization of flash” within the VMAX, but the flip side of that point-of-view is to drive costs out by a massive introduction of SATA without suffering the performance hit.

For the last several years, and for some time to come, flash has been a means to an end. The real value of flash has been to enable the placement of as much capacity as possible onto cheap, high capacity, slow SATA. Customers can’t afford to place petabytes of data on all-flash solutions. There aren’t all-flash solutions that can accommodate that scale. The petabyte workloads have not required all-flash performance.

This too shall pass. Every major storage vendor has an all-flash product. There are even customers today (who shall not be named) that purchase all-flash VMAX! We’re starting to see how latency is becoming the name of the game as opposed to IOPS. This will drive all-flash array consumption for those smaller workloads that can fit on an all-flash solution and don’t require enterprise federated consistency protection for BC/DR (that’s a whole other topic).

The world is changing rapidly. Data growth is exploding. Companies want to store everything in the hopes that they’ll be able to mine their data for customer insights. We’ll see what storage architectures emerge as the most effective for each of the various workloads in the world.  My warning is this. Whoever is selling you “only one way” of doing things is SELLING YOU on an idea that all of your workloads are the same. They have the proverbial hammer, and they’re trying to convince you all your problems are nails. Don’t buy it.

EMC’s OpenStack strategy

Like most things at EMC, the OpenStack integration strategy is focused on customer choice. It’s an attempt to meet customers where they are, leveraging existing IT assets, specific workloads, and integration points into all of our products. Additional automation and deeper integration are provided by our software-defined storage control-plane software, ViPR. Obviously, all of the deep reporting, SRM tools, and storage data protection services will remain and be brought into the OpenStack integration.

To jump right into EMC integration with OpenStack Cinder: plug-ins exist for all of our block storage products, including Isilon iSCSI and ScaleIO. Several major distributions such as Rackspace, Red Hat, and Canonical distribute the plug-in for VNX and VMAX, and versions of the Cinder plug-in supporting Fibre Channel are available from GitHub or the EMC web site. Third parties have written the Cinder plug-ins for Isilon iSCSI and Swift support and iSCSI support for ScaleIO.

EMC’s recommended approach would be to use the Cinder plug-in for our ViPR storage control-plane product, since through that single plug-in iSCSI and Fibre Channel support are provided across all our products, as well as object storage support for Swift, S3, and our own Atmos APIs. That plug-in is available from GitHub and will be coming to the major distributions soon. The advantage of using ViPR, especially in Fibre Channel environments, is that ViPR will automate the zoning and the definition of rich service pools that provide fully automated storage tiers, replication settings, and high availability configuration.

Not only is there a single integration point with the ViPR Cinder plug-in for block storage, but with ViPR data services we bring Swift API and Amazon S3 API support. Finally, ViPR brings this automation support to all consumers of block storage, not only OpenStack but VMware and other virtualized or bare metal hosts.

Community for Sharing Storage Resource Management (SRM) Custom Reports

I was in an EMC and EMC Partner training class around SRM 3.0 (Storage Resource Management Suite) today, and came away very excited about one of the latest software assets in EMC’s portfolio (SRM home page).  SRM 3.0 is a comprehensive tool for reporting on capacity and performance of the largest storage enterprises in the world.  Using passive or standards based active discovery, SRM can provide comprehensive capacity and performance reporting and trending across your entire IT stack: VM’s, physical servers, SAN infrastructure, and storage elements.

EMC is becoming very good at providing a simple and easy out-of-the-box experience for its software products while preserving a very rich underlying functionality and flexibility that elevates EMC products above the competition. This balance is very evident in SRM 3.0, where predefined active dashboards show summary data, and allow click-through to the underlying metrics and detailed reports. (Watch a demo video here).

During the class lecture (and the first hands-on experience I had with the tool), I was able to modify one of the standard storage pool reports to show the data reduction associated with Thin Provisioning on VMAX and VNX. I discussed some of these formulas in a previous post (Some EMC VMAX Storage Reporting Formulas).

One of the best parts of SRM 3.0 is the ability to export and import Report Templates. As such, I exported the report I customized, and posted the XML here for you to download and import into your own SRM 3.0 installation.

I’d like to invite all of you to share the reports you have customized for your environment with the community, so that we can engage the power of everyone to advance the art of storage resource management.

My first python and JSON code

I’m not a developer. There I said it.

I’m a presales technologist, architect, and consultant.

But I do have a BS in Computer Science. Way back there in my hind brain, there exists the ability to lay down some LOC.

I was inspired today by a peer of mine Sean Cummins (@scummins), another Principal within EMC’s global presales organization. He posted an internal writeup of connecting to the VMAX storage array REST API to pull statistics and configuration information. Did you know there is a REST API for a Symmetrix?! Sad to say most people don’t.

He got me hankering to try something, so I plunged into Python for the first time, and as my first example project, I attached to Google’s public geocoding API to map street addresses to Lat/Lng coordinates (since I don’t have a VMAX in the basement).

So here it is. I think it’s a pretty good first project to learn a few new concepts. I’ll figure out the best way to parse a JSON package eventually. Anyone have any advice?

###################
# Example usage
###################
$ ./geocode.py
Enter your address:  2850 Premiere Parkway, Duluth, GA                 
2850 Premiere Parkway, Duluth, GA 30097, USA is at
lat: 34.002958
lng: -84.092877

###################
# First attempt at parsing Google's rest api
###################
#!/usr/bin/python

import requests          # module to make http calls
import json          # module to parse JSON data

addr_str = raw_input("Enter your address:  ")

maps_url = "https://maps.googleapis.com/maps/api/geocode/json"
is_sensor = "false"      # do you have a GPS sensor?

payload = {'address': addr_str, 'sensor': is_sensor}

r = requests.get(maps_url,params=payload)

# store the json object output
maps_output = r.json()

# create a string in a human readable format of the JSON output for debugging
#maps_output_str = json.dumps(maps_output, sort_keys=True, indent=2)
#print(maps_output_str)

# once you know the format of the JSON dump, you can create some custom
# list + dictionary parsing logic to get at the data you need to process

# store the top level dictionary
results_list = maps_output['results']
result_status = maps_output['status']

formatted_address = results_list[0]['formatted_address']
result_geo_lat = results_list[0]['geometry']['location']['lat']
result_geo_lng = results_list[0]['geometry']['location']['lng']

print("%s is at\nlat: %f\nlng: %f" % (formatted_address, result_geo_lat, result_geo_lng))

Enterprise Storage Scale Out vs Scale Up

A lot of what we hear in the cloud-washed IT conversation today is that scale out architectures are the preferred method of deployment over scale up designs. As the theory goes, scale out architectures provide more and more compute, connectivity, and storage capacity as you add nodes.

Most vendor marketing has jumped onto the scale out bandwagon, leaving implementations that also scale up indistinguishable from the rest. Properly conceived scale out architectures allow us to add additional discrete workloads to an existing system by adding nodes to the underlying infrastructure. The additional nodes accommodate the processing required to support the additional, separate workloads. This kind of share-nothing design allows us to stack a large quantity of discrete workloads together and manage the entire system as one entity, even though it can be very hard to balance workload across the nodes (they remain silo’ed).

But what about the large quantity of monolithic workloads that need scalable performance but are not able to be distributed across independent nodes? Take for example a single database instance that begins life with just a few users, and is supported well on a single node of a scale out architecture. As this instance grows, more users are added, the database grows beyond the confines of the single node. In many pure scale out implementations, adding additional nodes is ineffectual. The additional nodes do not share in the burden of the workload from the first node. They are designed merely to accommodate additional discrete workloads.

EMC scale-out-and-up products such as VMAX and Isilon provide the cost benefits of a scale out model with the performance benefits of a scale up model. Purchasing only a few nodes initially, IT gains the cost benefit of starting small, and as workloads increase, additional nodes can be brought online and data and workload are automatically redistributed across all nodes. This creates a share-everything approach that supports the ability to scale up discrete applications while also providing the ability to scale out, stacking additional applications into the array along with the first. Software-based controls provide for multi-tenancy, tiering and performance optimization, workload priorities, and quality of service. This is as opposed to scale out architectures where rigid, hardware-defined partitions separate components of the solution.

I hope this helps demystify “scale out” a little. Just because you can manage a cluster of independent nodes doesn’t mean it’s an integrated system. Unless workload tasks are shared across the entire resource pool, scale out does not confer the benefits of scale up, whereas scale-up-and-out designs provide the best of both worlds.

Some EMC VMAX Storage Reporting Formulas

What goes around comes around. Back in the day everything was done on the command line. Storage administrators had their scripts, their hands never had to leave the keyboard, they could provision storage quickly and efficiently, and life was good. Then the age of the easy to use GUI came into effect. Non-storage administrators could deploy their own capacity, and vendors espouse how much easier and more effective IT generalists could be in managing multiple petabytes of capacity with the mouse. Now we’re into the age of the API. Storage along with all other IT resource domains has its reporting API and developers are now writing code to automatically report on and provision storage in the enterprise. And many shops have realized that the command line is as effective an API for automating reports and doing rapid deployments as any REST based API could be.

So I’m getting a lot of questions again about how to do some basic reporting on a VMAX storage array from command line generated or API generated data. Customers are writing their own reporting dashboards to support multiple frameworks.

The most fundamental question is: how much of my storage array have I consumed? Assuming one is inquiring about capacity and not performance, there are two aspects to that question. The first question is from the business: “How much capacity has been allocated to my hosts?” In other words, how much capacity do my hosts perceive they have been assigned? In an EMC VMAX array this has to do with the Subscription rate. For better or for worse, we report the percentage of a pool that has been Subscribed, as opposed to an absolute value of capacity.

To calculate how much of your array has been Subscribed, sum the products of the Pool % Subscribed by Pool Total Available GB to get the total capacity Subscribed to the hosts.

symcfg list thin pools detail

EFD Pool Subs % * EFD Pool Total GB = EFD Subscribed GB
FC Pool Subs % * FC Pool Total GB = FC Subscribed GB
SATA Pool Subs % * SATA Pool Total GB = SATA Subscribed GB
Sum of the above = Array Subscribed GB

Using the example above, this would be:

0% * 4,402.4 = 0.0  EFD Subscribed GB
177% * 77,297.4 = 136,816.398 FC Subscribed GB
73% * 157,303.7 = 114,831.701 SATA Subscribed GB
251,648.099 Array Subscribed GB

Then compare that number to the sum of the Total GBs provided by each of your three pools to reveal the ratio of the subscribed array capacity versus the total available capacity.

If your Array Subscribed GB is larger than your Array Total GB, then tread carefully, since you have oversubscribed your array’s capacity!

Subscribed GB / Array Total GB = Array Subscription Rate
251,648.099 / 239,003.5 = 105.29%, which means the array is oversubscribed in this case!

To understand how much capacity has actually been physically allocated on the array after all data reduction technologies have done their job (thin provisioning, compression, de-dupe), simply sum the Used GBs from each of your pools to get an overall array consumed capacity. Compare this number to the previously mentioned subscription capacity to show the benefit of data reduction technologies in the array.

Sum of Used GBs above is ( 3855.6 + 38708.4 + 43987.6 ) = 86,551.6 GB
251,648.099 Subscribed GB / 86,551.6 Consumed GB = 2.9 : 1 Data Reduction provided by Thin Provisioning!  Not bad at all!
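
If you’d rather script this than punch it into a calculator, the whole exercise is a few lines of Python.  This is just a sketch using the example pool figures above; in a real report the percentages and capacities would be parsed from symcli or API output.

#!/usr/bin/python
# Sketch: the array subscription and thin-provisioning math from this post,
# using the example pool figures above. In practice these values would be
# parsed from symcli output or a REST API call.

# (subscribed %, total GB) per thin pool, from the example output above
pool_subscription = [
    (0,    4402.4),    # EFD
    (177, 77297.4),    # FC
    (73, 157303.7),    # SATA
]

# Used GBs per thin pool, from the same output
pool_used_gb = [3855.6, 38708.4, 43987.6]

subscribed_gb = sum(pct / 100.0 * total for pct, total in pool_subscription)
total_gb = sum(total for pct, total in pool_subscription)
consumed_gb = sum(pool_used_gb)

print "Array Subscribed GB : %.3f" % subscribed_gb
print "Array Total GB      : %.1f" % total_gb
print "Subscription Rate   : %.2f%%" % (subscribed_gb / total_gb * 100)
print "Array Consumed GB   : %.1f" % consumed_gb
print "Thin Provisioning   : %.1f : 1" % (subscribed_gb / consumed_gb)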

Likewise sum the free capacity measures from each pool together to get the available capacity remaining in the array.

There you have it. If you’re not into the GUI, just a little math provides you with the dashboard data you need to show your executives the capacity utilization of the array.

Happy hacking!