
EMC Transforms Hadoop Infrastructures

EMC is transforming Hadoop-based Big Data Analytics infrastructures from the one-off, build-it-yourself science projects of the early adopters into a fully supported, proven, scalable, and highly reliable solution for the majority of Enterprise IT shops.  EMC has married its proven Greenplum HD distribution of Apache Hadoop with EMC Isilon, the highest-performing single-filesystem scale-out NAS on the planet.  The Greenplum HD appliance removes the complexity of setting up a big data analytics infrastructure and allows businesses to focus on generating value from their unstructured data.

 

Why Hadoop?

Not all data resides in a database.  It used to be the case that computers only analyzed data about well-structured back office processes.  Business Intelligence was about sorting through transactions, demographics, and other data with a very well-defined structure.  Big Data Analytics is the next “big thing” for enterprise-scale business, because not only are we now able to do BI on a much more rapid, iterative, dare I say “real-time” basis, but we are able to conduct these Analytics not just on data describing people’s demographics, but on data describing and tracking people’s behavior.  People’s behaviors are fundamentally unstructured.  Tracking behavior (apparently) creates an unstructured mess of XML schemas, text log files, web traffic data, etc.  Hadoop (really a combination of the MapReduce framework with the Hadoop Distributed File System) provides the ability to perform analytics tasks on any data, whether relationally structured or unstructured.  Imagine being able to iteratively process all of the data you have about your products, customers, market trends, Twitter streams, security logs, purchase history, etc. and come up with a predictive view of potential actions your constituency might take.  Your constituency may be your marketing team given customers’ likely buying decisions, your product developers given product quality improvement data, your risk managers given data about potential clients, or your security team given real-time data about attacks in progress.
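To make the MapReduce half of that combination a little more concrete, here is a minimal, single-process sketch of the map, shuffle, and reduce steps applied to a few unstructured log lines.  It is an illustration only and does not use Hadoop itself: the log format, the sample lines, and every function name are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical unstructured input: raw web-server log lines (invented sample data).
LOG_LINES = [
    "10.0.0.1 GET /products/42 200",
    "10.0.0.2 GET /cart 200",
    "10.0.0.1 POST /cart 500",
    "10.0.0.3 GET /products/42 200",
]

def map_phase(line):
    """Map step: emit a (path, 1) pair for each well-formed log line."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1

def shuffle(pairs):
    """Shuffle step: group emitted values by key between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

result = run_job(LOG_LINES)
print(result)
```

In a real cluster the map and reduce steps run in parallel across many nodes against data stored in HDFS; the sketch only shows the shape of the computation.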

Do you like spending money on science projects?

The few who are willing to bet on new tech are called Early Adopters.  The Majority wait for a more guaranteed return on investment.  Early Adopters are willing to dedicate infrastructure for one-off projects, accept single points of failure and limited disaster recoverability, sacrifice solution efficiency for quicker time to market, and maintain a specialized support workforce when normal support channels don’t exist.

Why run a Hadoop appliance with EMC Isilon and EMC Greenplum HD?

According to the Enterprise Strategy Group’s White Paper, EMC’s Enterprise Hadoop Solution: Isilon Scale-out NAS and Greenplum HD (email address required), the EMC Hadoop Solution overcomes the innate issues with homegrown Hadoop projects.

  • Isilon’s OneFS operating system eliminates the single point of failure of a single NameNode within Hadoop.  The NameNode contains all of the metadata for the HDFS storage layer.  By distributing all of the metadata across every node within the Isilon cluster, every node acts as a NameNode and provides a highly available solution for mission critical workloads.
  • Isilon’s HDFS implementation streamlines data access and loading by allowing NFS, CIFS, HTTP, or FTP access to data resident on the HDFS filesystem.  Since Hadoop applications can access the data directly without the expense of copy or move operations, this saves time, cost of storage, and greatly simplifies the Analytics workflow.
  • Implementing a dedicated storage layer allows for more efficient utilization of the compute and storage resources by allowing them to expand independently.  Most Hadoop infrastructures are based on DAS inside the compute nodes, which prevents compute and storage from scaling independently.
  • Implementing the EMC Greenplum Hadoop Distribution on EMC Isilon hardware provides a configuration backed by EMC’s premier customer support capabilities.  Customers can leverage their existing knowledge and experience with EMC and Isilon, and don’t have to have specialists on staff to manage the Big Data Analytics infrastructure.
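The independent-scaling point in the list above can be illustrated with some back-of-the-envelope arithmetic.  The sketch below is hypothetical throughout: the workload figures, the per-node core and capacity numbers, and the function names are invented for illustration and do not describe any actual EMC, Isilon, or Hadoop node.

```python
import math

def das_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """With DAS, each node adds compute and storage together, so the larger need sets the count."""
    by_compute = math.ceil(cores_needed / cores_per_node)
    by_storage = math.ceil(tb_needed / tb_per_node)
    return max(by_compute, by_storage)

def separated_nodes(cores_needed, tb_needed, cores_per_node, tb_per_storage_node):
    """With a dedicated storage layer, compute and storage node counts are sized separately."""
    compute = math.ceil(cores_needed / cores_per_node)
    storage = math.ceil(tb_needed / tb_per_storage_node)
    return compute, storage

# Hypothetical workload: modest compute demand, large capacity demand.
CORES_NEEDED, TB_NEEDED = 64, 600
CORES_PER_NODE, TB_PER_NODE = 16, 24  # invented node sizes

das = das_nodes(CORES_NEEDED, TB_NEEDED, CORES_PER_NODE, TB_PER_NODE)
compute, storage = separated_nodes(CORES_NEEDED, TB_NEEDED, CORES_PER_NODE, TB_PER_NODE)
idle_cores = das * CORES_PER_NODE - CORES_NEEDED

print(f"DAS: {das} combined nodes, {idle_cores} cores beyond what the workload needs")
print(f"Separated: {compute} compute nodes and {storage} storage nodes")
```

Under these made-up numbers, the DAS design has to buy compute it does not need just to reach the required capacity, while the separated design sizes each layer to its own demand.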

Ultimately, any Hadoop implementation is just a portion of the overall Big Data Analytics requirement, but it is one that has held some mystery for traditional infrastructure customers.  Take a cue from what we’re learning from the Cloud value proposition and ask yourself whether your enterprise wants to get into the Hadoop business, or whether it wants to extract value from Big Data Analytics.  In the end, Hadoop is a tool, and now you can pick up the phone and “order one.”