Why SQL on Hadoop is one option for gaining real ROI from Big Data
Hadoop was seen as a virtual playground full of possibilities for engineers to reinvent a business's relationship with its data resources. The business was sold on knowing its customers inside out, drawing not only on the data it collects from transactions but also on rich content from social media, wearables, and the IoT. All this knowledge would pave the way for innovation and market dominance.
Opinions suggest that Hadoop has entered the “trough of disillusionment” in the Gartner Hype Cycle. The main reasons are the difficulty of measuring the overall payback of a big data project and of moving from test or prototype systems to production systems. Businesses have invested time and effort into implementing Hadoop, only to find it’s not the panacea it was claimed to be.
ROI for Hadoop is difficult to quantify. In this article, Guy Harrison talks about “rock star data scientists” who have been able to combine three essential ingredients to provide breakthroughs in big data projects: “strong statistical and data mining expertise, the ability to create software that can implement these algorithms, and the business savvy to identify the problems that these algorithms should solve.” But he also indicates that, “a truly successful data scientist requires years of experience to develop the sort of judgment and imagination required to create innovative big data solutions.” And those people are in short supply.
So where does that leave a business? With a shortage of talent, how can a business begin to get ROI from its existing big data projects?
One way to extend your big data projects beyond the domain of data science is to employ SQL on Hadoop. This opens the data up to your wider business and lets business users work with their preferred tools, such as Power BI or Tableau.
Five core supporting reasons for SQL on Hadoop
1. SQL is the language of data query, is proven to work in big data environments (eBay uses it to process 50 petabytes each day), and is used by all modern data visualization tools for accessing data
2. SQL is the preferred language of the data management community, and sits naturally with their existing tool sets
3. SQL offers immediate returns – most businesses are familiar with it, and make use of it on a daily basis
4. Fast and efficient ad hoc exploration of Hadoop data enabled by SQL is a top priority and essential for justifying long-term investment in the platform
5. Self-service analytics is increasingly seen as business-critical, and without SQL tools this will be limited, thereby limiting the range of users able to extract value from Hadoop data
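As a small illustration of the ad hoc, self-service querying described in points 4 and 5, the sketch below runs the kind of aggregate query a BI tool might generate. It rests on several assumptions: Python's built-in sqlite3 module stands in for a SQL-on-Hadoop engine, and the page_views table, its columns, and the views_by_channel helper are hypothetical names invented for this example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a SQL-on-Hadoop engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table of customer activity, invented for illustration.
conn.execute(
    "CREATE TABLE page_views (customer_id INTEGER, channel TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [(1, "web", 12), (2, "mobile", 7), (1, "mobile", 3), (3, "web", 5)],
)


def views_by_channel(connection):
    """Return (channel, total_views) rows, largest total first."""
    query = """
        SELECT channel, SUM(views) AS total_views
        FROM page_views
        GROUP BY channel
        ORDER BY total_views DESC
    """
    return connection.execute(query).fetchall()


result = views_by_channel(conn)
print(result)
```

Because the query itself is plain SQL, a similar statement could in principle be submitted to a SQL-on-Hadoop engine through whichever driver that engine provides. Only the connection would differ.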
Thinking outside of your open source Hadoop distribution
Really getting the benefit of SQL on Hadoop may require thinking outside the Hadoop stack that came with your chosen distribution. Recently developed SQL engines lack the maturity to cope with the range of SQL submitted to them, and the BI tools that interact with them cannot perform adequately even for one user, let alone hundreds or thousands. In this paper, respected analyst Mike Ferguson of Intelligent Business Strategies outlines why open source software should not be your only option.
The complexities of running SQL queries on a parallel platform while accommodating highly concurrent workloads have, for most businesses, proven an insurmountable barrier to realizing the benefits of Hadoop and big data projects.
Such complexity can be overcome with the right SQL engine, which helps make data accessible wherever it’s needed. Businesses can support the “BI everywhere” agenda and run analyses limited only by the user’s imagination, not by the available data.