What the above tells us, as far as a solution is concerned, is that SQL needs to be involved – and involved heavily.
It is, after all, the most widely used query language in the world. People are familiar with it, and it remains the language that popular data visualization tools use to ask questions of a relational DBMS.
The challenge now is to extend this into non-relational data stores: to use SQL as a common language that simplifies access to data assets located in multiple stores – without having to switch between different APIs to make it all work.
That’s right: SQL on the ultimate NoSQL platform – SQL query engines for Hadoop in big data systems that can transform your experience of BI platforms. Tools that return IT to a state of ease and familiarity when it comes to programming analytics apps and integration tasks. Tools that enable developers to make use of their SQL skills within extensive Hadoop data lakes, and overcome the platform’s weak relational capabilities. Capabilities that once and for all help Hadoop break free from the confines of the data scientists’ laboratories and enter the BI mainstream.
1. SQL is the language of data query, is proven to work in big data environments (eBay uses it to process 50 petabytes each day), and is used by all modern data visualization tools for accessing data
2. SQL is the preferred language of the data management community, and sits naturally with their existing tool sets
3. SQL offers immediate returns – most businesses are familiar with it, and make use of it on a daily basis
4. Fast and efficient ad hoc exploration of Hadoop data enabled by SQL is a top priority and essential for justifying long-term investment in the platform
5. Self-service analytics is increasingly seen as business-critical, and without SQL tools it will remain limited, restricting the range of users able to extract value from Hadoop data
The world’s largest payment card issuer has 10,000 active Tableau users analyzing data held in a nine-petabyte Hadoop cluster in near real-time.
Their aim is to identify spending patterns across a data set that covers 12 billion transactions. To do this, the company had two options:
1. Create and maintain 10,000 near real-time data extracts for individual clients – which would most likely prove operationally impossible
2. Use a query engine for Hadoop capable of handling hundreds of concurrent complex SQL queries over the entire data set – and return the results in near real-time.
Unsurprisingly, they went with option 2: Kognitio.
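The kind of ad hoc "spending patterns" aggregation described above can be sketched in a few lines of SQL. As a minimal, illustrative example – using an in-memory SQLite table as a stand-in for a Hadoop-backed store, with hypothetical table and column names (`transactions`, `merchant_category`, `amount`) – a SQL-on-Hadoop engine would accept the same style of query over the full data set:

```python
import sqlite3

# Stand-in for a Hadoop-backed transactions table; a SQL query engine
# for Hadoop exposes the same SQL interface over data held in the cluster.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        card_id INTEGER,
        merchant_category TEXT,
        amount REAL
    )
""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, "grocery", 42.50),
        (1, "travel", 310.00),
        (2, "grocery", 17.25),
        (2, "grocery", 58.10),
    ],
)

# Ad hoc exploration: spending pattern by merchant category.
rows = conn.execute("""
    SELECT merchant_category,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_spend
    FROM transactions
    GROUP BY merchant_category
    ORDER BY total_spend DESC
""").fetchall()

for category, count, total in rows:
    print(f"{category}: {count} transactions, {total:.2f} total")
```

The point of the approach is that the query itself stays plain SQL – the same statement a Tableau user or BI tool would issue – while the engine handles distributing it across the cluster and running many such queries concurrently.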