The Era of Big Data requires an ‘Information Anywhere’ approach to business analytics


Gartner Research refers to the concept as the Logical Data Warehouse, Enterprise Management Associates calls it the “Hybrid Data Ecosystem”, and Radiant Advisors refers to the idea as a “Modern Data Platform”.  Whatever you call it, the Kognitio Analytical Platform is designed to meet a core requirement of today’s business information systems: data should be stored, or “persisted”, in the most efficient data storage environment and be available for processing or analysis anywhere.

Many organizations will slowly move their primary data storage environment to Hadoop.  Others may continue to grow their data warehouse environments, or dynamically orchestrate between the conventional databases that store data and other storage paradigms (key-value stores, NoSQL databases, etc.).  The key is that no one system will have all the data, all the time.  Kognitio is designed to work in a tiered environment, via innovative software design that recognizes data can be persisted anywhere and then “pinned” into memory for fast analytical processing.

This disaggregation of the storage and processing layers delivers a Modern Data Platform that recognizes that business users have tools, built on industry-standard SQL, that are ingrained in their processes and practices. Someone once said, “Changing business users’ tools is harder than changing their religion.”  Instead, the tools and their SQL should be accommodated by a flexible platform for analytical processing with very low-latency response times, high throughput and high concurrency.

Two key components separate an analytic database from a full-scale analytical platform:

  1.  Tight Hadoop Integration

Rather than offering a simple connection to Hadoop clusters, Kognitio has implemented an advanced, integrated method of interacting directly with the Hadoop Distributed File System (HDFS).  This skips over Hive and its batch-processing latency to interact directly with the data.

With this method, Kognitio can pass SQL queries, reduced to machine code and wrapped in Map/Reduce jobs, directly into the Hadoop cluster.  This enables the sophisticated filtering and projection algorithms developed by Kognitio to operate within the Hadoop cluster.  The end result is that the Hadoop MPP cluster is efficiently utilized for this pre-processing and delivers only the specifically required data (particular rows and columns, for example) to Kognitio.

This process assumes that data is stored in a structured or semi-structured format within the Hadoop cluster; that is, either Extract, Load and Transform (“ELT”, a new twist on “ETL”) processes are performed within the cluster, or the data is stored in that fashion to begin with.
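The idea of filtering and projecting inside the cluster can be illustrated with a minimal Python sketch. This is not Kognitio's implementation (which, per the description above, uses machine code wrapped in Map/Reduce jobs); the function name scan_with_pushdown and the sample records are hypothetical and exist only to show which rows and columns would leave the storage layer.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def scan_with_pushdown(
    rows: Iterable[Row],
    predicate: Callable[[Row], bool],
    columns: List[str],
) -> Iterator[Row]:
    """Hypothetical storage-side scan: keep matching rows, emit only requested columns."""
    for row in rows:
        if predicate(row):  # filtering: discard rows the query does not need
            # projection: discard columns the query does not need
            yield {name: row[name] for name in columns}


# Hypothetical records standing in for structured data stored in the cluster.
stored_rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0, "notes": "gift"},
    {"order_id": 2, "region": "US", "amount": 80.0, "notes": ""},
    {"order_id": 3, "region": "EU", "amount": 45.5, "notes": "rush"},
]

# Rough equivalent of: SELECT order_id, amount FROM orders WHERE region = 'EU'
result = list(
    scan_with_pushdown(
        stored_rows,
        lambda r: r["region"] == "EU",
        ["order_id", "amount"],
    )
)
print(result)
```

Only two of the three rows, and two of the four columns, are handed back to the caller, which is the point of doing this work where the data lives.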

NOTE: the way this works is actually more elegant.  Since it may sometimes be faster to pull an entire file if it holds fewer than, say, 100 million rows, an entire file can be pulled, OR the above-referenced Map/Reduce agent can run to filter millions of records out of billions.  This also suggests that the data would have been structured IN Hadoop, using one of the many methods that enable ETL or ELT therein.
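The choice described in the note can be sketched as a simple size-based rule. The 100 million row figure comes from the note's own "say" and is only illustrative; the function name, the strategy labels, and the idea that a row estimate is available up front are all assumptions made for this sketch, not a description of how Kognitio actually decides.

```python
# Illustrative threshold from the note ("say 100 million rows"); an assumption,
# not a documented product setting.
FULL_PULL_ROW_LIMIT = 100_000_000


def choose_access_strategy(
    estimated_rows: int, row_limit: int = FULL_PULL_ROW_LIMIT
) -> str:
    """Hypothetical rule: pull small files whole, filter large ones in the cluster."""
    if estimated_rows < row_limit:
        return "full_pull"      # cheaper to move the whole file than to launch a job
    return "filter_agent"       # run the in-cluster filter and move only the matches


print(choose_access_strategy(5_000_000))       # a small file
print(choose_access_strategy(3_000_000_000))   # billions of rows
```

A real system would likely weigh more than row counts (file size, selectivity of the filter, cluster load), so treat the single threshold as a simplification.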

  2.  External Tables

An important point that “SQL on Hadoop” tools often forget is that the most important data resides in databases and data warehouses around the enterprise.  Those systems hold the keys to CRM, HR, Finance, Sales and other important information relevant to business analytics.  Kognitio has addressed this via a feature called External Tables.

An external table is, literally, a table that is external to Kognitio but can be “addressed” by the Kognitio Analytical Platform.  In this way, data stored in an RDBMS, in a cloud storage subsystem (such as Amazon S3), or in the optional free persistent disk storage from Kognitio (also an RDBMS) can be brought into memory for processing on demand.
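As a conceptual illustration only, the pattern of addressing a table that lives elsewhere and bringing it into memory when it is needed might look like the Python sketch below. The class ExternalTableCatalog, its methods, and the load_customers stand-in are hypothetical names invented for this example; they are not Kognitio's API or SQL syntax.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Loader = Callable[[], List[Row]]


class ExternalTableCatalog:
    """Hypothetical catalog mapping table names to loaders for external stores."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Loader] = {}
        self._memory: Dict[str, List[Row]] = {}

    def register(self, name: str, loader: Loader) -> None:
        """Make an external source addressable by name; no data is read yet."""
        self._loaders[name] = loader

    def pin(self, name: str) -> List[Row]:
        """Load the table into memory on first use, then reuse the in-memory copy."""
        if name not in self._memory:
            self._memory[name] = self._loaders[name]()
        return self._memory[name]

    def is_pinned(self, name: str) -> bool:
        return name in self._memory


def load_customers() -> List[Row]:
    # Stand-in for reading from an RDBMS, an object store, or another persistent store.
    return [
        {"customer_id": 7, "name": "Acme"},
        {"customer_id": 9, "name": "Globex"},
    ]


catalog = ExternalTableCatalog()
catalog.register("crm_customers", load_customers)
print(catalog.is_pinned("crm_customers"))   # registration alone loads nothing
customers = catalog.pin("crm_customers")    # first access brings the data into memory
print(len(customers), catalog.is_pinned("crm_customers"))
```

The design choice worth noting is the separation between registering a source (cheap, metadata only) and pinning it (the point at which data actually moves into memory).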

That means that the entire data ecosystem, by leveraging the Kognitio Analytical Platform, can understand data that sits anywhere and make it available for access by bringing it into memory for fast processing and delivery to any interface.

An open toolset is available for users to develop external table interfaces for any persistent data store, and Kognitio has provided the toolsets for its own persistent data storage, Amazon S3 Cloud Storage, and Hadoop.  New high-speed external table interfaces are in development, some jointly with clients and partners.

In a big data environment, it is of course important that all of this happens at high speed in a massively parallel and multi-threaded environment.  This is where it helps to be working with technology that is, at its core, true MPP and in-memory!

The analytical platform is a key enabling component that is required to modernize enterprise data environments.  It enables organizations to free themselves from the tyranny of vendor lock-in and the “price per terabyte” concept of storage.  As data volumes continue to grow, the concept of paying a vendor for every drop of data stored will become untenable. In the Kognitio Analytical Platform, only the amount of memory (RAM) used needs to be licensed, enabling users to pay for the value that they receive, in terms of analytical processing, as opposed to the data they store.

Demand the freedom of ‘information anywhere’ with an analytical platform approach to your Big Data Ecosystem today!