Advanced Analytics and the changing BI Market
In years past, Business Intelligence (BI) was predominantly about using data to report on what had already happened. For forward-thinking organizations, this is no longer sufficient: they need to look forward, not back. Using past data and independent variables to predict what will happen in the future, or “Predictive Analytics,” is a large part of what has come to be known as “Advanced Analytics.”
A simple BI reporting workload is all about data filtering; i.e., extracting the data of interest from the overall data set.
- For example, if I want to find how much revenue was made from selling snow shovels in Chicago in January, the platform has to pick out the rows for snow shovel sales in Chicago in January from all of the other sales data and then total up the revenue.
- The bulk of the effort goes into the filtering; the totalling up is insignificant. Techniques such as indexing and columnar storage are designed to make the filtering stage quicker, but do little to speed up the totalling itself.
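To make the reporting example concrete, here is a minimal Python sketch of the filter-then-total workload. The `Sale` record, the `revenue_for` function and the sample rows are hypothetical and made up for illustration; a real BI platform would express this as a SQL query over a far larger table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sale:
    product: str
    city: str
    month: int       # 1 = January
    revenue: float

def revenue_for(sales, product, city, month):
    # Filtering: every row must be examined to find the rows of interest.
    # On a large table this is where the time goes, and it is the step that
    # indexes and columnar layouts are designed to speed up.
    matching = [s for s in sales
                if s.product == product and s.city == city and s.month == month]
    # Totalling: one cheap pass over the (much smaller) filtered set.
    return sum(s.revenue for s in matching)

# Illustrative, made-up rows.
sales = [
    Sale("snow shovel", "Chicago", 1, 25.0),
    Sale("snow shovel", "Chicago", 1, 30.0),
    Sale("snow shovel", "Boston", 1, 28.0),
    Sale("snow shovel", "Chicago", 7, 20.0),
    Sale("garden hose", "Chicago", 1, 15.0),
]

print(revenue_for(sales, "snow shovel", "Chicago", 1))  # 55.0
```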
Advanced analytics completely changes the workload for the platform doing the work. Filtering data is merely the first step. Once the analytical data set, referred to as the “data of interest,” is obtained, the analytical platform then needs to execute some complex, compute-intensive algorithms against the data. If we continue the example from above:
- Once the data has been filtered, it is fed to the algorithms. For predictive analytics, this might be done in a programming language like R. In our example, we could fit a model to years of historical data, incorporating other relevant data (e.g., weather patterns and forecasts, foot traffic, historical sales of complementary products).
- Then we need to run that R statistical forecasting algorithm to produce a model of how we think snow shovel sales will go next January. Depending on its complexity, the model's code can range from a single line to hundreds of thousands of lines.
Now the filtering becomes insignificant compared with the processing step, and how the data is stored (columnar, row-based, indexed, etc.) becomes largely irrelevant.
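The two-step advanced analytics workload can be sketched the same way. The article mentions R; to keep one language throughout, this sketch uses Python, and a one-variable least-squares line (shovels sold against January snowfall) stands in for a real forecasting algorithm. The `MonthlySales` record, its fields, the function names and the history rows are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MonthlySales:
    city: str
    year: int
    month: int
    snowfall_cm: float    # hypothetical explanatory variable
    shovels_sold: float

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def forecast_january(history, city, expected_snowfall_cm):
    # Step 1: filtering, i.e. extracting the "data of interest".
    rows = [r for r in history if r.city == city and r.month == 1]
    # Step 2: the compute-heavy part. A one-variable regression stands in
    # here for what could be a far larger modelling routine.
    intercept, slope = fit_line([r.snowfall_cm for r in rows],
                                [r.shovels_sold for r in rows])
    return intercept + slope * expected_snowfall_cm

# Illustrative, made-up history.
history = [
    MonthlySales("Chicago", 2019, 1, 10.0, 120.0),
    MonthlySales("Chicago", 2020, 1, 20.0, 140.0),
    MonthlySales("Chicago", 2021, 1, 30.0, 160.0),
    MonthlySales("Chicago", 2022, 1, 40.0, 180.0),
    MonthlySales("Chicago", 2022, 7, 0.0, 5.0),     # not January: filtered out
    MonthlySales("Boston", 2022, 1, 50.0, 400.0),   # other city: filtered out
]

print(forecast_january(history, "Chicago", expected_snowfall_cm=25.0))  # 150.0
```

In a realistic model the second step, not the filter, would dominate the run time.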
All this is fantastic news for the Kognitio Analytical Platform, which has always been about processing data. Processing data with complex analytical algorithms is CPU-intensive. By fully parallelizing every operation across all available CPU cores, Kognitio can bring more horsepower to bear on a given data volume than any other technology. And since Kognitio supports both scale-up and scale-out, the number of CPU cores it can effectively use is, for all practical purposes, unlimited.
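The general idea behind spreading work across CPU cores is partition-and-combine: split the data into slices, let each worker process its slice independently, then merge the small partial results. This sketch illustrates that general technique under our own assumptions and is not a description of Kognitio's internals; `parallel_mean` and `partial_stats` are hypothetical names. It uses threads so that it runs anywhere, but in CPython the global interpreter lock stops threads from speeding up pure-Python arithmetic, so for real CPU parallelism you would pass `executor_cls=ProcessPoolExecutor` (same interface), calling it from under an `if __name__ == "__main__":` guard on platforms that spawn processes.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(chunk):
    # Runs independently on one slice: no coordination with other workers.
    return sum(chunk), len(chunk)

def parallel_mean(values, workers=4, executor_cls=ThreadPoolExecutor):
    if not values:
        raise ValueError("cannot take the mean of no values")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Partition: split the data into non-overlapping slices, one per worker.
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with executor_cls(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    # Combine: merge the small per-slice results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

print(parallel_mean(list(range(1, 101))))  # 50.5
```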
Kognitio version 8 also has a feature called “Massively Parallel In-Memory Code Execution,” which allows analytical algorithms written in any language to be embedded within the platform and executed in a fully parallel context. The Kognitio implementation is unique in that it does not need explicit support for a particular algorithm or set of algorithms. It can run any pre-existing algorithm or set of algorithms, whether open source such as “R,” licensable code such as MATLAB or WPS SAS-language functions, or code that is proprietary to an organization, such as fraud or risk algorithms. This feature takes Kognitio well beyond a simple MPP SQL database engine and turns it into a powerful parallelisation engine for advanced analytics with unrivalled flexibility and scalability.
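The “run any algorithm” idea can be illustrated with a runner that partitions the data and hands each partition to whatever function it is given. This is a hypothetical sketch, not Kognitio's interface: `run_per_partition`, `total_revenue`, `largest_sale`, `city_of` and the sample rows are invented for illustration, and the two toy functions stand in for real R, MATLAB or proprietary code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_per_partition(rows, key, algorithm, workers=4, executor_cls=ThreadPoolExecutor):
    """Partition rows by key, then run any user-supplied algorithm on every
    partition in parallel. The runner knows nothing about the algorithm itself."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key(row)].append(row)
    keys = list(partitions)  # insertion order: first appearance of each key
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(algorithm, [partitions[k] for k in keys]))
    return dict(zip(keys, results))

# Two interchangeable toy "algorithms"; in practice this could be any model or rule.
def total_revenue(partition):
    return sum(revenue for _, revenue in partition)

def largest_sale(partition):
    return max(revenue for _, revenue in partition)

def city_of(row):
    return row[0]

rows = [("Chicago", 25.0), ("Boston", 28.0), ("Chicago", 30.0), ("Denver", 12.5)]
print(run_per_partition(rows, city_of, total_revenue))
# {'Chicago': 55.0, 'Boston': 28.0, 'Denver': 12.5}
print(run_per_partition(rows, city_of, largest_sale))
# {'Chicago': 30.0, 'Boston': 28.0, 'Denver': 12.5}
```

Because the runner only sees a callable, swapping `total_revenue` for `largest_sale` changes the analysis without touching the runner. Threads keep the sketch portable; `ProcessPoolExecutor` offers the same interface for CPU-bound work, provided the algorithm is a module-level function so it can be pickled.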
One interesting and little-known fact is that back in 1988, the very first prototype of what is now the Kognitio Analytical Platform was called an Information Processing Engine (IPE). IPE is still the prefix of the tables that hold all our system data. Advanced Analytics is all about information processing, so Kognitio still meets the original vision of the engineers who first designed it more than a generation ago.