Is a single version of the truth important for BI?
According to Forrester, only 30% of unstructured data and 40% of structured data is currently available within enterprises for analysis by BI and relevant systems.
A central data source of all business data is essential for intelligence integrity, compliance and data governance. A central data lake is more easily managed than disparate sources, and can be continually checked for safety and usability. Unfortunately, it’s increasingly tricky for businesses to manage this central data storage, and to obtain a single view of data. Forrester reports that 44% of business intelligence users complain about not having all the data they need for holistic intelligence.
A Single Source of Truth (SSoT) is a storage principle that relies on one unchallenged repository of information. A Single Version of the Truth (SVoT), meanwhile, is one central view of your data that everyone in a company agrees is the real, trusted operating data. The two work together to ensure you get the most valuable answers from the information needed for business intelligence. However, providing interactive, ad-hoc access to a Single Source of Truth for many users can be challenging. So, is a single version of the truth always essential?
The fragmented data problem
Wherever your business intelligence is pulled from, data quality and data governance are crucial; both are fundamental to garnering accurate, intelligent metrics. Today, we continue to see organizations struggling with increasing data silos and competing versions of ‘truth’.
“Users are still making decisions based on data whose provenance is unclear. This can invalidate insights, and cause serious financial drain to the organization.”
Roger Gaskell, Chief Executive Officer, Kognitio
A common headache for business intelligence leaders is knowing the age and accuracy of information. Issues often arise from questions over when, where and how the business data is collected, as well as the settings and processing within your separate intelligence tools. Without a single version of the truth, your BI users, or indeed entire organizations, are working with competing or conflicting information that is unfit for purpose.
It’s unwise for organizations to entertain multiple versions of anything that is used for one purpose. With various versions of data, different approaches may yield contradictory answers from the same information. Then, which ‘truth’ is accurate, and which ‘truth’ is false?
Agreeing and optimizing data sources
The most significant aspect of creating a central data lake is agreeing the accepted sources of data. You should have an acknowledged origination point for every data element, whether it comes from a legacy system or a manually maintained source.
You should not, however, automatically eliminate data sources; each source should be carefully evaluated to ascertain its value. Then, optimize these sources to ensure they accurately reflect the ‘truth’: identify and eliminate errors arising from inefficiencies in the way data is collected, collated, and managed.
Why is a single truth important?
If there are multiple datasets sitting across the business, you will always have access to other versions of the truth, or rather, information that is no longer the truth at all.
“Multiple truths can lead to ineffective business intelligence and bad decision making. Contradictory data erodes trust in the data itself. It restricts the ability of an organization to understand its performance or forecast with confidence.”
Roger Gaskell, Chief Executive Officer, Kognitio
In recent years, Hadoop has been seen as the solution to this headache. In Hadoop, you can consolidate and access all your disparate data sources from one central data lake. This gives you one version of truth.
The truth is slowing you down
Hadoop easily pulls together various data sources, and creates one central data lake. However, your BI users and other analysts are constantly challenged when working with this centralized data in Hadoop. It can be incredibly slow for multiple self-service users, and painful to access. Your need for the truth is slowing you down.
The principal threat to data truthfulness is data being pulled inconsistently from the central data lake. If business intelligence processes are too slow on Hadoop, the truth will always be corrupted to some degree: when information is extracted and saved into personal BI tools or other analytics platforms, it accumulates modifications and inconsistencies, creating countless versions of the ‘truth’.
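The drift described above can be sketched in a few lines. This is a toy illustration with hypothetical data (the `central` and `extract` structures and the figures in them are invented for the example, not drawn from any real system): an analyst copies data out of the central source into a local extract, the central source is later updated, and the two now disagree.

```python
from datetime import date

# Central data lake: one agreed record per customer (hypothetical data).
central = {"acme": {"revenue": 120_000, "as_of": date(2024, 3, 1)}}

# An analyst saves a local extract into their own BI tool.
extract = {name: record.copy() for name, record in central.items()}

# Later, the central source is corrected with fresher figures...
central["acme"] = {"revenue": 135_000, "as_of": date(2024, 4, 1)}

# ...but the extract still holds the old figure: two competing 'truths'.
print(extract["acme"]["revenue"])   # 120000 (stale copy)
print(central["acme"]["revenue"])   # 135000 (current source)
```

The same divergence happens at enterprise scale whenever extracts are saved faster than they are refreshed, which is why querying the central source directly preserves a single version of the truth.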
Software like Kognitio, meanwhile, can run hundreds of concurrent queries over big data sets directly within Hadoop. As a drop-in component of the Hadoop infrastructure, it makes the data in your Hadoop cluster as easy to manage as within your own personal BI tools. Without the need to move your Single Version of the Truth from your Single Source of Truth, the integrity of your business intelligence insights is always protected.
Find out more about protecting truthful data, and getting much faster business intelligence directly from your data in Hadoop with Kognitio.