Hadoop data lakes ‘must be secured’
With large-scale data breaches in the news with increasing frequency, businesses that use massive amounts of information to aid in their everyday operations or key strategic decision-making need to make sure that the assets they own are as secure as possible.
Greg Hanson, vice-president of EMEA business operations at Informatica, stated in a recent article for SC Magazine that as tools such as Hadoop become more popular, figuring out how to protect the information within needs to become a top priority.
"As more businesses dive into data lakes for collecting, preparing and analysing greater volumes and types of data, IT is working out how to inflate the life rafts for this new approach," he said. "Compliance-sensitive industries, such as healthcare or financial services, or any other consumer-driven industry, such as retail or consumer packaged goods, are legally obligated to ensure strict controls on the use of data."
This means professionals will be expected to devise a data-centric approach to security. Mr Hanson said that while a data lake may appear more straightforward to secure because information is stored in one location, businesses also need to factor in the "myriad ways" in which data gets there.
He stated: "Successful enterprises are taking a holistic approach to big data security by evolving beyond traditional perimeter and endpoint-based security approaches and addressing the security of data itself at multiple levels."
There are several factors that IT professionals must consider to secure their Hadoop deployments, some of which will typically come with the technology, while others may require third-party tools.
For instance, Mr Hanson noted many Hadoop distributions have native Kerberos-based authentication systems for controlling access to data. He added that enterprises are increasingly enabling Kerberos-based access controls and using data preparation technologies that fully integrate with these control systems.
When it comes to authorisation, most solutions ship with tools such as Apache Ranger and Apache Sentry to ensure only the right personnel have access to sensitive data.
Mr Hanson also recommended employing data masking technologies to anonymise private and sensitive data, and using data security intelligence to monitor what is happening to information at every stage of the Hadoop storage and analysis process.
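As a rough illustration of what data masking can look like before records reach a data lake, the sketch below replaces sensitive fields with keyed-hash tokens (HMAC-SHA256). This is only one possible masking technique, strictly speaking pseudonymisation rather than full anonymisation, and it is not taken from Mr Hanson's article or from any particular product. The key, the field names and the sample record are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real deployment would
# fetch this from a key management system rather than hard-code it.
MASKING_KEY = b"example-key-not-for-production"

# Hypothetical names of the fields treated as sensitive.
SENSITIVE_FIELDS = {"name", "email"}


def pseudonymise(value: str, key: bytes = MASKING_KEY) -> str:
    """Return a deterministic 16-character token derived from value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_record(record: dict) -> dict:
    """Return a copy of record with sensitive fields replaced by tokens."""
    return {
        field: pseudonymise(str(value)) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    # Made-up sample record.
    customer = {"name": "Jane Doe", "email": "jane@example.com", "basket_total": 42.5}
    print(mask_record(customer))
```

Because the same input always yields the same token under a given key, masked records can still be joined or counted by customer without exposing the underlying values. Whether that trade-off is acceptable depends on the regulatory context.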