One of the key challenges for any organisation embarking on a big data project is keeping costs under control, which is not always easy when firms are collecting and storing huge amounts of information.

To tackle this issue, IBM has revealed it is working on a new method for automatically classifying information so that the most relevant data is always on hand.

Known as 'cognitive storage', the solution involves assigning a value to incoming data and using that value to decide which type of media the data should reside on, what level of data protection should apply and what retention and lifecycle policies should be set for different classes of data, Computer Weekly reports.

IBM researcher Giovanni Cherubini explained that the most obvious way to handle large amounts of data while keeping costs low is to use tiers of storage, such as flash and tape, with the most important data held on the fastest media.

The machine learning tool aims to assess the value of data and direct it to the most appropriate storage tier. It does this by studying metadata and analysing access patterns, and by learning from the changing context in which data is used.
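The article does not say how IBM's system computes value. Purely as an illustration, the sketch below scores a file from two access-pattern features and maps the score to a tier. The `FileRecord` fields, the weights, the thresholds and the intermediate "disk" tier are all hypothetical, and a hand-weighted formula stands in for the learned model IBM describes.

```python
import math
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Hypothetical metadata fields; a real system would draw on many more.
    name: str
    days_since_last_access: float
    accesses_last_30_days: int


def value_score(f: FileRecord) -> float:
    """Toy value score in [0, 1]: recent, frequent access means higher value.

    The weights (0.6 / 0.4), the 30-day decay and the 100-access cap are
    illustrative assumptions, not IBM's parameters.
    """
    recency = math.exp(-f.days_since_last_access / 30.0)   # 1.0 if accessed today
    frequency = min(f.accesses_last_30_days / 100.0, 1.0)  # saturates at 100 hits
    return 0.6 * recency + 0.4 * frequency


def assign_tier(score: float) -> str:
    # Hypothetical thresholds mapping value to storage media.
    if score >= 0.5:
        return "flash"
    if score >= 0.1:
        return "disk"
    return "tape"
```

With these assumed constants, a file accessed today and read 250 times in the past month scores 1.0 and lands on flash, while an archive untouched for 900 days scores close to zero and goes to tape.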

IBM researcher Vinodh Venkatesan added: "Administrators would help train the learning system by providing sample files and labelling types of data as having different value."
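To show the kind of supervised training Venkatesan describes, here is a minimal nearest-centroid classifier trained on administrator-labelled samples. It is a stand-in technique chosen for brevity, not IBM's method; the two features (accesses per day, and years since last access) and the "high"/"low" labels are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical feature vector per file: (accesses per day, years since last access)
Features = tuple[float, ...]


def train_centroids(samples: list[tuple[Features, str]]) -> dict[str, Features]:
    """Nearest-centroid training: average the feature vectors of each labelled class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in vec) for label, vec in sums.items()}


def classify(centroids: dict[str, Features], features: Features) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))
```

Nearest-centroid is about the simplest learner that turns labelled examples into a decision rule. In practice the features would be normalised first, because here the access-rate feature dominates the distance.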

For businesses, the challenge is that they hold a wide variety of data, from business-critical transactional data to emails, machine sensor data and more, so any cognitive storage system will need to categorise all of it correctly.

Mr Venkatesan said: "For an enterprise, there are ‘must keep’ classes of data and these could be set to be of permanently high value. But that is a small proportion in an enterprise. 

"The rest, the majority, which cannot necessarily be manually set, can be handled by cognitive storage – such as big data-type information and sensor information that might have value if analysed."
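The split Venkatesan describes, with a small set of classes pinned at high value and a learned value for everything else, could be expressed as a simple override. The class names in `MUST_KEEP_CLASSES` and the 0 to 1 value scale are hypothetical.

```python
# Hypothetical data classes an administrator pins as permanently high value.
MUST_KEEP_CLASSES = frozenset({"transactional", "regulatory"})


def effective_value(data_class: str, learned_score: float) -> float:
    """Pinned classes always get maximum value; everything else uses the model's score."""
    if data_class in MUST_KEEP_CLASSES:
        return 1.0
    # Clamp the learned score into [0, 1] in case the model strays outside it.
    return max(0.0, min(1.0, learned_score))
```

Because pinned classes bypass the model entirely, a poor learned score cannot demote business-critical data, while sensor and other bulk data is left to the cognitive system.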