Harvard seeks to tackle big data storage challenges

1 December 2016 | Posted by: admin | Categories: #AnalyticsNews | Tags: big data storage challenges, growth

With a growing number of companies looking to expand their big data analytics operations in the coming years, one key consequence will be an explosion in the amount of data that businesses have to store.

Finding cost-effective ways to store this information will therefore be essential if such initiatives are to be successful. While turning to technologies such as cloud computing could be the answer for many businesses today, new and improved solutions may be required as data volumes continue to grow at an exponential rate.

This is why developers at Harvard University have been building new infrastructure that can cope with this influx of information and support critical research taking place throughout the institution.

James Cuff, Harvard assistant dean and distinguished engineer for research computing, said: "People are downloading now 50 to 80 terabyte data sets from NCBI [the National Center for Biotechnology Information] and the National Library of Medicine over an evening. This is the new normal. People [are] pulling genomic data sets wider and deeper than they’ve ever been."

He added that practices depending on large volumes of data, once considered cutting edge, are now standard procedure.

The need for large-scale storage capabilities is therefore obvious. That's why, earlier this year, Harvard received a grant of nearly $4 million from the National Science Foundation to develop a new Northeast Storage Exchange (NESE). This is a collaboration among five universities in the region, with the Massachusetts Institute of Technology, Northeastern University, Boston University and the University of Massachusetts also taking part.

The NESE is expected to provide not only enough storage capacity for today's massive data sets, but also give the participating institutions the high-speed infrastructure that is necessary if data is to be retrieved quickly for analysis.

Dr Cuff stated that one of the key elements of the NESE is its scalable architecture, which will ensure it can keep pace with growing data volumes in the coming years. He noted that by 2020, officials hope to have more than 50 petabytes of storage capacity available at the project's Massachusetts Green High Performance Computing Center (MGHPCC).

John Goodhue, MGHPCC's executive director and a co-principal investigator of NESE, added that he also expects the speed of the connection to collaborating institutions to double or triple over the next few years.

Dr Cuff noted that while NESE could be seen as a private cloud for the collaborating institutions, he does not expect it to compete with commercial cloud services. Instead, he said, it gives researchers a range of data storage options for their big data-driven initiatives, depending on what they hope to achieve.

"This isn't a competitor to the cloud. It’s a complementary cloud storage system," he said.

What happened to the ‘data gravity’ concept?

25 April 2016 | Posted by: admin | Categories: #AnalyticsNews | Tags: hadoop spark services platform

A few years ago, one of the emerging thoughts in the data storage sector was the idea of 'data gravity' – the concept that the information a business generates has mass that affects the services and applications around it. The more data firms create, the more 'pull' it has on surrounding parts of the organisation.

The term was coined back in 2010 by Dave McCrory. In his original post, he spelled out how as data volumes grow, the effect they have on other parts of the IT environment becomes more pronounced – in much the same way that a larger planet or star exerts a greater gravitational pull than a smaller one.
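To make the analogy concrete, here is a toy sketch in Python. It is not Mr McCrory's own formulation – the services, sizes and latencies below are invented purely for illustration – but it captures the intuition that a data store's 'pull' on an application grows with the data's mass and falls away with distance.

```python
# Toy illustration of the 'data gravity' analogy: the 'pull' a data store
# exerts on an application grows with the data's mass and falls away with
# network 'distance' (here, latency). All figures are invented.

def pull(data_mass_tb: float, app_size_gb: float, latency_ms: float) -> float:
    """Newtonian-style score: bigger data and closer apps mean a stronger pull."""
    return (data_mass_tb * app_size_gb) / (latency_ms ** 2)

warehouse_tb = 500  # hypothetical central data store

services = {
    "reporting_dashboard": {"app_size_gb": 2, "latency_ms": 1},   # same data centre
    "mobile_backend":      {"app_size_gb": 5, "latency_ms": 40},  # remote region
}

for name, svc in services.items():
    score = pull(warehouse_tb, svc["app_size_gb"], svc["latency_ms"])
    print(f"{name}: pull = {score:,.2f}")

# The nearby dashboard scores far higher, so the analogy predicts it will
# tend to be co-located with (pulled towards) the data.
```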

Back then, when big data was still in its infancy for many companies, there was a great deal of uncertainty about the impact that growing volumes of data would have on a business, and Mr McCrory's concept helped get IT professionals used to the idea of data as having a tangible, real-world impact on how a firm operates.

These days, it's not a term you hear very often. But why is this? It's not that the concept hasn't worked out; rather, as big data technology has evolved, it has been overtaken, with the accumulation of vast quantities of data becoming the new normal for many firms – the influence has moved from the gravity of a single planet to cosmos-scale 'market' gravity.

When Mr McCrory first described the concept, tools like Hadoop were still a long way off, and the impact the platform has since had on the big data market has been huge. As a result, the notion that data exerts a 'pull' on just parts of the IT department has progressed to an enterprise-level influence.

Many strategies are now guided by ideas such as the 'data lake', where all of a business's generated information is pooled into a central resource that teams can dip into whenever they need it. Is this the ultimate evolution of the gravity concept – a data black hole, though hopefully one from which information can still escape?
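As a rough sketch of that pattern – assuming a PySpark environment, in keeping with the Hadoop/Spark tools mentioned above, and with hypothetical file paths and column names – a data lake simply means landing raw data from different sources in one central store and letting any team query it later:

```python
# Minimal sketch of the 'data lake' pattern: raw data from different sources
# lands in one central store, and any team can dip into it later.
# Assumes a PySpark environment; the file paths and column names are
# hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("data-lake-sketch").getOrCreate()

# Pool raw data from two (hypothetical) sources into the central lake.
spark.read.json("raw/web_clicks.json") \
    .write.mode("append").parquet("lake/web_clicks")
spark.read.csv("raw/crm_export.csv", header=True) \
    .write.mode("append").parquet("lake/crm_accounts")

# Later, a team 'dips into' the pooled data for its own analysis.
clicks = spark.read.parquet("lake/web_clicks")
clicks.groupBy("page").agg(F.count("*").alias("views")).show()
```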

The idea of data having 'mass' that can affect other parts of the business hasn't gone away – it's just become the accepted truth, the norm, as more companies put data, and the information derived from it, at the heart of their activities.
