Why Hadoop benefits from its modular approach

While Hadoop can offer businesses a powerful tool to help make the most of their large data sets, one common complaint from users is that the technology is too complex and has a steep learning curve, which acts as a barrier to more widespread adoption.

However, although efforts are being made to improve the usability of the platform, one expert has argued that the inherent complexity of Hadoop in not necessarily a bad thing.

In an interview with Tech Republic, vice-president of products at Cloudera Charles Zedlewski said this is a reflection of the fact that Hadoop was never intended to be a static platform, but a living, evolving one that is able to incorporate a variety of modular components and embrace rival offerings.

"Change has always been a part of the Apache Hadoop ecosystem," he said. "The rate of change stems from the fact that the Apache Hadoop ecosystem embraces modularity and continuous competition for membership in the overall platform."

While this continual change does have its challenges, the benefits of this approach far outweigh the drawbacks.

Mr Zedlewski compared the development of Hadoop to other open-source technologies, in particular Linux. Like Hadoop, this has a range of distributions offering new innovations and approaches, though it is a more advanced solution. For instance, Red Hat Enterprise Linux and Ubuntu are comprised of more than 100 open source projects, while Cloudera Hadoop currently has around 25.

Over the years, Linux distributions have regularly added components that extend its functionality, such as web servers, developer libraries, browsers, databases and graphical user interfaces, and introduced components that compete with preexisting components.

Similar enhancements are expected to be seen for Hadoop as the platform matures in the coming years, and its open source, modular nature will make this easier to implement.

"Modular open-source communities tend to be better incorporators of good new ideas, because the substitution of one component in the stack does not obviate the overall platform," Mr Zedlewski continued.

By comparison, he noted that "in all-in-one technologies, new functionality gets adopted because it's tightly conjoined with a popular technology, not because it's great functionality in and of itself". As a result, more and more features fall short of best in class as time passes, and the system as a whole begins to atrophy.

Hadoop's modular approach is also more appealing to contributors, as they will have the freedom to make their own technical decisions and create innovations without worrying about how they will integrate with other parts of the system.

Of course, there are drawback to this approach, of which the complexity it creates is one of the biggest, as the overall platform can feel less like a coherent, single-vendor effort. The fact that Hadoop is constantly evolving can also create confusion among users and analysts.

However, by partnering with the right experts, Mr Zedlewski noted businesses can mitigate the impact of these challenges and help make their deployment of Hadoop as straightforward and easy to use as possible.