LinkedIn offers open-source Hadoop plugin

Professional social networking site LinkedIn has announced an open-source Hadoop plugin that aims to help developers build and manage workflows made up of multiple Hadoop jobs.

In a post introducing the plugin for the Gradle build tool, Alex Bain, a senior software engineer at the company, explained that LinkedIn designed it to make it easier to work across a number of Hadoop application frameworks. The plugin is now available on GitHub.

He noted that since there is no 'one-size-fits-all solution' that is perfect for every type of job, Hadoop applications at LinkedIn are written using a number of different frameworks, depending on which is most appropriate for the task at hand. However, this can create challenges when it comes to organising Hadoop projects in a consistent manner.

Gradle enables developers to extend their build system by defining their own plugins, and LinkedIn developed the Hadoop Plugin on this basis to help individuals build, test and deploy Hadoop applications more effectively.

Such solutions are particularly valuable for data-intensive businesses such as LinkedIn, which will typically be running many jobs within a workflow.
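
As a rough illustration of what running many jobs within a workflow involves, the sketch below models a small workflow as a set of jobs with dependencies and works out an order in which they could run. It is not the Hadoop Plugin or the Hadoop DSL, and the job names are hypothetical; it only assumes that each job must wait for the jobs it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
# These names are illustrative only and do not come from LinkedIn.
workflow = {
    "extract_profiles": set(),
    "extract_connections": set(),
    "join_data": {"extract_profiles", "extract_connections"},
    "compute_recommendations": {"join_data"},
}


def execution_order(jobs):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(jobs).static_order())


order = execution_order(workflow)
print(order)
```

With dozens or hundreds of jobs, keeping such dependencies consistent by hand becomes error-prone, which is the kind of problem workflow tooling is meant to address.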

Mr Bain said: "Long before the Hadoop Plugin, Hadoop developers at LinkedIn had realised that writing individual Hadoop jobs was only part of the challenge in using Hadoop effectively. Most data-driven features that appear on LinkedIn are actually generated by processing pipelines that may consist of dozens of individual Hadoop jobs chained together into workflows."

For some of the larger data processing workflows, this could involve hundreds of job files. To deal with this, some developers started creating their own home-grown tools to help manage their workflows. 

But because these tools were written using a mix of frameworks, they prevented LinkedIn from completing its company-wide migration to Gradle, and over time they became increasingly fragile and difficult to maintain.

"To solve these problems, we developed the Hadoop DSL, which is included with the Hadoop Plugin," Mr Bain said. "The Hadoop Plugin and Hadoop DSL have been embraced as the standard way to develop Hadoop workflows at LinkedIn. If you are writing Hadoop jobs using Gradle as your build system, you should definitely consider using the Hadoop Plugin. It will save you time and energy in developing your Hadoop workflows."