Spotify explains how Hadoop boosts innovation
Spotify has become one of the world's largest music streaming sites since its founding in 2008. In the years since, it has grown to have more than 60 million active users and 15 million paid subscribers, who can enjoy a library of around 25 million songs.
A large portion of this success can be attributed to the innovative way it manages its data – not only in the way it streams music to listeners, but how it learns about its customers and tests new products. The key to this is how it has embraced Hadoop from the very start.
Speaking to Computing magazine, Spotify product manager Josh Baer explained that the company had an advantage over many other firms because it adopted big data analytics tools from the start, rather than having to integrate them with existing solutions.
"For most companies that have legacy systems their biggest challenge is how to get data into Hadoop. We are very fortunate that we had Hadoop from the very beginning. We grew up on Hadoop," he said, adding that the company was developing the concept of a data lake before anyone else had even started calling it that.
As a result, Spotify is well placed to boost innovation by analysing its data and identifying where improvements can be made.
For example, the firm uses details from the Spotify application and external sources for real-time analytics to help improve its algorithms, test new products and fix bugs.
Mr Baer explained that Spotify has introduced the concept of 'hack weeks', where its developers can work on something completely different for a short time, which has led to several good new features. But even when ideas don't pan out, the time is not wasted.
"Some ideas that seemed to be cool in reality don't get used much, so we look at the data and we fail fast. We celebrate the failure," he said.
However, Mr Baer said Spotify is still keen to get even faster when it comes to how it handles data. He highlighted Apache Hive, Hadoop's data warehousing software, as an area that demands particular attention in this regard.
If an analyst is asked to find out which city has the most Justin Bieber fans, for instance, they will create an algorithm to examine the firm's data. This might take 40 minutes, which is not fast enough for Spotify's needs. "We want Hive to get a lot better," Mr Baer said.
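To give a sense of the kind of question involved, the sketch below counts distinct listeners of an artist per city and returns the city with the most. It is only an illustration: the play records, field layout and function name are hypothetical and are not Spotify's schema or tooling. In Hive, an analyst would more likely express this as a SQL-like query over a far larger dataset; plain Python stands in for it here.

```python
from collections import defaultdict

# Hypothetical play events: (user_id, city, artist).
# Invented sample data for illustration only, not Spotify's real schema.
PLAYS = [
    ("u1", "London", "Justin Bieber"),
    ("u2", "London", "Justin Bieber"),
    ("u2", "London", "Justin Bieber"),   # repeat play by the same user
    ("u3", "Stockholm", "Justin Bieber"),
    ("u4", "Stockholm", "Daft Punk"),
    ("u5", "New York", "Justin Bieber"),
]


def top_city_for_artist(plays, artist):
    """Return (city, distinct_listeners) for the city with the most
    distinct listeners of `artist`, or None if nobody played the artist."""
    listeners = defaultdict(set)
    for user_id, city, played_artist in plays:
        if played_artist == artist:
            # A set keeps each user once, so repeat plays are not double counted.
            listeners[city].add(user_id)
    if not listeners:
        return None
    best_city = max(listeners, key=lambda c: len(listeners[c]))
    return best_city, len(listeners[best_city])


print(top_city_for_artist(PLAYS, "Justin Bieber"))
```

On a handful of records this runs instantly; the 40 minutes Mr Baer describes reflects the cost of running the same kind of grouping and counting across the company's full data set.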