A Review of DataWorks Summit, San Jose 2017

Jun 19, 2017 | By Mark Chopping

The DataWorks Summit in San Jose was held on June 13-15, and this blog post summarises interesting talks at the event. Keynote section: Sumeet Singh (Yahoo) talked about Yahoo’s migration from MapReduce jobs to those running on Tez on the 39K+ nodes that they use for Hadoop processing, with over 1M jobs per day…

A Review of Strata Data Conference, London 2017

May 27, 2017 | By Mark Chopping

The Strata Data Conference was held at the ExCeL in London this week. Sessions that were of interest to me included “What Kaggle has learned from almost a million data scientists” (Anthony Goldbloom, Kaggle). This was part of the keynote on the first day. Kaggle is a platform for data science competitions, and has had…

Visiting my neighbour S3 and his son JSON

May 23, 2017 | By Chak Leung

They also live across the river and there’s no bridge…so we’ll just make our own! Amazon’s S3 is a popular and convenient storage solution that many, especially those with big data, tend to utilise. The challenge can be connecting to this large store that has been building up over days/weeks/months/years. There are many ways…

The loneliest railway station in Britain

May 18, 2017 | By Graeme Cole

In my last blog post, I introduced Kognitio’s ability to flatten complex JSON objects for loading into a table. Today we’ll look at another example using real-world Ordnance Survey data. We’ll also look at what you can do if the JSON files you need to load are in HDFS. We’ll use these techniques to solve…
