Blog

A Review of DataWorks Summit, San Jose 2017

19

Jun
2017
Posted By : Mark Chopping 0 Comment

The DataWorks Summit in San Jose was held on June 13-15, and this blog post summarises interesting talks at the event.

Keynote section

Sumeet Singh (Yahoo)

Sumeet talked about Yahoo’s migration from MapReduce jobs to those running on Tez on the 39K+ nodes that they use for Hadoop processing with over 1M jobs per day. In the last year, MapReduce jobs have dropped from about 70% of the workload to around 10%, with Tez moving in the opposite direction (Spark job level remaining consistent). This has also allowed improved utilisation of the compute and storage infrastructure.

In addition, updating the version of Hive in use has led to a significant lowering of latency for analytics jobs. The following slide shows … Read more

Read More

A Review of Strata Data Conference, London 2017

27

May
2017
Posted By : Mark Chopping Comments are off

The Strata Data Conference was held at the ExCeL in London this week. Sessions that were of interest to me included:

What Kaggle has learned from almost a million data scientists (Anthony Goldbloom, Kaggle)

This was part of the keynote on the first day. Kaggle is a platform for data science competitions, and have had almost 1 million users participate. Over 4 million models have been submitted to competitions, and this presentation covered off the traits Kaggle have seen for successful entrants.

In particular, for structured data the trend is for competitors to initially explore the data via histograms etc. to get a better understanding of it, then create and select features for use in their approach, which typically involves … Read more

Read More

Visiting my neighbour S3 and his son JSON

23

May
2017
Posted By : Chak Leung Comments are off

They also live across the river and there’s no bridge…so we’ll just make our own!

Amazon’s S3 is a popular and convenient storage solution which many, especially those with big data, tend to utilise, and the challenge can be connecting to this large store that has been building up over days/weeks/months/years. There are many ways to do this, no doubt, you could do a curl/wget but it’s in its raw form and converting it for use with databases isn’t always simple.
With Kognitio external connectors and tables you can connect to and parse it on the fly.

Let’s see how we can do this with JSON data and external tables in Kognitio where we’ll be able to use externally stored … Read more

Read More

The loneliest railway station in Britain

18

May
2017
Posted By : Graeme Cole Comments are off

In my last blog post, I introduced Kognitio’s ability to flatten complex JSON objects for loading into a table. Today we’ll look at another example using real-world Ordnance Survey data. We’ll also look at what you can do if the JSON files you need to load are in HDFS. We’ll use these techniques to solve the following example problem…

Which railway station in mainland Britain is the furthest straight-line distance from its nearest neighbour? The fact that the answer is Berwick-upon-Tweed may surprise you!… Read more

Read More

Hadoop… Let’s not throw the baby out with the bath water again!

03

May
2017
Posted By : Roger Gaskell Comments are off
rubbish

Here we go again! Suddenly the industry seems to have turned on Hadoop. Headlines saying “it’s hit the wall” and “it’s failed” have recently appeared and some are suggesting that organisations look at alternative solutions. Granted, Hadoop has its limitations and has not lived up to the massive hype that surrounded it a year or two ago, but then nothing ever does.

I admit I was not a fan of Hadoop when it first appeared; it seemed like a step backwards. It was very complicated to install, unreliable, and difficult to use, but still it caught the industry’s imagination. Engineers liked it because it was “proper engineering” not a shrink wrapped productionised product, and the business was seduced by the … Read more

Read More

External tables and scripts and user mapping

27

Apr
2017
Posted By : Ben Cohen Comments are off
external scripts, tables, user mapping, sql query

Introduction

Kognitio has two mechanisms for external programs to provide or operate on data during an SQL query: external tables and external scripts. These can be written in the user’s preferred language, for example R, Python, shell or even C.

External tables are used like normal tables in SQL queries but the rows are supplied by an external “connector” program rather than being stored by WX2. The connector could generate the rows itself (TPC-H data, for example) or fetch them from another database, HDFS or an internet data source.

External scripts are table functions in SQL which take a subquery “argument” and return a table. They allow users to create complex analytics or achieve functionality that the database itself … Read more

Read More

Hadoop’s biggest problem, and how to fix it

05

Apr
2017
Posted By : Mark Chopping Comments are off

Introduction

Hadoop was seen as a silver bullet for many companies, but recently there has been an increase in critical headlines like:

  1. Hadoop Has Failed Us, Tech Experts Say
  2. You’re doing Hadoop and Spark wrong, and they will probably fail
  3. Has Hadoop Failed? That’s the Wrong Question

The problem

Dig behind the headlines, and a major issue is the inability for users to query data in Hadoop in the manner they are used to with commercial database products.

From the Datanami article:

  • Hadoop’s strengths lie in serving as a cheap storage repository and for processing ETL batch workloads, Johnson says. But it’s ill-suited for running interactive, user-facing applications
  • It’s better than a data warehouse in that have all the
Read more

Read More

Simple performance checks against your hardware cluster

24

Mar
2017
Posted By : Simon Darkin 1 Comment
performance hardware cluster, cpu. benchmarks

Kognitio have a lot of experience commissioning clusters of new hardware for our MPP software product. As part of that process, we’ve developed a number of steps for validating the performance of new clusters, and these are the topic of this blog entry.

 

There are many Linux based benchmarking tools on the market however they are not usually installed by default, in which case some simple command line tools can be used to quickly establish if there is a potential hardware issue that warrants further investigation.    The following hardware components are covered by this topic:

  • CPU
  • Disk
  • Networking
  • RAM

 

CPU   

A slow CPU or core could have an adverse effect on query performance, and so with the use … Read more

Read More

Strata + Hadoop World – San Jose

22

Mar
2017
Posted By : Sharon Kirkham Comments are off

The Kognitio team had a great trip to Strata + Hadoop World in San Jose last week and we would like to say a big thank you to everyone who stopped by for a chat about getting enterprise level performance for their SQL on Hadoop. We look forwarding to hearing from you when you try out Kognitio on Hadoop.

At the start of the conference we released our benchmarking whitepaper in which Kognitio outperformed Impala and Spark in a TPC-DS benchmarking exercise. This proved to be of great interest and kept us all really busy on the stand. Conversations ranged from people who have been using Hadoop a while and are having problems serving data to their end-user applications

Read more

Read More

Participate in the Kognitio Console beta test program

10

Mar
2017
Posted By : Michael Atkinson Comments are off
Kognitio console, beta test program

Kognitio Console is Kognitio’s client side management program for the Kognitio Analytical Platform.

Some of its features are:

  • It allows inspection of the metadata tree for schemas, tables, views, external scripts, users, connectors, sessions, queues, etc.
  • It also gives an object view to each of these metadata objects, allowing their inspection and management.
  • There are also lots of tools, wizards and widgets to browse data in Hadoop, load and unload data, identify problem queries and many more.
  • There are also a set of reports and dashboards to monitor the state of Kognitio systems.
  • Macros may be written to extend Kognitio Console, the reports and dashboards are written in these XML macros.
  • Ad-Hoc queries may be executed.
  • KogScripts may be executed
Read more

Read More

Facebook

Twitter

LinkedId