
Scale-out in-memory is a tough nut that can be cracked!

An interesting article appeared a week ago in the UK’s Computing: “What’s holding back in-memory databases?”
computing.co.uk

It raises a series of points that can be summed up as “it’s tough to engineer” and “there are still misunderstandings”. The counterpoints are “with the right team and time it’s definitely doable” – Kognitio clients will attest to that – and “data in-memory does not equate to easily lost”. Let’s explore some of this.

The article concludes with a prediction from esteemed Gartner analyst Donald Feinberg (“Hi Donald, we need to catch up soon”) stating “…we are cycling back from horizontal, clustered servers to single servers”. Pah! Some vendors might be, probably because they’ve failed to solve those nasty little problems around controlling and reliably locking many units of memory, in dynamic operation, in a distributed environment, across variable networks, without deadlocking, overly fragmenting or just plain jamming up! Guess what: we solved that and many other similar nasties years ago. Multi-server scale-out for in-memory is entirely possible and available today. It took time, engineering pain (I remember our engineers when they had hair) and customer patience (maybe tolerance).

Hana is much referenced in the article. It’s still relatively young and, even with the significant investment by SAP, we still don’t hear much of large multi-node production clusters. Kognitio has one customer with clusters of 40, 36 and 26 nodes and a whole heap of smaller ones to boot, and they are far from unique. Has the Hana team at SAP convinced Donald that it’s too tough and that the inevitable fallback is just single-server instances for in-memory? Again, Pah!

Data persistence and in-memory is always a challenging discussion; the inevitable fear of data loss clouds DBA thoughts. The article describes mechanisms to protect data. Kognitio offers its own, where tables can be both in-memory and on disk with synchronous data change, including transaction processing. We call this a table image, and we offer various options to select which columns and rows need to be in RAM at any given time, even though the user has access to the full table content – a fragmented table image. Even server failures are handled with no loss of data. For the bold, you can go entirely memory-only, but you do this when the data life-cycle is short or when you can pull the requisite data back into RAM from source. With Kognitio, data need not persist within it: external tables can pull data on demand, direct from Hadoop, S3, other EDWs and so on, i.e. you can build a data model in-memory, on demand, pulling data directly from source.

Too many traditional database people see data as being stored for years and thus in need of protection. In our dynamic, big data, data-science-driven world, data model shelf-life can be measured in a few days or hours. Like business, analytics must keep moving on! With that in mind, absolute persistence is less of a concern – as attested by the statement in the article “found a bedrock of adoption … in analytics”. What business wants is fast analytics, rapid discovery, quick proofs – in-memory data models with lots and lots of CPUs to do the work achieve that. If in doubt, try us out.
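For readers who like to see an idea in code, here is a toy Python sketch of the table image concept described above: a full copy of the table is kept on "disk", a filtered subset of rows and columns is kept in "RAM", every change is applied to both in the same call, and the RAM copy can be rebuilt from disk after a failure. This is purely illustrative. The class name `TableImage`, its methods and its parameters are all hypothetical, and none of it reflects Kognitio's actual implementation or SQL syntax.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class TableImage:
    """Toy model (hypothetical): full table on 'disk', a fragment of it in 'RAM'."""

    def __init__(
        self,
        columns: Optional[List[str]] = None,
        row_filter: Optional[Callable[[Row], bool]] = None,
    ) -> None:
        self.columns = columns          # columns held in RAM (None means all)
        self.row_filter = row_filter    # rows held in RAM (None means all)
        self.disk: List[Row] = []       # stands in for the persistent copy
        self.ram: List[Row] = []        # stands in for the in-memory image

    def _wanted(self, row: Row) -> bool:
        return self.row_filter is None or self.row_filter(row)

    def _project(self, row: Row) -> Row:
        if self.columns is None:
            return dict(row)
        return {name: row[name] for name in self.columns}

    def insert(self, row: Row) -> None:
        # Synchronous change: the persistent copy and the image are
        # both updated before the call returns.
        self.disk.append(dict(row))
        if self._wanted(row):
            self.ram.append(self._project(row))

    def rebuild_image(self) -> None:
        # After losing RAM (e.g. a server failure), repopulate from disk.
        self.ram = [self._project(r) for r in self.disk if self._wanted(r)]


if __name__ == "__main__":
    sales = TableImage(columns=["id", "amount"],
                       row_filter=lambda r: r["year"] == 2024)
    sales.insert({"id": 1, "amount": 10, "year": 2023})
    sales.insert({"id": 2, "amount": 20, "year": 2024})
    print(len(sales.disk), sales.ram)
```

The memory-only option mentioned above would correspond to dropping the `disk` list entirely and accepting that a rebuild has to come from the original source instead.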

Paul Groom, COO