
4th June – Preparing Data For Modelling and Validation Techniques

If data science excites you, come to our next workshop on exploiting the MPP Analytical Platform with not-only-SQL processing.

The upcoming Data Science Workshop on preparing data for modelling and validation techniques takes place on June 4 at 8 am PDT / 11 am EDT / 3 pm GMT. It will cover a number of validation tasks that you may need to perform to ensure that your data meets modelling specifications, from generic cases such as pivoting data, creating dummy variables and detecting zero-variance predictors to specific ones such as the underlying assumptions of regression. Please see below for an overview of what Dr. Sharon Kirkham, Head Data Scientist at Kognitio and Director of the Kognitio Analytics Center of Excellence (KACE), will aim to cover in this session.

This is the latest in a series of workshops hosted by Dr. Kirkham, which have received plaudits from those who attended. The 90-minute session is well worth your time.

Live interaction is encouraged, so please do come with questions.

Wednesday, June 4, 2014
8 am PDT / 11 am EDT / 3 pm GMT

REGISTER HERE!

What will be covered:

This month’s Kognitio class will focus on a number of validation tasks that you may need to perform to ensure that your data meets modelling specifications. We will cover some generic cases, such as pivoting data, creating dummy variables and detecting zero-variance predictors, as well as specific ones, such as the underlying assumptions of regression. The session is split into two parts:

  • Pre-processing

First, we will run tests to validate some of the foundational characteristics of a dataset, such as its probability distribution and its spread (variance). If these characteristics do not agree with the model assumptions, we can apply manipulation techniques (such as transformation and scaling) to bring them into line, all whilst utilising the parallel capabilities of Kognitio.

  • Post-processing

Some validations require the model to be run first because they depend on its results. Common examples are residual analysis in regression and choosing the number of clusters in k-means clustering. Both of these models were parallelised in previous classes; this time we will demonstrate how to carry out the post-processing steps alongside them.
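For readers who want to experiment before the session, the pre-processing ideas above (dummy variables, zero-variance predictors, a distribution check and scaling) can be sketched in plain Python. This is a minimal single-machine illustration under our own assumptions, not Kognitio's external scripting interface or the code used in the class; the sample rows, column names and helper functions (`dummy_encode`, `zero_variance_columns`, `skewness`, `standardise`) are hypothetical.

```python
from statistics import mean, pstdev

# Illustrative helpers (hypothetical names), not a Kognitio API.
# Hypothetical sample rows; the column names are made up for this sketch.
rows = [
    {"region": "north", "channel": "web", "spend": 120.0, "flag": 1},
    {"region": "south", "channel": "store", "spend": 80.0, "flag": 1},
    {"region": "north", "channel": "store", "spend": 200.0, "flag": 1},
    {"region": "east", "channel": "web", "spend": 40.0, "flag": 1},
]


def dummy_encode(rows, column):
    """Replace a categorical column with 0/1 indicator columns.

    The first level in sorted order is dropped and acts as the reference
    category, which avoids perfectly collinear dummies in a regression.
    """
    levels = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for level in levels[1:]:
            new_row[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(new_row)
    return encoded


def zero_variance_columns(rows):
    """Return the names of columns that hold a single value in every row."""
    return [c for c in rows[0] if len({r[c] for r in rows}) == 1]


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (0 for symmetric data)."""
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / len(values)
    m3 = sum((v - mu) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


def standardise(rows, column):
    """Rescale a numeric column to mean 0 and population standard deviation 1."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError(f"{column} has zero variance and cannot be scaled")
    return [{**r, column: (r[column] - mu) / sigma} for r in rows]


# Encode both categorical columns, drop constant predictors, then scale spend.
encoded = dummy_encode(dummy_encode(rows, "region"), "channel")
constant = zero_variance_columns(encoded)
cleaned = [{k: v for k, v in r.items() if k not in constant} for r in encoded]
spend_skew = skewness([r["spend"] for r in cleaned])
prepared = standardise(cleaned, "spend")
```

Here the constant `flag` column is flagged and dropped before scaling. A strongly skewed `spend_skew` would be a cue to try a transformation (for example a log) before standardising; the threshold for "strongly" is a modelling choice this sketch leaves open.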
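Residual analysis, one of the post-processing validations described above, can likewise be sketched in plain Python for a one-predictor linear regression. This is an illustrative example under our own assumptions, not the workshop's code: it fits ordinary least squares, then computes the Durbin-Watson statistic (a check on independence of errors) and a rough upper-half versus lower-half residual variance ratio (a check on constant variance). The data and the function names are hypothetical, and choosing the number of clusters for k-means is not covered here.

```python
from statistics import mean

# Illustrative helpers (hypothetical names), not a Kognitio API.


def fit_simple_ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(x, y, a, b):
    """Observed minus fitted values."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def durbin_watson(e):
    """Durbin-Watson statistic for residuals in observation order.

    Always between 0 and 4; values near 2 suggest no first-order
    autocorrelation, supporting the independence assumption.
    """
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den


def residual_variance_ratio(x, e):
    """Mean squared residual in the upper half of x over the lower half.

    A rough, Goldfeld-Quandt-style look at constant variance that omits
    the formal F-test; values far from 1 hint that the error variance
    changes with x.
    """
    ordered = [ei for _, ei in sorted(zip(x, e))]
    half = len(ordered) // 2

    def mean_square(part):
        return sum(v * v for v in part) / len(part)

    return mean_square(ordered[-half:]) / mean_square(ordered[:half])


# Hypothetical observations, listed in collection order.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 6.4, 7.7, 10.2, 11.9, 14.3, 15.8]

a, b = fit_simple_ols(x, y)
e = residuals(x, y, a, b)
dw = durbin_watson(e)
ratio = residual_variance_ratio(x, e)
```

Because OLS with an intercept forces the residuals to sum to zero, checking their mean says little; statistics such as Durbin-Watson and the variance ratio look instead at how the residuals are ordered and spread, which is where violated assumptions tend to show up.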

Remember, this is an open session and questions are welcome. You can find much more information on the Kognitio User Forum for External Scripting, where self-study modules are available to help developers, programmers and data scientists learn the technology.

Please click here to register for this virtual event.