Forum

Discussions specific to version 8.1

Hadoop and CPU

by schneider » Fri May 01, 2015 12:30 pm

Hi,

There's probably no simple answer, but in exploring the possibilities of utilising our existing hardware infrastructure to support a Hadoop cluster, I'm interested in how well KAP can share system resources.

I believe disks could be shared by simple allocation during system installation (Kognitio could be apportioned a tiny fraction if HDFS is used to house the data permanently).

I also understand that limits can be placed in the Kognitio configuration to cap the RAM made available to KAP; presumably something similar can be done on the Hadoop side to grow and shrink RAM availability based on requirements (e.g. how intensive your MapReduce jobs are). I understand KAP reserves ~90% of the RAM on the DB nodes by default, leaving ~10% free.

What about CPU? Will KAP recover gracefully when a poorly written or intensive Hadoop job is executed and hogs the available CPU on one or many nodes? Will it behave like a standard SQL query that hogs all the CPU on one node, and recover gracefully once the Hadoop job is killed or finishes its processing?

I'm concerned about the support team's ability to monitor performance issues in KAP if analysts were allowed to run intensive Hadoop jobs on the same cluster.

-Anthony

Re: Hadoop and CPU

by markc » Fri May 01, 2015 1:08 pm

If Hadoop (or anything else) hogs all the cores on a node, that will impact the Kognitio software, with Kognitio processes on the "hogged" node taking much longer to handle their portion of any query, and hence increasing the query time accordingly.

You should be able to see this behaviour, and address it, with Hadoop tools, or by using "wxtop -N", which shows all busy processes across the Kognitio system whether they are Kognitio processes or something else (e.g. Hadoop). However, tracking the problem back to the offending job is likely to be easier with Hadoop tools.
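As a rough illustration of that kind of check (this is a generic Linux sketch, not Kognitio-specific; the process-name patterns are assumptions you would adjust for your own nodes), something like the following could be run on a node to see which processes are hogging CPU and whether they look like Kognitio or Hadoop processes:

Code:
#!/usr/bin/env python3
# Sketch: list the busiest processes on this node and guess whether each one
# belongs to Kognitio or Hadoop. The name patterns are assumptions only --
# adjust them to whatever actually runs on your nodes.
import subprocess

CPU_THRESHOLD = 50.0                        # flag anything using > 50% of a core
KOGNITIO_HINTS = ("wx",)                    # assumed Kognitio process-name prefix
HADOOP_HINTS = ("java", "hadoop", "mapr")   # typical Hadoop/MapR process names

out = subprocess.run(["ps", "axo", "pid,pcpu,comm,args", "--sort=-pcpu"],
                     capture_output=True, text=True, check=True).stdout

for line in out.splitlines()[1:]:           # skip the ps header line
    pid, pcpu, comm, args = line.split(None, 3)
    if float(pcpu) < CPU_THRESHOLD:
        break                               # output is sorted, so stop here
    if any(h in comm for h in KOGNITIO_HINTS):
        owner = "kognitio?"
    elif any(h in args.lower() for h in HADOOP_HINTS):
        owner = "hadoop?"
    else:
        owner = "other"
    print(f"{pid:>7} {pcpu:>6} {owner:<10} {args[:80]}")

Run per node (or wrapped in ssh across the cluster) it gives a quick view similar in spirit to wxtop -N, but as noted above, Hadoop's own tools are the better place to trace a hog back to the offending job.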

To reduce/avoid the impact, you could isolate the nodes running DB processes from the ones doing heavy Hadoop processing. For example, with the MapR Hadoop distribution we've seen people deploy the Kognitio software on a subset of the nodes, shut down the MapReduce task tracker on those nodes, but leave the HDFS components running. This lets Kognitio and Hadoop co-exist on a platform whilst avoiding the worst effects of Hadoop processing on the Kognitio software.

With MapR you can also, for example, control the amount of memory used by mfs via the mfs.cache.memory setting in the /opt/mapr/conf/mfs.conf file, as described at http://doc.mapr.com/display/MapR/mfs.conf (otherwise it defaults to 20% of the system's memory).
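For reference, that mfs change is a one-line edit to /opt/mapr/conf/mfs.conf on the affected nodes. The value below is only a placeholder, so confirm the exact syntax and units against the MapR page linked above before using it:

Code:
# /opt/mapr/conf/mfs.conf -- illustrative only; placeholder value,
# check the MapR documentation for the correct units and range
mfs.cache.memory=4096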

Re: Hadoop and CPU

by schneider » Tue May 05, 2015 11:02 am

Thanks Mark.

Am I correct in assuming there is no way to adjust the RAM available to Kognitio without a restart of the database and the subsequent loss of all data residing in RAM (i.e. a re-image would be required)?

Re: Hadoop and CPU

by markc » Tue May 05, 2015 11:06 am

You are correct that currently you cannot adjust the RAM Kognitio is using without restarting the DB and rebuilding RAM images from scratch.
