
Discussions specific to version 8.1
Contributor
Posts: 48
Joined: Tue May 28, 2013 1:44 pm

Constraining how many connector instances I have

by anonymous2 » Wed Jul 10, 2013 11:45 am

I am trying to use the new external table functionality in KAP, which is really useful.

However, I'd like to constrain which nodes in my system can access the external data source. I have a system with many nodes, and I don't want them all to need IP addresses on my network just so they can reach other systems. The volume of data I'm looking to move is not massive, so using a subset of the nodes should meet my needs.

I understand that with, e.g., the ODBC connector I can use the splitexp attribute to have either a single loader or one per RAM store, but how can I get finer-grained control than that, and also restrict the loaders to a specific set of KAP nodes?
Contributor
Posts: 386
Joined: Thu May 23, 2013 4:48 pm

Re: Constraining how many connector instances I have

by markc » Wed Jul 10, 2013 11:51 am

First, identify the nodes that have external access. By default, the software assumes all nodes do, so if only a subset of nodes can connect to external data sources, use wxviconf to edit the global config file and add the following to the [capabilities] section:

db_extern_nodes=<comma-separated list of node names>

Next, there are a number of generic attributes that can be used with all connectors to restrict which nodes are used during a load, and how many nodes/threads are used in total. Note that these must still be specified in addition to the db_extern_nodes setting above: that setting alone will not restrict which RAM stores try to connect to external data sources.

max_connectors: maximum number of RAM stores that should participate in the load.
max_connectors_per_node: maximum number of RAM stores per node that should participate in the load.
connectors_on_nodes/connectors_not_on_nodes: a space-separated list of hostnames to run (or not run) the loader threads on.
connectors_on_mpids/connectors_not_on_mpids: as above, but a space-separated list of mpids instead of hostnames.
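To make the interaction of these attributes more concrete, here is a toy Python model of how they might narrow down the set of RAM stores that run loader threads. It is only a sketch under stated assumptions, not Kognitio's implementation: the RamStore type, the mpid values, the order in which the filters and caps are applied, and the rule that the first eligible RAM stores in list order win are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RamStore:
    node: str  # hostname of the KAP node hosting this RAM store
    mpid: int  # hypothetical mpid identifying the RAM store


def _words(value: Optional[str]) -> Optional[set]:
    # The attributes take space-separated lists; None means "not specified"
    return None if value is None else set(value.split())


def select_connectors(stores, max_connectors=None, max_connectors_per_node=None,
                      connectors_on_nodes=None, connectors_not_on_nodes=None,
                      connectors_on_mpids=None, connectors_not_on_mpids=None):
    """Toy model (assumption): which RAM stores would run loader threads."""
    on_nodes = _words(connectors_on_nodes)
    not_on_nodes = _words(connectors_not_on_nodes)
    on_mpids = _words(connectors_on_mpids)
    not_on_mpids = _words(connectors_not_on_mpids)
    selected, per_node = [], {}
    for s in stores:
        mpid = str(s.mpid)  # compare as text, like the attribute values
        # Filters decide WHERE loaders may run
        if on_nodes is not None and s.node not in on_nodes:
            continue
        if not_on_nodes is not None and s.node in not_on_nodes:
            continue
        if on_mpids is not None and mpid not in on_mpids:
            continue
        if not_on_mpids is not None and mpid in not_on_mpids:
            continue
        # Caps decide HOW MANY run; assumed tie-break is list order
        if (max_connectors_per_node is not None
                and per_node.get(s.node, 0) >= max_connectors_per_node):
            continue
        if max_connectors is not None and len(selected) >= max_connectors:
            break
        per_node[s.node] = per_node.get(s.node, 0) + 1
        selected.append(s)
    return selected


# Hypothetical system: 4 nodes with 4 RAM stores each
stores = [RamStore(f"hp-rack2-enc1-{n}", n * 10 + i)
          for n in range(1, 5) for i in range(4)]

chosen = select_connectors(stores,
                           connectors_on_nodes="hp-rack2-enc1-1 hp-rack2-enc1-2",
                           max_connectors_per_node=2)
for s in chosen:
    print(s.node, s.mpid)
```

In this model the node/mpid attributes decide where loaders may run and the two caps decide how many run, so the caps are also what bound the number of simultaneous connections made to the external source.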

For example, if you want to fetch rows from a file in HDFS with the HDFSCON connector, but you only want to connect to HDFS from two specified nodes, you might do this:

external table (c1 int, c2 date)
from hdfscon
target 'file /user/wxadmin/int_and_date.csv, connectors_on_nodes "hp-rack2-enc1-1 hp-rack2-enc1-2"'
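Because db_extern_nodes is comma-separated while connectors_on_nodes is space-separated, it is easy to let the two lists drift apart. The Python sketch below is a purely client-side helper, not part of KAP: it reads db_extern_nodes from a config snippet and builds a target string like the one above, refusing any host that is not listed. It assumes the global config file parses as INI text and that the node names in db_extern_nodes are the same hostnames used by connectors_on_nodes; both function names are hypothetical.

```python
import configparser


def extern_nodes_from_config(text: str) -> list:
    """Read db_extern_nodes from a [capabilities] section (assumes INI-style text)."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    raw = cp.get("capabilities", "db_extern_nodes", fallback="")
    # db_extern_nodes is a comma-separated list
    return [h.strip() for h in raw.split(",") if h.strip()]


def hdfscon_target(path: str, hosts: list, extern_nodes: list) -> str:
    """Build an HDFSCON target string that pins loaders to the given hosts."""
    missing = [h for h in hosts if h not in extern_nodes]
    if missing:
        # Catch hosts that were never declared as having external access
        raise ValueError(f"hosts not listed in db_extern_nodes: {missing}")
    # connectors_on_nodes takes a space-separated list inside double quotes
    host_list = " ".join(hosts)
    return f'file {path}, connectors_on_nodes "{host_list}"'


config = """
[capabilities]
db_extern_nodes=hp-rack2-enc1-1,hp-rack2-enc1-2
"""
extern = extern_nodes_from_config(config)
target = hdfscon_target("/user/wxadmin/int_and_date.csv",
                        ["hp-rack2-enc1-1", "hp-rack2-enc1-2"], extern)
print(target)
```

The printed string matches the target attribute in the example above; asking for a host that is missing from db_extern_nodes raises a ValueError instead of producing a load that would fail to connect.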

We have also used attributes such as max_connectors to improve operation in cloud environments that have problems when too many TCP connections are made to access HDFS data.

Re: Constraining how many connector instances I have

by markc » Wed Jun 10, 2015 9:13 am

Note also that using the restrictions above will reduce performance, so they are intended only for cases where performance is not expected to be high anyway, such as connecting to legacy databases via the ODBC connector or connecting over the internet or a WAN. For high-performance connections (e.g. accessing data in a Hadoop cluster), it is important to ensure full connectivity, in which case the restrictions above should not be required.

Trying to connect to a Hadoop cluster using just a couple of nodes with external connectivity would be expected to give sub-optimal performance; this should be rectified by ensuring that all KAP nodes have connectivity to the Hadoop cluster.
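The trade-off can be illustrated with a deliberately simple model: assume each loader pulls data at a fixed rate, and the aggregate rate is capped by what the external source or link can deliver. The Python below is a back-of-envelope sketch under that assumption only; the function name and all the figures are hypothetical, not measurements of KAP.

```python
def estimated_rate_mb_s(n_connectors: int, per_connector_mb_s: float,
                        source_limit_mb_s: float) -> float:
    """Idealised aggregate load rate (assumption, not measured KAP behaviour).

    The rate grows linearly with the number of loader connectors until the
    external source or network link saturates.
    """
    return min(n_connectors * per_connector_mb_s, source_limit_mb_s)


# Hypothetical figures, chosen only to show the shape of the trade-off
PER_CONNECTOR = 50.0    # MB/s one loader thread can pull
HADOOP_LIMIT = 2000.0   # MB/s a Hadoop cluster can serve in aggregate
WAN_LIMIT = 20.0        # MB/s available over a slow WAN link

few = estimated_rate_mb_s(2, PER_CONNECTOR, HADOOP_LIMIT)    # two external nodes
many = estimated_rate_mb_s(64, PER_CONNECTOR, HADOOP_LIMIT)  # full connectivity
wan_few = estimated_rate_mb_s(2, PER_CONNECTOR, WAN_LIMIT)
wan_many = estimated_rate_mb_s(64, PER_CONNECTOR, WAN_LIMIT)
print(f"Hadoop: {few:.0f} vs {many:.0f} MB/s; WAN: {wan_few:.0f} vs {wan_many:.0f} MB/s")
```

Under these made-up numbers, restricting loaders to two nodes costs a factor of 20 against a fast Hadoop cluster but nothing over the slow link, which is the reasoning behind reserving the restrictions for slow sources.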