Automatic crash recovery

If the Kognitio software crashes, the system can be set up to restart automatically, optionally after generating process dumps for later investigation. There are two ways of doing this. The first is simply to enable reliability features in the [wxsmd] section of the config file:

[wxsmd]
reliability_features=yes

The second method is to add the following to the config file under [wxsmd events]:

[wxsmd events]
on_server_crash=exec /tmp/serverdown

and then create an executable file /tmp/serverdown on one of the nodes, for example:

#!/bin/sh

# use wxdgtool to dump process information for later investigation;
# a single date call keeps the date and time parts of the filename consistent
/opt/kognitio/wx2/current/bin/wxdgtool -i -o "/tmp/wx2_crashinfo_$(date +%F_%T).txt"

# now restart the server
wxserver start

and finally synchronise the file /tmp/serverdown across all the blades with:

wxsync -S /tmp/serverdown
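
Because the hook only runs when the server has already crashed, it may be worth checking the script before synchronising it. The following is a minimal sketch, not part of the Kognitio tooling: check_hook is a hypothetical helper name, and it assumes a POSIX sh is available on the node. It confirms the file exists, marks it executable, and asks the shell to parse it without running it.

```shell
#!/bin/sh
# check_hook is a hypothetical helper, not a Kognitio command.
# It validates a hook script without executing it.
check_hook() {
    hook="$1"

    # The file must exist before it can be made executable.
    if [ ! -f "$hook" ]; then
        echo "missing: $hook" >&2
        return 1
    fi

    # The event handler runs the file directly, so it needs the execute bit.
    chmod +x "$hook" || return 1

    # sh -n parses the script and reports syntax errors without running it.
    if ! sh -n "$hook"; then
        echo "syntax error: $hook" >&2
        return 1
    fi

    echo "ok: $hook"
}
```

With the function defined in the current shell, running check_hook /tmp/serverdown before the wxsync step above would catch a missing execute bit or a syntax error on the node where the file was created.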

Whichever method is used, in the event of a crash the master smd daemon should write out a small amount of information identifying which nodes crashed, but nothing large such as core files. The server should then restart automatically.