Software RAID

Caution

This section applies to Kognitio Standalone only.

The Kognitio platform’s storage networks may be able to provide sufficient fault tolerance to guarantee data integrity in the event of a component failure. If the storage networks do not offer such guarantees, then Kognitio can be configured to provide the required fault tolerance using software RAID techniques.

In practice, this will not normally be necessary on a Kodoop system, since the underlying HDFS will usually have its own fault tolerance built in. This guide assumes a Standalone system is being used.

RAID is short for Redundant Array of Independent (or Inexpensive) Disks, and is a set of techniques that employ two or more drives in combination for fault tolerance and performance. Kognitio uses RAID 5, which distributes the parity information across all the disks in a cluster.
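To illustrate the parity principle behind RAID 5, here is a minimal Python sketch. It is a generic illustration only, not Kognitio's implementation: the function name xor_blocks and the sample data are hypothetical, and a real RAID 5 layout rotates the parity block across the disks rather than keeping it on a single disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks plus one parity block, as in a cluster of 4 disks.
# (Hypothetical sample data, for illustration only.)
data = [b"KOGN", b"ITIO", b"RAID"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

Because XOR is its own inverse, any single missing block can be recomputed from the remaining blocks in the cluster, which is why one bad disk per cluster can be tolerated but two cannot.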

RAID 5 operates over clusters of disk resources. Typical cluster sizes are 2 or 4, although other sizes can be used. Note that:

  • All clusters in the system must be the same size.

  • The cluster size must be an exact divisor of the total number of disk resources available.

  • To make best use of RAID, it is recommended that clusters span multiple nodes, with no more than one disk resource from any given cluster on the same node. This will protect you against a node failing.

  • When adding more nodes to a system, you must add at least as many nodes as the cluster size if you want to use disk resources on those nodes (so with a cluster size of 4, you must add at least 4 nodes at a time if you want to use the disks on them).

  • A system can tolerate one disk in a cluster being marked as ‘bad’. If two or more disks in the same cluster are marked as bad and cannot be restored, then the system will need to be restored from a backup. Here, ‘bad’ means unusable: this could be due to a disk failing, a node losing its network connection to the other nodes, or some other reason.

  • If one disk in a cluster fails (only the disk, not the node), then the system will not suffer data loss and will continue running, recreating data on the fly from the rest of the cluster. This will degrade performance, although the system will still be usable.

  • Although a system with RAID will tolerate a missing disk without having to restart, this is not the case for a missing node. If a node fails and cannot be recovered, then the system can be restarted with the node missing, but only if that node does not host more than one disk from the same cluster.

  • Using RAID will reduce usable disk capacity. A cluster size of 2 will reduce usable disk capacity by 50%, while a cluster size of 4 will reduce it by 25%.

  • RAID cluster size can only be changed by recommissioning a system.
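The sizing rules above can be checked before commissioning with a short planning sketch in Python. This is not a Kognitio tool: the function names are hypothetical, and the capacity formula (one disk's worth of parity per cluster) is an assumption inferred from the 50% and 25% figures quoted above.

```python
def usable_fraction(cluster_size):
    """Fraction of raw disk capacity left after parity.

    Assumes one disk's worth of parity per cluster, which matches the
    figures in this guide (size 2 gives 50%, size 4 gives 75%).
    """
    if cluster_size < 2:
        raise ValueError("a RAID cluster needs at least 2 disk resources")
    return (cluster_size - 1) / cluster_size


def cluster_count(total_disks, cluster_size):
    """Return the number of clusters, or raise if the size is not an exact divisor."""
    if total_disks % cluster_size != 0:
        raise ValueError(
            f"cluster size {cluster_size} does not divide {total_disks} disk resources"
        )
    return total_disks // cluster_size


def clusters_sharing_a_node(clusters):
    """Return the indexes of clusters that place more than one disk on the same node.

    `clusters` is a list of clusters, each given as a list of node names,
    one entry per disk resource in that cluster.
    """
    return [i for i, nodes in enumerate(clusters) if len(set(nodes)) < len(nodes)]


# Example: 8 disk resources in clusters of 4 (hypothetical node names).
print(cluster_count(8, 4))
print(usable_fraction(4))
print(clusters_sharing_a_node([["n1", "n2", "n3", "n4"], ["n5", "n6", "n7", "n7"]]))
```

A cluster reported by clusters_sharing_a_node would not survive the loss of the shared node, since that node holds two of the cluster's disks.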

Enabling Kognitio RAID

Warning

Enabling Software RAID will erase all the existing contents of the database!

To enable RAID 5 on Kognitio, edit the global configuration file and then restart the database. The following steps assume a cluster size of 4 is required:

  1. Log on as the wxroot Kognitio administrative user

  2. Run wxviconf, which will open a copy of the global configuration file. Any changes made to this file will automatically be replicated to all the nodes when wxviconf exits

  3. Add the setting raid_cluster_size=4 under the [boot options] section in the global configuration file. It is also recommended to add the parameter virtual_diskstores=1 under the [boot options] section - see <link>

  4. Save and exit the editor in the normal way, which should result in a message confirming that the edited nodes have been restarted

  5. Run wxadmin and choose the server/newsys option

At the end of this process, Kognitio will be available again with RAID 5 enabled. The available disk capacity will have been reduced by approximately 25%.
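For reference, after step 3 the relevant part of the global configuration file might look like the fragment below. Only the section name and the two settings come from this guide; the exact layout of your file may differ, and any other entries already present in the section should be left unchanged.

```ini
[boot options]
raid_cluster_size=4
virtual_diskstores=1
```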