Adding or removing disks

Caution

This section refers to Kognitio Standalone only

During normal operation, Kognitio will ensure that data is distributed evenly over all existing disk resources. If the user wishes to add or remove disk resources from a running system, then all existing data must be redistributed evenly between the disk resources of the new configuration; this can be accomplished without the need to unload the data and reload it. The operation of adding or removing disk resources and automatically redistributing data as required is called reconfiguration.

The next sections describe the steps to be followed when adding or removing disks.

Adding disks to a system

  1. If one or more nodes are being added to the system, first follow the steps in Adding and removing nodes. If disks are being added to existing nodes, ensure that the new disks are identical to the existing disks in terms of hardware (size, type, specification, etc.) and configuration (e.g. partitioning).

  2. Ensure all user images are removed from the system by running the SQL command create system image;.

  3. From the Linux command line on a node, run wxadmin, then choose the server option, then the reconfigure option, then the quickadd option (the fulladd option is deprecated and should not be used). The server will prompt for confirmation; press Enter to confirm the reconfigure. The sequence is sketched below.
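
A minimal sketch of this sequence (steps 2 and 3), mixing the SQL command and the wxadmin menu choices; the exact wxadmin prompts may differ:

-- SQL, run before the reconfigure: remove all user images
create system image;

# Linux, on a node: run the interactive admin tool and choose
#   server -> reconfigure -> quickadd, then press Enter to confirm
wxadmin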

During the reconfigure, the system will be restarted several times, and users will not be allowed onto the system during this time. When the reconfigure is complete, any required RAM images will need to be recreated, and a background deskew operation must be run to populate the new disks with data (see Deskew below).

Removing disks and/or nodes from a system

Before removing disks or nodes, note that:

  • If Kognitio software RAID is enabled, all RAID clusters must be either left intact, or removed entirely. The Kognitio software will check that this is the case.

  • There must be enough free disk space on the disk resources that are NOT being removed to store all non-deleted data currently held on the removed disk resources.

  • All compressed data maps will be deleted along with their statistics. New statistics will need to be gathered, and the data maps rebuilt once the reconfiguration is complete.

  • The steps required to remove disks will take a time comparable to running a reclaim on the same amount of data.

  • Removing disks will remove some historic information, so it is not possible to run an incremental backup after this operation until another full backup has been taken.

Follow the steps below to remove disk or node resources from a Kognitio system; a consolidated sketch of the command sequence appears after the list:

  1. Ensure all user images are removed from the system by running the SQL command create system image;.

  2. Identify the MPIDs of the disk stores controlling the disks to be removed. This can be done with wxprobe -l in Linux.

  3. Set the login mode with the SQL command set parameter adm_login_mode to 5. This ensures that only localhost sessions connecting as SYS are allowed.

  4. Ensure there are no other connections to the system and submit the SQL command RECONFIGURE DOWN mpid-list;, where mpid-list is a space-separated list of the MPIDs that were identified in step 2. The system will then run the reclaim algorithm to decide which data needs to be kept. That data will then be sent, round-robin, to the disk resources that will remain after the reconfiguration. The operation is aborted immediately if there is not enough disk space available for the redistribution. Wait for the reconfigure down to finish.

  5. Run the wxserver halt Linux command.

  6. Ensure that the removed disk resources are not going to be used by WX2, typically by stopping the SMD on each node being removed with wxsvc stop. However, if only a subset of the disks on a node is being removed, or if the node is going to remain in the system as a RAM-only node, edit the local config file on that node and provide a list of the remaining disk resources under [system] with partitions=<comma separated list of remaining disk resources>; an example configuration is sketched after this list.

  7. Run wxserver start to bring the downsized system back up. This will automatically set adm_login_mode to 0, removing the restriction applied in step 3.

  8. The nodes no longer being used can now be removed or have their system_id changed and their SMDs restarted.

  9. If nodes have been removed, ensure that the list of Kognitio node addresses used by ODBC to handle the initial Kognitio connection request is still correct. For example, any DSNs in /opt/kognitio/wx2/etc/odbc.ini should be updated to no longer connect to the removed nodes.
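
A minimal sketch of the removal sequence above, assuming the disk stores being removed have MPIDs 3 and 7 (placeholder values; substitute the MPIDs reported by wxprobe -l):

-- SQL: remove all user images
create system image;

# Linux: identify the MPIDs of the disk stores controlling the disks to remove
wxprobe -l

-- SQL, as SYS from localhost: restrict logins, then reconfigure down
set parameter adm_login_mode to 5;
reconfigure down 3 7;   -- placeholder MPIDs, space separated; wait for completion

# Linux: halt the server, stop the SMD on each node being removed entirely,
# then restart the downsized system (adm_login_mode reverts to 0 automatically)
wxserver halt
wxsvc stop
wxserver start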
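
Where step 6 applies (only some of a node's disks are being removed, or the node remains as a RAM-only node), the node's local config file lists the disk resources that are kept. A hypothetical example, assuming the node retains the partitions /dev/sdb1 and /dev/sdc1 (illustrative device names only):

[system]
partitions=/dev/sdb1,/dev/sdc1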

Deskew

Kognitio provides a background deskewer which should be activated after adding disks to a system. The deskewer will move existing data onto the new disks, until data is evenly distributed over all disks. This operation completes in the background, allowing other queries and sessions to run at the same time. There are two parameters which control the deskewer:

  • redist_count: Tables with fewer than this many rows will not be considered for background deskew. The default value is 15,000.

  • redist_threshold: Tables with less skew than this percentage will not be considered for background deskew. Skew is defined as the percentage fewer rows held on the disk resource with the fewest rows, compared to the disk resource with the most rows; a formula is given after this list. The default value is 101, which means that by default the background deskewer is inactive. For example, if table T has 100,000 rows on the disk resource in the system containing the most rows of T, and 90,000 rows on the disk resource containing the fewest rows of T, then T will be processed by the background deskewer if redist_threshold is <= 10.
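
The definition of skew can be written as a simple expression (most_rows and fewest_rows are descriptive names for the per-disk row counts of a table, not system parameters):

skew% = 100 * (most_rows - fewest_rows) / most_rows

In the example above, skew% = 100 * (100,000 - 90,000) / 100,000 = 10, so table T is picked up by the background deskewer whenever redist_threshold is 10 or lower.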

To trigger a deskew after adding disks, set the redist_threshold parameter to a lower value, e.g. 20, then set it back to 101 when the deskew has completed:

set parameter redist_threshold to 20;
-- wait for deskew to complete --
set parameter redist_threshold to 101;

Deskewing works on the total number of rows in a table, including deleted rows, so an initial deskew may not leave a table perfectly balanced; subsequent deskew operations will resolve this.

Consider a scenario with two disk resources in a system, labelled A and B. A has 100 million rows for table T and B has 0 rows for table T, but the first 20 million rows of T on A are all deleted. If a deskew operation is run on T, the first 50 million rows on A will be considered for migration to B, but only 30 million will actually be migrated because the 20 million deleted rows are ignored. After the initial deskew, A will therefore hold 50 million rows and B 30 million rows; a subsequent deskew operation will result in A and B both holding 40 million rows.

Note also that at the end of a deskew, the rows migrated from the old disks to the new disks are no longer scanned when doing table scans, but they still exist on the old disks. A reclaim / repack operation must be used to remove the old rows from the old disks and balance out disk usage across the system. Unlike reconfigure, deskew does not invalidate any compressed data maps.
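
Restating the worked example as live row counts for table T (the figures are those used above):

Before any deskew:       A holds 80 million live rows (plus 20 million deleted), B holds 0
After the first deskew:  A holds 50 million live rows, B holds 30 million
After the second deskew: A and B each hold 40 million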