Data in Kognitio can be persisted either directly in Kognitio, using Kognitio’s internal file system, or accessed in place on another system using Kognitio’s external table feature. Kognitio’s internal file system allows data to be loaded and stored within Kognitio itself, effectively turning Kognitio into a self-contained data warehouse.
Users of Kognitio see data, whether held on disk or in RAM, as conventional database schemas composed of collections of tables. Queries that access disk-based tables automatically pull only the required data into memory, using predicate push-down to filter rows and select columns before sending them to RAM. Ideally, frequently accessed tables are pinned into RAM as images and kept memory-resident for efficient repeated use (a key difference from a cache); selected images can be dropped when no longer required.
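The disk-to-RAM behaviour described above can be illustrated generically. The sketch below is a conceptual Python illustration of predicate push-down — filtering rows and projecting columns while scanning disk-resident data, so only the needed subset is materialized in memory — and is not Kognitio’s actual implementation; the table data and filters are hypothetical.

```python
# Conceptual sketch of predicate push-down (not Kognitio internals):
# apply the row filter and column projection during the scan, so only
# the matching subset ever reaches memory.

from typing import Callable, Iterable

def scan_with_pushdown(rows: Iterable[dict],
                       predicate: Callable[[dict], bool],
                       columns: list[str]) -> list[dict]:
    """Return only matching rows, restricted to the requested columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]

# Hypothetical disk-resident rows
disk_rows = [
    {"id": 1, "region": "EU", "sales": 100},
    {"id": 2, "region": "US", "sales": 250},
    {"id": 3, "region": "EU", "sales": 300},
]

# Only EU rows, and only the id and sales columns, are pulled into memory
in_memory = scan_with_pushdown(disk_rows, lambda r: r["region"] == "EU",
                               ["id", "sales"])
print(in_memory)  # [{'id': 1, 'sales': 100}, {'id': 3, 'sales': 300}]
```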
The technology used to store the internal file system varies depending on the platform on which Kognitio is installed.
Data persistence in Kognitio for standalone compute clusters
Kognitio, when deployed on a standalone compute cluster, has its own distributed linear file system. The file system is built using disk storage attached to each server in the Kognitio cluster. Its capacity is the sum of all the disk storage attached to each server. This storage can be formed from any locally attached disk on each individual server or from SAN or network attached storage. Kognitio software manages the distributed storage and provides resilience via RAID methods across groups of servers to protect against disk and/or server failures.
The file system is fully scalable and parallelized. Each discrete disk volume is accessed in parallel, so overall disk I/O bandwidth is directly proportional to the number of physical drives accessed. For example, if an individual disk provides a 100MB per second read rate, Kognitio achieves n × 100MB per second of aggregate bandwidth, where n is the number of disk volumes.
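The linear scaling claim above is simple arithmetic, sketched here with example figures (the per-disk rate and volume count are illustrative, not measured Kognitio numbers):

```python
# Illustrative sketch of the aggregate-bandwidth relationship described
# above: bandwidth scales linearly with the number of volumes read in
# parallel. Example figures only.

def aggregate_bandwidth_mb_s(per_disk_mb_s: float, num_volumes: int) -> float:
    """Aggregate read bandwidth when all volumes are scanned in parallel."""
    return per_disk_mb_s * num_volumes

# e.g. 24 disk volumes at 100MB/s each
print(aggregate_bandwidth_mb_s(100, 24))  # 2400.0
```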
When data is loaded into the Kognitio platform, the user can choose to load only to memory, to combined disk and memory, or just to disk.
Although Kognitio for standalone compute clusters has a capacity-based licensing model, the license only pertains to the available RAM. Specifically, there is no charge levied for data that is held on the optional Kognitio internal disk subsystem. Put simply, if a user has a system with 10TB of RAM but chooses to store 100TB of data locally on Kognitio’s internal disk, the user only requires a license for the 10TB of RAM.
Platform Requirements for Kognitio on standalone compute clusters
As has already been mentioned, Kognitio combines the full power of clusters of unmodified industry standard, low-cost server hardware into a single, high-performance analytical platform. No proprietary or special hardware is required.
As long as the servers are x64- or x86-based and suitably networked, Kognitio can run on them. For a high-performance, high-capacity system, Kognitio recommends the use of servers with a high number of physical cores (8–32) and large amounts of RAM (128–512GB). Kognitio also recommends that each server has a minimum of dual 10 Gigabit Ethernet network interfaces.
The requirements for the Kognitio platform operating system (OS) are very modest – no clustering or other specialist software is needed, and no specific OS configuration or tuning is required. A standard base-level installation of 64-bit Linux (typically Red Hat or SUSE Enterprise Linux, but other distributions work just fine) on each server is all that Kognitio needs.
Data persistence in Kognitio on Hadoop
Kognitio on Hadoop uses the HDFS file system, a default component of Hadoop, and this can be used in exactly the same way as the Kognitio standalone file system. However, in most Hadoop installations the data is loaded and stored outside Kognitio, in one of the many HDFS file formats, and is accessed by Kognitio through its external table feature. Kognitio’s internal file system tends to be used only for storing metadata and log files and, when required, for persisting intermediate tables created by complex analytical scripts.
Data persistence in Kognitio for MapR
Kognitio on Hadoop supports Cloudera, Hortonworks and Native Apache Hadoop distributions. Kognitio has a separate version for MapR called Kognitio on MapR. This is because MapR has rewritten a number of the native Hadoop components to make them better suited to enterprise applications. Kognitio on MapR works in exactly the same way as Kognitio on Hadoop but uses MapR-FS rather than HDFS.
To fit with the predominantly open-source Hadoop market, Kognitio on Hadoop is free to use at any scale and with no functional restrictions. A range of optional paid support contracts is available; these also include access to patched versions of older releases that are not available to unsupported users.
Kognitio Standalone and Kognitio on MapR are licensed products, but for systems with less than 512GB of RAM no license is required and the software is effectively free to use. Above 512GB, a license key must be installed. The license price is based purely on the amount of memory (RAM) available to the Kognitio software. There are no restrictions on users, cores or data stored on disk. Support is available at a percentage of the license price.
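The licensing rule above reduces to a check on RAM alone. The sketch below is purely illustrative — the 512GB threshold is the figure stated above, disk capacity is deliberately ignored, and actual pricing is set by Kognitio:

```python
# Illustrative sketch of the capacity-based licensing rule described
# above: only RAM counts towards the license, and systems at or below
# 512GB need no key. Not an official pricing tool.

FREE_TIER_GB = 512

def license_required(ram_gb: int, disk_tb: float = 0.0) -> bool:
    """Disk capacity is deliberately ignored — only RAM is licensed."""
    return ram_gb > FREE_TIER_GB

print(license_required(256, disk_tb=100.0))  # False: within the free tier
print(license_required(10 * 1024))           # True: 10TB of RAM needs a key
```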
All of Kognitio’s deployment options are available on-premises or in the cloud. This gives organisations much greater flexibility in how and where they deploy solutions, and reduces their reliance on particular vendors. This is especially relevant for cloud deployments, where many of the cloud-based big data analytics solutions are cloud only.
This means that the organisation is locked into not only a software vendor but also a cloud supplier. Cloud-based computing can be a very cost-effective way of providing infrastructure for project work or for flattening peaks, but for permanent installations of any scale it quickly becomes very expensive. Kognitio’s flexible deployment allows organisations to select the right combination of on-premises versus cloud and to adapt quickly to changing requirements.