This is the story of Kognitio’s migration from on-premise IT infrastructure to cloud computing.
Kognitio develop an analytical platform that can run on clusters of commodity servers.
As Hadoop gained acceptance, Kognitio ported the software to run on Hadoop. As part of that port, the software could also run on Amazon Elastic MapReduce (EMR), which is essentially a managed version of Hadoop in the Amazon cloud.
Developing and testing the EMR version of our software gave Kognitio its initial experience with Amazon Web Services (AWS).
The benefits over on-premise solutions were obvious:
1. It provided a quick and easy way for prospects to evaluate the software. Prospects didn’t need to get commitment from their IT department to create a small Hadoop cluster for testing, or to deploy new software on an existing production Hadoop cluster.
2. It was possible to go from nothing to a working cluster ready to load data in about 20 minutes.
3. On-demand clusters could be run for about $14 per TB RAM per hour, or around $3 per TB RAM per hour using spot pricing.
4. It was easy to scale up. Running, say, a 5TB RAM cluster on spot instances was immediately achievable; previously we’d have had to reserve a lot of dedicated internal servers for that sort of exercise.
The potential downsides were:
1. Networking was significantly worse than with dedicated on-premise servers. At the time, the best AWS systems offered only about 12% of the network bandwidth per TB RAM that we saw with on-premise servers.
2. The marginal cost of running an on-demand system 24×7 was higher than the equivalent on-premise system. This could be reduced somewhat with a long-term commitment in AWS, but would still have been more expensive than the equivalent on-premise solution. It would also have sacrificed some of the flexibility from moving to cloud.
Having seen the rewards to be reaped from cloud adoption, the next steps were to migrate our own on-premise infrastructure to cloud.
As a provider of an MPP data processing solution, we had many hundreds of commodity servers for use by product development, QA, and presales. We also had a number of servers for internal infrastructure – file servers, build servers, etc.
A number of factors made this infrastructure ideally suited for a move to cloud:
1. Some of the kit was old, so the power, cooling and other hosting costs associated with it were significantly higher than one would expect for newer hardware.
2. A lot of the kit was only used for a small percentage of the hours in a week.
We decided to migrate as much infrastructure to cloud as possible. We chose to use AWS as that was an environment we were familiar with as a result of the work with Amazon EMR mentioned above. We also wanted to take advantage of spot pricing (see later).
Given the factors mentioned in (2) above, we expected to reduce costs by not having systems running 24×7, and by ensuring systems were shut down or terminated when not in use.
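As a sketch of what "shut down when not in use" can look like in practice, the snippet below picks out running instances that lack a keep-running tag, using data shaped like EC2's DescribeInstances response. The tag name and the scheduled-job idea are illustrative assumptions, not Kognitio's actual tooling:

```python
def instances_to_stop(reservations, keep_tag="keep-running"):
    """Return IDs of running instances NOT tagged to stay up.

    `reservations` mirrors the "Reservations" list in an EC2
    DescribeInstances response; the keep-running tag name is illustrative.
    """
    ids = []
    for res in reservations:
        for inst in res.get("Instances", []):
            if inst.get("State", {}).get("Name") != "running":
                continue  # already stopped or terminated: nothing to do
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if keep_tag not in tags:
                ids.append(inst["InstanceId"])
    return ids

# With boto3 installed and credentials configured, a scheduled
# out-of-hours job could drive this with something like:
#   ec2 = boto3.client("ec2")
#   ids = instances_to_stop(ec2.describe_instances()["Reservations"])
#   if ids:
#       ec2.stop_instances(InstanceIds=ids)
```

Running a job like this overnight is one way to stop forgotten development systems quietly accumulating 24×7 charges.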
We also planned to use spot instances for at least development and QA work. Here the price benefits outweighed any inconvenience from occasionally losing instances (something which in practice happened very rarely).
We knew we’d incur higher costs in some areas, such as moving our Linux file systems to AWS given they would be running 24×7. However, such systems were a very small percentage of the total overall cost both on-premise and in cloud.
Our first steps were to migrate some key bits of infrastructure to AWS. This included a Linux file system and servers for building, running git, bugzilla and jenkins for development, and for generating software licences for customers.
A lot of these components, such as git, had been built on our existing VMware infrastructure. These could simply be picked up and deposited in the cloud using a lift-and-shift approach; we then changed the network routing and carried on using them as before.
Other components didn’t transfer so easily, particularly things we had already imported from physical hardware into VMs on-premise, as Amazon’s VM import tools couldn’t always convert them. Our wiki was one example; in that case we had to rebuild the service and import the data into it. In situations like this we also took the opportunity to split up services that had been co-located on one server, so that each used a dedicated (and hence smaller) Amazon VM after the migration.
We then developed some simple scripts for launching multi-node systems for development and QA purposes (Kognitio software is an MPP offering, so it is relatively rare to launch a system comprising a single node). This replaced the RDP infrastructure we had on-premise for deploying to physical servers.
This gave us tremendous flexibility compared to our previous on-premise infrastructure. We could trivially launch large systems with any Linux distribution, different node types, etc. We could also do competitive analysis with other products, which made things like benchmarking much more straightforward. For example, we were able to compare Kognitio on Hadoop against other SQL on Hadoop offerings, as you can read at https://kognitio.com/blog/how-different-sql-on-hadoop-fare-in-99-tpc-ds-test-queries/.
Having these scripts to simplify the process of launching systems was critical in ensuring that developers and other staff were not averse to shutting down systems when they weren’t using them. With more friction on spinning up resources, the temptation would be for people to hold onto nodes for longer “just in case” they wanted to use them later in the day.
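A minimal sketch of the core of such a launch script, building the parameters for a multi-node EC2 RunInstances call. The AMI ID, instance type, and tag values here are illustrative placeholders, not Kognitio's actual configuration:

```python
def build_cluster_request(node_count, instance_type="r4.2xlarge",
                          ami="ami-00000000", use_spot=True):
    """Build EC2 RunInstances parameters for a cluster of node_count
    identical nodes. All concrete values are illustrative."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    params = {
        "ImageId": ami,
        "InstanceType": instance_type,
        "MinCount": node_count,   # all-or-nothing: an MPP system needs every node
        "MaxCount": node_count,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "purpose", "Value": "dev-cluster"}],
        }],
    }
    if use_spot:
        # Spot pricing cut costs dramatically for dev/QA systems.
        params["InstanceMarketOptions"] = {"MarketType": "spot"}
    return params

# With boto3 installed and credentials configured, launching is then just:
#   boto3.client("ec2").run_instances(**build_cluster_request(4))
```

Keeping the launch path down to a single call like this is what removes the friction: a system that takes one command to recreate is a system people are happy to shut down.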
Following that we moved other bits and pieces to AWS as and when it made sense – for example, our QA infrastructure.
Later we were able to restart all our core AWS infrastructure in less than two hours.
As always with this sort of migration, there were some teething troubles along the way, and further lessons learned afterwards:
1. We had some short-term issues during the transition phase, when we were running a hybrid environment with some resources in AWS and some on-premise. There were communication glitches in that period that were only relevant while we were partially migrated. The lesson is to minimise such transition periods, and to avoid multi-phase migrations that create problems only seen during the transition itself.
2. Hitting AWS limits for our accounts. AWS imposes per-account caps (for example, on the number of instances of a given type), so we had to request limit increases as our usage grew.
3. We found a lot of places that had hard-coded references to IP addresses, or hard-coded paths that had to change as part of the migration.
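A simple scan can flag such hard-coded addresses ahead of a migration. A sketch along those lines (the IPv4 pattern will also match things like version strings, so treat hits as candidates for review rather than definite problems):

```python
import re
from pathlib import Path

# Matches anything shaped like an IPv4 address; deliberately loose.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_hardcoded_ips(root):
    """Return (path, line_number, ip) for every IPv4-looking literal
    in text files under root."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), 1):
            for ip in IP_RE.findall(line):
                hits.append((str(path), lineno, ip))
    return hits
```

Running something like this over configuration and script directories before the move turns "we found a lot of places" into a checklist rather than a series of surprises.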
4. Occasional oddities such as the problem described at https://stackoverflow.com/questions/31783160/why-vim-is-changing-first-letter-to-g-after-opening-a-file which we saw when moving to Amazon Linux for our development.
5. Changes in how Amazon handles spot pricing, which you can read about at https://blog.spotinst.com/2017/11/29/everything-spot-instances-reinvent-2017/. The previous bidding process, where the highest bids got instances, went away. Now the spot price changes gradually, and everyone who gets nodes pays that same gradually changing price. You can still set a maximum price, but you can no longer take an instance from an existing spot user by outbidding them below the on-demand price, as was possible before. If all the nodes of a type are in use, you either wait for someone to finish with them, or fall back to on-demand rather than spot. One consequence is that we tend to use more of the older-generation nodes (e.g. R4 nodes), as spot instances of those are usually available without waiting.
There have been additional, unanticipated benefits of cloud, including:
1. We have been able to show customers and prospects how to run Kognitio on AWS, giving them another deployment option. This is particularly useful for development, testing and project requirements, where the time and cost to deploy new infrastructure on-premise or with a traditional hosting provider can be prohibitive.
2. We have found issues running our product on different Linux distributions that we might otherwise have missed, as it is now so much easier to try a distribution of choice. Our nightly QA can use a wide range of Linux distributions, and those can be changed very simply.
In addition, our original focus on a product for Hadoop was hampered by broader issues with Hadoop adoption. Hadoop proved difficult for organisations to deploy, and once deployed, people were wary of changing production systems, having already suffered many Hadoop issues. So a lot of the expected Hadoop market moved straight to cloud to avoid Hadoop’s complexity. Fortunately, we had moved to cloud as a company at the same time, which ensured we soon had a cloud product available.
We still have some dedicated hardware for a variety of purposes. Migration of that as and when required/practical is something for the future:
1. Windows domain infrastructure, which we haven’t tried to migrate.
2. Dedicated hardware for product testing with high bandwidth networking unavailable in AWS. We have customers using on-premise systems with excellent networking, and no way to test that sort of environment in AWS currently.
3. Hardware that lets us back up our AWS storage outside AWS, so we always have a copy outside that cloud environment.
Migrating from on-premise to cloud for the workloads we’ve chosen has, from my viewpoint, been an unexpectedly straightforward journey (although I didn’t have to do the implementation!).
If you would like more information on anything discussed in this article, please contact me via the comments section.
More information on using Kognitio in AWS is here, including a link taking you to the AWS Marketplace page for Kognitio.