Big data experts among top priorities for 2017 tech hiring


Posted By : admin Comments are off
181116 - Image credit: iStockphoto/BernardaSv
Categories :#AnalyticsNews

Individuals with proven skills and expertise in big data analytics will be among the top priorities for IT recruiters in 2017, along with those with knowledge of cyber security, a new survey has found.

Research by Jobsite and recruitment consultancy Robert Walters found nearly half (47 per cent) of hiring managers expect to increase the number of IT workers they employ in the next 12 months, Computer Weekly reports.

More than half of respondents (54 per cent) said individuals with cyber security expertise would be among their top priorities for the year ahead, while 36 per cent said those with skills in business intelligence and big data will be in high demand.

The study noted this reflects a growing awareness among employers about how their organisations can benefit from effective use of data.

Lee Allen, sales director at Jobsite, said: "As businesses look to increase market share and drive cost efficiencies, analysis of external and internal data is becoming more and more prominent."

Sectors that will show particularly high demand for big data expertise include manufacturing, media, automotive and FMCG, where employers will find competition for the top talent is intense.

Therefore, businesses will have to offer a range of incentives if they are to stand out from their competitors as appealing employers for big data pros. For example, nearly seven out of ten respondents (69 per cent) said they would be offering flexible working conditions, while 54 per cent will highlight opportunities for career development.

Ahsan Iqbal, associate director for technology recruitment at Robert Walters, said: "Competitive salaries will be essential to attract the best candidates, but employers shouldn't underestimate the importance of other policies, such as flexible hours, the option to work remotely and the potential for long-term career development."

Stephen Hawking: Big data vital to scientific advances


Image credit: iStockphoto/kentoh

Big data analytics will be integral to some of the biggest scientific advances ever seen in the coming years as recognition grows of the potential of this technology.

This is according to Professor Stephen Hawking, who was speaking at the launch of Cambridge University's new Cantab Capital Institute for the Mathematics of Information (CCIMI) last week (November 10th).

He observed that in today's "dazzlingly complex world", it is essential that we are able to make sense of the vast amount of data in order to identify meaning among the noise. However, it is only now that organisations are recognising just how much data there is in any given domain, and what tools will be needed to make the most of it.

Prof Hawking said: "The power of information … only comes from the sophistication of the insights which that information lends itself to. The purpose of using information, in this context, is to drive new insight."

Another question will be what new mathematical tools are required to open up new fields of insight, which will be where the CCIMI will be focusing its efforts. "This is the heart of the Cantab Capital Institute: to drive forward the development of insight, and so enrich a multitude of fields of relevance to us all," he continued.

Echoing comments on artificial intelligence made earlier this year, Prof Hawking also stated: "It is imperative we get machine learning right – progress here may represent some of the most important scientific advances in human history."

The CCIMI is a collaboration between the Departments of Applied Mathematics and Theoretical Physics and Pure Mathematics and Mathematical Statistics. It will work across disciplines to develop new mathematical solutions and methodologies to help understand, analyse, process and simulate data.

Academics from the university will team up with economists and social scientists to develop advanced risk analysis tools for use in financial markets, as well as collaborate with physicists and engineers to explore software and hardware development security, and work with biomedical scientists concentrating on data science in healthcare and biology.

Cambridge University stated: "The advance of data science and the solutions to big data questions heavily rely on fundamental mathematical techniques and in particular, their intra-disciplinary engagement."

This will be at the forefront of the CCIMI's operations, which has been established with the help of a £5 million donation from Cambridge-based hedge fund management firm Cantab Capital Partners. Initially, there will be five PhD students based within the Institute in addition to faculty, and their work will encompass a range of applications across a variety of industry sectors and academic disciplines.

Big data and IoT to drive cloud market


Image credit: iStockphoto/emyerson

Big data analytics and Internet of Things (IoT) deployments will be among the main drivers of cloud computing traffic in the coming years, which is set to rise nearly four-fold by the end of the decade.

This is according to Cisco's latest Global Cloud Index, which forecast that the total amount of traffic using the cloud is set to grow from 3.9 zettabytes in 2015 to 14.1 zettabytes in 2020. By the end of the forecast period, cloud technology is expected to account for 92 per cent of total data centre traffic.

Cisco attributed this rise to an increase in migration to cloud architecture due to its ability to scale quickly and support more workloads than traditional data centres. This is something that will be particularly important for businesses looking to increase their big data analytics capabilities.

The report noted that analytics and IoT deployments will see the largest growth within the business workloads sector, with these technologies expected to account for 22 per cent of workloads.

Globally, the amount of data stored is expected to quintuple by 2020, from 171 exabytes in 2016 to 915 exabytes. Of this total, information for use in big data applications will make up 27 per cent of overall storage, up from 15 per cent in 2015.

By 2020, the amount of information created (although not necessarily stored) by IoT solutions will reach 600 zettabytes per year. This will be 275 times higher than projected traffic going from data centres to end users/devices and 39 times higher than total projected data centre traffic.   
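A quick back-of-the-envelope check, using only the figures quoted above, shows Cisco's forecasts are mutually consistent (the variable names are ours, purely for illustration):

```python
# Cross-check the Global Cloud Index figures quoted above.
# All values are Cisco forecasts for 2020 unless noted; traffic in zettabytes.

cloud_traffic_2015 = 3.9
cloud_traffic_2020 = 14.1
iot_data_created = 600.0   # created (not necessarily stored) per year

# "nearly four-fold" growth in cloud traffic
growth = cloud_traffic_2020 / cloud_traffic_2015        # ~3.6x

# 600 ZB is "39 times higher than total projected data centre traffic",
# so total data centre traffic is roughly 600 / 39 = 15.4 ZB...
total_dc_traffic = iot_data_created / 39

# ...which squares with cloud accounting for 92 per cent of DC traffic:
cloud_share = cloud_traffic_2020 / total_dc_traffic     # ~0.92

# Stored data "quintuples" from 171 EB to 915 EB
storage_growth = 915 / 171                              # ~5.4x
```

The numbers line up: 14.1 ZB of cloud traffic against roughly 15.4 ZB of total data centre traffic gives the 92 per cent share Cisco cites.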

However, the potential for even greater growth remains high, as large amounts of data generated that could be valuable to analytics operations will not be held within data centres. Cisco predicted that by 2020, the amount of data stored on devices will be five times higher than that in data centres.

This could mean IT departments need to rethink how they collate and process data when developing an analytics solution, as the tools they build may well be required to gather data from multiple sources in order to deliver effective results.

Doug Webster, vice-president of service provider marketing at Cisco, commented: "In the six years of this study, cloud computing has advanced from an emerging technology to an essential scalable and flexible part of architecture for service providers of all types around the globe."

He added: "We forecast this significant cloud migration and the increased amount of network traffic generated as a result to continue at a rapid rate, as operators streamline infrastructures to help them more profitably deliver IP-based services to businesses and consumers alike."

Facebook blocks insurer's algorithm calculating quotes from social media


Image credit: iStockphoto/alexaldo

Facebook has blocked plans by a UK car insurer to use big data analytics to calculate quotes for customers based partly on information gathered from their social media postings.

Admiral unveiled the opt-in solution, called firstcarquote, that was intended to analyse the Facebook accounts of first-time drivers in order to identify personality traits that could be an indicator of how safe a driver they are likely to be, the Guardian reported. 

However, the social network has since refused permission for Admiral to proceed, as it was found to be in breach of the site's guidelines on how companies should use such information.

Admiral planned to scrape data from users' status updates and likes, with the company claiming it could lead to discounts of up to 15 per cent being offered to individuals identified as lower risk. Admiral also said the data would not be used to apply financial penalties to those deemed less safe, and no quotes would be higher than if the tool were not used.

The Guardian explained the algorithm would look favourably on posts that indicate users are conscientious and well-organised. For example, if a user writes short, concrete sentences, or arranges to meet friends at a specific time and place rather than just "later", these will be seen as positives.

On the other hand, overuse of exclamation points and words such as "always" and "never" may be taken as indications a driver is overconfident and could count against them.
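Admiral never published its model, so any concrete code is guesswork. Purely as a toy illustration of how the signals described by the Guardian might be turned into a score (the function name, regexes and weights here are all invented), such a heuristic could look like this:

```python
import re

# Toy illustration only: Admiral has not published its scoring model.
# The signals below are paraphrased from the Guardian's description;
# the weights are arbitrary.
def toy_quote_signal(posts):
    """Return a crude positive/negative score for a list of post strings."""
    score = 0
    for post in posts:
        # Overuse of exclamation marks: taken as a sign of overconfidence
        score -= post.count("!")
        # Absolute words such as "always"/"never": negative signal
        score -= len(re.findall(r"\b(always|never)\b", post.lower()))
        # Arranging to meet at a specific time: positive, "concrete" signal
        if re.search(r"\b\d{1,2}(:\d{2})?\s?(am|pm)\b", post.lower()):
            score += 1
    return score
```

For example, `toy_quote_signal(["Meet you at 7pm at the cafe"])` scores positively, while `toy_quote_signal(["I ALWAYS win!!!"])` scores negatively. A production system would obviously be far more sophisticated, but the privacy questions raised below apply just the same.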

The service was another example of how the insurance industry is looking to apply advanced big data analytics solutions to its decision-making, and take advantage of capabilities that allow it to gather and review very large sets of unstructured data, such as social media postings.

However, Facebook's rejection of the plans may serve as a reminder to businesses that they must take extreme care when using personal information as part of their big data analytics developments.

In explaining why it has blocked the firstcarquote project, a spokesman for Facebook said: "Protecting the privacy of the people on Facebook is of utmost importance to us. We have clear guidelines that prevent information being obtained from Facebook from being used to make decisions about eligibility."

Before Facebook blocked the service, Dan Miles, leader of the firstcarquote project at Admiral, sought to reassure those who may have privacy concerns, telling the Guardian: "It is incredibly transparent. If you don't want to use it in a quote then you don't have to."

He added that the algorithm was "very much a test product" for the company as it seeks to explore the potential of what big data analytics can offer to the industry – as well as what its customers are prepared to accept in order to get lower quotes.

How can you get Hadoop integrated into your business?



With more organisations looking to add big data analytics capabilities to their operations in order to take advantage of the huge amounts of information they have available, many firms will be examining which technologies will be the best options for their business.

One of the most popular choices for firms will be Hadoop, which is a tempting option due to its flexibility and ability to effectively manage very large data sets.

In fact, it has been forecast that by 2020, as many as three-quarters of Fortune 2000 companies will be running Hadoop clusters of at least 1,000 nodes.

But getting up and running with the technology will prove challenging for many businesses. Hadoop remains a highly complex solution that requires a high level of understanding and patience if companies are to make the most of it.

Therefore, it will be vital for organisations to develop and adhere to proven best practices if they are to see a return from their Hadoop investment. Several of these were recently identified by Ronen Kofman, vice-president of product at Stratoscale.

He noted, for example, that it is a bad idea to immediately jump into large-scale Hadoop deployments, as the complexity and costs involved with this open up businesses to significant risks, should the project fail.

However, he added that the flexibility and scalability of Hadoop make it easy to start small, with limited pilots, then add functionality as businesses become more familiar and comfortable with the solution. While it is straightforward to add nodes to a cluster as needed, it is harder to scale down should an initial deployment prove to be overly optimistic.

"Choosing a small project to run as a proof-of-concept allows development and infrastructure staff to familiarise themselves with the inter-workings of this technology, enabling them to support other groups' big data requirements in their organisation with reduced implementation risks," Mr Kofman said.

Another essential factor to consider is how businesses manage the workloads of their Hadoop clusters. The open-source framework of Hadoop enables businesses to very quickly build up vast stores of data without the need for costly purpose-built infrastructure, by taking advantage of technology such as cloud computing.

But if close attention is not paid to how these are deployed, it is easy to over-build a cluster. Mr Kofman said: "Effective workload management is a necessary Hadoop best practice. Hadoop architecture and management have to look at whole cluster and not just single batch jobs in order to avoid future roadblocks."

Organisations also need to maintain a close eye on their clusters, as there are many moving parts that will need to be monitored, and Hadoop's inbuilt redundancies are somewhat limited.

"Your cluster monitoring needs to report on the whole cluster as well as on specific nodes," Mr Kofman continued. "It also needs to be scalable and be able to automatically track an eventual increase in the amount of nodes in the cluster."
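The shape of that advice, reporting per node as well as for the whole cluster, and coping with a growing node count, can be sketched in a few lines. This is a hypothetical illustration, not a real Hadoop monitoring API; the hostnames and the probe are stand-ins:

```python
# Hypothetical sketch of cluster-wide monitoring in the spirit of the advice
# above: report on every node, not just the aggregate, and work for any
# number of nodes. The probe is a stand-in for a real health check.
def check_cluster(nodes, probe):
    """Probe each node; return per-node status plus a whole-cluster summary."""
    statuses = {node: probe(node) for node in nodes}
    healthy = sum(1 for ok in statuses.values() if ok)
    return {
        "nodes": statuses,            # per-node detail
        "healthy": healthy,
        "total": len(nodes),
        "cluster_ok": healthy == len(nodes),
    }

# Illustrative run: three data nodes, one of which fails its health check
report = check_cluster(["dn01", "dn02", "dn03"], probe=lambda n: n != "dn02")
```

The point is that `report` surfaces both views at once: the failing node is identifiable by name, while `cluster_ok` gives the roll-up a dashboard would track as nodes are added.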

Other areas that IT departments need to pay close attention to include how data coming from multiple sources integrates, and what protections are in place to secure sensitive information. 

Getting all these right is vital if Hadoop projects are to be successful. With big data set to play such a vital role in the future direction of almost every company, being able to gather, process and manage this effectively will be essential.

Big data ‘still seen as IT-dominated projects’


Image credit: iStockphoto/BernardaSv

The majority of chief information officers (CIOs) still view big data analytics as primarily IT-based projects, although there is a growing recognition of the impact it can have on business departments.

This is among the findings of a new survey conducted by recruitment firm Robert Half in Australia, which revealed that 54 per cent of CIOs surveyed say the technology's main impact comes in the IT department.

However, nearly one in five professionals (18 per cent) believe that the technology has more impact on their operations departments, while 16 per cent see key benefits being felt by their finance teams. 

This indicates there is growing awareness of big data analytics' potential to transform operations throughout all parts of a business, even though there is still work to be done to improve understanding of what the technology is capable of.

For instance, almost half (49 per cent) of CIOs felt that non-IT senior management do not have enough knowledge about big data to use it effectively. David Jones, senior managing director at Robert Half Asia Pacific, said this suggests many firms are still in the early stages of incorporating analytics into their processes.

"Big data has changed everything about the way business is done, but its value is still being optimised and harnessing its fullest potential is still considered a challenge for many businesses," he continued.

Mr Jones said: "Businesses have to take on an enterprise-wide approach to leverage the full potential of what big data has to offer and senior management plays a key role. A company's board and leaders need to be fully engaged about the impact data can have on its business operations and overall success."

He noted that the initial requirements for implementing big data, such as setting up new software and hardware systems, can demand a significant financial investment from organisations, which may make executives wary. However, once fully operational, advantages such as cost reductions can have a major impact on a business' performance.

The cost of capturing the necessary information was named by respondents as one of the biggest challenges of big data analytics, with 46 per cent of CIOs citing this as an issue. This was followed by data protection and security issues (43 per cent) and the technical considerations of implementing big data processes (also 43 per cent).

Mr Jones therefore said that in order to get the most out of big data, companies are increasingly looking for technology professionals who not only have proven skills in data analytics, but also strong business and financial acumen. This will be essential if IT teams are to clearly explain to senior management the advantages and insights they can gain from the technology.

"In our increasingly data-driven world, using data to make informed, strategic decisions that benefit operations in all departments and impact a company's bottom line is crucial for any company," he added.

Effective use of big data algorithms ‘vital’ for retailers


Image credit: iStockphoto/monsitj

It is essential that retailers are able to create effective algorithms to make the most of the large quantities of data they possess if they are to ensure competitive advantage.

This is according to Gartner, which said that this can help them cut costs and improve their top-line revenue in a digital economy where there is a huge volume of information available for analysis.

Speaking at the Gartner Symposium/ITxpo in Australia, principal research analyst at the firm Kelsie Marian said there are several examples where retailers who have acted aggressively to implement such solutions have seen strong results.

She noted this sector is particularly well-placed to take advantage of the technology, as it is traditionally a major hoarder of data, with many companies having years' worth of store-level data available to them. But while this has been used for activities such as demand planning since the 1980s, today's solutions need to be drastically different.

"Data is ubiquitous in the new retail environment, and retailers will survive only if quality data is embedded into every decision, minute by minute, across the retail organisation," Ms Marian said. "But retailers can't humanly scale to keep pace with growth of data, so a fundamentally different approach is necessary." 

Gartner therefore described the future for the sector as being 'algorithmic retailing', which it defined as the application of advanced big data analytics across a complex retail structure, in order to "deliver an efficient and flexible, yet unified, customer experience".

By 2020, the organisation forecast that leading firms in the retail sector will have embraced algorithmic approaches to planning their operations, which will lead the top ten companies to cut up to a third of their headquarters' merchandising staff.

Gartner highlighted several key areas where these advanced analytics are set to have an impact on retail operations.

One of the major operations will be to assist in determining the cost of goods sold. This is dependent on a number of factors, and is driven by the selection, assortment, pricing, promotion and inventory levels of items. Therefore, there is a great deal of potential for algorithms to improve performance, reducing overall costs and increasing top-line revenue.

Elsewhere, this technology can also help optimise how labour is deployed, improve customer service and improve the handling of back-office administrative tasks, from HR to distribution.

In order to take full advantage of the potential of big data and advanced algorithms, Gartner noted there are several steps that retailers must take.

Firstly, it recommended that chief information officers (CIOs) in the sector need to formulate a plan for identifying and classifying all sources of data within their business, as well as spot any gaps that need to be filled.

They also need to be prepared for the explosion in data generated by products, customers and stores that will be caused by the introduction of Internet of Things devices to their ecosystem.

Developing a framework to identify current and future opportunities where algorithms and automations can improve performance is also a must, as is reviewing how other retailers are using the technology.

"Retail CIOs and their teams play a pivotal role in helping business leaders understand the benefits and limitations of algorithms, and how algorithms can support their business goals," said Ms Marian.

Big data ‘set to lower healthcare costs’


Image credit: iStockphoto/kentoh

The implementation of advanced big data analytics solutions in the healthcare sector could help significantly lower costs in the marketplace by changing the way treatments are developed, prepared and delivered.

This is according to a new study by Lux Research, which noted that soaring costs are a problem that is continuing to plague the industry, as previous efforts to address this have had little impact.

In the US, for example, it stated the introduction of the Affordable Care Act has had limited success in tackling the problem, while in the UK, it has been reported that the government is unlikely to direct additional funding towards the NHS in the upcoming Autumn Statement, despite political pressures to create a full seven-day service.

However, the emergence of new, advanced big data analytics solutions can help healthcare providers reduce their costs without resorting to cutting services, Lux stated.

Mark Bünger, Lux Research vice-president and lead author of the report, titled 'Industrial Big Data and Analytics in Digital Health', said: "Whereas solving many past healthcare problems seemed to be a matter of scientific discovery, health policy, or adequate funding, today's most pressing problems are due to a lack of information – or lack of understanding of what to do with it."

He added that big data solutions that meet these challenges are already delivering measurable benefits in terms of both cost and patient outcomes, while partnerships between large technology providers, pharmaceutical firms and academics are bearing fruit.

Among the findings highlighted in the report, Lux noted that big data can help providers offer more personalised therapies, which have the potential to greatly enhance the fight against some of the most severe diseases.

It stated that by studying molecular biomarkers and genetic profiles, cloud-based analytics enable decisions to be made faster, resulting in better outcomes and reduced costs.

Coupling big data with artificial intelligence (AI) also holds a great deal of promise for the healthcare sector, as it offers a more efficient way of analysing very large data sets.

Applications for this in the healthcare sector may include radiology, where AI can help doctors review patient images and CT scans and spot anomalies that may be missed by a human eye.

AI also has a key role to play in the development of therapeutic and caregiving robots, as well as other aids that help monitor cognitive function and diagnostics, Lux stated.

Elsewhere, big data analytics can also help hospitals cut costs by, for example, helping optimise resource allocation, both when it comes to direct patient care and other activities.

Lux added: "Cost gains come from semi-automated diagnostic tools and decision-support algorithms that help focus expensive interventions, medical equipment, and caregivers' time on the patients who need them most."

Finance and manufacturing lead the way for big data investments


Image credit: iStockphoto/tonefotografia

Businesses in the finance and manufacturing sectors will be among the biggest users of big data analytics solutions in the coming years as they strive to make the most of the huge amounts of information their activities generate.

This is according to new research by International Data Corporation (IDC), which found that banking, professional services, discrete manufacturing, process manufacturing and central government will account for almost half of global big data investments in 2016. They will remain the top five sectors for the technology until at least 2020.

Of these, banking will be the largest sector, both in terms of overall revenue and the fastest growth in spending. In 2016, this sector is expected to invest around $17 billion (£13.82 billion) in the technology.

Total spending on big data technologies is expected to grow by 11.3 per cent in 2016, reaching $130.1 billion. The market will then continue to see strong performance until 2020, by which time it is forecast to be worth more than $203 billion.
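Taking the IDC figures above at face value, the implied compound annual growth rate is easy to check (treating "more than $203 billion" as exactly $203 billion, an assumption made here purely for illustration):

```python
# Implied compound annual growth rate of big data spending, from the IDC
# figures quoted above ($203bn is taken as exact for this estimate).
spend_2016 = 130.1   # $bn
spend_2020 = 203.0   # $bn
years = 4

implied_cagr = (spend_2020 / spend_2016) ** (1 / years) - 1   # ~11.8% a year
```

That works out at just under 12 per cent a year, broadly in line with the 11.3 per cent growth IDC forecasts for 2016 itself.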

Over the next few years other parts of the economy that are expected to contribute to this strong growth in big data investments include telecommunications, insurance, utilities and transportation. However, they will be far from alone, as 16 of the 18 sectors examined by IDC are forecast to experience double-digit compound annual growth rates for big data projects between 2015 and 2020.

Jessica Goepfert, program director, Customer Insights and Analysis, at IDC, said: "In our end-user research, respondents from organisations in these industries are placing a high priority on big data analytics initiatives over other technology investments. Within banking, many of these efforts are focused on risk management, fraud prevention and compliance-related activities."

She added that for sectors such as banking and telecoms, improving customer experience will be at the heart of their big data investments. For example, she noted that technologies are increasingly being deployed in call centres to give agents the information they need to deliver the best possible service.

The primary drivers of big data analytics technology will be large companies – those with more than 500 employees. These organisations will generate revenues of more than $154 billion for the sector by 2020. However, small and medium-sized businesses should not be overlooked, as they will remain a significant contributor to the market. Overall, more than a quarter of big data revenue will come from companies with fewer than 500 employees.

Dan Vesset, group vice-president, Analytics and Information Management, at IDC, said: "The availability of data, a new generation of technology, and a cultural shift toward data-driven decision making continue to drive demand for big data and analytics technology and services."



Kognitio benchmark tests
At the recent Strata conference in New York we received a lot of interest in the informal benchmarking we have been carrying out, comparing Kognitio on Hadoop to some other SQL on Hadoop technologies. We have decided to formalise the benchmarking process by producing a paper detailing our testing and results. In the meantime, we will be releasing intermediate results in this blog. Preliminary results show Kognitio comes out top on SQL support, and its single-query performance is significantly faster than Impala's. Read on for more details.

It is clear from recent conversations that many organisations have issues using the tools in the standard Hadoop distributions to support enterprise level SQL on data in Hadoop. This is caused by a number of issues including:

  • SQL maturity – some products cannot handle all the SQL generated by developers and/or third party tools. They either do not support the SQL, or produce very poor query plans
  • Query performance – queries that are supported perform poorly even under single user workload
  • Concurrency – products cannot handle concurrent mixed workload well in terms of performance and give errors when under load

Bearing in mind the types of workload we have been discussing (primarily BI and complex analytics) we decided to initially concentrate on the TPC-DS benchmark. This is a well-respected, widely used query set that is representative of the type of query that seems to be most problematic. The TPC framework is also designed for benchmarking concurrent workloads.

Currently we are testing against Hive, Impala and SparkSQL as delivered in Cloudera 5.7.1, using a 12 node cluster. We will shortly be upgrading our test cluster to the most recent release of Cloudera before running the main benchmarks for the paper. We have also done some initial testing of SparkSQL 2.0 on a small HortonWorks cluster, and plan to include the Cloudera beta of SparkSQL 2.0 in the performance tests.

SQL Maturity

A common theme we’ve heard is that one of the major pain points in Hadoop adoption is the need to migrate existing SQL workloads to work on data in Hadoop. With this in mind, we initially looked at the breadth of SQL that each product will execute before moving on to performance. We have categorised each of the 99 TPC-DS queries as follows:

  • Runs “out of the box” (no changes needed)
  • Minor syntax changes – such as removing reserved words or “grammatical” changes
  • Long running – SQL compiles but query doesn’t come back within 1 hour
  • Syntax not currently supported

If a query requires major changes to run, it is considered not supported (see the TPC-DS documentation).

Technology          | Out of the Box | Minor Changes | Long Running | Not Supported
Kognitio on Hadoop  | 76             | 23            | 0            | 0
Hive 1              | 30             | 8             | 6            | 55
Impala              | 55             | 18            | 2            | 24
Spark 1.6           | 39             | 12            | 3            | 43
Spark 2.0           | 72             | 25            | 1            | 1

The above table shows that many products have a long way to go, and the step change in SQL support in Spark 2.0 (from 1.6) shows the developers have recognised this. Kognitio and other technologies that are making the move from the analytical DWH space are at a distinct advantage here, as they already possess the mature SQL capability required for enterprise-level support.
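As a quick consistency check on the table, the counts can be tallied in a few lines (the two empty cells in the Kognitio row are treated as zero here, an assumption consistent with the claim later in this post that Kognitio supports all 99 queries):

```python
# Tallying the TPC-DS support table above (99 queries in total).
results = {
    # technology: (out of the box, minor changes, long running, not supported)
    "Kognitio on Hadoop": (76, 23, 0, 0),
    "Hive 1": (30, 8, 6, 55),
    "Impala": (55, 18, 2, 24),
    "Spark 1.6": (39, 12, 3, 43),
    "Spark 2.0": (72, 25, 1, 1),
}

# Queries each engine can run with at most minor syntax changes
runnable = {tech: oob + minor for tech, (oob, minor, _, _) in results.items()}
```

The jump from Spark 1.6 (51 runnable queries) to Spark 2.0 (97) is the step change discussed above; Kognitio's tally comes to the full 99.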

Query Performance

The results shown below are for a single stream executing over 1TB of data, but our goal is to look at the concurrent mixed workloads typically found in enterprise applications.

As well as supporting all 99 queries (23 with small syntax changes), initial results for a single query stream show Kognitio is very performant compared to Impala: Kognitio runs 89 of the 99 queries in under a minute, whereas only 58 queries run in under a minute on Impala. However, we recognise the real test comes in increasing the number of streams, so watch this space as we increase concurrency and add Hive and Spark timings too.

[Chart: SQL on Hadoop benchmark test results, single stream over 1TB]

A bit about how we run the tests

We’ve developed a benchmarking toolkit based around the TPC framework which can be used to easily test concurrent query sets across technologies on Hadoop platforms. We designed this modular toolkit to allow testers to develop their own benchmark tests, and are planning to make the toolkit available on GitHub in the coming weeks, once we have finished some "How to Use" documentation.
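The toolkit itself is not yet published, but the core idea of running the same query set over several concurrent streams can be sketched as follows. The function names and the placeholder query runner are illustrative, not the real toolkit's API:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Minimal sketch of a concurrent query-stream harness, in the spirit of the
# TPC-based toolkit described above. run_query is a stand-in: a real harness
# would submit SQL to the engine under test and record per-query timings.
def run_query(stream_id, query_id):
    start = time.perf_counter()
    time.sleep(0.001)  # placeholder for actual query execution
    return (stream_id, query_id, time.perf_counter() - start)

def run_streams(n_streams, queries):
    """Run the same query set on n_streams concurrent streams."""
    def run_stream(sid):
        # Each stream runs every query in sequence, recording its timing
        return [run_query(sid, q) for q in queries]
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        return list(pool.map(run_stream, range(n_streams)))

# Illustrative run: 4 concurrent streams of 10 queries each
timings = run_streams(4, range(10))
```

Scaling the stream count up is then just a parameter change, which is what makes the concurrency results promised above straightforward to produce.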

In progress and to come

As I write this, there are a few caveats to the interim results presented here:

1. We still need to complete the syntax changes for Hive, so these figures may change in the final paper.

2. The single query listed as not supported by Spark 2.0 did execute, but a Cartesian join was used, leading to incorrect results.

We are planning to move on to full concurrent workloads in the next week and will publish these and the toolkit soon.