How can you keep your data lake as clean as possible?

27 Jul 2016 | Categories: #AnalyticsNews | Image credit: iStockphoto/Pixtum

One of the key trends in big data analytics for the last couple of years has been the concept of the 'data lake'. The idea behind this is to place all of a business's incoming data in a single location, from which it can be studied at will.

But while this may seem like a simple idea in principle, the reality is often far different. If organisations are not careful about how they manage it, a data lake can quickly become clogged with poor-quality information, irrelevant details and inaccuracies, ending up looking more like a swamp.

So how can this be avoided? In a new report, Constellation Research explained that many businesses fail to appreciate that a data lake should not be viewed as a replacement for a traditional data warehouse, which is able to support predictable production queries and reports against well-structured data. 

Instead, it noted: "The value in the data lake is in exploring and blending data and using the power of data at scale to find correlations, model behaviors, predict outcomes, make recommendations, and trigger smarter decisions and actions."

Many implementations fail because a business does not put in place a clear structure for ordering the data within its lake. There may be an assumption that simply deploying a Hadoop framework is enough to create an effective data lake, but in reality, this is not the case.

Constellation Research vice-president and principal analyst Doug Henschen, who authored the report, noted that despite its name, it would be a mistake to consider a data lake as a single, monolithic repository into which data can be dumped without thought or planning.

Instead, they should look to split their data lake into 'zones' based on the profile of a particular piece of information.

"If Hadoop-based data lakes are to succeed, you'll need to ingest and retain raw data in a landing zone with enough metadata tagging to know what it is and where it's from," Mr Henschen wrote.

For instance, businesses should set up zones for refined data that has been cleansed and is ready for broad use across the business. There should also be zones for application-specific data, developed by aggregating, transforming and enriching data from multiple sources. A zone for data experimentation was also recommended.
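As a rough illustration of what a tagged landing zone involves, the Python sketch below copies a raw file into a date-partitioned landing area and writes a small metadata sidecar alongside it, so the lake records what the data is and where it came from. The zone layout, paths and field names are illustrative assumptions, not anything prescribed in the Constellation Research report.

```python
import json
import shutil
import time
from pathlib import Path

# Illustrative zone names only; real lakes will define their own layout.
ZONES = {"landing", "refined", "application", "experimentation"}
LAKE_ROOT = Path("/data/lake")  # hypothetical mount point

def ingest_to_landing(src_file: str, source_system: str) -> Path:
    """Copy a raw file into the landing zone and write a metadata sidecar
    recording what it is and where it came from."""
    src = Path(src_file)
    dest_dir = LAKE_ROOT / "landing" / source_system / time.strftime("%Y/%m/%d")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)

    metadata = {
        "source_system": source_system,
        "original_name": src.name,
        "ingested_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "size_bytes": dest.stat().st_size,
    }
    # The sidecar keeps just enough tagging to make the raw file discoverable later.
    dest.with_name(dest.name + ".meta.json").write_text(json.dumps(metadata, indent=2))
    return dest
```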

This will not be an easy goal to achieve, Mr Henschen stated, as it will require businesses to pay much closer attention to data as it enters the company, as opposed to simply ingesting everything and then looking to categorise it later.

Although the Hadoop community has been working on a range of tools to help with the ingestion, transformation and cataloguing of data, many IT professionals are still not hugely familiar with these. However, Mr Henschen said there is good news on this front, as a broader ecosystem has emerged around Hadoop, aiming to tackle the problems associated with managing data lakes.

How are retailers making the most of big data?

26 Jul 2016 | Categories: #AnalyticsNews | Image credit: iStockphoto/emyerson

One part of the economy that's been particularly quick to embrace the potential of big data is the retail sector. Given the large amounts of customer information these firms collect as a matter of course, being able to feed this into an advanced analytics platform in order to gain insight is a natural fit.

Therefore, forward-thinking retailers were some of the first adopters of big data analytics technology, and have developed their innovations into mature solutions that can give them a leg-up over competitors. But what does this look like in the real world?

Datanami recently highlighted several key use cases for big data that retailers are employing. While some are straightforward, there are also some more complex solutions in place that companies are using to understand the market and offer the best products and service.

For starters, product recommendation is a key area for big data. This is particularly popular among ecommerce retailers, as it is a relatively simple use of the technology, but one that can have a big impact. By using machine learning techniques and historical data, smart retailers can generate accurate recommendations before the customer leaves their site.
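As a minimal sketch of the underlying idea, assuming nothing more than a customer-by-product purchase matrix, the snippet below computes item-to-item cosine similarity and suggests the products most similar to the one a shopper is viewing. Production recommenders are far more sophisticated, and the product names and purchase data here are made up.

```python
import numpy as np

# Hypothetical purchase matrix: rows are customers, columns are products,
# 1 means the customer has bought that product.
products = ["laptop", "mouse", "keyboard", "monitor"]
purchases = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
])

# Item-to-item cosine similarity on the columns of the purchase matrix.
norms = np.linalg.norm(purchases, axis=0)
similarity = (purchases.T @ purchases) / np.outer(norms, norms)
np.fill_diagonal(similarity, 0)  # never recommend the product itself

def recommend_for(product: str, top_n: int = 2) -> list[str]:
    """Return the products most similar to the one the customer is viewing."""
    idx = products.index(product)
    ranked = np.argsort(similarity[idx])[::-1][:top_n]
    return [products[i] for i in ranked]

print(recommend_for("laptop"))  # e.g. ['mouse', 'keyboard']
```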

Eric Thorston, Hortonworks' general manager for consumer products, told Datanami: "When you think about recommendations, everybody wants to beat Amazon. Love them or hate them – most retailers hate them – Amazon makes from 35 per cent to 60 per cent revenue uplift on recommendations, and everybody is saying, 'How can we get a piece of that?'"

But this is just the tip of the iceberg when it comes to what big data can offer retailers. For instance, another common use case for the technology is market basket analysis. Looking at which groups of products are commonly purchased together is an activity that has been carried out manually for decades, but with the advent of tools such as Hadoop, retailers can automate the process and delve much deeper into their data.

In the past, such analysis might have used only a small sample of customers, with receipts going back one or two years. But big data can greatly expand this, offering companies much more accurate results they can use to inform future strategy.
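To make the idea concrete, here is a highly simplified sketch of what such an analysis computes, assuming receipts are available as simple sets of items: the support and confidence of 'bought A, also bought B' pairs. The receipts below are invented; at Hadoop scale the same counting would be distributed across a cluster rather than done in memory.

```python
from collections import Counter
from itertools import combinations

# Hypothetical receipts; in practice these would be read from the cluster at far larger scale.
receipts = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "milk", "cereal"},
    {"bread", "butter", "cereal"},
]

n = len(receipts)
item_counts = Counter(item for r in receipts for item in r)
pair_counts = Counter(pair for r in receipts for pair in combinations(sorted(r), 2))

# Rule "A -> B": support = P(A and B), confidence = P(B | A).
for (a, b), count in pair_counts.most_common(3):
    support = count / n
    confidence = count / item_counts[a]
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```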

Big data is also a major benefit when it comes to analysing unstructured data, such as social media posts. In today's environment, any company that does not listen to its customers on platforms such as Twitter and Instagram will be missing out on a huge amount of potentially valuable information, and retailers are no exception.

Tools such as Hadoop use natural language processing to extract information from these channels and play a critical role in helping firms understand their audience. However, Mr Thorston warned this is an activity that must be conducted carefully.

"The minute you make a wrong move, you lose," he said. "The obligation is to use it judiciously. That prevents the misuse and that also preserves and supports and aligns to the ultimate goal, which is customer intimacy, customer loyalty, increased revenue, and increased margin."

These are just a few examples of how big data can help retailers. In addition to this, processes such as price optimisation, inventory management and fraud detection all stand to benefit from the technology.

How the aviation sector is embracing big data

22 Jul 2016 | Categories: #AnalyticsNews | Image credit: iStockphoto/Maxiphoto

The aviation sector has always been a leader when it comes to technology, so it should be no surprise that the industry has been quick to embrace the potential of big data.

In many ways, it's easy to see why there are so many opportunities for data to have an impact. With millions of people taking to the skies every day, airlines have a huge pool of resources they can use to improve services and identify trends.

Meanwhile, the advent of Internet of Things sensors offers airlines greater ability to conduct activities such as predictive maintenance, as well as giving manufacturers more insight into what is happening on the factory floor.

Among the companies looking to take advantage of this is Boeing, which has just announced a new agreement with Microsoft that will see it use the technology firm's Azure cloud computing platform to run a range of analytical operations.

Big data analytics operations that will benefit from this include real-time information on purchasing and leasing aeroplanes and engines, as well as helping customers with route planning, inventory management and fleet maintenance.

For instance, Boeing's advanced airplane health solutions are currently used on more than 3,800 airplanes operating around the globe and allow customers to use real-time data to optimise operational performance, fuel use, maintenance, and supply chain performance. Meanwhile, nearly 13,000 aircraft a day benefit from digital navigational tools. 

The manufacturer also claims that the use of big data-based crew scheduling applications can reduce the costs of these operations by as much as seven per cent.

Kevin Crowley, Boeing vice-president of Digital Aviation commented: "Boeing's expertise and extensive aviation data resources coupled with Microsoft's cloud technology will accelerate innovation in areas such as predictive maintenance and flight optimisation, allowing airlines to drive down costs and improve operational efficiency."

Elsewhere, budget airline Ryanair has also announced a new partnership with visual analytics firm Qlik this week, which it says will consolidate data from across the company and give employees instant insight into what is going on within the business.

It hopes that in the future, the use of this data will allow the airline to make better business decisions in time-sensitive departments such as flight and ground operations, as well as improving the services they offer to passengers.

For instance, Ryanair aims to boost its in-flight retail offering, as well as helping to optimise the supply chain by understanding the anticipated passenger mix on a given flight and matching this with an appropriate range of products and sufficient stock for the flight.

"We're building a complete overview of what’s going on across the business and it is playing a major role in the way we are evolving the services we offer to customers," said Shane Finnegan, senior BI developer at Ryanair. "Ultimately, we want to find the best ways to make our customers happy on-board, while being able to offer them the lowest fares on the market."

Insurance sector ‘must be careful’ in how it uses big data

21 Jul 2016 | Categories: #AnalyticsNews | Image credit: iStockphoto/cifotart

The increased use of big data in the insurance sector to conduct more personalised risk analysis and tailor quotes accordingly could lead to the creation of a new 'underclass' of consumers who struggle to secure coverage, unless the industry treats the technology with care.

This is the warning of a new report from the Chartered Insurance Institute (CII), which said the use of the technology could result in some people being refused insurance altogether if they are deemed too risky.

The Financial Times reports that big data analytics is increasingly being viewed as a key part of the future of insurance due to its ability to give providers a more complete picture of their customers, thereby leading to a more personalised service.

Much of the discussion surrounding this so far has focused on the ability to offer discounts on premiums for activities such as careful driving and healthy lifestyles, but at the other end of the scale, this personalised approach could leave people priced out of the market.

"While in some cases this may be to do with modifiable behaviour, like driving style, it could easily be due to factors that people can't control, such as where they live, age, genetic conditions or health problems," the report stated.

Therefore, insurers need to be very careful about how they approach big data analytics. While there are undeniable benefits to the technology, they must be wary of the extent to which they rely on it.

"Data is a double-edged sword," said David Thomson, director of policy and public affairs at the CII. "The insurance sector needs to be careful about moving away from pooled risk into individual pricing. They need to think about the broader public interest."

He added that if the industry cannot ensure that coverage is available to everyone – particularly in areas such as health insurance – intervention from the government may be required.

This has already been seen in some areas, such as home insurance. The Financial Times noted that improved mapping and data analysis has allowed insurers to much more accurately identify homes and businesses that are at highest risk of flooding.

This led to complaints from many people that cover became unaffordable for these areas, so the government created the Flood Re organisation, which aims to lower the cost of insurance for people living in high-risk areas.

"Regulators are trying to catch up on this issue," said Mr Thomson. "So there is a huge emphasis on insurers to guard their own reputations and business models. As in banking, algorithms can be good and bad."

At the moment, there are some restrictions on the data insurers can take into account when calculating premiums. Health and life insurers, for example, cannot use predictive genetic test results under an agreement between the government and the Association of British Insurers. This is currently set to expire in 2019, although a review is due next year that could see it extended.

How can you ensure the quality of IoT data?

14 Jul 2016 | Categories: #AnalyticsNews | Image credit: iStockphoto/cifotart

One of the biggest trends affecting the analytics sector at the moment is the emergence of the Internet of Things (IoT) as a key source of data.

Over the next few years, the number of IoT devices is set to explode. By the end of 2020, Juniper Research forecasts there will be some 38 billion such devices in use, a threefold increase on 2015's figure.

But while this will present huge new opportunities for businesses to apply big data analytics to the information generated in order to gain valuable insight, it also poses a range of risks.

Although questions such as privacy are well documented, one issue that is frequently overlooked is the quality of the data itself. Businesses may assume that because the incoming data will be taken directly from sensors, there will be little that can go wrong with it, but in fact, this is not necessarily the case.

It was noted by Mary Shacklett, president of Transworld Data, that one issue that may frequently affect the quality of IoT data lies in fundamental flaws in the way the embedded software used in the devices is developed.

She explained in an article for Tech Republic that historically, developers of this software – which runs machines, produces machine automation, and enables machines to talk to one another – did not always employ the same methods as they would for more traditional apps.

"This meant that detailed quality assurance (QA) testing on the programs, or ensuring that program upgrades were administered to all machines or products out in the field, didn't always occur," Ms Shacklett stated. 

The result of this could be significant for big data operations. If an undetected flaw in an IoT device's embedded software results in inaccurate data being generated, this could lead to an erroneous analytics conclusion that has a major impact on the business.

Although this is changing as more manufacturers mandate strict compliance and QA testing from their embedded software developers – with sectors such as automotive, aerospace and medical equipment leading the way due to their high quality standards – for now this remains a risk that must be considered when using IoT data.

To counter this, Ms Shacklett highlighted two key steps to ensure the quality of this information. Firstly, she noted that users must monitor their generated data closely, and immediately investigate any unusual readings, which also need to be reported to the appropriate teams.

For instance, "if the team charged with end responsibility for machines/devices sees anything unusual with the data, immediate action should be taken on the floor, and they must report back to the analytics team that a potential problem could affect data".

Organisations also need to ensure that vendors are kept in the loop, on both the analytics and machine side. As hardware and software are never perfect, there may be instances where data is skewed by a known issue that a machine manufacturer or IoT provider is experiencing.

Machine learning a key focus for big data initiatives

12 Jul 2016 | Categories: #AnalyticsNews | Image credit: iStockphoto/Pixtum

A large number of companies will look to introduce machine learning capabilities as part of their efforts to exploit big data in the coming years, a new survey has found.

Research by Evans Data found more than a third of big data developers (36 per cent) now use some elements of machine learning in their projects. While the market for this is still largely fragmented, the financial and manufacturing sectors are showing particular interest in the technology, as are businesses looking to take advantage of Internet of Things opportunities.

Janel Garvin, chief executive of Evans Data, explained that machine learning encompasses a range of techniques that are rapidly being adopted by big data developers, who are in an excellent position to lead the way and show what the technology is capable of.

“We are seeing more and more interest from developers in all forms of cognitive computing, including pattern recognition, natural language recognition and neural networks, and we fully expect that the programs of tomorrow are going to be based on these nascent technologies of today,” she said.

The analytical model most closely linked with artificial intelligence and machine learning development was found to be decision trees, followed by linear regression and logistic regression as the next most cited models.
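For readers less familiar with these models, the snippet below shows roughly what fitting a decision tree looks like using scikit-learn. The churn scenario, feature names and figures are entirely synthetic and are only intended to show the shape of the workflow.

```python
from sklearn.tree import DecisionTreeClassifier

# Synthetic example: predict whether a customer churns from two made-up features
# (monthly spend, number of support tickets raised).
X = [[20, 0], [25, 1], [90, 5], [85, 4], [30, 0], [95, 6]]
y = [0, 0, 1, 1, 0, 1]  # 1 = churned

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

print(model.predict([[88, 5]]))    # -> [1], i.e. likely to churn
print(model.feature_importances_)  # which feature the tree actually splits on
```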

Logistics, distribution and operations were the company departments found most likely to be using advanced data analytics or big data solutions.

Among the survey’s other findings, it was revealed that two-thirds of big data developers are spending at least some of their time instrumenting processes. Meanwhile, 42 per cent are embracing real-time data analytics, while 38 per cent are building capabilities to analyse unstructured data.

The top improvement to data and analytics that developers would like to see is the improved security of off-site data stores.

Most firms set to boost investment in real-time analytics

04 Jul 2016 | Categories: #AnalyticsNews

The vast majority of companies in the retail, technology, banking, healthcare and life sciences sectors will be investing in real-time analytics tools for studying human and machine-generated data.

According to research conducted by OpsClarity, 92 per cent of these organisations expect to increase their focus on streaming data applications within the next 12 months.

To do this, almost four-fifths of respondents (79 per cent) will be reducing investment in batch processing tools, or even eliminating these entirely, as they shift their resources to real-time analytics.

Dhruv Jain, chief executive and co-founder of OpsClarity, stated that the ability to study data in real-time can give businesses a significant competitive advantage in today's digital economy, allowing them to become more agile and innovative.

"With new fast data technologies, companies can make real-time decisions about customer intentions and provide instant and highly personalised offers, rather than sending an offline offer in an email a week later," he said. "It also allows companies to almost instantaneously detect fraud and intrusions, rather than waiting to collect all the data and processing it after it is too late."

One of the key use cases for this technology will be to enhance customer-facing applications. OpsClarity noted that businesses are now able to leverage insights gleaned from multiple streams of real-time data in order to enable timely decisions and responses to queries. 

This type of real-time analysis is now being built directly into customer-facing, business-critical applications. A third of survey respondents (32 per cent) said their real-time solutions would be used primarily to power core customer-facing applications, whereas 29 per cent will focus on improving internal processes.

Almost four out of ten professionals (39 per cent) said they would be deploying real-time data analytics for both purposes.

Jay Kreps, chief executive and co-founder of Confluent, added that real-time data and streaming processes are becoming a central part of how modern businesses harness the information available to them.

"For modern companies, data is no longer just powering stale daily reports – it's being baked into an increasingly sophisticated set of applications, from detecting fraud and powering real-time analytics to guiding smarter customer interactions," he continued.

One of the most popular solutions for handling real-time data is Apache Kafka, with 86 per cent of software developers, architects and DevOps professionals using the open-source message broker.

Mr Kreps noted that this provides a real-time platform for thousands of firms, including major companies such as Uber, Netflix and Goldman Sachs.
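As a small illustration of how an application feeds events into Kafka for real-time consumers downstream, the sketch below uses the kafka-python client to publish clickstream-style messages to a topic. The broker address, topic name and event fields are placeholders, not anything from the survey.

```python
import json
import time

from kafka import KafkaProducer  # kafka-python client

# Broker address and topic are placeholders for illustration.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# Publish a stream of clickstream-style events for downstream real-time consumers.
for page in ["home", "product/123", "checkout"]:
    event = {"user_id": 42, "page": page, "timestamp": time.time()}
    producer.send("clickstream", event)

producer.flush()  # make sure everything is actually sent before exiting
```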

Meanwhile, Apache Spark is the data processing technology of choice for 70 per cent of businesses, while 54 per cent use HDFS as a data sink.

Although there is a wide range of data frameworks being deployed, and strong indications that many of these will be here to stay for the foreseeable future, the survey revealed a strong preference for open source technologies.

Nearly half (47 per cent) of software developers, architects and DevOps professionals say they exclusively use open source, and another 44 per cent use both commercial and open source.

How big data supports 2016’s summer of sport

30 Jun 2016 | Categories: #AnalyticsNews

There are now countless examples of how big data can help companies across all industries improve decision-making, boost customer service and give employees a better insight into the wider industry. But the technology is far more wide-ranging than many people realise.

In fact, big data will have a key role to play in several of this summer's biggest sporting events. 2016 is a big year for international sports, with Euro 2016 and the Copa America Centenario taking place on either side of the Atlantic, before the Rio Olympics gets underway in August.

But many of this summer's events will be heavily reliant on big data, both to help teams and competitors improve their performance and to keep fans up to date on what's going on.

For example, one event that's set to greatly increase its use of big data this year is the Tour de France. With almost 200 riders traversing 3,535km of French countryside, strong TV, radio and online coverage is essential for the fans following along.

This year, they will have a lot more information and insight into the event thanks to big data. Tech Week Europe reports that Dimension Data – which is not only delivering information to race organisers the Amaury Sport Organisation (ASO), but sponsoring its own team – will be providing a huge range of information.

Last year, the firm analysed up to six billion bits of data for every stage, turning it into information to help contextualise the race, and in 2016 it's set to review even more.

Adam Foster, the company's head of sports, said: "This year, we're working with a much broader palette, which means access to more meaningful race data, race routes, riders and current weather conditions. What's exciting this year is the ability to deliver all this information to ASO through a unified digital platform."

Having real-time access to multiple video feeds, social media posts and live race information in a single intuitive interface will "greatly enhance" the coverage of the event, he continued.

However, it is not just the Tour de France where big data will have an expanded role to play this year. 

Forbes noted that Wimbledon, which got underway on Monday (June 28th), will be turning to IBM's Watson analytics and machine learning platform to analyse the hundreds of thousands of social media mentions generated by the event.

Alexandra Willis, head of communications, content and digital at the All England Lawn Tennis and Croquet Club, explained: "This allows us to not just look at and respond to trends, but to actually pre-empt them. We're hoping this will help in our quest, not necessarily to always be first but certainly to be early into the conversation when critical things are happening."

In theory, she said, this should enable the club to monitor interest in a particular court or player and pre-empt any emerging trends before they become apparent on services like Twitter. This will help it curate content for its media output based on what its audience is most likely to be interested in, rather than reacting to trends after the fact, as has been the case in previous years.

Younger workers most optimistic about big data

28 Jun 2016 | Categories: #AnalyticsNews | Image credit: iStockphoto/cifotar

Companies remain highly optimistic that big data analytics solutions will have a transformative effect on the way they do business, but younger employees are far more confident than their older counterparts, a new survey has found.

IDG's 2016 Data and Analytics Survey found that more than half of businesses (53 per cent) plan to implement big data initiatives within the next 12 months or are already undergoing such a process.

Overall, 78 per cent of employees agree or strongly agree that the collection and analysis of big data has the potential to fundamentally change the way their company does business in the next one to three years. Meanwhile, 71 per cent agree or strongly agree that big data will create new revenue opportunities and/or lines of business for their company in the same timeframe. 

However, a generational gap is emerging between younger workers who are enthusiastic about the technology and older employees who take a more cautious view. Those aged between 18 and 34 are far more likely than older workers to have a positive view of big data and its potential to transform a business.

"These age-linked differences may be attributable to younger employees being more comfortable with the latest technologies and more inured to the inevitability of technology-driven disruption," IDG stated.

However, it also suggested that as older workers will have seen many hyped developments come and go over their careers, they are less willing to predict that any particular trend will be a source of fundamental change, even one as far-reaching as big data.

The survey also examined the sources of data for use in analytics operations. It found that the average business gathers more than half of its data (54 per cent) from internal sources, while 25 per cent comes from external sources, with 21 per cent being a combination of the two.

The top sources of data for all companies, regardless of size, are sales and financial transactions (56 per cent), leads and sales contacts from customer databases (51 per cent), and email and productivity applications (both 39 per cent).

IDG's survey did note that the types of data firms focus on differ depending on the size of the companies. Larger enterprises are more likely to collect transactional data, machine-generated/sensor data, government and public domain data, and data from security monitoring. However, smaller businesses concentrate their efforts on email, data from third-party databases, social media, and statistics from news media. 

One of the biggest issues for companies of all sizes will be handling unstructured data, such as emails, Word documents and presentations. Due to their disorganised nature and lack of a pre-defined structure, deriving insight from these sources will prove difficult for many firms.

This may be why just 17 per cent of firms view unstructured data as a primary focus for their big data analytics initiatives, while nearly half (45 per cent) rate it as one of their main challenges.

IoT and cloud ‘the future of Hadoop’

24 Jun 2016 | Categories: #AnalyticsNews

The creator of Hadoop, Doug Cutting, has said that cloud computing and Internet of Things (IoT) applications will be the basis for the next phase of growth for the platform.

So far, most deployments of the big data analytics tool have been in large organisations in sectors such as finance, telecommunications and the internet, but this is changing as more use cases emerge for the technology.

Much of this is down to the growing use of digitally-connected sensors in almost all industries, which are generating huge amounts of data that businesses will need to quickly interpret if they are to make the most of the information available to them.

Mr Cutting highlighted several major companies that have already adopted Hadoop to help them handle this huge influx of sensor data.

"Caterpillar collects data from all of its machines," he said. "Tesla is able to gather more information than anyone else in the self-driving business, they're collecting information on actual road conditions, because they have cars sending all the data back. And Airbus is loading all their sensor data from planes into Hadoop, to understand and optimise their processes."

One sector that is on the verge of a revolution in how it manages information is the automotive industry, as a growing number of cars are being equipped with IoT sensors and networking capabilities.

Mr Cutting noted that almost every new car now sold has a cellular modem installed, while almost half of new cellular devices are not phones, but other connected items.

Until now, Hadoop has often been deployed as a key component of a 'data lake', where businesses pool all their incoming data into a single, centralised resource they can dip into in order to perform analytics. However, use cases for IoT typically have a need for data to be exchanged rapidly between end-devices and the central repository.

Therefore, there has been a focus recently on the development of new tools to facilitate this faster exchange of information, such as Flume and Kafka.

Mr Cutting particularly highlighted Apache Kudu as having a key role to play in this. He said: "What Kudu lets you do is update things in real-time. It's possible to do these things using HDFS but it's much more convenient to use Kudu if you're trying to model the current state of the world."

He also noted that while the majority of Hadoop applications are currently on-premises, cloud deployments are growing twice as fast, so it will be vital that providers can deliver ways to embrace this technology in their offerings.

"We are spending a lot of time on making our offerings work well in the cloud," Mr Cutting continued. "We're trying to provide really powerful high-level tools to make the lives of those delivering this tech a lot easier."
