Head and heart for Big Data
Uni Research Computing
Big Data is about very large volumes of data and advanced systems, but it is just as much about people.
By Andreas R. Graven
Mathematician and physicist Klaus Johannsen is definitely one of them. He heads a completely new Centre for Big Data at Uni Research. Without people with specialised knowledge, the huge volumes of data available today are quite worthless, says Johannsen. He is research director at Uni Research Computing, and has expert knowledge of data-driven research and mathematical modelling.
“Businesses and companies can buy as many new systems as they like. They can collect masses of information, but that alone is no magic formula. What is needed is the expertise to handle large volumes of complex research data. That’s where we come in. We have the specialists, and we have built up a unique system that delivers optimised results,” says Johannsen.
New solutions and opportunities
He realised early on that the ability to find key information in large volumes of data would be sought-after.
“My heart has long been set on a Centre for Big Data, and now we have finally got it. Big Data will be very important in Norway in the coming years. That is true both for researchers and for industry, in the hunt for new solutions, knowledge and commercial opportunities,” says Johannsen.
Johannsen already had a lively interest in computers back in the early 1980s, but this ambitious would-be programmer’s first encounter with a computer was a disappointment.
As a young student, he tested a Commodore VIC-20 with around 11 KB of RAM, a machine that would have struggled to produce this article. The possibilities were severely limited. Time had to do its work for the young Johannsen, and it duly did.
Results and advances
In recent decades the world has experienced one computing revolution after another.
Now we are almost drowning in data. How this data can be processed, and what new and unexpected results and advances it can offer the public and private sectors, are among the big questions in the field.
Klaus Johannsen and seven staff at the newly established Centre for Big Data Analysis at Uni Research aim to find out. When the board approved the establishment of the Centre in the autumn of 2014, Johannsen and colleagues had already been studying the potential of Big Data for two years. The system the researchers use is the Apache Hadoop framework. For Johannsen and his colleagues, a great deal depends on setting this tool up to process research data in the best possible way.
This is a big job, because although Hadoop is designed to store and process masses of information, it was not built with research data in mind.
“We are developing strategies to enable Big Data to be used in both research and industry, and we have our hands full adapting the system for research data,” says Johannsen.
He is among those who believe that we are on the brink of a paradigm shift in the way data can be processed, and that we will see completely new opportunities and innovations based on Big Data.
In the right place at the right time
Although Johannsen, frustrated by the lack of computing power in the 1980s, gave up programming until the early 1990s, he now seems to be in the right place at the right time.
Contrary to what you might think, Big Data is not so much about getting your hands on as much computing power as possible.
“We actually spend just 10% of our money on hardware. Big Data is mostly about what people are able to get out of the data, and what possibilities they discover,” says Johannsen.
With his background in both physics and applied maths, he takes an experimental approach, aiming to learn from what he observes.
Johannsen believes that Big Data will change research.
“Research will not be just about theories derived from small volumes of data, but increasingly about the way large volumes of data can be exploited,” he says.
“Of course the approach from abstraction through theory to integration will still exist, but we will have a Big Data approach as well. That is more practical, and you can run analyses on large volumes of data based on actual events,” says Johannsen.
For example, in theory there may be a million ways of describing a problem, which makes it hard to choose the right one. But when you use Big Data, it may turn out that there are only 40.
“So we may end up finding that reality can actually be a lot simpler in many cases than the complexity we are likely to be working with at a theoretical level,” says Johannsen.
He cites weather forecasts as an example of this trend:
“In this field, there has been a particularly big improvement in the implementation and analysis of Big Data in the models meteorologists use to forecast the weather.”
Good, important climate data
In the climate field in particular, Uni Research Computing is sitting on what Johannsen describes as the best set of data for wind conditions in the North Sea.
“We have 25 terabytes of data that can tell the wind power industry where it would be advantageous to establish wind farms, where there are wind shadows and how the weather conditions could develop over ten years. We can help them to examine whether it makes financial sense to invest in wind farms. Researchers may have their theories, but industry wants results and does not need to concern itself with the way things work at the theoretical level,” says Johannsen.
He also has 800 terabytes of climate data that can be run through Hadoop.
“This is data that will be relevant to the IPCC, political decision-makers and private industry. We are also working on projects in the humanities and the social sciences,” says Johannsen.
Problem and solution – at the same time
In the future, he intends to focus on developing technical solutions, programs, data processing and new applications for Big Data. With large data volumes, much of the job is finding the needle in the haystack. But what do you do if you don’t know what the needle looks like?
“One problem is that you want to find relevant information, but there is so much data that you can’t. The next problem is knowing what actually counts as relevant information; even if you see it, you don’t necessarily know whether it is important.
“That is why we have to work with Big Data in such a way that we look for both the problem and the solution at the same time. You just don’t know which you will find first,” says Johannsen.
June 22, 2015, 3:38 p.m.