SIDUS-AIR





Action and Intention Recognition in Human Interaction with Autonomous Systems

Project start: 1 April 2015
Project end: 31 March 2019
More info (PDF): [[media: | pdf]]
Contact: [[Josef Bigun]]
Application Area: [[]]

Involved internal personnel
Involved external personnel
Involved partners

Abstract

AIR is a five-year research project financed by the KK-stiftelsen. The project is carried out by researchers at Halmstad University, the University of Skövde, Örebro University, and RISE Viktoria (formerly Viktoria Swedish ICT), all in Sweden. The focus of the proposed distributed research environment is on action and intention recognition in human interaction with autonomous systems (or AIR, for short). More specifically, the focus is on the interaction of humans and autonomous systems that move in shared physical spaces.

The BIDAF project addresses challenges on several levels:

  • Platforms to store and process the data.
  • Machine learning algorithms to analyze the data.
  • High level tools to access the results.

All of these challenges must be addressed together in order to enable end users to successfully analyse massive data: (i) the hardware and platform level, with the capacity to collect, store, and process the necessary volumes of data in real time; (ii) machine learning algorithms to model and analyse the collected data; and (iii) high-level tools and functionality to access the results and to allow exploring and visualizing both the data and the models.

Challenge 1. To develop a computation platform suitable for machine learning of massive streaming and distributed data.

One of the important characteristics of Big Data is that it is often streaming, or at least constantly updated. It typically originates from a large number of distributed sources and is, like most real-world data, inherently noisy, vague, or uncertain. At the same time, due to its sheer size, a scalable framework for efficient processing is needed to take adequate advantage of it. However, today’s Big Data platforms are not well adapted to the specific needs of machine learning algorithms:

  • Current platforms lack functionality suitable for analysing real-time, streaming, and distributed data.
  • Machine learning requires storing and updating an internal model of the data, but current platforms lack suitable support for stateful computing (see the sketch after this list).
  • The advanced processing in machine learning requires a more flexible computational structure, for example iteration, than the map-reduce paradigm of Big Data platforms provides.
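
To make the stateful-computing point concrete, here is a minimal sketch (a generic illustration of ours, not part of any specific platform) of a single-pass streaming operator: it keeps O(1) internal state, in this case running statistics maintained with Welford's online algorithm, updates that state as each element arrives, and never revisits old data.

 class RunningStats:
     """Welford's online algorithm: mean and variance in one pass, O(1) state."""

     def __init__(self):
         self.n = 0
         self.mean = 0.0
         self.m2 = 0.0  # sum of squared deviations from the current mean

     def update(self, x):
         # The internal model (the state) is mutated as each element arrives;
         # no second pass over the stream is ever needed.
         self.n += 1
         delta = x - self.mean
         self.mean += delta / self.n
         self.m2 += delta * (x - self.mean)

     def variance(self):
         return self.m2 / (self.n - 1) if self.n > 1 else 0.0

 stats = RunningStats()
 for value in (3.1, 2.7, 3.5, 2.9):  # stand-in for an unbounded stream
     stats.update(value)
 print(stats.mean, stats.variance())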

Challenge 2. To develop machine learning algorithms suitable for handling both the opportunities and the challenges that come with the massive, distributed, and streaming data produced in society.

Much recent research in machine learning, as a means to automatically sift through large amounts of information, model it, and draw conclusions, is motivated by Big Data. Traditional machine learning algorithms are, however, suited to neither the opportunities nor the challenges that come with massive, distributed, and streaming data:

  • Many machine learning methods are designed for small training sets, trying to squeeze the maximum out of them, usually by iterating over the examples many times, and then using cross-validation schemes to evaluate the methods on, again, limited amounts of data. With larger, and especially streaming, data there should be no need to iterate over the same examples for training and validation (see the first sketch after this list).
  • A large class of successful machine learning algorithms is sample based (e.g., kernel density estimators and support vector machines), meaning that the model grows as more data arrives. This can quickly become infeasible, so there is a need for more compact models: ones that capture the essence of the data without out-of-control growth in size (second sketch below).
  • Most machine learning approaches assume a fixed training set, or possibly a batch-wise update scenario, where training and usage can be separated. When new data arrives continuously, and the underlying reality changes constantly, the models need to adapt gradually. When the knowledge is to be used by many users, or by users with varying interests, a single model is not enough; there is a need for methods that can learn many models at the same time, each capturing different aspects of the data, and combine them in flexible ways to provide up-to-date, relevant knowledge (third sketch below).
  • Data that comes from different sources and is of different types raises several uncertainty issues, such as the validity, precision, and bias of those sources. This again changes the analytics task in a qualitative way, and calls for principled methods to handle all these aspects throughout the full course of data processing. Machine learning algorithms need to take this uncertainty into account when creating models, but must also be capable of propagating it into their results (fourth sketch below).
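
For the first point, the sketch below shows single-pass learning with prequential ("test-then-train") evaluation: each example is scored once and then used once for training, so no cross-validation over a held-out set is needed. The use of scikit-learn's SGDClassifier and the synthetic stream are our assumptions for illustration, not choices made by the project.

 import numpy as np
 from sklearn.linear_model import SGDClassifier

 rng = np.random.default_rng(0)
 # Online logistic regression; the loss is named "log" in older scikit-learn versions.
 model = SGDClassifier(loss="log_loss")

 correct, seen = 0, 0
 for _ in range(5000):
     # Hypothetical stream: 2-D points labelled by which side of a line they fall on.
     x = rng.normal(size=(1, 2))
     y = np.array([int(x[0, 0] + x[0, 1] > 0)])

     # Prequential evaluation: score on the example first...
     if seen > 0:
         correct += int(model.predict(x)[0] == y[0])
     # ...then use the very same example exactly once for training.
     model.partial_fit(x, y, classes=[0, 1])
     seen += 1

 print("online accuracy:", correct / (seen - 1))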
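
For the second point, a classic way to keep a model's memory footprint fixed no matter how long the stream runs is reservoir sampling. The sketch below (a generic technique, not the project's method) maintains a uniform random sample of constant size on which a sample-based learner could then be built.

 import random

 def reservoir_sample(stream, budget, seed=0):
     """Keep a uniform random sample of fixed size from a stream of unknown length."""
     rng = random.Random(seed)
     sample = []
     for i, item in enumerate(stream):
         if i < budget:
             sample.append(item)
         else:
             # Each later item replaces a stored one with probability budget/(i+1),
             # so memory stays O(budget) however long the stream runs.
             j = rng.randrange(i + 1)
             if j < budget:
                 sample[j] = item
     return sample

 print(reservoir_sample(range(1_000_000), budget=100)[:5])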
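
For the third point, one well-known way to maintain several models at once and combine them flexibly is the weighted majority scheme: each model carries a weight that shrinks whenever it errs, so the combination drifts towards whichever model currently fits the changing reality. The two trivial "experts" below are hypothetical stand-ins that keep the sketch self-contained.

 def combine(predictions, weights):
     """Weighted vote over binary predictions."""
     ones = sum(w for p, w in zip(predictions, weights) if p == 1)
     return int(ones >= sum(weights) / 2)

 def reweight(predictions, weights, truth, beta=0.9):
     """Weighted majority update: shrink the weight of every model that was wrong."""
     return [w * (beta if p != truth else 1.0) for p, w in zip(predictions, weights)]

 experts = [lambda x: 1, lambda x: 0]          # two trivial stand-in models
 weights = [1.0, 1.0]

 stream = [(None, 1)] * 20 + [(None, 0)] * 20  # the true label flips mid-stream (drift)
 errors = 0
 for x, truth in stream:
     preds = [expert(x) for expert in experts]
     errors += int(combine(preds, weights) != truth)
     weights = reweight(preds, weights, truth)

 print("errors:", errors, "final weights:", [round(w, 3) for w in weights])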
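
For the fourth point, the simplest instance of propagating uncertainty through a computation is inverse-variance weighting when fusing readings of one quantity from sources of different (here assumed known) precision: the fused estimate comes with a variance of its own rather than the uncertainty being discarded. The numbers below are hypothetical.

 def fuse(estimates):
     """Inverse-variance weighting over (mean, variance) pairs, one per source.

     Returns (fused_mean, fused_variance), so the uncertainty is propagated
     into the result instead of being thrown away.
     """
     estimates = list(estimates)
     precision = sum(1.0 / var for _, var in estimates)
     mean = sum(m / var for m, var in estimates) / precision
     return mean, 1.0 / precision

 # Three hypothetical sources reporting the same quantity with different noise levels.
 print(fuse([(10.2, 0.5), (9.8, 1.0), (10.5, 2.0)]))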

Challenge 3. To provide analytics methodology and high-level interactive functionality, making the value in massive data more easily available to end users.

Big Data Analytics is capable of highlighting interesting aspects and discovering things of which users are completely unaware: detecting deviations, anomalies, and trends; analysing key values, relations, and co-occurrences; and making predictions (a small example of deviation detection follows below). A crucial aspect of Big Data Analytics is enabling end users to use machine learning solutions more effectively. On the one hand, unrealistic expectations have to be addressed by presenting the models, their quality, and the limits of their applicability more clearly. At the same time, the full capabilities of machine learning in the Big Data context need to be made available to those who can benefit from them the most. The solution is a combination of raising the abstraction level of machine learning algorithms, increasing interactivity, using better visualisation techniques, and engaging end users in the whole data analytics cycle.
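
As a small, concrete instance of the deviation detection mentioned above (our illustration, not a project deliverable), the sketch below flags stream values that lie more than a few running standard deviations from the running mean, computed in a single pass.

 import math

 def detect_deviations(stream, threshold=3.0):
     """Yield values further than `threshold` running standard deviations from the running mean."""
     n, mean, m2 = 0, 0.0, 0.0
     for x in stream:
         if n > 1:
             std = math.sqrt(m2 / (n - 1))
             if std > 0 and abs(x - mean) > threshold * std:
                 yield x
         # Welford update, as in the Challenge 1 sketch
         n += 1
         delta = x - mean
         mean += delta / n
         m2 += delta * (x - mean)

 normal = [10.0, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0]
 print(list(detect_deviations(normal + [25.0] + normal)))  # flags the 25.0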