Fast AI

Project department: Uni Research Computing (group: Center for Big Data Analysis)
Period: until 31.12.2016

About the project

Fast AI (or "Embarrassingly parallel execution framework") enables existing applications and pieces of code, such as shell scripts, Fortran math libraries, R scripts, or Weka, the popular Java-based collection of machine-learning algorithms for data mining, to be executed through the Spark framework on Hadoop clusters with little or no modification.
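
As an illustration of the idea, the following is a minimal sketch (not the project's actual interface) of how an unmodified command-line tool could be fanned out over a Hadoop/Spark cluster with Spark's standard RDD.pipe mechanism. The script name and HDFS paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: run an existing, unmodified shell script in parallel.
// "legacy_analysis.sh" is a hypothetical script that reads one work item
// per line on stdin and writes its result to stdout.
object PipeLegacyScript {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fast-ai-pipe-example")
      .getOrCreate()
    val sc = spark.sparkContext

    // Each input line is one independent work item (embarrassingly parallel).
    val inputs = sc.textFile("hdfs:///user/demo/work_items.txt")

    // RDD.pipe starts the external process on every executor and streams
    // the partition's lines through its stdin/stdout.
    val results = inputs.pipe("./legacy_analysis.sh")

    results.saveAsTextFile("hdfs:///user/demo/results")
    spark.stop()
  }
}
```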

In the case of Weka, for example, the idea is to provide customized Map and Reduce wrappers around Weka classes for the Apache Spark distributed computing framework, together with a set of utilities and a command-line interface that allow the code to be executed through Spark on a Hadoop cluster.
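
The sketch below illustrates the map/reduce-wrapper concept under simplifying assumptions; the object, helper, and path names are hypothetical and do not correspond to the project's actual wrappers. One unmodified Weka classifier is trained per data chunk (the "map" step), and the partial models are then combined on the driver into a voting ensemble (the "reduce" step).

```scala
import java.io.StringReader

import org.apache.spark.sql.SparkSession
import weka.classifiers.Classifier
import weka.classifiers.meta.Vote
import weka.classifiers.trees.J48
import weka.core.Instances

object WekaOnSparkSketch {
  // Hypothetical helper: parse one ARFF chunk (header + data rows) into a
  // Weka Instances object, using the last attribute as the class.
  def toInstances(arffText: String): Instances = {
    val data = new Instances(new StringReader(arffText))
    data.setClassIndex(data.numAttributes() - 1)
    data
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("weka-map-reduce-sketch").getOrCreate()
    val sc = spark.sparkContext

    // One ARFF chunk per RDD element (hypothetical input layout).
    val chunks = sc.wholeTextFiles("hdfs:///user/demo/arff_chunks/*.arff").values

    // "Map" step: build an unmodified Weka J48 tree on each chunk.
    val partialModels: Array[Classifier] = chunks.map { arff =>
      val tree = new J48()
      tree.buildClassifier(toInstances(arff))
      tree.asInstanceOf[Classifier]
    }.collect()

    // "Reduce" step: combine the partial models into a voting ensemble.
    val ensemble = new Vote()
    ensemble.setClassifiers(partialModels)

    println(s"Trained ${partialModels.length} partial J48 models")
    spark.stop()
  }
}
```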

The ability to integrate existing code into the Hadoop infrastructure without modifying it is highly valuable: it yields a substantial performance gain even if the code itself was not written for distributed computing. In this case the cluster acts as an HPC (High-Performance Computing) machine and parallelizes the execution by spawning the code across its nodes.
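
A second minimal sketch of this "cluster as HPC machine" usage, again with hypothetical names: Spark only schedules many independent runs of an untouched legacy executable across the cluster's nodes; the executable itself stays serial.

```scala
import org.apache.spark.sql.SparkSession
import scala.sys.process._

// Sketch only: a parameter sweep over an existing binary ("./legacy_solver",
// hypothetical) that takes one parameter and prints its result to stdout.
object ParameterSweep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("legacy-parameter-sweep").getOrCreate()
    val sc = spark.sparkContext

    // 1000 independent runs, spread over 100 Spark tasks.
    val parameters = sc.parallelize(1 to 1000, 100)

    // Each task shells out to the untouched legacy binary on its executor.
    val outputs = parameters.map { p =>
      Seq("./legacy_solver", p.toString).!!.trim
    }

    outputs.saveAsTextFile("hdfs:///user/demo/sweep_results")
    spark.stop()
  }
}
```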

Old pieces of code, for example, can still run more efficiently than on a single desktop machine, as can code whose users are unable to rewrite it specifically for distributed computing frameworks like Spark. Even in cases where the parallelization does not bring a substantial gain, the integration of the results, or of the code itself, into software running on the distributed framework is immediate. The power of having a "Big Data engine" in-house is thus fully exploited.


"Redesigning your application to run multithreaded on a multicore machine is a little like learning to swim by jumping into the deep end." - Herb Sutter, chair of the ISO C++ standards committee, Microsoft.

People

Research Topics
