21 January 2009

Data Analysis On Small Computers

A powerful yet compact computing tool that allows scientists to extract features and patterns from enormously large and complex sets of raw data has been developed by researchers at the University of California and Lawrence Livermore National Laboratory. The tool – a set of problem-solving calculations known as an algorithm – is compact enough to run on laptop computers with as little as two gigabytes of memory. The team that developed it has already used it to probe a slew of phenomena represented by billions of data points, including analyzing and creating images of flame surfaces; searching for clusters and voids in a virtual universe experiment; and identifying and tracking pockets of fluid in a simulated mixing of two fluids.

Computers are widely used to perform simulations of real-world phenomena and to capture the results of physical experiments and observations, storing this information as collections of numbers.

But as the size of these data sets has burgeoned, hand in hand with computer capacity, analysis has grown increasingly difficult. A mathematical tool for extracting and visualizing useful features from data sets has existed for nearly 40 years – in theory. Called the Morse-Smale complex, it partitions a data set into regions with similar features and encodes them in mathematical terms. But working with the Morse-Smale complex is not easy: it is computationally demanding, so applying it directly to very large data sets has been impractical.

The new algorithm makes the computation tractable by dividing the data set into parcels of cells and analyzing each parcel separately using the Morse-Smale complex. The results of those computations are then merged. As new parcels are created from merged parcels, they are analyzed and merged yet again. At each step, data that no longer need to be kept in memory are discarded, drastically reducing the computing power required to run the calculations.

One test of the algorithm was to analyze and track the formation and movement of pockets of fluid in the simulated mixing of two fluids, one dense and one light. This data set is so vast – more than one billion data points on a three-dimensional grid – that it challenges even supercomputers.
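The divide-analyze-merge-discard pattern described above can be illustrated with a short sketch. The Python below is not the researchers' Morse-Smale code; it applies the same streaming idea to a much simpler task – finding local extrema in a large one-dimensional signal – using a hypothetical parcel size, so that each pass holds only one parcel plus a small overlap in memory while everything else is thrown away.

    # Minimal sketch of parcel-wise analysis with bounded memory.
    # Feature extraction here is simple local-extrema detection, standing in
    # for the far richer Morse-Smale analysis described in the article.
    import numpy as np

    PARCEL_SIZE = 1_000_000  # samples analyzed at a time (hypothetical choice)

    def local_extrema(values, offset):
        """Return indices (in full-data coordinates) of local minima and maxima."""
        left, mid, right = values[:-2], values[1:-1], values[2:]
        is_min = (mid < left) & (mid < right)
        is_max = (mid > left) & (mid > right)
        return np.nonzero(is_min | is_max)[0] + 1 + offset

    def analyse_in_parcels(stream):
        """Stream over parcels, keeping only extrema plus a two-sample overlap."""
        extrema = []
        carry = np.empty(0)          # overlap so boundary extrema are not missed
        position = 0
        for parcel in stream:
            block = np.concatenate([carry, parcel])
            extrema.append(local_extrema(block, position - len(carry)))
            carry = block[-2:]       # keep two samples for the next boundary
            position += len(parcel)  # everything else is discarded here
        return np.concatenate(extrema) if extrema else np.empty(0, dtype=int)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = rng.standard_normal(5 * PARCEL_SIZE).cumsum()   # large 1-D signal
        parcels = (data[i:i + PARCEL_SIZE]
                   for i in range(0, len(data), PARCEL_SIZE))
        features = analyse_in_parcels(parcels)
        print(f"kept {len(features)} feature points out of {len(data)} samples")

In this toy version, only the current parcel and a two-sample overlap ever sit in memory at once, which is the same reason the real algorithm can handle billion-point data sets on a machine with a couple of gigabytes of RAM.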

More information:

http://www.sciencedaily.com/releases/2009/01/090108082531.htm