   CSIRO  |  SOLVE  | Issue 13  |  Jun 08  
 
ARTICLE
INFORMATION TECHNOLOGY:
Crunching the information explosion
By Dr Gio Braidotti
 

Fed up with seeing its science and business partners drown in electronic data, CSIRO set about finding how best to extract useful information from vast and growing data sets.

The computing super-snags arising from massive data sets concern neither hardware nor software. To CSIRO scientists, the problem sits uncomfortably between the two. It resolves into the need for intelligent ways to probe enormous quantities of complex data and make them intelligible in meaningful, accurate and useful ways.

Illustration: Paul Jareax

While scientists, with their passion for recording massive amounts of information, were the first to succumb to data overload, the business and government sectors have been rapidly catching up. Convinced the problem was only going to get worse, CSIRO realised it was in a rare position to develop solutions across a broad range of disciplines and sectors. In 2007, it set up Terabyte Science, a new research theme headed by Dr John Taylor.

“The number of hard disk drives being purchased is growing, so the ability to store data is keeping up,” Dr Taylor says. “Similarly, large parallel computers are offering processing speeds of 476 teraflops, which is very fast. So we know how to build the hardware … it's making sense of data that is lagging.”

The problem has two very different aspects. On one hand, Terabyte Science needs a true understanding of what is ‘meaningful’ to many different disciplines, since astronomy, finance, imaging-rich computing, genomics, environmental modelling and materials science all rely on subtle, specialist knowledge to determine what, in a data set, is significant. On the other hand, existing computing tools for analysing data and databases simply do not scale up easily to meet the needs of super-sized data sets, which presents a second, more technical, algorithmic side to the problem.

By drawing on CSIRO expertise across its broad spectrum of R&D fields, Dr Taylor is putting together a growing, multidisciplinary team of specialists in which to embed the mathematicians and IT experts. R&D is then undertaken in four areas: solving big computation issues associated with modelling, data streaming, imaging, and collaboration tools that allow people to work together, interacting and manipulating the same data-rich objects.

Despite the research theme being just one year old, Terabyte Science solutions have already been launched onto the market. Dr Taylor provides the example of a tool developed to analyse imaging data produced by phase-contrast X-ray imaging of wood structure. Such analysis is undertaken with a view to processing wood based on its underlying physical structure, in a bid to ensure uses that best suit the timber. Success stands to add considerable value to the industry.

“We developed a Windows-based software tool to help analyse the X-ray imaging data,” Dr Taylor says. “Running on clusters of two PC machines, with improvements to the original software, and using all 16 processing cores if necessary, it improves processing speeds by a factor of 360 over the original code. This represents a substantial improvement in productivity with opportunities to analyse larger and more complex images.”
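As a back-of-envelope reading of those figures (not a breakdown given by CSIRO): if the 16 cores could contribute at most a 16-fold gain, the rest of the 360-fold speed-up would have to come from the improvements to the software itself. The short Python sketch below works this out under the assumption of ideal linear scaling across cores; the function name is illustrative only.

```python
def minimum_software_gain(total_speedup: float, cores: int) -> float:
    """Lower bound on the speed-up attributable to code improvements.

    Assumption (not from the article): parallelism alone can deliver
    at most a `cores`-fold gain, i.e. ideal linear scaling.
    """
    return total_speedup / cores


# Figures quoted in the article: 360x overall, 16 processing cores.
print(minimum_software_gain(360, 16))  # 22.5
```

On that assumption, the reworked code would account for at least a 22.5-fold gain before any parallel processing is counted.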

A key issue with this and other projects is whether the solution ‘scales’. That is, if 100 processors are added to the cluster, will the algorithm actually run nearly 100 times faster? Recently, this issue was solved for the timber algorithm. That success bodes well for Terabyte Science, which is in discussion with the Australian Synchrotron to provide a large computer, with 126 cores or more, capable of reconstructing 3-D X-ray images in real time for researchers using the Imaging and Medical Therapy beamlines.
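One common way to reason about the scaling question is Amdahl's law, which limits the achievable speed-up by the share of the work that must still run serially. The Python sketch below is a generic illustration, not the method used by Terabyte Science; the parallel fractions are hypothetical values chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speed-up predicted by Amdahl's law for a fixed-size problem."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of the ideal (linear) speed-up actually achieved."""
    return speedup / processors


# Hypothetical parallel fractions, evaluated on 100 processors.
for fraction in (0.90, 0.99, 0.999):
    speedup = amdahl_speedup(fraction, 100)
    efficiency = parallel_efficiency(speedup, 100)
    print(f"parallel fraction {fraction:.3f}: "
          f"speed-up {speedup:.1f}x, efficiency {efficiency:.0%}")
```

Under this model, an algorithm that is 99 per cent parallel runs only about 50 times faster on 100 processors, so running ‘nearly 100 times faster’ requires almost all of the work to be spread across the processors.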

“What we are trying to provide is the ability to check the data researchers collect – on site and in real time – allowing them to verify that their experiments are working and make corrections if something is going wrong … something that is not currently possible,” Dr Taylor says.

Also under discussion is the possibility of contributing to the processing algorithms required by the Square Kilometre Array, the world's largest radio telescope project, which will be built in either Western Australia or South Africa.

Satellite data, environmental modelling and management tools for fisheries are also being investigated. However, the scope of Terabyte Science goes beyond solutions for imaging and modelling. Also of concern are data-streaming issues, such as constructing search engines that go beyond detecting a specific word to ‘understand’ its semantic context.

On a different front, there is a push to develop interactive computer technology that allows collaborators on a project to work together to manipulate the same data-rich objects in computationally complex ways. The project includes research on the psychology of interacting in computer environments.

“We are not interested in the issue of building hardware or software,” Dr Taylor says, “but of providing scalable algorithms that are effective at extracting and using meaningful data in situation sensitive ways.”

APPLICATION: Terabyte Science is addressing problems associated with analysing massive amounts of electronic data across scientific, business and government sectors.

BENEFIT: New methods, tools and interactive capabilities to facilitate the extraction of useful information from complex data sets.

 

For further information contact:
CSIRO Enquiries
Email: Solve@csiro.au      Web: www.csiro.au
Freecall: 1300 363 400       International: +61 3 9545 2176

Last Updated: June 17, 2008
© 2006 CSIRO Australia. For use of CSIRO material contact solve@csiro.au
 