It’s not a data deluge – it’s worse than that

Bibliographic Details
Main Author: Stewart, Craig A.
Format: Conference Object
Language: English
Published: 2010
Subjects:
Online Access: http://hdl.handle.net/2022/13195
Description
Summary: Keynote presentation at the Third International Workshop on Data Intensive Distributed Computing (DIDC'10), held in conjunction with HPDC'10, Chicago, IL. IU was among the many organizations that developed the phrase “data deluge” to describe the prodigious capabilities of digital instruments to produce data. A deluge calls to mind an extremely heavy rain, or maybe being drenched by a large wave. Unfortunately, the situation we have is worse than that. The new capabilities of next-generation sequencing machines and digital video, together with the ability of scientists to put high-output devices in remote locations, make the data issue far more challenging than it has ever been. This talk focuses on two general areas of handling data issues: wide area filesystems and the movement of data across long distances; and the challenges of data management when data production rates simply exceed the capabilities of the network connecting the source to analysis facilities. Examples will be drawn from use of the IU Data Capacitor, now the most widely used globally accessible file system in the history of the TeraGrid, and from field studies with data sources ranging from the Antarctic ice cap to African villages to telescopes on remote mountains. Some successes and many emerging challenges will be discussed.