Buja and Alpert

``Computational and Large Data Issues in NCAR's Climate System Model''
Dr. Lawrence Buja and Dr. Ethan Alpert
National Center for Atmospheric Research
Boulder, CO 80307-3000
E-mail: southern@ncar.ucar.edu, ethan@ncar.ucar.edu
URL: http://www.cgd.ucar.edu/csm

The Climate System Model (CSM) project at the National Center for Atmospheric Research (NCAR) addresses one of the most difficult and urgent challenges in science today: understanding and predicting the Earth's climate, particularly climate variation and possible human-induced climate change. Its user community spans a wide range of domains, from the science arena to the classroom, and extends into the political circles of national policy making and societal impacts assessment.

As this model and its large data holdings are made available to its diverse community, a number of hurdles must be cleared. First, much of the existing CSM model output remains virtually inaccessible to all but those who are closely associated with the model development. Second, even when available, the data products themselves are in a discipline-specific form that inhibits easy cross-domain information exchange. Finally, although the CSM code itself is freely available, many researchers lack the computational resources necessary to carry out further scientific investigations with such a computationally intensive model.

At the National Center for Atmospheric Research, we are engaged in a number of efforts to address these issues.

Climate Model Data Access

The scientists who make up the CSM research community span many disciplines and are located at a variety of sites around the globe. Providing these researchers with fast, intuitive and secure remote access to the data for research and decision making is a huge challenge.

For typical climate modeling scenarios, the CSM produces tens of gigabytes of data. Further, the real-world observational datasets needed for model evaluation consist of terabytes of data. Current data access methods (e.g. FTP) are inadequate to supply these data products to the CSM user community. Even the Next Generation Internet will not come close to addressing the massive data delivery needs of the CSM.

There is a clear need to provide easy and open access to the CSM data archives at NCAR, allowing outside researchers to remotely analyze and reduce the data to manageable volumes. Current projects aimed at this include extending the development of the University of Rhode Island's Distributed Oceanographic Data System (DODS), evaluating the applicability of CORBA methodologies, and broadening the NCAR Command Language (NCL) to utilize the unique computational and storage resources of NCAR's supercomputing center.
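
To illustrate what "reduce the data to manageable volumes" can mean in practice, the following is a minimal Python sketch of a reduction that could run next to the archive, so that only a short time series has to cross the network. It is an assumption-laden illustration, not DODS, CORBA or NCL code: the function names `area_weighted_mean` and `reduce_on_server`, the cosine-of-latitude weighting, and the toy grid are all hypothetical.

```python
import math
from typing import List

def area_weighted_mean(field: List[List[float]], lats_deg: List[float]) -> float:
    """Reduce one 2-D (lat x lon) field to a single area-weighted mean."""
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats_deg):
        # Grid cells shrink toward the poles, so weight each row by cos(latitude).
        w = math.cos(math.radians(lat))
        total += w * sum(row)
        weight_sum += w * len(row)
    return total / weight_sum

def reduce_on_server(time_slices: List[List[List[float]]],
                     lats_deg: List[float]) -> List[float]:
    """Hypothetical server-side step: return one number per time slice."""
    return [area_weighted_mean(f, lats_deg) for f in time_slices]

# Toy data: two time slices on a 3 x 2 (lat x lon) grid.
lats = [-60.0, 0.0, 60.0]
slices = [
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [3.0, 3.0], [0.0, 0.0]],
]
series = reduce_on_server(slices, lats)
```

In this sketch the remote user receives one value per time slice instead of the full gridded field; the same pattern applies to any reduction (zonal means, regional subsets) that can be expressed where the data reside.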

Toward a Common Currency for Scientific Information Exchange

Even greater than distance, as a barrier to understanding interactions among complex systems, are the discipline-specific forms in which the products of advancing knowledge typically are stored, transmitted and processed. Data containing complex structures or large volumes of numerical values can only be represented and used in a computer, which has resulted in the development of many specialized software tools and applications for data analysis and processing. These tools are often highly focused within a specific discipline and often don't address functionality beyond the scope of the scientist's latest endeavor. In many ways, scientific discourse is limited because often only a select group of scientists have both access to the necessary computational resources and the software skills needed to work with digital data. This is compounded further when scientists from other disciplines wish to investigate cross-disciplinary data and aren't experts in the specific domain. What is needed is a common currency for scientific data exchange and analysis. A common currency would handle the details of translating data from its original discipline-specific form and representation into a variety of equivalent representations. A common currency would also facilitate common ways of specifying data reduction and processing tasks.

Defining a common currency for scientific data exchange presents many challenges. Systems that handle this common currency exchange would be widely distributed and would require discipline-specific intelligent "objects" to handle and sequence data conversion and processing. Additionally, user interfaces need to be flexible enough to allow a variety of users from different disciplines and skill levels to locate, understand and use the data effectively.
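
One way to picture the discipline-specific "objects" described above is as converters registered against a shared record type. The Python sketch below is only an illustration under stated assumptions: the record `CommonField`, the format names `ocean_celsius` and `atmos_kelvin`, and the input keys `sst` and `T` are hypothetical and do not correspond to any existing CSM data product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CommonField:
    """Hypothetical common-currency record: values plus self-describing metadata."""
    name: str
    units: str
    values: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)

# Registry mapping a discipline-specific format name to its converter.
_CONVERTERS: Dict[str, Callable[[dict], CommonField]] = {}

def register(fmt: str):
    """Decorator that records a converter for one source format."""
    def decorator(fn: Callable[[dict], CommonField]):
        _CONVERTERS[fmt] = fn
        return fn
    return decorator

@register("ocean_celsius")
def from_ocean(raw: dict) -> CommonField:
    # Assumed ocean product: temperature in degrees Celsius under the key "sst".
    kelvin = [c + 273.15 for c in raw["sst"]]
    return CommonField("temperature", "K", kelvin, {"source": "ocean_celsius"})

@register("atmos_kelvin")
def from_atmos(raw: dict) -> CommonField:
    # Assumed atmosphere product: temperature already in kelvin under the key "T".
    return CommonField("temperature", "K", list(raw["T"]), {"source": "atmos_kelvin"})

def to_common(fmt: str, raw: dict) -> CommonField:
    """Translate a discipline-specific record into the common representation."""
    try:
        converter = _CONVERTERS[fmt]
    except KeyError:
        raise ValueError(f"no converter registered for format {fmt!r}") from None
    return converter(raw)

ocean = to_common("ocean_celsius", {"sst": [0.0, 10.0]})
atmos = to_common("atmos_kelvin", {"T": [280.0]})
```

Once both records share one name and one unit, a user from either discipline can compare them without knowing how each was originally stored. A real system would also need the distribution and sequencing aspects discussed above, which this sketch leaves out.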

Much research is needed to begin to understand how to implement a common currency for scientific data. The following are key research questions:

Can sufficient abstractions of scientific data be developed to support the object-oriented classification of scientific data representations?
What types of meta-data are needed to determine computational semantics in an automated fashion?
Can extant data belong to multiple classes of data and still have a single format and original representation?
What types of interfaces (natural language, GUI) will best facilitate cross-disciplinary distributed data processing application development? Can these interfaces be sufficiently intuitive for scientific researchers?
How can classification of scientific data assist in locating data?
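
As a small illustration of the classification questions above, the Python sketch below assigns several class labels to a dataset purely from its meta-data, leaving the stored bytes untouched, and then uses those labels to locate data. The rule table `CLASS_RULES`, the labels, and the dimension names are hypothetical assumptions chosen for the example; they are not a proposed answer to the research questions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

@dataclass(frozen=True)
class Dataset:
    name: str
    dims: Tuple[str, ...]   # dimension names taken from the file's meta-data
    payload: bytes          # original representation, never rewritten

# Hypothetical rules: a label applies when all of its required dimensions are present.
CLASS_RULES: Dict[str, FrozenSet[str]] = {
    "time_series": frozenset({"time"}),
    "gridded": frozenset({"lat", "lon"}),
    "profile": frozenset({"depth"}),
}

def classify(ds: Dataset) -> FrozenSet[str]:
    """Return every class label whose required dimensions the dataset has."""
    present = set(ds.dims)
    return frozenset(label for label, needed in CLASS_RULES.items()
                     if needed <= present)

def find(catalog: List[Dataset], label: str) -> List[str]:
    """Locate datasets by class label rather than by file format."""
    return [ds.name for ds in catalog if label in classify(ds)]

catalog = [
    Dataset("sst", ("time", "lat", "lon"), b"\x00"),
    Dataset("mooring", ("time", "depth"), b"\x01"),
]
gridded_names = find(catalog, "gridded")
series_names = find(catalog, "time_series")
```

Here the `sst` dataset is both a time series and a gridded field while keeping a single stored representation, which is the situation the third question asks about.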

Distributed Climate Modeling

The CSM is a loose coupling of distinct, yet simultaneously executing, atmospheric, ocean, land surface and sea-ice climate models linked together by small, well-defined, inter-component interface exchanges. The complex nature and enormous resources consumed by climate system models of this class have generally limited their application to studies at coarse geographical resolutions carried out at national or international supercomputer centers.
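
The coupling pattern described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the CSM coupler: the classes `ToyAtmosphere` and `ToyOcean`, the exchanged field names, and the relaxation coefficients are invented for illustration, and the components are stepped in turn from a shared snapshot rather than truly executing simultaneously.

```python
from typing import Dict, List, Protocol

class Component(Protocol):
    """Interface every component exposes to the coupler."""
    name: str
    def step(self, imports: Dict[str, float]) -> Dict[str, float]: ...

class ToyAtmosphere:
    name = "atm"
    def __init__(self) -> None:
        self.air_temp = 280.0
    def step(self, imports: Dict[str, float]) -> Dict[str, float]:
        # Relax air temperature toward the sea surface temperature it receives.
        sst = imports.get("sst", self.air_temp)
        self.air_temp += 0.1 * (sst - self.air_temp)
        return {"air_temp": self.air_temp}

class ToyOcean:
    name = "ocn"
    def __init__(self) -> None:
        self.sst = 290.0
    def step(self, imports: Dict[str, float]) -> Dict[str, float]:
        # The ocean responds more slowly than the atmosphere.
        air = imports.get("air_temp", self.sst)
        self.sst += 0.01 * (air - self.sst)
        return {"sst": self.sst}

def run_coupled(components: List[Component], n_steps: int) -> Dict[str, float]:
    """Advance all components, exchanging only a small dictionary of fields."""
    exchange: Dict[str, float] = {}
    for _ in range(n_steps):
        # Every component reads the same snapshot, mimicking concurrent execution.
        snapshot = dict(exchange)
        for comp in components:
            exchange.update(comp.step(snapshot))
    return exchange

result = run_coupled([ToyAtmosphere(), ToyOcean()], 2)
long_run = run_coupled([ToyAtmosphere(), ToyOcean()], 50)
```

Because each component sees only the exchange dictionary, a component could in principle be replaced or run elsewhere without changing the others, which is the property the small, well-defined interface is meant to provide.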

The advent of high-speed research networks, powerful mid-level computers and high-capacity storage media provides the necessary infrastructure to conduct large climate simulations at widely distributed research centers. Spreading the large climate modeling computational load among a number of smaller research institutions would make it possible to extend the traditionally coarse climate modeling framework to interactively incorporate any number of fine-scale regional models at a scale impossible to achieve at a single supercomputer center. While proof-of-concept integrations show this is technically possible, arranging the necessary logistics and political cooperation to carry it through may be quite another matter.

Summary

Distributing a complex climate model and its large data archives to a diverse user community presents a number of challenges. We realize that these issues are by no means unique to the climate modeling community.

In the data arena, our basic premise is that distributed, object-oriented design and development methods, applied to suitable abstractions, can yield a common currency for open cross-disciplinary scientific and technical information analysis and exchange.

Computationally, a distributed modeling framework holds the promise of allowing widely separated research centers to coordinate their resources to carry out major climate simulations, releasing large-scale climate modeling from the tight confines of the major supercomputer centers and into a much wider and more diverse research community.