``Metadata from the Perspective of an Environmental Scientist''
Prof. Francis Bretherton
University of Wisconsin--Madison
Space Science and Engineering Center
Madison, WI 53706
E-mail: fbretherton@ssec.wisc.edu

Background

My background is in the environmental sciences, centered on climate and the atmosphere but with concern for broad interdisciplinary issues such as Global Climate Change. Sound management and integration of environmental information from very diverse sources is central to meeting policy makers' concerns.

Problem Areas

  1. Locating information, e.g. through a web search of distributed technical collections. There is no existing superstructure for keyword schemas, though the FGDC content standards provide a start. We need standards for registering and accessing specialized subsets and extensions emerging from diverse disciplines, and a framework for negotiating commonalities, synonyms, and homonyms among such subsets. The Dublin Core is a must, but tools for implementing it unambiguously are not quite there yet.
  2. A critical requirement for understanding environmental change is documenting existing data streams for long-term analysis. A good test is to imagine what our successor scientists will think 20 years from now, when they are trying to determine whether an environmental variable has really changed, or whether apparent changes are due to the way we encoded or processed the measurements. The only evidence is the metadata record, and there are no insiders around to be asked.

    This issue involves an open-ended class of users, and the biggest problem is failure to record things that are assumed to be either unimportant or generally known. The only strategy is:

    Metadata content check lists + Scientific judgement

    but there is little technical support for the activity. The inadequacies of the past require very expensive and difficult data archeology. With growing automation the problem is rapidly getting worse, not better.

  3. Documenting complex computer models. Environmental science involves many high-volume observing systems such as satellites. Transforming these observations into usable data products frequently involves complex computer models (e.g. atmospheric general circulation models). The documentation of such models is almost never adequate to reconstruct the model from scratch, and even a careful reconstruction rarely reproduces the original results. We need a framework for explicitly modeling the science concepts in a language that links the conceptual level to declarative data structures and standard algorithms. A fundamental obstacle is the need to treat ``approximate equivalence'' of distinct representations of real-world objects, as opposed to ``mathematical equivalence'' expressed by axioms of equivalence classes. Such science concept modeling will inevitably be local and incomplete, raising downstream issues of merging independent, and possibly inconsistent, sub-models.
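The synonym and homonym negotiation called for in item 1 can be illustrated with a small registry. This is a minimal sketch, assuming each discipline binds its keywords to shared concept identifiers; the `KeywordRegistry` class, its methods, the discipline names, and the concept IDs are all hypothetical and are not part of the FGDC standards or the Dublin Core. ``Precipitation'' serves as the example homonym, since a meteorologist and a chemist mean different things by it.

```python
from collections import defaultdict


class KeywordRegistry:
    """Hypothetical registry of discipline-specific keyword subsets.

    Each discipline binds its terms to shared concept identifiers. Terms
    bound to the same concept are synonyms; one term bound to different
    concepts by different disciplines is a homonym.
    """

    def __init__(self):
        # term -> {discipline: concept_id}
        self._bindings = defaultdict(dict)

    def register(self, discipline, term, concept_id):
        self._bindings[term][discipline] = concept_id

    def concept(self, discipline, term):
        """Concept a term denotes within one discipline, or None if unregistered."""
        return self._bindings.get(term, {}).get(discipline)

    def synonyms(self, discipline, term):
        """Other terms, in any discipline, bound to the same concept."""
        target = self.concept(discipline, term)
        if target is None:
            return set()
        return {
            other
            for other, by_discipline in self._bindings.items()
            if other != term and target in by_discipline.values()
        }

    def homonyms(self):
        """Terms that different disciplines bind to different concepts."""
        return {
            t: dict(by_discipline)
            for t, by_discipline in self._bindings.items()
            if len(set(by_discipline.values())) > 1
        }


registry = KeywordRegistry()
registry.register("oceanography", "sea surface temperature", "concept:sst")
registry.register("atmosphere", "SST", "concept:sst")
registry.register("atmosphere", "precipitation", "concept:precip")
registry.register("chemistry", "precipitation", "concept:solid-from-solution")

print(registry.synonyms("oceanography", "sea surface temperature"))  # {'SST'}
print(sorted(registry.homonyms()))                                   # ['precipitation']
```

Keying each binding by discipline means a homonym never silently overwrites another discipline's meaning; it is surfaced for negotiation instead.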
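Of the two terms in item 2's strategy, the ``check lists'' half is the part that can be mechanized. A minimal sketch, assuming a metadata record is a flat dictionary; the checklist items, field names, and sample record are illustrative only, chosen to target details that insiders tend to treat as generally known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    key: str       # field expected in the metadata record
    question: str  # prompt for the scientist reviewing the record


# Illustrative checklist (hypothetical items): details that insiders tend
# to treat as unimportant or generally known.
CHECKLIST = [
    ChecklistItem("units", "What physical units are the values in?"),
    ChecklistItem("missing_value_flag", "How are missing observations encoded?"),
    ChecklistItem("time_reference", "Which time zone or time standard is used?"),
    ChecklistItem("instrument_changes", "Were instruments replaced or recalibrated?"),
    ChecklistItem("processing_version", "Which software version produced the data?"),
]


def _answered(value):
    """True if a field holds a non-blank value."""
    return value is not None and str(value).strip() != ""


def review(record):
    """Return the checklist questions that a metadata record leaves unanswered.

    This is a purely mechanical presence check; whether a recorded answer is
    adequate still requires scientific judgement.
    """
    return [item.question for item in CHECKLIST if not _answered(record.get(item.key))]


# Hypothetical record with one blank field and one missing field.
record = {
    "units": "K",
    "missing_value_flag": -9999,
    "time_reference": "  ",
    "processing_version": "v2.1",
}

for question in review(record):
    print("UNANSWERED:", question)
```

The function only reports what is missing or blank; deciding whether a recorded answer would satisfy a scientist 20 years from now remains the judgement half of the strategy.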
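Item 3's contrast between ``approximate equivalence'' and ``mathematical equivalence'' can be made concrete. An equivalence relation must be reflexive, symmetric, and transitive, which is what lets it partition objects into equivalence classes. A tolerance-based comparison is reflexive and symmetric but not transitive, so a chain of ``close enough'' representations can drift apart. The sketch below assumes a fixed absolute tolerance of 0.5 and three made-up values; both are illustrative, not drawn from any real model.

```python
import math
from itertools import product


def approx_equal(a, b, tol=0.5):
    """Tolerance-based 'approximate equivalence'; tol is an assumed threshold."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)


def is_transitive(values, rel):
    """True if rel(a, b) and rel(b, c) imply rel(a, c) for every triple."""
    return all(
        rel(a, c)
        for a, b, c in product(values, repeat=3)
        if rel(a, b) and rel(b, c)
    )


# Three hypothetical representations of one temperature (deg C), e.g.
# produced by slightly different processing chains.
values = [10.0, 10.4, 10.8]

print(approx_equal(10.0, 10.4), approx_equal(10.4, 10.8))  # True True
print(approx_equal(10.0, 10.8))                            # False
print(is_transitive(values, approx_equal))                 # False
print(is_transitive(values, lambda a, b: a == b))          # True
```

Because the tolerance relation is not transitive, grouping representations by closeness depends on the order in which they are compared, which hints at why merging independently built sub-models is hard.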