``Metadata from the Perspective of an Environmental
Scientist''
Prof. Francis Bretherton
University of Wisconsin--Madison
Space Science and Engineering Center
Madison, WI 53706
E-mail: fbretherton@ssec.wisc.edu
Background
My background is in the environmental sciences, centered on climate and
the atmosphere, with a concern for broad interdisciplinary issues such
as global climate change. Sound management and integration of
environmental information from very diverse sources is central to
meeting policy makers' concerns.
Problem Areas
-
Locating information, e.g. a web search for distributed technical
collections. There is no existing superstructure for Keyword Schemas,
though the FGDC content standards provide a start. We need standards
for registering and accessing specialized subsets and extensions
emerging from diverse disciplines, and a framework for negotiating
commonalities, synonyms, and homonyms among such subsets. The Dublin
Core is a must, but tools for implementing it unambiguously are
not quite there yet.
-
A critical requirement for understanding environmental change is
documenting existing data streams for long-term analysis. A good test
is to imagine what our successor scientists will think 20 years from
now, when they are trying to determine whether an environmental
variable has really changed, or whether apparent changes are due to
the way we encoded or processed the measurements. The only evidence
is the metadata record, and there are no insiders around to be asked.
This issue involves an open-ended class of users, and the biggest
problem is failure to record things that are assumed to be either
unimportant or generally known. The only strategy is:
Metadata content check lists + Scientific judgement
but there is little technical support for the activity. The
inadequacies of the past require very expensive and difficult data
archeology. With growing automation the problem is rapidly getting
worse, not better.
-
Documenting complex computer models. Environmental science involves
many high-volume observing systems such as satellites. Transforming
these observations into usable data products frequently involves
complex computer models (e.g. atmospheric general circulation
models). The documentation from such models is almost never adequate
to reconstruct the model from scratch, and even when a reconstruction
is possible it rarely produces the same results. We need a framework
for explicitly modeling the
science concepts in a language that links the conceptual level to
declarative data structures and standard algorithms. A fundamental
obstacle is the need to treat ``approximate equivalence'' of distinct
representations of real-world objects, as opposed to ``mathematical
equivalence'' expressed by the axioms of an equivalence relation. Such
science concept modeling will inevitably be local and incomplete,
raising downstream issues of merging independent, and possibly
inconsistent, sub-models.
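
One way to see why approximate equivalence resists the usual axioms is a
minimal sketch in Python. It assumes, purely for illustration, that two
representations count as approximately equivalent when their values agree
within a fixed absolute tolerance. The function name, the tolerance, and
the three temperature values are hypothetical and are not taken from any
real data set. Under that assumption the relation is reflexive and
symmetric but not transitive, so it cannot partition representations into
equivalence classes.

```python
import math


def approx_equal(a: float, b: float, tol: float = 0.5) -> bool:
    """Return True when a and b differ by at most tol (absolute tolerance)."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)


# Hypothetical estimates of one surface temperature (kelvin) from three
# independent sources. The values are invented for illustration only.
t_station = 288.0
t_satellite = 288.4
t_model = 288.8

# Reflexivity and symmetry hold for a tolerance-based comparison.
reflexive = approx_equal(t_station, t_station)
symmetric = (
    approx_equal(t_station, t_satellite)
    == approx_equal(t_satellite, t_station)
)

# Transitivity fails: each neighbouring pair agrees within the tolerance,
# but the two ends of the chain do not.
station_vs_satellite = approx_equal(t_station, t_satellite)
satellite_vs_model = approx_equal(t_satellite, t_model)
station_vs_model = approx_equal(t_station, t_model)

print(reflexive, symmetric)
print(station_vs_satellite, satellite_vs_model, station_vs_model)
```

A chain of pairwise acceptable matches can therefore link representations
that are not acceptable matches for each other, which is one reason why
merging independently built sub-models needs explicit rules rather than
simple identification of equivalent terms.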