``Desktop Scientific Experiment Management''
Prof. Yannis Ioannidis
University of Athens, Department of Informatics
Panepistimioupolis, TYPA Buildings
157-71 Ilisia, Athens, Hellas (Greece)
E-mail: yannis@cs.wisc.edu

Prof. Miron Livny
University of Wisconsin
Department of Computer Sciences
Madison, WI 53706
E-mail: miron@cs.wisc.edu

Motivation

In the past few years, several scientific communities have initiated ambitious, wide-ranging projects in their disciplines. The NASA Eos effort and the NIH Human Genome project are two examples of such national and international scientific endeavors. A major part of these projects is the collection of huge amounts of data (sometimes measured in petabytes) on complex phenomena. Managing this surge of scientific data poses significant challenges, many of which cannot be effectively addressed by existing database technology. This has resulted in much research activity in the area of Scientific Database Systems.

Nevertheless, little attention has so far been devoted to the needs of small teams of scientists who perform individual experimental studies in their laboratories. In particular, a major problem facing many experimental scientists is the lack of experiment management tools that are powerful enough to capture the complexity of their experiments yet natural and intuitive to the non-expert. A small laboratory that can easily generate and store several megabytes of data per day still depends on the good old paper notebook to keep track of those data.

In addition to generating a significant amount of data during its course, an experimental study often involves accessing many data-providing ``systems'', including simulation tools, laboratory equipment, statistical analysis packages, visualization packages, administrative tools, and others. They typically form a heterogeneous collection of legacy systems whose interactions capture much of the experiment flow of the study.

To address the above challenges, what is needed is a management system offering high-level modeling of and access to the data generated by an experimental study, integrated access to all external data-providing ``systems'', the ability to plug in any such system without touching it, and an advanced conceptual interface that hides the input/output details of these systems.

The ZOO Project

Over the past four years, with NSF funding (partly within the ``Scientific Databases Initiative''), we have collaborated with several domain scientists, studied the needs of a wide range of experimental disciplines, developed solutions to some of the basic problems in experiment management, and made significant progress towards implementing a simple Desktop Experiment Management Environment (DEME) called Zoo. (For further information, visit our web site http://www.cs.wisc.edu/ZOO.) Our work has proceeded in a tight loop between developing generic experiment management technology, implemented in a generic tool, Zoo, and installing customized enhancements of the tool, which constitute complete Customized Desktop Experiment Management Systems (CDEMSs), in laboratories of interest. (We use the term `laboratory' to indicate any scientific environment where experiments are conducted, be it a physical laboratory in the traditional sense, a virtual laboratory involving scientists collaborating across the network, a simulation-based modeling environment, etc.) The following are the main features of the system:

  1. A generic architecture that captures the essence of experimental studies in arbitrary disciplines.
  2. An open architecture that permits its customization to specific studies or environments with little effort, offering tools for connecting the system to many external applications.
  3. An object-based database server that supports a high-level data model and query language, captures and leads the entire experimental process, can deal with some `status' queries for in-flight experiments, and offers many desirable features of complex scientific workflows.
  4. A modular translator of data from their conceptualization and representation in the database to those required by the experimentation environment (typically flat files) and vice-versa.
  5. A uniform visual interface that can be used throughout the life-cycle of experimental studies.

A key characteristic that brings many of the above together is that the conceptual/logical schema of the experiment data has been placed at the center of Zoo. Whether the activity is designing the study, invoking an experiment, querying the data, or analyzing query results, a (visual) representation of the schema is used to perform the activity. The schema essentially captures the study itself, including its data and all forms of metadata.
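
To make the role of the translator (feature 4) more concrete, the following is a minimal sketch of a schema-driven, two-way mapping between a database object and a flat file. It is an illustration only and not Zoo's actual implementation: the class PlantGrowthRun, its fields, the functions to_flat_file and from_flat_file, and the one-line-per-attribute file layout are all assumptions made for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class PlantGrowthRun:
    """Hypothetical schema class; the attribute names are illustrative only."""
    run_id: int
    soil_type: str
    rainfall_mm: float


def to_flat_file(obj) -> str:
    """Write one 'name value' line per schema attribute (assumed file layout)."""
    lines = [f"{f.name} {getattr(obj, f.name)}" for f in fields(obj)]
    return "\n".join(lines) + "\n"


def from_flat_file(text: str, cls):
    """Rebuild an object of class cls from 'name value' lines."""
    raw = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, value = line.split(maxsplit=1)
        raw[name] = value
    # The schema (the declared attribute types) drives the conversion.
    kwargs = {f.name: f.type(raw[f.name]) for f in fields(cls)}
    return cls(**kwargs)


run = PlantGrowthRun(run_id=7, soil_type="sandy loam", rainfall_mm=12.5)
text = to_flat_file(run)                      # what an external simulator would read
restored = from_flat_file(text, PlantGrowthRun)  # what the database would store
```

In this sketch both directions are derived from the class definition alone, so a change to the schema changes the file mapping with it. A real external system would generally dictate its own file format, and the mapping would then have to be specified per system.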

Zoo has been successfully tested by a collaborating group in Soil Sciences for plant-growth simulation experiments. Zoo has been customized to communicate with several simulators used for such experiments, and the resulting CDEMS has been used to drive test runs on them. Zoo has also been tested by a collaborating group in Biochemistry for NMR experiments run on spectrometers, again with very positive results.

Research Challenges

Our experience with installed software, tested and evaluated in real-life settings, has shown that one of the biggest challenges for experiment management systems like Zoo lies in their user interface. The open problems here include schema visualization, visual query languages, dynamic queries, approximate query answers, incremental queries, visual specification of the translation between internal objects and external files, and others. Another major challenge lies in dealing with arbitrary external systems. Especially critical are the semantic integration of the ``schemas'' of external software, the handling of interactive (as opposed to batch) external systems, and the querying of external systems while they are running to obtain useful status information. Finally, modeling advanced experiment flows remains an elusive goal, one that is nevertheless very important for scientific experiments.