``Frameworks for Distributed Query and Search in Scientific
Digital Libraries''
Dr. Jakka Sairamesh
IBM T. J. Watson Research
Hawthorne, NY, 10532
E-mail: jramesh@watson.ibm.com
URL: http://www.cs.columbia.edu/~jakka/home.html
Prof. Christos N. Nikolaou
University of Crete and ICS-FORTH
Leoforos Knossou, Heraklion, Crete 71409, Greece
E-mail: nikolau@ics.forth.gr
URL: http://www.ics.forth.gr/~nikolau/nikolau.html
We present an architecture for storing, managing and presenting geographic and coastal information of various kinds to a variety of users. The primary goal is to provide transparent access to information stored in spatially distributed databases. We first present a scenario and then the architecture, which is based on OMG CORBA services and object frameworks for access control and repository services. This work grew out of the work described in [1,2].
Introduction
Suppose we have an extensive database of the physical, chemical and biological properties of the coastal region under consideration. This database includes the bathymetry of the region and various physical, chemical and biological properties of the water column. These properties include currents, wave spectra, wind spectra, salinity, temperature, and chemical and biological concentrations [1,2].
A typical query of the database might be phrased as follows: ``Find the region of 3D space, within the given coastal region and time interval, in which the concentration of a certain chemical or microorganism may exceed a certain value.'' Scientists may be interested in questions of this form in order to better understand the physical and chemical processes in a coastal region. Local civil authorities may be interested in issuing permits for fishing or in declaring certain coastal zones to be health risks, inappropriate for tourism, swimming, etc.
More generally, interrogation of the database can be thought of as a query of the type ``find the subset of a given set containing the points with a specified property.'' From a set-theoretic point of view, this operation computes the intersection of two sets, each characterized by a logical property, so that both properties hold simultaneously for every element of the result. This Boolean intersection is one of the most common database interrogations and may require time-consuming searches in very large databases. Here the sets involved are not amorphous clouds of discrete records but rather connected, smoothly shaped high-dimensional objects that represent certain multivariate functions.
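The intersection view of the query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `concentration` function is a made-up smooth field standing in for measured or modelled data, the integer grid is a coarse discretization of what the text describes as continuous objects, and the names `query_region` and `in_box` are hypothetical.

```python
from itertools import product

def concentration(x, y, z, t):
    """Hypothetical smooth field; a stand-in for real measured or modelled data."""
    return 10.0 / (1.0 + x * x + y * y + z * z) * (1.0 + 0.1 * t)

def query_region(xs, ys, zs, ts, in_region, threshold):
    """Return the sample points that lie in the region AND exceed the threshold."""
    points = list(product(xs, ys, zs, ts))
    # Set A: points inside the requested spatial region and time interval.
    set_a = {p for p in points if in_region(*p)}
    # Set B: points where the concentration exceeds the given value.
    set_b = {p for p in points if concentration(*p) > threshold}
    # The query answer is the Boolean intersection of the two sets.
    return set_a & set_b

grid = [-2, -1, 0, 1, 2]   # assumed sample coordinates along each axis
times = [0, 1]             # assumed sample times

def in_box(x, y, z, t):
    """Assumed region of interest: the half-space x >= 0 at or below z = 0."""
    return x >= 0 and z <= 0

hits = query_region(grid, grid, grid, times, in_box, threshold=4.0)
```

Enumerating every grid point is exactly the time-consuming search the text warns about; a practical system would presumably exploit the connectedness and smoothness of the sets instead of testing points one by one.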
Scientists who want to apply existing programs (algorithms or numerical techniques) to fresh data points collected by the sensors, and also to consult previous research papers and possibly some annotations, could pose very complex queries. From the user's point of view, the information system must provide transparent, integrated access to the existing programs or numerical techniques, databases, and documents. This could mean searching for existing programs (which are indexed by keywords) and applying them to new data that may be located elsewhere.
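As a rough sketch of the last point, the following Python fragment keeps a keyword index of analysis programs and applies every program found under a keyword to newly collected data. The registry, the program names and the salinity readings are all hypothetical; in the setting described above the programs and the data would live at different sites.

```python
from statistics import mean

# Hypothetical keyword index: keyword -> list of (program name, callable).
PROGRAM_INDEX = {}

def register(name, keywords, func):
    """Index an existing program under each of its keywords."""
    for kw in keywords:
        PROGRAM_INDEX.setdefault(kw.lower(), []).append((name, func))

def find_programs(keyword):
    """Return all programs indexed under the given keyword."""
    return PROGRAM_INDEX.get(keyword.lower(), [])

register("mean_value", ["average", "salinity"], mean)
register("peak_value", ["maximum", "salinity"], max)

# Made-up fresh sensor readings, assumed to have been fetched from another site.
fresh_salinity = [35.1, 34.8, 35.4]

# Apply every program found for the keyword to the new data.
results = {name: func(fresh_salinity) for name, func in find_programs("Salinity")}
```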
Architecture
The web provides a simple way to represent and access documents, but in a large distributed system whose components must work together to answer the kinds of queries mentioned above, issues of naming, access control, metadata and repositories naturally emerge. Although the individual technologies exist independently, an integrated solution for indexing, querying and presenting information objects customized to the various users of the system is still in its early stages [2].
Several efforts [1,2,3] are underway to solve these issues for a large spatially distributed database system. With the rapidly emerging Java-enabled technologies, access to various legacy databases and legacy systems is becoming feasible via the Web. In addition, the Web is steadily embracing object-oriented frameworks such as OMG CORBA for better services and flexibility. Our architecture leverages these new technologies; its basic components are described in the following section.
Description of Architecture
Clients submit simple or complex queries to the Digital Library system through a World-Wide Web interface, using Java-enabled browsers. The system is accessible to scientists, engineers, local authorities and system administrators, each with different access restrictions. User requests invoke various services, such as metadata, index and search services, to locate the objects that match the user query. Agents representing users fork sub-agents that search in parallel across the various databases (legacy and new) and collect and present the resulting information objects to the user. We assume that documents are stored in Digital Library or document databases, and that metadata services are provided to access the documents.
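The fork-and-collect behaviour of the agents might look roughly like the Python sketch below. It is not the CORBA-based implementation; it only illustrates the pattern, using threads from the standard library. The in-memory `DATABASES` dictionary, the record fields and the function names are assumptions standing in for remote legacy and new databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote databases (legacy and new).
DATABASES = {
    "legacy_bathymetry": [{"id": "b1", "region": "north_bay", "kind": "bathymetry"}],
    "sensor_archive": [
        {"id": "s1", "region": "north_bay", "kind": "salinity"},
        {"id": "s2", "region": "south_gulf", "kind": "salinity"},
    ],
    "document_library": [{"id": "d1", "region": "north_bay", "kind": "paper"}],
}

def sub_agent(db_name, region):
    """Search one database; tag each matching object with the database it came from."""
    return [dict(obj, source=db_name)
            for obj in DATABASES[db_name] if obj["region"] == region]

def user_agent(region):
    """Fork one sub-agent per database, run them in parallel, and merge the results."""
    with ThreadPoolExecutor(max_workers=len(DATABASES)) as pool:
        futures = [pool.submit(sub_agent, name, region) for name in DATABASES]
        merged = [obj for fut in futures for obj in fut.result()]
    # Sort so that the presentation order does not depend on thread timing.
    return sorted(merged, key=lambda obj: obj["id"])

found = user_agent("north_bay")
```

Access restrictions per user class are omitted here; they would naturally be checked inside each sub-agent before any objects are returned.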
In the scenario above, when a user selects a region of a map through the WWW browser, the coordinates of the region are used to index the appropriate information about the region. This implies a metadata service that maps regions of the map to the corresponding region information. Users can therefore zoom into a region of the map (or image) and query for various properties of the region or perform some operations online. The information about a region is likely to be dispersed across several database sites (for example, the detailed image of the region may be stored separately from the data objects). For this service, distributed search queries are sent to the various databases to obtain the objects. Metadata services describing the GIS objects are used to index and search for the appropriate GIS objects (multi-dimensional data and images).
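A minimal sketch of such a metadata lookup, assuming the selected region is an axis-aligned bounding box in longitude and latitude, is given below. The `BBox` type, the site names and the metadata records are hypothetical; the point is only that the lookup yields, per site, the kinds of objects to request in the subsequent distributed search.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    """Axis-aligned bounding box in degrees (an assumed region representation)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other):
        """True if the two boxes overlap or touch."""
        return (self.min_lon <= other.max_lon and other.min_lon <= self.max_lon
                and self.min_lat <= other.max_lat and other.min_lat <= self.max_lat)

# Hypothetical metadata records: (spatial extent, site holding the object, object kind).
METADATA = [
    (BBox(24.0, 35.0, 25.0, 35.5), "image-server", "satellite_image"),
    (BBox(24.5, 35.2, 26.0, 36.0), "data-server", "salinity_grid"),
    (BBox(20.0, 38.0, 21.0, 39.0), "data-server", "bathymetry"),
]

def lookup(selection):
    """Map a selected map region to the sites and object kinds that cover it."""
    plan = {}
    for extent, site, kind in METADATA:
        if extent.intersects(selection):
            plan.setdefault(site, []).append(kind)
    return plan

# The user zooms into a small region; the plan says which sites to query for what.
plan = lookup(BBox(24.8, 35.3, 25.2, 35.4))
```

A linear scan over the records is enough for a sketch; a real metadata service over many GIS objects would more plausibly use a spatial index.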
Conclusions
In this paper, we outline the issues in providing a distributed architecture for accessing multimedia information from spatially separated scientific databases. We provide an OMG-based object framework and architecture through which scientists, engineers and administrators can access the scientific databases.