``Searching Information from Multiple Sources''
Prof. Clement Yu
University of Illinois at Chicago
Department of Electrical
Engineering and Computer Science
Chicago, IL 60607-7053
E-mail: yu@eecs.uic.edu
URL: http://www.eecs.uic.edu/eecspeople/yu.htm
Currently, an enormous amount of information is generated by numerous people. As a consequence, information is distributed and stored at various sites; the Internet is one such example. To retrieve useful information in response to a query, a naive method is to broadcast the query to all sites. The search engine at each site then processes the query and retrieves a set of documents. These documents are merged, and an appropriately ranked list of documents is presented to the user. This method does not use system resources efficiently, because the query may be sent to many sites that do not contain useful information, which wastes communication resources. In addition, the search engines at those sites must process the query, and they may even return some useless documents. Finally, the transmission of useless documents and their subsequent merging waste further system resources. To reduce this waste, the contents of each database are summarized by a representative. When a user submits a query, it is compared with the representatives of the databases, and the system estimates the number of documents in each database that are most similar to the query. Based on these estimates, only those databases that contain a sufficiently large number of the most similar documents are searched. The research issues we have been studying consist of: