``Quality of Information (QoI): Mitigating the Lack of Authority Control''
James C. French
University of Virginia, School of Engineering and Applied Science
Department of Computer Science
Charlottesville, VA 22903
E-mail: french@cs.virginia.edu
URL: http://www.cs.virginia.edu/brochure/profs/french.html

Linking disparate information resources into a cohesive knowledge base involves solving a number of vexing interoperability issues. Even if we adopt unambiguous standards for required metadata and agree completely on the semantics, we still have a serious problem in resolving individual field values. The quality of information (QoI) is the issue.

Integrated research environments for scientists and engineers will face many problems related to the integration of fielded information. Consider bibliographic records as an example. Here we have fields (attributes) such as title, author names, author affiliations, journal name, and date of publication. If we were to merge bibliographic databases from several sources, we would have to grapple directly with the issue of object identity: when do two records denote the same object? It might be quite difficult, indeed undecidable in general, to make these associations. However, when domain knowledge is introduced the problem becomes more tractable.

In earlier work [1,2,3,4], we applied lexical methods and cluster analysis to this problem with excellent results. However, it has become clear that we are at the limit of what can be achieved by syntactic methods alone. To make further progress we must introduce additional knowledge into the process, first general knowledge and finally very specific domain knowledge. We are now considering approaches that combine evidence to improve belief, and these appear promising.
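As a rough illustration of what a purely lexical approach to value resolution can look like, the following Python sketch groups variant spellings of a field value using normalized edit distance and a greedy single-pass grouping. It is not the method of [1,2,3,4]; the function names, the 0.2 threshold, and the sample affiliation strings are all assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " "
                   for ch in value.lower())
    return " ".join(kept.split())


def similar(a: str, b: str, threshold: float = 0.2) -> bool:
    """True if the length-normalized edit distance is within the threshold.

    The threshold is an illustrative assumption, not a tuned value.
    """
    na, nb = normalize(a), normalize(b)
    longest = max(len(na), len(nb))
    if longest == 0:
        return True
    return edit_distance(na, nb) / longest <= threshold


def cluster_values(values, threshold: float = 0.2):
    """Greedy grouping: put each value in the first cluster that already
    contains a similar member, otherwise start a new cluster."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if any(similar(value, member, threshold) for member in cluster):
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


if __name__ == "__main__":
    # Hypothetical affiliation strings, for illustration only.
    affiliations = [
        "Univ. of Virginia",
        "Univ of Virginia",
        "Univ. of Virgina",
        "Harvard University",
    ]
    for cluster in cluster_values(affiliations):
        print(cluster)
```

On this sample the three Virginia variants fall into one group and the Harvard string into another. The sketch also shows where syntax alone runs out: an abbreviation such as "UVa" is lexically far from "University of Virginia", and relating the two requires knowledge that is not present in the strings themselves.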

Librarians attempt to solve this problem by use of a controlled vocabulary; they refer to this as authority control. An authority file is used to keep track of the strings permitted in a field. We believe that strict authority control will prove to be impossible in a dynamic information environment and view our approach toward value resolution as mitigating the lack of authority control, a more realistic goal.
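The contrast between strict authority control and its mitigation can be sketched in a few lines of Python. The authority file below is a hypothetical toy example, and the use of the standard library's `difflib.get_close_matches` with a cutoff of 0.8 is an assumption standing in for whatever matching method a real system would use.

```python
from difflib import get_close_matches

# Hypothetical authority file: canonical form -> permitted variant strings.
AUTHORITY_FILE = {
    "Astrophysical Journal": {"Astrophysical Journal", "Astrophys. J.", "ApJ"},
    "Astronomical Journal": {"Astronomical Journal", "Astron. J.", "AJ"},
}

# Inverted index: each permitted variant -> its canonical form.
VARIANT_INDEX = {
    variant: canonical
    for canonical, variants in AUTHORITY_FILE.items()
    for variant in variants
}


def strict_resolve(value: str):
    """Strict authority control: only strings already in the file resolve."""
    return VARIANT_INDEX.get(value)


def lenient_resolve(value: str, cutoff: float = 0.8):
    """Mitigation: fall back to the closest known variant, if close enough.

    The cutoff is an illustrative assumption, not a recommended value.
    """
    exact = VARIANT_INDEX.get(value)
    if exact is not None:
        return exact
    matches = get_close_matches(value, list(VARIANT_INDEX), n=1, cutoff=cutoff)
    return VARIANT_INDEX[matches[0]] if matches else None


if __name__ == "__main__":
    for raw in ["ApJ", "Astrophys J", "Nature"]:
        print(raw, "->", strict_resolve(raw), "|", lenient_resolve(raw))
```

Under strict control the unseen string "Astrophys J" fails to resolve, whereas the lenient lookup maps it to the nearest permitted variant. A value with no close variant, such as "Nature" here, remains unresolved in both cases, so the file would still have to grow as new values appear.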

  1. J. C. French, A. L. Powell, and W. R. Creighton III, ``Efficient Searching in Distributed Digital Libraries,'' Proceedings of the 3rd ACM International Conference on Digital Libraries (DL '98), Pittsburgh, PA, 23-26 June 1998.
  2. J. C. French, A. L. Powell, and E. Schulman, ``Applications of Approximate Word Matching in Information Retrieval,'' Proceedings of the 6th International Conference on Information and Knowledge Management (CIKM '97), pages 9-15, Las Vegas, Nevada, 10-14 November 1997.
  3. J. C. French, A. L. Powell, E. Schulman, and J. L. Pfaltz, ``Automating the Construction of Authority Files in Digital Libraries: A Case Study,'' in C. Peters and C. Thanos, eds., First European Conference on Research and Advanced Technology for Digital Libraries, volume 1324 of Lecture Notes in Computer Science, pages 55-71, Pisa, 1-3 September 1997. Springer-Verlag.
  4. E. Schulman, J. C. French, A. L. Powell, G. Eichhorn, M. J. Kurtz, and S. S. Murray, ``Trends in Astronomical Publication Between 1975 and 1996,'' Publications of the Astronomical Society of the Pacific, 109:1278-1284, 1997.

This work was supported in part by DARPA contract N66001-97-C-8542, NSF grant CDA-9529253, and NASA GSRP NGT5-50062.