Quality of Information (QoI): Mitigating the Lack of Authority Control
James C. French
University of Virginia, School of Engineering and Applied Science
Department of Computer Science
Charlottesville, VA 22903
E-mail: french@cs.virginia.edu
URL: http://www.cs.virginia.edu/brochure/profs/french.html
Linking disparate information resources into a cohesive knowledge base requires solving a number of vexing interoperability problems. Even if we adopt unambiguous standards for required metadata and agree completely on their semantics, we still face a serious problem in resolving individual field values. At issue is the quality of information (QoI).
Integrated research environments for scientists and engineers will face many problems in integrating fielded information. Consider bibliographic records as an example. Here we have fields (attributes) such as title, author names, author affiliations, journal name, and date of publication. If we were to merge multiple bibliographic databases from several sources, we would have to grapple directly with the issue of object identity: when do two records denote the same object? It may be quite difficult, indeed undecidable in general, to make these associations. However, when domain knowledge is introduced, the problem becomes more tractable.
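To make the object-identity question concrete, the following Python sketch encodes two small pieces of bibliographic domain knowledge as rules: titles match after case and punctuation are normalized, and author names match when the surnames agree and one set of initials is a prefix of the other. The `BibRecord` type, the helper names, and both rules are hypothetical illustrations, not a method described in this paper.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BibRecord:
    """Hypothetical fielded bibliographic record (only a few fields shown)."""
    title: str
    authors: tuple[str, ...]  # e.g. ("Smith, John A.", "J. Doe")
    journal: str
    year: int


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def name_parts(name: str) -> tuple[str, str]:
    """Split a personal name into (surname, initials).

    Assumes either 'Surname, Given Names' or 'Given Names Surname' order.
    """
    if "," in name:
        surname, given = name.split(",", 1)
    else:
        *given_words, surname = name.split()
        given = " ".join(given_words)
    initials = "".join(w[0] for w in re.findall(r"[A-Za-z]+", given)).upper()
    return surname.strip().lower(), initials


def names_compatible(a: str, b: str) -> bool:
    """Domain rule: same surname, and one set of initials is a prefix of the other."""
    sur_a, ini_a = name_parts(a)
    sur_b, ini_b = name_parts(b)
    return sur_a == sur_b and (ini_a.startswith(ini_b) or ini_b.startswith(ini_a))


def same_object(r1: BibRecord, r2: BibRecord) -> bool:
    """Decide object identity from field-level rules.

    Journal names are deliberately ignored: abbreviations vary too widely.
    """
    return (
        r1.year == r2.year
        and normalize_title(r1.title) == normalize_title(r2.title)
        and len(r1.authors) == len(r2.authors)
        and all(names_compatible(a, b) for a, b in zip(r1.authors, r2.authors))
    )


# Two hypothetical records for the same paper, written differently.
a = BibRecord("Clustering Author Affiliations", ("Smith, John A.", "Doe, Jane"),
              "J. Ex. Stud.", 1997)
b = BibRecord("Clustering author affiliations.", ("J. Smith", "J. Doe"),
              "Journal of Example Studies", 1997)
print(same_object(a, b))  # True
```

Such rules make the decision tractable but not certain: `J. Smith` may still be a different person from `John A. Smith`, so a rule-based match is best read as evidence rather than proof.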
In earlier work [3,2,1,4], we applied lexical methods and cluster analysis to this problem and obtained excellent results. However, it has become clear that we are at the limit of what syntactic methods alone can achieve. To make further progress we must introduce additional knowledge into the process: first general knowledge, and ultimately very specific domain knowledge. We are now considering approaches that combine evidence to improve belief, and these appear promising.
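One simple instance of a lexical method is sketched below, assuming normalized Levenshtein distance between field values and single-link clustering with a hypothetical threshold of 0.2. The distance measures and clustering procedures used in [3,2,1,4] may differ; the function names and the affiliation strings are illustrative only.

```python
from __future__ import annotations

import re


def normalize(s: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", s.lower()).split())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def distance(a: str, b: str) -> float:
    """Edit distance of the normalized strings, scaled to [0, 1]."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster(values: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Single-link clustering: link every pair within `threshold`, return components."""
    parent = list(range(len(values)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if distance(values[i], values[j]) <= threshold:
                parent[find(i)] = find(j)  # union the two components

    groups: dict[int, list[str]] = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())


affiliations = [
    "University of Virginia",
    "Univeristy of Virginia",
    "University of Virginia.",
    "Virginia Tech",
]
print(cluster(affiliations))
# [['University of Virginia', 'Univeristy of Virginia', 'University of Virginia.'], ['Virginia Tech']]
```

The same sketch shows where purely syntactic matching runs out: `Univ. of Virginia` and `University of Virginia` are about 0.27 apart and stay in separate clusters at this threshold, although a human cataloger would merge them. Single-link clustering was chosen here for brevity; it can chain dissimilar strings together through a series of intermediate variants.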
Librarians attempt to solve this problem with a controlled vocabulary, a practice they call authority control. An authority file records every permissible string for a field. We believe that strict authority control will prove impossible in a dynamic information environment, and we view our approach to value resolution as mitigating the lack of authority control, a more realistic goal.
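The contrast between strict authority control and mitigating its absence can be sketched as two operations on a hypothetical `AuthorityFile` class: `lookup` accepts only strings already in the file, while `resolve` falls back to the closest authorized form and reports a similarity score that later processing could treat as a degree of belief. The class, the 0.85 cutoff, and the use of Python's `difflib` similarity ratio are assumptions for illustration only.

```python
from __future__ import annotations

import difflib
import re


def normalize(s: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", s.lower()).split())


class AuthorityFile:
    """Hypothetical authority file: the permissible forms for one field."""

    def __init__(self, authorized_forms: list[str]) -> None:
        # Index each authorized form by its normalized key.
        self._by_key = {normalize(f): f for f in authorized_forms}

    def lookup(self, value: str) -> str | None:
        """Strict authority control: accept only exact (normalized) matches."""
        return self._by_key.get(normalize(value))

    def resolve(self, value: str, cutoff: float = 0.85) -> tuple[str | None, float]:
        """Mitigation: fall back to the most similar authorized form.

        Returns (form, score): score is 1.0 for an exact match, the difflib
        similarity ratio for an approximate one, and 0.0 if nothing
        reaches `cutoff`.
        """
        exact = self.lookup(value)
        if exact is not None:
            return exact, 1.0
        key = normalize(value)
        close = difflib.get_close_matches(key, list(self._by_key), n=1, cutoff=cutoff)
        if not close:
            return None, 0.0
        score = difflib.SequenceMatcher(None, key, close[0]).ratio()
        return self._by_key[close[0]], score


authority = AuthorityFile([
    "University of Virginia",
    "Virginia Polytechnic Institute and State University",
])
print(authority.lookup("Univeristy of Virginia"))   # None
print(authority.resolve("Univeristy of Virginia"))  # ('University of Virginia', <score below 1.0>)
```

Returning a score rather than a bare accept/reject keeps uncertain resolutions visible, so they can be combined with other evidence instead of being silently rejected or silently accepted.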
This work was supported in part by DARPA contract N66001-97-C-8542, NSF grant CDA-9529253, and NASA GSRP NGT5-50062.