Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/4376

Title: Human assessments of document similarity
Authors: Westerman, S J
Cribbin, T
Collins, J
Publication Date: 2010
Publisher: Wiley-Blackwell
Citation: Journal of the American Society for Information Science and Technology. In press
Abstract: Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n-gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for future development of document visualization systems.
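
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' method: the record does not state which n-gram measure was used, so the sketch assumes overlapping character n-grams compared by cosine similarity, and Pearson correlation for the association with human ratings. The function names, the toy documents, and the ratings are all hypothetical and are not data from the studies.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def ngram_profile(text, n):
    """Count overlapping character n-grams (assumed measure, not from the paper)."""
    cleaned = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy documents, invented for illustration.
    docs = {
        "a": "The cache stores recently used pages in memory.",
        "b": "Recently used pages are kept in the memory cache.",
        "c": "We walked along the river until the sun went down.",
    }
    pairs = list(combinations(sorted(docs), 2))  # (a, b), (a, c), (b, c)

    # Hypothetical averaged human similarity ratings for the same pairs.
    human = [0.9, 0.1, 0.2]

    # Vary n to see how string length changes the association.
    for n in (2, 3, 4, 5):
        ata = [
            cosine_similarity(ngram_profile(docs[x], n), ngram_profile(docs[y], n))
            for x, y in pairs
        ]
        print(n, round(pearson(ata, human), 3))
```

Looping over several values of n mirrors the abstract's point that the strength of association depends on n-gram length; which length works best for a given kind of text is an empirical question that this toy example cannot answer.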
URI: http://dx.doi.org/10.1002/asi.21361
http://bura.brunel.ac.uk/handle/2438/4376
ISSN: 1532-2882
Appears in Collections: Information Systems and Computing
School of Information Systems, Computing and Mathematics Research Papers

Files in This Item:

File: Fulltext.pdf
Size: 251.64 kB
Format: Adobe PDF

Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.
