Human assessments of document similarity

Westerman, SJ; Cribbin, T; Collins, J

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/4376

Full metadata record

DC Field	Value	Language
dc.contributor.author	Westerman, SJ	-
dc.contributor.author	Cribbin, T	-
dc.contributor.author	Collins, J	-
dc.date.accessioned	2010-05-27T09:01:45Z	-
dc.date.available	2010-05-27T09:01:45Z	-
dc.date.issued	2010	-
dc.identifier.citation	Journal of the American Society for Information Science and Technology, 61(8): 1535-1542, Aug 2010	en
dc.identifier.issn	1532-2882	-
dc.identifier.uri	http://bura.brunel.ac.uk/handle/2438/4376	-
dc.identifier.uri	http://onlinelibrary.wiley.com/doi/10.1002/asi.21361/abstract	en
dc.description.abstract	Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n-gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for future development of document visualization systems.	en
dc.language.iso	en	en
dc.publisher	Wiley-Blackwell	en
dc.title	Human assessments of document similarity	en
dc.type	Research Paper	en
dc.identifier.doi	http://dx.doi.org/10.1002/asi.21361	-
Appears in Collections:	Computer Science; Department of Computer Science Research Papers
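
The abstract describes comparing n-gram automatic text analysis (ATA) with human similarity ratings but does not say how the n-gram scores were computed. The following is only a minimal sketch of one common approach, not the authors' method. It assumes character n-grams, cosine similarity between n-gram count profiles, and Pearson correlation as the measure of association; the function names (`ngram_profile`, `ngram_similarity`, `pearson`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text, n):
    """Count overlapping character n-grams (assumed unit; the record does not specify)."""
    cleaned = " ".join(text.lower().split())  # lower-case and collapse whitespace
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text shorter than n has no n-grams
    return dot / (norm_a * norm_b)


def ngram_similarity(doc_a, doc_b, n=3):
    """Similarity of two documents for a given n-gram length n."""
    return cosine_similarity(ngram_profile(doc_a, n), ngram_profile(doc_b, n))


def pearson(xs, ys):
    """Pearson correlation, e.g. between mean human ratings and n-gram scores."""
    count = len(xs)
    mean_x, mean_y = sum(xs) / count, sum(ys) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Under these assumptions, one could score every document pair at several values of `n`, correlate each set of scores with the averaged human ratings using `pearson`, and compare the results to see how n-gram length affects the strength of association for a given kind of text.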

Files in This Item:

File	Description	Size	Format
Fulltext.pdf		251.64 kB	Adobe PDF