Human assessments of document similarity

Westerman, S J; Cribbin, T; Collins, Julie

Westerman, S J, Cribbin, T and Collins, Julie (2010) Human assessments of document similarity. Journal of the American Society for Information Science and Technology, 61 (8). pp. 1535-1542. doi:10.1002/asi.21361

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1002/asi.21361

Abstract

Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n-gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for future development of document visualization systems.
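The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not state how n-grams were extracted or which similarity and correlation measures were used, so the choices below (character n-grams, cosine similarity over n-gram counts, Pearson correlation) are assumptions, and all function names are hypothetical. The sketch scores every document pair with an n-gram measure, then compares those scores with human ratings of the same pairs. It also estimates interassessor reliability as the mean correlation between pairs of raters.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def char_ngrams(text, n):
    """Count overlapping character n-grams (assumed unit; the paper may differ)."""
    s = " ".join(text.lower().split())  # lowercase and collapse whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pairwise_similarity(docs, n):
    """Similarity for every unordered document pair, in combinations() order."""
    profiles = [char_ngrams(d, n) for d in docs]
    return [cosine(profiles[i], profiles[j])
            for i, j in combinations(range(len(docs)), 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mean_interassessor_correlation(ratings):
    """Mean pairwise correlation between raters (one list of pair ratings each)."""
    pairs = list(combinations(ratings, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def mean_rating(ratings):
    """Average rating per document pair across all raters."""
    return [sum(col) / len(col) for col in zip(*ratings)]
```

Given a list of documents `docs` and a list `ratings` holding one list of pair ratings per assessor (pairs in the same order as `pairwise_similarity` produces), `pearson(pairwise_similarity(docs, n), mean_rating(ratings))` gives the association with the average human rating for a chosen n-gram length `n`. Repeating this over several values of `n` is one way to look for the length that fits a given type of text best.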

Item Type:	Article
Subjects:	B Philosophy. Psychology. Religion > BF Psychology
Divisions:	Schools and Research Institutes > School of Education, Health and Sciences
Research Priority Areas:	Health, Life Sciences, Sport and Wellbeing
Depositing User:	Julie Collins
Date Deposited:	19 Jan 2015 12:34
Last Modified:	05 Aug 2025 09:39
URI:	https://eprints.glos.ac.uk/id/eprint/1203
