Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/6702
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Cribbin, T | - |
dc.date.accessioned | 2012-09-21T15:13:22Z | - |
dc.date.available | 2012-09-21T15:13:22Z | - |
dc.date.issued | 2011 | - |
dc.identifier.citation | Journal of the American Society for Information Science and Technology, 62(6): 1188 - 1207, Jun 2011 | en_US |
dc.identifier.issn | 1532-2882 | - |
dc.identifier.uri | http://onlinelibrary.wiley.com/doi/10.1002/asi.21519/abstract | en |
dc.identifier.uri | http://bura.brunel.ac.uk/handle/2438/6702 | - |
dc.description | This is the post-print of the Article - Copyright © 2011 ASIS&T | en_US |
dc.description.abstract | Document similarity models are typically derived from a term-document vector space representation by comparing all vector-pairs using some similarity measure. Computing similarity directly from a ‘bag of words’ model can be problematic because term independence causes the relationships between synonymous and related terms and the contextual influences that determine the ‘sense’ of polysemous terms to be ignored. This paper compares two methods that potentially address these problems by modelling the higher-order relationships that lie latent within the original vector space. The first is latent semantic analysis (LSA), a dimension reduction method which is a well known means of addressing the vocabulary mismatch problem in information retrieval systems. The second is the lesser known, yet conceptually simple approach of second-order similarity (SOS) analysis, where similarity is measured in terms of profiles of first-order similarities as computed directly from the term-document space. Nearest neighbour tests show that SOS analysis produces similarity models that are consistently better than both first-order and LSA derived models at resolving both coarse and fine level semantic clusters. SOS analysis has been criticised for its cubic complexity. A second contribution is the novel application of vector truncation to reduce the run-time by a constant factor. Speed-ups of four to ten times are found to be easily achievable without losing the structural benefits associated with SOS analysis. | en_US |
dc.language.iso | en | en_US |
dc.publisher | American Society for Information Science and Technology | en_US |
dc.title | Discovering latent topical structure by second-order similarity analysis | en_US |
dc.type | Article | en_US |
dc.identifier.doi | http://dx.doi.org/10.1002/asi.21519 | - |
pubs.organisational-data | /Brunel | - |
pubs.organisational-data | /Brunel/Brunel Active Staff | - |
pubs.organisational-data | /Brunel/Brunel Active Staff/School of Info. Systems, Comp & Maths | - |
pubs.organisational-data | /Brunel/Brunel Active Staff/School of Info. Systems, Comp & Maths/IS and Computing | - |
pubs.organisational-data | /Brunel/University Research Centres and Groups | - |
pubs.organisational-data | /Brunel/University Research Centres and Groups/School of Information Systems, Computing and Mathematics - URCs and Groups | - |
pubs.organisational-data | /Brunel/University Research Centres and Groups/School of Information Systems, Computing and Mathematics - URCs and Groups/Multidisciplinary Assessment of Technology Centre for Healthcare (MATCH) | - |
pubs.organisational-data | /Brunel/University Research Centres and Groups/School of Information Systems, Computing and Mathematics - URCs and Groups/People and Interactivity Research Centre | - |
Appears in Collections: | Publications; Computer Science; Dept of Computer Science Research Papers |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Fulltext.pdf | | 1.16 MB | Adobe PDF |
Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.
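
The second-order similarity (SOS) idea summarised in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes cosine as the similarity measure at both levels, and it assumes that "vector truncation" means keeping only the `k` largest entries of each first-order similarity profile; the abstract specifies neither. The function names and the toy term-document vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """All-pairs cosine similarity between the given vectors."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def truncate(profile, k):
    """Assumed truncation scheme: keep the k largest entries, zero the rest."""
    if k is None or k >= len(profile):
        return list(profile)
    order = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(profile)]


def second_order_similarity(doc_vectors, k=None):
    """Return (first_order, second_order) similarity matrices.

    First-order: similarity computed directly from the document vectors.
    Second-order: similarity between rows of the first-order matrix, i.e.
    between each document's profile of similarities to all documents.
    """
    first = similarity_matrix(doc_vectors)
    profiles = [truncate(row, k) for row in first]
    second = similarity_matrix(profiles)
    return first, second


# Toy documents over the terms [car, automobile, fruit] (hypothetical data).
docs = [
    [1, 0, 0],  # A: uses only "car"
    [0, 1, 0],  # B: uses only "automobile"
    [1, 1, 0],  # C: uses both terms
    [0, 0, 1],  # D: unrelated topic
]

first, second = second_order_similarity(docs)
print("first-order  sim(A, B):", first[0][1])
print("second-order sim(A, B):", second[0][1])
print("second-order sim(A, D):", second[0][3])
```

In this toy case documents A and B share no terms, so their first-order similarity is zero, yet both are similar to C, which gives them overlapping similarity profiles and a non-zero second-order similarity. D remains unrelated at both levels. Passing a value for `k` shortens the profiles being compared, which is where a run-time saving could come from under the truncation scheme assumed here.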