Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/32399

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Liu, Y | - |
| dc.contributor.author | Li, M | - |
| dc.contributor.author | Khan, M | - |
| dc.contributor.author | Qi, M | - |
| dc.date.accessioned | 2025-11-24T15:36:00Z | - |
| dc.date.available | 2025-11-24T15:36:00Z | - |
| dc.date.issued | 2014-06-24 | - |
| dc.identifier | ORCiD: Maozhen Li https://orcid.org/0000-0002-0820-5487 | - |
| dc.identifier.citation | Liu, Y. et al. (2014) 'A MapReduce based distributed LSI for scalable information retrieval', Computing and Informatics, 33 (2), pp. 259 - 280. Available at: https://www.cai.sk/ojs/index.php/cai/article/view/995 | en_US |
| dc.identifier.issn | 1335-9150 | - |
| dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/32399 | - |
| dc.description.abstract | Latent Semantic Indexing (LSI) has been widely used in information retrieval due to its efficiency in solving the problems of polysemy and synonymy. However, LSI is notably a computationally intensive process because of the computing complexities of singular value decomposition and filtering operations involved in the process. This paper presents MR-LSI, a MapReduce based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is first evaluated in a small scale experimental cluster environment, and subsequently evaluated in large scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, the overhead of the MR-LSI algorithm is reduced significantly while maintaining a high level of accuracy in retrieving documents of user interest. A genetic algorithm based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments in which the computing nodes have varied resources. | en_US |
| dc.description.sponsorship | This research is partially supported by the 973 project on Network Big Data Analytics No. 2014CB340404. | en_US |
| dc.format.extent | 259 - 280 | - |
| dc.format.medium | Print-Electronic | - |
| dc.language.iso | en_US | en_US |
| dc.publisher | Slovak Academy of Sciences | en_US |
| dc.rights | Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International | - |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | - |
| dc.source.uri | https://www.cai.sk/ojs/index.php/cai/article/view/995 | - |
| dc.subject | information retrieval | en_US |
| dc.subject | latent semantic indexing | en_US |
| dc.subject | MapReduce | en_US |
| dc.subject | load balancing | en_US |
| dc.subject | genetic algorithms | en_US |
| dc.title | A MapReduce based distributed LSI for scalable information retrieval | en_US |
| dc.type | Article | en_US |
| dc.relation.isPartOf | Computing and Informatics | - |
| pubs.issue | 2 | - |
| pubs.publication-status | Published | - |
| pubs.volume | 33 | - |
| dc.rights.license | https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.en | - |
| dc.rights.holder | Slovak Academy of Sciences | - |

Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| FullText.pdf | Copyright © 2014 Slovak Academy of Sciences. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/). | 1.12 MB | Adobe PDF | |
This item is licensed under a Creative Commons License