Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/7379
Title: Enhancing recall and precision of web search using genetic algorithm
Authors: Al-Dallal, Ammar Sami
Advisors: Shaker, R
El-Haddadeh, R
Keywords: Information retrieval;Genetic algorithm;HTML indexing;Document evaluation function;Web search
Issue Date: 2012
Publisher: Brunel University, School of Information Systems, Computing and Mathematics
Abstract: Due to rapid growth of the number of Web pages, web users encounter two main problems, namely: many of the retrieved documents are not related to the user query which is called low precision, and many of relevant documents have not been retrieved yet which is called low recall. Information Retrieval (IR) is an essential and useful technique for Web search; thus, different approaches and techniques are developed. Because of its parallel mechanism with high-dimensional space, Genetic Algorithm (GA) has been adopted to solve many of optimization problems where IR is one of them. This thesis proposes searching model which is based on GA to retrieve HTML documents. This model is called IR Using GA or IRUGA. It is composed of two main units. The first unit is the document indexing unit to index the HTML documents. The second unit is the GA mechanism which applies selection, crossover, and mutation operators to produce the final result, while specially designed fitness function is applied to evaluate the documents. The performance of IRUGA is investigated using the speed of convergence of the retrieval process, precision at rank N, recall at rank N, and precision at recall N. In addition, the proposed fitness function is compared experimentally with Okapi-BM25 function and Bayesian inference network model function. Moreover, IRUGA is compared with traditional IR using the same fitness function to examine the performance in terms of time required by each technique to retrieve the documents. The new techniques developed for document representation, the GA operators and the fitness function managed to achieves an improvement over 90% for the recall and precision measures. And the relevance of the retrieved document is much higher than that retrieved by the other models. Moreover, a massive comparison of techniques applied to GA operators is performed by highlighting the strengths and weaknesses of each existing technique of GA operators. Overall, IRUGA is a promising technique in Web search domain that provides a high quality search results in terms of recall and precision.
Description: This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
URI: http://bura.brunel.ac.uk/handle/2438/7379
Appears in Collections:Computer Science
Dept of Computer Science Theses

Files in This Item:
File Description SizeFormat 
FulltextThesis.pdf2.79 MBAdobe PDFView/Open


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.