Optimization of Computing and Networking Resources of a Hadoop Cluster Based on Software Defined Network

Khaleel, A; Al-Raweshidy, H

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/17072

Title:	Optimization of Computing and Networking Resources of a Hadoop Cluster Based on Software Defined Network
Authors:	Khaleel, A; Al-Raweshidy, H
Keywords:	big data;data centre network;genetic algorithm;genetic programming;hadoop;MapReduce
Issue Date:	17-Oct-2018
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Citation:	Khaleel, A. and Al-Raweshidy, H. (2019) 'Optimization of Computing and Networking Resources of a Hadoop Cluster Based on Software Defined Network', IEEE Access, 6, pp. 61351 - 61365. doi: 10.1109/ACCESS.2018.2876385.
Abstract:	In this paper, we discuss some challenges regarding the Hadoop framework. One of the main ones is the computing performance of Hadoop MapReduce jobs in terms of CPU, memory, and hard disk I/O. The networking side of a Hadoop cluster is another challenge, especially for large-scale clusters with many switch devices and computing nodes, such as a data center network. The configuration of Hadoop MapReduce parameters can have a significant impact on the computing performance of a Hadoop cluster. All issues relating to Hadoop MapReduce parameter settings are addressed. Some significant parameters of Hadoop MapReduce are tuned using a novel intelligent technique based on both genetic programming and a genetic algorithm, with the aim of optimizing the performance of a Hadoop MapReduce job. The Hadoop framework has more than 150 configuration parameters, and hence setting them manually is not only difficult but also time-consuming. Consequently, the above-mentioned algorithms are used to search for the optimum values of parameter settings. A software-defined network (SDN) is also employed to improve the networking performance of a Hadoop cluster, thus accelerating Hadoop jobs. Experiments have been carried out on two typical Hadoop applications, Word Count and Tera Sort, using 14 virtual machines in both a traditional network and an SDN. The results for the traditional network show that, for 20 GB of data, our proposed technique improves MapReduce job performance with the Word Count application by 69.63% and 30.31% compared with the default settings and the Gunther work, respectively. For the Tera Sort application, the performance of Hadoop MapReduce is improved by 73.39% and 55.93% compared with the default settings and the Gunther work, respectively.
Moreover, the experimental results in an SDN environment showed that the performance of a Hadoop MapReduce job is further improved due to the advantages of the intelligent and centralized management achieved...
URI:	https://bura.brunel.ac.uk/handle/2438/17072
DOI:	https://doi.org/10.1109/ACCESS.2018.2876385
Other Identifiers:	ORCiD: Ali Khaleel https://orcid.org/0000-0003-2521-3996 ORCiD: Hamed Al-Raweshidy https://orcid.org/0000-0002-3702-8192
Appears in Collections:	Department of Electronic and Electrical Engineering Research Papers
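
Illustrative note: the abstract describes searching for good MapReduce parameter values with evolutionary algorithms. The sketch below shows only the general shape of a plain genetic algorithm over a small configuration space. It is not the authors' technique, which also involves genetic programming and is not reproduced here. The three parameter names are real Hadoop settings, but their selection, the value ranges, and the `synthetic_job_time` cost function are assumptions made purely for illustration; in practice the cost would come from running and timing an actual job.

```python
import random

# Assumed search space: real Hadoop parameter names, hypothetical ranges.
SEARCH_SPACE = {
    "mapreduce.task.io.sort.mb": (50, 500),
    "mapreduce.task.io.sort.factor": (10, 100),
    "mapreduce.job.reduces": (1, 32),
}


def synthetic_job_time(config):
    """Stand-in cost; NOT a Hadoop performance model (assumption)."""
    targets = {
        "mapreduce.task.io.sort.mb": 300,
        "mapreduce.task.io.sort.factor": 60,
        "mapreduce.job.reduces": 16,
    }
    # Sum of squared, range-normalised distances from arbitrary targets.
    return sum(
        ((config[k] - targets[k]) / (hi - lo)) ** 2
        for k, (lo, hi) in SEARCH_SPACE.items()
    )


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each parameter within its range with probability `rate`."""
    child = dict(config)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[k] = rng.randint(lo, hi)
    return child


def tournament(population, cost, rng, k=3):
    """Pick the cheapest of k randomly sampled individuals."""
    return min(rng.sample(population, k), key=cost)


def genetic_search(cost, generations=40, pop_size=30, seed=0):
    """Minimise `cost` over SEARCH_SPACE with a simple elitist GA."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the best individual over unchanged.
        children = [min(population, key=cost)]
        while len(children) < pop_size:
            parent_a = tournament(population, cost, rng)
            parent_b = tournament(population, cost, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    return min(population, key=cost)


if __name__ == "__main__":
    best = genetic_search(synthetic_job_time)
    print(best, synthetic_job_time(best))
```

Because the best individual is always carried into the next generation, the best cost found never gets worse from one generation to the next. Replacing `synthetic_job_time` with a function that submits a job and returns its measured runtime would turn the sketch into an actual, if slow, tuning loop.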

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2019 The Author(s). Published under license by Institute of Electrical and Electronics Engineers (IEEE). This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/	1.6 MB	Adobe PDF

This item is licensed under a Creative Commons License