|Title:||Optimisation of computing and networking resources of a Hadoop cluster based on software defined network|
|Keywords:||Big Data;Data centre network;Genetic algorithm;Genetic programming;Hadoop;MapReduce|
|Publisher:||Institute of Electrical and Electronics Engineers|
|Citation:||IEEE Access, 2018|
|Abstract:||In this paper, we discuss some challenges of the Hadoop framework. One of the main ones is the computing performance of Hadoop MapReduce jobs in terms of CPU, memory and hard disk I/O. Another is the networking side of a Hadoop cluster, especially for large-scale clusters with many switches and computing nodes, such as a data centre network. The configuration of Hadoop MapReduce parameters can have a significant impact on the computing performance of a Hadoop cluster, and all issues relating to Hadoop MapReduce parameter settings are addressed. Some significant Hadoop MapReduce parameters are tuned using a novel intelligent technique based on both genetic programming and a genetic algorithm, with the aim of optimising the performance of a Hadoop MapReduce job. The Hadoop framework has more than 150 configuration parameters, so setting them manually is not only difficult but also time consuming. Consequently, the above-mentioned algorithms are used to search for the optimum parameter values. A Software Defined Network (SDN) is also employed to improve the networking performance of a Hadoop cluster, thus accelerating Hadoop jobs. Experiments have been carried out on two typical Hadoop applications, Word Count and Tera Sort, using 14 virtual machines in both a traditional network and an SDN. The results for the traditional network show that, with 20 GB of input data and the Word Count application, our proposed technique improves MapReduce job performance by 69.63% and 30.31% compared with the default configuration and the Gunther approach, respectively. For the Tera Sort application, the performance of Hadoop MapReduce is improved by 73.39% and 55.93% compared with the default configuration and the Gunther approach, respectively.
Moreover, the experimental results in an SDN environment showed that the performance of a Hadoop MapReduce job is further improved, owing to the intelligent, centralised management that SDN provides. A further experiment evaluated the performance of Hadoop jobs on a large-scale cluster in an SDN-based data centre network, and the results revealed that it outperformed a conventional network.|
|Appears in Collections:||Dept of Electronic and Computer Engineering Research Papers|
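The abstract does not specify the chromosome encoding, genetic operators or objective function the authors used, so the following is only a generic genetic-algorithm sketch of searching Hadoop MapReduce parameter settings, not the paper's method. The four property names are standard Hadoop MapReduce configuration keys; the candidate value lists, the made-up `surrogate_cost` function (standing in for a measured or modelled job execution time) and the helper names are hypothetical and chosen purely for illustration.

```python
import random

# Real Hadoop MapReduce property names; the candidate values are illustrative only.
SEARCH_SPACE = {
    "mapreduce.task.io.sort.mb": [50, 100, 200, 300, 400],
    "mapreduce.task.io.sort.factor": [5, 10, 25, 50, 100],
    "mapreduce.job.reduces": [1, 2, 4, 8, 16],
    "mapreduce.map.output.compress": [False, True],
}
KEYS = list(SEARCH_SPACE)


def surrogate_cost(cfg):
    """Hypothetical stand-in for job execution time (lower is better).

    A real tuner would measure job run time or use a fitted model;
    this formula is made up so the example runs on its own.
    """
    cost = abs(cfg["mapreduce.task.io.sort.mb"] - 200) / 10
    cost += abs(cfg["mapreduce.task.io.sort.factor"] - 50) / 5
    cost += abs(cfg["mapreduce.job.reduces"] - 8)
    cost += 0 if cfg["mapreduce.map.output.compress"] else 5
    return cost


def random_config(rng):
    # One individual: a value drawn from each parameter's candidate list.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from either parent.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in KEYS}


def mutate(cfg, rng, rate=0.1):
    # Each parameter is resampled from its candidate list with probability `rate`.
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in cfg.items()}


def tournament(pop, rng, k=3):
    # Select the lowest-cost individual among k random draws.
    return min(rng.sample(pop, k), key=surrogate_cost)


def genetic_search(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    best = min(pop, key=surrogate_cost)
    for _ in range(generations):
        # Elitism: the current best configuration survives unchanged.
        children = [best]
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
        best = min(pop, key=surrogate_cost)
    return best, surrogate_cost(best)


if __name__ == "__main__":
    cfg, cost = genetic_search()
    for name, value in cfg.items():
        print(f"{name} = {value}")
    print("surrogate cost:", cost)
```

Because of elitism, the best cost never increases from one generation to the next. In a real setting each fitness evaluation would mean running (or modelling) a MapReduce job, which is why the population size and number of generations would need to be kept small.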
Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.