Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/11317
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Li, M | -
dc.contributor.advisor | Meng, H | -
dc.contributor.author | Cui, Jianbin | -
dc.date.accessioned | 2015-09-04T15:27:31Z | -
dc.date.available | 2015-09-04T15:27:31Z | -
dc.date.issued | 2015 | -
dc.identifier.uri | http://bura.brunel.ac.uk/handle/2438/11317 | -
dc.description | This thesis was submitted for the degree of Master of Philosophy and awarded by Brunel University London | en_US
dc.description.abstract | The rapid development of Internet and cloud computing technologies has led to the explosive generation and processing of huge amounts of data. The ever-increasing data volumes bring great value to society, but they also pose a number of challenges. Data mining techniques have been widely used for decision analysis in finance, medicine, management, business and many other fields. However, analysing and mining valuable information from massive data has become a crucial problem, as traditional methods can hardly achieve high scalability in data processing. Recently, MapReduce has emerged as a major programming model for big data analytics. Apache Hadoop, an open-source implementation of MapReduce, has been widely taken up by the community. Hadoop facilitates the utilization of a large number of inexpensive commodity computers. In addition, Hadoop provides support for handling faults, which is especially useful for long-running jobs. Mahout is a new open-source Apache project that provides a number of machine learning and data mining algorithms based on the Hadoop platform. As a machine learning technique, K-means has been widely used for clustering in data analytics. However, K-means incurs high computational overhead when the data set to be analysed is large. This thesis parallelizes K-means using the MapReduce model and implements a parallel K-means with Mahout on the Hadoop platform. The parallel K-means reduces computation time significantly compared with the standard K-means when dealing with a large data set. In addition, this thesis evaluates the impact of Hadoop parameters on the performance of the Hadoop framework. | en_US
dc.language.iso | en | en_US
dc.publisher | Brunel University London | en_US
dc.relation.uri | http://bura.brunel.ac.uk/bitstream/2438/11317/1/Jianbin%20Cui-1326088.pdf | -
dc.subject | Cloud computing | en_US
dc.subject | Parallel computing | en_US
dc.subject | Parallel k-means algorithm | en_US
dc.subject | Hadoop parameter | en_US
dc.subject | Map reduce | en_US
dc.title | Parallelizing k-means with hadoop/mahout for big data analytics | en_US
dc.type | Thesis | en_US
Appears in Collections: Electronic and Computer Engineering
Dept of Electronic and Electrical Engineering Theses

Files in This Item:
File | Description | Size | Format
Jianbin Cui-1326088.pdf |  | 1.03 MB | Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.