A MapReduce-based parallel K-means clustering for large-scale CIM data verification

Deng, C; Liu, Y; Xu, L; Yang, J; Liu, J; Li, S; Li, M

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/11832

Full metadata record

DC Field	Value	Language
dc.contributor.author	Deng, C	-
dc.contributor.author	Liu, Y	-
dc.contributor.author	Xu, L	-
dc.contributor.author	Yang, J	-
dc.contributor.author	Liu, J	-
dc.contributor.author	Li, S	-
dc.contributor.author	Li, M	-
dc.date.accessioned	2016-01-13T10:38:41Z	-
dc.date.available	2015-08-28	-
dc.date.available	2016-01-13T10:38:41Z	-
dc.date.issued	2015	-
dc.identifier.citation	Concurrency Computation, 27, (6): pp.1375-1638, (2015)	en_US
dc.identifier.issn	1532-0626	-
dc.identifier.issn	1532-0634	-
dc.identifier.uri	http://onlinelibrary.wiley.com/doi/10.1002/cpe.3580/epdf	-
dc.identifier.uri	http://bura.brunel.ac.uk/handle/2438/11832	-
dc.description.abstract	The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With the rapid deployment of digitalized devices in electric power networks, the volume of data grows continuously, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model, which has been widely adopted for data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among heterogeneous computing nodes and further improve computation efficiency. The performance of the parallel K-means is first evaluated in a small-scale in-house MapReduce cluster and then in a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means significantly reduces CIM data-verification time compared with sequential K-means clustering, while achieving a high level of precision in data verification.	en_US
dc.description.sponsorship	National Science Foundation of China (no. 51437003) and the National Basic Research Program (973) of China (grant no. 2014CB340404)	en_US
dc.language.iso	en	en_US
dc.publisher	Wiley	en_US
dc.subject	CIM verification	en_US
dc.subject	Stochastic sampling	en_US
dc.subject	Clustering	en_US
dc.subject	MapReduce	en_US
dc.subject	Load balancing	en_US
dc.title	A MapReduce-based parallel K-means clustering for large-scale CIM data verification	en_US
dc.type	Article	en_US
dc.identifier.doi	http://dx.doi.org/10.1002/cpe.3580	-
dc.relation.isPartOf	Concurrency Computation	-
pubs.publication-status	Accepted	-
pubs.publication-status	Published	-
Appears in Collections:	Department of Electronic and Electrical Engineering Research Papers
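
The abstract above describes K-means built on the MapReduce model. As a rough illustration only, one K-means iteration can be phrased as a map step (assign each point to its nearest centroid and emit per-split partial sums) followed by a reduce step (merge the partial sums and recompute centroids). The sketch below is plain single-process Python, not the authors' implementation; the function names, the in-memory "splits", and the squared Euclidean distance are all assumptions made for the example, and the genetic algorithm-based load balancing is not covered.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (coordinate-wise sum, point count)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(split: Sequence[Point], centroids: Sequence[Point]) -> List[Tuple[int, Partial]]:
    """Hypothetical mapper: emit (cluster id, partial sum and count) for one data split."""
    partial: Dict[int, Partial] = {}
    for point in split:
        cid = nearest(point, centroids)
        total, count = partial.get(cid, ([0.0] * len(point), 0))
        partial[cid] = ([a + b for a, b in zip(total, point)], count + 1)
    return list(partial.items())


def reduce_phase(pairs: Sequence[Tuple[int, Partial]], centroids: Sequence[Point]) -> List[Point]:
    """Hypothetical reducer: merge partial sums per cluster and average them."""
    merged: Dict[int, Partial] = {}
    for cid, (total, count) in pairs:
        acc, n = merged.get(cid, ([0.0] * len(total), 0))
        merged[cid] = ([a + b for a, b in zip(acc, total)], n + count)
    updated = list(centroids)  # clusters that received no points keep their old centroid
    for cid, (total, count) in merged.items():
        updated[cid] = tuple(v / count for v in total)
    return updated


def kmeans_iteration(splits: Sequence[Sequence[Point]], centroids: Sequence[Point]) -> List[Point]:
    """One iteration: run the mapper on every split, then reduce all emitted pairs."""
    pairs = [kv for split in splits for kv in map_phase(split, centroids)]
    return reduce_phase(pairs, centroids)


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans_iteration(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce deployment each split would be processed on a separate node and the iteration repeated until the centroids stop moving; here the splits are simply lists processed in a loop.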

Files in This Item:

File	Description	Size	Format
Fulltext.pdf		665.78 kB	Adobe PDF