Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30912
Title: Retrieve and refine: exemplar-based neural comment generation
Authors: Wei, B
Li, Y
Li, G
Xia, X
Jin, Z
Keywords: comment generation;deep learning
Issue Date: 21-Sep-2020
Publisher: Association for Computing Machinery (ACM)
Citation: Wei, B. et al. (2020) 'Retrieve and refine: exemplar-based neural comment generation', ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Virtual Event, Melbourne, Australia, 21-25 September, pp. 349 - 360. doi: 10.1145/3324884.3416578.
Abstract: Code comment generation, which aims to automatically generate natural language descriptions for source code, is a crucial task in the field of automatic software development. Traditional comment generation methods use manually-crafted templates or information retrieval (IR) techniques to generate summaries for source code. In recent years, neural network-based methods, which leverage the encoder-decoder deep learning framework to learn comment generation patterns from a large-scale parallel code corpus, have achieved impressive results. However, these emerging methods take only code-related information as input. Software reuse is common in software development, meaning that comments of similar code snippets are helpful for comment generation. Inspired by IR-based and template-based approaches, in this paper we propose a neural comment generation approach in which the existing comments of similar code snippets serve as exemplars to guide comment generation. Specifically, given a piece of code, we first use an IR technique to retrieve a similar code snippet and treat its comment as an exemplar. We then design a novel seq2seq neural network that takes the given code, its AST, the similar code, and the exemplar as input, and leverages the information from the exemplar to assist in generating the target comment, based on the semantic similarity between the source code and the similar code. We evaluate our approach on a large-scale Java corpus containing about 2M samples, and experimental results demonstrate that our model outperforms the state-of-the-art methods by a substantial margin.
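For illustration, the retrieval stage of the retrieve-and-refine pipeline described above can be sketched in Python as follows, using a simple Jaccard token-overlap similarity as a stand-in for the paper's IR technique. The toy corpus and the tokenize/retrieve_exemplar helpers are hypothetical, and the final seq2seq step is only indicated in a comment rather than implemented as in the paper.

# Minimal sketch of the retrieve step: find the most similar code snippet in a
# parallel (code, comment) corpus and reuse its comment as an exemplar.
# Jaccard token overlap stands in for the IR technique used in the paper.

def tokenize(code: str) -> set[str]:
    """Crude lexical tokenization of a code snippet into a set of tokens."""
    for sep in "(){};,":
        code = code.replace(sep, " ")
    return set(code.split())

def retrieve_exemplar(query_code: str, corpus: list[tuple[str, str]]) -> tuple[str, str, float]:
    """Return (similar_code, exemplar_comment, similarity) for the closest corpus entry."""
    query_tokens = tokenize(query_code)
    best = ("", "", 0.0)
    for code, comment in corpus:
        tokens = tokenize(code)
        sim = len(query_tokens & tokens) / max(len(query_tokens | tokens), 1)
        if sim > best[2]:
            best = (code, comment, sim)
    return best

# Hypothetical usage with a toy corpus:
corpus = [
    ("int add(int a, int b) { return a + b; }", "Returns the sum of two integers."),
    ("int max(int a, int b) { return a > b ? a : b; }", "Returns the larger of two integers."),
]
similar_code, exemplar, score = retrieve_exemplar("int sum(int x, int y) { return x + y; }", corpus)
# In the refine stage, (query code, its AST, similar_code, exemplar) would be fed to the
# seq2seq network, with the exemplar's influence weighted by the similarity score.

The design point illustrated here is that the exemplar comment is an additional input whose usefulness depends on how close the retrieved code is to the query, which is why the similarity score is returned alongside the exemplar.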
Description: The accepted manuscript is available at arXiv, arXiv:2010.04459v1 [cs.SE] (https://doi.org/10.48550/arXiv.2010.04459). Comments: to be published in the Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE '20).
URI: https://bura.brunel.ac.uk/handle/2438/30912
DOI: https://doi.org/10.1145/3324884.3416578
ISBN: 978-1-4503-6768-4
Other Identifiers: ORCiD: Yongmin Li https://orcid.org/0000-0003-1668-2440
arXiv:2010.04459v1 [cs.SE]
Appears in Collections:Dept of Computer Science Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © 2020 Association for Computing Machinery (ACM). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org (see: https://authors.acm.org/author-resources/author-rights).
Size: 929.24 kB
Format: Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.