EditSum: A Retrieve-and-Edit Framework for Source Code Summarization

Li, JA; Li, Y; Li, G; Hu, X; Xia, X; Jin, Z

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30911

Title:	EditSum: A Retrieve-and-Edit Framework for Source Code Summarization
Authors:	Li, JA; Li, Y; Li, G; Hu, X; Xia, X; Jin, Z
Keywords:	code summarization;information retrieval;deep learning
Issue Date:	15-Nov-2021
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Citation:	Li, J.A. et al. (2021) 'EditSum: A Retrieve-and-Edit Framework for Source Code Summarization', 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), Melbourne, Australia / Virtual Event, 15-19 November, pp. 155 - 166. doi: 10.1109/ase51524.2021.9678724.
Abstract:	Existing studies show that code summaries help developers understand and maintain source code. Unfortunately, these summaries are often missing or outdated in software projects. Code summarization aims to automatically generate natural language descriptions of source code. According to Gros et al., code summaries are highly structured and have repetitive patterns (e.g. "return true if..."). Besides the patternized words, a code summary also contains important keywords, which are the key to reflecting the functionality of the code. However, state-of-the-art approaches perform poorly at predicting the keywords, which causes the generated summaries to lose informativeness. To alleviate this problem, this paper proposes a novel retrieve-and-edit approach named EditSum for code summarization. Specifically, EditSum first retrieves a similar code snippet from a pre-defined corpus and treats its summary as a prototype summary to learn the pattern. Then, EditSum edits the prototype automatically to combine the pattern in the prototype with the semantic information of the input code. Our motivation is that the retrieved prototype provides a good starting point for post-generation because the summaries of similar code snippets often have the same pattern. The post-editing process further reuses the patternized words in the prototype and generates keywords based on the semantic information of the input code. We conduct experiments on a large-scale Java corpus (2M) and the experimental results demonstrate that EditSum outperforms the state-of-the-art approaches by a substantial margin. The human evaluation also shows that the summaries generated by EditSum are more informative and useful. We also verify that EditSum performs well at predicting the patternized words and keywords.
Description:	The accepted manuscript is available at arXiv, arXiv:2308.13775v2 [cs.SE] (https://doi.org/10.48550/arXiv.2308.13775). Comments: Accepted by the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021).
URI:	https://bura.brunel.ac.uk/handle/2438/30911
DOI:	https://doi.org/10.1109/ase51524.2021.9678724
ISBN:	978-1-6654-0337-5 (ebk) 978-1-6654-4784-3 (PoD)
ISSN:	1938-4300
Other Identifiers:	ORCiD: Yongmin Li https://orcid.org/0000-0003-1668-2440
Appears in Collections:	Dept of Computer Science Research Papers
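
The retrieval step described in the abstract (find a similar code snippet in a corpus and take its summary as the prototype) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the token-level Jaccard similarity, the helper names `tokenize`, `jaccard` and `retrieve_prototype`, and the two-entry toy corpus. The editing step, which the abstract describes as combining the prototype's pattern with the semantics of the input code, is not shown.

```python
import re
from typing import List, Set, Tuple


def tokenize(code: str) -> Set[str]:
    """Split code into lowercase word tokens, breaking camelCase identifiers."""
    tokens = set()
    for word in re.findall(r"[A-Za-z]+", code):
        # "isEmpty" -> "is", "Empty"; "IOException" -> "IO", "Exception"
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word):
            tokens.add(part.lower())
    return tokens


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_prototype(query_code: str,
                       corpus: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return the (code, summary) pair whose code is most similar to the query.

    Hypothetical stand-in for the retrieval step; the similarity measure
    is an assumption, not the one used by EditSum.
    """
    query_tokens = tokenize(query_code)
    return max(corpus, key=lambda pair: jaccard(query_tokens, tokenize(pair[0])))


# Toy corpus of (code, summary) pairs, invented for this example.
corpus = [
    ("public boolean isEmpty() { return size == 0; }",
     "Returns true if the list is empty."),
    ("public void close() throws IOException { stream.close(); }",
     "Closes the underlying stream."),
]

query = "public boolean isFull() { return size == capacity; }"
_, prototype = retrieve_prototype(query, corpus)
print(prototype)
```

In this toy case the retrieved prototype carries the "Returns true if..." pattern, while its keyword ("empty") does not match the input code; that mismatch is what the editing step is meant to correct.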

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2021 The Author(s) / Institute of Electrical and Electronics Engineers (IEEE). arXiv.org - Non-exclusive license to distribute. The URI https://arxiv.org/licenses/nonexclusive-distrib/1.0/ is used to record the fact that the submitter granted the following license to arXiv.org on submission of an article: I grant arXiv.org a perpetual, non-exclusive license to distribute this article. I certify that I have the right to grant this license. I understand that submissions cannot be completely removed once accepted. I understand that arXiv.org reserves the right to reclassify or reject any submission.	819.8 kB	Adobe PDF
