dc.description.abstract | With their rapid advancement, Large Language Models (LLMs) have become pivotal aids in software development. However, LLMs still face numerous challenges regarding the accuracy and reliability of code generation. This paper analyzes the correctness of current LLMs in code generation, explores their practical limitations, and proposes solutions to enhance the accuracy of generated code.
This paper introduces an LLM-based code generation method, named JudgeCoder, which employs a multi-agent system and a Chain-of-Thought (CoT) strategy to improve the correctness of generated code. By simulating the division of labor in team coding environments, the method separates code generation, test data generation, and test execution into distinct roles, thereby reducing the hallucinations often caused by unclear task division within a single LLM. Moreover, the paper presents a strategy combining Chain of Thought with Self-Consistency (CoT-SC), which detects erroneous test data produced by model hallucinations and prevents the system from entering incorrect repair loops. In experiments, JudgeCoder demonstrates strong performance, achieving state-of-the-art results on the HumanEval and HumanEval-ET datasets. The results confirm that the proposed voting mechanism, coupled with appropriate prompting strategies and a sound error-judgment mechanism, can effectively enhance the accuracy of generated code. These findings not only validate the practicality of JudgeCoder but also provide a directional framework for future research on LLM-based automatic code generation. | en_US |