References
[1] A. M. Turing, “I.—Computing machinery and intelligence,” Mind, vol. LIX, no. 236, pp. 433–460, Oct. 1950.
[2] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, Y. Du, C. Yang, Y. Chen, Z. Chen, J. Jiang, R. Ren, Y. Li, X. Tang, Z. Liu, P. Liu, J.-Y. Nie, and J.-R. Wen, “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023.
[3] J. Gao and C.-Y. Lin, “Introduction to the special issue on statistical language modeling,” ACM Transactions on Asian Language Information Processing, vol. 3, pp. 87–93, Jun. 2004.
[4] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” in Interspeech, vol. 2, pp. 1045–1048, Makuhari, 2010.
[5] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[6] W. Fedus, B. Zoph, and N. Shazeer, “Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity,” 2022.
[7] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” 2019.
[8] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” arXiv preprint arXiv:1910.13461, 2019.
[9] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, and W. Fedus, “Emergent abilities of large language models,” arXiv preprint arXiv:2206.07682, 2022.
[10] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[11] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, N. Shazeer, V. Prabhakaran, E. Reif, N. Du, B. Hutchinson, R. Pope, J. Bradbury, J. Austin, M. Isard, G. Gur-Ari, P. Yin, T. Duke, A. Levskaya, S. Ghemawat, S. Dev, H. Michalewski, X. Garcia, V. Misra, K. Robinson, L. Fedus, D. Zhou, D. Ippolito, D. Luan, H. Lim, B. Zoph, A. Spiridonov, R. Sepassi, D. Dohan, S. Agrawal, M. Omernick, A. M. Dai, T. S. Pillai, M. Pellat, A. Lewkowycz, E. Moreira, R. Child, O. Polozov, K. Lee, Z. Zhou, X. Wang, B. Saeta, M. Diaz, O. Firat, M. Catasta, J. Wei, K. Meier-Hellstern, D. Eck, J. Dean, S. Petrov, and N. Fiedel, “PaLM: Scaling language modeling with pathways,” arXiv preprint arXiv:2204.02311, 2022.
[12] OpenAI, “ChatGPT,” 2022. Accessed: 2024-06-18.
[13] GitHub, “Copilot,” 2022. Accessed: 2024-06-18.
[14] Google, “Gemini,” 2023. Accessed: 2024-06-18.
[15] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021.
[16] Y. Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. Dal Lago, et al., “Competition-level code generation with AlphaCode,” Science, vol. 378, no. 6624, pp. 1092–1097, 2022.
[17] B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y. Adi, J. Liu, T. Remez, J. Rapin, et al., “Code Llama: Open foundation models for code,” arXiv preprint arXiv:2308.12950, 2023.
[18] R. D. Austin, “The effects of time pressure on quality in software development: An agency model,” Information Systems Research, vol. 12, no. 2, pp. 195–207, 2001.
[19] C. Bird, D. Ford, T. Zimmermann, N. Forsgren, E. Kalliamvakou, T. Lowdermilk, and I. Gazit, “Taking flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools,” Queue, vol. 20, no. 6, pp. 35–57, 2022.
[20] L. Williams, R. R. Kessler, W. Cunningham, and R. Jeffries, “Strengthening the case for pair programming,” IEEE Software, vol. 17, no. 4, pp. 19–25, 2000.
[21] K. Beck and M. Fowler, Planning Extreme Programming. Addison-Wesley Professional, 2001.
[22] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022.
[23] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” arXiv preprint arXiv:2203.11171, 2022.
[24] B. Chen, F. Zhang, A. Nguyen, D. Zan, Z. Lin, J.-G. Lou, and W. Chen, “CodeT: Code generation with generated tests,” 2022.
[25] J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Le, and C. Sutton, “Program synthesis with large language models,” arXiv preprint arXiv:2108.07732, 2021.
[26] D. Hendrycks, S. Basart, S. Kadavath, M. Mazeika, A. Arora, E. Guo, C. Burns, S. Puranik, H. He, D. Song, and J. Steinhardt, “Measuring coding challenge competence with APPS,” NeurIPS, 2021.
[27] Y. Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. D. Lago, T. Hubert, P. Choy, C. de Masson d’Autume, I. Babuschkin, X. Chen, P.-S. Huang, J. Welbl, S. Gowal, A. Cherepanov, J. Molloy, D. J. Mankowitz, E. S. Robson, P. Kohli, N. de Freitas, K. Kavukcuoglu, and O. Vinyals, “Competition-level code generation with AlphaCode,” Science, vol. 378, no. 6624, pp. 1092–1097, 2022.
[28] X. Chen, M. Lin, N. Schärli, and D. Zhou, “Teaching large language models to self-debug,” arXiv preprint arXiv:2304.05128, 2023.
[29] X. Jiang, Y. Dong, L. Wang, Q. Shang, and G. Li, “Self-planning code generation with large language model,” arXiv preprint arXiv:2303.06689, 2023.
[30] Y. Dong, X. Jiang, Z. Jin, and G. Li, “Self-collaboration code generation via ChatGPT,” arXiv preprint arXiv:2304.07590, 2023.
[31] W. Chen, Y. Su, J. Zuo, C. Yang, C. Yuan, C. Qian, C.-M. Chan, Y. Qin, Y. Lu, R. Xie, et al., “AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors in agents,” arXiv preprint arXiv:2308.10848, 2023.
[32] S. Hong, X. Zheng, J. Chen, Y. Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, et al., “MetaGPT: Meta programming for multi-agent collaborative framework,” arXiv preprint arXiv:2308.00352, 2023.
[33] Q. Wu, G. Bansal, J. Zhang, Y. Wu, S. Zhang, E. Zhu, B. Li, L. Jiang, X. Zhang, and C. Wang, “AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework,” arXiv preprint arXiv:2308.08155, 2023.
[34] C.-M. Chan, W. Chen, Y. Su, J. Yu, W. Xue, S. Zhang, J. Fu, and Z. Liu, “ChatEval: Towards better LLM-based evaluators through multi-agent debate,” arXiv preprint arXiv:2308.07201, 2023.
[35] Y. Shoham, “Agent-oriented programming,” Artificial Intelligence, vol. 60, no. 1, pp. 51–92, 1993.
[36] D. Huang, Q. Bu, Y. Qing, and H. Cui, “CodeCoT: Tackling code syntax errors in CoT reasoning for code generation,” 2024.
[37] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” arXiv preprint arXiv:2201.11903, 2022.
[38] D. Huang, Q. Bu, J. M. Zhang, M. Luck, and H. Cui, “AgentCoder: Multi-agent-based code generation with iterative testing and optimisation,” arXiv preprint arXiv:2312.13010, 2024.
[39] G. van Rossum and F. L. Drake Jr., “Python tutorial,” Centrum voor Wiskunde en Informatica, Amsterdam, 1995.
[40] IEEE Spectrum, “The top programming languages 2023,” 2023. Accessed: 2024-06-18.
[41] Pylint Developers, “Pylint.” Accessed: 2024-06-18.
[42] Mypy Developers, “Mypy.” Accessed: 2024-06-18.
[43] G. Fraser and A. Arcuri, “EvoSuite: Automatic test suite generation for object-oriented software,” in Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, pp. 416–419, 2011.
[44] M. F. Roslan, J. M. Rojas, and P. McMinn, “An empirical comparison of EvoSuite and DSpot for improving developer-written test suites with respect to mutation score,” in Search-Based Software Engineering (M. Papadakis and S. R. Vergilio, eds.), (Cham), pp. 19–34, Springer International Publishing, 2022.
[45] Z. Yuan, Y. Lou, M. Liu, S. Ding, K. Wang, Y. Chen, and X. Peng, “No more manual tests? Evaluating and improving ChatGPT for unit test generation,” 2023.
[46] Z. Xie, Y. Chen, C. Zhi, S. Deng, and J. Yin, “ChatUniTest: A ChatGPT-based automated unit test generation tool,” 2023.
[47] N. Al Madi, “How readable is model-generated code? Examining readability and visual inspection of GitHub Copilot,” in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, ASE ’22, (New York, NY, USA), Association for Computing Machinery, 2023.
[48] J.-Y. Yao, K.-P. Ning, Z.-H. Liu, M.-N. Ning, and L. Yuan, “LLM lies: Hallucinations are not bugs, but features as adversarial examples,” arXiv preprint arXiv:2310.01469, 2023.
[49] D. Huang, Q. Bu, J. M. Zhang, M. Luck, and H. Cui, “AgentCoder: Multi-agent-based code generation with iterative testing and optimisation,” arXiv preprint arXiv:2312.13010, 2023.
[50] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, “Large language models are zero-shot reasoners,” Advances in Neural Information Processing Systems, vol. 35, pp. 22199–22213, 2022.
[51] L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. Xing, et al., “Judging LLM-as-a-judge with MT-Bench and Chatbot Arena,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[52] S. Kulal, P. Pasupat, K. Chandra, M. Lee, O. Padon, A. Aiken, and P. S. Liang, “SPoC: Search-based pseudocode to code,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[53] Y. Dong, J. Ding, X. Jiang, G. Li, Z. Li, and Z. Jin, “CodeScore: Evaluating code generation by learning code execution,” arXiv preprint arXiv:2301.09043, 2023.
[54] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
[55] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al., “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023.
[56] OpenAI, “HumanEval: Evaluating large language models trained on code.” https://github.com/openai/human-eval, 2021. Accessed: 2024-06-28.
[57] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: Language agents with verbal reinforcement learning,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[58] K. Zhang, Z. Li, J. Li, G. Li, and Z. Jin, “Self-Edit: Fault-aware code editor for code generation,” arXiv preprint arXiv:2305.04087, 2023.
[59] A. Majd, M. Vahidi-Asl, A. Khalilian, A. Baraani-Dastjerdi, and B. Zamani, “Code4Bench: A multidimensional benchmark of Codeforces data for different program analysis techniques,” Journal of Computer Languages, vol. 53, pp. 38–52, 2019.
[60] K. Beck et al., “Manifesto for agile software development,” 2001.
[61] S. Yao, D. Yu, J. Zhao, I. Shafran, T. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[62] M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk, et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 17682–17690, 2024.