Master's/Doctoral Thesis 110522062: Detailed Record




Name: Yu-Tung Lin (林禹彤)   Department: Computer Science and Information Engineering
Thesis Title: A Prompt-Based Framework for the Automated Generation of Natural Language Instructions Across Diverse Domains and Tasks
Related Theses:
★ A Real-time Embedding Increasing for Session-based Recommendation with Graph Neural Networks
★ Modifying Training Objectives Based on the Primary Diagnosis for ICD-10 Classification of Discharge Summaries
★ A Hybrid Approach to Identifying Heart Disease Risk Factors and Their Progression in Electronic Medical Records
★ A Rapid Adoption Method Based on PowerDesigner-Compliant Requirements Analysis Deliverables
★ Question Retrieval in Community Forums
★ Unsupervised Event Type Identification in Historical Texts: A Case Study of Weisuo Events in the Ming Shilu
★ Analyzing Character Relationships in Literary Fiction with Natural Language Processing: An Interactive Visualization
★ Extracting Function-Level Descriptions of Biological Representations from Biomedical Text: A K-Nearest-Neighbor Algorithm Inspired by Principal Component Analysis
★ Building Article Representation Vectors from a Classification System for Cross-Lingual Online Encyclopedia Linking
★ Code-Mixing Language Model for Sentiment Analysis in Code-Mixing Data
★ Improving Dialogue State Tracking by Incorporating Multiple Speech Recognition Results
★ Dialogue Systems for Chinese Online Customer Service Assistants: A Case Study in the Telecommunications Domain
★ Applying Recurrent Neural Networks to Answering Questions at the Appropriate Time
★ Improving User Intent Classification with Multi-Task Learning
★ Improving Pivot-Language Methods for Named Entity Transliteration with Transfer Learning
★ Finding Experts in Community Question-Answering Sites Using Historical Information Vectors and Topic Expertise Vectors
Files: Browse the thesis in the repository (full text available after 2028-05-31)
Abstract (Chinese): With the rise of large language models such as ELMo, GPT, and BERT, research in natural language processing has shifted toward a two-stage training paradigm: pretraining a large language model and then fine-tuning it for downstream tasks. Subsequent work on adapting models to multiple tasks or to unseen tasks gradually revealed the generalization potential of pretrained models, which led to the concept of instruction tuning. This shift also changed the annotated data required, from datasets designed for individual tasks to data in instruction form (instructional data). One earlier approach attached instructions to previously annotated NLP datasets to serve as instruction-tuning material, and the scale of that effort spurred research into automatically generating instruction data.
This thesis proposes a new prompt-based framework for automatically generating instruction data. We design five prompts that guide an existing instruction-tuned language model to produce more than 10,000 instruction examples covering different domains, topics, and tasks. Because the framework can control which task is generated, it improves on both instruction data derived from existing NLP datasets and the automated generation methods proposed by later researchers, both of which exhibit imbalances across task types. We are also the first to attempt automatically generating data for the reward model used in reinforcement learning. Although the experiments do not directly test the data's impact within reinforcement learning, we instruction-tune GPT-3 and use the recently proposed G-Eval method to automatically evaluate both the generated data and the instruction-tuned results, outperforming the baselines by margins of 0.15 to 0.45.
Abstract (English): With the emergence of large-scale language models such as ELMo, GPT, and BERT, research in natural language processing has shifted toward a two-stage training paradigm: pretraining large language models and fine-tuning them on downstream tasks. Progress in multitask research and in applying models to unseen tasks revealed the generalization potential of pretrained language models, paving the way for the concept of instruction tuning.
This shift in research direction also changed the kind of labeled data required: instead of task-specific annotated data, instructional data is needed. Earlier approaches added instructions to existing annotated natural language processing datasets, but this proved a significant undertaking, and researchers subsequently explored automated methods for generating instructional data.
In this work, we propose a novel prompt-based framework for automated instruction-data generation. We design five prompts to guide existing instruction-tuned language models in generating instructional data across diverse domains and tasks, yielding a dataset of over 10,000 instructions. Because the framework controls which task type is generated (see the sketch below), it improves on both the traditional approach of reusing existing NLP datasets and the automated methods proposed by other researchers, both of which suffer from imbalances across task types.
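To make the generation step concrete, the sketch below shows what such a prompt-driven loop might look like. It is a minimal illustration, not the thesis's implementation: the `TASK_PROMPTS` templates, the `DOMAINS` list, and the model name are invented placeholders, and the OpenAI Python client is assumed only as a plausible stand-in for an instruction-tuned generator.

```python
# Minimal, hypothetical sketch of a prompt-based instruction-generation loop.
# All prompt wording, task types, and domains here are illustrative, not the
# thesis's actual five system prompts.
import json
import random

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical stand-ins for the five system prompts; each one steers the
# generator toward a different task family.
TASK_PROMPTS = {
    "classification": ("Write one instruction for a text classification task "
                       "in the domain of {domain}, then answer it yourself."),
    "summarization": ("Write one instruction asking for a summary of a short "
                      "{domain} passage, then answer it yourself."),
}
DOMAINS = ["medicine", "finance", "law", "sports", "travel"]


def generate_example(task_type: str, domain: str) -> dict:
    """Ask an instruction-tuned model for one (instruction, response) pair."""
    prompt = TASK_PROMPTS[task_type].format(domain=domain)
    chat = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any instruction-tuned chat model
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # higher temperature encourages diverse instructions
    )
    return {"task": task_type, "domain": domain,
            "text": chat.choices[0].message.content}


if __name__ == "__main__":
    dataset = [generate_example(random.choice(list(TASK_PROMPTS)),
                                random.choice(DOMAINS))
               for _ in range(10)]
    with open("instructions.jsonl", "w", encoding="utf-8") as f:
        for row in dataset:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Drawing the task type explicitly on each call is what gives a framework of this kind control over the task distribution; sampling uniformly over task families is one simple way to avoid the imbalance noted above.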
Furthermore, we are the first to attempt generating reward-model data for reinforcement learning with an automated approach. While the experiments did not directly evaluate the data's impact in reinforcement learning, we instruction-tuned GPT-3 and used the recently proposed G-Eval method to automate the evaluation of both the generated data and the instruction-tuned results. Our findings show improvements of 0.15 to 0.45 over the baselines.
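The evaluation step can likewise be pictured with a simplified, hypothetical G-Eval-style judge. The actual G-Eval method [12] generates chain-of-thought evaluation steps and weights candidate scores by their token probabilities; the sketch below omits both, parses a single 1-5 rating instead, and uses judge-prompt wording invented for illustration.

```python
# Simplified, hypothetical G-Eval-style judge. The real method [12] uses
# probability-weighted scoring with chain-of-thought; this sketch does not.
import re

from openai import OpenAI

client = OpenAI()

# Illustrative judge prompt; G-Eval's real prompts embed task-specific
# criteria and auto-generated evaluation steps.
JUDGE_PROMPT = """You will be given an instruction and a model response.
Rate the quality of the response on a scale of 1 to 5, where 5 means the
response fully and correctly follows the instruction.

Instruction: {instruction}
Response: {response}

Reason step by step, then end with a line "Score: <1-5>"."""


def geval_score(instruction: str, response: str) -> float:
    """Return a 1-5 quality rating from an LLM judge (0.0 if unparsable)."""
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            instruction=instruction, response=response)}],
        temperature=0,  # deterministic judging
    )
    match = re.search(r"Score:\s*([1-5])", chat.choices[0].message.content)
    return float(match.group(1)) if match else 0.0
```

Averaging such scores over a held-out instruction set yields aggregate quality numbers of the kind compared against the baselines above.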
Keywords (Chinese): ★ prompt engineering (提示工程)
★ automated data generation (資料自動生成)
★ instruction data (指令資料)
Keywords (English): ★ prompt engineering
★ data generation
★ instruction data
Table of Contents:
Chinese Abstract v
Abstract vi
致謝 vii
Contents viii
List of Figures x
List of Tables xi
1 Introduction 1
2 Related Work 5
2.1 Instruction Tuning and Instruction Data Generation . . . . . . . . . . . . 5
3 Methodology 7
3.1 Prepare Seeds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Define System Prompts . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Instruction and Response Generation . . . . . . . . . . . . . . . . . . . . 13
4 Data Analysis 14
4.1 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.2 Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.1 Instruction Quality . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.2 Response Quality . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5 Experiments and Results 37
5.1 Instruction Tuning on GPT-3 . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.1.1 Baseline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2.2 Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6 Conclusion 46
Bibliography 48
References:
[1] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep Contextualized Word Representations,” in Proceedings of the 2018
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1 (Long Papers), (New Orleans, Louisiana), pp. 2227–2237, Association for Computational Linguistics, June
2018.
[2] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language
Understanding by Generative Pre-Training,” 2018.
[3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training
of Deep Bidirectional Transformers for Language Understanding,” May 2019.
arXiv:1810.04805 [cs].
[4] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger,
T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse,
M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language Models are Few-Shot
Learners,” July 2020. arXiv:2005.14165 [cs].
[5] J. Wei, M. Bosma, V. Y. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai,
and Q. V. Le, “Finetuned Language Models Are Zero-Shot Learners,” Feb. 2022.
arXiv:2109.01652 [cs].
[6] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang,
S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller,
M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” Mar. 2022.
arXiv:2203.02155 [cs].
[7] S. Mishra, D. Khashabi, C. Baral, and H. Hajishirzi, “Cross-Task Generalization via
Natural Language Crowdsourcing Instructions,” Mar. 2022. arXiv:2104.08773 [cs].
[8] Y. Wang, S. Mishra, P. Alipoormolabashi, Y. Kordi, A. Mirzaei, A. Arunkumar,
A. Ashok, A. S. Dhanasekaran, A. Naik, D. Stap, E. Pathak, G. Karamanolakis,
H. G. Lai, I. Purohit, I. Mondal, J. Anderson, K. Kuznia, K. Doshi, M. Patel, K. K.
Pal, M. Moradshahi, M. Parmar, M. Purohit, N. Varshney, P. R. Kaza, P. Verma,
R. S. Puri, R. Karia, S. K. Sampat, S. Doshi, S. Mishra, S. Reddy, S. Patro,
T. Dixit, X. Shen, C. Baral, Y. Choi, N. A. Smith, H. Hajishirzi, and D. Khashabi,
“Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+
NLP Tasks,” Oct. 2022. arXiv:2204.07705 [cs].
[9] Y. Wang, Y. Kordi, S. Mishra, A. Liu, N. A. Smith, D. Khashabi, and H. Hajishirzi,
“Self-Instruct: Aligning Language Model with Self Generated Instructions,” Dec.
2022. arXiv:2212.10560 [cs].
[10] O. Honovich, T. Scialom, O. Levy, and T. Schick, “Unnatural Instructions: Tuning
Language Models with (Almost) No Human Labor,” Dec. 2022. arXiv:2212.09689
[cs].
[11] OpenAI, “GPT-4 Technical Report,” Mar. 2023. arXiv:2303.08774 [cs].
[12] Y. Liu, D. Iter, Y. Xu, S. Wang, R. Xu, and C. Zhu, “G-Eval: NLG Evaluation using
GPT-4 with Better Human Alignment,” Apr. 2023. arXiv:2303.16634 [cs].
[13] “Stanford CRFM.”
[14] OpenAI, “Introducing ChatGPT.”
[15] B. Peng, C. Li, P. He, M. Galley, and J. Gao, “Instruction Tuning with GPT-4,” Apr.
2023. arXiv:2304.03277 [cs].
[16] T. Schick and H. Schütze, “Generating Datasets with Pretrained Language Models,”
in Proceedings of the 2021 Conference on Empirical Methods in Natural Language
Processing, (Online and Punta Cana, Dominican Republic), pp. 6943–6951, Association for Computational Linguistics, Nov. 2021.
[17] O. Weller, N. Lourie, M. Gardner, and M. E. Peters, “Learning from Task Descriptions,” in Proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing (EMNLP), (Online), pp. 1361–1375, Association for Computational Linguistics, Nov. 2020.
[18] V. Sanh, A. Webson, C. Raffel, S. H. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin,
A. Stiegler, T. L. Scao, A. Raja, M. Dey, M. S. Bari, C. Xu, U. Thakker, S. S.
Sharma, E. Szczechla, T. Kim, G. Chhablani, N. Nayak, D. Datta, J. Chang, M. T.-J.
Jiang, H. Wang, M. Manica, S. Shen, Z. X. Yong, H. Pandey, R. Bawden, T. Wang,
T. Neeraj, J. Rozen, A. Sharma, A. Santilli, T. Fevry, J. A. Fries, R. Teehan, T. Bers,
S. Biderman, L. Gao, T. Wolf, and A. M. Rush, “Multitask Prompted Training Enables Zero-Shot Task Generalization,” Mar. 2022. arXiv:2110.08207 [cs].
[19] S. H. Bach, V. Sanh, Z.-X. Yong, A. Webson, C. Raffel, N. V. Nayak, A. Sharma,
T. Kim, M. S. Bari, T. Fevry, Z. Alyafeai, M. Dey, A. Santilli, Z. Sun, S. BenDavid, C. Xu, G. Chhablani, H. Wang, J. A. Fries, M. S. Al-shaibani, S. Sharma,
U. Thakker, K. Almubarak, X. Tang, D. Radev, M. T.-J. Jiang, and A. M. Rush,
“PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts,” Mar. 2022. arXiv:2202.01279 [cs].
[20] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang,
M. Dehghani, S. Brahma, A. Webson, S. S. Gu, Z. Dai, M. Suzgun, X. Chen,
A. Chowdhery, A. Castro-Ros, M. Pellat, K. Robinson, D. Valter, S. Narang,
G. Mishra, A. Yu, V. Zhao, Y. Huang, A. Dai, H. Yu, S. Petrov, E. H. Chi, J. Dean,
J. Devlin, A. Roberts, D. Zhou, Q. V. Le, and J. Wei, “Scaling Instruction-Finetuned
Language Models,” Dec. 2022. arXiv:2210.11416 [cs].
[21] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and
D. Zhou, “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Jan. 2023. arXiv:2201.11903 [cs].
[22] Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain,
S. Fort, D. Ganguli, T. Henighan, N. Joseph, S. Kadavath, J. Kernion, T. Conerly, S. El-Showk, N. Elhage, Z. Hatfield-Dodds, D. Hernandez, T. Hume, S. Johnston, S. Kravec, L. Lovitt, N. Nanda, C. Olsson, D. Amodei, T. Brown, J. Clark,
S. McCandlish, C. Olah, B. Mann, and J. Kaplan, “Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback,” Apr. 2022.
arXiv:2204.05862 [cs].
Advisor: Tzong-Han Tsai (蔡宗翰)   Approval Date: 2023-05-30
