Theses and Dissertations: Record 954403003




Author: 許盛貴 (Sheng-Kuei Hsu)   Department: Department of Information Management
Title: 探勘及搜尋程式碼以指引軟體開發
(Mining and Searching Source Codes to Guide Software Development)
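Related Theses
★ A Web-Based Collaborative Co-Teaching Design Platform: The Case of the Grade 1-9 Integrated Curriculum in Junior High Schools
★ Applying Content Management Mechanisms to Frequently Asked Questions (FAQ)
★ Applying Mobile Multi-Agent Technology to Course Scheduling Systems
★ A Study of Access Control Mechanisms and Domestic Information Security Regulations
★ Introducing NFC Mobile Phone Transactions into Credit Card Systems
★ App-Based Recommendation Services in E-Commerce: The Case of Company P
★ Building a Service-Oriented System to Improve Production Processes: The Case of Company W's PMS
★ Planning and Deploying a TSM Platform for NFC Mobile Payment
★ Keyword Marketing for Semiconductor Distributors: The Case of Company G
★ Domestic Track-and-Field Competition Information Systems: The Case of the 2014 (ROC Year 103) National Intercollegiate Track and Field Open
★ Evaluating the Adoption of a Pallet and Container Tracking System for Airline Ramp Ground Handling: The Case of Company F
★ Information Security Management Maturity after Adopting an Information Security Management System: The Case of Company B
★ Applying Data Mining to Movie Recommendation: The Case of Online Video Platform F
★ Using BI Visualization Tools for Information Security Log Analysis: The Case of Company S
★ An Empirical Study of a Real-Time Analysis System for Privileged-Account Login Behavior
★ Detecting and Handling Anomalous Mail System Usage: The Case of Company T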
Full text: not available (access permanently closed)
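Abstract (Chinese, translated): Because software changes rapidly, software libraries and application frameworks often lack complete documentation. Meanwhile, tens of thousands of developers and a number of organizations have developed more than 300,000 open-source projects, and this software engineering data has formed a rich knowledge base. During software implementation or maintenance, such knowledge can be used to improve developers' efficiency and effectiveness. To address these problems, this study proposes the MACs (Mining API Code snippets for code reuse) approach for mining source code and the MSCS (Multi-Segment Code Search) approach for searching source code. For mining, we apply data mining techniques to source code projects to guide developers through related API (application programming interface) usage patterns, such as "developers who wrote this code statement also wrote…". Given a set of source files, the mined association-rule patterns suggest related code that together forms the structure of an object-oriented program, while the mined sequential-rule patterns predict likely API sequences within a method. For searching, we divide source code into three types of segments (metadata, code-data, and structural-data segments) and apply different stemming and stop-word filtering to each, building a multi-segment index database for subsequent search. In a preliminary evaluation, we conducted experiments to assess the usefulness of MACs and MSCS. The results show that MACs has significant potential to assist software developers, and the MSCS experiments indicate that our multi-segment code search scheme provides more code search mechanisms, so that more relevant code can be found.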
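The association- and sequential-rule mining described in this abstract can be illustrated with a minimal sketch under simplifying assumptions. Each source file (for association rules) or method body (for sequential rules) is assumed to be already reduced to hypothetical API items such as "new BufferedReader", and only single-item rules (one API implies one API) are mined by direct counting. Support is the fraction of files or methods containing the pair; confidence is that count divided by the number containing the left-hand item. This is not the thesis's constrained rule mining; longer patterns would need an algorithm such as Apriori or PrefixSpan. The helper `suggest_next` is a hypothetical illustration of how mined sequential rules could predict the next API calls in a method.

```python
from collections import Counter
from itertools import combinations, permutations


def mine_association_rules(files, min_support=0.5, min_confidence=0.6):
    """Pairwise rules 'files that use LHS also use RHS' with support/confidence."""
    n = len(files)
    item_counts = Counter()
    pair_counts = Counter()
    for items in files:
        unique = set(items)
        item_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(sorted(unique), 2))
    rules = []
    for pair, count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in permutations(pair):  # try both directions of the rule
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def mine_sequential_rules(methods, min_support=0.5, min_confidence=0.6):
    """Pairwise rules 'a method that calls LHS later calls RHS'."""
    n = len(methods)
    contains = Counter()  # number of methods containing each API item
    ordered = Counter()   # number of methods where LHS appears before RHS
    for calls in methods:
        contains.update(set(calls))
        pairs, seen = set(), set()
        for call in calls:
            pairs.update((earlier, call) for earlier in seen if earlier != call)
            seen.add(call)
        ordered.update(pairs)  # count each ordered pair once per method
    rules = []
    for (lhs, rhs), count in ordered.items():
        support, confidence = count / n, count / contains[lhs]
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def suggest_next(prefix, seq_rules):
    """Hypothetical helper: rank unwritten calls by the best firing rule."""
    written = set(prefix)
    best = {}
    for lhs, rhs, _, confidence in seq_rules:
        if lhs in written and rhs not in written:
            best[rhs] = max(best.get(rhs, 0.0), confidence)
    return sorted(best, key=lambda api: (-best[api], api))


# Hypothetical toy data: API items per file, and ordered calls per method.
files = [
    {"new FileReader", "new BufferedReader", "BufferedReader.readLine"},
    {"new FileReader", "new BufferedReader", "BufferedReader.close"},
    {"new FileReader", "new BufferedReader", "BufferedReader.readLine", "BufferedReader.close"},
    {"new ArrayList", "ArrayList.add"},
]
methods = [
    ["new FileReader", "new BufferedReader", "BufferedReader.readLine", "BufferedReader.close"],
    ["new FileReader", "new BufferedReader", "BufferedReader.close"],
    ["new BufferedReader", "BufferedReader.readLine"],
]
assoc = mine_association_rules(files)
seq = mine_sequential_rules(methods)
print(suggest_next(["new FileReader", "new BufferedReader"], seq))
# ['BufferedReader.close', 'BufferedReader.readLine']
```

Counting pairs directly keeps the sketch short and exact for one-to-one rules; the thresholds are illustrative defaults, not values from the thesis.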
Abstract (English): In software development, the lack of API (application programming interface) documentation and of knowledge about how to use specific APIs remain open problems. Moreover, with more than 300,000 open-source projects created by millions of software developers and organizations, software engineering data have formed a large and rich knowledge base. Such knowledge can be used to improve software developers' efficiency and effectiveness. To address these issues and assist software developers, we propose the MACs (Mining API Code snippets for code reuse) approach for mining source code and the MSCS (Multi-Segment Code Search) scheme for searching source code. In MACs, we apply data mining to source code projects to guide developers through related API-usage patterns of the form "Developers who code this program statement also code…". Given a set of source code files, the mined association rules suggest related code snippets that form the components of object-oriented programs, and the mined sequential rules predict likely additional API sequences within a method. In MSCS, we divide source code files into three types of segments (a meta-data segment, a code-data segment, and a structural-data segment) and then apply different stemming and stop-word filtering processes to each type to build a multi-segment index database for further search. Our preliminary evaluation shows that MACs has significant potential to assist developers, especially API newcomers, and provides an alternative method for code reuse. In addition, the experimental results for MSCS indicate that our approach provides a more flexible source code search mechanism that allows a greater number of relevant items to be found.
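The multi-segment indexing idea in this abstract can be sketched for Java input as follows. The segment assignment rules are assumptions for illustration only: comments form the meta-data segment, lines beginning with `package`, `import`, an access modifier, `class`, or `interface` form the structural-data segment, and all other lines form the code-data segment. The stop-word lists (`ENGLISH_STOP`, `CODE_STOP`) and `crude_stem` are hypothetical stand-ins; a real system would use a full stemmer such as Porter's and the thesis's own code-data stop-word list. Matching here is a plain boolean OR over per-segment inverted indexes, not a ranked model such as the vector space model.

```python
import re
from collections import defaultdict

# Hypothetical stop-word lists; the thesis builds its own code-data list.
ENGLISH_STOP = {"the", "a", "an", "of", "to", "and", "in", "is", "this", "for"}
CODE_STOP = {"int", "new", "return", "void", "if", "else", "for", "while",
             "string", "null", "true", "false", "this"}

COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
STRUCT_RE = re.compile(r"^\s*(package|import|public|private|protected|class|interface)\b")
WORD_RE = re.compile(r"[A-Za-z]+")
SEGMENTS = ("meta", "code", "structural")


def split_identifier(token):
    """Split camelCase/PascalCase into lowercase words: readLine -> read, line."""
    return [p.lower() for p in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", token)]


def crude_stem(word):
    """Stand-in for a real stemmer (e.g. Porter): strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def segment(source):
    """Split one Java file into meta-data, code-data and structural-data text."""
    meta = " ".join(COMMENT_RE.findall(source))
    structural, code = [], []
    for line in COMMENT_RE.sub(" ", source).splitlines():
        (structural if STRUCT_RE.match(line) else code).append(line)
    return {"meta": meta, "code": "\n".join(code), "structural": "\n".join(structural)}


def terms(seg, text):
    """Apply segment-specific filtering (used for documents and queries alike)."""
    words = [w for tok in WORD_RE.findall(text) for w in split_identifier(tok)]
    if seg == "meta":
        return [crude_stem(w) for w in words if w not in ENGLISH_STOP]
    if seg == "code":
        return [w for w in words if w not in CODE_STOP]
    return words  # structural: identifiers kept as-is, no stemming or stop words


def build_index(files):
    """One inverted index (term -> file names) per segment type."""
    index = {seg: defaultdict(set) for seg in SEGMENTS}
    for name, source in files.items():
        for seg, text in segment(source).items():
            for term in terms(seg, text):
                index[seg][term].add(name)
    return index


def search(index, query, segments=SEGMENTS):
    """Boolean OR match of the query against the chosen segments."""
    hits = set()
    for seg in segments:
        for term in terms(seg, query):
            hits |= index[seg].get(term, set())
    return sorted(hits)


files = {  # hypothetical toy corpus
    "Reader.java": """
import java.io.BufferedReader;
/** Reads the first line of a file. */
public class Reader {
    public String firstLine(String path) throws Exception {
        BufferedReader in = new BufferedReader(new java.io.FileReader(path));
        return in.readLine();
    }
}
""",
    "Counter.java": """
// Counts words in a list.
public class Counter {
    int total = 0;
}
""",
}
index = build_index(files)
print(search(index, "reading", ("meta",)))   # ['Reader.java']
print(search(index, "readLine", ("code",)))  # ['Reader.java']
```

Because each segment has its own filtering, the query "reading" is stemmed to match the comment word "Reads" in the meta-data segment, while "readLine" is split but not stemmed when matched against code-data.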
Keywords:
★ mining software repositories
★ API-usage pattern
★ code reuse
★ code search
★ code retrieval
★ multi-segment scheme
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1-1 MOTIVATION
1-2 PROBLEMS
1-3 SOLUTION
1-4 RESEARCH PURPOSES
1-5 ORGANIZATION OF THIS DISSERTATION
CHAPTER 2 LITERATURE REVIEW
2-1 OVERVIEW
2-2 DATA MINING
2-2-1 Data Mining Foundation
2-2-2 Data Preprocessing
2-2-3 Association Rule Mining
2-2-4 Sequential Rule Mining
2-3 VECTOR SPACE MODEL
2-4 MINING SOURCE CODE FOR SOFTWARE DEVELOPMENT
2-4-1 Mining Source Codes for Reuse Purposes
2-4-2 Mining Source Codes for Other Purposes
2-4-3 Summary
2-5 SEARCHING SOURCE CODES FOR SOFTWARE DEVELOPMENT
2-5-1 Approaches for Searching Source Codes
2-5-2 Summary
2-6 ABSTRACT FORM OF A CODE STATEMENT
2-6-1 Intermediate Code of Compiler
2-6-2 Other Item Form Design
2-6-3 Summary
2-7 ABSTRACT SYNTAX TREE (AST)
CHAPTER 3 THE APPROACH FOR MINING AND SEARCHING SOURCE CODES
3-1 OVERVIEW
3-2 DEFINITION
3-3 THE MACS APPROACH
3-3-1 Architecture of the MACs Approach
3-3-2 The Framework for Representing Source Files and Queries in MACs
3-3-3 The Function R in MACs: The Constrained Rule Mining
3-4 THE MSCS APPROACH
3-4-1 Definition
3-4-2 Architecture of the MSCS Approach
3-4-3 A Multi-segment Scheme for Code Search
CHAPTER 4 IMPLEMENTATION AND USAGE SCENARIOS
4-1 THE MACS TOOLS
4-1-1 Data Preprocessing and Data Mining
4-1-2 Querying and Reusing the Association Pattern Rules
4-1-3 Querying and Reusing the Sequential Pattern Rules
4-2 THE MSCS TOOLS
4-2-1 Implementation
4-2-2 Example of Querying Source Code
4-2-3 A Code-Data Stop-Word List
CHAPTER 5 EVALUATION AND DISCUSSION
5-1 EVALUATION FOR MACS
5-1-1 Selecting the Sample
5-1-2 Evaluation Criteria
5-1-3 Result
5-1-4 Performance
5-2 EVALUATION FOR MSCS
5-2-1 Measuring Retrieval Effectiveness
5-2-2 Results
5-3 DISCUSSION
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
6-1 CONCLUSIONS
6-2 FUTURE WORKS
REFERENCES
Advisor: 林熙禎 (Shi-Jen Lin)   Approval date: 2013-07-16

For questions about this thesis, contact the Outreach Services Section, National Central University Library: Tel. (03) 422-7151 ext. 57407, or by e-mail.