Improving Code Search Quality through Data-Mining-Based Ranking: A Case Study of Koders

Name

廖振傑 (Jhen-jie Liao)

Department

Department of Information Management

Thesis Title

Improving Code Search Quality through Data-Mining-Based Ranking: A Case Study of Koders
(Using Data Mining Technology to Refine Koders Code Search Results)

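The author has authorized immediate open access to this electronic thesis.
Once released, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.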

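Abstract (Chinese)

As open source software spreads and its volume multiplies, more and more open source code can be obtained from the Internet. This has given rise to a new kind of web service: code search. Code search engines give developers a convenient channel for quickly making use of the Application Programming Interfaces (APIs) offered by existing classes or frameworks, thereby improving software productivity. However, the code search results obtained from the web often fail to meet developers' needs, mainly because many similar or irrelevant files appear among the results, so developers cannot quickly obtain useful code.
This study therefore proposes a system architecture that improves on the search engine. A web crawler written for this study stores Koders search results in a database, and the data are then cleaned through the preprocessing steps defined in this study. Rather than relying on keyword search alone, the approach also takes the structural characteristics of programs into account. A hierarchical algorithm from data mining is then used to cluster and re-rank the results, and each cluster is given a new label, in the hope that the search results will better match users' needs.
Finally, this study uses a case to illustrate whether the proposed system architecture effectively improves the search results, and compares and analyzes it against related academic research.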

Abstract (English)

With the growing popularity of open source software, more and more source code can be downloaded from the Internet. This has given rise to a new Internet service: the code search engine. A code search engine offers developers a convenient way to reuse existing Application Programming Interfaces (APIs) and thereby improve software productivity. However, the results returned by code search engines often fail to satisfy developers' needs, because many unrelated files appear among them and developers cannot find useful code quickly.
Therefore, we propose a system architecture that improves on an existing search engine. First, we develop a web program that extracts Koders' search results and stores them in a local repository. Second, in the data preprocessing stage, we define a rule to filter out unrelated files and parse the files into a database format. Third, we use data mining algorithms to cluster and re-rank the Koders search results. Fourth, we assign unique tags to identify the clusters, with the expectation that the search results will better satisfy developers' needs.
Finally, we use a case to examine whether the proposed system architecture effectively helps developers find useful source code, and we compare it with related prior research.
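
The clustering, re-ranking, and tagging steps described in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the thesis's implementation: the abstract does not specify the distance measure, linkage, stopping rule, or labeling scheme, so the Jaccard distance over identifier tokens, average linkage, the 0.6 threshold, the STOPWORDS list, the sample snippets, and all function names below are assumptions chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list: a few Java keywords that carry no topical meaning.
STOPWORDS = {"new", "return", "public", "static", "void", "int", "class"}


def tokens(code):
    """Split a source snippet into a set of lowercase identifier tokens."""
    found = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)
    return {t.lower() for t in found} - STOPWORDS


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def cluster(results, threshold=0.6):
    """Average-linkage agglomerative clustering over result indices.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (an assumed cut-off).
    """
    sets = [tokens(r) for r in results]
    clusters = [[i] for i in range(len(results))]

    def linkage(c1, c2):
        total = sum(jaccard_distance(sets[i], sets[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    # Order clusters by the original rank of their best-ranked member.
    return sorted(clusters, key=lambda c: c[0])


def label(results, members, size=2):
    """Label a cluster with its most frequent tokens (ties broken alphabetically)."""
    counts = Counter(t for i in members for t in tokens(results[i]))
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return " ".join(ranked[:size])


def rerank(results, threshold=0.6):
    """Return (cluster_label, original_index) pairs, round-robin across clusters."""
    groups = cluster(results, threshold)
    labels = [label(results, g) for g in groups]
    order, depth = [], 0
    while any(depth < len(g) for g in groups):
        for g, name in zip(groups, labels):
            if depth < len(g):
                order.append((name, g[depth]))
        depth += 1
    return order


# Made-up search results, listed in the engine's original rank order.
sample = [
    "FileReader reader = new FileReader(path);",
    "FileReader reader = new FileReader(file);",
    "Socket socket = new Socket(host, port);",
]
print(rerank(sample))
```

With these three made-up snippets, the two near-identical FileReader results fall into one cluster and the Socket result into another, so the re-ranked order interleaves the clusters (original positions 0, 2, 1) instead of showing both near-duplicates first.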

Keywords (Chinese)

★ Data Mining
★ Open Source Code
★ Code Search Engine
★ Hierarchical Algorithm
★ Cluster Analysis

Keywords (English)

★ Cluster Analysis
★ Code Search Engine
★ Open Source Code
★ Data Mining

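Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Method
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Introduction to Open Source
2.2 Code Matching
2.3 Code Ranking
2.4 Data Label Identification
2.5 Data Mining and Cluster Analysis
2.5.1 Introduction to Data Mining
2.5.2 Cluster Analysis
2.6 Summary
Chapter 3 System Design and Architecture
3.1 System Architecture
3.2 Code Search Engine
3.3 Data Preprocessing
3.4 Code Extraction
3.5 Data Mining and Ranking
Chapter 4 Experimental Results and Discussion
4.1 System Implementation and Case Description
4.2 Algorithm Performance Evaluation
4.3 System Performance Evaluation
4.4 Comparison with Related Search Engines
4.5 Comparison with Related Research
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Chinese References
English References
Web Resources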

Advisor

林熙禎 (Shi-jen Lin)

Approval Date

2009-7-13
