藉由資料探勘的排序方式提昇程式碼搜尋品質─以Koders為例; Using Data Mining Technology to Refine Koders Code Search Results


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/13528

Title:	藉由資料探勘的排序方式提昇程式碼搜尋品質─以Koders為例;Using Data Mining Technology to Refine Koders Code Search Results
Authors:	廖振傑;Jhen-jie Liao
Contributors:	Graduate Institute of Information Management
Keywords:	Data Mining; Open Source Code; Code Search Engine; Hierarchical Algorithm; Cluster Analysis
Date:	2009-06-24
Issue Date:	2009-09-22 15:34:10 (UTC+8)
Publisher:	National Central University Library
Abstract:	As open source software has spread and multiplied, more and more source code has become available for download over the Internet. This has given rise to a new Internet service: code search. Code search engines give developers a convenient way to quickly reuse the Application Programming Interfaces (APIs) offered by existing classes and frameworks, and thereby improve software productivity. However, the results returned by these engines often fail to meet developers' needs, mainly because many similar or unrelated files appear among them and keep developers from finding useful code quickly. We therefore propose a system architecture that improves on an existing search engine. First, we develop a web extraction program that retrieves Koders search results and stores them in a local database. Second, in the data preprocessing stage, we apply rules defined in this study to clean the data, filter out unrelated files, and parse the remaining files into the database format; this stage considers the structural characteristics of the programs rather than keywords alone. Third, we use a hierarchical algorithm from data mining to cluster and re-rank the Koders search results. Fourth, we assign a new tag to each cluster, in the expectation that the search results will better match what developers need. Finally, we use a case study to examine whether the proposed architecture helps developers find useful source code, and we compare and analyze it against related prior research.
Appears in Collections:	[Graduate Institute of Information Management] Electronic Thesis & Dissertation
