Using NGD to Improve Code Search Quality

DC Field	Value	Language
DC.contributor	Department of Information Management	zh_TW
DC.creator	祝亞琪	zh_TW
DC.creator	Ya-chi Chu	en_US
dc.date.accessioned	2010-06-29T07:39:07Z
dc.date.available	2010-06-29T07:39:07Z
dc.date.issued	2010
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=974203001
dc.contributor.department	Department of Information Management	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
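dc.description.abstract	In recent years, as open source has become widespread, developers seeking to improve software productivity and shorten development schedules have increasingly looked for suitable existing open source code to modify, reducing development time and cost. This has given rise to a new web service: code search. Many code search engines now give users convenient access to the application programming interfaces (APIs) provided by existing classes or frameworks, with the aim of helping developers find more useful material. However, the code that search engines return often fails to meet developers' needs: the code files are so numerous and complex that developers struggle to understand them, cannot filter out what they need, and cannot put the resources to use quickly. This study therefore proposes an improved system architecture for code search. First, code related to the query is downloaded from the Koders search engine into a repository after appropriate filtering steps. The important APIs are then extracted from the downloaded files through their abstract syntax trees (AST), and the concept of normalized Google distance (NGD) is used to compute their relevance to the query in real time and re-rank the results. In addition, the structure of the programs is used with a hierarchical clustering algorithm from data mining to regroup the search results. Finally, each cluster is given a semantic label so that users without the relevant professional background can also filter the clusters, find a suitable one, and develop applications quickly. The study uses precision and recall, together with a case study, as measures of whether the system improves the quality of search results, and compares the system with other related research.	zh_TW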
dc.description.abstract	With the growing popularity of open source software, many people are willing to share their projects over the internet. To make software production more efficient, developers search the web for existing open source software. A new kind of internet service, the code search engine, has emerged as a result. Although search engines offer a convenient way to help developers reuse existing application programming interfaces (APIs), the results they return do not always meet developers' requirements. Numerous and complex search results make it hard for developers to reuse code quickly. We propose a system architecture to address this problem. First, we store the relevant data extracted from Koders search results in a local repository. Second, we convert every file into an abstract syntax tree to obtain its structural data. Third, we cluster the files and compute each file's normalized Google distance value from the structural data, then re-rank the search results according to that value. Fourth, we assign semantic tags to each cluster to help users find the right cluster quickly. Finally, we use precision and recall as measures of the proposed architecture's clustering performance. We also use a case study to examine whether the proposed architecture can effectively help developers find useful source code, and we compare it with related academic research.	en_US
DC.subject	Code Search	zh_TW
DC.subject	Code Ranking	zh_TW
DC.subject	Open Source Code	zh_TW
DC.subject	Normalized Google Distance	zh_TW
DC.subject	Hierarchical Clustering Algorithm	zh_TW
DC.subject	Abstract Syntax Tree	zh_TW
DC.subject	Open Source Code	en_US
DC.subject	Normalized Google Distance	en_US
DC.subject	Abstract Syntax Tree	en_US
DC.subject	Cluster Analysis	en_US
DC.subject	Code Search Engine	en_US
DC.title	Using NGD to Improve Code Search Quality	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	USING NORMALIZED GOOGLE DISTANCE TO REFINE CODE SEARCH RESULTS	en_US
DC.type	Master's and doctoral theses	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 974203001: full metadata record
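
The abstract names normalized Google distance (NGD) as the relevance measure used for re-ranking but does not give the computation. As a rough illustration only, the sketch below uses the standard NGD definition, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts the documents containing a term and N is the index size. The function names, the count tables, and the idea of ranking extracted API names directly against a query keyword are assumptions made for this example; they are not taken from the thesis.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw document counts.

    fx, fy -- number of documents containing term x / term y
    fxy    -- number of documents containing both terms
    n      -- total number of indexed documents (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return float("inf")
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller of the two term counts")
    log_x, log_y, log_xy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_x, log_y) - log_xy) / (math.log(n) - min(log_x, log_y))


def rank_terms(query, candidates, freq, pair_freq, n):
    """Sort candidate API names by ascending NGD to the query keyword.

    freq      -- hypothetical table: term -> document count
    pair_freq -- hypothetical table: frozenset({a, b}) -> co-occurrence count
    """
    scored = []
    for name in candidates:
        both = pair_freq.get(frozenset((query, name)), 0)
        scored.append((ngd(freq[query], freq[name], both, n), name))
    return [name for _, name in sorted(scored)]


# Made-up counts, for illustration only.
freq = {"socket": 1000, "ServerSocket": 800, "JButton": 900}
pair_freq = {
    frozenset(("socket", "ServerSocket")): 700,
    frozenset(("socket", "JButton")): 5,
}
ranking = rank_terms("socket", ["JButton", "ServerSocket"], freq, pair_freq, 100000)
```

A smaller NGD means the two terms co-occur more often than chance would suggest, so sorting in ascending order places the API names most related to the query first. With the made-up counts above, ServerSocket ranks ahead of JButton.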