Master's/Doctoral Thesis 974203001: Full Metadata Record

DC field | Value | Language
DC.contributor | Department of Information Management | zh_TW
DC.creator | 祝亞琪 | zh_TW
DC.creator | Ya-chi Chu | en_US
dc.date.accessioned | 2010-6-29T07:39:07Z
dc.date.available | 2010-6-29T07:39:07Z
dc.date.issued | 2010
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=974203001
dc.contributor.department | Department of Information Management | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
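dc.description.abstract | In recent years, as open source code has become widespread, developers seeking to improve software productivity and shorten development schedules have increasingly tended to find suitable existing open source code and modify it, reducing development time and cost. This has given rise to a new kind of web service: code search. Many code search engines now give users a convenient channel for obtaining the Application Programming Interfaces (APIs) provided by existing classes or frameworks, in the hope of helping developers find more useful material. However, the code retrieved from the web through these search engines often fails to meet developers' needs: the code files are too numerous and too complex to understand, so developers cannot filter out what they need or put the resources to use quickly. This study therefore proposes an improved system architecture for code search. First, code relevant to the query is downloaded from the Koders search engine into a repository after appropriate filtering steps. The important APIs in each downloaded file are then extracted through the code's Abstract Syntax Tree, and the concept of Normalized Google Distance is applied in real time to compute each file's relevance to the query and re-rank the results. In addition, the structure of the programs is used with a hierarchical clustering algorithm from data mining to regroup the search results. Each cluster is then given a semantic label so that users without the relevant expertise can still filter the results and find a suitable cluster for rapid development. Finally, the study uses precision, recall, and a case study as measures of whether the system improves the quality of search results, and compares it with other related research. | zh_TW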
dc.description.abstract | With the growing popularity of open source software, many people are willing to share their projects over the internet. To make software production more efficient, developers search the web for existing open source software. As a result, a new internet service, the code search engine, has emerged. Although search engines provide a convenient way to help developers reuse existing Application Programming Interfaces, the results they return do not always satisfy developers' requirements. Numerous and complex search results make it hard for developers to reuse the code quickly. We propose a system architecture that addresses this problem. First, we store the related data extracted from the search results of Koders in a local repository. Second, we convert every file into abstract syntax tree format to obtain structural data. Third, we cluster the files and compute each file's normalized Google distance value from the structural data, then re-rank the search results according to that value. Fourth, we assign semantic tags to each cluster to help users find the right cluster quickly. Finally, we use precision and recall to evaluate the clustering performance of the proposed architecture. We also use a case study to examine whether the proposed architecture can effectively help developers find useful source code, and we compare it with related academic research. | en_US
DC.subject | Code search | zh_TW
DC.subject | Code ranking | zh_TW
DC.subject | Open source code | zh_TW
DC.subject | Normalized Google distance | zh_TW
DC.subject | Hierarchical algorithm | zh_TW
DC.subject | Abstract syntax tree | zh_TW
DC.subject | Open Source Code | en_US
DC.subject | Normalized Google Distance | en_US
DC.subject | Abstract Syntax Tree | en_US
DC.subject | Cluster Analysis | en_US
DC.subject | Code Search Engine | en_US
DC.title | Using NGD to Improve Code Search Quality | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | USING NORMALIZED GOOGLE DISTANCE TO REFINE CODE SEARCH RESULTS | en_US
DC.type | Master's/doctoral thesis | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US
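
The abstract says search results are re-ranked by their normalized Google distance (NGD) to the query, but this record does not give the formula or say how the counts are obtained. As a minimal sketch, assuming the standard NGD definition over document counts, the re-ranking step might look like the following. The function names `ngd` and `rank_by_ngd`, the candidate names, and all counts are hypothetical and not taken from the thesis.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google distance from occurrence counts.

    fx, fy -- number of documents containing term x / term y
    fxy    -- number of documents containing both terms
    n      -- total number of documents in the collection (assumed known)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator == 0:
        # Both terms occur in every document, so they are indistinguishable.
        return 0.0
    return (max(log_fx, log_fy) - log_fxy) / denominator


def rank_by_ngd(query_count: int, candidates: dict, n: int) -> list:
    """Order candidates by ascending NGD to the query (closest first).

    candidates maps a name to (count of the candidate's term,
    co-occurrence count with the query); both counts are illustrative.
    """
    scores = {
        name: ngd(query_count, fy, fxy, n)
        for name, (fy, fxy) in candidates.items()
    }
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical counts: "b" co-occurs with the query far more often than "a".
    print(rank_by_ngd(50, {"a": (50, 5), "b": (50, 40)}, 1000))
```

A smaller distance means the two terms co-occur more often than their individual frequencies would suggest, which is why the sketch sorts in ascending order.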

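The abstract names precision and recall as the evaluation measures without defining them. The sketch below uses the usual set-based definitions for a single query; the function name `precision_recall` and the file names are hypothetical, and the thesis may compute the measures per cluster rather than per result list.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved -- items the system returned
    relevant  -- items judged relevant (assumed to come from manual labelling)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical result list and relevance judgements.
    returned = ["A.java", "B.java", "C.java", "D.java"]
    judged_relevant = ["A.java", "B.java", "E.java"]
    print(precision_recall(returned, judged_relevant))
```

Precision is the fraction of returned files that are relevant, and recall is the fraction of relevant files that were returned.
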
For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03) 422-7151 ext. 57407, or by e-mail.