

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89742


    Title: 基於分散式運算的知識圖譜建置系統; Distributed Computing for Building Knowledge Graph System
    Author: 李昱德; Li, Yu-De
    Contributor: In-service Master Program, Department of Computer Science and Information Engineering
    Keywords: knowledge graph building; distributed computing; knowledge graphs; RDB-to-RDF; Spark
    Date: 2022-07-04
    Upload time: 2022-10-04 11:58:15 (UTC+8)
    Publisher: National Central University
    Abstract: Knowledge graphs have developed rapidly and their applications are increasingly varied, but growing data volumes make graph construction more complex. If building a graph takes too long, the real-time performance of the services that depend on it suffers, so building knowledge graphs efficiently over large data volumes is an important problem. This thesis implements a knowledge graph construction system for the RDB-to-RDF scenario on a Hadoop + Spark distributed computing architecture, sets up an experimental environment, and compares the system against DB2Triples, Antidot's open-source system. The experiments use open data on library borrowing records, nearly 93 million records in total, to simulate a realistic large-data setting. The comparison is incremental: it starts from a small amount of data and scales up to a large amount, measuring graph construction performance at each size. The thesis closes with an overall evaluation based on the experimental data. The results show that DB2Triples builds the graph faster when there are fewer than 1,000 records, that the distributed system overtakes it at about 2,000 records, and that at 10,000 records the distributed system is about six times faster. The gap widens as the data volume grows.
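    Appears in collections: [In-service Master Program, Department of Computer Science and Information Engineering] Theses and Dissertations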
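The RDB-to-RDF conversion named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation and not DB2Triples' API; it only shows the general idea of turning one relational row into RDF triples, loosely in the spirit of the W3C direct mapping. The base IRI `http://example.org/`, the table name `loan`, and the column names are all hypothetical, and every value is emitted as a plain string literal for simplicity.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base IRI, not from the thesis
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(table, key_column, row):
    """Map one relational row (a dict) to a list of N-Triples lines.

    The subject IRI is built from the table name and the primary-key value;
    each non-NULL column becomes one predicate/literal pair.
    """
    subject = f"<{BASE}{table}/{quote(str(row[key_column]))}>"
    triples = [f"{subject} {RDF_TYPE} <{BASE}{table}> ."]
    for column, value in row.items():
        if value is None:
            continue  # a NULL column produces no triple
        predicate = f"<{BASE}{table}#{quote(column)}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Hypothetical borrowing record, for illustration only.
loan = {"loan_id": 1, "book_title": "Graph Databases", "returned_on": None}
for line in row_to_triples("loan", "loan_id", loan):
    print(line)
```

Because each row is converted independently of the others, a function of this shape is the kind of work a distributed engine can spread across workers, for example by applying it to every row with PySpark's `RDD.flatMap`. Whether the thesis structures its Spark job this way is an assumption.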

    Files in this item:

    File          Description   Size   Format   Views
    index.html                  0Kb    HTML     393


    All items in NCUIR are protected by original copyright.
