Thesis/Dissertation 109552008: Full Metadata Record

DC Field Language
DC.contributor 資訊工程學系在職專班 zh_TW
DC.creator 李昱德 zh_TW
DC.creator Yu-De Li en_US
dc.date.accessioned 2022-7-4T07:39:07Z
dc.date.available 2022-7-4T07:39:07Z
dc.date.issued 2022
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=109552008
dc.contributor.department 資訊工程學系在職專班 zh_TW
DC.description 國立中央大學 zh_TW
DC.description National Central University en_US
dc.description.abstract 在知識圖譜的快速發展,如今相關應用越來越豐富,但伴隨著資料量日漸龐大,使圖譜建立上的複雜度也提高,如果建立圖譜的時間成本過高,則會影響服務的實時性,所以如何在大資料量的情境下,高效的建立知識圖譜是一個重要的議題。本論文以基於Hadoop + Spark分散式運算的架構來針對RDB-to-RDF的情境來建立知識圖譜建置系統,並搭建了實驗環境,與Antidot公司開源DB2Triples系統作為比較的對象。在實驗環境中,採用了圖書館借閱開放資料,一共約有近九千三百萬筆的資料,來模擬現實中大量資料的情境,比較時採以漸進式的方式,從小的資料量開始累進至大的資料量,來比較在不同的資料量時的圖譜建置效能,論文最後則依據實驗數據來做綜合性的評比。從實驗結果可以得知,在千筆資料以內時,DB2Triples擁有比較快速的圖譜建置時間,但約莫到兩千筆資料時,本論文所實現的分散式系統已實現反超,到達一萬筆時,已經快了約六倍,且隨著資料量的累進,差距則越來越明顯。 zh_TW
dc.description.abstract With the rapid development of knowledge graphs, related applications are becoming increasingly abundant. As data volumes grow, however, graph construction becomes more complex, and if building a graph takes too long, the real-time performance of services suffers. How to build a knowledge graph efficiently from a large amount of data is therefore an important issue. This thesis builds a knowledge graph construction system for the RDB-to-RDF scenario on a distributed computing architecture based on Hadoop + Spark, sets up an experimental environment, and compares the system with DB2Triples, Antidot's open-source system. The experiments use open data on library lending, nearly 93 million records in total, to simulate real-world large-data conditions. The comparison is progressive, starting from a small amount of data and increasing to a large amount, so that graph construction performance can be compared at different data sizes; the thesis concludes with an overall evaluation based on the experimental data. The results show that with fewer than 1,000 records DB2Triples builds the graph faster, but at about 2,000 records the distributed system implemented in this thesis overtakes it. At 10,000 records it is about six times faster, and the gap widens as the amount of data grows. en_US
DC.subject 知識圖譜建置 zh_TW
DC.subject 分散式運算 zh_TW
DC.subject 知識圖譜 zh_TW
DC.subject RDB-to-RDF zh_TW
DC.subject knowledge graphs building en_US
DC.subject distributed computing en_US
DC.subject knowledge graphs en_US
DC.subject RDB-to-RDF en_US
DC.subject spark en_US
DC.title 基於分散式運算的知識圖譜建置系統 zh_TW
dc.language.iso zh-TW zh-TW
DC.title Distributed Computing for Building Knowledge Graph System en_US
DC.type 博碩士論文 zh_TW
DC.type thesis en_US
DC.publisher National Central University en_US
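
The RDB-to-RDF conversion named in the abstract can be illustrated with a minimal sketch that turns relational rows into N-Triples lines. This is not the thesis's implementation: the base IRI, the table name `loan`, the column names, and the helper functions are all hypothetical, and the mapping is a simplified row-to-triples scheme rather than a full R2RML processor. It assumes key values that are safe to place in an IRI without further encoding.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical base IRI for illustration only; it does not come from the thesis.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(table: str, key: str, row: Dict[str, object]) -> List[str]:
    """Convert one relational row into N-Triples lines.

    The subject IRI is built from the table name and the row's key value;
    every other non-NULL column becomes one triple with a plain literal.
    """
    subject = f"<{BASE}{table}/{row[key]}>"
    triples = []
    for column, value in row.items():
        if column == key or value is None:
            continue  # key is already in the subject IRI; NULL yields no triple
        predicate = f"<{BASE}{table}#{column}>"
        triples.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return triples


def table_to_triples(
    table: str, key: str, rows: Iterable[Dict[str, object]]
) -> Iterator[str]:
    """Convert every row of a table, one row at a time."""
    for row in rows:
        yield from row_to_triples(table, key, row)
```

Because each row is converted independently of the others, a function like `row_to_triples` could be handed to a distributed engine (for example through PySpark's `RDD.flatMap`) so that partitions of a large table are processed in parallel; whether the thesis structures its system this way is not stated in the record.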
