dc.description.abstract | With the rapid development of knowledge graphs, related applications are becoming increasingly abundant, but as data volumes grow, the complexity of graph construction also increases. If building a graph takes too long, the real-time performance of services suffers. Efficiently constructing a knowledge graph over large amounts of data is therefore an important problem. In this paper, a knowledge graph construction system for the RDB-to-RDF scenario is built on a distributed computing architecture based on Hadoop + Spark, and an experimental environment is set up to compare it against Antidot's open-source DB2Triples system. The experiment uses a library's open-source records, nearly 93 million items in total, to simulate a large real-world data source. The comparison adopts a progressive method, starting from a small amount of data and scaling up to a large amount, so that construction performance can be compared across different data volumes. Finally, a comprehensive evaluation is made based on the experimental results. The results show that with fewer than 1,000 records, DB2Triples builds faster; at about 2,000 records, the distributed system surpasses it; and at about 10,000 records, the distributed system is roughly six times faster, with the gap widening as the data volume grows. | en_US |