dc.description.abstract | In today's knowledge economy, managing and classifying patents is crucial for protecting innovation. As the number of patent applications grows, traditional manual classification has become inefficient and costly, making accurate and efficient automated patent classification imperative. In recent years, advances in natural language processing, particularly pre-trained language models such as BERT and SBERT, have shown excellent performance in text classification tasks, opening new opportunities for automated patent classification. This study explores how SBERT-based machine learning and deep learning methods can improve the accuracy of patent document classification. We assessed the effectiveness of SBERT in handling the complexity and large volume of patent texts and compared the performance of various pre-trained models on patent classification tasks. To validate the proposed methods, we used publicly available patent data from Taiwan from 2015 to 2023, totaling 136,013 patent cases, of which 115,008 served as the training set and the remaining 21,005 as the test set. In our experiments, we employed 10 different pre-trained models to extract features from various textual components of patents, such as titles, abstracts, claims, and descriptions. We then used cosine similarity-based classification methods and machine learning classifiers to predict International Patent Classification (IPC) codes at multiple levels. The models and classification strategies were evaluated using metrics such as accuracy, recall, and F1 score.
The experimental results show that the SBERT-based DBMC_V1 model, using the complete description text of each patent as features and a cosine similarity-based optimistic approach for classification, achieves the best performance in the three-level IPC classification tasks. The study also found that adopting different model combination strategies for classification tasks with different data can further improve classification effectiveness. The SBERT-based approach showed clear advantages in patent classification tasks, but some limitations remain, such as imbalanced data categories and a lack of specialized model optimization, which should be explored and addressed in future work. | en_US |