A Comparative Study of Unsupervised Anomaly Detection Methods: The Case of Expense Reimbursement Processes

Name

孫逸群 (YI-CHUN SUN)

Department

Department of Information Management

Thesis title

Unsupervised Anomaly Detection in Reimbursement Processes: A Comparative Evaluation of Algorithms

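Related theses

★ A Study of the Utilization of National Health Insurance Medical Resources by Taiwan's Elderly Population Using Data Mining
★ Applying Geographic Information Systems and Data Mining Techniques to Site Selection Analysis for Primary Care Clinics: The Case of Taipei City
★ A Study of Building a Consultation Support System from Physicians' Perspective
★ Consumers' Intention to Use Customer Service Chatbots: An Innovation Resistance Perspective
★ The Effect of Service Quality on Online Auction Pages on Seller Performance
★ A Prediction Model for Online Repurchase Behavior across Multiple Product Categories
★ The Effect of Human-Computer Interfaces on Online Shopping Intention: Based on Uses and Gratifications Theory and the Technology Acceptance Model
★ A Personalized Healthcare Facility Recommender System Integrating Online Word-of-Mouth: The Case of Dental Clinics
★ A Temporal Dynamic Analysis of the Impact of Online Word-of-Mouth on Smartphone Sales
★ Applying Data Mining Techniques to Build an Admissions Decision Support System
★ Evaluating the Impact of Clinical Decision Support Systems on Waiting Time and the Physician-Patient Relationship
★ Constructing an Admissions Decision Support System for Higher Education
★ Research Development of Big Data in Healthcare: A Social Network Analysis Perspective
★ The Effect of Human-Computer Interaction Design of Medical Apps on User Satisfaction
★ Managing Social Media Fan Pages: The Case of a Health Fan Page on Facebook
★ A Hybrid Recommender System Based on Online Word-of-Mouth and Healthcare Utilization Theory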

Full text available in the thesis system from 2028-07-01.

Abstract (Chinese)

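In modern business processes, the event logs generated by information systems are valuable and play a key role in a wide range of studies and investigations. These data are essential for identifying outliers, detecting fraud, and supporting process improvement and risk management. Among the categories of anomaly detection, unsupervised techniques stand out in real-world applications because their modest requirements (in particular, no labeled training data) make them highly practical. Most previous research on anomaly detection in business processes has concentrated on applying one particular technique; comparative studies are scarce, and public datasets are usually not used.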

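This study therefore trains five representative unsupervised anomaly detection methods and compares the performance of the resulting models, with the goal of establishing a reliable reference for future researchers who apply unsupervised anomaly detection to event logs. To ensure reliable results, three real-world event logs, each recording a different reimbursement process, were used to train the models and answer the research questions. In addition, seven anomaly scenarios were defined from the three basic elements common to event logs (time, resource, and activity) so that performance could be compared across models. After evaluating and comparing the experimental results, we found that the local outlier factor (LOF) method is the most suitable unsupervised anomaly detection method for reimbursement-process event logs.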
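The abstract builds its anomaly scenarios by perturbing the time, resource, and activity elements of real event logs. This page does not say how the perturbations are made, so the following is only a minimal Python sketch under our own assumptions: the `Event` record, the 1 to 72 hour time offsets, and the helper names (`shift_earlier`, `shift_later`, `swap_resource`, `swap_activity`, `inject_anomalies`) are hypothetical and are not the thesis's actual definitions of scenarios such as "advanced time" or "misplaced event".

```python
import random
from dataclasses import dataclass, replace
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Event:
    """One event-log row carrying the time, resource and activity elements."""
    case_id: str
    activity: str
    resource: str
    timestamp: datetime
    is_anomaly: bool = False  # ground-truth label, set only by injection

# Each hypothetical mutator shares one signature so they can be mixed freely.
def shift_earlier(ev, rng, resources, activities):
    # Time perturbation (assumed): record the event 1-72 hours too early.
    return replace(ev, timestamp=ev.timestamp - timedelta(hours=rng.uniform(1, 72)), is_anomaly=True)

def shift_later(ev, rng, resources, activities):
    # Time perturbation (assumed): record the event 1-72 hours too late.
    return replace(ev, timestamp=ev.timestamp + timedelta(hours=rng.uniform(1, 72)), is_anomaly=True)

def swap_resource(ev, rng, resources, activities):
    # Resource perturbation (assumed): attribute the event to another resource.
    others = [r for r in resources if r != ev.resource]
    if not others:
        return ev  # nothing to swap with; leave the event unchanged and unlabeled
    return replace(ev, resource=rng.choice(others), is_anomaly=True)

def swap_activity(ev, rng, resources, activities):
    # Activity perturbation (assumed): relabel the event with another activity.
    others = [a for a in activities if a != ev.activity]
    if not others:
        return ev
    return replace(ev, activity=rng.choice(others), is_anomaly=True)

MUTATORS = [shift_earlier, shift_later, swap_resource, swap_activity]

def inject_anomalies(log, rate, mutators=MUTATORS, seed=0):
    """Return a copy of `log` in which round(rate * len(log)) events are mutated."""
    rng = random.Random(seed)
    targets = set(rng.sample(range(len(log)), round(rate * len(log))))
    resources = sorted({e.resource for e in log})
    activities = sorted({e.activity for e in log})
    out = []
    for i, ev in enumerate(log):
        if i in targets:
            ev = rng.choice(mutators)(ev, rng, resources, activities)
        out.append(ev)
    return out
```

Passing a single mutator, for example `inject_anomalies(log, 0.1, [shift_later])`, yields a single-element scenario, while passing several mixes them. The injected `is_anomaly` flag serves only as ground truth for evaluation; an unsupervised detector never sees it while fitting.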

Abstract (English)

In modern business processes, information systems generate valuable log data that are crucial to various investigations. Anomaly detection using these data is essential for identifying outliers, detecting fraud, and supporting process improvement and risk management. Among the different categories of anomaly detection, unsupervised techniques stand out for their practicality in real-world applications, thanks to their minimal requirements (in particular, no labeled training data). Previous research on anomaly detection in business processes has predominantly concentrated on applying specific anomaly detection techniques without comparing models, and has often been conducted without public datasets. Therefore, this research aims to establish a reliable foundation for future researchers interested in using log data for unsupervised anomaly detection in business processes.

This is achieved by training and comparing five representative unsupervised anomaly detection algorithms. To ensure the reliability and robustness of the results, three real-life event log datasets, each capturing a distinct reimbursement process, are used to address the research questions. Additionally, seven anomaly scenarios, based on three essential elements commonly found in event logs (time, resource, and activity), are defined to compare performance across models. The evaluation and comparison show that the local outlier factor (LOF) is the most suitable unsupervised algorithm for detecting anomalies in reimbursement process event logs.
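LOF scores each point by comparing its local density with the densities of its k nearest neighbours; scores well above 1 mark points lying in sparser regions than their neighbours. The sketch below implements the standard definition of Breunig et al. (2000), using k-distance, reachability distance, and local reachability density, in plain Python, and pairs it with a rank-based ROC AUC for comparing detectors. It assumes events have already been encoded as numeric feature vectors, which this page does not describe. The use of AUC as the comparison metric, the function names, the toy data, and k=3 are illustrative assumptions, and ties among neighbours are broken by index for simplicity.

```python
import math

def lof_scores(points, k=3):
    """Local outlier factor of every point; values well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")
    # Pairwise Euclidean distances (O(n^2), acceptable for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Exactly k nearest neighbours per point, excluding itself; ties broken by index.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j, i=i: (dist[i][j], j))[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbour.
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        mean_reach = sum(max(k_distance[o], dist[i][o]) for o in neighbours[i]) / k
        lrd.append(1.0 / (mean_reach + 1e-10))  # epsilon guards against duplicate points
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]

def roc_auc(scores, labels):
    """Probability that a random anomaly outscores a random normal point (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal point")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Hypothetical 2-D features: a tight cluster plus one distant point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
    labels = [0, 0, 0, 0, 0, 1]
    scores = lof_scores(pts, k=3)
    print([round(s, 2) for s in scores])
    print(roc_auc(scores, labels))
```

For real data, scikit-learn's `sklearn.neighbors.LocalOutlierFactor` implements the same idea (its `negative_outlier_factor_` attribute holds the negated scores) and scales far better than this quadratic loop.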

Keywords (Chinese)

★ Unsupervised anomaly detection algorithms
★ Event log
★ Reimbursement process

Keywords (English)

★ Unsupervised anomaly detection
★ Event log
★ Reimbursement process

Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter I Introduction
1-1 Research background
1-2 Research purpose
Chapter II Literature Review
2-1 Anomaly detection
2-1-1 Anomalies
2-1-2 Types of anomalies
2-1-3 Anomaly detection algorithm categories
2-2 Business process and event log
2-3 Unsupervised anomaly detection algorithms
2-3-1 Nearest-neighbor-based techniques
2-3-2 Cluster-based techniques
2-3-3 Statistical techniques
2-3-4 Other techniques
2-4 Related work
Chapter III Research Methodology
3-1 Research procedure
3-2 Dataset and data mutation
3-2-1 Dataset
3-2-2 Data mutation
3-3 Algorithm performance evaluation
Chapter IV Research Results
4-1 Comparison of models’ performance
4-1-1 Algorithms’ performance in the scenario of advanced time
4-1-2 Algorithms’ performance in the scenario of delayed time
4-1-3 Algorithms’ performance in the scenario of shifted time
4-1-4 Algorithms’ performance in the scenario of mixed mutation on time
4-1-5 Algorithms’ performance in the scenario of misplaced resource
4-1-6 Algorithms’ performance in the scenario of misplaced event
4-1-7 Algorithms’ performance in the scenario of all mixed mutation
4-2 Comparative evaluation
4-2-1 Model performance compared over algorithms and scenarios
4-2-2 Model performance compared over different anomaly rates
4-2-3 Model performance compared over expense data’s existence
4-3 Discussion
4-3-1 Research findings
4-3-2 Comparison with related work
Chapter V Conclusion
5-1 Contributions
5-2 Limitations
5-3 Future direction
Bibliography
Appendix A
Hardware environment
Software environment

Advisor

許文錦 (Wen-Chin Hsu)

Approval date

2023-7-18
