Thesis 107322093: Record Details




Name: 曾翊昇 (I-Sheng Tseng)   Department: Department of Civil Engineering
Thesis title: 高效率異質性時序資料表示法辨別系統
(An Adaptive System for Effectively and Efficiently Representing Heterogeneous Time Series Data)
Related theses
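★ An Interoperability Solution for Internet of Things Actuation Functions
★ GeoWeb Crawler: An Extensible and Scalable Crawling Framework for Geospatial Web Resources
★ Improvement of the TDR Monitoring Information Platform and Establishment of a Sensor Observation Service
★ Generating Nearshore Bathymetry from High-Resolution Satellite Stereo Image Pairs
★ Integrating the oneM2M and OGC SensorThings API Standards to Build an Open Internet of Things Architecture
★ A Multi-Attribute Indexing Framework for Big Internet of Things Data
★ A TOA-reflectance-based Spatial-temporal Image Fusion Method for Aerosol Optical Depth Retrieval
★ An Automatic Embedded Device Registration Procedure for the OGC SensorThings API
★ GeoRank: A Geospatial Web Ranking Algorithm for a GeoWeb Search Engine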
Full text: available in the library system after 2020-12-31
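Abstract (Chinese, translated): A time series is a sequence of measurements of the same type of event, stored in chronological order. Time series data appear in many domains, such as stock market fluctuations, sensor readings, and medical and biological information. Because time series data are (1) continuously generated, (2) high-dimensional, and (3) very large in volume, analyzing and storing raw time series directly is inefficient and costly. To manage time series data effectively, a time series representation can therefore replace the raw series, reducing its data volume and dimensionality while preserving its characteristics. However, in terms of compression efficiency and information loss, each representation suits only certain types of time series, while time series types are broad and diverse (e.g., temperature, humidity, speed, position, vibration, and pressure). This means that no single representation can effectively manage all types of time series data. To address this problem, this study proposes a system that efficiently determines the most suitable representation for different types of time series. Specifically, the study evaluates the performance of different representations on each training time series to determine the most suitable representation for each one. To further improve efficiency, the training data are clustered and the most representative time series of each cluster is selected. Thereafter, whenever an unidentified time series is received, the system computes its similarity to each cluster representative to indirectly identify its most suitable representation. Experimental results show that, under different parameter settings, the proposed system identifies the most suitable representation for 46% to 76% of the time series. For the remaining time series, the representation selected by the system differs from the actual most suitable representation by less than 2.19%. In addition, the proposed system identifies the most suitable representation 17 to 300 times faster than the conventional approach.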
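The per-series evaluation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it compares only two of the six representations in the thesis outline (PAA and a truncated DFT), gives both roughly the same storage budget, and uses root-mean-square reconstruction error as a stand-in for information loss. The function names (`paa_reconstruct`, `dft_reconstruct`, `best_representation`) are hypothetical.

```python
import cmath
import math


def paa_reconstruct(x, segments):
    """PAA: replace each of `segments` equal-length frames by its mean."""
    n = len(x)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(x) is divisible by segments")
    size = n // segments
    out = []
    for s in range(segments):
        frame = x[s * size:(s + 1) * size]
        out.extend([sum(frame) / size] * size)
    return out


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dft_reconstruct(x, keep):
    """Keep the `keep` lowest frequencies (and their conjugate mirrors), zero the rest, invert."""
    n = len(x)
    coeffs = dft(x)
    kept = [0j] * n
    for k in range(keep):
        kept[k] = coeffs[k]
        if k > 0:
            kept[n - k] = coeffs[n - k]  # conjugate partner keeps the result real
    return [(sum(kept[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def rmse(a, b):
    """Root-mean-square error, used here as an assumed information-loss measure."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def best_representation(x, budget):
    """Pick the representation with the lowest error under a budget of about `budget` reals."""
    errors = {
        "PAA": rmse(x, paa_reconstruct(x, budget)),
        # Assumption: a complex DFT coefficient costs two reals, so keep budget // 2.
        "DFT": rmse(x, dft_reconstruct(x, max(1, budget // 2))),
    }
    return min(errors, key=errors.get), errors
```

For the step-shaped series `[0, 0, 0, 0, 1, 1, 1, 1]` with a budget of 2, PAA reconstructs it exactly, whereas for one period of a cosine sampled at 8 points with a budget of 4 the truncated DFT does. This illustrates why no single representation wins for every series.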
Abstract (English): A time series is a collection of measurements obtained sequentially and is common in many application domains, e.g., stock market fluctuations, observations from sensor networks, and medical and biological signals. Since a time series usually contains a large number of data points (i.e., it is high-dimensional), processing and storing it in its raw format is very expensive. To manage time series data effectively and efficiently, several representation methods have been proposed; they reduce the dimensionality of a time series while preserving its fundamental characteristics. However, in terms of compression rate and information loss, each representation method is most suitable for certain types of time series data, which means that no single method is effective for all possible types. Therefore, this study proposes a system that can identify the most suitable representation method for different types of time series data. Specifically, this study first conducts an extensive performance evaluation to identify the most suitable representation method for each training time series. Afterward, by computing the similarity between a new time series and the training time series, the system determines the most suitable representation method for the new time series. Experimental results show that the proposed system identifies the most suitable representation method for 46% to 76% of the time series. For the remaining time series, the selected representation still produces acceptable results, differing by less than 2.19% from the best representation method. In addition, the proposed system identifies the most suitable representation 17 to 300 times faster than the naïve solution.
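The clustering and prototype steps in the Chinese abstract above (cluster the training series, pick a representative per cluster, then match new series against those representatives) can be illustrated with a small sketch. It assumes the training series have already been grouped into clusters (no particular clustering algorithm is implied), uses the medoid of each cluster as its prototype, labels each prototype with the majority best representation of its members, and measures similarity with Euclidean distance. These choices and all function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """L2 (Euclidean) distance between two equal-length series."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def medoid(members, dist=euclidean):
    """Return the member whose total distance to all members is smallest."""
    return min(members, key=lambda m: sum(dist(m, o) for o in members))


def build_prototypes(clusters, dist=euclidean):
    """Turn each cluster into one (prototype, representation) pair.

    `clusters` is a list of clusters; each cluster is a list of
    (series, best_representation) pairs from the evaluation step.
    Assumption: a cluster's label is the majority best representation.
    """
    prototypes = []
    for cluster in clusters:
        series = [s for s, _ in cluster]
        label = Counter(rep for _, rep in cluster).most_common(1)[0][0]
        prototypes.append((medoid(series, dist), label))
    return prototypes


def suggest_representation(query, prototypes, dist=euclidean):
    """Return the representation attached to the prototype nearest to `query`."""
    _, label = min(prototypes, key=lambda p: dist(query, p[0]))
    return label
```

Because a new series is compared only with one prototype per cluster rather than with every training series, the number of distance computations drops from the training-set size to the number of clusters. Any other distance, such as a DTW function, can be passed through the `dist` argument.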
Keywords ★ time series
★ representation
★ performance evaluation
★ clustering
Table of contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1. Background
1.2. Problem and Objective
2. Literature Review
2.1. Related Work
2.2. Time Series Representation
2.2.1. Piecewise Aggregate Approximation (PAA)
2.2.2. Adaptive Piecewise Constant Approximation (APCA)
2.2.3. Piecewise Linear Aggregate Approximation (PLAA)
2.2.4. Discrete Fourier Transformation (DFT)
2.2.5. Discrete Cosine Transformation (DCT)
2.2.6. Discrete Wavelet Transform (DWT)
2.3. Time Series Distance Measure
2.3.1. Lp-norm Distance
2.3.2. Dynamic Time Warping (DTW)
3. Methodology
3.1. System Architecture
3.2. Model Training
3.2.1. Training Data
3.2.2. Representation Determination
3.2.3. Clustering
3.2.4. Prototype Extraction
3.2.4.1. Using Medoid as Prototype
3.2.4.2. Using Averaging Prototype
3.3. Data Classification
4. Experimental Results
4.1. Model Training Result
4.1.1. Representation Determination Result
4.1.2. Clustering and Prototype Extraction Result
4.2. Accuracy Analysis
4.2.1. Testing data from the UEA & UCR Time Series Repository
4.2.2. Testing data from Civil IoT Taiwan
4.3. Efficiency Analysis
5. Conclusions and Future Work
References
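The outline lists dynamic time warping (DTW) alongside Lp-norm distance as a time series distance measure. The minimal sketch below assumes squared point-wise differences as the local cost and no warping-window constraint (both illustrative choices, not taken from the thesis); `dtw_distance` is a hypothetical name. It shows how DTW tolerates time shifts that a point-by-point Euclidean comparison penalizes.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping.

    Local cost: squared difference between aligned points (an assumption).
    Returns the square root of the minimal accumulated cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])
```

For example, `dtw_distance([0, 0, 1, 0], [0, 1, 0, 0])` returns 0.0 because the peak can be aligned across the one-step shift, whereas the Euclidean distance between the same two series is √2. DTW also accepts series of different lengths, at quadratic cost in time and memory.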
References
1. L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Comput. Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
2. T. C. Fu, “A review on time series data mining,” Eng. Appl. Artif. Intell., vol. 24, no. 1, pp. 164–181, 2011.
3. J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann, 2012.
4. H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, “Querying and mining of time series data: experimental comparison of representations and distance measures,” Proc. VLDB Endow., vol. 1, no. 2, pp. 1542–1552, 2008.
5. R. Agrawal, C. Faloutsos, and A. Swami, “Efficient Similarity Search in Databases,” Int. Conf. Found. Data Organ. Algorithms, pp. 69–84, 1993.
6. K. P. Chan and A. W. C. Fu, “Efficient Time Series Matching by Wavelets,” in 15th International Conference on Data Engineering, 1999, pp. 126–133.
7. E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases,” Knowl. Inf. Syst., vol. 3, no. 3, pp. 263–286, 2001.
8. K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani, “Locally adaptive dimensionality reduction for indexing large time series databases,” ACM Trans. Database Syst., vol. 27, no. 2, pp. 188–228, 2002.
9. F. Korn, H. V. Jagadish, and C. Faloutsos, “Efficiently supporting ad hoc queries in large datasets of time sequences,” ACM SIGMOD Rec., vol. 26, no. 2, pp. 289–300, 1997.
10. N. Q. V. Hung and D. T. Anh, “An Improvement of PAA for Dimensionality Reduction in Large Time Series Databases,” in PRICAI 2008: Trends in Artificial Intelligence, 2008, pp. 698–707.
11. P. Esling and C. Agon, “Time Series Data Mining,” ACM Comput. Surv., vol. 45, no. 1, 2012.
12. M. Milanović and M. Stamenković, “Data Mining in Time Series,” Econ. Horizons (Ekonomski Horizonti), vol. 13, no. 1, pp. 5–25, 2011.
13. Y. L. Wu, D. Agrawal, and A. El Abbadi, “A comparison of DFT and DWT based similarity search in time-series databases,” in 9th International Conference on Information and Knowledge Management, 2000, pp. 488–495.
14. T. Kahveci and A. Singh, “Variable length queries for time series data,” in 17th International Conference on Data Engineering, 2001, pp. 273–282.
15. I. Popivanov and R. J. Miller, “Similarity Search Over Time-Series Data Using Wavelets,” in 18th International Conference on Data Engineering, 2002, pp. 212–221.
16. K. Kawagoe and T. Ueda, “A similarity search method of time series data with combination of Fourier and wavelet transforms,” in 9th International Symposium on Temporal Representation and Reasoning, 2002, pp. 86–92.
17. E. Keogh and S. Kasetty, “On the need for time series data mining benchmarks: A Survey and Empirical Demonstration,” Data Min. Knowl. Discov., vol. 7, no. 4, pp. 349–371, 2003.
18. S. Aghabozorgi, A. Seyed Shirkhorshidi, and T. Ying Wah, “Time-series clustering - A decade review,” Inf. Syst., vol. 53, no. C, pp. 16–38, 2015.
19. V. Bettaiah and H. S. Ranganath, “An Analysis of Time Series Representation Methods Data Mining Applications Perspective,” in 2014 ACM Southeast Regional Conference, 2014.
20. X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. Keogh, “Experimental comparison of representation methods and distance measures for time series data,” Data Min. Knowl. Discov., vol. 26, no. 2, pp. 275–309, 2013.
21. E. J. Keogh and M. J. Pazzani, “A Simple Dimensionality Reduction Technique for Fast Similarity Search in Large Time Series Databases,” in 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications, 2000, pp. 122–133.
22. B. K. Yi and C. Faloutsos, “Fast Time Sequence Indexing for Arbitrary Lp norms,” in 26th International Conference on Very Large Data Bases, 2000, pp. 385–394.
23. C. Cassisi, P. Montalto, M. Aliotta, A. Cannata, and A. Pulvirenti, “Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining,” in Advances in Data Mining Knowledge Discovery and Applications, InTech, 2012.
24. D. J. Berndt and J. Clifford, “Using Dynamic Time Warping to Find Patterns in Time Series,” in 3rd International Conference on Knowledge Discovery and Data Mining, 1994, pp. 359–370.
25. A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh, “The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances,” Data Min. Knowl. Discov., vol. 31, no. 3, pp. 606–660, 2017.
26. G. E. A. P. A. Batista, X. Wang, and E. J. Keogh, “A Complexity-Invariant Distance Measure for Time Series,” in 11th SIAM International Conference on Data Mining, 2011, pp. 699–710.
27. C. Ratanamahatana, E. Keogh, A. J. Bagnall, and S. Lonardi, “A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering,” Adv. Knowl. Discov. Data Min., pp. 771–777, 2005.
28. G. M. Church and J. Aach, “Aligning gene expression time series with time warping algorithms,” Bioinformatics, vol. 17, no. 6, pp. 495–508, 2001.
29. S. Chu, E. Keogh, D. Hart, and M. Pazzani, “Iterative Deepening Dynamic Time Warping for Time Series,” in 2nd SIAM International Conference on Data Mining, 2002, pp. 195–212.
30. M. Vlachos, G. Kollios, and D. Gunopulos, “Discovering Similar Multidimensional Trajectories,” Encycl. GIS, pp. 1–8, 2002.
31. L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in 2005 ACM SIGMOD international conference on Management of data, 2005, pp. 491–502.
32. L. Rokach and O. Maimon, “Clustering Methods,” in Data Mining and Knowledge Discovery Handbook, Springer US, 2005, pp. 321–352.
33. E. Keogh and M. Pazzani, “An enhanced representation of time series which allows fast and accurate classification, clustering,” in 4th International Conference on Knowledge Discovery and Data Mining, 1998, pp. 239–243.
34. A. J. Bagnall and G. J. Janacek, “Clustering time series from ARMA models with clipped data,” in 10th ACM SIGKDD international conference on Knowledge discovery and data mining, 2004, pp. 49–58.
35. E. Pȩkalska, R. P. W. Duin, and P. Paclík, “Prototype selection for dissimilarity-based classifiers,” Pattern Recognit., vol. 39, no. 2, pp. 189–208, 2006.
36. V. Vuori and J. Laaksonen, “A comparison of techniques for automatic clustering of handwritten characters,” in 16th International Conference On Pattern Recognition, 2002, pp. 168–171.
37. T. W. Liao, C. F. Ting, and P. C. Chang, “An adaptive genetic clustering method for exploratory mining of feature vector and time series data,” Int. J. Prod. Res., vol. 44, no. 14, pp. 2731–2748, 2006.
38. V. Hautamaki, P. Nykänen, and P. Fränti, “Time-series clustering by approximate prototypes,” in 19th International Conference on Pattern Recognition, 2008.
39. V. Niennattrakul and C. A. Ratanamahatana, “Inaccuracies of Shape Averaging Method Using Dynamic Time Warping for Time Series Data,” in International Conference on Computational Science, 2007, pp. 513–520.
40. S. Salvador and P. Chan, “Toward accurate dynamic time warping in linear time and space,” Intell. Data Anal., vol. 11, no. 5, pp. 561–580, 2007.
Advisor: 黃智遠 (Chih-Yuan Huang)   Approval date: 2019-08-08

For questions about this thesis, please contact the Outreach Services Section, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.