dc.description.abstract | Anomaly detection aims to identify a problem at the moment it occurs or before it occurs. It is a common problem in real-world data analysis, for example in credit fraud detection and medical diagnosis. In manufacturing, maintenance personnel often encounter machine failures, damaged parts, and broken consumables, which cause defects or interrupt the process. Ideally, the need to maintain or replace equipment and consumables should not be discovered only after a problem has occurred. Data sets used for anomaly detection typically contain a very small amount of abnormal data and a large amount of normal data, which makes it difficult to distinguish the characteristics of the abnormal data. Sampling is a common way to address this imbalance, by adjusting the number and characteristics of the extracted data, feature selection, and distance similarity. In previous research, sampling methods based on the distance between sequences did not consider the similarity between time series sequences of different lengths. Therefore, in this study we propose using DTW as the distance measure in a sampling method for time series.
To measure the similarity of time series, we use Dynamic Time Warping (DTW). The smaller the DTW distance between two time series, the more similar they are. Unlike Euclidean distance, DTW can compare sequences of different lengths, so when the anomalies are known, the length of the sample can be relaxed or restricted according to the resulting performance. In our experiments, we first define the period before an abnormality occurs as the original abnormal data and then use DTW to select the most similar data as samples. In this way we can consider not only the data immediately before the abnormality occurs but can also find anomalous fragments hidden in segments of different lengths.
The data used in this paper are process data from a semiconductor company. Because it is difficult to predict when consumables will be damaged during the process, the recall of anomaly detection is relatively low. We aim to improve performance with the proposed sampling method, and we compare it with oversampling at random locations and sampling with Euclidean distance, using two classification models, LSTM and SVC. The experimental results show that the proposed sampling method performs better than the other two methods. | en_US |
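The DTW-based selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dtw_distance` and `most_similar_windows`, the absolute-difference local cost, the sliding-window candidate generation, and the toy data are all assumptions made for illustration. The sketch only shows why DTW, unlike Euclidean distance, can rank candidate segments whose length differs from that of the reference anomaly segment.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences.

    Assumption: the local cost is the absolute difference of values.
    The sequences may have different lengths.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def most_similar_windows(reference, series, lengths, k):
    """Return the k windows of `series` closest to `reference` under DTW.

    Hypothetical helper: every window of every length in `lengths` is a
    candidate. Results are (distance, start, length) tuples, closest first.
    """
    candidates = []
    for length in lengths:
        for start in range(len(series) - length + 1):
            window = series[start:start + length]
            candidates.append((dtw_distance(reference, window), start, length))
    candidates.sort()
    return candidates[:k]


# Toy data: the reference stands in for the period before a known anomaly.
reference = [1, 2, 3]
series = [0, 0, 1, 2, 2, 3, 0]
best = most_similar_windows(reference, series, lengths=[3, 4], k=1)
```

In this toy example the closest candidate is a window of length 4, longer than the reference, which a pointwise Euclidean comparison could not score without first resampling or truncating one of the two sequences.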