dc.description.abstract | Landslide susceptibility assessment is one of the most fundamental and essential tasks in mitigating the damage caused by natural disasters. Continuing improvements in geo-spatial data have increased the reliability of data-driven approaches for evaluating regional landslide susceptibility. In particular, data mining algorithms applied to geo-spatial data and event-based landslide inventories have been proposed and discussed in recent years. From a geotechnical or geological point of view, natural terrain landslides typically exhibit three features: the source, the trail, and the deposition. The term run-out, generally used to describe the downslope displacement of failed geo-materials in a landslide, is used in this study to denote the combination of the landslide trail and deposition. In general, the area of a landslide detected from remotely sensed images by automatic or semi-automatic algorithms may contain the run-out area unless it is manually removed by a geologist or expert using aerial stereo-photos or other auxiliary data. However, under a strict definition of landslides the run-out area should be excluded because it is produced by different mechanisms. Impure training data (i.e., landslide samples that include run-outs) may therefore introduce bias and reduce the reliability of the resulting landslide susceptibility model.
The purpose of this study is to develop a procedure combining geo-spatial data and data mining algorithms (especially Random Forests) for multi-temporal and event-based landslide susceptibility assessments at a regional scale. Three algorithms commonly used for landslide susceptibility assessment (i.e., Decision Tree, Bayes Network, and Logistic Regression) are also employed for comparison with the Random Forests algorithm. Two strategies are investigated for model verification: space-robustness and time-robustness. The former separates samples into training and checking data drawn from a single event or the same dataset. The latter predicts subsequent or different landslide events or periods using a landslide susceptibility model constructed from a specific event or period. This study also employs cost-sensitive analysis to adjust the decision boundary of the data mining algorithms and thereby improve the prediction capability for both equal and unequal sample proportions. The Shimen reservoir and Laonong river watersheds, in northern and southern Taiwan respectively, are selected as the study sites. More than ten landslide-related factors were collected for the two study sites, covering topographic, geo-environmental, vegetative, rainfall, and man-made information. The landslide inventories used for training were generated by a change detection algorithm in the Shimen reservoir watershed case and produced manually in the Laonong river watershed case. This study also explores the influence of sampling strategies and run-out on the modeling process, based on the landslide inventory of the second study area, which has complete topological relationships (i.e., polygon-based features).
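The cost-sensitive adjustment described above can be sketched with scikit-learn's RandomForestClassifier, whose class_weight parameter shifts the decision boundary against the under-represented landslide class. This is a minimal illustration only: the synthetic factors, sample sizes, and the 5:1 cost ratio are assumptions, not the study's actual data or configuration.

```python
# Illustrative sketch: cost-sensitive Random Forests for landslide
# susceptibility. All data and weights here are synthetic assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for landslide-related factors
# (e.g., slope, elevation, vegetation index, rainfall).
X = rng.random((500, 4))
# Synthetic unbalanced labels: 1 = landslide, 0 = non-landslide.
y = (X[:, 0] + 0.3 * rng.random(500) > 0.9).astype(int)

# class_weight penalizes misclassifying the landslide class more,
# lowering omission error at the cost of more commission error.
clf = RandomForestClassifier(
    n_estimators=100,
    class_weight={0: 1, 1: 5},  # assumed cost ratio, not from the study
    random_state=0,
)
clf.fit(X, y)

# Per-cell landslide probabilities serve as susceptibility scores.
probs = clf.predict_proba(X)[:, 1]
```

Raising the landslide-class weight plays the same role as moving the decision boundary: cells near the boundary are more readily labeled susceptible, which is the behavior sought when omission errors are costlier than commission errors.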
For space-robustness verification, the experimental results of the multi-temporal and event-based landslide susceptibility assessments indicate that the Random Forests (RF) algorithm outperforms the Decision Tree, Bayes Network, and Logistic Regression methods in the Shimen reservoir watershed case; the RF accuracies are all better than 0.93. In the Laonong river watershed case, the event-based RF results based on the hybrid sampling strategy, in which the run-out area is modeled as an individual class, as landslide, or as non-landslide, also achieve accuracies higher than 0.7 when cost-sensitive analysis is applied.
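The accuracy measures referred to throughout (overall accuracy, User's Accuracy, omission and commission errors) all derive from a confusion matrix. A minimal sketch with illustrative counts (the numbers are invented for the example, not results from the study):

```python
# Deriving standard accuracy measures from a confusion matrix.
# Counts are illustrative assumptions, not the study's results.
import numpy as np

# Rows = reference (truth), columns = prediction;
# class order: [non-landslide, landslide].
cm = np.array([[90, 10],
               [ 5, 45]])

# Overall accuracy: correct cells over all cells.
overall_accuracy = np.trace(cm) / cm.sum()   # (90 + 45) / 150

# User's Accuracy (UA) for the landslide class: of the cells
# predicted landslide, the fraction that truly are landslide.
ua_landslide = cm[1, 1] / cm[:, 1].sum()     # 45 / 55

# Omission error: true landslides the model missed.
omission = cm[1, 0] / cm[1, :].sum()         # 5 / 50

# Commission error: complement of User's Accuracy.
commission = 1.0 - ua_landslide
```

Under this convention, emphasizing low omission error (few missed landslides) while tolerating some commission error matches the study's stated priority for susceptibility mapping.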
In terms of time-robustness verification, the results of the multi-temporal and event-based models for the Shimen reservoir watershed indicate that, in most cases, the Random Forests algorithm with cost-sensitive analysis is more accurate and more stable. Reducing omission errors is emphasized in this study because landslide susceptibility assessment is an upstream step of the landslide risk assessment and management framework; commission errors are expected to be reduced by subsequent work. To further examine the developed models, the landslide susceptibility distributions of true occurrence samples are compared with the generated landslide susceptibility maps. The results reveal that cost-sensitive analysis provides more reasonable results than the original RF algorithm. For generating landslide susceptibility maps using Random Forests with cost-sensitive analysis, the results show that the multi-temporal models are unaffected by sample ratios, but extremely unbalanced sample ratios are not recommended for event-based modeling due to over-fitting. In the Laonong river watershed case, the results of RF with cost-sensitive analysis show that quantitative measures obtained by treating the run-out area as an individual class are feasible. In addition, treating the run-out area as a non-landslide area improves the User's Accuracy (UA) for the landslide source by 5% to 10%. The landslide susceptibility maps generated for this study demonstrate the effects of cost-sensitive analysis and of different sampling strategies, as well as the impact of run-out areas on modeling in the Laonong river watershed case. Based on these results, this study suggests that the run-out area should be considered during landslide inventory generation and susceptibility modeling analysis. | en_US |