Improving Regional Landslide Susceptibility Assessments by Integrating Geo-spatial Data and Data Mining Algorithms (整合空間資料與資料探勘演算法精進廣域崩塌潛勢評估)


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/75811

Title:	Improving Regional Landslide Susceptibility Assessments by Integrating Geo-spatial Data and Data Mining Algorithms (整合空間資料與資料探勘演算法精進廣域崩塌潛勢評估)
Author:	Lai, Jhe-Syuan (賴哲儇)
Contributor:	Department of Civil Engineering
Keywords:	Landslide Susceptibility;Data Mining;Geo-spatial Data;Random Forests
Date:	2018-03-21
Upload time:	2018-04-13 10:47:54 (UTC+8)
Publisher:	National Central University
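Abstract:	Landslide susceptibility assessment is one of the basic yet important tasks in landslide hazard research. The literature indicates that integrating geo-spatial data with data-driven algorithms to assess landslide susceptibility at a regional scale deserves attention; in particular, data mining algorithms combined with geo-spatial data and event-based landslide inventories have attracted considerable interest. In terms of landslide mechanism, a natural-slope landslide can be broadly divided into three important features: source, trail and deposition. The latter two are subsequent responses of the landslide and are collectively called run-out. In general, landslide areas detected from remotely sensed images by automatic or semi-automatic methods do not separate the source from the run-out; they only delineate the area affected by the landslide. If the aim is to investigate the main causes of landslide occurrence, building a landslide susceptibility model from such detection results may introduce bias. This study develops a procedure that combines geo-spatial data with data mining algorithms (especially the Random Forests algorithm) to assess multi-temporal and event-based landslide susceptibility, and verifies it with different spatial and temporal samples, referred to as space- and time-robustness verification. The former means that the training and check data both come from the same dataset or event; the latter means that the verification samples (also called prediction data in this thesis) come from other events or datasets. In addition, the study compares three commonly used landslide susceptibility methods, namely Decision Tree, Bayes Network and Logistic Regression, to confirm the effectiveness of the Random Forests algorithm. For unbalanced prediction results, such as extremely high omission or commission errors, this study uses cost-sensitive analysis to adjust the decision boundary and improve the predictive ability of models built with different sample ratios. The Shimen reservoir watershed in northern Taiwan and the Laonong river watershed in southern Taiwan are the test areas. Dozens of landslide-related factors were collected, covering topography, environment, vegetation, rainfall and man-made features, and landslide inventories were produced by change detection (Shimen reservoir case) and manual digitization (Laonong river case) as the research data. For the Laonong river case, the study further examines the influence of the landslide inventory sampling strategy and of run-out on landslide susceptibility modeling. Verification with different spatial samples shows that, for the multi-temporal and event-based landslide susceptibility models of the Shimen reservoir, the Random Forests algorithm gives the best results, with all check accuracies above 0.93, better than Decision Tree, Bayes Network and Logistic Regression. In the Laonong river watershed case, with run-out treated as an independent, landslide or non-landslide class together with a hybrid sampling strategy, the event-based landslide susceptibility models built with Random Forests and cost-sensitive analysis reach check accuracies above 0.7. Verification with different temporal samples shows that Random Forests with cost-sensitive analysis builds better and more stable multi-temporal and event-based landslide susceptibility models for the Shimen reservoir. Landslide susceptibility assessment is a front-end task in the landslide risk assessment and management framework, so this study gives priority to reducing omission errors, while commission errors can be reduced by subsequent tasks. On further examination of the models, when actual landslide samples are input, the output susceptibility values lie closer to the high-susceptibility range than those of the original Random Forests algorithm, giving predictions that better match reality. Building an event-based landslide susceptibility model with extremely unbalanced numbers of landslide and non-landslide samples causes severe over-fitting. In the Laonong river case, building the landslide susceptibility model with Random Forests, cost-sensitive analysis and the hybrid sampling strategy confirms the feasibility of an independent run-out class. Because most existing landslide inventories do not separate a run-out class, and usually treat it as part of the landslide or remove it (i.e., non-landslide), this study tests the effect of these two practices on modeling. The results show that treating run-out as a non-landslide class gives higher accuracy, and that cost-sensitive analysis raises verification accuracy by 5~10%. The landslide susceptibility maps produced from these models show the differences due to cost-sensitive analysis and sample ratio and, in the Laonong river watershed case, highlight the influence of the run-out class on landslide susceptibility assessment; this study therefore recommends taking it into account when delineating landslide inventories.;Landslide susceptibility assessment is one of the most fundamental and essential tasks in work related to the mitigation of damage caused by natural disasters. Continuing improvements in geo-spatial data have increased the veracity of data-driven approaches for evaluating regional landslide susceptibility. 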
In particular, data mining algorithms combined with geo-spatial data and event-based landslide inventories have been proposed and discussed in recent years. From a geotechnical or geological point of view, natural terrain landslides have three typical features: source, trail and deposition. The term run-out, generally used to describe the downslope displacement of failed geo-materials by landslides, is used in this study to represent the combination of the landslide trail and deposition. In general, the area of a landslide detected by automatic or semi-automatic algorithms from remotely sensed images may contain the run-out area, unless it is manually removed by a geologist or expert using aerial stereo-photos or other auxiliary data. However, the run-out area should be excluded under a strict definition of a landslide because it is caused by different mechanisms. Including it might produce biases and reduce the reliability of a landslide susceptibility model constructed from impure training data (i.e., landslide samples including run-outs). The purpose of this study is to develop a procedure combining geo-spatial data and data mining algorithms (especially Random Forests) for multi-temporal and event-based landslide susceptibility assessments at a regional scale. The Random Forests algorithm is also compared with three algorithms commonly used in landslide susceptibility assessment (i.e., Decision Tree, Bayes Network and Logistic Regression). Two strategies are investigated for model verification, i.e., space- and time-robustness. The former separates samples from a single event or the same dataset into training and check data. The latter builds a landslide susceptibility model from a specific event or period and uses it to predict subsequent or different landslide events or periods. 
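The two verification strategies described above can be illustrated with a minimal sketch. It is not the thesis's actual procedure: the function names, the event labels `event_A` and `event_B`, the 30% check fraction and the integer sample IDs are all hypothetical placeholders chosen for illustration.

```python
import random


def space_robustness_split(samples, check_fraction=0.3, seed=0):
    """Split samples from ONE event or dataset into training and check data."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_check = int(round(len(shuffled) * check_fraction))
    return shuffled[n_check:], shuffled[:n_check]


def time_robustness_split(samples_by_event, training_event):
    """Train on one event; use samples from all OTHER events as prediction data."""
    training = list(samples_by_event[training_event])
    prediction = [
        sample
        for event, samples in samples_by_event.items()
        if event != training_event
        for sample in samples
    ]
    return training, prediction


# Hypothetical sample IDs for two landslide events (assumed data).
event_a = list(range(10))
event_b = list(range(100, 105))
samples_by_event = {"event_A": event_a, "event_B": event_b}

train, check = space_robustness_split(event_a)
time_train, time_pred = time_robustness_split(samples_by_event, "event_A")
```

In the space-robustness case the check data share an event with the training data. In the time-robustness case the prediction data come only from a different event.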
This study also employs a cost-sensitive analysis to adjust the decision boundary of the data mining algorithms, improving prediction capability for models built with equal and unequal sample proportions. The Shimen reservoir and Laonong river watersheds, in northern and southern Taiwan respectively, are selected as the study sites. More than ten landslide-related factors were collected for the two study sites, including topographic, geo-environmental, vegetative, rainfall and man-made information. The landslide inventories used for training were generated by a change detection algorithm in the Shimen reservoir watershed case and produced manually in the Laonong river watershed case. This study also explores the influence of sampling strategies and run-out on the modeling process using the landslide inventory of the second study area, which has a specific and complete topological relationship (i.e., polygon-based features). For space-robustness verification, the experimental results from the multi-temporal and event-based landslide susceptibility assessments indicate that the Random Forests (RF) algorithm outperforms the Decision Tree, Bayes Network and Logistic Regression methods in the Shimen reservoir watershed case. Specifically, the RF accuracies are all better than 0.93. In the Laonong river watershed case, the event-based RF results based on the hybrid sampling strategy, where the run-out area is modeled as an individual, landslide or non-landslide class, also have accuracies higher than 0.7 with cost-sensitive analysis. For time-robustness verification, the results of the multi-temporal and event-based models for the Shimen reservoir watershed indicate that, in most cases, the Random Forests algorithm with cost-sensitive analysis is more accurate and more stable. Reducing omission errors is emphasized in this study because landslide susceptibility assessment is an early-stage task in the landslide risk assessment and management framework. 
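One common way to realize a cost-sensitive adjustment of the decision boundary is to move the probability threshold according to the relative costs of the two error types. The sketch below assumes that approach; the abstract does not state which cost-sensitive formulation or cost values the thesis used, so the function names, the 4:1 cost ratio and the scores are illustrative assumptions only.

```python
def cost_sensitive_threshold(cost_omission, cost_commission):
    """Threshold on P(landslide) that minimises expected misclassification cost.

    Labelling a sample "landslide" is cheaper in expectation when
    p * cost_omission >= (1 - p) * cost_commission, which rearranges to
    p >= cost_commission / (cost_commission + cost_omission).
    """
    return cost_commission / (cost_commission + cost_omission)


def classify(scores, threshold):
    """Label each susceptibility score: 1 = landslide, 0 = non-landslide."""
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical susceptibility scores, e.g. class probabilities from a classifier.
scores = [0.15, 0.35, 0.55, 0.80]

# Default boundary: both error types cost the same, so the threshold is 0.5.
default_labels = classify(scores, 0.5)

# Assumed costs: missing a landslide (omission) is 4x worse than a false alarm.
threshold = cost_sensitive_threshold(cost_omission=4.0, cost_commission=1.0)
adjusted_labels = classify(scores, threshold)
```

Lowering the threshold moves borderline samples into the landslide class. This reduces omission errors at the price of more commission errors, which matches the priority the abstract gives to omission.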
The commission error is expected to be reduced by subsequent tasks. To further examine the developed models, the landslide susceptibility distributions of true occurrence samples and the generated landslide susceptibility maps are compared. The results reveal that cost-sensitive analysis provides more reasonable results than the original RF algorithm. For landslide susceptibility maps generated using Random Forests with cost-sensitive analysis, the results show that the multi-temporal models are unaffected by sample ratios, but an extremely unbalanced sample ratio is not recommended for event-based modeling because of over-fitting. In the Laonong river watershed case, the quantitative measures obtained from RF with cost-sensitive analysis show that treating the run-out area as an individual class is feasible. In addition, treating the run-out area as a non-landslide area can improve the User’s Accuracy (UA) for the landslide source by 5% to 10%. The landslide susceptibility maps generated for this study demonstrate the effects of cost-sensitive analysis and different sampling strategies as well as the impact of run-out areas on modeling in the Laonong river watershed case. According to the above results, this study suggests that the run-out area should be considered during landslide inventory generation and susceptibility modeling analysis.
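The omission error, commission error and User's Accuracy mentioned in the abstract can be computed for the landslide class from a list of actual and predicted labels. The sketch below uses the standard remote-sensing definitions. The function name and the sample labels are hypothetical and do not come from the thesis.

```python
def landslide_class_errors(actual, predicted):
    """Accuracy measures for the landslide class (1 = landslide, 0 = non-landslide)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # landslides found
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # landslides missed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms

    def ratio(numerator, denominator):
        # Return None when the measure is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "omission_error": ratio(fn, tp + fn),      # 1 - producer's accuracy
        "commission_error": ratio(fp, tp + fp),    # 1 - user's accuracy
        "producers_accuracy": ratio(tp, tp + fn),
        "users_accuracy": ratio(tp, tp + fp),
    }


# Hypothetical check data: 4 landslide and 6 non-landslide samples.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
measures = landslide_class_errors(actual, predicted)
```

Here one landslide is missed and one stable sample is flagged, so the omission and commission errors are both 0.25 and the User's Accuracy is 0.75.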
Appears in collections:	[Graduate Institute of Civil Engineering] Theses and Dissertations

