Abstract:
[Objectives] The quality of training samples strongly affects model performance and prediction accuracy. In regions with limited sample data, the small number of samples and their uneven spatial distribution may prevent a model from effectively learning the features of disaster-inducing factors, which increases the risk of overfitting and ultimately reduces prediction accuracy. It is therefore crucial to collect and optimize training samples according to regional characteristics. [Methods] To address this issue, this study proposes a Sampling Optimization (SO) method for training samples. The method combines the Prototype Sampling (PBS) approach for selecting landslide-positive samples with an unsupervised clustering model for training sample selection. This yields a screened and expanded positive sample dataset and an objectively extracted negative sample dataset, which together form the optimized (SO) training dataset. Random Forest (RF) and Support Vector Machine (SVM) models, both well suited to small sample data, were then used to construct landslide susceptibility evaluation models. Comparative experiments were conducted using Raw Data (RD), a dataset with only Data Augmentation (DA), and the SO dataset. Prediction performance was assessed using metrics such as the Area Under the Curve (AUC). In addition, the frequency ratio method was applied to optimize the results of landslide susceptibility zoning. Finally, a case study was conducted in Putian City, where landslide sample data are relatively scarce, to verify the effectiveness and generalization capability of the proposed method. [Results] Models trained on the SO dataset achieved AUC improvements of 10.69% and 18.23% over those trained on the RD and DA datasets, respectively, demonstrating a significant enhancement in predictive performance. This suggests that selecting and expanding positive samples while objectively extracting negative samples can improve model accuracy and mitigate overfitting during training. Furthermore, the frequency ratio analysis showed that the SO-RF model achieved higher frequency ratios than the SO-SVM model in the extremely high and high susceptibility zones, indicating that SO-RF is better suited to evaluating landslide susceptibility in regions with limited landslide sample data, such as Putian City. [Conclusions] The proposed training sample optimization approach, combined with machine learning evaluation methods, demonstrates high applicability and accuracy. The findings provide useful insights into machine learning-based sampling strategies for landslide susceptibility assessment. © 2025 Science Press. All rights reserved.
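The abstract compares models by the frequency ratio of each susceptibility zone but does not spell out the calculation. The following is a minimal sketch of the commonly used definition, in which the frequency ratio of a zone is the share of landslides falling in that zone divided by the share of the study area that the zone occupies. The function name `frequency_ratios`, the zone labels, and all counts below are hypothetical and are not taken from the study.

```python
def frequency_ratios(landslide_counts, cell_counts):
    """Return the frequency ratio (FR) of each susceptibility zone.

    FR = (landslides in zone / all landslides) / (cells in zone / all cells)

    Both arguments are dicts keyed by zone name. This is an illustrative
    sketch of the standard definition, not the authors' implementation.
    """
    if landslide_counts.keys() != cell_counts.keys():
        raise ValueError("both inputs must describe the same zones")

    total_landslides = sum(landslide_counts.values())
    total_cells = sum(cell_counts.values())
    if total_landslides == 0 or total_cells == 0:
        raise ValueError("totals must be positive")

    ratios = {}
    for zone, cells in cell_counts.items():
        if cells == 0:
            raise ValueError(f"zone {zone!r} has no area")
        landslide_share = landslide_counts[zone] / total_landslides
        area_share = cells / total_cells
        ratios[zone] = landslide_share / area_share
    return ratios


# Hypothetical counts for five zones (made up for illustration only).
example_landslides = {"very low": 2, "low": 3, "moderate": 5,
                      "high": 10, "extremely high": 30}
example_cells = {"very low": 400, "low": 250, "moderate": 200,
                 "high": 100, "extremely high": 50}

for zone, fr in frequency_ratios(example_landslides, example_cells).items():
    print(f"{zone}: FR = {fr:.2f}")
```

Under this definition, a value above 1 means landslides are over-represented in a zone relative to its area, so a model whose high and extremely high zones have larger frequency ratios concentrates known landslides more tightly in the zones it flags as dangerous.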
Source:
Journal of Geo-Information Science
ISSN: 1560-8999
Year: 2025
Issue: 5
Volume: 27
Page: 1113-1128