Spatial modelling of storm dust provenance is essential for mitigating the on-site and off-site effects of storm dust in the arid and
semi-arid environments of the world. Therefore, the main aim of this study was to apply eight data mining algorithms, namely random forest (RF), support vector machine (SVM), Bayesian additive regression trees (BART), radial basis function (RBF), extreme gradient boosting (XGBoost), regression tree analysis (RTA), the Cubist model and boosted regression trees (BRT), together with an ensemble modelling (EM) approach, to generate spatial maps of dust provenance in Khuzestan province, a major region of active dust sources in southwestern Iran. This study is the first attempt to predict storm dust provenance by applying individual data mining models and ensemble modelling. We identified 12 potentially effective factors for dust emissions and mapped them in a geographic information system (GIS): two climate factors (wind speed, precipitation), five soil characteristics (texture, bulk density, electrical conductivity (EC), organic matter (OM), available water capacity (AWC)), the normalized difference vegetation index (NDVI), land use, geology, a digital elevation model (DEM) and land type. A mean decrease accuracy measure (MDAM) was used to determine the corresponding importance scores (IS). A multicollinearity
test (including the variance inflation factor (VIF) and tolerance coefficient (TC)) was applied to assess relationships between the effective factors, and an existing map of dust provenance was randomly split into training (70%) and validation (30%) datasets. The individual data mining models were validated using the area under the curve (AUC). Based on the TC and VIF results, no collinearity was detected among the 12 effective factors for dust emissions. The prediction accuracies of the eight data mining models and the EM, as assessed by the AUC, ranked as follows: EM (AUC = 99.8%) > XGBoost > RBF > Cubist > RF > BART > SVM > BRT > RTA (AUC = 79.1%). Among all models, the EM provided the highest accuracy for predicting storm dust provenance. Using the EM, areas classified as having low, moderate, high and very high susceptibility for storm dust provenance comprised 36, 13, 23 and 28% of the total mapped area, respectively. Based on the MDAM results, the highest and lowest IS were obtained for wind speed (IS = 23) and geology (IS = 6.5), respectively. Overall, the modelling techniques used in this research are helpful for predicting storm dust provenance and thereby targeting mitigation. We therefore recommend applying data mining EM approaches to the spatial mapping of storm dust provenance worldwide.
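
As an illustration of the multicollinearity test mentioned above, the following minimal sketch computes the VIF and the tolerance coefficient for each factor, using the standard definitions VIF_j = 1 / (1 - R_j^2) and TC_j = 1 - R_j^2, where R_j^2 is obtained by regressing factor j on all other factors. It is not the study's own code: the function name `vif_and_tolerance`, the synthetic data and the cut-offs in the comments (VIF above 10, tolerance below 0.1) are assumptions chosen for illustration, since the abstract does not state which thresholds were used.

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (VIF, tolerance) for each column of X (n_samples x n_factors).

    Hypothetical helper for illustration; each factor is regressed on the
    remaining factors by ordinary least squares with an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other factors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()  # R^2 of factor j on the others
        vif[j] = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif  # tolerance is the reciprocal of VIF


# Synthetic stand-in for the factor table (assumption: not the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # four independent factors
vif, tol = vif_and_tolerance(X)

# Append a near-duplicate of the first factor to show what collinearity looks like.
near_copy = X[:, 0] + 0.05 * rng.normal(size=500)
vif_c, tol_c = vif_and_tolerance(np.column_stack([X, near_copy]))

# Assumed rule of thumb: VIF > 10 (tolerance < 0.1) flags collinearity.
print("independent factors:", np.round(vif, 2))
print("with near-duplicate:", np.round(vif_c, 2))
```

With independent factors, the VIF values stay close to 1; the near-duplicate factor drives the VIF of both affected columns well above the assumed cut-off, which is the pattern a test like this is designed to detect.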