Spatial mapping of the provenance of storm dust: Application of data mining and ensemble modelling

A - Papers appearing in refereed journals

Gholami, G., Mohamadifar, A. and Collins, A. L. 2020. Spatial mapping of the provenance of storm dust: Application of data mining and ensemble modelling. Atmospheric Research. 233 (104716).

Authors: Gholami, G., Mohamadifar, A. and Collins, A. L.

Spatial modelling of storm dust provenance is essential for mitigating its on-site and off-site effects in the world's arid and semi-arid environments. The main aim of this study was therefore to apply eight data mining algorithms, namely random forest (RF), support vector machine (SVM), Bayesian additive regression trees (BART), radial basis function (RBF), extreme gradient boosting (XGBoost), regression tree analysis (RTA), the Cubist model and boosted regression trees (BRT), together with an ensemble modelling (EM) approach, to generate spatial maps of dust provenance in Khuzestan province, a major region of active dust sources in southwestern Iran. This study is the first attempt to predict storm dust provenance by applying individual data mining models and ensemble modelling. We identified and mapped in a geographic information system (GIS) 12 potentially effective factors for dust emissions: two climatic factors (wind speed, precipitation), five soil characteristics (texture, bulk density, electrical conductivity (EC), organic matter (OM), available water capacity (AWC)), the normalized difference vegetation index (NDVI), land use, geology, a digital elevation model (DEM) and land type. A mean decrease accuracy measure (MDAM) was used to determine the corresponding importance scores (IS). A multicollinearity test (comprising the variance inflation factor (VIF) and the tolerance coefficient (TC)) was applied to assess relationships between the effective factors, and an existing map of dust provenance was randomly divided into training (70%) and validation (30%) data. The individual data mining models were validated using the area under the receiver operating characteristic (ROC) curve (AUC). Based on the TC and VIF results, no collinearity was detected among the 12 effective factors for dust emissions. The prediction accuracies of the eight data mining models and the EM, as assessed by the AUC, ranked as follows: EM (AUC = 99.8%) > XGBoost > RBF > Cubist > RF > BART > SVM > BRT > RTA (AUC = 79.1%). Among all models, the EM provided the highest accuracy for predicting storm dust provenance. Using the EM, areas classified as having low, moderate, high and very high susceptibility to storm dust provenance comprised 36, 13, 23 and 28% of the total mapped area, respectively. Based on the MDAM results, the highest and lowest IS were obtained for the wind speed (IS = 23) and geology (IS = 6.5) factors, respectively. Overall, the modelling techniques used in this research are helpful for predicting storm dust provenance and thereby targeting mitigation. We therefore recommend applying data mining EM approaches to the spatial mapping of storm dust provenance worldwide.
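The multicollinearity test mentioned above can be illustrated with a short sketch. For each factor j, regress it on all the other factors (with an intercept); the tolerance is TC_j = 1 - R²_j and the variance inflation factor is VIF_j = 1 / TC_j. This is a minimal pure-Python version assuming factor values have already been sampled into equal-length columns. It is not the authors' code (the keywords indicate they used R). The factor names and synthetic data are hypothetical, and the VIF > 10 / TC < 0.1 cut-off is a common rule of thumb assumed here rather than a threshold taken from the paper.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are exactly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)  # normal equations (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("response column is constant")
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def vif_and_tolerance(columns):
    """For each factor j: TC_j = 1 - R^2_j (j regressed on all other factors), VIF_j = 1 / TC_j."""
    result = {}
    for name, y in columns.items():
        others = [col for other, col in columns.items() if other != name]
        tc = 1.0 - r_squared(y, others)
        vif = 1.0 / tc if tc > 1e-12 else float("inf")
        result[name] = (vif, tc)
    return result


# Usage with synthetic, independent stand-ins for three of the 12 factors (hypothetical data).
rng = random.Random(1)
factors = {
    "wind_speed": [rng.gauss(6.0, 2.0) for _ in range(200)],
    "precipitation": [rng.gauss(150.0, 40.0) for _ in range(200)],
    "ndvi": [rng.uniform(0.0, 0.6) for _ in range(200)],
}
for name, (vif, tc) in vif_and_tolerance(factors).items():
    # VIF > 10 or TC < 0.1 is an assumed rule-of-thumb cut-off, not the paper's stated threshold.
    flag = "possible collinearity" if vif > 10 or tc < 0.1 else "ok"
    print(f"{name}: VIF={vif:.2f}, TC={tc:.2f} ({flag})")
```

Solving the normal equations directly keeps the sketch dependency-free; for many factors or nearly collinear data, a QR-based least-squares solver would be numerically safer.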
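The validation design described above (a random 70/30 split and AUC-based comparison of individual models against an ensemble) can be sketched as follows. The AUC is computed as the probability that a randomly chosen dust-source cell receives a higher score than a randomly chosen non-source cell, with ties counting one half; multiply by 100 to express it as a percentage, as in the abstract. The abstract does not say how the EM combines the eight models, so the unweighted mean of model scores used here is an assumption, as are the function names and the toy scores.

```python
import random


def split_train_validation(points, train_fraction=0.7, seed=42):
    """Randomly partition records into training and validation subsets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """ROC AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble_mean(*model_scores):
    """Unweighted mean of per-model susceptibility scores (one assumed way to build an EM)."""
    return [sum(vals) / len(vals) for vals in zip(*model_scores)]


train, valid = split_train_validation(range(100))
print(len(train), len(valid))  # 70 30

labels = [1, 1, 1, 0, 0, 0]  # 1 = mapped dust source, 0 = non-source (toy data)
model_a = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
model_b = [0.7, 0.4, 0.9, 0.5, 0.3, 0.2]
print(f"{auc(labels, model_a):.3f}")  # 0.889
print(f"{auc(labels, model_b):.3f}")  # 0.889
print(auc(labels, ensemble_mean(model_a, model_b)))  # 1.0
```

In this toy case each model misranks a different positive cell, so averaging their scores separates the classes perfectly. That illustrates why an ensemble can outperform its members, but it is not evidence for the paper's reported AUC values.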
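The mean decrease accuracy idea behind the importance scores can be illustrated with a generic permutation sketch: shuffle one factor's values across records, re-score the model, and average the resulting drop in classification accuracy over several repeats. Random forest's built-in measure applies this on each tree's out-of-bag samples, so this model-agnostic version is a simplification, not the paper's exact MDAM. The classifier `toy_model`, its thresholds and the synthetic records are hypothetical, and the scores are not on the same scale as the IS values reported above.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of records whose predicted class matches the observed class."""
    return sum(1 for r, y in zip(rows, labels) if model(r) == y) / len(labels)


def mean_decrease_accuracy(model, rows, labels, feature_names, n_repeats=20, seed=0):
    """Average drop in accuracy when one factor's values are randomly permuted across records."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    scores = {}
    for name in feature_names:
        drops = []
        for _ in range(n_repeats):
            column = [r[name] for r in rows]
            rng.shuffle(column)  # break the link between this factor and the class
            permuted = [{**r, name: v} for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        scores[name] = sum(drops) / n_repeats
    return scores


def toy_model(r):
    """Hypothetical stand-in classifier: windy, sparsely vegetated cells are dust sources."""
    return int(r["wind_speed"] > 6.0 and r["ndvi"] < 0.2)


data_rng = random.Random(1)
rows = [
    {"wind_speed": data_rng.uniform(0, 12), "ndvi": data_rng.uniform(0, 0.6),
     "geology": data_rng.choice("ABC")}
    for _ in range(300)
]
labels = [toy_model(r) for r in rows]  # toy labels generated by the same rule
importance = mean_decrease_accuracy(toy_model, rows, labels, ["wind_speed", "ndvi", "geology"])
for name, score in importance.items():
    print(f"{name}: {score:.3f}")
```

Because `toy_model` ignores `geology`, permuting it leaves every prediction unchanged and its score is exactly zero, while the two factors the rule depends on show clear accuracy losses.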

Keywords: Dust provenance; Spatial modelling; Data mining algorithms; Multicollinearity; Receiver operator characteristic; Ensemble modelling; R software
Year of Publication: 2020
Journal: Atmospheric Research
Journal citation: 233 (104716)
Digital Object Identifier (DOI):
Open access: Published as green open access
Funder: Biotechnology and Biological Sciences Research Council
Funder project or code: S2N - Soil to Nutrition - Work package 3 (WP3) - Sustainable intensification - optimisation at multiple scales
Output status: Published
Publication dates
Online: 24 Nov 2019
Publication process dates
Accepted: 19 Oct 2019


Restricted files

Publisher's version: under embargo indefinitely

Accepted author manuscript: under embargo until 24 Nov 2021
