Building a Near-infrared (NIR) Soil Spectral Dataset and Predictive Machine Learning Models using a Handheld NIR Spectrophotometer

A - Papers appearing in refereed journals

Partida, C., Safanelli, J. L., Mitu, S. M., Murad, M. O. F., Ge, Y., Ferguson, R., Shepherd, K. and Sanderman, J. 2025. Building a Near-infrared (NIR) Soil Spectral Dataset and Predictive Machine Learning Models using a Handheld NIR Spectrophotometer. Data in Brief. 58 (111229). https://doi.org/10.1016/j.dib.2024.111229

Authors: Partida, C., Safanelli, J. L., Mitu, S. M., Murad, M. O. F., Ge, Y., Ferguson, R., Shepherd, K. and Sanderman, J.
Abstract

This near-infrared spectral dataset consists of 2,106 diverse mineral soil samples scanned, on average, on six different units of the same low-cost, commercially available handheld spectrophotometer. Most soil samples were selected from the USDA NRCS National Soil Survey Center-Kellogg Soil Survey Laboratory (NSSC-KSSL) soil archives to represent the diversity of mineral soils (0-30 cm) found in the United States, while 90 samples from Ghana, Kenya, and Nigeria were selected to represent the African soils available in the same archive. All scanning was performed on dried and sieved (<2 mm) soil samples. Machine learning models were developed in the R programming language to predict soil organic carbon (SOC), pH, bulk density (BD), carbonate (CaCO₃), exchangeable potassium (Ex. K), and sand, silt, and clay content from the spectra, using most of this dataset (1,976 US soils); these models are included in this data release. Two model types, Cubist and partial least squares regression (PLSR), were developed using two strategies: (1) using the average of the spectral scans across devices for each sample, and (2) using the replicate spectral scans across devices for each sample. We present the internal performance of these models here. The dry spectra and Cubist models for these soil properties are available for download from Zenodo (doi: 10.5281/zenodo.7586621). Detailed example code used to produce these models is hosted at the Open Soil Spectral Library, a free service of the Soil Spectroscopy for the Global Good Network (soilspectroscopy.org), enabling broad use of these data for multiple soil monitoring applications.
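The two modelling strategies differ only in how the multi-device scans are turned into training rows. The Python sketch below illustrates that data-preparation step alone. It is not the authors' R code (that is hosted at the Open Soil Spectral Library), and it fits no Cubist or PLSR model. It assumes that each scan is a hypothetical `(sample_id, device_id, spectrum, lab_value)` tuple, that every spectrum has the same number of bands, and that each sample has a single laboratory value. The function names `average_strategy`, `replicate_strategy`, and `assign_folds` are illustrative only.

```python
from collections import defaultdict
from statistics import fmean


def average_strategy(scans):
    """Strategy (1): one row per sample; spectrum = band-wise mean across devices."""
    spectra_by_sample = defaultdict(list)
    target_by_sample = {}
    for sample_id, _device_id, spectrum, lab_value in scans:
        spectra_by_sample[sample_id].append(spectrum)
        # Assumption: every scan of a sample carries the same lab value.
        target_by_sample[sample_id] = lab_value
    X, y, ids = [], [], []
    for sample_id in sorted(spectra_by_sample):
        replicates = spectra_by_sample[sample_id]
        # zip(*replicates) groups the same band from every replicate scan.
        X.append([fmean(band) for band in zip(*replicates)])
        y.append(target_by_sample[sample_id])
        ids.append(sample_id)
    return X, y, ids


def replicate_strategy(scans):
    """Strategy (2): one row per scan; the sample's lab value is repeated."""
    X = [list(spectrum) for _s, _d, spectrum, _y in scans]
    y = [lab_value for _s, _d, _spec, lab_value in scans]
    groups = [sample_id for sample_id, _d, _spec, _y in scans]
    return X, y, groups


def assign_folds(groups, k=5):
    """Give every row a fold index so that all scans of one sample share a fold."""
    fold_of = {sid: i % k for i, sid in enumerate(sorted(set(groups)))}
    return [fold_of[g] for g in groups]


# Toy example (made-up numbers): S1 scanned on two devices, S2 on one.
scans = [
    ("S1", "dev1", [0.10, 0.20, 0.30], 1.5),
    ("S1", "dev2", [0.12, 0.22, 0.28], 1.5),
    ("S2", "dev1", [0.40, 0.50, 0.60], 0.8),
]
X_avg, y_avg, ids = average_strategy(scans)
X_rep, y_rep, groups = replicate_strategy(scans)
folds = assign_folds(groups, k=2)
```

Strategy (2) yields several rows per sample that share one laboratory value. If internal performance is estimated by cross-validation (an assumption here), keeping every replicate of a sample in the same fold, as `assign_folds` does, avoids scoring a model on rescans of samples it was trained on.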

Keywords: Soil spectroscopy; Soil organic carbon; Pedometrics; Chemometrics; Soil analysis
Year of Publication: 2025
Journal: Data in Brief
Journal citation: 58 (111229)
Digital Object Identifier (DOI): https://doi.org/10.1016/j.dib.2024.111229
Web address (URL): https://doi.org/10.1016/j.dib.2024.111229
Open access: Published as ‘gold’ (paid) open access
Publisher's version
Output status: Published
Publication dates
Online: 16 Dec 2024
Publication process dates
Accepted: 09 Dec 2024
Publisher: Elsevier
ISSN: 2352-3409

Permalink - https://repository.rothamsted.ac.uk/item/992z5/building-a-near-infrared-nir-soil-spectral-dataset-and-predictive-machine-learning-models-using-a-handheld-nir-spectrophotometer
