DGCNN approach links metagenome-derived taxon and functional information providing insight into global soil organic carbon

Gardiner, L., Marshall, M., Reusch, K., Dearden, C., Birmingham, M., Carrieri, A. P., Pyzer-Knapp, E. O., Krishna, R. and Neal, Andy (2024) DGCNN approach links metagenome-derived taxon and functional information providing insight into global soil organic carbon. NPJ Biofilms and Microbiomes, 10. p. 113. 10.1038/s41522-024-00583-9

Metagenomics can provide insight into the microbial taxa present in a sample and, through gene identification, the functional potential of the community. However, taxonomic and functional information are typically considered separately in downstream analyses. We develop interpretable machine learning (ML) approaches for modelling metagenomic data, combining the biological representation of species with their associated genetically encoded functions within models. We apply our methods to investigate soil organic carbon (SOC) stocks. First, we combine a diverse global set of soil microbiome samples with environmental data, improving the predictive performance of classic ML and providing new insights into the role of soil microbiomes in global carbon cycling. Our network analysis of predictive taxa identified by classical ML models provides context for their ecological significance, extending the focus beyond just the most predictive taxa to ‘hidden’ features within the model that might be considered less predictive using standard methods for explainability. We next develop unique graph representations for individual microbiomes, linking microbial taxa to their associated functions directly, enabling predictions of SOC via deep graph convolutional neural networks (DGCNNs). Interpretation of the DGCNNs distinguished between the importance of functions of key individual species, providing genome sequence differences, e.g., gene loss/acquisition, that associate with SOC. These approaches identify several members of the Verrucomicrobiaceae family and a range of genetically encoded functions, e.g., related to carbohydrate metabolism, as important for SOC stocks and effective global SOC predictors. These relatively understudied but widespread organisms could play an important role in SOC dynamics globally.
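
The idea of linking taxa directly to their encoded functions in a per-sample graph can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the taxon and function labels, the abundance values, and the helpers `build_graph` and `propagate`. The mean-aggregation step is only a simplified stand-in for one graph convolution pass, not the authors' DGCNN.

```python
from collections import defaultdict

# Hypothetical annotations for one sample: taxon -> functions encoded in its genome.
# All names and values are illustrative placeholders, not data from the paper.
sample_annotations = {
    "taxon_A": ["func_1", "func_2"],
    "taxon_B": ["func_2", "func_3"],
}
# Hypothetical relative abundances used as the initial node feature.
taxon_abundance = {"taxon_A": 0.7, "taxon_B": 0.3}


def build_graph(annotations):
    """Return an undirected adjacency map linking each taxon to its functions."""
    adjacency = defaultdict(set)
    for taxon, functions in annotations.items():
        for function in functions:
            adjacency[taxon].add(function)
            adjacency[function].add(taxon)
    return dict(adjacency)


def propagate(adjacency, features):
    """One round of mean aggregation over each node and its neighbours.

    Nodes without an initial feature (here, the function nodes) start at 0.0.
    """
    updated = {}
    for node, neighbours in adjacency.items():
        values = [features.get(node, 0.0)]
        values += [features.get(n, 0.0) for n in neighbours]
        updated[node] = sum(values) / len(values)
    return updated


graph = build_graph(sample_annotations)
node_features = propagate(graph, taxon_abundance)
```

In this toy graph, a function shared by both taxa (`func_2`) receives signal from each of them, whereas a function carried by a single taxon reflects only that taxon. This is the kind of distinction a taxon-to-function graph makes available to a model that separate taxon and function tables do not.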


s41522-024-00583-9.pdf
Published Version
Available under Creative Commons: Attribution 4.0
