DGCNN approach links metagenome-derived taxon and functional information providing insight into global soil organic carbon

A - Papers appearing in refereed journals

Gardiner, L., Marshall, M., Reusch, K., Dearden, C., Birmingham, M., Carrieri, A. P., Pyzer-Knapp, E. O., Krishna, R. and Neal, A. L. 2024. DGCNN approach links metagenome-derived taxon and functional information providing insight into global soil organic carbon. NPJ Biofilms Microbiomes. 10, p. 113. https://doi.org/10.1038/s41522-024-00583-9

Authors: Gardiner, L., Marshall, M., Reusch, K., Dearden, C., Birmingham, M., Carrieri, A. P., Pyzer-Knapp, E. O., Krishna, R. and Neal, A. L.
Abstract

Metagenomics can provide insight into the microbial taxa present in a sample and, through gene identification, the functional potential of the community. However, taxonomic and functional information are typically considered separately in downstream analyses. We develop interpretable machine learning (ML) approaches for modelling metagenomic data, combining the biological representation of species with their associated genetically encoded functions within models. We apply our methods to investigate soil organic carbon (SOC) stocks. First, we combine a diverse global set of soil microbiome samples with environmental data, improving the predictive performance of classic ML and providing new insights into the role of soil microbiomes in global carbon cycling. Our network analysis of predictive taxa identified by classical ML models provides context for their ecological significance, extending the focus beyond just the most predictive taxa to ‘hidden’ features within the model that might be considered less predictive using standard methods for explainability. We next develop unique graph representations for individual microbiomes, linking microbial taxa to their associated functions directly, enabling predictions of SOC via deep graph convolutional neural networks (DGCNNs). Interpretation of the DGCNNs distinguished between the importance of functions of key individual species, providing genome sequence differences, e.g., gene loss/acquisition, that associate with SOC. These approaches identify several members of the Verrucomicrobiaceae family and a range of genetically encoded functions, e.g., related to carbohydrate metabolism, as important for SOC stocks and effective global SOC predictors. These relatively understudied but widespread organisms could play an important role in SOC dynamics globally.

Year of Publication: 2024
Journal: NPJ Biofilms Microbiomes
Journal citation: 10, p. 113
Digital Object Identifier (DOI): https://doi.org/10.1038/s41522-024-00583-9
Open access: Published as ‘gold’ (paid) open access
Funder: Biotechnology and Biological Sciences Research Council
Funder project or code: Growing Health [ISP]
Growing Health (WP2) - bio-inspired solutions for healthier agroecosystems: Understanding soil environments
Output status: Published
Publication dates
Online: 26 Oct 2024
Publication process dates
Accepted: 11 Oct 2024

Permalink - https://repository.rothamsted.ac.uk/item/9922q/dgcnn-approach-links-metagenome-derived-taxon-and-functional-information-providing-insight-into-global-soil-organic-carbon
