Identification of universal grass genes and estimates of their monocot-/ commelinid-/ grass-specificity

A - Papers appearing in refereed journals

Mitchell, R. A. C. 2024. Identification of universal grass genes and estimates of their monocot-/ commelinid-/ grass-specificity. Bioinformatics Advances. p. vbaf079. https://doi.org/10.1093/bioadv/vbaf079

Authors: Mitchell, R. A. C.
Abstract

The evolutionary success of grasses stems from their resilience and fast growth in open habitats, characteristics that also led to grasses underpinning agriculture, and is attributable to many grass-specific traits. Genes responsible for these traits are likely to be specific to grasses, highly conserved and present in all grasses (universal genes), as they perform functions essential for fitness. A bioinformatics pipeline was developed to identify such genes using 16 full grass genomes in Ensembl Plants release 56. The first steps used existing gene models to generate groups of grass orthologs of rice and maize genes that are present in most grass species, and then refined the membership of these groups so as to optimise the Hidden Markov Model (HMM) profile score from the HMMER package. These groups were then supplemented with new gene models found in grass genomes using the genBlastG tool; this step more than doubled the number of universal groups, giving 12,855 highly conserved, universal groups. The specificity of these groups was assessed using the closest-matching gene models from non-monocot species. Possible cut-off values were tested with sets of known genes expected to have either a function common to all plants or a commelinid- / grass-specific function. A specificity metric based on the HMM score from grass group profiles discriminated between these common-function and specific-function test sets better than % identity did. Using an appropriate cut-off for this metric, 5,701 of the groups were identified as monocot- / commelinid- / grass-specific, of which 72% appeared to be grass-specific. These results make up the universal_grass_peps database, available at https://doi.org/10.23637/rothamsted.98ywz. Researchers can search this database to determine whether their experimentally identified grass genes match universal groups and, for those that do, obtain systematic estimates of monocot- / commelinid- / grass-specificity.
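The abstract does not give the exact formula for the HMM-score-based specificity metric or the procedure used to pick its cut-off. The Python sketch below is a hypothetical illustration only: it assumes a metric defined as one minus the ratio of the best non-monocot match's HMM score to the mean HMM score of the grass group members against their own profile. The function names (`specificity_score`, `evaluate_cutoff`, `best_cutoff`) and the toy scores are invented for illustration and are not taken from the paper or from the universal_grass_peps database. The sketch shows how candidate cut-offs could be compared using a common-function test set and a specific-function test set.

```python
from statistics import mean

# Hypothetical sketch: the metric below is an assumed form, not the
# published definition. Each group is a pair
# (HMM scores of grass members against the group profile,
#  HMM score of the best-matching non-monocot gene model).

def specificity_score(grass_scores, best_outgroup_score):
    """Return 0.0 when the outgroup match scores as well as a typical grass
    member, rising to 1.0 when the outgroup match scores zero or below."""
    ref = mean(grass_scores)
    if ref <= 0:
        raise ValueError("grass member scores must have a positive mean")
    # Clamp so the score stays in [0, 1] for very strong or negative outgroup hits
    clamped = min(max(best_outgroup_score, 0), ref)
    return 1.0 - clamped / ref


def evaluate_cutoff(common_set, specific_set, cutoff):
    """Return (fraction of common-function groups wrongly called specific,
    fraction of specific-function groups correctly called specific)."""
    fp = sum(specificity_score(g, o) >= cutoff for g, o in common_set) / len(common_set)
    tp = sum(specificity_score(g, o) >= cutoff for g, o in specific_set) / len(specific_set)
    return fp, tp


def best_cutoff(common_set, specific_set, candidates):
    """Pick the candidate cut-off that maximises (true-positive rate - false-positive rate)."""
    def margin(cutoff):
        fp, tp = evaluate_cutoff(common_set, specific_set, cutoff)
        return tp - fp
    return max(candidates, key=margin)


# Toy test sets (invented numbers, for illustration only)
common = [([500, 520, 480], 450), ([300, 310, 290], 250)]
specific = [([400, 420, 380], 60), ([350, 360, 340], 0)]

print(evaluate_cutoff(common, specific, 0.5))     # (0.0, 1.0)
print(best_cutoff(common, specific, [0.5, 0.9]))  # 0.5
```

Maximising the difference between the true-positive and false-positive rates is one simple way to rank cut-offs and is an assumption of this sketch. Substituting % identity for the HMM score in the same harness would allow the kind of metric comparison the abstract describes.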

Keywords: Monocot; Grass evolution; Gene model; Functional orthologs; Genomics
Year of Publication: 2024
Journal: Bioinformatics Advances
Journal citation: p. vbaf079
Digital Object Identifier (DOI): https://doi.org/10.1093/bioadv/vbaf079
Open access: Published as ‘gold’ (paid) open access
Funder: Biotechnology and Biological Sciences Research Council
Funder project or code:
Xylan arabinosyl transferases: identification and characterisation of their role in determining properties of grass cell walls
Designing Future Wheat (DFW) [ISPG]
Output status: Published
Publication dates
Online: 07 Apr 2025
Publication process dates
Accepted: 04 Apr 2025
ISSN: 1367-4803
Publisher: Oxford University Press (OUP)

Permalink - https://repository.rothamsted.ac.uk/item/99007/identification-of-universal-grass-genes-and-estimates-of-their-monocot-commelinid-grass-specificity
