Empowering biological knowledgebases: advances in human-in-the-loop AI-driven literature curation

Wood, Valerie, Jeffryes, Matt, Green, Andrew F., Blum, Matthias, Orchard, Sandra, Panni, Simona, Quaglia, Federica, Rodriguez-Esteban, Raul, Seager, James, Tosatto, Silvio C. E., +2 more... Wittig, Ulrike and Harrison, Melissa (2026) Empowering biological knowledgebases: advances in human-in-the-loop AI-driven literature curation. Bioinformatics Advances, 6 (1): vbag028. 10.1093/bioadv/vbag028

Biological knowledgebases facilitate discovery across the life sciences by structuring experimental findings into human-readable and computable formats. These essential resources are maintained by a small number of professional biocurators worldwide and face the combined pressures of chronic underfunding and exponential growth of the literature. In this perspective, we review how artificial intelligence, particularly large language models and agentic systems, can augment literature-curation workflows. Applications include literature recommendation, entity recognition, data extraction, summarization, ontology development, and quality control, with an emphasis on published use cases at Global Core BioData Resources and ELIXIR Core Data Resources. We identify key challenges, including the scarcity of training data, the difficulty of extracting complex relationships, and concerns about error propagation. To address these challenges, we propose a human-in-the-loop framework in which generative artificial intelligence approaches accelerate routine tasks while curators provide critical evaluation and domain expertise. We also offer practical recommendations for the community, including the creation of shared benchmark datasets, harmonized evaluation frameworks, and best-practice guidelines for transparent human-in-the-loop AI deployment in biocuration. These synergistic partnerships will be critical to ensuring biological rigour, accelerating knowledge integration while maintaining the quality that trusted biological resources require.


Published Version. Available under Creative Commons: Attribution 4.0.