Poster
Jana Sperschneider
CSIRO
Canberra, Australian Capital Territory, Australia
Taj Arndell
Queensland University of Technology
Brisbane, Queensland, Australia
Camilla Langlands-Perry
CSIRO
Canberra, Australian Capital Territory, Australia
Jian Chen
CSIRO
Canberra, Australian Capital Territory, Australia
Eva Henningsen
CSIRO
Canberra, Australian Capital Territory, Australia
David Lewis
CSIRO
Canberra, Australian Capital Territory, Australia
Cheryl Blundell
CSIRO
Canberra, Australian Capital Territory, Australia
Thomas Vanhercke
CSIRO
Canberra, Australian Capital Territory, Australia
Melania Figueroa
CSIRO
Canberra, Australian Capital Territory, Australia
Peter Dodds, PhD
CSIRO Agriculture and Food
Canberra, Australian Capital Territory, Australia
Gene annotation is crucial for accurate inference of biological knowledge from genomes, but it relies on decades-old methods that are biased towards model species and conserved genes. Non-canonical genes in particular, such as those lacking homologs, those residing in rapidly evolving genomic regions, or single-exon genes, are still routinely dismissed in annotation pipelines as transcriptional noise. In fungal pathogen genomes, this disproportionately affects the accuracy of effector gene annotation. We introduce EffectorGeneP, a data-driven machine learning approach that does not rely on homology evidence for gene annotation. It correctly annotates over 95% of known effectors, whereas other state-of-the-art methods annotate only 15% to 75%, depending on the species. In line with this, EffectorGeneP annotates ~40% more genes encoding secreted proteins in pathogen genomes, all of which are supported by transcriptional evidence. We show that correcting gene models with EffectorGeneP in flax rust identifies AvrM3 and AvrN, which encode unusually large effector proteins. We also demonstrate the utility of EffectorGeneP as a tool for effector library design and show that pooled effector library screening in plant protoplasts uncovers the previously poorly annotated AvrSr26 gene in wheat stem rust. EffectorGeneP addresses the current bottleneck of accurate gene annotation and will be invaluable for high-priority pathogens in which not a single effector has been found.