Advancing structural phylogenetics approaches

 

Understanding the evolutionary relationships between species, genes, or proteins is a cornerstone of evolutionary biology. Traditionally, these relationships, represented as phylogenetic trees, have been inferred from DNA or protein sequences. However, protein structures, which are generally more conserved than sequences, may be particularly useful for phylogenetic reconstruction when sequence similarity has decayed too far for reliable alignment [1].

Recent advances in AI-based protein structure prediction have made structural data readily available. Motivated by this, several structural phylogenetics methods, such as [1] and [2], have been proposed that incorporate AI-predicted protein structures into phylogenetic tree reconstruction, complementing traditional sequence-based approaches. Most of these methods build on the structural alphabet introduced in Foldseek [3], which encodes each residue's structural context as a letter, so that a structure can be treated like a DNA or protein sequence and analyzed with traditional sequence-based phylogenetic methods. While effective, these methods rely heavily on specific assumptions and may not fully capture the stochastic nature of structural evolution.
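To illustrate why a discrete structural alphabet makes sequence tools usable, here is a minimal sketch of one classic distance-based step: once structures are written as aligned strings, pairwise distances can be computed exactly as for protein sequences. It assumes an equal-rates model with K states (K = 20 matches the size of Foldseek's 3Di alphabet) and uses the K-state Jukes-Cantor correction. This is not the method of [1] or [2]; the toy strings, the `-` gap symbol and all function names are hypothetical.

```python
import math

def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned strings, skipping gapped sites."""
    if len(a) != len(b):
        raise ValueError("strings must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def jc_distance(a: str, b: str, k: int = 20) -> float:
    """K-state Jukes-Cantor corrected distance: expected substitutions per site
    under an equal-rates model with k character states (an assumption)."""
    p = p_distance(a, b)
    if p == 0.0:
        return 0.0
    ratio = k / (k - 1)
    if ratio * p >= 1.0:
        return math.inf  # saturated: the correction is undefined
    return -(k - 1) / k * math.log(1.0 - ratio * p)

def distance_matrix(seqs: dict[str, str], k: int = 20) -> dict[tuple[str, str], float]:
    """Pairwise corrected distances for all unordered pairs of taxa."""
    names = sorted(seqs)
    return {
        (u, v): jc_distance(seqs[u], seqs[v], k)
        for i, u in enumerate(names)
        for v in names[i + 1:]
    }

# Hypothetical aligned structural-alphabet strings for three proteins.
aligned = {
    "prot1": "DVVLQPAC-D",
    "prot2": "DVVLQPLCSD",
    "prot3": "DPVAQSLCSV",
}
for pair, d in distance_matrix(aligned).items():
    print(pair, round(d, 3))
```

The resulting matrix could be passed to any distance-based tree builder such as neighbour joining; model-based methods replace this single correction with a full substitution model over the alphabet.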

I will empirically assess one of these methods [1] using simulations and multiple real datasets, including one from [4]. I will then explore potential areas for improvement, focusing in particular on developing a comprehensive stochastic model that captures the complexities of structural evolution. Such a model could substantially advance our understanding of how structural features evolve and diversify.
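One common way to produce simulated data for such an assessment is to evolve discrete characters down a known tree and then check whether a method recovers that tree. The sketch below assumes the simplest possible model, an equal-rates continuous-time Markov chain over K states (a K-state generalization of Jukes-Cantor) with independent sites and branch lengths in expected substitutions per site. It is a placeholder for the richer stochastic model this project aims to develop, not a model of real structural evolution; the tuple-based tree encoding and all function names are hypothetical.

```python
import math
import random

# Hypothetical tree encoding: (name, branch_length, children); leaves have no children.
Node = tuple[str, float, list]

def evolve(seq: list[int], t: float, k: int, rng: random.Random) -> list[int]:
    """Evolve each site independently along a branch of length t
    under an equal-rates K-state CTMC (assumed model)."""
    # Probability that a site ends in a different state after time t.
    p_change = (k - 1) / k * (1.0 - math.exp(-k * t / (k - 1)))
    out = []
    for s in seq:
        if rng.random() < p_change:
            # Jump uniformly to one of the k - 1 other states.
            new = rng.randrange(k - 1)
            out.append(new if new < s else new + 1)
        else:
            out.append(s)
    return out

def simulate(node: Node, parent_seq: list[int], k: int,
             rng: random.Random, leaves: dict[str, list[int]]) -> dict[str, list[int]]:
    """Recursively evolve sequences down the tree, collecting leaf sequences."""
    name, t, children = node
    seq = evolve(parent_seq, t, k, rng)
    if not children:
        leaves[name] = seq
    for child in children:
        simulate(child, seq, k, rng, leaves)
    return leaves

def simulate_alignment(tree: Node, n_sites: int, k: int = 20, seed: int = 0) -> dict[str, list[int]]:
    """Draw the root from the uniform (stationary) distribution and evolve it down the tree."""
    rng = random.Random(seed)
    root = [rng.randrange(k) for _ in range(n_sites)]
    return simulate(tree, root, k, rng, {})

# Usage: a hypothetical three-taxon tree ((B, C), A).
tree = ("root", 0.0, [("A", 0.3, []), ("BC", 0.2, [("B", 0.1, []), ("C", 0.1, [])])])
aln = simulate_alignment(tree, n_sites=200)
```

Because the true tree is known, the simulated leaves can be mapped to letters, passed to the method under assessment, and the inferred tree compared with the input (for example by Robinson-Foulds distance). Replacing `evolve` with a model that allows unequal rates, among-site rate variation or correlated sites is where a richer structural model would enter.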

Aims and Timeline

  • Literature review – 1.5 weeks
  • Assess the performance of the method from [1] using simulations and multiple real datasets, including one from [4], and identify potential areas for improvement – 2.5 weeks
  • Implement, apply and assess an improved approach – 2 weeks

References

  1. Puente-Lelievre, Caroline, et al. “Tertiary-interaction characters enable fast, model-based structural phylogenetics beyond the twilight zone.” bioRxiv (2023): 2023-12.
  2. Moi, David, et al. “Structural phylogenetics unravels the evolutionary diversification of communication systems in gram-positive bacteria and their viruses.” bioRxiv (2023): 2023-09.
  3. Van Kempen, Michel, et al. “Fast and accurate protein structure search with Foldseek.” Nature Biotechnology 42.2 (2024): 243-246.
  4. Mifsud, Jonathon CO, et al. “Mapping glycoprotein structure reveals Flaviviridae evolutionary history.” Nature 633.8030 (2024): 695-703.

 

LI FU ZHANG

The University of Melbourne

Li Fu Zhang is currently studying for a Bachelor of Science, majoring in Statistics, at the University of Melbourne, Australia. With a weighted average mark of 95, he has demonstrated exceptional academic ability, earning a place on the Dean’s Honors List in both his first and second years and ranking within the top 3% of his cohort. His coursework spans essential topics such as probability theory, statistical inference, stochastic processes, and data science.

Li Fu’s technical competencies include proficiency in Python, C, R, MATLAB, Microsoft Word, and Excel. He has gained industry experience as a software engineering intern at Hisense Communications from December 2023 to January 2024 and as a machine learning engineer at an institute of the Chinese Academy of Sciences from July to August 2024. Beyond academics, he actively contributes to the community as a participant in the VCE summer school program.
