Understanding the evolutionary relationships between species, genes, or proteins is a cornerstone of evolutionary biology. Traditionally, these relationships, represented by phylogenetic trees, have been inferred from DNA or protein sequences. However, protein structures, which are more conserved than sequences, may be particularly useful for phylogenetic reconstruction in cases where sequence similarity has been substantially reduced [1].
Recent advances in AI-based protein structure prediction have made structural data readily available. Motivated by this, multiple structural phylogenetics methods, such as [1] and [2], have been proposed to incorporate AI-predicted protein structures into phylogenetic tree reconstruction, complementing traditional sequence-based approaches. Most of these methods are based on the structural alphabet proposed in Foldseek [3], which treats structural features as letters, analogous to those in DNA or protein sequences, allowing the use of traditional sequence-based phylogenetic methods. While effective, these methods rely heavily on specific assumptions and may not fully capture the stochastic nature of structural evolution.
I will empirically assess one of these methods [1] using simulations and multiple real datasets, including one from [4]. I will then explore potential areas for improvement, focusing in particular on developing a comprehensive stochastic model that captures the complexities of structural evolution. Such a model could significantly advance our understanding of how structural features evolve and diversify.
Aims and Timeline
References
The University of Melbourne
Li Fu Zhang is currently studying for a Bachelor of Science, majoring in Statistics, at the
University of Melbourne, Australia. With a weighted average mark of 95, he has
demonstrated exceptional academic ability, earning a place on the Dean’s Honors List in
both his first and second years and ranking within the top 3% of his cohort. His coursework
spans essential topics such as probability theory, statistical inference, stochastic processes,
and data science.
Li Fu’s technical competencies include proficiency in Python, C, R, MATLAB, Microsoft
Word, and Excel. He gained industry experience during an internship as a software engineer
at Hisense Communications from December 2023 to January 2024 and as a machine
learning engineer at an institute of the Chinese Academy of Sciences from July 2024 to August
2024. Beyond academics, he actively contributes to the community by participating in the
VCE summer school program.