This project explores the topic capacity of document embeddings, the numerical representations of text produced by large language models. While embeddings are widely used for tasks such as information retrieval and clustering, it remains uncertain how accurately a single embedding can represent a document that spans multiple topics.
The research focuses on three components:
– Literature Review: establishing a theoretical foundation on how embeddings are constructed across models.
– Experimental Evaluation: systematically testing how retrieval accuracy changes as the number of topics in a document increases.
– Alternative Strategies: investigating whether multiple topic-focused embeddings improve retrieval performance.
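The contrast between the second and third components can be illustrated with a small sketch. The code below is a toy stand-in, not the project's actual method: `embed` is a hypothetical bag-of-words embedding used in place of a real LLM embedding model, and the scoring functions simply compare one whole-document embedding against the best match among several topic-focused embeddings.

```python
import numpy as np

VOCAB = {}  # word -> index, grown on first sight (toy stand-in for a real model)

def embed(text, dim=64):
    """Toy embedding: L2-normalized bag-of-words counts over a shared vocab.
    A real experiment would call an LLM embedding model here instead."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        idx = VOCAB.setdefault(word, len(VOCAB)) % dim
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def max_topic_similarity(query, topic_embeddings):
    # Score a document by its best-matching topic-focused embedding,
    # rather than by a single embedding of the whole document.
    q = embed(query)
    return max(float(q @ t) for t in topic_embeddings)

# A document covering two unrelated topics, embedded whole vs. per topic.
doc_topics = ["football tactics and formations",
              "melody rhythm and music theory"]
whole_embedding = embed(" ".join(doc_topics))
per_topic_embeddings = [embed(t) for t in doc_topics]

query = "music theory rhythm"
single_score = float(embed(query) @ whole_embedding)
multi_score = max_topic_similarity(query, per_topic_embeddings)
```

In this toy setup the whole-document embedding dilutes the music topic with football vocabulary, so the topic-focused score is higher; whether the same effect holds for real LLM embeddings as topic count grows is exactly what the experimental evaluation would test.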
By examining the relationship between topic count and embedding quality, the project will generate new insights into the strengths and limitations of embeddings. These findings will have practical value for retrieval and clustering applications, while preparing the student for advanced research at Honours or Master's level.
Western Sydney University
Tristan Trieu is a second-year Bachelor of Data Science student at Western Sydney University, driven by curiosity and a deep fascination with uncovering hidden patterns through mathematics and data science. Although new to the field, he finds genuine joy and fulfillment in the process of learning — especially in those moments of realization when a complex idea finally clicks or a new perspective emerges. For Tristan, these moments give meaning to his studies and inspire him to keep exploring how mathematics and the right models can be used to reveal insights, guide decisions, and make a real impact.
He aims to continue developing his skills and knowledge, with ambitions to pursue postgraduate studies — a master’s or even a PhD — in machine learning and data-driven research. Outside of his academic pursuits, he enjoys playing football and listening to music, finding in them the same rhythm, flow, and sense of discovery that fuel his passion for data.