Investigating the Topic Capacity of Document Embeddings

This project explores the topic capacity of document embeddings: the numerical representations of text produced by large language models. Embeddings are widely used for tasks such as information retrieval and clustering, but it remains unclear how accurately a single embedding can represent a document that covers several topics.

The research focuses on three components:
– Literature Review: establishing a theoretical foundation on how embeddings are constructed across models.
– Experimental Evaluation: systematically testing how retrieval accuracy changes as the number of topics in a document increases.
– Alternative Strategies: investigating whether multiple topic-focused embeddings improve retrieval performance.
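
To make the experimental idea concrete, the toy simulation below sketches how retrieval accuracy might be measured as the number of topics per document grows, and how one pooled embedding compares with one embedding per topic. It is only an illustration under strong assumptions and is not the project's actual method: topics are random unit vectors rather than outputs of a real language model, a pooled document embedding is taken to be the normalised mean of its topic vectors, and a query is a noisy copy of one topic. All names (retrieval_accuracy, random_topic, the noise parameter) are hypothetical.

```python
import math
import random


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def random_topic(rng, dim):
    """A synthetic topic direction: a random unit vector (assumption)."""
    return unit([rng.gauss(0.0, 1.0) for _ in range(dim)])


def retrieval_accuracy(k, multi_vector, n_docs=40, dim=32,
                       n_queries=100, noise=0.15, seed=0):
    """Top-1 retrieval accuracy when every document covers k synthetic topics."""
    rng = random.Random(seed)
    docs = [[random_topic(rng, dim) for _ in range(k)] for _ in range(n_docs)]

    if multi_vector:
        # Alternative strategy: keep one embedding per topic.
        index = docs
    else:
        # Baseline: one pooled embedding per document
        # (normalised mean of its topic vectors, an assumption).
        index = [[unit([sum(col) / k for col in zip(*topics)])]
                 for topics in docs]

    hits = 0
    for _ in range(n_queries):
        target = rng.randrange(n_docs)
        topic = rng.choice(docs[target])
        # The query is a noisy copy of one topic from the target document.
        query = unit([x + rng.gauss(0.0, noise) for x in topic])
        # Score each document by its best-matching stored embedding.
        scores = [max(dot(query, e) for e in embs) for embs in index]
        if scores.index(max(scores)) == target:
            hits += 1
    return hits / n_queries


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        single = retrieval_accuracy(k, multi_vector=False)
        multi = retrieval_accuracy(k, multi_vector=True)
        print(f"topics={k}  single={single:.2f}  multi={multi:.2f}")
```

In this synthetic setting the pooled embedding's similarity to any one of its topics shrinks roughly as one over the square root of the topic count, so top-1 accuracy should fall as topics are added, while the per-topic index degrades far more slowly. Whether real model embeddings behave the same way is exactly the question the project sets out to test.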

By examining the relationship between topic count and embedding quality, the project aims to clarify where document embeddings perform well and where they break down. The findings should have practical value for retrieval and clustering applications, and the work will prepare the student for advanced research at Honours or Masters level.

Tristan Trieu

Western Sydney University

Tristan Trieu is a second-year Bachelor of Data Science student at Western Sydney University, driven by curiosity and a deep fascination with uncovering hidden patterns through mathematics and data science. Although new to the field, he finds genuine joy and fulfillment in the process of learning — especially in those moments of realization when a complex idea finally clicks or a new perspective emerges. For Tristan, these moments give meaning to his studies and inspire him to keep exploring how mathematics and the right models can be used to reveal insights, guide decisions, and make a real impact.

He aims to continue developing his skills and knowledge, with ambitions to pursue postgraduate studies — a master’s or even a PhD — in machine learning and data-driven research. Outside of his academic pursuits, he enjoys playing football and listening to music, finding in them the same rhythm, flow, and sense of discovery that fuel his passion for data.
