Investigating the Topic Capacity of Document Embeddings

This project explores the topic capacity of document embeddings: the numerical vector representations that large language models produce from text. While embeddings are widely used for tasks such as information retrieval and clustering, it remains uncertain how accurately a single embedding can represent a document that covers multiple topics.

The research focuses on three components:
– Literature Review: establishing a theoretical foundation for how embeddings are constructed across different models.
– Experimental Evaluation: systematically testing how retrieval accuracy changes as the number of topics in a document increases.
– Alternative Strategies: investigating whether multiple topic-focused embeddings improve retrieval performance.
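The experimental evaluation and the alternative strategy can be illustrated with a small synthetic simulation. This is a minimal sketch under loudly stated assumptions, not the project's method. Topics are random unit vectors. A document's single embedding is the normalized mean of its topic vectors, which is a stand-in and not how a language model actually builds embeddings. Each query is a noisy copy of one topic. The names `run_trial`, `noise` and the other parameters are hypothetical. Accuracy is the fraction of queries whose top-ranked document actually contains the queried topic. The sketch compares one embedding per document against a set of topic-focused embeddings, where each document is scored by its best-matching topic vector.

```python
import math
import random


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    # Inputs are assumed to be unit-length, so the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def random_unit(rng, dim):
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def run_trial(k, n_docs=50, n_topics=200, dim=64, noise=0.5, n_queries=100, seed=0):
    """Hypothetical experiment: return (single_vector_acc, multi_vector_acc) for k-topic documents."""
    rng = random.Random(seed)
    topics = [random_unit(rng, dim) for _ in range(n_topics)]
    # Each synthetic document covers k distinct topics (assumption, not real data).
    docs = [rng.sample(range(n_topics), k) for _ in range(n_docs)]
    # Single embedding: normalized mean of the document's topic vectors (a stand-in for a real model).
    single = [normalize([sum(col) for col in zip(*(topics[t] for t in d))]) for d in docs]
    # Multi-vector alternative: keep one topic-focused vector per topic.
    multi = [[topics[t] for t in d] for d in docs]

    hits_single = hits_multi = 0
    for _ in range(n_queries):
        target = rng.randrange(n_docs)
        topic = rng.choice(docs[target])
        # Query = the chosen topic plus Gaussian noise of total norm roughly `noise`.
        query = normalize([x + noise / math.sqrt(dim) * rng.gauss(0.0, 1.0) for x in topics[topic]])
        best_single = max(range(n_docs), key=lambda i: cosine(query, single[i]))
        best_multi = max(range(n_docs), key=lambda i: max(cosine(query, v) for v in multi[i]))
        # A hit means the top-ranked document really contains the queried topic.
        hits_single += int(topic in docs[best_single])
        hits_multi += int(topic in docs[best_multi])
    return hits_single / n_queries, hits_multi / n_queries


for k in (1, 2, 4, 8, 16):
    s, m = run_trial(k)
    print(f"topics={k:2d}  single={s:.2f}  multi={m:.2f}")
```

Under these assumptions, single-vector accuracy should fall as `k` grows. Each topic contributes only about 1/√k of the averaged vector, so its similarity to a matching query shrinks toward the level of unrelated documents. The max-over-topics score does not dilute this way. Whether real embedding models degrade in the same fashion is exactly what the project sets out to test.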

By examining the relationship between topic count and embedding quality, the project will generate new insights into the limitations and strengths of embeddings. These findings will have practical value for retrieval and clustering applications, while preparing the student for advanced research at Honours or Masters level.

Tristan Trieu

Western Sydney University

Tristan Trieu is a second-year Bachelor of Data Science student at Western Sydney University, driven by curiosity and a deep fascination with uncovering hidden patterns through mathematics and data science. Although new to the field, he finds genuine joy and fulfillment in the process of learning — especially in those moments of realization when a complex idea finally clicks or a new perspective emerges. For Tristan, these moments give meaning to his studies and inspire him to keep exploring how mathematics and the right models can be used to reveal insights, guide decisions, and make a real impact.

He aims to continue developing his skills and knowledge, with ambitions to pursue postgraduate studies — a master’s or even a PhD — in machine learning and data-driven research. Outside of his academic pursuits, he enjoys playing football and listening to music, finding in them the same rhythm, flow, and sense of discovery that fuel his passion for data.

You may be interested in

– Pu Ti Dai: Properties of Brownian Motion
– Rong Xu: Optimal control of air-conditioning with thermal storage
– Yixiang Wang: Supersymmetric quantum mechanics and the Witten index
– Hanyi Wang: Infilling Missing Data in Time Series