Abstract:
This article reflects on my changing perception of research through the lens of a project investigating the topic capacity of document embeddings. What began as a stereotypical view of research as rigid and boring gradually transformed into an appreciation for curiosity and the pursuit of knowledge. Technically, the project examines how increasing topic complexity affects retrieval performance in single-vector document embeddings. Using established information retrieval metrics, the findings reveal a measurable degradation as topics accumulate, highlighting inherent representational limits. Ultimately, this journey demonstrates that research is not about knowing everything, but about learning to ask better questions.
Blog:
Research was never what I imagined myself pursuing as a career. I used to associate it with a familiar stereotype: researchers as bookworms, spending endless hours solving equations, writing journal articles, and staying confined to a single room. Brilliant, perhaps — but distant from the world. When I was younger, the only version of research that seemed worthwhile was the extreme one: producing groundbreaking innovations that would change the world. Anything less felt insignificant.
That perception began to shift during my first few weeks working under the supervision of Dr Laurence Park. I realised that research is not about forcing brilliance at a desk. Insight does not arrive on command. Sometimes it emerges during a short walk, or while half-asleep on a train. Research is less about constant pressure and more about sustained curiosity.
The fascination did not come from writing sections of my report or reading countless papers. It came from small moments: finally understanding the definition of Average Precision, correctly interpreting a performance curve, or debugging a cosine similarity function. Bit by bit, those small victories accumulated. Confidence grew quietly. So did my appreciation for research.
My project investigates a fundamental question in modern information retrieval:
How many topics can a single document embedding meaningfully represent?
Early web search systems relied heavily on keyword matching. If someone searched “jaguar”, the engine might retrieve pages about the animal while missing results about the car. This lexical approach struggled with context and ambiguity. Modern systems now use document embeddings — dense vector representations generated by large language models. Instead of matching words, they compare meaning in high-dimensional space using cosine similarity. Documents that are semantically similar are positioned closer together, allowing retrieval based on conceptual understanding rather than surface-level overlap.
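The comparison described above can be sketched in a few lines of Python. The vectors here are toy illustrations for the "jaguar" example, not real model outputs; actual embeddings have hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional "embeddings" of a query and two documents.
query_jaguar_animal = np.array([0.9, 0.1, 0.2, 0.0])
doc_big_cats        = np.array([0.8, 0.2, 0.1, 0.1])
doc_sports_cars     = np.array([0.1, 0.9, 0.0, 0.3])

# The animal query scores higher against the big-cats document than
# against the sports-cars one, even with no shared keywords.
print(cosine_similarity(query_jaguar_animal, doc_big_cats))
print(cosine_similarity(query_jaguar_animal, doc_sports_cars))
```

A retrieval system would rank documents by this score, returning the conceptually closest ones first regardless of exact word overlap.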
Tristan Trieu
Western Sydney University