ExamVault: Overcoming Challenges with AI and Embeddings

Daniel Arbabian

Building ExamVault, a platform designed to transform exam preparation, was a deeply technical journey. Embeddings are usually associated with similarity search, but I wanted to push them further: my aim was to use them to build a time-aware cognitive model that adapts to each student's learning progress. This post explores the challenges I ran into and the solutions I implemented while building ExamVault.

The Power of Neural Networks in Education

The backbone of ExamVault is its ability to match user-uploaded exam questions with similar ones in a database. That matching relies on neural networks, which are the core of most AI systems.

A neural network learns by passing inputs, such as text, through layers of interconnected nodes. In ExamVault, this process begins by breaking questions down into smaller units (tokens) and converting them into embeddings, numerical representations of their meaning. The layers then process these embeddings to find connections between questions and surface relevant ones. For example, if a user uploads a question about Newton’s laws, the system identifies related concepts and recommends questions on forces, motion, or energy. This capability comes from trained networks that capture both semantic and contextual relationships.
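To make the token-to-embedding step concrete, here is a minimal sketch. It is not ExamVault's actual code: the vocabulary, the 8-dimensional size, the random embedding table (standing in for learned weights), and the `embed_question` helper are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary; a real system would learn a much larger one.
VOCAB = ["newton", "laws", "of", "motion", "force", "energy", "[UNK]"]
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}

# Random values stand in for trained embedding weights (assumption).
rng = np.random.default_rng(seed=0)
EMBEDDING_TABLE = rng.normal(size=(len(VOCAB), 8))


def embed_question(text: str) -> np.ndarray:
    """Split on whitespace, look up each token's vector, and mean-pool them."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("cannot embed an empty question")
    ids = [TOKEN_TO_ID.get(token, TOKEN_TO_ID["[UNK]"]) for token in tokens]
    return EMBEDDING_TABLE[ids].mean(axis=0)


vector = embed_question("Newton laws of motion")
print(vector.shape)  # (8,)
```

Mean pooling is the simplest way to turn per-token vectors into one question vector; a trained network would replace it with learned layers.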

Tokenization: Breaking Text into Learnable Units

Before the neural network can process any input, the text must be tokenized: broken down into smaller, model-friendly units.

For ExamVault, tokenization was crucial because exam questions often contain symbols, formulae, and unusual phrases. With subword tokenization, a word like "thermodynamics" can be split into pieces such as ["thermo", "dynamics"], so much of its meaning is preserved even if the model hasn’t encountered the full term. This allowed ExamVault to handle unseen questions gracefully.
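The splitting idea can be sketched with a greedy longest-match tokenizer, similar in spirit to WordPiece but simplified (no continuation markers, no learned vocabulary). The `subword_tokenize` function and the tiny vocabulary are hypothetical, and this is not necessarily the tokenizer ExamVault uses.

```python
def subword_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate substring until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return [unk]  # no piece matches here, so the whole word is unknown
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical subword vocabulary for illustration.
SUBWORDS = {"thermo", "dynamics", "electro", "magnet", "ism"}

print(subword_tokenize("thermodynamics", SUBWORDS))    # ['thermo', 'dynamics']
print(subword_tokenize("electromagnetism", SUBWORDS))  # ['electro', 'magnet', 'ism']
```

Because "electromagnetism" never appears in the vocabulary as a whole word, the example shows how an unseen term can still be represented from known pieces.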

Self-Attention: Understanding Context in Questions

One of the most complex challenges was enabling the system to understand the relationships between different parts of a question. A question might refer to variables, equations, or diagrams introduced earlier in the text. Self-attention mechanisms were critical here.

The self-attention mechanism in ExamVault’s architecture lets the model focus on specific parts of a question when determining its relevance. For example, in the question "If a car accelerates at 2 m/s², what is its velocity after 5 seconds?", the model can weight the relationship between "accelerates," "2 m/s²," and "velocity" more heavily than less informative words like "car."
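For readers who want to see the mechanics, here is a minimal single-head scaled dot-product self-attention sketch in NumPy. The projection matrices and input vectors are random (an assumption), so the resulting weights only illustrate the shape of the computation, not what a trained model would actually attend to.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every pair of positions
    weights = softmax(scores, axis=-1)       # each row is a distribution over positions
    return weights @ v, weights


# Phrase-level "tokens" from the example question (illustrative segmentation).
tokens = ["a car", "accelerates", "at 2 m/s²", "for 5 seconds", "velocity?"]

rng = np.random.default_rng(seed=0)
d_model = 8  # hypothetical embedding size
x = rng.normal(size=(len(tokens), d_model))  # random stand-ins for token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(weights.shape)  # (5, 5): one row of attention weights per token
```

Row `i` of `weights` says how much token `i` draws on every other token; training is what pushes the "velocity?" row toward "accelerates" and "at 2 m/s²".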

Embeddings: Mapping Knowledge in High-Dimensional Space

ExamVault doesn’t stop at matching questions. It builds a multidimensional map of knowledge using embeddings that capture both semantic similarity and contextual metadata such as difficulty and student performance history.

For example, embeddings ensure that questions about "kinetic energy" are grouped closely with those about "work-energy principles" rather than unrelated topics like "atomic structure." This clustering is key to tailoring recommendations to a student’s needs.
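A small sketch of how "grouped closely" can be measured: cosine similarity between embedding vectors. The three-dimensional vectors below are hand-made for illustration (real embeddings have hundreds of dimensions and are learned), and `rank_neighbours` is a hypothetical helper.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hand-made toy vectors (assumption): the first two topics point the same way.
TOPIC_VECTORS = {
    "kinetic energy": np.array([0.9, 0.8, 0.1]),
    "work-energy principle": np.array([0.8, 0.9, 0.2]),
    "atomic structure": np.array([0.1, 0.2, 0.9]),
}


def rank_neighbours(query_topic: str) -> list[tuple[str, float]]:
    """Return the other topics sorted from most to least similar."""
    query = TOPIC_VECTORS[query_topic]
    scored = [
        (topic, cosine_similarity(query, vector))
        for topic, vector in TOPIC_VECTORS.items()
        if topic != query_topic
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for topic, score in rank_neighbours("kinetic energy"):
    print(f"{topic}: {score:.2f}")
```

With these toy vectors, "work-energy principle" ranks well above "atomic structure" for the "kinetic energy" query.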

Dynamic Recommendations Using Temporal Embeddings

One of the most innovative aspects of ExamVault is its ability to adapt recommendations based on time. By integrating temporal embeddings, the system tracks how a student’s performance evolves and adjusts its suggestions dynamically.

Let’s say a student struggles with electromagnetism in physics. Over time, as they improve, the system gradually introduces more complex questions to ensure steady progress. This time-aware model transforms ExamVault from a static question bank into a personalised tutor.
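To illustrate what "time-aware" means in practice, here is a deliberately simplified stand-in: a recency-weighted accuracy score rather than a learned temporal embedding. The `Attempt` record, the 7-day half-life, and the three difficulty levels are all assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    days_ago: float  # how long ago the student answered the question
    correct: bool    # whether the answer was right


def recency_weighted_accuracy(attempts: list[Attempt], half_life_days: float = 7.0) -> float:
    """Accuracy in [0, 1] where an attempt's weight halves every half_life_days."""
    if not attempts:
        return 0.5  # assumption: no history is treated as neutral
    weights = [0.5 ** (attempt.days_ago / half_life_days) for attempt in attempts]
    earned = sum(w for w, attempt in zip(weights, attempts) if attempt.correct)
    return earned / sum(weights)


def next_difficulty(mastery: float, levels: tuple[str, ...] = ("easy", "medium", "hard")) -> str:
    """Map a mastery score in [0, 1] onto one of the difficulty levels."""
    index = min(int(mastery * len(levels)), len(levels) - 1)
    return levels[index]


# A student who struggled with electromagnetism weeks ago but improved recently.
history = [
    Attempt(days_ago=21, correct=False),
    Attempt(days_ago=14, correct=False),
    Attempt(days_ago=2, correct=True),
    Attempt(days_ago=1, correct=True),
]

mastery = recency_weighted_accuracy(history)
print(next_difficulty(mastery))  # hard
```

Plain accuracy for this history is 50%, but weighting recent attempts more heavily recognises the improvement and moves the student on to harder questions.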

Bringing It All Together

The combination of neural networks, tokenization, self-attention, and embeddings powers ExamVault’s adaptive learning capabilities. By leveraging these advanced AI techniques, I’ve been able to create a platform that not only matches questions but also fosters genuine understanding and growth. For students, this means studying smarter, not harder—and for me, it’s the culmination of an incredible journey into the depths of AI.