Algebrakit Automatic Short-Answer Grading System

Transformative Technology:

AI & Data

Semester programme:

Artificial Intelligence

Partner

Project group members:

Irina Ahamad
Vanesa Taneva
Vedant Kulkarni

Project description

How can a cost-efficient, interpretable AI system combine machine learning techniques to assess open-ended algebra answers at secondary-school level accurately and fairly, including partial credit based on key points and teacher-defined rubrics? Automatic short-answer grading (ASAG) requires several components to work together efficiently, and this project investigates how to compose such a system.

Context

The project is situated in the domain of educational technology, specifically automated assessment for secondary-level mathematics education. It focuses on supporting teachers by automatically grading open-ended algebra answers using natural language processing while preserving interpretability, fairness, and pedagogical alignment.

Results

The main outcome of this project is an automated system for grading short algebra answers at secondary-school level. The system combines semantic similarity, rubric-based retrieval, natural language inference (NLI), vector database caching, and limited use of large language models (LLMs) to grade student answers. With this hybrid approach, the system achieves 92% accuracy, measured on synthetic student answers.
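
To illustrate the key-point matching idea, here is a minimal sketch of partial credit based on a rubric. It is not the project's implementation: the function names, the 0.5 threshold, and the bag-of-words `embed` function are assumptions chosen so the example runs with the standard library only. A real system would use a sentence-embedding model in place of `embed`.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_key_points(
    answer: str, key_points: List[str], threshold: float = 0.5
) -> Tuple[float, List[bool]]:
    """Return (fraction of key points matched, per-key-point match flags)."""
    if not key_points:
        raise ValueError("the rubric needs at least one key point")
    # Naive sentence split (hypothetical choice); it also splits decimals like "2.5".
    sentences = [s for s in re.split(r"[.;\n]+", answer) if s.strip()]
    sentence_vectors = [embed(s) for s in sentences]
    matched = []
    for key_point in key_points:
        key_vector = embed(key_point)
        # A key point counts as covered if any sentence is similar enough to it.
        best = max((cosine(key_vector, v) for v in sentence_vectors), default=0.0)
        matched.append(best >= threshold)
    return sum(matched) / len(key_points), matched


rubric = ["subtract 3 from both sides", "divide both sides by 2"]
score, flags = score_key_points("I subtract 3 from both sides. Then x equals 4.", rubric)
print(score, flags)  # 0.5 [True, False]
```

Because the result lists which key points were matched, a teacher can see why an answer received partial credit, which is one way to keep the grade interpretable.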

An important insight from the project is that LLMs are not needed for every answer. Most student responses can be graded using embeddings and key-point matching from the rubric. The LLM is only used when the system is uncertain. This design significantly reduces computational cost while keeping grading accuracy high, making the system more practical for large-scale educational use.
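
The "LLM only when uncertain" design can be sketched as a confidence gate. The example below is an assumption-laden illustration, not the project's code: the `low` and `high` thresholds, the `llm_judge` callback, and the exact-match dictionary cache (a simple stand-in for the vector database cache) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def grade_with_fallback(
    answer: str,
    similarities: List[float],
    llm_judge: Callable[[str, int], bool],
    cache: Dict[str, List[bool]],
    low: float = 0.35,
    high: float = 0.65,
) -> Tuple[List[bool], int]:
    """Return per-key-point decisions and the number of LLM calls made.

    similarities[i] is the best similarity between the answer and key point i.
    """
    # Normalised text as the cache key; a vector store could match near-duplicates too.
    key = " ".join(answer.lower().split())
    if key in cache:
        return cache[key], 0

    decisions: List[bool] = []
    llm_calls = 0
    for index, similarity in enumerate(similarities):
        if similarity >= high:
            decisions.append(True)  # clearly covered, no LLM needed
        elif similarity <= low:
            decisions.append(False)  # clearly missing, no LLM needed
        else:
            # Uncertain band: escalate only this key point to the LLM.
            decisions.append(llm_judge(answer, index))
            llm_calls += 1

    cache[key] = decisions
    return decisions, llm_calls


def stub_judge(answer: str, key_point_index: int) -> bool:
    """Hypothetical placeholder for a real LLM call."""
    return True


cache: Dict[str, List[bool]] = {}
print(grade_with_fallback("x = 4", [0.9, 0.5, 0.1], stub_judge, cache))
# ([True, True, False], 1)
print(grade_with_fallback("X  =  4", [0.9, 0.5, 0.1], stub_judge, cache))
# ([True, True, False], 0)
```

Widening the gap between `low` and `high` sends more answers to the LLM, which trades cost for caution. Suitable values would have to be tuned on annotated answers.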

The system was validated using an evaluation framework with annotated synthetic student answers, showing 92% agreement with the expected grades. This provides strong evidence that the grading logic aligns well with teacher expectations.
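
An agreement figure of this kind can be computed as the fraction of answers whose predicted grade equals the expected grade. The sketch below assumes that definition and uses made-up grades. The source does not specify how the 92% was calculated, so the function name, tolerance, and data are purely illustrative.

```python
from typing import Sequence


def agreement(
    predicted: Sequence[float], expected: Sequence[float], tolerance: float = 1e-9
) -> float:
    """Fraction of answers whose predicted grade matches the expected grade."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty grade lists of equal length")
    hits = sum(abs(p - e) <= tolerance for p, e in zip(predicted, expected))
    return hits / len(expected)


# Illustrative grades on a 0..1 scale (not the project's data).
predicted_grades = [1.0, 0.5, 0.0, 1.0]
expected_grades = [1.0, 0.5, 0.5, 1.0]
print(agreement(predicted_grades, expected_grades))  # 0.75
```

Raising `tolerance` turns the metric into "agreement within a margin", which can be useful when partial credit is awarded in small steps.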

In terms of Technology Readiness Level (TRL), our system is positioned at TRL 5-6. A working prototype has been developed and tested with data in a relevant environment. While it has not yet been deployed in schools, the architecture and evaluation results indicate that the system is ready for pilot studies and further real-world testing.

About the project group

Hello, we are Irina, Vanesa and Vedant. We all have a background in AI and are eager to learn more. Our knowledge is mostly in NLP, transformers, LLMs and MLOps. We have been working on the ASAG project for five months, following the Agile methodology.

Repository

https://github.com/vtanevva/answer-evaluator