User Intent Classification Using a Fine-Tuned T5-Small Model
Project description
In this project, we tackled the challenge of optimizing intent classification in BDO’s internal chatbot system. Traditionally, this task was handled by powerful but costly large language models like GPT-4o. Our team developed a lightweight alternative by fine-tuning a T5-small model, significantly reducing both latency and cost. We generated a synthetic dataset using GPT-4o and enhanced it with prompt engineering and error injection techniques. The result is a fast, efficient, and reusable classification model — benchmarked and tested using Azure infrastructure, Databricks, and Python-based ML tooling.
Context
BDO, a global consultancy and accountancy firm, is exploring AI-driven solutions to streamline internal workflows. Our assignment was to investigate whether an in-house model could reliably replace a cloud-based LLM for understanding user requests. Through rigorous testing, benchmarking, and model evaluation, we’ve shown that this transition is not only possible — it’s highly practical. This project reflects how smaller, specialized models can deliver impactful results when combined with smart data strategies and collaborative problem-solving.
Results
This project delivered both technical products and strategic insights that directly support BDO’s ambition to reduce dependency on large, external AI models while maintaining performance and improving cost-efficiency.
1. Fine-Tuned T5 Model (T5-ft-2)
The core product is a fine-tuned T5-small model (T5-ft-2), developed to replace GPT-4o for classifying user intents in BDO’s internal chatbot. Trained on a synthetic dataset of ~40,000 samples (generated and augmented by the team), the model reached 97.8% accuracy, surpassing GPT-4o-mini (74.7%) and an earlier T5 version (60.4%). This performance was validated on a shared benchmark dataset, and McNemar’s test (p < 0.001) confirmed that the accuracy gain over both baselines is statistically significant; T5-ft-2 also delivered the fastest response times (see below).
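For illustration, the sketch below shows what fine-tuning T5-small for intent classification as a text-to-text task can look like with Hugging Face transformers. The sample utterances, intent labels, prompt prefix, and hyperparameters are placeholder assumptions, not the project's actual configuration or data.

```python
# Minimal sketch: fine-tuning T5-small for intent classification as a
# text-to-text task. Dataset, prompt prefix, and hyperparameters are
# illustrative assumptions, not the project's exact setup.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Hypothetical samples: user utterance -> intent label (as plain text)
samples = [
    {"text": "Can you reset my VPN password?", "intent": "it_support"},
    {"text": "Book a meeting room for Friday", "intent": "room_booking"},
]
dataset = Dataset.from_list(samples)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def preprocess(batch):
    # T5 is text-to-text: prefix the input and use the label string as target.
    inputs = tokenizer(
        ["classify intent: " + t for t in batch["text"]],
        max_length=128, truncation=True,
    )
    labels = tokenizer(text_target=batch["intent"], max_length=16, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="t5-intent",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=3e-4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

At inference time the model generates the intent label as text, which keeps adding or renaming intents a purely data-level change rather than an architectural one.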
2. Cost & Efficiency Gains
While GPT-4o costs scale with usage (~€0.01 per prediction), the total cost of training T5-ft-2 was under €2.50, with near-zero inference cost on internal infrastructure. At that rate, the one-off training cost is recouped after only a few hundred predictions, enabling significant long-term savings for BDO, especially in high-traffic internal use cases. The model also achieved the lowest average response time (0.89 s), nearly twice as fast as GPT-4o-mini, making it a viable option for real-time applications.
3. Synthetic Data Generation & Tooling
To support training, the team built two reusable tools:
- A data generator, using GPT-4o and prompt chaining
- A data augmenter, adding realistic user-like mistakes for better generalization
These tools can be extended by non-technical users and used to generate new intent samples at scale — increasing the system’s adaptability over time.
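To illustrate the error-injection idea behind the data augmenter, here is a minimal sketch that adds user-like typos (dropped, duplicated, and transposed characters) to clean synthetic utterances. The error types, rates, and example text are illustrative assumptions; the project's actual augmenter may work differently.

```python
import random

# Illustrative error-injection augmenter: adds user-like typos to clean
# synthetic utterances. Error types and rates are assumptions for this sketch.
def inject_errors(text, error_rate=0.1, seed=None):
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isalpha() and rng.random() < error_rate:
            kind = rng.choice(["drop", "duplicate", "swap"])
            if kind == "drop":
                i += 1                        # skip the character entirely
                continue
            if kind == "duplicate":
                out.append(ch + ch)           # accidental double keypress
                i += 1
                continue
            if kind == "swap" and i + 1 < len(text):
                out.append(text[i + 1] + ch)  # transpose with the next character
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)

# Produce a few noisy variants of one synthetic sample
clean = "Can you reset my VPN password?"
print([inject_errors(clean, error_rate=0.15, seed=s) for s in range(3)])
```

Training on such noisy variants alongside the clean originals is what helps the model generalize to real, imperfect user input.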
4. Evaluation Framework & Reporting
A modular benchmarking framework was created to evaluate new models against baseline performance, including confusion matrices, runtime metrics, and statistical tests. This supports transparency and continuous improvement.
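As a sketch of the kind of comparison the framework performs, the snippet below computes accuracy, a confusion matrix, and McNemar's test on two models' predictions over a shared benchmark. The labels and predictions are tiny placeholders; the real framework covers the full benchmark set and additional runtime metrics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from statsmodels.stats.contingency_tables import mcnemar

# Placeholder predictions on a shared benchmark set; in practice these come
# from running each candidate model over the same labelled utterances.
y_true    = np.array(["it_support", "room_booking", "it_support", "hr_question"])
y_model_a = np.array(["it_support", "room_booking", "hr_question", "hr_question"])
y_model_b = np.array(["it_support", "it_support",   "hr_question", "hr_question"])

print("accuracy A:", accuracy_score(y_true, y_model_a))
print("accuracy B:", accuracy_score(y_true, y_model_b))
print(confusion_matrix(y_true, y_model_a))

# McNemar's test uses the 2x2 table of per-sample correctness:
# both correct / only A correct / only B correct / both wrong.
a_correct = y_model_a == y_true
b_correct = y_model_b == y_true
table = [
    [np.sum(a_correct & b_correct),  np.sum(a_correct & ~b_correct)],
    [np.sum(~a_correct & b_correct), np.sum(~a_correct & ~b_correct)],
]
print(mcnemar(table, exact=True))
```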
TRL Assessment: TRL 6 → 7
The project has reached TRL 6: the solution was validated in a relevant environment (Azure, Databricks, real datasets), with robust benchmarking and system-level evaluation. With minimal additional integration work and end-user testing, it could be promoted to TRL 7 (system prototype in operational setting).
Strategic Impact
- Demonstrates that fine-tuned open-source models can replace GPT-based solutions
- Provides reusable pipelines and insights for future chatbot automation tasks
- Lowers operational dependency and cost for BDO's internal AI infrastructure
This project proves the power of lean AI development: through data, design, and discipline, even a compact model like T5-small can deliver production-level results.
About the project group
We are four students: three from the ICT & Software profile and one from the ICT & Business profile. We have been working on this project since February. During the first half of the semester we mainly focused on gathering requirements from the company and building domain understanding. In later phases, we had more time to focus on building the solution.
We had bi-weekly meetings with the client and weekly meetings with our mentors to keep everything tidy and on track.