Equine Integration B.V. (Fontys ICT Partner in Innovation)
Project description
This project addresses the challenge of generating structured, explainable training feedback for sport horses using raw sensor data — without relying on a single domain expert to manually review every session.
The core research question was: how can fragmented data sources be transformed into automated, transparent feedback that supports expert judgment rather than replacing it?
The design challenge was twofold. First, expert labels were scarce, requiring a two-phase labeling strategy combining direct annotation extraction and weak supervision. Second, explainability was non-negotiable — a previous AI attempt had been rejected precisely because its decisions could not be traced.
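As a rough illustration of the two-phase idea, the sketch below takes a label directly from an explicit expert annotation when one exists and otherwise falls back to a majority vote over simple labeling functions. The field names (expert_label, stress_score, hr_recovery_bpm, comment), the outcome attached to each heuristic, the heart rate threshold, and the Dutch keyword are all assumptions made for the example, not the project's actual heuristics.

```python
from collections import Counter
from typing import Optional

ABSTAIN = None  # a labeling function returns this when it has no opinion


def lf_high_stress(s: dict) -> Optional[str]:
    # The 2.5 threshold is quoted in this report; the outcome label is assumed.
    return "recovery_required" if s.get("stress_score", 0.0) > 2.5 else ABSTAIN


def lf_slow_recovery(s: dict) -> Optional[str]:
    # Hypothetical threshold, for illustration only.
    return "monitor" if s.get("hr_recovery_bpm", 99.0) < 20.0 else ABSTAIN


def lf_comment_keyword(s: dict) -> Optional[str]:
    # Expert comments are in Dutch; "rust" (rest) is an assumed keyword.
    words = s.get("comment", "").lower().split()
    return "recovery_required" if "rust" in words else ABSTAIN


LABELING_FUNCTIONS = [lf_high_stress, lf_slow_recovery, lf_comment_keyword]


def label_session(s: dict) -> Optional[str]:
    # Phase 1: use the expert's explicit annotation when present.
    if s.get("expert_label"):
        return s["expert_label"]
    # Phase 2: majority vote over the labeling functions that did not abstain.
    votes = [v for v in (lf(s) for lf in LABELING_FUNCTIONS) if v is not ABSTAIN]
    if not votes:
        return None  # stays unlabelled
    return Counter(votes).most_common(1)[0][0]
```

Sessions for which every function abstains remain unlabelled, which keeps weak labels from being invented where no evidence exists.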
The solution is a hybrid system: three XGBoost classifiers provide predictive accuracy and are evaluated under Leave-One-Horse-Out cross-validation, while a white-box rule layer distills expert knowledge into readable IF-THEN rules. The expert can inspect, verify, and retrain the system through an interactive dashboard.
Context
Domain: Equine Sports Medicine / Veterinary Data Science
This project operates in the domain of professional equine sports management, where performance horses undergo structured training programs that must be carefully monitored to prevent overtraining, injury, and welfare issues.
A professional yard managing a large number of sport horses requires individualised training feedback after every session. This feedback is currently produced manually by a single domain expert who reviews heart rate data, workload metrics, gait measurements, and rider notes for each horse. The process does not scale, and the expertise is difficult to transfer or document consistently.
The technical context involves integrating four heterogeneous data sources — workload logs, session annotations, clinical expert comments in Dutch, and horse metadata — into a unified dataset suitable for machine learning. The data is unlabelled by default, making supervised learning a non-trivial challenge that required a custom labeling strategy grounded in the expert's own annotation behaviour.
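A minimal sketch of the integration step, assuming each source can be keyed on a horse identifier and, for per-session sources, a session identifier. The key names (horse_id, session_id) and the sample fields are hypothetical; the real schemas are not described in this report.

```python
from typing import Iterable


def index_by(rows: Iterable[dict], *keys: str) -> dict:
    """Index rows by a tuple of key fields (the last row wins on duplicates)."""
    return {tuple(r[k] for k in keys): r for r in rows}


def build_dataset(workload, annotations, comments, horses):
    """Left-join the other three sources onto the workload log."""
    ann = index_by(annotations, "horse_id", "session_id")
    com = index_by(comments, "horse_id", "session_id")
    meta = index_by(horses, "horse_id")
    dataset = []
    for w in workload:
        key = (w["horse_id"], w["session_id"])
        row = dict(w)                                # copy, do not mutate input
        row.update(meta.get((w["horse_id"],), {}))   # per-horse metadata
        row.update(ann.get(key, {}))                 # per-session annotation
        row.update(com.get(key, {}))                 # per-session expert comment
        dataset.append(row)
    return dataset


# Hypothetical sample rows.
workload = [
    {"horse_id": "H1", "session_id": 1, "distance_km": 4.2},
    {"horse_id": "H2", "session_id": 1, "distance_km": 3.0},
]
annotations = [{"horse_id": "H1", "session_id": 1, "expert_label": "monitor"}]
comments = [{"horse_id": "H1", "session_id": 1, "comment": "stijf"}]
horses = [{"horse_id": "H1", "age": 9}, {"horse_id": "H2", "age": 6}]

rows = build_dataset(workload, annotations, comments, horses)
```

A left join on the workload log keeps every session, including the unlabelled ones that the labeling strategy later has to handle.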
The broader challenge is one of trust and transparency. The sport horse industry is a high-stakes welfare environment where decisions directly affect animal health. Any automated system must not only perform accurately but must produce outputs the domain expert can read, question, and override. This placed explainability at the centre of the design, rather than treating it as an optional layer added after modelling.
The project sits at the intersection of applied machine learning, domain expert collaboration, and responsible AI design in a specialised, welfare-sensitive professional context.
Results
Product Outcomes
The project delivered a fully functional hybrid AI system consisting of three components: a machine learning layer, a white-box rule engine, and a six-page interactive dashboard.
Three XGBoost classifiers were trained and evaluated using Leave-One-Horse-Out cross-validation across 60 horses. Each fold holds out every session of one horse, so no horse contributes to both the training and the test set. This prevents horse-level data leakage and makes it a strict evaluation strategy for this dataset:
- Status Classifier: 82.2% accuracy / Macro F1 = 0.800
- Risk Classifier: 81.7% accuracy / Macro F1 = 0.794
- Load Recommendation: 79.4% accuracy / Macro F1 = 0.761
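The splitting scheme behind these figures can be sketched in a few lines. This is a plain-Python illustration of Leave-One-Horse-Out, with made-up horse identifiers; scikit-learn's LeaveOneGroupOut produces the same kind of split, though this report does not state which implementation the project used.

```python
from collections import defaultdict


def leave_one_horse_out(horse_ids):
    """Yield (held_out_horse, train_idx, test_idx), one fold per distinct horse."""
    by_horse = defaultdict(list)
    for i, horse in enumerate(horse_ids):
        by_horse[horse].append(i)
    for horse, test_idx in by_horse.items():
        # Every session of the held-out horse goes to the test set,
        # so the model never sees that horse during training.
        train_idx = [i for i, h in enumerate(horse_ids) if h != horse]
        yield horse, train_idx, test_idx


# Hypothetical mapping of six sessions to three horses.
horse_ids = ["H1", "H1", "H2", "H3", "H3", "H3"]
folds = list(leave_one_horse_out(horse_ids))
```

With 60 horses this yields 60 folds, and the reported accuracy and macro F1 would typically be computed over the pooled or averaged predictions from those folds.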
The safety-critical class (recovery_required) achieved roughly 92% recall: 94 of 102 sessions requiring recovery were correctly identified. In a welfare-sensitive context, this is the metric that matters most.
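For reference, recall for a class is TP / (TP + FN), and macro F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a common one. The sketch below applies the recall definition to the session counts quoted above; the counts passed to macro_f1 are placeholders, not project results.

```python
def recall(tp: int, fn: int) -> float:
    """Share of truly positive sessions that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_class_counts) -> float:
    """Unweighted mean of per-class F1; each item is a (tp, fp, fn) tuple."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)


# Counts quoted in the text: 94 of 102 recovery sessions identified.
recovery_recall = recall(tp=94, fn=102 - 94)

# Placeholder (tp, fp, fn) counts for three classes, for illustration only.
example_macro = macro_f1([(50, 10, 10), (30, 15, 12), (94, 9, 8)])
```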
The white-box rule engine produced six explicit IF-THEN rules derived exclusively from the domain expert's own annotation behaviour. The primary rule threshold (stress score > 2.5) was independently confirmed by SHAP feature attribution: the model surfaced the expert's implicit clinical reasoning without being told what to look for.
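One way to keep such rules both executable and readable is to store each rule as data and render it as text on demand. The sketch below is an assumed representation, not the project's rule engine: only the stress score threshold of 2.5 comes from this report, while the outcome attached to it, the second rule, and the default label are hypothetical.

```python
import operator
from dataclasses import dataclass

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Rule:
    feature: str
    op: str
    threshold: float
    outcome: str

    def matches(self, session: dict) -> bool:
        value = session.get(self.feature)
        return value is not None and OPS[self.op](value, self.threshold)

    def __str__(self) -> str:
        return f"IF {self.feature} {self.op} {self.threshold} THEN {self.outcome}"


RULES = [
    # Threshold quoted in the report; the outcome label is an assumption.
    Rule("stress_score", ">", 2.5, "recovery_required"),
    # Entirely hypothetical second rule.
    Rule("hr_recovery_bpm", "<", 20.0, "monitor"),
]


def first_match(session: dict, rules=RULES, default: str = "on_track"):
    """Return (outcome, explanation) for the first rule that fires."""
    for rule in rules:
        if rule.matches(session):
            return rule.outcome, str(rule)
    return default, "no rule fired"
```

Because the explanation is the rule's own text, the expert reads exactly the condition that was evaluated.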
The dashboard delivers six interactive pages: Fleet Overview, Horse Detail, Session Deep Dive (with SHAP waterfall chart), Expert Annotation interface, Decision Rules viewer, and an automated Feedback Generator producing natural language training summaries.
Key Insights
Stress score was the single most predictive feature, outperforming all workload volume ratios. Heart rate recovery ranked above raw effort metrics. These findings align with current sports science literature and were validated by the domain expert when reviewing the extracted rules.
The hardest classification boundary — on_track versus monitor — produced F1 = 0.60, which reflects genuine clinical ambiguity rather than model failure. These sessions are surfaced for expert review rather than assigned a forced verdict, which is the correct design response.
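Surfacing ambiguous sessions instead of forcing a verdict can be sketched as a margin check on the predicted class probabilities. The report does not say how ambiguity is detected, so the margin criterion, the 0.15 default, and the output fields below are assumptions.

```python
def decide(probabilities: dict, min_margin: float = 0.15) -> dict:
    """Return a verdict only when the top class clearly beats the runner-up."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (top, p_top), (second, p_second) = ranked[0], ranked[1]
    if p_top - p_second < min_margin:
        # Too close to call: hand both candidates to the expert.
        return {"verdict": None, "needs_review": True, "candidates": [top, second]}
    return {"verdict": top, "needs_review": False, "candidates": [top]}


# Hypothetical probability outputs for two sessions.
ambiguous = decide({"on_track": 0.46, "monitor": 0.44, "recovery_required": 0.10})
clear = decide({"on_track": 0.80, "monitor": 0.15, "recovery_required": 0.05})
```

A wider margin sends more sessions to the expert, so the setting trades review workload against the risk of a confident but wrong label.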
Validation
The system was validated against real session data across the full horse population. Expert review of the extracted decision rules confirmed that the thresholds matched established clinical judgment. The active learning loop has been tested end-to-end: new expert annotations automatically trigger rule retraining, and updated IF-THEN rules appear on the Decision Rules page without developer intervention.
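The retraining step can be pictured as re-deriving a rule threshold whenever the annotation set changes. The sketch below uses a single-feature threshold search (a decision stump), which is an assumption: the report does not describe how rules are actually extracted. The field names and sample annotations are hypothetical.

```python
def fit_threshold(values, labels) -> float:
    """Find the cut where 'value > cut' best agrees with the boolean labels."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate cuts are the midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_cut, best_correct = candidates[0], -1
    for cut in candidates:
        correct = sum((v > cut) == y for v, y in zip(values, labels))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def retrain_rule(annotations, feature="stress_score", positive="recovery_required") -> str:
    """Rebuild the readable rule text from the current expert annotations."""
    values = [a[feature] for a in annotations]
    labels = [a["expert_label"] == positive for a in annotations]
    cut = fit_threshold(values, labels)
    return f"IF {feature} > {cut:.2f} THEN {positive}"


# Hypothetical annotations; a new one would be appended before retraining.
annotations = [
    {"stress_score": 1.0, "expert_label": "on_track"},
    {"stress_score": 2.0, "expert_label": "monitor"},
    {"stress_score": 3.0, "expert_label": "recovery_required"},
    {"stress_score": 4.0, "expert_label": "recovery_required"},
]
rule_text = retrain_rule(annotations)
```

Calling retrain_rule after each saved annotation would give the behaviour described above, where the displayed rule text updates without a developer editing it.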
TRL Positioning
This project sits at TRL 4-5. The core technology has been validated in a laboratory environment using real operational data (TRL 4), and the complete system, including the dashboard and feedback pipeline, has been validated in a relevant environment with real domain expert interaction (TRL 5). Deployment to a live production environment with continuous expert use would move it to TRL 6-7, which requires integration with the yard's existing stable management infrastructure.
The system is not yet in daily production use, but all components are functional, tested, and documented sufficiently for handover and further development.
About the project group
I am a graduate of Howest Bruges with a Bachelor's degree in Computer Science, majoring in Software Engineering. I took on this project individually and worked on it full-time from March to June 2026, a four-month intensive research and development cycle.
I chose this project because it aligned naturally with both my technical background and my personal interests. AI and machine learning are reshaping every industry, and the opportunity to apply that in a real-world, welfare-sensitive context — working with actual sensor data from horses and a genuine domain expert — made this feel like meaningful work rather than an academic exercise.
I worked independently throughout the project, taking ownership of every stage: data integration, label engineering, model development, explainability design, and the dashboard implementation. The workload was significant but the process was rewarding. Seeing the system move from raw, fragmented data sources to a working, expert-readable AI pipeline — and having the domain expert validate the rule thresholds it extracted — made every long session worth it.
I am proud of what this project produced and confident it lays a solid foundation for real deployment at Equine Integration B.V.