CareBuddy
AI & Data
Semester programme: Master of Applied IT
Project group members: Davis
Project description
CareBuddy is a research-driven conversational healthcare assistant designed to explore how structured dialogue control can improve the safety, reliability, and usability of AI in medical interactions. Unlike generic chatbots, CareBuddy uses a Conversation Automaton & Dialogue Manager (CADM) to guide conversations through explicit states and actions, reducing ambiguity and unsafe responses. The system supports audio-first interaction, proactive questioning, and structured data capture while maintaining strong safety guardrails and auditability. CareBuddy is not intended to replace healthcare professionals but to act as a supportive, controlled conversational assistant and a research platform for studying safe, patient-centric conversational AI in healthcare contexts.
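The automaton-governed approach can be illustrated with a minimal sketch of a dialogue manager that only permits explicit state-action transitions. The state and action names below (`greeting`, `triage`, `report_symptom`, and so on) are hypothetical placeholders for illustration; the actual CADM states are not specified in this document.

```python
from dataclasses import dataclass, field

# Hypothetical transition table: (current state, action) -> next state.
# Anything not listed here is rejected, which is the core safety guardrail.
TRANSITIONS = {
    ("greeting", "report_symptom"): "triage",
    ("triage", "answer_question"): "triage",
    ("triage", "complete_intake"): "summary",
    ("summary", "confirm"): "closed",
}

@dataclass
class DialogueManager:
    state: str = "greeting"
    context: dict = field(default_factory=dict)  # persistent, auditable patient context

    def step(self, action: str, **data) -> str:
        """Advance the automaton; refuse actions not allowed in the current state."""
        nxt = TRANSITIONS.get((self.state, action))
        if nxt is None:
            # Out-of-policy actions never produce a response.
            raise ValueError(f"action {action!r} not permitted in state {self.state!r}")
        self.context.update(data)  # structured data capture
        self.state = nxt
        return nxt
```

Because every turn must pass through the transition table, the model cannot drift into unsanctioned states, and the accumulated `context` dict provides the audit trail the description refers to.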
Context
CareBuddy is developed within a research and innovation context focused on the safe application of conversational AI in healthcare. The project builds on earlier phases that validated a structured, automaton-controlled dialogue approach for managing AI conversations. The current phase extends this work toward more realistic healthcare scenarios by introducing audio interaction, persistent patient context, and enhanced safety controls, while remaining strictly research-oriented and non-clinical.
Results
This section presents the consolidated results of all nine experimental runs derived from the unified evaluation table, covering three conversational scenarios (Case 1: Fever, Case 2: Periodic Blood Pressure Checkup, Case 3: Phone Purchase) and three system configurations: v1 (no system prompt), v2 (medical system prompt), and v3 (CADM-governed dialogue). Performance is evaluated using five conversational alignment metrics—Turn Balance, Utterance Length, Latency, Conversational Flow, and Contextual Recall—aggregated through a weighted scoring scheme to determine overall conversational alignment.
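A weighted aggregation of this kind can be sketched as follows. The weights and band thresholds below are assumptions chosen for illustration; the document does not state the actual values used in the evaluation.

```python
# Assumed weights over the five alignment metrics (must sum to 1.0).
WEIGHTS = {
    "turn_balance": 0.15,
    "utterance_length": 0.15,
    "latency": 0.20,
    "conversational_flow": 0.25,
    "contextual_recall": 0.25,
}

def alignment_score(metrics: dict) -> float:
    """Weighted sum of metric values, each normalized to [0, 1]."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

def alignment_band(score: float) -> str:
    """Map a score onto the Low / Medium / High labels used in the results.
    Thresholds are illustrative assumptions."""
    if score < 0.4:
        return "Low"
    if score < 0.7:
        return "Medium"
    return "High"
```

Under this scheme a run with strong turn balance and latency but near-zero flow and recall (the v1/v2 pattern reported below) lands in the Low band, while moderate flow and recall lift a run into Medium.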
Across all experiments, Turn Balance remains constant, indicating stable turn-taking behavior independent of system configuration or scenario. Latency values are similarly consistent across v1, v2, and v3, demonstrating that dialogue governance adds no measurable performance overhead.
Clear differences emerge in higher-level conversational metrics. For both Conversational Flow and Contextual Recall, v1 and v2 consistently exhibit low or zero values across all cases, indicating limited dialogue progression and weak retention of contextual information. In contrast, v3 demonstrates consistently higher values for both metrics across all three scenarios, reflecting smoother conversational transitions and more reliable preservation of user-provided information.
Utterance Length also increases markedly in v3 across all cases, suggesting more complete and informative responses. Importantly, this increase does not coincide with higher latency, indicating that the added dialogue structure improves response quality without degrading responsiveness.
The weighted aggregation of metrics results in a consistent pattern across all nine experiments. Both v1 and v2 yield Low conversational alignment in every case, with only minor variation between scenarios. In contrast, v3 achieves Medium conversational alignment in all three cases, representing a clear and systematic improvement over the prompt-based and unguided configurations.
Overall, the unified results demonstrate that prompt-based medical guidance alone (v2) does not provide meaningful improvement over the baseline model (v1). The consistent transition from Low to Medium alignment observed exclusively in the CADM-governed system confirms that explicit dialogue governance—rather than prompt engineering—is the primary driver of improved conversational quality, coherence, and contextual consistency in the CareBuddy system.