A research team has developed SynDocDis, a framework that generates realistic synthetic physician-to-physician conversations using large language models, offering a privacy-compliant path to a type of clinical training data that has been largely inaccessible to AI researchers.

Doctor-to-doctor case discussions contain some of medicine's most nuanced clinical reasoning — the kind that unfolds when specialists debate a diagnosis or weigh competing treatment options. That knowledge has remained largely locked away from AI systems, not because it lacks value, but because privacy regulations and ethical obligations make real conversations nearly impossible to access at scale. SynDocDis, described in a preprint published on arXiv, proposes a structured alternative.

Why Doctor-to-Doctor Dialogue Has Been Overlooked

Most synthetic medical data research has concentrated on patient-physician interactions or structured records such as clinical notes and discharge summaries. The gap in physician-to-physician dialogue synthesis is significant: these conversations are where diagnostic hypotheses get stress-tested, where treatment trade-offs are negotiated, and where institutional clinical knowledge is transmitted between colleagues.

The researchers behind SynDocDis argue that filling this gap matters not just for academic completeness, but for practical applications in medical education and clinical decision support systems — two areas where AI could have meaningful near-term impact.

The framework achieved a 91% clinical relevance rating while preserving the privacy of both patients and physicians.

How SynDocDis Works

The framework combines two core elements: structured prompting techniques applied to a large language model, and de-identified case metadata — anonymised patient case information stripped of any identifying details. Rather than feeding raw clinical records into a model and hoping for coherent output, the approach uses the metadata as a scaffold, guiding the model to produce dialogues that are grounded in realistic clinical scenarios without exposing real patient or physician data.
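The paper does not publish its prompt templates, so the following is only a minimal sketch of the metadata-as-scaffold idea: a structured prompt is assembled from de-identified case fields rather than from raw records. All field names and wording here are hypothetical.

```python
def build_dialogue_prompt(case_metadata: dict) -> str:
    """Assemble a generation prompt from de-identified case metadata.

    Hypothetical sketch: the field names ("specialty", "presentation",
    "findings", "question") are illustrative, not the paper's schema.
    """
    template = (
        "You are simulating a case discussion between two physicians.\n"
        "Specialty: {specialty}\n"
        "Presentation: {presentation}\n"
        "Key findings: {findings}\n"
        "Clinical question: {question}\n\n"
        "Generate a realistic consult dialogue in which the physicians "
        "debate the diagnosis and weigh treatment options. Do not invent "
        "names, dates, or other identifying details."
    )
    return template.format(
        specialty=case_metadata["specialty"],
        presentation=case_metadata["presentation"],
        findings="; ".join(case_metadata["findings"]),
        question=case_metadata["question"],
    )

# Illustrative de-identified case, in the spirit of the paper's hepatology scenarios.
example_case = {
    "specialty": "hepatology",
    "presentation": "62-year-old with new-onset ascites and elevated AFP",
    "findings": ["cirrhosis on imaging", "AFP 420 ng/mL"],
    "question": "resection vs. transplant evaluation",
}
prompt = build_dialogue_prompt(example_case)
```

Because the model only ever sees these stripped-down fields, any identifying detail in the output would have to be hallucinated rather than leaked, which is the "privacy by construction" property the authors rely on.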

This metadata-driven approach is designed to preserve privacy by construction. Because the inputs are de-identified before the generation process begins, the synthetic outputs carry no traceable link to real individuals — a meaningful design choice given the regulatory environment surrounding healthcare data.

Physician Evaluation Across Oncology and Hepatology

The team evaluated SynDocDis using five practicing physicians who assessed outputs across nine scenarios drawn from oncology and hepatology — two specialties characterised by complex, multi-disciplinary decision-making. The evaluation criteria focused on two dimensions: communication effectiveness and medical content quality.

Results, as reported by the authors, were strong. Mean communication effectiveness scored 4.4 out of 5, while medical content quality averaged 4.1 out of 5. Interrater reliability — a measure of how consistently the five physicians agreed in their assessments — reached a kappa score of 0.70 (95% confidence interval: 0.67–0.73), which falls in the range typically described as substantial agreement. The 91% clinical relevance rating indicates that the vast majority of generated dialogue was judged useful and medically appropriate by the evaluating clinicians.
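The paper does not state which kappa variant was computed; for a panel of five raters, Fleiss' kappa is a common choice. A minimal sketch of that computation follows, using toy data that are purely illustrative, not the study's actual ratings.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of category counts.

    `ratings[i][j]` is the number of raters who assigned subject i to
    category j; every row must sum to the same number of raters.
    """
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    # Per-subject agreement: proportion of agreeing rater pairs, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_subjects
    # Chance agreement from the marginal category proportions.
    totals = [sum(row[j] for row in ratings) for j in range(len(ratings[0]))]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# Toy table: 4 cases rated by 5 physicians into 2 quality categories.
kappa = fleiss_kappa([[5, 0], [0, 5], [4, 1], [1, 4]])  # 0.6
```

On the Landis and Koch scale commonly used to interpret kappa, values between 0.61 and 0.80 are labelled substantial agreement, which is where the reported 0.70 falls.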

These benchmarks are self-reported by the research team and have not yet been independently replicated or peer-reviewed; the paper currently exists only as a preprint.

Limitations and What Comes Next

The evaluation, while encouraging, is limited in scope. Five physicians assessing nine scenarios represents a small sample, and both specialties tested — oncology and hepatology — involve a particular style of case complexity that may not generalise to primary care, emergency medicine, or other clinical contexts. Broader validation across specialties, institutions, and physician populations would be necessary before the framework could be considered production-ready for sensitive applications.

The research does not detail which underlying large language model powers the generation process, nor does it provide a granular breakdown of failure cases — instances where the generated dialogue was clinically inaccurate or misleading. Understanding where the system falls short matters at least as much as knowing where it succeeds, particularly in a medical context where errors carry real consequences.

The team identifies medical education and clinical decision support as the primary near-term application areas. In medical education, synthetic physician dialogues could serve as training material for junior doctors learning how to structure case discussions or consult with specialists. In decision support, AI systems trained on such dialogues might eventually participate in — or at least inform — real clinical conversations.

What This Means

SynDocDis offers an early, structured answer to one of medical AI's most persistent data access problems. If its results hold up under broader independent scrutiny, it could meaningfully expand the training resources available to developers building AI tools for clinical environments.