Experts Discuss Inference Challenges in Synthetic Data at JSM 2025

Synthetic data is increasingly used to reduce disclosure risk, but JSM 2025 speakers argued the industry still lacks reliable ways to do statistical inference on synthetic datasets. The message for practitioners: without better bias and uncertainty tooling, synthetic data can produce confident-looking answers that are simply wrong.

Westat-led JSM 2025 panel calls for new inference methods for synthetic data

At the 2025 Joint Statistical Meetings (JSM) on Nov. 10, experts convened to discuss how to ensure valid statistical inference from synthetic data as privacy concerns push more organizations toward synthetic alternatives. The roundtable was led by Westat’s Tom Krenzke (Vice President for Statistics and Data Science) and organized by Minsun Riddles (Principal Statistical Associate). Participants emphasized cross-disciplinary collaboration to develop methods that reduce bias and improve accuracy when analyzing synthetic datasets.

Westat also points to a related proceedings paper by Krenzke and Riddles, “Inference Using Synthetic Data: Balancing Privacy, Bias, and Variance in Modern Statistical Practice,” which frames the core tension: stronger privacy protections can change the statistical properties of the released data, complicating standard workflows for estimation and confidence intervals. The authors conclude with a call for ongoing collaboration to refine tools, share insights, and increase confidence in synthetic data products.

Analytics teams risk biased estimates and misleading confidence intervals. If the synthetic generation process shifts distributions or relationships, “business as usual” inference can quietly break—especially when teams reuse standard models and uncertainty estimates without synthetic-aware adjustments.
Privacy/utility trade-offs need measurable uncertainty, not just intuition. The discussion highlights a practical gap for privacy engineers and compliance stakeholders: organizations need methods that quantify uncertainty introduced by synthesis while still controlling disclosure risk.
Synthetic data procurement and governance should include inference requirements. Beyond utility metrics and privacy claims, buyers should ask vendors (or internal teams) how they validate inference: what bias checks exist, how variance is estimated, and what caveats apply to downstream statistical conclusions.
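
To make "synthetic-aware adjustments" concrete, here is a minimal sketch of one well-known approach from the multiple-imputation literature: the combining rule for partially synthetic data, commonly attributed to Reiter (2003). It is an illustration only, not a method the panel or the proceedings paper is reported to endorse. It assumes the data provider released m synthetic copies of the dataset and that the analyst has a point estimate and a variance estimate from each copy. The function name and the example numbers are hypothetical, and the interval uses a normal approximation rather than the t reference distribution used in the literature.

```python
from statistics import NormalDist, mean, variance


def combine_partially_synthetic(estimates, variances, level=0.95):
    """Combine per-copy results from m partially synthetic datasets.

    estimates: point estimates q_1..q_m, one per synthetic copy
    variances: the matching within-copy variance estimates u_1..u_m
    Returns (q_bar, total_variance, (ci_low, ci_high)).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two copies and matching variances")

    q_bar = mean(estimates)    # combined point estimate
    b = variance(estimates)    # between-copy variance (m - 1 denominator)
    u_bar = mean(variances)    # average within-copy variance

    # Partially synthetic combining rule: T_p = u_bar + b / m.
    # The b / m term is the extra uncertainty introduced by synthesis.
    total_var = u_bar + b / m

    # Normal approximation (an assumption made here for simplicity).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * total_var ** 0.5
    return q_bar, total_var, (q_bar - half_width, q_bar + half_width)


# Hypothetical numbers: three synthetic copies, each reporting variance 1.0.
q_bar, total_var, ci = combine_partially_synthetic(
    [10.0, 12.0, 14.0], [1.0, 1.0, 1.0]
)
```

In this made-up example, the combined variance (1 + 4/3) is more than double the variance any single copy reports on its own, which is the sense in which "business as usual" intervals computed from one synthetic file can be too narrow. Fully synthetic data and differentially private releases call for different rules, so this sketch should not be applied to them unchanged.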

Daily Brief | Jul 17, 2026 | 2 min