Gartner’s forecast that synthetic data will make up 80% of the data used for AI by 2028 is less a prediction about tools than a warning about operational maturity. If that shift happens, the bottleneck moves from data access to synthetic data governance: managing quality, drift, and privacy risk at scale.
Gartner projects 80% of AI data will be synthetic by 2028
Gartner predicts that by 2028, 80% of the data used for AI will be synthetic, a major swing away from relying primarily on collected “real-world” datasets. The same write-up notes that many organizations are still early in evaluating synthetic data, despite the promise of faster model development, improved data quality, and lower costs compared with sourcing and labeling production data.
The IBM perspective frames synthetic data as a practical response to real-data constraints: scarcity, expense, bias, and privacy exposure. It also highlights applied use cases such as insurance fraud simulation, where synthetic scenarios can expand coverage of rare events and accelerate model iteration without waiting for enough real examples to accumulate.
- Expect new “synthetic ops” work, not just new vendors. If synthetic data becomes the default input to AI pipelines, data leaders will need repeatable generation, validation, lineage, and monitoring workflows rather than one-off dataset creation.
- Privacy and security review must shift left. Synthetic data can reduce exposure to personal data, but it still carries re-identification and memorization risks, depending on the generation method and the seed data; teams will need clear release criteria and audits.
- Quality failures will look like model failures. Temporal gaps, outdated generators, or biased seed data can silently degrade model performance; monitoring has to cover both the model and the synthetic data process that feeds it.
- Coverage beats volume. The win is targeted scenario expansion (edge cases, rare classes, counterfactuals) with measurable impact; otherwise, teams risk producing large synthetic corpora that don’t improve downstream outcomes.
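To make the "release criteria" idea in the points above concrete, here is a minimal sketch of a release gate for one synthetic batch. It is an illustration under stated assumptions, not a method described by Gartner or IBM. The function names, the `(value, label)` row layout, and all thresholds are hypothetical. The three checks are deliberately crude stand-ins: a two-sample Kolmogorov-Smirnov statistic for drift, an exact-copy rate as a weak memorization proxy, and the rare-class share for coverage.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at a data point.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


def exact_copy_rate(seed_rows, synthetic_rows):
    """Share of synthetic rows identical to a seed row (a crude memorization proxy)."""
    seed = set(seed_rows)
    return sum(row in seed for row in synthetic_rows) / len(synthetic_rows)


def class_share(rows, label, label_idx=1):
    """Fraction of rows whose label column equals `label`."""
    return sum(row[label_idx] == label for row in rows) / len(rows)


def release_gate(seed_rows, synthetic_rows, rare_label,
                 max_ks=0.35, max_copy_rate=0.0, min_rare_share=0.10):
    """Rows are (numeric_value, label) tuples. All thresholds are placeholder assumptions."""
    ks = ks_statistic([r[0] for r in seed_rows], [r[0] for r in synthetic_rows])
    copy_rate = exact_copy_rate(seed_rows, synthetic_rows)
    rare_share = class_share(synthetic_rows, rare_label)
    checks = {
        "drift_ok": ks <= max_ks,                   # synthetic values track the reference
        "privacy_ok": copy_rate <= max_copy_rate,   # no verbatim seed rows leaked
        "coverage_ok": rare_share >= min_rare_share,  # rare class is actually represented
    }
    return {"ks": ks, "copy_rate": copy_rate, "rare_share": rare_share,
            **checks, "release": all(checks.values())}


# Toy data in the spirit of the fraud-simulation use case (values are invented).
seed = [(90.0, "ok"), (100.0, "ok"), (120.0, "ok"), (5000.0, "fraud")]
synthetic = [(95.0, "ok"), (105.0, "ok"), (115.0, "ok"),
             (4800.0, "fraud"), (5200.0, "fraud")]
report = release_gate(seed, synthetic, rare_label="fraud")
print(report)
```

A real pipeline would likely swap in stronger checks (for example, nearest-neighbor distance to seed records instead of exact matches) and log each report alongside the generator version, so a later model regression can be traced back to a specific synthetic batch.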
