Study calls for clearer synthetic data guidelines

A University of Exeter study argues that synthetic data needs clearer rules if its use is to stay transparent, accountable, and fair. The point is not that synthetic data is unusable but that governance of how it is generated and processed is still too loose.

Clear guidelines needed for synthetic data to ensure transparency, accountability and fairness, study says

A study from the University of Exeter, reported by ScienceDaily, says synthetic data generation and processing should be governed by clearer guidelines to support transparency, accountability, and fairness. The researchers position this as a practical requirement for ethical AI development, especially as synthetic datasets are increasingly used to train, test, and share AI systems when real-world data is sensitive or restricted.

The study’s central warning is that synthetic data can create a false sense of safety if organizations treat it as automatically privacy-safe or bias-free. In practice, teams still need to explain how the data was produced, what assumptions shaped the generation process, whether important population characteristics were preserved, and who is accountable if downstream decisions produce harmful or misleading results.

That matters because synthetic data now sits in the middle of product, research, and compliance workflows. If standards remain vague, teams may struggle to defend privacy claims, document provenance, or show regulators and customers that fairness risks were reviewed before a model was deployed.
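The article does not describe a specific documentation format. The sketch below illustrates one way a team might record the questions raised above: how the data was produced, what assumptions shaped it, what was checked, and who is accountable. The class name, field names, and example values are all hypothetical. This is not a published standard or the study's own proposal.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SyntheticDataProvenance:
    """Hypothetical provenance record for one synthetic dataset (illustrative only)."""

    dataset_name: str
    source_description: str  # what real data the generator was fitted on
    generator: str  # method or tool used to produce the synthetic records
    assumptions: list = field(default_factory=list)  # modelling assumptions made
    filters_applied: list = field(default_factory=list)  # post-generation filtering steps
    validation_checks: dict = field(default_factory=dict)  # check name -> passed?
    accountable_owner: str = ""  # person or team answerable for downstream use

    def missing_items(self) -> list:
        """List governance fields that are empty, plus any validation checks that failed."""
        gaps = []
        if not self.assumptions:
            gaps.append("assumptions")
        if not self.validation_checks:
            gaps.append("validation_checks")
        gaps.extend(
            f"failed:{name}" for name, passed in self.validation_checks.items() if not passed
        )
        if not self.accountable_owner:
            gaps.append("accountable_owner")
        return gaps

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the dataset."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative usage with made-up values.
record = SyntheticDataProvenance(
    dataset_name="example_synthetic_v1",
    source_description="restricted customer records (illustrative)",
    generator="hypothetical tabular generator",
    assumptions=["age and region treated as independent"],
    validation_checks={"exact_copy_rate_ok": True, "group_shares_ok": False},
)
print(record.missing_items())  # ['failed:group_shares_ok', 'accountable_owner']
```

Keeping the record as plain, serializable data means it can travel with the dataset. Reviewers can then see at a glance which governance questions remain open.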

Data teams need process documentation alongside model documentation, because governance questions start with how the synthetic dataset was generated, filtered, and validated.
Privacy claims around synthetic data are weaker without auditable generation rules, since organizations may be asked to prove that sensitive patterns were not simply reproduced in a new format.
Fairness reviews should cover the synthetic data pipeline as well as the trained model, because representational distortions can be introduced before model development even begins.
Compliance and risk teams will likely push for clearer provenance, testing, and accountability controls as synthetic data moves from experimentation into production use.
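Two of the checks implied by the privacy and fairness points can be sketched in a few lines. This is an illustration, not the study's method. `exact_copy_rate` measures how many synthetic rows reproduce a real row verbatim. That is only a crude first screen for the privacy concern, because near-duplicates and attribute inference need stronger tests. `group_share_gaps` compares how often each category of an attribute appears in the real versus the synthetic data, as a simple representational check. The function names, the tuple row format, and the 0.05 tolerance are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def exact_copy_rate(real: Sequence[tuple], synthetic: Sequence[tuple]) -> float:
    """Fraction of synthetic rows that appear verbatim in the real data."""
    if not synthetic:
        return 0.0
    real_rows = set(real)
    copies = sum(1 for row in synthetic if row in real_rows)
    return copies / len(synthetic)


def group_shares(values: Sequence[Hashable]) -> dict:
    """Share of records falling in each category of one attribute."""
    counts = Counter(values)
    total = len(values)
    return {group: n / total for group, n in counts.items()}


def group_share_gaps(real_values, synthetic_values, tolerance=0.05) -> dict:
    """Categories whose share shifts by more than `tolerance` (synthetic minus real).

    A category absent from one dataset is treated as having share 0.0 there.
    """
    real_shares = group_shares(real_values)
    synth_shares = group_shares(synthetic_values)
    gaps = {}
    for group in real_shares.keys() | synth_shares.keys():
        diff = synth_shares.get(group, 0.0) - real_shares.get(group, 0.0)
        if abs(diff) > tolerance:
            gaps[group] = round(diff, 3)
    return gaps


# Illustrative rows: (group, age_band). Values are made up.
real = [("group_a", "18-30"), ("group_a", "31-50"), ("group_b", "18-30"), ("group_b", "51+")]
synthetic = [("group_a", "18-30"), ("group_a", "31-50"), ("group_a", "51+"), ("group_b", "31-50")]

print(exact_copy_rate(real, synthetic))  # 0.5
print(group_share_gaps([r[0] for r in real], [s[0] for s in synthetic]))
```

In this toy example, half the synthetic rows copy real rows, and `group_b` is under-represented relative to the source. Passing both checks would not on its own support a privacy or fairness claim. They are screening signals that a team could log as part of its process documentation.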

Daily Brief · Jul 2, 2026 · 3 min