The UK FCA is warning financial firms that synthetic data adoption needs to sit inside Model Risk Management (MRM), with clear accountability, auditability, and validation against real outcomes, not just “looks similar” statistics.
FCA urges firms to fold synthetic data into Model Risk Management
The UK Financial Conduct Authority (FCA) published a report on synthetic data in financial services that frames synthetic data generation and use as an MRM issue, not merely a data engineering choice. The report, informed by the FCA’s Synthetic Data Expert Group (SDEG), emphasizes that synthetic data can support privacy-preserving development and testing for ML/AI use cases, but introduces its own model risks that must be actively governed.
Key principles highlighted include accountability across the synthetic data lifecycle, fairness and reliability expectations, and transparency via documentation. The FCA’s message is operational: firms should integrate synthetic-data-specific controls into existing MRM frameworks so that ownership, sign-off, and escalation paths are as clear for synthetic datasets as they are for models deployed into decisioning processes.
- MRM scope expands: If your MRM currently stops at model code and traditional datasets, the FCA is signaling that synthetic data generators, configurations, and release processes belong in the same control plane.
- Governance must be provable: Auditability and “chain of evidence” documentation (how synthetic data was generated from real data, with methods and assumptions) become compliance-ready artifacts, not optional internal hygiene.
- Fairness is not automatic: Privacy gains don’t remove the need to test for bias and failure modes; synthetic data can reproduce or amplify real-world skews if governance and evaluation are weak.
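The skew-amplification risk in the last bullet is easy to make concrete. The toy sketch below is invented for illustration (the approval rates, the group attribute, and the "generator" that amplifies the gap are all hypothetical, not from the FCA report): it compares a simple demographic-parity gap between a real dataset and a synthetic copy, the kind of outcome-level check a fairness evaluation might include.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical real data: approval decisions with a group-level skew.
n = 10_000
group = rng.integers(0, 2, size=n)  # protected attribute (0/1), invented
approve_real = rng.random(n) < np.where(group == 1, 0.70, 0.50)

# Toy "synthetic" copy whose generator amplifies the skew
# instead of faithfully reproducing it.
approve_syn = rng.random(n) < np.where(group == 1, 0.80, 0.40)

def rate_gap(approved, group):
    """Absolute difference in approval rate between the two groups
    (a demographic-parity gap)."""
    return abs(approved[group == 1].mean() - approved[group == 0].mean())

print(f"real gap:      {rate_gap(approve_real, group):.3f}")
print(f"synthetic gap: {rate_gap(approve_syn, group):.3f}")
```

A governance check along these lines would flag the synthetic dataset because its gap materially exceeds the real data's, before any model is trained on it.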
“Simply demonstrating statistical similarity between synthetic and real data is inadequate” for confirming efficacy; the FCA points firms toward validation using real-world data.
The report also draws a bright line on validation: matching marginal distributions or summary statistics is not sufficient evidence that downstream models will behave correctly. The FCA points to a Train-Synthetic-Test-Real (TSTR) approach—train on synthetic data, then validate on a holdout set of real data—to reduce the risk that models learn synthetic artifacts that won’t survive contact with production conditions.
- TSTR becomes a practical baseline: Data teams should plan for real-data holdout access, approvals, and secure evaluation environments even when training is largely synthetic.
- Controls need to cover the generator: Treat the synthetic data generation process as a model with parameters, drift, and versioning—because it can change the risk profile of every model trained on its output.
- Evidence over aesthetics: Expect internal reviewers (and eventually regulators) to ask for outcome-based validation, not “synthetic fidelity” dashboards alone.
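The TSTR comparison described above can be sketched in a few lines. Everything here is a made-up minimal example (the data-generating processes, the deliberate artifact in the synthetic generator, and the one-feature threshold classifier are all invented to keep the sketch self-contained): train on synthetic, train on real, and score both on the same real holdout.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_real(n):
    """Toy 'real' process: label driven by feature 0 plus noise."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)
    return X, y

def make_synthetic(n):
    """Toy generator that mimics the real process but shifts
    feature 0 by 0.5 -- a deliberate synthetic artifact."""
    X = rng.normal(loc=[0.5, 0.0], size=(n, 2))
    y = (X[:, 0] - 0.5 + 0.3 * rng.normal(size=n) > 0).astype(int)
    return X, y

def fit_threshold(X, y):
    """Trivial classifier: threshold on feature 0 at the midpoint
    of the two class means."""
    return (X[y == 1, 0].mean() + X[y == 0, 0].mean()) / 2

def accuracy(thr, X, y):
    return ((X[:, 0] > thr).astype(int) == y).mean()

X_syn, y_syn = make_synthetic(5000)
X_real_train, y_real_train = make_real(5000)
X_hold, y_hold = make_real(2000)  # real holdout, never used for training

tstr = accuracy(fit_threshold(X_syn, y_syn), X_hold, y_hold)   # train synthetic, test real
trtr = accuracy(fit_threshold(X_real_train, y_real_train), X_hold, y_hold)  # real baseline

print(f"TSTR accuracy: {tstr:.3f}  TRTR accuracy: {trtr:.3f}")
```

Because the synthetic generator embeds an artifact (the shifted feature), the model trained on it learns a threshold that does not transfer, and the TSTR score falls below the train-real baseline. That gap, not distributional similarity, is the evidence the report says validation should surface.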
