Oracle has added a native synthetic data generation engine to Database 24ai, letting teams create realistic synthetic datasets directly in the database using SQL. The practical shift is faster creation of test and training data without exporting sensitive production data to downstream AI and analytics workflows.
Oracle Database 24ai adds a built-in synthetic data engine
Oracle integrated a synthetic data generation engine into Database 24ai. The feature is designed to generate realistic synthetic datasets using SQL, positioning synthetic data as an in-database capability rather than an external tool or pipeline step.
Oracle says the engine uses AI to infer schema relationships and data distributions, with the aim of producing synthetic datasets that preserve the structure and statistical characteristics of the original data while reducing exposure of sensitive production records during development, testing, and AI training.
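Oracle has not published how its engine infers relationships and distributions, so the following is only a rough illustration of the general idea: profile the source tables, then sample new rows from those profiles while keeping foreign keys valid. It is a minimal standard-library Python sketch, not Oracle's API or algorithm. The `customers`/`orders` tables, their columns, the `synthesize` helper, and the minimum-age clip are all hypothetical, and the modeling choice (independent per-column Gaussian and frequency profiles) is a deliberately simple stand-in.

```python
import random
import statistics
from collections import Counter

# Hypothetical source tables standing in for production data:
# "customers" is the parent table; "orders" references it via customer_id.
customers = [
    {"customer_id": 1, "region": "EMEA", "age": 34},
    {"customer_id": 2, "region": "APAC", "age": 41},
    {"customer_id": 3, "region": "EMEA", "age": 29},
    {"customer_id": 4, "region": "AMER", "age": 52},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 1, "amount": 80.5},
    {"order_id": 12, "customer_id": 3, "amount": 230.0},
    {"order_id": 13, "customer_id": 4, "amount": 55.0},
]


def profile_numeric(rows, col):
    """Summarize a numeric column by its mean and sample standard deviation."""
    values = [r[col] for r in rows]
    return statistics.mean(values), statistics.stdev(values)


def profile_categorical(rows, col):
    """Summarize a categorical column by its observed value frequencies."""
    counts = Counter(r[col] for r in rows)
    return list(counts), list(counts.values())


def synthesize(n_customers, n_orders, seed=0):
    """Hypothetical helper: generate parent/child rows from per-column profiles."""
    rng = random.Random(seed)
    age_mu, age_sd = profile_numeric(customers, "age")
    regions, region_weights = profile_categorical(customers, "region")
    amt_mu, amt_sd = profile_numeric(orders, "amount")

    # Parent rows get fresh surrogate keys; attribute values are sampled from
    # the fitted profiles rather than copied from real rows. Ages are clipped
    # at 18 (an arbitrary assumption for this toy example).
    syn_customers = [
        {
            "customer_id": i,
            "region": rng.choices(regions, weights=region_weights)[0],
            "age": max(18, round(rng.gauss(age_mu, age_sd))),
        }
        for i in range(1, n_customers + 1)
    ]
    parent_keys = [c["customer_id"] for c in syn_customers]

    # Child rows draw every foreign key from the synthetic parent keys,
    # so referential integrity holds by construction.
    syn_orders = [
        {
            "order_id": 1000 + j,
            "customer_id": rng.choice(parent_keys),
            "amount": round(max(0.0, rng.gauss(amt_mu, amt_sd)), 2),
        }
        for j in range(n_orders)
    ]
    return syn_customers, syn_orders


syn_customers, syn_orders = synthesize(n_customers=100, n_orders=300)
print(syn_customers[0])
print(syn_orders[0])
```

Because each column is modeled independently, cross-column correlations (for example, age versus region) are lost; capturing those is the harder part a production-grade generator would need to address.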
- Less data movement, fewer leak paths: Generating synthetic data inside the database can reduce the need to copy production data into dev/test environments. Such copies are one of the most common sources of accidental exposure.
- Faster iteration for AI and analytics teams: A SQL-level workflow lowers friction for spinning up representative datasets, which can shorten model prototyping and QA cycles when access to raw data is restricted.
- Privacy and compliance controls shift “left”: Embedding synthetic generation into the core data platform gives privacy engineers and governance teams a more enforceable control point than ad hoc scripts or one-off exports.
- New operational questions for data leads: Teams will still need validation practices that weigh data utility against privacy risk, lineage tracking for synthetic datasets, and clear policies on when synthetic data is acceptable for training, testing, or sharing.
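The validation point in the last item can be made concrete with two baseline checks that apply to any synthetic dataset, regardless of how it was generated. The sketch below is plain Python with hypothetical data and function names, not an Oracle feature: it measures one utility signal (relative gap between real and synthetic column means) and one privacy signal (synthetic rows that exactly reproduce a real row). The 5% mean-gap threshold is an arbitrary assumption.

```python
import statistics

# Hypothetical real and synthetic rows with the same numeric columns.
real = [
    {"age": 34, "income": 52000},
    {"age": 41, "income": 61000},
    {"age": 29, "income": 48000},
    {"age": 52, "income": 75000},
]
synthetic = [
    {"age": 35, "income": 53500},
    {"age": 41, "income": 61000},  # identical to a real row: a privacy red flag
    {"age": 30, "income": 47000},
    {"age": 50, "income": 72000},
]


def mean_gap(real_rows, syn_rows, col):
    """Utility check: relative difference between real and synthetic column means."""
    real_mean = statistics.mean(r[col] for r in real_rows)
    syn_mean = statistics.mean(s[col] for s in syn_rows)
    return abs(real_mean - syn_mean) / abs(real_mean)


def exact_copies(real_rows, syn_rows, cols):
    """Privacy check: synthetic rows that match a real row on every listed column."""
    real_keys = {tuple(r[c] for c in cols) for r in real_rows}
    return [s for s in syn_rows if tuple(s[c] for c in cols) in real_keys]


def validate(real_rows, syn_rows, cols, max_mean_gap=0.05):
    """Combine both checks into a simple pass/fail report (threshold is assumed)."""
    gaps = {c: mean_gap(real_rows, syn_rows, c) for c in cols}
    copies = len(exact_copies(real_rows, syn_rows, cols))
    return {
        "mean_gaps": gaps,
        "exact_copies": copies,
        "utility_ok": all(g <= max_mean_gap for g in gaps.values()),
        "privacy_ok": copies == 0,
    }


report = validate(real, synthetic, ["age", "income"])
print(report["utility_ok"], report["privacy_ok"], report["exact_copies"])  # True False 1
```

Here the synthetic means track the real ones closely, yet one row leaks a real record verbatim, which is the utility-versus-privacy tension the checklist item describes. A fuller practice would add distribution-level comparisons and distance-to-closest-record checks on scaled features.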
