SeedTable highlights recent funding for three synthetic data vendors targeting VR training data, fraud/risk modeling, and privacy-focused data-sharing APIs. The common thread: synthetic data is being positioned as a practical path to faster model development without exposing sensitive records, at a time when compliance expectations are tightening.
Sky Engine AI, Hazy, and DataGen: funding rounds tied to synthetic data use cases
SeedTable’s roundup of synthetic data startups spotlights three companies and their reported funding totals as they scale different privacy-oriented data generation products. The list includes Sky Engine AI ($11.1M), Hazy ($28.3M), and DataGen Technologies ($72M), framed as examples of synthetic data being used to expand AI development while reducing exposure of sensitive source records.
SeedTable links each company to a specific application area: Sky Engine AI to deep learning for virtual reality (VR) and computer vision; Hazy to statistically controlled synthetic data for fraud detection and risk modeling; and DataGen Technologies to APIs for anonymizing and securely sharing data. While the write-up is high-level, the throughline is that synthetic data is increasingly being marketed as both an acceleration lever (more training data, faster iteration) and a governance lever (lower privacy and compliance risk when sharing or analyzing data).
- Vendor evaluation is shifting from “can you generate synthetic data?” to “can you meet a domain KPI?” VR/computer vision, fraud/risk, and privacy APIs have different failure modes—data teams should ask for task-level validation (model lift, detection performance, error profiles), not just distributional similarity claims.
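To make the distinction concrete, here is a toy sketch (all data and names are hypothetical, not taken from any vendor): two synthetic candidates match the real feature distribution almost identically under a Kolmogorov–Smirnov check, but only the one that preserves the feature–label relationship survives a train-on-synthetic, evaluate-on-real test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# "Real" data: one feature, label is simply sign(x).
x_real = rng.normal(size=n)
y_real = x_real > 0

# Candidate A preserves the joint distribution; candidate B matches the
# feature marginal perfectly but inverts the feature-label relationship.
x_a = rng.normal(size=n); y_a = x_a > 0
x_b = rng.normal(size=n); y_b = x_b < 0

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (max ECDF gap)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def fit_threshold(x, y):
    """Tiny stand-in for a model: midpoint threshold between class means."""
    m1, m0 = x[y].mean(), x[~y].mean()
    return (m1 + m0) / 2, 1.0 if m1 > m0 else -1.0

def task_score(x_train, y_train, x_test, y_test):
    """Train on synthetic data, measure accuracy on the real holdout."""
    thr, sign = fit_threshold(x_train, y_train)
    pred = sign * (x_test - thr) > 0
    return float(np.mean(pred == y_test))

# Both candidates look near-identical to real data feature-wise...
ks_a, ks_b = ks_stat(x_real, x_a), ks_stat(x_real, x_b)
# ...but only A passes task-level validation on the real holdout.
acc_a = task_score(x_a, y_a, x_real, y_real)
acc_b = task_score(x_b, y_b, x_real, y_real)
```

A purely distributional dashboard would score candidates A and B as equally good here (both KS statistics are tiny), which is exactly why the takeaway above asks vendors for task-level metrics on your problem, not similarity claims.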
- Synthetic data is being sold as a compliance enabler, but the burden of proof still lands on the buyer. If you plan to use synthetic datasets for sharing, testing, or model training, you’ll still need internal controls: privacy risk assessment, documentation, and clear rules for when synthetic data is acceptable versus when real data is required.
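One lightweight way to make "clear rules" operational is a policy gate that every synthetic-data use must pass. The sketch below is purely illustrative — the purpose categories and rules are hypothetical examples, not a compliance standard — but it shows the shape of such a control: some purposes always require real data, unlisted purposes escalate to review, and approved purposes still trigger a privacy risk assessment when the source held PII.

```python
from dataclasses import dataclass

# Hypothetical internal policy categories (illustrative only).
APPROVED_USES = {"dev_testing", "analytics_prototyping", "model_training"}
REAL_DATA_REQUIRED = {"regulated_reporting", "audit_evidence"}

@dataclass
class Decision:
    allowed: bool
    needs_risk_assessment: bool
    reason: str

def synthetic_data_gate(purpose: str, source_contains_pii: bool) -> Decision:
    """Decide whether synthetic data may be used for a given purpose."""
    if purpose in REAL_DATA_REQUIRED:
        return Decision(False, False, "real data required for this purpose")
    if purpose not in APPROVED_USES:
        return Decision(False, True, "unlisted purpose: escalate for review")
    # Even approved uses warrant an assessment when the source held PII,
    # since poorly generated synthetic data can leak source records.
    return Decision(True, source_contains_pii, "approved use")
```

Encoding the policy as code (rather than a wiki page) gives an audit trail: each generation or sharing job can log the `Decision` it received, which documents why synthetic data was deemed acceptable in that instance.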
- APIs for “anonymize and share” raise integration and governance questions. Treat synthetic data generation as part of your data pipeline: versioning, lineage, access control, and reproducibility matter—especially when synthetic outputs are used downstream for audits, risk models, or regulated reporting.
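A minimal version of that lineage discipline can be sketched in a few lines: fingerprint each synthetic artifact with a content hash and store a manifest linking it to its source data, generator, and parameters. The function names and manifest fields below are hypothetical, standing in for whatever your pipeline's metadata store expects.

```python
import hashlib
import json
from datetime import datetime, timezone

def fingerprint(records) -> str:
    """Content hash of a dataset; canonical JSON (sorted keys) so logically
    identical data hashes the same regardless of key order."""
    blob = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def synthesis_manifest(synthetic_records, source_hash, generator, params):
    """Lineage record to store alongside a synthetic data artifact."""
    return {
        "artifact_hash": fingerprint(synthetic_records),
        "source_hash": source_hash,   # hash of the real input dataset
        "generator": generator,       # e.g. tool name and version
        "params": params,             # generation config, for reproducibility
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
```

With manifests like this, an auditor (or a downstream risk model's owner) can verify exactly which source snapshot and generator settings produced a given synthetic dataset, and a rerun with the same inputs can be checked byte-for-byte against the recorded `artifact_hash`.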
