Privacy-preserving AI is moving from best practice to policy expectation. Today's signals span federal guidance and state-level rules that put training data disclosure, model governance, and synthetic data controls squarely on the operating agenda.
CMS Issues AI Guidance on Data Handling and Privacy-Preserving Techniques
The Centers for Medicare & Medicaid Services (CMS) has issued guidance saying AI development should use privacy-preserving techniques such as federated learning and homomorphic encryption when handling sensitive data. The guidance also points to synthetic data as a way to reduce exposure tied to Personally Identifiable Information (PII) and Protected Health Information (PHI) in model development and testing.
For teams working with healthcare or other regulated datasets, the message is straightforward: privacy controls need to be designed into the pipeline, not added after deployment. CMS is effectively reinforcing that synthetic data belongs in the compliance toolkit alongside technical safeguards that limit direct access to raw records; a minimal sketch of one such safeguard, federated averaging, follows the takeaways below.
- Healthcare-adjacent AI teams now have clearer support for using synthetic data to reduce PII and PHI handling risk.
- Privacy-preserving methods such as federated learning and homomorphic encryption are being framed as practical governance controls, not just research concepts.
- Vendors selling AI into regulated environments should expect buyers to ask how synthetic data and privacy-preserving training are implemented.
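To make the federated learning point concrete, here is a minimal sketch, assuming a toy logistic-regression setup in NumPy; the site data, model, and function names are illustrative, not drawn from the CMS guidance. The property that matters is that only model weights leave each site, never the underlying records.

```python
# Minimal federated averaging sketch (hypothetical setup): each site trains
# on its own records, and only model weights leave the site, never raw PHI.
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step of logistic regression on a site's private data."""
    preds = 1 / (1 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

def federated_round(global_weights, site_datasets):
    """Average locally trained weights; raw records never cross sites."""
    updates = [local_update(global_weights, X, y) for X, y in site_datasets]
    return np.mean(updates, axis=0)

# Toy arrays standing in for two hospitals' private datasets.
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(50, 3)), rng.integers(0, 2, 50)) for _ in range(2)]
w = np.zeros(3)
for _ in range(20):
    w = federated_round(w, sites)
print("aggregated model weights:", w)
```

Real deployments layer secure aggregation and differential privacy on top of this loop, but even the bare pattern shows why regulators treat it as a data-handling control rather than a research curiosity.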
California's Assembly Bill 2013 Mandates AI Training Data Transparency
California's Assembly Bill 2013 requires developers of generative AI systems to publicly disclose information about the data used to train their models, with the requirement taking effect on January 1, 2026. The law is aimed at improving transparency and accountability around how generative systems are built.
That matters well beyond California. Public disclosure requirements can force developers to tighten documentation on data sourcing, licensing, provenance, and risk review. For synthetic data teams, the bill raises a practical question: can you explain not only what data was used, but how it was transformed, filtered, or generated before training? One possible shape for that record keeping is sketched after the takeaways below.
- Training data inventories and documentation processes are becoming operational necessities, not optional governance artifacts.
- Generative AI developers may need stronger provenance tracking for both original and synthetic training inputs.
- Transparency mandates could expose weak data sourcing practices before they become legal or reputational problems.
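As a sketch of what provenance tracking could look like in practice, here is a hypothetical training-data inventory record in Python; every field name is an assumption for illustration, not language from AB 2013.

```python
# Hypothetical training-data inventory record; field names are illustrative,
# not taken from the bill's text.
from dataclasses import asdict, dataclass, field
import json

@dataclass
class DatasetRecord:
    name: str
    source: str                 # where the data came from
    license: str                # licensing terms permitting training use
    collected: str              # collection or generation date range
    synthetic: bool             # generated rather than collected?
    transformations: list[str] = field(default_factory=list)

record = DatasetRecord(
    name="claims-notes-v2",
    source="internal EHR export",
    license="internal use only",
    collected="2021-2023",
    synthetic=True,
    transformations=["PHI redaction", "GAN-based synthesis", "dedup filter"],
)
print(json.dumps(asdict(record), indent=2))
```

Keeping one such record per dataset makes the "what was used and how was it transformed" question answerable from an inventory, rather than by reconstructing pipelines after a disclosure demand arrives.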
Texas Enacts the Responsible Artificial Intelligence Governance Act (TRAIGA)
Texas has enacted the Responsible Artificial Intelligence Governance Act, or TRAIGA, with an effective date of January 1, 2026. The act regulates AI development and deployment and prohibits systems intended to incite harm or engage in unlawful discrimination.
While the measure is broader than data handling alone, it adds another state-level compliance layer for teams building or deploying AI systems in the U.S. market. The immediate takeaway is that governance work now has to connect model intent, deployment context, and risk controls, especially where outputs could affect protected groups or public safety. One way to make that connection auditable is sketched after the takeaways below.
- AI governance is becoming state-specific, which complicates deployment for companies operating across multiple jurisdictions.
- Teams will need auditable controls showing that systems are not designed for harmful or discriminatory use cases.
- Model review can no longer stop at accuracy; intended use, misuse risk, and downstream impact are now part of the compliance scope.
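As one illustration, here is a minimal sketch of an auditable deployment gate that refuses to ship a model until intent and risk reviews are signed off; the review categories and function are assumptions for illustration, not requirements drawn from TRAIGA's text.

```python
# Minimal sketch of an auditable deployment gate: deployment is blocked
# unless every required review has a recorded sign-off. The category names
# are hypothetical, not prescribed by the statute.
from datetime import date

REQUIRED_REVIEWS = {"intended_use", "misuse_risk", "discrimination_impact"}

def deployment_gate(model_id: str, reviews: dict[str, str]) -> bool:
    """Return True only if every required review has a sign-off on file."""
    missing = REQUIRED_REVIEWS - {k for k, v in reviews.items() if v}
    if missing:
        print(f"{model_id}: blocked, missing reviews: {sorted(missing)}")
        return False
    print(f"{model_id}: cleared for deployment on {date.today()}")
    return True

deployment_gate("triage-assist-v1", {
    "intended_use": "clinical triage support, signed 2025-11-02",
    "misuse_risk": "signed 2025-11-03",
    "discrimination_impact": "",   # unsigned review blocks deployment
})
```

The point of the pattern is less the code than the artifact it produces: a timestamped record tying each deployment to documented intent and risk review, which is the kind of evidence multi-state governance regimes will ask for.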
