US States Tighten AI Governance: California’s Training-Data Disclosures and Texas’s TRAIGA Framework
Weekly Digest · 6 min read


Tags: weekly-feature, ai-governance, regulation, model-transparency, data-lineage, synthetic-data

Two state-level efforts—California’s AB 2013 and Texas’s TRAIGA—signal where AI governance is heading: mandated transparency, defined prohibited uses, and new compliance obligations that will land on engineering and data teams.

This Week in One Paragraph

California and Texas have each advanced distinct governance approaches for AI systems that will matter to teams building or deploying generative and other AI capabilities in the US. California’s Assembly Bill 2013 requires developers of generative AI systems to publicly disclose information about the data used to train their models, with an effective date of January 1, 2026. Texas’s Responsible Artificial Intelligence Governance Act (TRAIGA), also effective January 1, 2026, sets a framework for AI development and deployment, prohibits harmful uses, and creates a Texas Artificial Intelligence Council. Together, these moves underscore a compliance trajectory: more documentation, more accountability, and more pressure to operationalize data lineage and governance well before 2026.

Top Takeaways

  1. Public-facing training-data disclosure requirements are no longer theoretical in the US: California AB 2013 sets a concrete compliance deadline (January 1, 2026) for generative AI developers.
  2. Texas’s TRAIGA pairs governance with enforcement posture by defining prohibited/harmful uses and establishing a state AI council—raising the likelihood of ongoing guidance and scrutiny.
  3. Data teams should treat “what went into the model” as a product requirement: dataset inventories, provenance, licensing/permissions, and retention policies must be auditable and explainable.
  4. Multi-state operations will face policy fragmentation: different disclosure formats, definitions of “harm,” and oversight mechanisms may force a “highest common denominator” governance program.
  5. Synthetic data won’t automatically sidestep obligations: if synthetic datasets are derived from regulated or sensitive sources, teams still need defensible documentation and risk assessments.

California AB 2013: Training-data transparency becomes a deliverable

California’s Assembly Bill 2013 targets generative AI systems and mandates that developers publicly disclose information about the data used to train their models. The key operational detail is the effective date: January 1, 2026. That timeline matters because training-data disclosure is not a single document—it’s the output of a chain of governance work: data sourcing decisions, vendor contracts, dataset versioning, filtering steps, and documentation of what was included or excluded.

For teams shipping foundation models, fine-tuned models, or internal generative systems, AB 2013 should be read as a forcing function for end-to-end lineage. If your organization cannot reliably answer “which datasets trained this model version?” and “under what rights and constraints?”, the gap is not only legal—it’s an engineering and tooling gap. Even when the disclosure requirement is “information about the data” rather than raw datasets, generating a defensible public statement typically requires internal evidence that can survive audit and discovery.

  • Watch for emerging norms on what qualifies as sufficient “information about the data” (granularity, taxonomy, and whether summaries vs. dataset-level listings are expected).
  • Expect procurement and legal teams to push for stronger dataset vendor attestations, particularly around licensing and downstream disclosure readiness.
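The lineage questions above (“which datasets trained this model version, and under what rights?”) can be made concrete as a small registry structure. This is a minimal sketch, not a reference to any actual AB 2013 disclosure format; all class and field names here are hypothetical illustrations of the kind of internal evidence a public statement would draw on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One versioned dataset with the provenance fields a disclosure needs.

    All field names are illustrative, not a statutory schema.
    """
    dataset_id: str        # e.g. a versioned identifier like "webcorpus@v3"
    source: str            # where the data came from (vendor, crawl, internal)
    license: str           # the rights/constraints it was acquired under
    filters_applied: tuple # documented inclusion/exclusion steps


@dataclass(frozen=True)
class ModelTrainingRecord:
    """Links one model version to the exact dataset versions that trained it."""
    model_version: str
    datasets: tuple  # tuple of DatasetRecord

    def disclosure_summary(self) -> list:
        """Emit the dataset-level facts a public disclosure could draw on."""
        return [
            {"dataset": d.dataset_id, "source": d.source, "license": d.license}
            for d in self.datasets
        ]
```

The point of the frozen dataclasses is auditability: once a training run is recorded, the dataset linkage should be immutable, so the answer to “what trained model X?” does not drift after the fact.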

Texas TRAIGA: A state framework with prohibited uses and a new council

Texas’s Responsible Artificial Intelligence Governance Act (TRAIGA) is positioned as a broader framework for AI development and deployment. It prohibits harmful uses and establishes the Texas Artificial Intelligence Council, with an effective date of January 1, 2026. The combination of “prohibited uses” plus a standing body is significant: it suggests governance won’t be a one-off statute, but an evolving interpretation environment where guidance, priorities, and enforcement attention can shift.

For compliance and product teams, the practical question is how “harmful uses” are defined and operationalized in controls: what risk assessments are required, what testing/monitoring is expected, and what documentation must be maintained to demonstrate good-faith compliance. For ML engineers, the work often lands in measurable safeguards—access controls, abuse monitoring, evaluation protocols, and incident response pathways. For data leads, it reinforces the need to map where sensitive attributes and high-impact use cases intersect with model behavior.

  • Track how the Texas Artificial Intelligence Council frames “harmful uses” and whether it publishes model governance guidance that becomes de facto standard for vendors selling into Texas.
  • Look for sector-specific pressure (public sector procurement, education, healthcare) where “framework + oversight body” tends to translate into checklists and contract clauses.

What this means for synthetic data programs

Neither measure is a “synthetic data law,” but both influence synthetic data strategy because synthetic datasets are often used to reduce privacy risk, accelerate development, or enable sharing. If California requires public disclosure about training data, teams may lean harder on synthetic data to limit exposure to sensitive or proprietary sources. That can be sensible—but it increases the burden to document how synthetic data was generated, what it represents, and what source data (if any) it was derived from.

In practice, synthetic data governance needs to look like traditional data governance: provenance, purpose limitation, quality metrics, and clear separation between (a) fully synthetic data generated without direct dependence on regulated records and (b) synthetic data generated from sensitive inputs where re-identification or memorization concerns may still be relevant. If Texas’s TRAIGA emphasizes harmful uses, synthetic data won’t be a shield if downstream deployment creates harm; you still need model evaluation, monitoring, and controls.

  • Expect buyers to ask for “synthetic data spec sheets” that connect generation method, privacy properties, and intended use—especially for model training.
  • Watch for internal governance convergence: synthetic data review boards and model risk committees increasingly share the same artifacts (lineage, approvals, and monitoring plans).
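The “synthetic data spec sheet” idea above can be sketched as a small record type. This is an illustrative shape under assumptions, not a standard format; field names and example values are hypothetical. The key distinction it encodes is the one drawn earlier: fully synthetic data versus synthetic data derived from sensitive inputs, which still carries source-level obligations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SyntheticDataSpec:
    """A 'spec sheet' for one synthetic dataset; all fields are illustrative."""
    name: str
    generation_method: str           # e.g. "tabular GAN", "LLM prompt sampling"
    source_dataset: Optional[str]    # None if fully synthetic; else the source id
    privacy_properties: List[str]    # claims that must be backed by evidence
    intended_uses: List[str]         # e.g. ["model training", "demo/sharing"]

    def needs_source_review(self) -> bool:
        """Derived-from-sensitive synthetic data keeps its source obligations:
        re-identification and memorization risk must still be assessed."""
        return self.source_dataset is not None
```

A review board could use `needs_source_review()` as the gate that routes a dataset into either a lightweight or a full provenance review.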

How to prepare before 2026: the minimum viable compliance stack

With both measures pointing to January 1, 2026, the immediate takeaway is scheduling: governance work that touches data inventories and model lineage rarely compresses well at the end. A practical approach is to build a “minimum viable compliance stack” that can scale across jurisdictions: a canonical dataset registry, model registry with training runs linked to dataset versions, and a disclosure-ready narrative that can be adapted to different legal requirements.
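The jurisdiction-adaptation step can be sketched as a simple policy map: for each regime a company is exposed to, list the artifacts it expects, then take the union to get the “highest common denominator” program. The artifact names and regime keys below are hypothetical placeholders, not a summary of either statute’s actual requirements.

```python
# Hypothetical map from regime to expected compliance artifacts.
# Real obligations depend on final rules and guidance; these are illustrative.
POLICY_MAP = {
    "CA-AB2013": ["training-data summary", "dataset-level source list"],
    "TX-TRAIGA": ["prohibited-use assessment", "monitoring plan"],
}


def required_artifacts(jurisdictions):
    """Union of artifacts across regimes: one program that satisfies all of them."""
    needed = set()
    for j in jurisdictions:
        needed.update(POLICY_MAP.get(j, []))
    return sorted(needed)
```

The union is the operative design choice: rather than maintaining per-state compliance tracks, a multi-state operator builds once to the superset and reuses the same lineage and disclosure artifacts everywhere.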

For founders and platform owners, this is also a product question. If you sell AI capabilities, customers will increasingly demand evidence that you can (1) describe training inputs at a meaningful level and (2) demonstrate controls against prohibited or harmful uses. Treat disclosure and governance artifacts as part of your enterprise readiness—alongside security questionnaires and SOC 2. For privacy and compliance professionals, the opportunity is to turn “documentation debt” into a managed pipeline with owners, SLAs, and review cadence.

  • Look for “disclosure automation” features to show up in MLOps platforms: dataset lineage exports, training-data summaries, and policy mapping by jurisdiction.
  • Expect contract language to tighten around training data representations and warranties—especially for companies fine-tuning or hosting models for clients.