California AB 2013 and Frontier Model Transparency: What Changed
California's AB 2013 forced training-data disclosure for frontier model providers. What is now public, what is not, and what other states are following.
What AB 2013 Requires
California's AB 2013, signed in 2024 and in effect since January 1, 2026, requires developers of generative AI systems or services made publicly available to Californians to post a high-level summary of the data used to train them. The bill was crafted narrowly compared to broader proposals (like the vetoed SB 1047) but has had an outsized effect because California is where most US frontier-model developers operate.
This piece walks through what AB 2013 actually requires, what providers have published, and what remains unclear.
The Specific Disclosures
flowchart TB
AB[AB 2013] --> D1[Sources of data]
AB --> D2[Whether copyrighted material]
AB --> D3[Whether personal information]
AB --> D4[Whether collected via web crawl]
AB --> D5[Whether purchased or licensed]
AB --> D6[Whether synthetic]
For each model, providers must publish on their website:
- High-level data sources
- Whether the data set includes copyrighted, trademarked, or patented material
- Whether the data set includes personal information
- Whether the data was collected via web crawling
- Whether the data was purchased or licensed from third parties
- Whether the data is synthetic
- The dates the data was collected
- The time period the data covers
- Whether the data was modified after collection
The level of detail is "summary," not item-by-item disclosure. Importantly, the law does not require revealing trade secrets — a key concession during drafting.
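One way to keep these items straight internally is a small checklist record per model. The sketch below is illustrative only: the class `TrainingDataSummary`, its field names, and the helper `missing_fields` are hypothetical names chosen to mirror the list above, not statutory language or any provider's actual format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TrainingDataSummary:
    """One model's disclosure checklist, mirroring the list above.

    Field names are illustrative assumptions, not statutory terms.
    None means "not yet answered"; False is a valid answer.
    """
    model_name: str
    data_sources: Optional[list[str]] = None
    includes_ip_protected_material: Optional[bool] = None
    includes_personal_information: Optional[bool] = None
    collected_via_web_crawl: Optional[bool] = None
    purchased_or_licensed: Optional[bool] = None
    includes_synthetic_data: Optional[bool] = None
    collection_dates: Optional[str] = None
    time_period_covered: Optional[str] = None
    modified_after_collection: Optional[bool] = None


def missing_fields(summary: TrainingDataSummary) -> list[str]:
    """Return the names of checklist items that are still None."""
    return [f.name for f in fields(summary) if getattr(summary, f.name) is None]


# Hypothetical draft with only two items answered so far.
draft = TrainingDataSummary(
    model_name="example-model",
    data_sources=["public web crawl"],
    includes_personal_information=True,
)
print(missing_fields(draft))
```

Only `None` is reported as missing, so an explicit `False` (for example, "no synthetic data") counts as an answered item.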
What Got Published
Frontier providers (OpenAI, Anthropic, Google, Meta, and others) have updated their model documentation to include AB 2013 sections. The published summaries typically contain:
- A few paragraphs describing categories of sources (web crawl, licensed datasets, code repositories, synthetic data)
- An acknowledgment that copyrighted material is in the dataset
- A statement that personal data may be present in publicly accessible web data
- Date ranges
- High-level filtering and curation notes
These summaries are generally less detailed than the EU AI Act training-data summaries that the same providers also publish, because EU and CA expectations differ.
What's Still Held Back
- Specific dataset names (most are categorical only: "licensed news data" not "licensed news data from XYZ")
- Proportion or weight of each source
- Internal evaluation datasets used for training-time decisions
- Synthetic data generation pipelines (treated as trade secrets)
The Enforcement Picture
AB 2013 is a civil statute enforced by the California AG. As of April 2026 there have been no public enforcement actions, but the AG has solicited public comments on whether disclosures are adequate. Civil-society groups have filed complaints arguing that several published summaries are too sparse.
The interpretive direction is unsettled: if the AG decides "high-level" requires more granularity than providers currently give, future updates will need more detail.
What Other States Are Following
flowchart TD
CA[California AB 2013<br/>signed 2024, enforced 2026] --> Spread
Spread --> CO[Colorado AI Act 2024]
Spread --> NY[New York AI deepfake law 2025]
Spread --> TX[Texas data-disclosure proposals]
Spread --> WA[Washington follow-on bill]
Several states have passed or are considering similar transparency provisions. The patchwork is real; most providers have chosen to publish a single global disclosure that satisfies the strictest applicable jurisdiction.
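The "single global disclosure" approach can be read as taking the union of what each jurisdiction asks for. The sketch below illustrates that reading under stated assumptions: the jurisdiction names and item sets in `EXAMPLE_REQUIREMENTS` are placeholders rather than real legal checklists, and the function names are hypothetical.

```python
def global_disclosure_fields(requirements: dict[str, set[str]]) -> set[str]:
    """Union of every item that any listed jurisdiction asks for."""
    return set().union(*requirements.values())


def items_beyond(jurisdiction: str, requirements: dict[str, set[str]]) -> set[str]:
    """Items the global disclosure carries that this jurisdiction alone would not need."""
    return global_disclosure_fields(requirements) - requirements[jurisdiction]


# Placeholder requirement sets. These are NOT real legal checklists.
EXAMPLE_REQUIREMENTS = {
    "jurisdiction_a": {"data_sources", "personal_information", "synthetic_data"},
    "jurisdiction_b": {"data_sources", "copyright_status", "collection_period"},
}

if __name__ == "__main__":
    print(sorted(global_disclosure_fields(EXAMPLE_REQUIREMENTS)))
    print(sorted(items_beyond("jurisdiction_a", EXAMPLE_REQUIREMENTS)))
```

A set union only captures which items are covered, not how much detail each one needs, so a real comparison would still require legal review of each item.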
Federal Picture
There is no federal training-data disclosure law in 2026. Several proposals exist (including provisions in pending AI bills) but none has passed. The practical pressure from the EU AI Act and California AB 2013 has produced de facto nationwide transparency without explicit federal action.
What This Means for Open-Source Providers
Providers of open-source frontier models (Llama 4, Mistral, DeepSeek for the parts shipped from CA) are in scope. Their AB 2013 disclosures tend to be substantially more detailed than those of closed-model providers, partly because their training process is more visible anyway.
The interesting outcome: open-source providers are setting a higher transparency bar that may pressure closed-model providers in future regulatory cycles.
Practical Steps
If you train a model in California in 2026:
- Identify whether you meet the statute's definition of "developer" (designing, coding, producing, or substantially modifying a generative AI system for use by the public)
- Build the data-source summary using the AB 2013 categories
- Publish on your website with each release
- Maintain an internal data-source register for legal review
- Coordinate with EU AI Act compliance; the two regimes overlap but are not identical
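The register and summary steps above can be sketched as a small aggregation: detailed entries stay internal, and only category-level facts flow into the public summary. This is a minimal sketch under stated assumptions; `RegisterEntry`, `build_public_summary`, the field names, and the sample entries are all hypothetical and do not reflect any provider's actual process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterEntry:
    """One row of a hypothetical internal data-source register."""
    internal_name: str          # stays internal; never copied into the summary
    category: str               # e.g. "web crawl", "licensed news data"
    has_personal_info: bool
    has_copyrighted_material: bool
    is_synthetic: bool
    collected_from: str         # ISO date, YYYY-MM-DD
    collected_to: str           # ISO date, YYYY-MM-DD


def build_public_summary(register: list[RegisterEntry]) -> dict:
    """Aggregate register rows into category-level facts only."""
    if not register:
        raise ValueError("register is empty; nothing to summarize")
    return {
        "source_categories": sorted({e.category for e in register}),
        "includes_personal_information": any(e.has_personal_info for e in register),
        "includes_copyrighted_material": any(e.has_copyrighted_material for e in register),
        "includes_synthetic_data": any(e.is_synthetic for e in register),
        # ISO dates sort correctly as plain strings.
        "collection_period": (
            min(e.collected_from for e in register),
            max(e.collected_to for e in register),
        ),
    }


# Made-up sample entries for illustration.
EXAMPLE_REGISTER = [
    RegisterEntry("vendor-x-news-2023", "licensed news data", False, True, False,
                  "2021-01-01", "2023-06-30"),
    RegisterEntry("crawl-snapshot-7", "web crawl", True, True, False,
                  "2019-05-01", "2024-12-31"),
]

if __name__ == "__main__":
    print(build_public_summary(EXAMPLE_REGISTER))
```

Keeping `internal_name` out of the output reflects the categorical style of disclosure described earlier; whether that level of detail is sufficient is a legal question, not something the code decides.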
Sources
- California AB 2013 full text — https://leginfo.legislature.ca.gov
- California AG enforcement guidance — https://oag.ca.gov
- OpenAI training-data disclosures — https://openai.com
- Anthropic training-data disclosures — https://www.anthropic.com
- "State AI laws tracker" Future of Privacy Forum — https://fpf.org