New York’s AI ad labels, and Clarifai’s OkCupid data purge

Two stories, one theme: AI is moving from experimentation into regulated public use, and the data trail behind model training is getting harder to ignore.

Ads in New York must now label AI-generated “synthetic performers”

New York has put a new disclosure rule into effect for advertisements that feature AI-generated people. Under the law, those digital figures must be labeled as “synthetic performers,” so the use of generated humans is visible to viewers, not buried in production notes. The measure gives brands, agencies, and publishers a bright-line requirement at the point where AI imagery reaches the public, instead of leaving disclosure as a voluntary best practice.

The policy lands as marketing teams increasingly use generative tools to produce photorealistic people for social ads, video spots, and campaign assets. For compliance teams, the practical shift is that synthetic talent is now a regulated representation issue, not just a creative production choice. It also gives regulators a clearer standard for when AI-generated likenesses cross from internal experimentation into consumer-facing media.

Marketing teams need a disclosure workflow for AI-generated talent before launch, because labeling cannot be left to a final legal check once assets are already in circulation.
Creative approval now has a compliance layer, which means asset metadata, trafficking instructions, and version control need to capture whether a depicted person is synthetic.
Brands operating across states should expect disclosure rules to diverge, raising the cost of reusing one campaign package everywhere without jurisdiction-specific review.
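
One way to picture the metadata and pre-launch check described in these takeaways is the minimal Python sketch below. Everything in it is an assumption made for illustration: the field names, the jurisdiction codes, and the `DISCLOSURE_REQUIRED` set are hypothetical, and only the New York entry reflects the rule reported here. It does not encode the statute's actual wording or scope.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Assumption: jurisdictions where a synthetic-performer label is required.
# Only New York comes from the story above; the code "US-NY" is illustrative.
DISCLOSURE_REQUIRED = {"US-NY"}


@dataclass
class AdAsset:
    """Hypothetical metadata carried with each creative asset version."""
    asset_id: str
    version: int
    has_synthetic_performer: bool
    disclosure_label: Optional[str] = None
    target_jurisdictions: Set[str] = field(default_factory=set)


def launch_blockers(asset: AdAsset) -> List[str]:
    """Return jurisdictions where this asset should not launch as-is.

    An asset is blocked where a label is required, the asset depicts a
    synthetic performer, and no disclosure label has been recorded.
    """
    if not asset.has_synthetic_performer or asset.disclosure_label:
        return []
    return sorted(asset.target_jurisdictions & DISCLOSURE_REQUIRED)


spot = AdAsset("spot-01", 3, True, None, {"US-NY", "US-CA"})
print(launch_blockers(spot))   # ['US-NY']
```

Keeping the synthetic flag on the asset record, instead of in a separate legal checklist, lets the same check run at trafficking time for every version and every market.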

Clarifai deletes 3 million OkCupid photos after FTC scrutiny

Clarifai deleted 3 million photos that OkCupid had provided for facial recognition training, according to TechCrunch. The deletion followed FTC scrutiny over whether the images had been used without proper authorization, putting data provenance, not just model output, at the center of the case. The headline number matters because it shows how large a training corpus can become before basic rights questions are fully resolved.

For AI teams, the episode is a reminder that downstream model utility does not cure upstream collection problems. If training data was collected or repurposed without a clear legal basis, companies may be forced to unwind part of the pipeline long after ingestion and model development. That creates operational cost, legal exposure, and hard questions about what happens to models, benchmarks, and derived systems built on disputed data.
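
To make the unwinding problem concrete, the sketch below walks a lineage graph from a disputed dataset to everything built on it. The graph, the artifact names, and the `affected_by` helper are all hypothetical; nothing here describes Clarifai's actual systems. It only assumes a team has recorded which datasets fed which models, benchmarks, and services.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage graph: each key feeds the artifacts listed as its value.
LINEAGE: Dict[str, List[str]] = {
    "dataset:photos-v1": ["model:face-embed-a", "benchmark:faces-eval"],
    "model:face-embed-a": ["model:face-embed-b", "service:id-match"],
    "dataset:licensed-v2": ["model:face-embed-b"],
}


def affected_by(root: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk returning every artifact downstream of `root`."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


for artifact in affected_by("dataset:photos-v1", LINEAGE):
    print(artifact)
```

In this made-up graph, `model:face-embed-b` is flagged even though it also draws on a second dataset, which is the kind of mixed-lineage case that makes after-the-fact deletion hard to scope.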

Data governance teams need auditable provenance records for training sets, including source, consent basis, permitted use, and retention terms, because regulators will ask for more than a vendor assurance.
AI vendors should assume investigators will examine not only what a model can do, but whether the underlying data was lawfully obtained and used for the stated purpose.
Deleting millions of records after the fact is operationally expensive and can undermine confidence in downstream models, especially where facial recognition and other sensitive biometric uses are involved.
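
The provenance record named in these takeaways (source, consent basis, permitted use, retention terms) could be sketched roughly as follows. The class, field names, and sample values are assumptions for illustration, not a standard schema or any regulator's required format.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry for one training set."""
    dataset_id: str
    source: str
    consent_basis: Optional[str]     # e.g. "explicit opt-in"; None if undocumented
    permitted_uses: FrozenSet[str]   # purposes the data was obtained for
    retain_until: Optional[date]     # None if no retention term was recorded


def audit(record: ProvenanceRecord, intended_use: str, today: date) -> List[str]:
    """Return a list of findings; an empty list means no gaps were detected."""
    findings = []
    if not record.consent_basis:
        findings.append("no documented consent basis")
    if intended_use not in record.permitted_uses:
        findings.append(f"use '{intended_use}' not in permitted uses")
    if record.retain_until is None:
        findings.append("no retention term recorded")
    elif today > record.retain_until:
        findings.append("retention period expired")
    return findings


rec = ProvenanceRecord(
    dataset_id="photos-v1",
    source="partner-upload",
    consent_basis=None,
    permitted_uses=frozenset({"content-moderation"}),
    retain_until=date(2020, 1, 1),
)
for finding in audit(rec, "face-recognition-training", date(2024, 1, 1)):
    print(finding)
```

Running a check like this at ingestion, before any training job starts, is far cheaper than discovering the same gaps after models have been built on the data.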