GDIT and AWS described a proof-of-concept that uses synthetic disability-claim records—built from public information and seeded with known fraud patterns—to train and test AI fraud detection without touching real claimant files. The subtext: synthetic data is being positioned as a practical workaround for government data access constraints like FOIA exposure, retention rules, and cross-agency sharing limits.
GDIT and AWS outline a synthetic-data POC for disability-claims fraud detection
General Dynamics Information Technology (GDIT) and AWS published a synthetic-data proof-of-concept aimed at training AI models to detect fraud in government disability claims. The approach generates artificial claim records from publicly available information, then injects controlled fraud samples so teams can train and validate detection methods against known patterns.
The core claim is operational: agencies can develop and stress-test fraud analytics without requiring access to sensitive claimant data that may be hard to share or use for model development due to Freedom of Information Act (FOIA) considerations, records retention policies, or inter-agency data-sharing friction. The write-up also frames the work as part of a broader push to accelerate AI adoption in government for national security and efficiency goals, with system integrators and cloud providers packaging synthetic data as a modernization enabler.
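To make the mechanics concrete, here is a minimal sketch of the generate-then-inject pattern the write-up describes. Everything below is an illustrative assumption, not the GDIT/AWS implementation: the claim schema, value ranges, and the planted "long claim from a small provider cluster" fraud pattern are all hypothetical.

```python
import random

random.seed(7)  # reproducibility for repeatable test runs

# Hypothetical schema: field names and the fraud pattern are illustrative,
# not drawn from the GDIT/AWS write-up.
def make_claim(claim_id: int, fraudulent: bool) -> dict:
    claim = {
        "claim_id": claim_id,
        "monthly_benefit": round(random.uniform(800, 2500), 2),
        "months_claimed": random.randint(1, 60),
        "provider_id": random.randint(1, 200),
        "is_fraud": fraudulent,  # ground-truth label set at injection time
    }
    if fraudulent:
        # One planted pattern: an implausibly long claim tied to a
        # small cluster of hypothetical "mill" providers.
        claim["months_claimed"] = random.randint(120, 240)
        claim["provider_id"] = random.choice([901, 902, 903])
    return claim

def make_dataset(n: int, fraud_rate: float = 0.05) -> list[dict]:
    return [make_claim(i, random.random() < fraud_rate) for i in range(n)]

claims = make_dataset(1000)
print(sum(c["is_fraud"] for c in claims), "planted fraud records out of", len(claims))
```

Because the fraud labels come from injection rather than inference, every downstream evaluation has unambiguous ground truth, which is the property the POC trades on.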
- Data access is the bottleneck, not modeling. For many public-sector fraud programs, the limiting factor is lawful, auditable access to training data. Synthetic datasets built from public info plus injected fraud patterns offer a way to start model development when real records are restricted or slow to obtain.
- Better adversarial testing without privacy exposure. Seeding “known bad” patterns into synthetic claims supports repeatable red-team style evaluation (did the model catch the planted fraud?) without risking disclosure of real claimant details.
- Procurement signal for vendors. GDIT’s involvement suggests large integrators are willing to operationalize synthetic data inside federal workflows—an opening for specialized synthetic data tooling that plugs into established primes and cloud ecosystems.
- Compliance still needs definition. Even if synthetic data reduces direct exposure, teams will still need governance around provenance (public inputs), documentation of injected patterns, and validation that synthetic records don’t recreate sensitive attributes in ways that create re-identification or policy risk.
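The "did the model catch the planted fraud?" evaluation in the second bullet can be sketched as a recall check against the injection-time labels. The records and the rule-based stand-in detector below are hypothetical; in practice the detector would be the trained model under test.

```python
# Hypothetical synthetic claims; is_fraud is the ground truth set when
# the fraud pattern was injected, not a model prediction.
claims = [
    {"claim_id": 1, "months_claimed": 12,  "provider_id": 14,  "is_fraud": False},
    {"claim_id": 2, "months_claimed": 180, "provider_id": 901, "is_fraud": True},
    {"claim_id": 3, "months_claimed": 36,  "provider_id": 77,  "is_fraud": False},
    {"claim_id": 4, "months_claimed": 150, "provider_id": 902, "is_fraud": True},
]

def detect(claim: dict) -> bool:
    # Stand-in detector: flags the planted "long claim / mill provider"
    # pattern. A real evaluation would call the trained model here.
    return claim["months_claimed"] > 100 or claim["provider_id"] in {901, 902, 903}

planted = [c for c in claims if c["is_fraud"]]
recall = sum(detect(c) for c in planted) / len(planted)
false_alarms = sum(detect(c) for c in claims if not c["is_fraud"])
print(f"recall on planted fraud: {recall:.2f}, false alarms: {false_alarms}")
# → recall on planted fraud: 1.00, false alarms: 0
```

Because the planted patterns are documented, this check is repeatable across model versions without ever exposing a real claimant record.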
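One governance check implied by the last bullet, verifying that synthetic records do not sit too close to real ones, can be sketched as a near-duplicate scan. The records, field names, distance function, and threshold below are all illustrative assumptions; a production check would use a proper distance over the full schema.

```python
# Hedged sketch: flag any synthetic record that is an exact or near
# duplicate of a (notional) real record, as a crude re-identification screen.
def record_distance(a: dict, b: dict) -> float:
    # Toy L1 distance over two numeric fields; real checks would cover
    # the full schema and handle categorical fields.
    keys = ("monthly_benefit", "months_claimed")
    return sum(abs(a[k] - b[k]) for k in keys)

def near_duplicates(synthetic: list, real: list, threshold: float = 1.0) -> list:
    return [
        (s["claim_id"], r["claim_id"])
        for s in synthetic
        for r in real
        if record_distance(s, r) < threshold
    ]

real = [{"claim_id": "R1", "monthly_benefit": 1200.0, "months_claimed": 24}]
synthetic = [
    {"claim_id": "S1", "monthly_benefit": 1200.5, "months_claimed": 24},  # too close
    {"claim_id": "S2", "monthly_benefit": 900.0, "months_claimed": 10},   # fine
]
print(near_duplicates(synthetic, real))  # → [('S1', 'R1')]
```

A hit here would mean the generator leaked something close to a real record, exactly the re-identification risk the bullet warns the compliance process must catch.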
