Definition

Differential privacy is a mathematical privacy framework that limits the information any single individual's data contributes to a published dataset or model output, providing a formal, quantifiable privacy guarantee.

Key Takeaways

  • Formalized by Dwork et al. (2006); the standard framework for formal privacy guarantees.
  • A mechanism is ε-differentially private if the probability of any output changes by at most a multiplicative factor of e^ε when any single record is added or removed.
  • DP-SGD extends differential privacy to neural network training.
  • Smaller ε = stronger privacy; trade-off exists between privacy and model utility.
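The classic way to satisfy the ε-guarantee above is the Laplace mechanism: for a counting query, adding or removing one record changes the true answer by at most 1, so Laplace noise with scale 1/ε suffices. The sketch below illustrates this on a hypothetical age dataset; the function name and data are illustrative, not from any particular library.

```python
import numpy as np

def dp_count(data, predicate, epsilon):
    """Counting query released via the Laplace mechanism.

    A count has sensitivity 1 (one record changes the answer by at
    most 1), so Laplace noise with scale 1/epsilon yields
    epsilon-differential privacy.
    """
    true_count = sum(1 for record in data if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: noisy count of records with age over 40 at epsilon = 0.5.
ages = [23, 45, 31, 67, 52, 29, 41]
noisy = dp_count(ages, lambda age: age > 40, epsilon=0.5)
```

Note how smaller ε means a larger noise scale 1/ε, making the released count less accurate; this is the privacy-utility trade-off from the takeaways made concrete.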

Differential Privacy: Definition and Explanation

Differential privacy provides a mathematical guarantee that individual records cannot be identified from published datasets or model outputs. Learn the definition, epsilon parameter, DP-SGD, and applications to AI.

DP-SGD and Neural Network Training

DP-SGD (Differentially Private Stochastic Gradient Descent), introduced by Abadi et al. (2016), applies differential privacy to neural network training. It clips each per-sample gradient to bound the sensitivity of the update, then adds calibrated Gaussian noise before each parameter step. Because the noise is Gaussian, the resulting guarantee is (ε, δ)-differential privacy over the training set, with the cumulative privacy cost tracked across training steps (Abadi et al. introduced the moments accountant for this purpose). DP-SGD makes it possible to train models on sensitive data with a formal privacy guarantee.
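The two core operations, per-sample clipping and Gaussian noise, can be sketched in NumPy for a single logistic-regression update. This is a minimal illustration, not a production implementation: parameter names like `clip_norm` and `noise_multiplier` are my own, and real libraries (e.g. Opacus, TensorFlow Privacy) additionally track the cumulative (ε, δ) budget.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD update for logistic regression (NumPy sketch)."""
    # Per-example gradients of the logistic loss, shape (n, d).
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (preds - y)[:, None] * X

    # 1. Clip each example's gradient to L2 norm <= clip_norm,
    #    bounding any single record's influence (the sensitivity).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # 2. Sum the clipped gradients and add Gaussian noise whose
    #    standard deviation is calibrated to the clipping bound.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm,
                             size=w.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / len(X)

    # 3. Ordinary gradient step on the privatized gradient.
    return w - lr * noisy_grad
```

The clipping bound is what makes the Gaussian noise meaningful: without it, a single outlier example could move the gradient arbitrarily far, and no fixed noise scale could mask its presence.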

Differential Privacy and Synthetic Data

Synthetic data generation can be combined with differential privacy to produce datasets with formal privacy guarantees. PATE-GAN (Jordon et al., 2019) is a foundational approach combining GAN-based generation with DP. However, DP-constrained synthetic data generation typically introduces a utility cost — higher-fidelity synthetic data generally requires a larger ε budget, i.e. a weaker privacy guarantee.