What Is Synthetic Data?
Artificially created training data
Synthetic data is artificially generated data that mimics real data while preserving its statistical properties and structure.
Advantages
- Privacy — much lower risk of personal data leakage than with real records
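- Scalability — data can be generated in whatever volume is needed
- Class balance — underrepresented classes can be topped up to correct dataset imbalance
- Rare scenarios — edge cases that are hard to capture in real data can be modeled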
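The class-balance point can be illustrated with a minimal sketch. It assumes a simplified, SMOTE-like approach: each synthetic minority sample is a random interpolation between two real minority samples (real SMOTE interpolates between nearest neighbors). The function name `oversample_minority` and the sample points are hypothetical and chosen only for illustration.

```python
import random

def oversample_minority(minority, target_count, seed=0):
    """Return synthetic points so the minority class reaches target_count.

    Each synthetic point is a random interpolation between two real
    minority points (a simplification of SMOTE, which uses nearest neighbors).
    """
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < target_count:
        a, b = rng.sample(minority, 2)   # two distinct real points
        t = rng.random()                 # position along the segment a -> b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical 2-D feature vectors: 6 majority samples, 3 minority samples.
majority = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.0, 0.3), (0.4, 0.2), (0.1, 0.0)]
minority = [(2.0, 2.1), (2.3, 1.9), (1.8, 2.4)]

new_points = oversample_minority(minority, target_count=len(majority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 6 6
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.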
Generation Methods
- Statistical models — based on distributions
- GANs — generative adversarial networks
- VAEs — variational autoencoders
- Simulations — physical modeling
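Of the methods listed, the statistical-model approach is the simplest to sketch. The example below assumes a single numeric column that is roughly normally distributed: it fits a normal distribution to a small real sample and then draws synthetic values from it. The function name `fit_and_sample` and the measurements are hypothetical. GANs, VAEs, and simulations need considerably more machinery and are not shown here.

```python
from statistics import NormalDist, mean, stdev

def fit_and_sample(real_values, n, seed=0):
    """Fit a normal distribution to real_values and draw n synthetic values."""
    model = NormalDist.from_samples(real_values)  # estimates mean and std dev
    return model.samples(n, seed=seed)            # seeded for reproducibility

# Hypothetical "real" measurements (assumed to be roughly normal).
real = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

synthetic = fit_and_sample(real, n=1000)

# The synthetic sample should reproduce the basic statistics of the real one.
print(round(mean(real), 2), round(stdev(real), 2))
print(round(mean(synthetic), 2), round(stdev(synthetic), 2))
```

A single fitted distribution preserves only the mean and spread of one column. Preserving correlations between columns would need a multivariate model.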
Applications
- Training ML models with limited data
- Testing data processing systems
- Application development and debugging
- Augmenting existing datasets
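As an illustration of the testing and debugging use cases, the sketch below generates synthetic order records and feeds them to a small aggregation function. The record schema, the names `make_fake_orders` and `total_by_status`, and the value ranges are all assumptions made for the example. No field is derived from real customer data.

```python
import random

STATUSES = ["paid", "refunded", "cancelled"]

def make_fake_orders(n, seed=0):
    """Generate n synthetic order records; no field comes from a real customer."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    return [
        {
            "order_id": i,
            "customer_id": f"cust-{rng.randint(1, 50):03d}",
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "status": rng.choice(STATUSES),
        }
        for i in range(1, n + 1)
    ]

def total_by_status(orders):
    """Example function under test: sum order amounts per status."""
    totals = {}
    for order in orders:
        totals[order["status"]] = totals.get(order["status"], 0.0) + order["amount"]
    return totals

orders = make_fake_orders(200, seed=42)
totals = total_by_status(orders)
print(totals)
```

Fixing the seed keeps the generated dataset identical between runs, so a failing test can be reproduced exactly.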