Evaluating Synthetic Data — The Million Dollar Question
towardsdatascience.com·8h

In synthetic data generation, we typically create a model for our real (or ‘observed’) data, and then use this model to generate synthetic data. The observed data is usually compiled from real-world experience, such as measurements of the physical characteristics of irises, or details about individuals who have defaulted on credit or acquired some medical condition. We can think of the observed data as having come from some ‘parent distribution’: the true underlying distribution from which the observed data is a random sample. Of course, we never know this parent distribution. It must be estimated, and this is the purpose of our model.
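To make the fit-then-sample workflow concrete, here is a minimal sketch in Python. It assumes the parent distribution can be approximated by a multivariate normal, which is a strong simplification chosen only for illustration. The helper names `fit_gaussian` and `sample_synthetic` are hypothetical, and the "observed" data is itself simulated as a stand-in for real measurements.

```python
import numpy as np

def fit_gaussian(observed):
    """Estimate the parent distribution as a multivariate normal (an assumption)."""
    mean = observed.mean(axis=0)
    cov = np.cov(observed, rowvar=False)  # rows are observations, columns are features
    return mean, cov

def sample_synthetic(mean, cov, n, seed=0):
    """Draw n synthetic rows from the fitted model."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)

# Stand-in for observed data. In practice the parent distribution is unknown;
# these parameters are made up purely so the example can run.
rng = np.random.default_rng(42)
parent_mean = np.array([5.0, 3.0])
parent_cov = np.array([[0.7, 0.1], [0.1, 0.2]])
observed = rng.multivariate_normal(parent_mean, parent_cov, size=500)

# Step 1: model the observed data. Step 2: generate synthetic data from the model.
mean, cov = fit_gaussian(observed)
synthetic = sample_synthetic(mean, cov, n=500)
print(synthetic.shape)
```

A real generator would usually be far more flexible than a single Gaussian, but the two steps stay the same: estimate the parent distribution from the observed sample, then draw new samples from that estimate.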

But if our model can produce synthetic data that can be considered to be a random sample from the same parent distribution, then we’ve hit the jackpo…
