News

Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models.
Because AI models cannot effectively train on their own output, known as synthetic data, they need a regular infusion of new training data to keep improving and to maintain the quality of their results.