
MIT Removes AI Training Dataset Over Racist, Offensive Content


MIT has removed a massive AI training dataset after learning it contained racist and misogynistic terms, as well as offensive images.

Artificial intelligence (AI) and machine learning (ML) systems rely on large datasets as training data. MIT created the Tiny Images dataset, which contained some 80 million images.

In an open letter, Bill Freeman and Antonio Torralba, both professors at MIT, along with NYU professor Rob Fergus, outlined the issues they became aware of and the steps they took to resolve them.

“It has been brought to our attention that the Tiny Images dataset contains some derogatory terms as categories and offensive images,” write the professors. “This was a consequence of the automated data collection procedure that relied on nouns from WordNet. We are greatly concerned by this and apologize to those who may have been affected.

“The dataset is too large (80 million images) and the images are so small (32 x 32 pixels) that it can be difficult for people to visually recognize its content. Therefore, manual inspection, even if feasible, will not guarantee that offensive images can be completely removed.

“We therefore have decided to formally withdraw the dataset. It has been taken offline and it will not be put back online. We ask the community to refrain from using it in future and also delete any existing copies of the dataset that may have been downloaded.”
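The failure mode the professors describe is straightforward: when category labels are harvested automatically from a lexical database such as WordNet, every noun in that database, slurs and derogatory terms included, becomes a candidate label, and a scraper keyed on those labels collects matching images. The short Python sketch below illustrates the general pattern using NLTK's copy of WordNet; it is a hypothetical illustration of that kind of automated step, not MIT's actual collection code. The letter's point about scale is equally concrete: at 80 million images, even a one-second glance at each would take roughly two and a half years of nonstop review.

    # Illustrative sketch, not MIT's actual pipeline: building an
    # image-category vocabulary automatically from WordNet nouns.
    import nltk

    nltk.download("wordnet", quiet=True)  # fetch the WordNet corpus if absent
    from nltk.corpus import wordnet as wn

    # Take the lemma names of every noun synset as candidate labels.
    # Nothing here is reviewed by a human, so any derogatory noun in
    # WordNet flows straight through into the label set.
    labels = set()
    for synset in wn.all_synsets(pos=wn.NOUN):
        labels.update(lemma.name() for lemma in synset.lemmas())

    print(f"Collected {len(labels):,} candidate labels with no manual review")

An automated crawler would then query image search engines for each of those labels and keep the thumbnails it finds, which is how offensive category names end up paired with matching images.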

Problematic training data has been an ongoing issue in AI and ML, with experts warning that it is far too easy for these systems to inadvertently develop biases based on the data they learn from. With this announcement, MIT is doing its part to address the problem.