CryptoForDay

Your daily dose of crypto news

Uncertainty about Sora’s Training Data Origins: OpenAI’s Mira Murati

OpenAI’s Chief Technology Officer, Mira Murati, said in an interview with The Wall Street Journal that she was not sure where the training data for Sora, the company’s upcoming video-generating AI model, came from. Asked what data was used to train Sora, Murati gave only vague answers. She said OpenAI used publicly available data and licensed data, but she could not confirm whether social media platforms such as YouTube, Instagram, or Facebook were among the sources.

Murati also mentioned OpenAI’s partnership with the stock image company Shutterstock. When the Journal’s Joanna Stern asked whether Shutterstock’s data was used to train Sora, Murati declined to go into detail but confirmed that Shutterstock data was indeed used.

AI models, including Sora, rely on training data sets to learn patterns, make predictions, and understand language. Murati has been a key figure at OpenAI since 2018, overseeing popular projects such as DALL-E 3, Whisper, and GPT-4.

OpenAI has faced several legal actions over the data used to train its AI models. In one lawsuit, authors Sarah Silverman, Richard Kadrey, and Christopher Golden accused OpenAI of using ChatGPT to generate summaries of their copyrighted works. The New York Times filed a copyright infringement complaint against Microsoft and OpenAI, alleging that the newspaper’s content was used without authorization to train AI chatbots. A class-action lawsuit in California claims that OpenAI scraped private user information without consent to train ChatGPT.

Although Murati remained vague about Sora’s specific data sources, the lawsuits show that OpenAI already faces legal challenges over the use of copyrighted content and user data to train its AI models.
