Uncertainty about Sora’s Training Data Origins: OpenAI’s Mira Murati
OpenAI’s Chief Technology Officer, Mira Murati, was unable to say in an interview with The Wall Street Journal where the training data for Sora, the company’s upcoming video-generating AI model, came from. When asked about the data used to train Sora, Murati gave only vague responses. She said that OpenAI used publicly available data and licensed data, but could not confirm whether social media platforms such as YouTube, Instagram, or Facebook were among the sources.
Murati also mentioned OpenAI’s partnership with stock image company Shutterstock. The Journal’s Joanna Stern asked whether Shutterstock’s data was used to train Sora; Murati declined to give details but did confirm that Shutterstock data was used.
AI models, including Sora, rely on training data sets to learn patterns, make predictions, and understand language. Murati has been a key figure at OpenAI since 2018, overseeing popular projects such as DALL-E 3, Whisper, and GPT-4.
OpenAI has faced legal actions related to their AI models’ training data. In one lawsuit, authors Sarah Silverman, Richard Kadrey, and Christopher Golden accused OpenAI of generating summaries of their copyrighted works using ChatGPT. The New York Times also filed a copyright infringement complaint against Microsoft and OpenAI, alleging the unauthorized use of the newspaper’s content to train AI chatbots. A class-action lawsuit in California claims that OpenAI scraped private user information without consent to train ChatGPT.
While OpenAI’s CTO remained vague about the specific data sources for Sora, it is evident that the company has encountered legal challenges regarding the use of copyrighted content and user data in training their AI models.
It’s disappointing to see OpenAI’s CTO being evasive about the data sources used for Sora. Transparency should be a top priority! 🙅‍♂️👎
The fact that Murati chose not to disclose detailed information about Shutterstock’s involvement in training Sora is shady. OpenAI, please be more open with your users!
Overall, I remain intrigued by the story behind Sora, and I hope OpenAI continues to address these legal concerns transparently. Innovation and responsible AI development go hand in hand!
This is seriously concerning! OpenAI should be more transparent about their data sources for Sora.
AI models rely heavily on training data sets, so it’s unnerving that OpenAI’s CTO couldn’t provide clear answers about Sora’s data sources. What are they hiding? 🤷‍♀️🔎
However, it’s concerning to read about the legal challenges OpenAI has faced regarding training data. 📚 Copyright infringement allegations and privacy concerns certainly raise important questions. 🚫 It’s crucial for companies like OpenAI to ensure they follow ethical practices when utilizing copyrighted content and user data. ⚖️
Mira Murati is undeniably a brilliant mind behind various OpenAI projects, such as DALL-E 3, Whisper, and GPT-4. Her expertise and leadership have surely played a significant role in OpenAI’s success.
OpenAI’s legal challenges regarding copyrighted content and user data raise serious questions about their ethical practices. Transparency is crucial! 👀✋
The New York Times filing a copyright infringement complaint against OpenAI and Microsoft? OpenAI should have respected intellectual property rights!
How can we trust OpenAI if they don’t disclose the specific data sources for Sora? This raises serious doubts about their credibility.