OpenAI Launches Sora: A Groundbreaking Text-to-Video AI Model
Just as Google announced its next-gen Gemini 1.5 Pro model, OpenAI rained on Google’s parade with the surprise announcement of Sora, a breakthrough text-to-video AI model. Sora is different from anything we have seen so far in the AI industry. From the examples we’ve seen, video generation models like Runway’s Gen-2 and Pika pale in comparison. Here is everything you need to know about OpenAI’s new Sora model.
Sora Can Generate Videos Up to 1 Minute
OpenAI’s text-to-video AI model, Sora, can generate highly detailed videos (up to 1080p) from textual prompts. It follows user prompts extremely well and simulates the physical world in motion. The most impressive part is that Sora can generate AI videos up to one minute long, far longer than existing text-to-video models, which produce clips of only three or four seconds.
Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” pic.twitter.com/0JzpwPUGPB (OpenAI, @OpenAI, February 15, 2024)
OpenAI has showcased many visual examples to demonstrate Sora’s powerful capability. The ChatGPT maker says Sora has a deep understanding of language and can generate “compelling characters that express vibrant emotions”. It can also create several different shots in a single video, with characters and scenes persisting throughout the video.
That said, Sora has some deficiencies too. Currently, it doesn’t understand the physics of the real world very well. OpenAI explains, “A person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark”.
As for the model architecture, OpenAI says Sora is a diffusion model built on the transformer architecture. It uses the recaptioning technique introduced with DALL-E 3, which generates a highly descriptive prompt from a sample user prompt. Apart from text-to-video generation, Sora can also create videos from still images by animating them, and extend existing videos.
My take on Open AI Sora: If you are going to create a TON of HQ video from different angles, you need to simulate it. There are a lot of things though that lead me to believe UE5 is being used in part to create the training data. A 🧵 (Ralph Brooks, @ralphbrooks, February 15, 2024)
Looking at the breathtaking videos generated by Sora, some observers believe the model might have been trained on synthetic data generated with Unreal Engine 5, given the similarities with UE5 simulations. Sora-generated videos don’t have the usual distortion of hands and characters that we generally see with other diffusion models. It may also be using Neural Radiance Fields (NeRF) to generate 3D scenes from 2D images.
Whatever the case, it seems OpenAI has made another breakthrough with Sora. That much is evident from the closing remarks on its blog, which stress the goal of achieving AGI:
Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.
Sora is not available for regular users to try at the moment. Currently, OpenAI is red-teaming with experts to evaluate the model for harms and risks. The company is also giving access to Sora to several filmmakers, designers, and artists to get feedback and improve the model before a public release.
Arjun Sha
Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant to solve everyday computing problems.