Is the Sora model really that impressive?

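After a full year of breakneck progress last year, AI technology seems to have entered a bottleneck. Of course, this is a bottleneck only relative to last year's enormous leaps. AI is still advancing very quickly, with new progress every day. But compared with the jaw-dropping breakthroughs from the end of the year before last through all of last year, what we see now is mostly incremental improvement on what has already been achieved.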

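Take large language models: GPT-4 seems to be the ceiling. Open-source LLMs advanced at an astonishing pace throughout last year, and OpenAI's competitor Google worked very hard to catch up. Yet not a single model has pulled clearly ahead of GPT-4; the best of them merely match it. So Google took a different route and released Gemini 1.5. Its capabilities are not a leap beyond OpenAI's GPT-4. Its one dramatic improvement is a context window of up to one million tokens. A token corresponds roughly to one English word (on average a little less, closer to three-quarters of a word). That means you can hand it a thick tome and let it read the book for you. Once it has finished, it can summarize the key points, and then you can ask about any details that interest you. In a sense this achieves genuine "quantum speed reading", and I have to admit it is a huge step forward. But it is not a fundamentally revolutionary innovation.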
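To put the one-million-token figure in perspective, here is a rough back-of-the-envelope conversion written as a small Python sketch. It assumes about 0.75 English words per token and about 300 words per printed page. Both are rules of thumb, not properties of Gemini's actual tokenizer. The smaller context sizes in the loop are arbitrary and included only for comparison.

```python
# Rough conversion from a context window measured in tokens to English
# words and book pages. Both ratios are rule-of-thumb assumptions, not
# properties of any particular model's tokenizer.
WORDS_PER_TOKEN = 0.75   # assumed average for English text
WORDS_PER_PAGE = 300     # assumed for a typical printed book page


def context_capacity(tokens: int,
                     words_per_token: float = WORDS_PER_TOKEN,
                     words_per_page: int = WORDS_PER_PAGE) -> tuple[int, int]:
    """Return (approximate words, approximate pages) that fit in `tokens`."""
    words = round(tokens * words_per_token)
    pages = round(words / words_per_page)
    return words, pages


if __name__ == "__main__":
    # Illustrative context sizes, from small to one million tokens.
    for tokens in (8_000, 128_000, 1_000_000):
        words, pages = context_capacity(tokens)
        print(f"{tokens:>9,} tokens ≈ {words:>7,} words ≈ {pages:>5,} pages")
```

Under these assumptions, one million tokens comes to roughly 750,000 words, or about 2,500 pages. That is several thick books read in one go.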

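So although Google's model drew wide attention when it launched, its competitor OpenAI clearly had more cards to play. Shortly after Google's new model hit the headlines and trending lists, OpenAI unveiled Sora, its own text-to-video generation model, along with a set of jaw-dropping sample videos. AI video generation had in fact already been developing quickly in the second half of last year, and I tried video generation models such as the one released by Stability AI. Objectively, though, the videos those models produced were barely better than nothing. For one thing, the clips were very short, around ten seconds each. For another, there was very little motion in the frame. Often it was just a simple zoom in, zoom out, or rotation, not much of an improvement over early Internet Flash animation. Sora, by contrast, is astonishing: it can generate videos up to a minute long, and judging from the clips OpenAI released, the results are almost indistinguishable from real footage.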

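From my experience playing with AI image generation, AI-generated images almost always have some flaws in the details. And of course these published clips were carefully cherry-picked, with the best one chosen from a large number of generations. Even so, they are surprising enough. First, they are especially long. Second, the camera moves a lot, the motion looks natural, and there is even some degree of realistic physics. Because today's OpenAI has become a completely closed "ClosedAI", the underlying details of the model are not publicly known. Many people online speculate that this video generation model is a huge step forward because it realizes the so-called world model, a theory about how the human brain works. What we see and feel is not the external world presented to us directly. It is converted into stimulus signals by nerve cells, processed by the brain, and only then presented to our consciousness. In other words, our brain holds a model of the external world, and our consciousness is observing that model. The argument goes that since this model can produce such stunning videos and simulate real-world physics, it too must contain a world model.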
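The cherry-picking point matters when judging demos. If a showcased clip is the best of many attempts, it says more about the top of the quality distribution than about a typical generation. The toy Python simulation below illustrates this selection effect. The quality scores are invented: they are drawn from an assumed normal distribution and say nothing about Sora's real output. The function name `showcase_vs_typical` is hypothetical.

```python
import random
import statistics


def showcase_vs_typical(n_clips: int, seed: int = 0) -> tuple[float, float]:
    """Simulate n_clips generated clips with made-up quality scores and
    return (average score, best score). The best clip is what a demo shows."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    # Hypothetical quality scores: normal with mean 5 and standard
    # deviation 1.5, clamped to the range [0, 10]. Not measured data.
    scores = [min(10.0, max(0.0, rng.gauss(5.0, 1.5))) for _ in range(n_clips)]
    return statistics.mean(scores), max(scores)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        typical, showcased = showcase_vs_typical(n)
        print(f"best of {n:>4}: average {typical:.2f}, showcased {showcased:.2f}")
```

As the number of attempts grows, the average stays near the assumed mean while the best score climbs. This is why a carefully selected clip can overstate what a typical generation looks like.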

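Of course, this view may be overly optimistic. Some of the released videos still show flaws suggesting that the model itself does not understand how the world works. For example, in one video a fashionable woman walks through a neon-lit city at night. At first glance it looks as real as actual footage, but look closely and you will see that the perspective between the buildings is a complete mess. In another Sora video, a giant, cute rubber duck walks down a city street while two people pass beneath its feet. A few seconds later, the two people vanish from under the duck's feet, as if the duck had trampled them out of existence.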

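Either way, if this technology becomes widely available, and ideally spurs other companies such as Facebook to develop open-source alternatives, it will be a truly great technology capable of changing the human world. It would mean that anyone could create films and TV shows on their own computer. Today this industry has a very high barrier to entry. First, it requires a great deal of capital. Second, some countries have very strict censorship. Both severely limit creative freedom in film and television. Once everyone has this technology, the barrier drops dramatically, and censorship, overwhelmed by the sheer volume, will exist in name only. That would be a huge leap for the human spirit and culture. Of course, competition among content creators on YouTube and Bilibili will become even fiercer.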

aftdxml (25) · last year

The sci-fi world is just around the corner, haha 😀

jimobutan (25) · last year

Haven't tried it yet, but it feels very powerful.