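GPT-4 Is a Reasoning Engine

Core thesis

The article’s core thesis is that large language models such as GPT-4 are, at bottom, reasoning engines rather than knowledge databases. They reason well but are constrained by how little they know. Future progress in AI will depend not only on better reasoning but also on the ability to access and use knowledge, in particular on building robust knowledge bases and making effective use of personal ones.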

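Key arguments

Argument by analogy: The article uses the astronomer Percival Lowell, who mistook craters on Mars for canals, to show why both reasoning and knowledge matter. Reasoning without knowledge leads to wrong conclusions even when the reasoning itself is sound.
GPT-4’s limitations: The author shows that GPT-4 cannot accurately say who he is, and that its answer improves dramatically once it is connected to web search. He takes this as evidence that GPT-4’s reasoning is limited by the knowledge available to it.
The importance of vector databases: The article describes vector databases as a key tool for storing and retrieving knowledge, and predicts they will play a major role in the future development of AI.
The value of personal knowledge bases: The author argues that people who keep a carefully curated personal knowledge base will be able to get more out of AI tools, with a more personalized and more useful experience.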

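Key concepts

Reasoning engine: A system, such as GPT-4, that draws inferences from logical rules and the knowledge it is given.
Knowledge database: A database that stores large amounts of structured information an AI model can draw on.
Vector database: A database built for storing and retrieving vectors, commonly used in AI applications to store and look up text, images, and other information.
Personal knowledge base: A collection of knowledge that an individual gathers, organizes, and stores, such as notes, articles, and books.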

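Outlook

Building knowledge bases: High-quality, easily accessible knowledge bases are essential to improving AI performance.
Personal knowledge management: Organizing and managing one’s own knowledge effectively will become an important skill in the AI era.
Personalized AI tools: AI tools will make better use of personal knowledge bases to deliver more personalized services and experiences.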

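Notable quotes from the article:

“Knowledge without reasoning is inert: you can’t do anything with it. But reasoning without knowledge can turn into compelling, confident fabrication.” This captures how knowledge and reasoning depend on each other, stresses the importance of knowledge, and shows the risk of reasoning without it.
“GPT models are actually reasoning engines, not knowledge databases.” This names a common misunderstanding of GPT models and offers a key lens for understanding their strengths and limits.
“The performance of today’s AI models is constrained by their lack of knowledge.” This identifies a major bottleneck in current AI and points to a direction for future work.
“If you want to make an investment that indexes the success of companies building in AI as a whole, one smart move would be to invest in a vector database provider, or a basket of them.” This reflects the author’s bet on the importance of vector databases and suggests a possible direction for investors.
“If you’ve spent a lot of time collecting and curating your own personal set of notes, articles, books, and highlights it’ll be the equivalent of having a topped-off oil drum in your bedroom during an OPEC crisis.” This vivid metaphor conveys the value of a personal knowledge base in the AI era and the importance of personal knowledge management.
“Its answers are largely dependent on the information we make available to it for analysis. It’s only as powerful as its starting point.” This restates how much the input matters: what an AI model can do ultimately depends on the knowledge it can access and use.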

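Summary

Large language models such as GPT-4 have strong reasoning abilities, but their performance is limited by a lack of knowledge. To realize AI’s full potential, we will need to invest more in building knowledge bases and in personal knowledge management.

Original text: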

GPT-4 Is a Reasoning Engine (2023)
Reason is only as good as the information we give it
by Dan Shipper

Large language models aren’t always right. Their strength—for now—is mimicry and prediction rather than accuracy. But as Dan Shipper writes in this essay from March 2023, these models are only as good as the knowledge they have access to. With OpenAI’s Dev Day set for Oct. 1 and Every taking a quarterly Think Week, we thought this was a great time to republish Dan’s essay about AI and reasoning.

In 1894, a Boston-based astronomer named Percival Lowell found intelligent life on Mars.

Looking through a telescope from his private observatory he observed dark straight lines running across the Martian surface. He believed these lines to be evidence of canals built by an advanced but struggling alien civilization trying to tap water from the polar ice caps.

He spent years making intricate drawings of these lines, and his findings captured public imagination at the time. But you’ve never heard of him because he turned out to be dead wrong.

In the 1960s, NASA's Mariner missions captured high-resolution images of Mars, revealing that these "canals" were nothing more than an optical illusion caused by the distribution of craters on the planet's surface. With the low resolution available to his telescope at the time, these craters looked to Lowell like straight lines which, through a chain of reasoning, he theorized to be canals built by intelligent life.

Lowell’s story shows that there are at least two important components to thinking: reasoning and knowledge. Knowledge without reasoning is inert—you can’t do anything with it. But reasoning without knowledge can turn into compelling, confident fabrication.

Interestingly, this dichotomy isn’t limited to human cognition. It’s also a key thing that people fundamentally miss about AI.

Even though our AI models were trained by reading the whole internet, that training mostly enhances their reasoning abilities, not how much they know. And so, the performance of today’s AI models is constrained by their lack of knowledge.

I saw Sam Altman speak at a small Sequoia Capital event in San Francisco earlier in March 2023, and he emphasized this exact point: GPT models are actually reasoning engines, not knowledge databases.

This is crucial to understand because it predicts that advances in the usefulness of AI will come from advances in its ability to access the right knowledge at the right time—not just from advances in its reasoning powers.

Knowledge and reasoning in GPT models
Here’s an example to illustrate this point. GPT-4 is the most advanced model on the market today (note: as of this writing in March 2023). Its reasoning capabilities are so good that it can get a 5 on the AP Bio exam. But if I ask it who I am it says the following:

That’s close to being right except for one big problem: I’m the cofounder of a few companies, but none of them is Superhuman or Fireflies.

AI critics will be quick to say that this proves GPT-4 is nothing more than a stochastic parrot, and that its results should be dismissed offhand. But they’re wrong. Its performance improves dramatically the second it has access to the right information.

For example, I have access to a version of ChatGPT that can use web searches to ground its answers with what it finds on the internet.

In other words, instead of using its reasoning capabilities to come up with a theoretically plausible answer, it does web research to create a knowledge base for itself. It then analyzes the collected information and distills a more accurate answer:

Now, that’s pretty good! The underlying model is the same, but the answer improves significantly because it has the right information to reason over.

What’s going on here? GPT-4’s architecture is not public, but we can make some educated guesses based on previous models that have been released.

When GPT-4 was trained, it was fed a large portion of the available material on the internet. Training transformed that data into a statistical model that is very good at, given a string of words, knowing which words should follow from it—this is called next token prediction.
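To make next token prediction a little more concrete, here is a minimal sketch in Python. It is a toy bigram counter, not how GPT-4 works (its architecture is not public), and the names train_bigram_model and predict_next are made up for illustration. It only shows the general idea: the "knowledge" lives in statistics about which word tends to follow which, and nothing can be looked up.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    words = text.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text; a real model sees vastly more data.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" only once
print(predict_next(model, "dog"))  # None: the word never appeared in training
```

The toy model can only guess from patterns it has seen, which is a rough analogue of why a much larger statistical model can produce an answer that sounds plausible but gets the details wrong.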

However, the kind of “knowledge” contained in this statistical model is fuzzy and inexplicit. The model doesn’t have any sort of long-term memory or way to look up the information it has seen—it only remembers what it encountered in its training set in the form of a statistical model.

When it encounters my name it uses this model to make an educated guess about who I am. It draws a conclusion that’s in the ballpark of being right, but is completely wrong in its details because it doesn’t have any explicit way to look up the answer.

But when GPT-4 is hooked up to the internet (or anything that acts like a database) it doesn’t have to rely on its fuzzy statistical understanding. Instead, it can retrieve explicit facts like, “Dan Shipper is the co-founder of Every” and use that to create its answer.
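Here is a rough sketch of what "hooking a model up to a database" can look like in code. Everything in it is an assumption for illustration: the fact list, the word-overlap scoring, and the function names retrieve_facts and build_grounded_prompt are hypothetical, and no real model or search API is called. The point is only that explicit facts are found first and then placed in front of the model to reason over.

```python
def retrieve_facts(question, facts, top_k=2):
    """Rank stored facts by how many words they share with the question."""
    question_words = set(question.lower().replace("?", "").split())
    scored = []
    for fact in facts:
        fact_words = set(fact.lower().rstrip(".").split())
        overlap = len(question_words & fact_words)
        if overlap > 0:
            scored.append((overlap, fact))
    # Highest overlap first; facts with no shared words were never added.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:top_k]]


def build_grounded_prompt(question, facts):
    """Put the retrieved facts ahead of the question so the model can use them."""
    retrieved = retrieve_facts(question, facts)
    context = "\n".join(f"- {fact}" for fact in retrieved)
    if not context:
        context = "- (no relevant facts found)"
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"


# A tiny stand-in for a knowledge base.
facts = [
    "Dan Shipper is the co-founder of Every.",
    "Percival Lowell was a Boston-based astronomer.",
]

prompt = build_grounded_prompt("Who is Dan Shipper?", facts)
print(prompt)
```

The resulting prompt would then be sent to whatever model you are using. Real systems replace the word-overlap scoring with web search or a vector database, but the shape is the same.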

So, what does this mean for the future? I think there are at least two interesting conclusions:

1. Knowledge databases are as important to AI progress as foundational models.
2. People who organize, store, and catalog their own thinking and reading will have a leg up in an AI-driven world. They can make those resources available to the model and use it to enhance the intelligence and relevance of its responses.

Let’s take these one at a time.

Knowledge databases are surprisingly important
When it comes to knowledge you want to be able to store a lot of it, and you want to be able to find the right piece of knowledge at the right time. In AI this is typically done with a vector database.

Vector databases allow you to easily index and store large amounts of information, and then quickly query for similar pieces of information to give to your model when you need to. They’re so common in AI apps that it’s likely almost every demo you’ve tried over the last few months has included a vector database for some part of their functionality.
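For readers who haven't seen one, here is a minimal sketch of the idea behind a vector database. It is not the API of Pinecone or any other product: ToyVectorStore, embed, and cosine_similarity are hypothetical names, and the "embedding" is just a word-count vector standing in for the learned embedding model a real system would use. Texts are stored as vectors, and a query returns the stored texts whose vectors are most similar to the query's.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a sparse word-count vector (real systems use a learned model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (vector, text) pairs in memory and searches them by similarity."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((embed(text), text))

    def query(self, text, top_k=1):
        query_vector = embed(text)
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [stored_text for _, stored_text in ranked[:top_k]]


store = ToyVectorStore()
store.add("mars has polar ice caps")
store.add("vector databases store embeddings")
store.add("the ap bio exam is scored from 1 to 5")

print(store.query("what do vector databases store"))
```

A production vector database adds indexing so that queries stay fast over millions of vectors, but the store-then-query-by-similarity loop is the part that matters for AI apps.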

In fact, if you want to make an investment that indexes the success of companies building in AI as a whole, one smart move would be to invest in a vector database provider, or a basket of them. (Alternatives might be to invest in OpenAI, or a basket of large cap software companies like Microsoft and Google that build AI, or chipmakers like NVIDIA that build the GPUs that AIs run on.)

Smarter investors than me seem to agree. Pinecone, the most popular vector database, raised funds at a $700 million valuation. Smaller alternatives like Weaviate (which raised $50 million at a $200 million valuation) and Chroma (which raised $18 million at a $75 million valuation) aren’t far behind.

Interestingly, though, most of these vector databases were originally built before the large language model craze. Vectors are incredibly important for all sorts of previous-generation machine learning algorithms like recommendation systems. As a result, the database tooling from providers like Pinecone isn’t purpose built for large language models like ChatGPT.

We’re already seeing newer alternatives springing up that wrap some business logic around the database layer to make it easier for AI developers to do common tasks. Some of these are developer libraries like LangChain and LlamaIndex. And some seem to be more fully featured developer tools like Metal and Baseplate. Just like Pinecone, they are also likely to raise a lot of money, or already have! AI’s advancement is a rain dance that calls forth capital from Patagonia vest-wearing angels.

I find this very exciting because it will make it a lot easier to make AI apps. There’s a tremendous amount of boilerplate code being written to take, say, a PDF or a webpage with interesting information on it, parse it, break it into chunks, store it, and retrieve it for use in AI apps. The more that can happen with just a line or two of code, the better.
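As an example of that boilerplate, here is a small sketch of the "break it into chunks" step. The function name chunk_words and its parameters are hypothetical, and splitting on whitespace with a fixed overlap is only one simple strategy among many; it assumes the PDF or webpage has already been parsed into plain text.

```python
def chunk_words(text, chunk_size=5, overlap=2):
    """Split text into word chunks of `chunk_size`, each sharing `overlap` words with the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


page_text = "one two three four five six seven eight"
for chunk in chunk_words(page_text):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary from being cut off from its context in both chunks. Each chunk would then be embedded and stored for later retrieval.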

When I talk to people about vector databases—even people who have been following AI closely—they typically say, “What’s that?” I think, over time, that will change significantly as we start to understand how important it is for these models to have access to the knowledge that they contain.

Vector databases are how information gets stored and made available to AI applications. One place that I think they’ll get a lot of valuable information from is private, personal knowledge bases.

Private repositories of knowledge are going to be very valuable
People have been saying that data is the new oil for a long time. But I do think, in this case, if you’ve spent a lot of time collecting and curating your own personal set of notes, articles, books, and highlights it’ll be the equivalent of having a topped-off oil drum in your bedroom during an OPEC crisis.

Why? It’s expensive and time consuming to find information that’s relevant to the things you think about. Even if you give AI access to a search engine so it can make queries to find the right information, it’ll cost you money and time.

If, instead, you’ve spent a lifetime gathering and curating information that’s important to you, you can customize your AI experience so it’s more useful to you right off the bat.

Apps like Readwise, Pocket, and Instapaper that allow you to store articles you’ve read (or articles you want to read) are going to be a gold mine to the extent that they hook up to AI tools. They’ll be extra useful because they record the articles you explicitly bookmarked and read, which will make it easier for AI tools to know which pieces of information to weight in their responses.

But the use of personal knowledge databases will get weirder and more advanced than this.

For example, Rewind is a tool that sits on your computer and records everything you see and everything you type. It’s all stored locally for privacy purposes, and you can already hook it up to ChatGPT.

In one of their demos they show a user asking, “What did I do last week?” The AI is able to summarize all of the tasks they did on their computer:

For my part, I’ve installed Rewind, and I’ve been playing around with building little tools to save more of what I encounter online. I made a little app called Tend that sits open on my browser all day, and I can feed it any articles with interesting information for indexing and storage. Later, I’ll build a little ChatGPT plugin to give me access to all the information I saved with it.

Wrapping up
When we talk about the future of AI, we tend to focus on its output. Given a prompt, it can think through a complex problem, compose an essay, or create a new scientific breakthrough without much human involvement.

We tend to under-appreciate the significance of the input—what information we feed it to produce those results. Its answers are largely dependent on the information we make available to it for analysis. It’s only as powerful as its starting point.

We don’t pay enough attention to the limits of its knowledge—how much information is locked away, inaccessible to these systems. We also forget how expensive (both in time and in compute) it is to crawl through information sources and find relevant facts. And finally, we underestimate the difficulty of surfacing relevant pieces of information for the model at the right time.

But solving these sorts of problems is just as fundamental as solving for the reasoning capabilities of the underlying models. I’m excited to see what people build.