Apple says LLMs can't reason.


The thesis of the paper is that a lot of overfitting, pattern matching, and data leakage is occurring in AI models up through GPT-4o. I doubt that's the whole story, because I see evidence of some abstraction happening in the problems I select to probe it. But there's certainly a lot of pattern matching, and it probably accounts for the majority of the performance on these human-designed tests.
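
To make the paper's claim concrete: the test is essentially to hold a problem's logic fixed while varying surface details like names and numbers, then check whether accuracy survives. Below is a minimal sketch of that kind of perturbation test in Python. `ask_model` is a hypothetical stand-in for whatever model API you'd call, and the template and answer check are toy assumptions of mine, not the paper's actual benchmark.

```python
import random

# Sketch of a perturbation test: same logic, randomized surface details.
# A system that merely memorized one phrasing tends to degrade here;
# even shallow abstraction shouldn't care which names and numbers appear.

TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "How many apples does {name} have in total?"
)

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Instantiate the template with random names and numbers."""
    name = rng.choice(["Sophie", "Liam", "Priya", "Mateo"])
    a, b = rng.randint(2, 40), rng.randint(2, 40)
    return TEMPLATE.format(name=name, a=a, b=b), a + b

def robustness(ask_model, n_trials: int = 50, seed: int = 0) -> float:
    """Fraction of perturbed variants the model answers correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        prompt, answer = make_variant(rng)
        reply = ask_model(prompt)          # hypothetical model call
        correct += str(answer) in reply    # crude answer check
    return correct / n_trials
```

If accuracy on these variants falls well below accuracy on the canonical phrasing, that gap is the paper's evidence for pattern matching over reasoning.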

But I have bad news: this is how the vast majority of humans operate when we think they're reasoning. I believe that's partly a product of our educational system, which is optimized to push students through as rapidly as possible toward test performance rather than training them extensively in the processes of frustration and discovery, of seeking patterns and working toward generalization. This is part of what Common Core was supposed to fix, but they really screwed it up.

I'm not just asserting that most humans don't reason most of the time when we think they are; this is absolutely my experience as a tutor who constantly has to figure out how to get nominally very bright students high grades on the next test. The vast majority of students are not emotionally ready or willing to go through the process of thinking things through and figuring them out. They want procedures, and they may want some degree of understanding to frame things conceptually and help them remember, but they see situations that require them to be inventive as abuse, as the teacher being unfair.

In their defense, it is kind of unfair. Most teachers don't know how to coach kids to be discoverers and inventors. You need to model and teach the willingness to go through trial and error, work test cases, probe the extremes, look for breaking points and exceptions, and give yourself latitude to speculate and try things out.

I do think that LLMs as they stand have utility. Being able to pattern match and carry that into new datasets covering adjacent fields is still extremely important. We don't have enough competent manpower to cover all the nooks and crannies of existing materials science, medical biology, drug discovery, synthetic pathways, and so on. There's a vast amount of undiscovered science that doesn't really push the frontiers but fills in a lot of important questions that nobody ever got to. In fact, it wouldn't surprise me if this were hundreds of times the amount of factual knowledge we have right now. It will be extremely valuable not only in new applications but also in increasing the efficiency of current technology.

Nor do I think new architectures built on an LLM base, with other architectures attached as accessories coordinated by the LLM, will be limited in this way, particularly once we start hooking these association engines into experiential devices that can manipulate the physical world. Yes, that's very dangerous, but it's an obvious pathway in the development of artificial intelligence.
