In graduate school, I recall a professor suggesting that the rational expectations revolution would eventually lead to much better models of the macroeconomy. I was skeptical, and in my view, that didn’t happen.
This is not because there is anything wrong with the rational expectations approach to macro, which I strongly support. Rather, I believe that the advances coming out of this theoretical innovation occurred very rapidly. For instance, by the time I had this discussion (around 1979), people like John Taylor and Stanley Fischer had already grafted rational expectations onto sticky wage and price models, which contributed to the New Keynesian revolution. Since that time, macro seems to have been stuck in a rut (apart from some later innovations from the Princeton School related to the zero lower bound issue).
In my view, the most useful applications of a new conceptual approach tend to come quickly in highly competitive fields like economics, science and the arts.
In the past few years, I’ve had a number of interesting conversations with younger people who are involved in the field of artificial intelligence. These people know much more about AI than I do, so I would encourage readers to take the following with more than a grain of salt. During these discussions, I sometimes expressed skepticism about the future pace of improvement in large language models such as ChatGPT. My argument was that there were some pretty severe diminishing returns to exposing LLMs to additional data sets.
Think about a person who reads and fully absorbs 10 well-selected books on economics, perhaps a macro and a micro principles text, as well as some intermediate and advanced textbooks. Anyone who fully absorbed this material would actually know quite a bit of economics. Now have them read 100 more well-chosen textbooks. How much more economics would they actually know? Surely not 10 times as much. Indeed, I doubt they would even know twice as much economics. I suspect the same could be said of other fields, such as biochemistry or accounting.
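To make that intuition slightly more concrete, here is a toy sketch in which knowledge grows roughly with the logarithm of the number of books read. The functional form and the numbers are purely illustrative assumptions, not estimates of anything:

```python
import math

def knowledge(books_read: int) -> float:
    """Toy model: later books mostly repeat material covered earlier,
    so cumulative knowledge grows roughly logarithmically, not linearly.
    Units and functional form are arbitrary assumptions."""
    return math.log(1 + books_read)

k_10 = knowledge(10)     # after 10 well-selected books
k_110 = knowledge(110)   # after reading 100 more

print(f"10 books:  {k_10:.2f}")
print(f"110 books: {k_110:.2f}")
print(f"ratio:     {k_110 / k_10:.2f}x")  # roughly 2x, nowhere near 11x
```

On this toy accounting, an 11-fold increase in reading buys only about a doubling of knowledge, rather than an 11-fold gain.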
This Bloomberg article caught my eye:
OpenAI was on the cusp of a milestone. The startup finished an initial round of training in September for a massive new artificial intelligence model that it hoped would significantly surpass prior versions of the technology behind ChatGPT and move closer to its goal of powerful AI that outperforms humans. But the model, known internally as Orion, didn’t hit the company’s desired performance. Indeed, Orion fell short when trying to answer coding questions that it hadn’t been trained on. And OpenAI isn’t alone in hitting stumbling blocks recently. After years of pushing out increasingly sophisticated AI products, three of the leading AI companies are now seeing diminishing returns from their hugely expensive efforts to build newer models.
Please don’t take this as meaning I’m an AI skeptic. I believe the recent advances in LLMs are extremely impressive, and that AI will eventually transform the economy in some profound ways. Rather, my point is that the advancement to some sort of super general intelligence may happen more slowly than some of its proponents expect.
Why might I be wrong? I’m told that artificial intelligence can be boosted by methods other than just exposing the models to ever larger data sets, and that the so-called “data wall” may be surmounted by other methods of boosting intelligence. But if Bloomberg is correct, LLM development is in a bit of a lull due to the force of diminishing returns from having more data.
Is this good news or bad news? It depends on how much weight you put on the risks associated with the development of ASI (artificial superintelligence).
Update: Tyler Cowen has some closely related views on this topic.
READER COMMENTS
Jim Glass
Nov 13 2024 at 9:31pm
Yes, I’ve seen a lot about this: declining marginal returns to training, depletion of material to train on, explosion of the energy costs of training, to the point of reviving nuclear power just for AI, including reopening Three Mile Island.
I’ve no idea if this is good news or bad news for the future of humanity. But I suspect it’s going to be pretty bad news for most of the many firms that have gotten billion-dollar valuations for themselves on zero net revenue by calling themselves “AI leaders”, just as bad news arrived for the bulk of the dot-coms during that bubble, and for the “electronics” firms of the 1920s, and 97% of the maybe 500 auto makers that were busy creating that industry circa 1910.
Matthias
Nov 14 2024 at 3:33am
There was no dot-com bubble. But there sure was a dot-com bust.
(There was no dot-com bubble in the sense that if you had bought, e.g., all of the stocks on the NASDAQ throughout the alleged bubble years and just held onto them for the long run, say, 20 years, you’d have earned a decent return on investment.
Which suggests that the ‘bubble’ valuations were actually fairly reasonable on average.)
tpeach
Nov 14 2024 at 12:51am
This could be the case. AI seemed super impressive less than two years ago. But now the wow factor has passed, as it does with most technologies once we get used to them. I’m no longer amazed by it, and sometimes even unimpressed.
However, when I think of the possibilities of AI, I’m reminded of what David Bowie said about the internet in an interview in 1999:
https://bigthink.com/the-future/david-bowie-internet/
Scott Sumner
Nov 14 2024 at 4:30pm
Very good quote.
David S
Nov 14 2024 at 5:25am
I’ll be more impressed with AI when we (or AI) figure out a way to scale down its energy use while continuing to improve the quality of output. When Newton wrote the Principia, his brain didn’t need a nuclear power plant to sustain his mathematical breakthroughs.
At a more mundane level, humans can learn how to operate cars and machinery with a few hours of instruction and then perform with relatively low rates of fatal error for decades. Parts of the AI community seem to be operating like WWI generals: “we just need a few million more men to achieve a breakthrough!”
David Seltzer
Nov 14 2024 at 1:58pm
Scott: The positive side of diminishing marginal returns is that they incentivize the search for different ways to increase returns. While the intersection of technology and diminishing returns presents challenges, there are opportunities as well.
gwern
Nov 14 2024 at 3:31pm
The key point here is that the ‘severe diminishing returns’ were well-known and had been quantified extensively, and the power laws were what were being used to forecast and design the LLMs. So when you told anyone in AI “well, the data must have diminishing returns”, this was definitely true, but you weren’t telling anyone anything they shouldn’t have already known in detail. The returns have always diminished, right from the start. There has never been a time in AI where the returns did not diminish. (And in computing in general: “We sent men to the moon with less total compute than we use to animate your browser tab-icon now!” Nevertheless, computers are way more important to the world now than they were back then. The returns diminished, but Moore’s law kept lawing.)
The all-important questions are exactly how much the returns diminish, why, what the other scaling laws are (for example, any specific diminishing returns in data would diminish more slowly if you were able to use more compute to extract more knowledge from each datapoint), how they inter-relate, and what the consequences are.
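(For concreteness, the data term of these scaling laws is usually written as a power law plus an irreducible floor, something like L(D) = E + B·D^(−β). The sketch below uses made-up coefficients purely to show the shape of the curve; they are not the fitted constants from Kaplan et al. or the Chinchilla paper.)

```python
# Illustrative data scaling law: loss falls as a power law in dataset size,
# down to an irreducible floor. The coefficients are placeholders chosen to
# show the shape, not fitted values from any published paper.
def expected_loss(tokens: float, floor: float = 1.7,
                  coeff: float = 400.0, beta: float = 0.3) -> float:
    return floor + coeff * tokens ** (-beta)

for tokens in (1e9, 1e10, 1e11, 1e12):
    print(f"{tokens:.0e} tokens -> loss {expected_loss(tokens):.2f}")
# Each 10x increase in data buys a smaller absolute improvement than the last:
# the returns diminish from the very start, but they never stop entirely.
```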
The importance of the current rash of rumors about Claude/Gemini/GPT-5 is that they seem to suggest that something has gone wrong above and beyond the predicted power law diminishing returns of data.
The rumors are vague enough, however, that it’s unclear where exactly things went wrong. Did the LLMs explode during training? Did they train normally but just not learn as well as they were supposed to, winding up not predicting text that much better, and did that happen at some specific point in training? Did they just not train enough, because the datacenter constraints appear to have blocked any of the real scaleups we have been waiting for, like systems trained with 100x+ the compute of GPT-4? (That was the sort of leap which takes you from GPT-2 to GPT-3, and GPT-3 to GPT-4. It’s unclear how much compute “GPT-5” used over GPT-4; if it was only 10x, say, then we would not be surprised if the gains are relatively subtle and potentially disappointing.) Are they predicting raw text as well as they are supposed to, but the more relevant benchmarks like GPQA are stagnant and they just don’t seem to act more intelligently on specific tasks, the way past models were clearly more intelligent in close proportion to how well they predicted raw text? Are the benchmarks better, but the end users are shrugging their shoulders and complaining that the new models don’t seem any more useful? Right now, seen through the glass darkly of journalists paraphrasing second-hand simplifications, it’s hard to tell.
Each of these has totally different potential causes, meanings, and implications for the future of AI. Some are bad if you are hoping for continued rapid capability gains; others are not so bad.
Scott Sumner
Nov 14 2024 at 4:39pm
Thanks for that very helpful comment. It seems my skepticism about the pace of improvement may have been correct, but perhaps for the wrong reason.
But I do recall one or two people I spoke with claiming that more data alone would produce big gains. So my sense is I was more pessimistic than some even on the specific topic of diminishing returns.
In case you read this reply, I was very interested in your tweet about the low price of some advanced computer chips in wholesale Chinese markets. Is your sense that this mostly reflects low demand, or the widespread evasion of sanctions?
gwern
Nov 14 2024 at 6:26pm
My guess is that when they said more data would produce big gains, they were referring to the Chinchilla scaling law breakthrough. They were right, but there may have been some miscommunication there.
First, more data produced big gains in the sense that cheap small models suddenly got way better than anyone was expecting in 2020 by simply training them on a lot more data, and this is part of why ChatGPT-3 is now free and a Claude-3 or GPT-4 can cost like $10/month for unlimited use and you have giant context windows and can upload documents and whatnot. That’s important. In a Kaplan-scaling scenario, all the models would be far larger and thus more expensive, and you’d see much less deployment or ordinary people using them now. (I don’t know exactly how much but I think the difference would often be substantial, like 10x. The small model revolution is a big part of why token prices can drop >99% in such a short period of time.)
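(To put a rough number on the Chinchilla point: the result is commonly summarized as a rule of thumb of roughly 20 training tokens per parameter for compute-optimal training, versus the far lower token-per-parameter ratios of the GPT-3 era. The sketch below just applies that rule of thumb; treat the 20:1 ratio as an approximation, not an exact prescription.)

```python
# Rough Chinchilla-style rule of thumb: ~20 training tokens per parameter
# for compute-optimal training (an approximation of Hoffmann et al. 2022).
TOKENS_PER_PARAM = 20

def compute_optimal_tokens(n_params: float) -> float:
    return TOKENS_PER_PARAM * n_params

# For comparison, GPT-3 had 175B parameters but was trained on only ~300B
# tokens, i.e. under 2 tokens per parameter; Chinchilla itself was ~70B
# parameters trained on ~1.4T tokens.
for n_params in (7e9, 70e9, 175e9):
    tokens = compute_optimal_tokens(n_params)
    print(f"{n_params / 1e9:.0f}B params -> ~{tokens / 1e12:.2f}T tokens")
```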
Secondly, you might have heard one thing when they said ‘more data’ while they were thinking something entirely different, because you might reasonably have assumed that ‘more data’ meant something small. When they said ‘more data’, what they likely meant, because this was just obvious to them in a scaling context, was that ‘more’ wasn’t 10% or 50% more data, but more like 1000% more data. The datasets being used for things like GPT-3 were really still very small compared to the datasets possible, contrary to the casual summary of “training on all of the Internet” (which gives a good idea of the breadth and diversity, but is not even close to being quantitatively true). Increasing them 10x or 100x was feasible, and that would lead to a lot more knowledge.
It was popular in 2020-2022 to claim that all of the text had already been used up, and so scaling had hit a wall and such dataset increases were impossible, but it was just not true if you thought about it. I did not care to argue about it with proponents, because it didn’t matter and there was already too much appetite for capabilities rather than safety, but I thought it was very obviously wrong unless you were motivated to find a reason scaling had already failed. For example, a lot of people seemed to think that Common Crawl contains ‘the whole Internet’, but it doesn’t even contain basic parts of the Western Internet like Twitter, which is completely excluded from it. Or you could look at the book counts: the papers report training LLMs on a few million books, which might seem like a lot, but Google Books has closer to a few hundred million books’ worth of text, and a few million more books get published each year on top of that. And then you have all of the newspaper archives going back centuries, and institutions like the BBC, whose data is locked up tight, but if you have billions of dollars, you can negotiate some licensing deals. Then you have millions of users each day providing unknown amounts of data. Then also, if you have a billion dollars in cash, you can hire some hard-up grad students or postdocs at $20/hour to write a thousand high-quality words, and that goes a long way. And if your models get smart enough, you start using them in various ways to curate or generate data. And if you have more raw data, you can filter it more heavily for quality/uniqueness so you get more bang per token. And so on and so forth.
There was a lot of stuff you could do if you wanted to badly enough. If there was demand for the data, supply would be found for it. Back then, LLM creators didn’t invest much in creating data, because it was so easy to just grab Common Crawl etc. If we ranked them on a scale of research diligence from “student making stuff up in class based on something they heard once” to “hedge fund flying spy planes and buying cellphone tracking and satellite surveillance data and hiring researchers to digitize old commodity market archives”, they were at the “read one Wikipedia article and looked at a reference or two” level. These days, they’ve leveled up their data game a lot and can train on far more data than they did back then.
My sense is that it’s a mix of multiple factors, but mostly a demand-side issue at root. So for the sake of argument, let me sketch out an extreme bear case on Chinese AI, as a counterpoint to the more common “they’re just 6 months behind and will leapfrog Western AI at any moment thanks to the failure of the chip embargo and Western decadence” alarmism.
It is entirely possible that the sanctions hurt, but counterfactually their removal would not change the big picture here. There is plenty of sanctions evasion – Nvidia has sabotaged the embargo as much as they could, and H100 GPUs can be exported or bought in many places – but the chip embargo mostly works by making it hard to create the big, tightly-integrated, high-quality GPU datacenters owned by a single player who will devote them to a 3-month+ run to create a cutting-edge model at the frontier of capabilities. You don’t build that datacenter by smurfs smuggling a few H100s in their luggage. There are probably hundreds of thousands of H100s in mainland China now, in total, scattered penny-packet, a dozen here, a thousand there, 128 over there, but as long as they are not all in one place, fully integrated and debugged and able to train a single model flawlessly, then for our purposes in thinking about AI risk and the frontier, those are not that important. Meanwhile in the USA, if Elon Musk wants to create a datacenter with 100k+ GPUs to train a GPT-5-killer, he can do so within a year or so, and it’s fine. He doesn’t have to worry about GPU supply – Huang is happy to give the GPUs to him, for divide-and-conquer commoditize-your-complement reasons.
With compute supply shattered and usable just for small models or inferencing, it’s just a pure commodity race-to-the-bottom play, with commoditized open-source models and near-zero profits. The R&D is shortsightedly focused on hyperoptimizing existing model checkpoints, borrowing or cheating on others’ model capabilities rather than figuring out how to do things the right, scalable way, and not on competing with GPT-5, and definitely not on finding the next big thing which could leapfrog Western AI. No exciting new models or breakthroughs, mostly just chasing Western taillights because that’s derisked and requires no leaps of faith. (Now they’re trying to clone GPT-4 coding skills! Now they’re trying to clone Sora! Now they’re trying to clone MJv6!) The open-source models like DeepSeek or Llama are good for some things… but only some things. They are very cheap at those things, granted, but there’s nothing there to really stir the animal spirits. So demand is highly constrained. Even if those models were free, it’d be hard to find much transformative, economy-wide use for them right away.
And would you be allowed to transform or bestir the animal spirits? The animal spirits in China need a lot of stirring these days. Who wants to splurge on AI subscriptions? Who wants to splurge on AI R&D? Who wants to splurge on big datacenters groaning with smuggled GPUs? Who wants to pay high salaries for anything? Who wants to start a startup where if it fails you will be held personally liable and forced to pay back investors with your life savings or apartment? Who wants to be Jack Ma? Who wants to preserve old Internet content which becomes ever more politically risky as the party line inevitably changes? Generative models are not “high-quality development”, really, nor do they line up nicely with CCP priorities like Taiwan. Who wants to go overseas and try to learn there, and become suspect? Who wants to say that maybe Xi has blown it on AI? And so on.
Put it all together, and you get an AI ecosystem which has lots of native potential, but which isn’t being realized for deep, hard-to-fix structural reasons, and which will keep consistently underperforming and ‘somehow’ always being “just six months behind” Western AI, and which will mostly keep doing so even if obvious barriers like sanctions are dropped. They will catch up to any given achievement, but by that point the leading edge will have moved on, and the obstacles may get more daunting with each scaleup. It is not hard to catch up to a new model which was trained on 128 GPUs, with a modest effort by one or two enthusiastic research groups at a company like Baidu or at Tsinghua. It may be a lot harder to catch up with the leading-edge model in 4 years, trained however models are being trained then, like some wild self-play bootstrap on a million new GPUs consuming multiple nuclear power plants’ output. Where is the will at Baidu or Alibaba or Tencent for that? I don’t see it.
I don’t necessarily believe all this too strongly, because China is far away and I don’t know any Mandarin. But until I see the China hawks make better arguments and explain things like why it’s 2024 and we’re still arguing about this with the same imminent-China narratives from 2019 or earlier, and where all the indigenous China AI breakthroughs are which should impress the hell out of me and make me wish I knew Mandarin so I could read the research papers, I’ll keep staking out this position and reminding people that it is far from obvious that there is a real AI arms race with China right now or that Chinese AI is in rude health.
David S
Nov 15 2024 at 7:34am
Gwern, thank you for your lengthy and descriptive commentary; it’s more than the grimy readers of Econlog like me deserve. I don’t know who you are or what your credentials are, but judging from the details and reasoning you provide, I’m inclined to trust you on this subject. Your description of how China is probably approaching AI has a historical analogue in computer development from the 1960s through the 1980s. The U.S. had a rich ecosystem of software and hardware development that generated lots of success through lots of failures, and had deep-pocketed sponsors in the military as well as large businesses. The Soviets, and other Eastern Bloc countries, stagnated for reasons very similar to your speculations about how things are going in China: no one wants to take risks, because failure will probably be punished.
gwern
Nov 15 2024 at 10:34am
Oh, I’m just a guy online who reads a lot; you shouldn’t trust me. You don’t need to take my word for it or trust me on anything – everything I’ve said here is public information you can find in Sixth Tone or the NYT or the Financial Times, etc.
Instead, just keep these points in mind as you watch events unfold. 6 months from now, are you reading research papers written in Mandarin or in English, and where did the latest and greatest research result everyone is rushing to imitate come from? 12 months from now, is the best GPU/AI datacenter in the world in mainland China, or somewhere else (like in America)? 18 months from now, are you using a Chinese LLM for the most difficult and demanding tasks because it’s substantially, undeniably better than any tired Western LLM? As time passes, just ask yourself, “do I live in the world according to Gwern’s narrative, or do I instead live in the ‘accelerate or die’ world of an Alexandr Wang or Beff Jezos type? What did I think back in November 2024, and would what I see, and don’t see, surprise me now?” If you go back and read articles in Wired or discussions on Reddit in 2019 about scaling and the Chinese threat, which arguments predicted 2024 better?
Scott Sumner
Nov 16 2024 at 12:39am
Gwern, Sorry for the slow reply, I sometimes forget to check comment sections. This is a very helpful reply. Honestly, it’s probably the best quality reply I’ve seen in 10 years of blogging at Econlog. I’ll continue to watch developments in the field and try to occasionally circle back and review your perspective.
Thank you.
Jim Glass
Nov 16 2024 at 1:45pm
Declining marginal returns are occurring across all of science. Sabine is very upset about it.
Warren Platts
Nov 21 2024 at 1:14pm
Thanks for the link. I see Miss Sabine cited John Horgan’s “The End of Science.” I read that book back in the 1990s when it first came out and passed it around the philosophy department at the time (or at least tried to). The basic point is true, I think. If reality really is a thing, there’s only so much you can learn about it. So you must hit diminishing returns.
Spencer Hall
Nov 16 2024 at 3:44pm
AI’s scope is wider, richer, indeed more comprehensive. It has the ability to solve complex problems in shorter time frames. But it still suffers from GIGO (garbage in, garbage out): it will copy bad logic.
Ted
Nov 16 2024 at 4:19pm
“Rather, my point is that the advancement to some sort of super general intelligence may happen more slowly than some of its proponents expect.”
Sentences like these are interesting to me.
On the one hand, I agree with your sentence 100%. In fact, I wrote a 100+ page paper discussing why the economy is not likely to be fully automated by 2043: https://arxiv.org/abs/2306.02519
On the other hand, I find sentences like this to be almost tautological. On any factual question with a distribution of beliefs, almost all beliefs are less extreme than some. So the point that you believe things will go more slowly than the most extreme proponents expect does not carry much meaning (i.e., it is not falsifiable) in its unquantified state.
Like you, I’m curious to see where this all goes. Cheers.