I feel like we're in the vertical scaling phase of language model design; I look forward to federated model architectures running on cheap commodity hardware.
Notes:
- In my online conversations on X, I see a lot of technologists assuming that large language models like GPT will grow in size and power indefinitely.
- It often reminds me of Moore's law, named after the late Intel co-founder Gordon Moore.
- In relation to CPU release cycles, "Moore's law is the observation that the number of transistors in an integrated circuit (IC) doubles about every two years." Source: https://en.m.wikipedia.org/wiki/Moore%27s_law
- In basic terms, that meant a consumer could expect the CPUs running in their devices to roughly double in power about every two years. This held true for a long time, until CPU manufacturers started to hit the limits of physics in terms of how small transistors can be manufactured.
- Now, in my opinion, we cannot expect the same doubling of power in large language models like GPT from OpenAI, because they will also eventually hit physical limits on:
- The amount of quality training data available, which I have described as being like "the new oil" in a previous episode. Ref: https://techleader.pro/a/643-Tech-Leader-Pro-podcast-2024-week-18,-The-new-knowledge-acquisition-bottleneck
- The amount of computing resources available to host these models: processing, memory, storage, networking, etc. are all finite, especially in a vertical hosting model.
- Let's look at the previous major GPT releases and see if my theory holds true (the growth factors are worked through in the short sketch after this list):
- GPT-1 was released in June 2018. It had 12 decoder layers, and 117 million parameters.
- GPT-2 was released in February 2019 (an 8 month gap). It had 48 decoder layers (a 4x increase), and 1.5 billion parameters (a ~13x increase). So far, so good!
- GPT-3 was released in May 2020 (a 1 year 3 month gap). It had 96 decoder layers (a 2x increase), and 175 billion parameters (a ~117x increase). Here the timeline and decoder layers still fall within a hypothetical Moore's law criterion, while the parameters increased dramatically, great!
- GPT-4 was released March 2023 (a 2 year and 10 month gap). It is believed to have 120 layers (a 1.25x increase), and 1.8 trillion parameters (a ~10x increase), but OpenAI have kept such details hidden and these figures are based on a leak. Reference: https://the-decoder.com/gpt-4-architecture-datasets-costs-and-more-leaked/
- GPT-5 has yet to be released, with no official date confirmed as of recording this.
- Source for GPT-1 to GPT-3 data: https://360digitmg.com/blog/types-of-gpt-in-artificial-intelligence
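- To make the comparison concrete, below is a minimal Python sketch (a back-of-the-envelope check, not an official analysis) that takes the figures quoted above, including the unconfirmed leaked GPT-4 numbers, and compares each release's growth against a Moore-style baseline of doubling every 24 months.

```python
# Back-of-the-envelope check of the "Moore's law for GPT" idea, using the
# figures quoted in the notes above. The GPT-4 numbers are from the leak and
# therefore unconfirmed. The Moore-style baseline assumes a doubling of
# capability every 24 months.

releases = [
    # (name, months since previous release, decoder layers, parameters)
    ("GPT-1", None, 12, 117e6),
    ("GPT-2", 8, 48, 1.5e9),
    ("GPT-3", 15, 96, 175e9),
    ("GPT-4", 34, 120, 1.8e12),  # leaked, unconfirmed
]

previous = releases[0]
for name, gap_months, layers, params in releases[1:]:
    layer_growth = layers / previous[2]
    param_growth = params / previous[3]
    moore_baseline = 2 ** (gap_months / 24)  # expected factor at a 2-year doubling rate
    print(f"{name}: gap={gap_months} months, layers x{layer_growth:.2f}, "
          f"parameters x{param_growth:.0f}, Moore-style baseline x{moore_baseline:.2f}")
    previous = (name, gap_months, layers, params)
```

- Running that prints roughly: GPT-2 at 4x layers and ~13x parameters against a ~1.3x baseline, GPT-3 at 2x and ~117x against ~1.5x, and GPT-4 at 1.25x and ~10x against ~2.7x, which is the slowdown described below.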
- So what are we to conclude from this? Well, clearly by GPT-4 the release timeline and the increase in layers violate a hypothetical Moore's law for AI language models, though the parameter increases remain impressive.
- It is because of that parameter growth, however, that I often claim they are "brute forcing" AI. That approach is not sustainable long-term, as they will eventually hit practical limits, and you can already see parameter growth slowing down between GPT-3 and GPT-4.
- It will be very interesting to see what happens with GPT-5, and if we ever see GPT-6.
- Frankly speaking, I have doubts about GPT-6 ever appearing without some major change of architectural direction.
- Just like when the CPU industry pivoted from vertical scaling (clock speed measured in hertz) to horizontal scaling (the number of CPU cores for parallel processing), I fully expect the same pivot to happen with large language models in the years ahead.
- In fact I believe that pivot is close.
- All physical processes have upper limits. Eventually all data will be consumed, memory limits will be hit in the cloud, processing limits will be reached...
- Vertical scaling has limits, while horizontal scaling distributes those limits across multiple environments, which helps to prolong growth.
- I feel like we're in the "vertical scaling" phase of language model design; I look forward to federated model architectures running on cheap commodity hardware.
- In addition, solutions should scale down as well as up.
- We are in the mainframe phase of AI development.
- I eagerly await small language models that can run on low-powered edge devices in the wild (see the rough memory arithmetic after these notes).
- I want my talking door like in Ubik by PKD! If you get that reference, I salute you!
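- As a rough illustration of those hosting and edge-device limits, here is a hedged memory sketch: it assumes fp16 weights (2 bytes per parameter), counts weights only (no KV cache, activations, or serving overhead), and uses the leaked GPT-4 figure alongside two hypothetical smaller model sizes for comparison.

```python
# Rough weights-only memory footprint at fp16 (2 bytes per parameter).
# Assumptions: the 1.8T GPT-4 figure is the unconfirmed leaked number, and the
# smaller entries are generic, hypothetical model sizes used for comparison.

BYTES_PER_PARAM_FP16 = 2

def weight_memory_gb(params: float) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    return params * BYTES_PER_PARAM_FP16 / 1e9

for name, params in [
    ("GPT-4 (leaked figure)", 1.8e12),
    ("7B-class small model", 7e9),
    ("1B-class edge model", 1e9),
]:
    print(f"{name}: ~{weight_memory_gb(params):,.0f} GB of weights")
```

- That works out to roughly 3,600 GB of weights for the leaked GPT-4 figure versus ~14 GB for a 7B-class model and ~2 GB for a 1B-class model, which is why only the small end of that range can plausibly run on cheap commodity or edge hardware.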
- What I am working on this week:
- Greppr is now at 6.3 million documents indexed.
- Media I am enjoying this week:
- Maelstrom by Peter Watts, which is part 2 of his Rifter series.
- Back playing Escape from Tarkov with the release of patch 0.15.
Notes and subscription links are here: https://techleader.pro/a/658-There-is-no-Moore's-Law-for-AI-(TLP-2024w35)