The Nvidia Frenzy Has Just Begun
As we delve into the current dynamics of the artificial intelligence (AI) landscape, it is evident that Nvidia, traditionally seen as AMD's chief rival in the graphics processing unit (GPU) sector, is in fact facing its fiercest competition from tech giants like Google and Amazon. The emergence of generative AI applications, particularly since OpenAI's launch of ChatGPT, has driven exponential demand for Nvidia's GPUs, which have become the cornerstone of AI chip development. This surge has not come without significant challenges, notably production bottlenecks tied to TSMC's advanced packaging technologies, such as CoWoS (Chip on Wafer on Substrate), and to the high bandwidth memory (HBM) these powerful chips require. The result has been a global GPU shortage, escalating prices and intensifying the so-called "GPU craze."
Specifically, the Nvidia H100 GPU has emerged as the most sought-after model, with its prices reportedly soaring to around $40,000 each.
In response to these high prices, TSMC has doubled its advanced-packaging capacity, while major players in the DRAM market, such as SK Hynix, have ramped up their HBM manufacturing. These efforts have cut delivery times for the H100 from an initial 52 weeks down to 20 weeks. However, in examining whether this GPU frenzy will settle, it becomes apparent that demand from cloud service providers (CSPs) like Google, Amazon, and Microsoft vastly outstrips the available supply.
It is anticipated that through 2024, the high-end AI servers needed to develop and run ChatGPT-level applications will represent a mere 3.9% of overall server shipments. This suggests that the present Nvidia GPU frenzy may be only the beginning of a broader shift toward generative AI.
To better comprehend these developments, we must step back and consider the underlying factors behind Nvidia's GPU production challenges.
The main bottlenecks in the production of Nvidia's GPUs can be traced to several critical manufacturing stages handled predominantly by TSMC. These include the intermediary packaging processes, in which GPUs, CPUs, and HBM are fabricated and then mounted onto silicon interposers cut from 12-inch wafers. As GPU performance has escalated, chip dimensions have grown, and each unit requires more HBM modules. Because the interposers have grown larger over time, fewer can be cut from a given wafer, so per-wafer output has fallen even as demand has risen, creating a paradox in production rates.
Since TSMC introduced CoWoS in 2011, HBM modules, with their stacks of DRAM chips, have become increasingly scarce, high-demand resources, as the underlying DRAM shrinks roughly every two years to enhance performance.
This runaway need has created a critical scarcity of cutting-edge HBM. Looking ahead, TSMC has projected a doubling of its silicon interposer capacity, from 15,000 units per month in mid-2023 to over 30,000 units per month in 2024. Additionally, big players like Samsung and Micron have begun securing certifications to supply advanced HBM, further increasing competition and availability in this space.
These adjustments have led to a significant reduction in delivery times for Nvidia H100 GPUs. However, this raises an essential question: how many AI servers can actually see increased shipment volumes as a result? To answer it, we need to define the two main categories of AI servers expected to dominate the market in the years to come.
According to a report by DIGITIMES Research published in its Servers Report Database, the AI server landscape can be divided into two distinct segments. The first consists of systems equipped with two or more AI accelerators but no HBM, referred to as "general-purpose AI servers." The second comprises high-end AI servers, defined by a configuration of at least four AI accelerators, all equipped with HBM.
Among the many nuances of this market, it is essential to note that high-end AI servers are aimed primarily at supporting sophisticated applications like ChatGPT-level generative AI. Any analysis of server shipments must therefore consider the stark contrast in expectations between general-purpose and high-end AI servers.
Evaluating shipment volumes from 2022 through 2024, general-purpose AI servers are expected to increase from 344,000 units in 2022 to an estimated 725,000 units in 2024, while high-end AI servers, essential for generative AI, are predicted to grow from 34,000 units in 2022 to 564,000 units in 2024. Despite these figures, it remains crucial to ask whether the projected shipments can meet the rising demands of the American CSPs.
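A quick back-of-the-envelope check puts these projections in perspective. The shipment figures and the 3.9% share are taken from the article; the implied size of the overall 2024 server market is my own inference from those numbers, not a reported figure:

```python
# Growth multiples and implied market size from the cited projections.
# Inputs are the article's shipment figures; the total is derived.

general_2022, general_2024 = 344_000, 725_000
high_end_2022, high_end_2024 = 34_000, 564_000

general_growth = general_2024 / general_2022      # roughly 2.1x over two years
high_end_growth = high_end_2024 / high_end_2022   # roughly 16.6x over two years

# If 564,000 high-end units are 3.9% of all 2024 server shipments,
# the implied overall server market is about 14.5 million units.
implied_total_2024 = high_end_2024 / 0.039

print(f"general-purpose growth: {general_growth:.1f}x")
print(f"high-end growth:        {high_end_growth:.1f}x")
print(f"implied 2024 total:     {implied_total_2024:,.0f} servers")
```

The striking contrast is that the high-end segment grows an order of magnitude faster than the general-purpose one, yet still remains a sliver of total shipments.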
Particularly striking is the disparity between total server shipments and these two segments: over the last few years, the sheer volume of AI servers built has not satisfied the qualitative demands posed by generative AI applications.
Digging deeper into what is required to operate ChatGPT-level AI, reports indicate that Nvidia's high-end DGX H100 servers, each equipped with eight H100 chips costing around $40,000 apiece, could be needed in quantities of up to 30,000 units to fully realize ChatGPT's capabilities. Such an investment would amount to an overwhelming $13.8 billion, a clear indicator of the scale of spending that generative AI applications demand.
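Working backward from these figures clarifies how the $13.8 billion total decomposes (a sketch: the per-server price below is implied by the article's total, not quoted directly):

```python
# Sanity-check the $13.8B figure implied by 30,000 DGX H100 servers.
# GPU price and unit counts are from the article; the full per-server
# price is derived from the $13.8B total.

servers = 30_000
gpus_per_server = 8
h100_price = 40_000

gpu_cost_per_server = gpus_per_server * h100_price   # $320,000 in GPUs alone
implied_server_price = 13.8e9 / servers              # about $460,000 per server

print(f"GPU cost per server:  ${gpu_cost_per_server:,}")
print(f"implied server price: ${implied_server_price:,.0f}")
```

The gap between the roughly $320,000 GPU bill and the roughly $460,000 implied system price is consistent with the rest of the server, such as CPUs, networking, memory, and chassis, making up the difference.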
Looking more broadly, we might wonder how many generative AI models resembling ChatGPT have actually been implemented around the globe.
Given the limited high-end AI server shipments in 2022, there was barely enough capacity to build a single ChatGPT-like system. In 2023, the increase in high-end server shipments made possible perhaps six or seven systems of comparable performance. Projections suggest that 2024's shipments of 564,000 high-end AI servers would leave room to build between 18 and 19 ChatGPT-level systems, provided the initial assumptions about the server quantities required are accurate.
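These per-year counts follow directly from dividing shipments by the assumed 30,000-server requirement (a sketch using only the 2022 and 2024 shipment figures given here; the 2023 figure is not stated in this piece):

```python
# Approximate number of ChatGPT-scale systems buildable per year,
# assuming each requires ~30,000 high-end DGX H100-class servers
# (the article's working assumption).

servers_per_system = 30_000
shipments = {2022: 34_000, 2024: 564_000}  # high-end AI server shipments

for year, units in shipments.items():
    systems = units / servers_per_system
    print(f"{year}: {units:,} servers -> ~{systems:.1f} ChatGPT-scale systems")
```

The 2024 quotient of about 18.8 is where the "between 18 and 19 systems" estimate comes from.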
These estimates assume, however, that a ChatGPT-like AI can be built with exactly 30,000 high-end DGX H100 units. The growing complexity of subsequent AI generations may demand even more servers, further complicating the market's ability to meet demand.
Examining user demand, it is noteworthy that Microsoft, OpenAI's principal backer, led in high-end AI server volume in 2023 with 63,000 units. Looking to 2024, however, Google is anticipated to surpass Microsoft with a tally of 162,000 units. The landscape is reshaping, with players like Supermicro, Amazon, and Meta claiming significant shares of high-end AI servers for their own infrastructure demands and deployments.
In terms of specific hardware, Nvidia's GPUs remain the dominant AI accelerators, nearing a total of 336,000 units by 2024. Competing directly with this figure are Google's own Tensor Processing Units (TPUs), projected to reach 138,000 units, an unconventional twist in which a direct competitor is also a significant customer of Nvidia's GPUs. This duality underscores the competitive tension between these tech powerhouses.
As we close this analysis, it is evident that Nvidia's rivals such as Google and Amazon are maturing into formidable challengers in the AI acceleration space. What was once seen as a two-way rivalry with AMD is evolving into a multifaceted terrain in which these entities now play critical roles.
To summarize, projections indicate that by 2024 only 3.9% of all servers shipped will be capable of supporting the demands of sophisticated generative AI applications, highlighting a significant gap that must be narrowed if CSP needs are to be met. The current GPU craze surrounding Nvidia therefore reflects only the nascent stage of a broader generative AI evolution poised to unfold in the coming years.
In a more expansive view, the Semiconductor Industry Association (SIA) predicts that the global semiconductor market could exceed $1 trillion by 2030, with computing and data storage projected to command the largest share of this growth. As PCs and servers continue to dominate the statistics, rapid technology adoption in data centers points to an underlying growth trajectory likely to reshape hardware spending and investment going forward.
With data center growth a pivotal force in this evolving landscape, the core elements of network infrastructure, servers, and storage are expected to see exponential growth, paving the way for a robust semiconductor market dedicated to data-centric applications.
In conclusion, it is worth reiterating that the current GPU frenzy marks only the first steps toward a much broader inflection point in generative AI, one that is on the brink of transformative growth in the near future.