The Architecture of Intelligence: A Deep Dive into AI Computing Infrastructure
Introduction
Artificial intelligence does not live in the cloud in any meaningful physical sense. It lives in racks of accelerator chips, connected by fiber optic cables, cooled by rivers of chilled water, and powered by an electricity supply that already rivals that of entire nations. As AI workloads have grown from a curiosity to an industrial force — the International Energy Agency estimates global datacenter electricity consumption at roughly 415 TWh in 2024, about 1.5% of the world’s total — understanding the infrastructure behind it has become essential for technologists, policymakers, and investors alike.
This article maps the physical architecture of AI computing: how datacenters connect to one another and to the wider internet, how the major providers have distributed their capacity across North America and Europe, what the interior of a modern AI datacenter actually looks like, and why the supply of electricity has become the single most consequential bottleneck in the field.
Figure 1 — The four pillars of AI computing infrastructure and the scale of the challenge driving them.
Part I — General Architecture Overview
The global fabric: datacenter interconnection
A modern AI service is almost never housed in a single building. It spans a distributed network of datacenters connected by high-bandwidth private backbone links — owned or leased fiber-optic cables that carry traffic across continents and under oceans. The major hyperscalers each operate their own wide-area networks: Google’s B4 and Espresso, Microsoft’s SWAN, Amazon’s global backbone (surfaced to customers through services such as Global Accelerator), and Meta’s Express Backbone. These private networks exist because the public internet — designed for best-effort delivery — cannot guarantee the latency, bandwidth, or reliability that AI workloads demand, particularly for synchronising distributed training jobs across regions.
Interconnection between continents relies heavily on submarine fiber-optic cables. The North Atlantic alone carries dozens of cable systems linking the eastern United States and Canada to landing points in the UK, Ireland, France, and the Iberian Peninsula. Inside a continent, the major providers operate regional backbone rings with terabits of capacity, often running through carrier-neutral colocation hubs like Ashburn in Virginia, Chicago, Amsterdam, and Frankfurt. Internet Exchange Points (IXPs) — facilities where different networks come together to exchange traffic — serve as critical junction points, particularly for the handoff between provider backbones and the networks of telecom operators and enterprises.
Figure 2 — Simplified topology of a hyperscaler AI network. Datacenter regions within each continent are meshed over private backbone links, with transatlantic connections and user traffic entering through edge PoPs, API gateways, and IXPs.
Entry points and the edge
When an end user sends a prompt to an AI chatbot or an application calls an inference API, the request does not travel directly to a GPU cluster. It passes through multiple layers of edge infrastructure. First, DNS resolution directs the request to a nearby point of presence, typically a CDN edge node or a load-balancing endpoint. There, TLS termination, authentication, rate limiting, and initial request classification take place. The edge decides which backend datacenter region should handle the request — a decision based on latency, current load, model availability, and regulatory constraints such as data residency.
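The routing decision at the edge can be sketched as a scoring problem over candidate regions. The sketch below is purely illustrative; the field names, weights, and eligibility thresholds are assumptions for exposition, not any provider’s actual logic.

```python
# Illustrative edge-side region selection. All names, weights, and
# thresholds here are assumptions, not real routing policy.
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    rtt_ms: float        # measured latency from this edge PoP
    load: float          # 0.0 (idle) .. 1.0 (saturated)
    models: frozenset    # model identifiers served in-region
    jurisdiction: str    # e.g. "EU", "US" (data-residency constraint)

def pick_region(regions, model, required_jurisdiction=None):
    """Return the cheapest eligible region for a request, or None."""
    def eligible(r):
        if model not in r.models:
            return False
        if required_jurisdiction and r.jurisdiction != required_jurisdiction:
            return False
        return r.load < 0.95  # keep headroom for traffic bursts

    def cost(r):
        # Blend latency with a load penalty; the weights are arbitrary.
        return r.rtt_ms * (1.0 + 2.0 * r.load)

    candidates = [r for r in regions if eligible(r)]
    return min(candidates, key=cost) if candidates else None
```

Real systems fold in many more signals (capacity reservations, failure domains, cost tiers), but the shape is the same: filter by hard constraints, then rank by a soft cost.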
For inference workloads, proximity to the user matters: latency-sensitive applications like real-time conversational AI benefit from having inference capacity close to the request origin. Training workloads, by contrast, are not user-facing and can be routed to whichever region offers the best combination of GPU availability and power cost.
Workload distribution and orchestration
AI workloads fall into several distinct categories, each with different infrastructure demands. Training — the process of building a model from data — is the most compute-intensive, requiring thousands of GPUs communicating in tight synchrony for weeks or months. Inference — running a trained model to generate predictions or responses — is less demanding per-request but must be served at scale and low latency. Fine-tuning, reinforcement learning from human feedback (RLHF), and batch evaluation fall in between.
Figure 3 — The four primary AI workload categories and their relative infrastructure demands. Training dominates GPU, network, and power budgets. Inference is latency-sensitive and geographically distributed.
Orchestration systems decide where each workload runs. Large-scale training jobs are typically managed by cluster schedulers like Slurm (common in HPC-style environments) or custom Kubernetes-based orchestrators. GPU resources are allocated through reservation systems that account for multi-tenancy — different teams or customers sharing the same physical cluster. Geographic placement depends on a mix of factors: electricity cost, carbon intensity, available GPU capacity, regulatory requirements, and thermal conditions that affect cooling efficiency. Workload migration — moving a running job from one region to another — is possible but expensive, especially for training jobs that must checkpoint their state to storage before resuming.
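A toy version of such a geographic placement policy, assuming a weighted score over electricity price and carbon intensity with a hard capacity constraint — the factors and weights are illustrative, not any operator’s real scheduler:

```python
# Hypothetical multi-factor placement for a training job. Lower score wins.
def placement_score(usd_per_mwh, gco2_per_kwh, free_gpus, needed_gpus,
                    carbon_weight=0.5):
    """Score a region for a job, or None if the job cannot fit."""
    if free_gpus < needed_gpus:
        return None
    return usd_per_mwh + carbon_weight * gco2_per_kwh

def place_job(regions, needed_gpus):
    """regions maps name -> (usd_per_mwh, gco2_per_kwh, free_gpus)."""
    scored = {name: placement_score(*vals, needed_gpus)
              for name, vals in regions.items()}
    viable = {name: s for name, s in scored.items() if s is not None}
    return min(viable, key=viable.get) if viable else None
```

Under this kind of scoring, a cheap but carbon-heavy region can lose to a slightly pricier low-carbon one, mirroring the trade-offs listed above.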
Part II — Mapping Infrastructure to Major AI Providers
The AI infrastructure landscape in North America and Europe is dominated by a handful of hyperscale cloud operators, supplemented by a growing tier of GPU-specialized providers and publicly funded European initiatives.
Hyperscalers and cloud AI platforms
Microsoft Azure and OpenAI. Microsoft operates the world’s largest enterprise cloud and serves as the exclusive infrastructure provider for OpenAI. Key AI-focused regions include East US (Virginia), West US, and several European locations in the Netherlands, Ireland, and the UK. Microsoft has committed to spending over $80 billion on AI-capable datacenters in fiscal year 2025 alone. On the energy front, Microsoft signed a landmark 20-year power purchase agreement with Constellation Energy to restart the Unit 1 reactor at Three Mile Island — now renamed the Crane Clean Energy Center — to power its AI datacenters, with 835 MW of carbon-free output targeted for 2027–2028.
Google Cloud and DeepMind. Google is unique among the hyperscalers in that it designs its own AI accelerator hardware: the Tensor Processing Unit (TPU). TPU pods — large interconnected clusters of TPU chips — underpin both Google’s internal AI products and its Cloud AI offerings through Vertex AI. Major AI datacenter locations include Iowa, South Carolina, and Oregon in the US, and the Netherlands and Finland in Europe. Google signed a first-of-its-kind agreement with Kairos Power for up to 500 MW of small modular reactor (SMR) capacity, with deployment anticipated through 2035.
Amazon Web Services. AWS holds the largest market share in cloud computing overall and has invested heavily in AI with custom silicon — the Trainium chip for training and Inferentia for inference — alongside extensive NVIDIA GPU offerings. Key regions include Northern Virginia, Oregon, Ohio, and in Europe, Ireland and Frankfurt. AWS signed a power purchase agreement with Talen Energy for nearly 2 GW from the Susquehanna nuclear plant in Pennsylvania.
Meta. Meta’s AI infrastructure supports both internal research (including the Llama family of models) and the inference capacity required to serve AI features across its platforms. The company built its Research Super Cluster and subsequent Grand Teton clusters using open-compute hardware designs. Meta notably trained its Llama 3 model on a 24,000-GPU cluster that used an Ethernet-based (RoCE) fabric rather than InfiniBand, demonstrating that carefully tuned Ethernet can match InfiniBand performance at scale. In mid-2025, Meta signed a 20-year PPA with Constellation Energy for 1.1 GW from the Clinton nuclear plant in Illinois.
GPU-specialized cloud providers
Alongside the hyperscalers, a tier of companies has emerged that focus specifically on providing GPU compute. CoreWeave, built from the ground up as a GPU-native cloud provider, has rapidly expanded its North American footprint in partnership with NVIDIA and has attracted significant venture funding. Lambda Labs offers on-demand GPU clusters aimed at AI researchers and startups. Oracle Cloud Infrastructure (OCI) has repositioned itself around bare-metal GPU “superclusters” for AI training, aggressively competing on price and cluster size. In Europe, Nebius (formerly part of Yandex) has built AI-focused cloud operations with GPU clusters available across multiple European locations.
European sovereign and public AI infrastructure
Europe has taken a distinct approach to AI compute, investing in publicly funded supercomputer infrastructure alongside commercial cloud. The EuroHPC Joint Undertaking — a collaboration between the EU and participating countries — funds a network of high-performance computing systems explicitly intended to provide sovereign AI capability. Notable systems include LUMI in Finland (one of the world’s most powerful supercomputers, based on AMD Instinct GPUs), Leonardo in Italy, MareNostrum 5 in Spain, and JUPITER in Germany, the first European exascale system. At the national level, France operates Jean Zay, the UK has invested in Isambard-AI and Dawn, and Germany hosts further systems at the Jülich Supercomputing Centre. The Gaia-X initiative and EU Data Act further shape the regulatory and architectural constraints on how AI infrastructure is deployed on European soil, with GDPR data residency requirements driving the need for significant in-region capacity.
Colocation and wholesale providers
Not every AI operator builds its own datacenters. Colocation providers — Equinix, Digital Realty, CyrusOne, Vantage, QTS — play a critical role by offering purpose-built facilities that tenants lease. The shift toward AI has forced the colocation industry to upgrade: traditional facilities designed for 6–15 kW per rack are giving way to AI-ready halls rated for 40–100+ kW per rack, with integrated liquid cooling infrastructure. The primary colocation hubs — Ashburn, Dallas, Chicago, and Phoenix in North America, and Amsterdam, Frankfurt, London, Dublin, and the Nordic countries in Europe — are the same locations where the hyperscalers cluster, creating ecosystems of interconnected facilities, power substations, and fiber routes.
Part III — Inside a Modern AI Datacenter
Physical layout and site selection
An AI datacenter campus is selected based on a tightly interlocking set of criteria: sufficient grid power (ideally hundreds of megawatts, with room to grow to a gigawatt), robust fiber connectivity, favourable climate for cooling, available land with appropriate zoning, and reasonable permitting timelines. Modern AI campuses are designed at a scale that would have been unusual even five years ago — multiple buildings, each housing thousands of GPUs, with shared power substations, cooling plants, and networking infrastructure. Construction is phased: buildings are brought online in stages to align with equipment delivery schedules and power availability.
The AI compute floor
The core of an AI datacenter is the compute floor — rows of racks containing accelerator servers. The organizational hierarchy runs from individual chips to nodes to racks to pods to superpods. A modern NVIDIA DGX GB200 NVL72 rack, for example, packs 72 Blackwell GPUs and 36 Grace CPUs into a single liquid-cooled enclosure that weighs about 1.4 metric tons and consumes approximately 120 kW of power. HPE’s configuration of the same system draws up to 132 kW, with 115 kW going to liquid-cooled components and 17 kW to air-cooled peripherals. This represents a dramatic departure from the 6–15 kW per rack typical of conventional datacenters just a few years earlier.
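These per-rack figures make campus sizing a matter of arithmetic. A rough sketch, using the 72-GPU, ~132 kW rack described above and an assumed facility PUE of 1.15:

```python
# Back-of-envelope floor sizing for NVL72-class racks. The 132 kW and
# 72-GPU figures come from the text; the PUE of 1.15 is an assumption.
def racks_supported(feed_mw, kw_per_rack=132.0, pue=1.15):
    """Racks a utility feed can power, net of cooling/facility overhead."""
    it_kw = feed_mw * 1000.0 / pue   # power left for IT equipment
    return int(it_kw // kw_per_rack)

def gpus_supported(feed_mw, gpus_per_rack=72, **kwargs):
    return racks_supported(feed_mw, **kwargs) * gpus_per_rack

# A 100 MW feed sustains roughly 658 racks, i.e. about 47,000 GPUs.
```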
Figure 4 — The internal hierarchy of an AI datacenter. GPUs are assembled into nodes, which populate racks (like the 72-GPU NVL72), which interconnect into superpods, and then into full clusters. Below the compute layer sit the networking fabric, tiered storage, and cooling infrastructure.
Within each rack, GPUs communicate with their neighbours over NVLink — NVIDIA’s proprietary high-speed interconnect that delivers up to 1.8 TB/s per GPU in the fifth generation — and NVSwitch, which enables an all-to-all connection among all 72 GPUs within a rack, creating what NVIDIA describes as a single massive GPU. Across racks, communication relies on the datacenter network fabric.
Datacenter networking: InfiniBand vs. Ethernet
The choice of network fabric for AI clusters has been one of the most significant infrastructure debates of recent years. Historically, InfiniBand dominated, holding over 80% market share for AI back-end networks as recently as 2023. InfiniBand provides sub-2-microsecond latency with credit-based, lossless flow control — hardware-level guarantees that data is never dropped. For AI training workloads, where thousands of GPUs must synchronise gradient updates through all-reduce and all-gather operations, this consistency has been invaluable.
However, the landscape is shifting rapidly. The Ultra Ethernet Consortium released its UEC 1.0 specification in June 2025, which fundamentally rearchitected the Ethernet stack for AI workloads with new congestion control, transport protocols, and packet-spraying techniques. Meta demonstrated that its tuned RoCE (RDMA over Converged Ethernet) fabric could match InfiniBand performance for large model training. Dell’Oro Group reported in mid-2025 that Ethernet had overtaken InfiniBand in AI back-end network deployments, driven by cost advantages and multi-vendor flexibility. The trend is clear: while InfiniBand retains an edge in ultra-large-scale, latency-critical deployments, Ethernet is becoming the predominant choice for the majority of AI clusters.
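The bandwidth arithmetic underlying this debate is easy to sketch. A bandwidth-optimal ring all-reduce moves roughly 2(N-1)/N of the gradient volume over every link, which puts a hard floor under synchronisation time on either fabric; the example numbers below are illustrative.

```python
def ring_allreduce_seconds(num_gpus, grad_bytes, link_gbps):
    """Lower bound for a bandwidth-optimal ring all-reduce: each rank
    transfers 2*(N-1)/N of the gradient size over its network link."""
    bytes_on_wire = 2.0 * (num_gpus - 1) / num_gpus * grad_bytes
    return bytes_on_wire / (link_gbps * 1e9 / 8.0)

# FP16 gradients of a 70B-parameter model (~140 GB) across 1,024 GPUs on
# 400 Gb/s links need at least ~5.6 s per full synchronisation, before
# any latency, congestion, or packet-loss effects are added.
```

This is also why tail behaviour matters so much: a single slow or lossy link delays every participant in the collective.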
Storage architecture
AI training clusters require storage systems that can feed data to thousands of GPUs without starving the compute pipeline. The primary tier is typically a parallel file system — Lustre, GPFS/Spectrum Scale, WEKA, or VAST Data — backed by NVMe flash storage capable of delivering hundreds of gigabytes per second of aggregate read throughput. Beneath this sits a warm tier for less frequently accessed datasets and a cold tier using object storage for long-term archival. A critical function is checkpoint storage: long-running training jobs periodically save their state so they can resume after hardware failures. These checkpoints can be many terabytes in size, and writing them quickly enough to avoid stalling the training job requires substantial storage bandwidth.
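The interaction between checkpoint size, write bandwidth, and failure rate can be estimated with Young’s classical first-order approximation for the optimal checkpoint interval; the numbers below are illustrative assumptions, not measurements.

```python
import math

def checkpoint_write_seconds(ckpt_bytes, agg_write_gbs):
    """Time to flush one checkpoint at a given aggregate bandwidth (GB/s)."""
    return ckpt_bytes / (agg_write_gbs * 1e9)

def young_interval_seconds(ckpt_seconds, mtbf_seconds):
    """Young's approximation: checkpoint every sqrt(2 * C * MTBF) seconds."""
    return math.sqrt(2.0 * ckpt_seconds * mtbf_seconds)

# A 10 TB checkpoint at 500 GB/s aggregate write bandwidth takes ~20 s;
# with a cluster MTBF of 6 hours, the optimal interval is ~15.5 minutes.
```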
Cooling: from air to liquid
The transition from air cooling to liquid cooling is arguably the most visible physical change in AI datacenters. A traditional server generating 300–500 watts could be cooled effectively by moving air through the rack with fans. A GPU accelerator node drawing 5+ kW, packed 18 to a rack, cannot. At 120–132 kW per rack, air cooling is physically insufficient — the air simply cannot absorb and remove heat fast enough.
Figure 5 — Comparison of traditional air-cooled datacenter design (left) with modern direct liquid cooling (right). At AI-grade power densities of 100+ kW per rack, liquid cooling is not optional — it is the only viable approach.
Direct liquid cooling (DLC) uses cold plates mounted directly on the hottest components — GPUs, CPUs, and HBM memory — with chilled water or a dielectric coolant flowing through them. The NVIDIA DGX GB200 system is designed around this approach, with coolant entering at 25°C and exiting approximately 20 degrees warmer. Coolant Distribution Units (CDUs) on the data hall floor exchange heat between the facility water loop and the rack-level loop. The heat is ultimately rejected outdoors through cooling towers, dry coolers, or — increasingly — captured for reuse in district heating systems, particularly in Nordic countries. Immersion cooling, where entire server boards are submerged in a non-conductive liquid, represents a more radical approach that some operators are exploring for the highest-density deployments.
The cooling system directly affects Power Usage Effectiveness (PUE), the ratio of total facility energy to IT equipment energy. A PUE of 1.0 would mean every watt goes to useful computation; real-world air-cooled facilities typically achieve 1.3–1.6, while well-designed liquid-cooled AI facilities can reach 1.1–1.2. Given the scale of power involved, even small PUE improvements translate to millions of dollars in annual savings.
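The savings claim is simple arithmetic. A sketch, assuming an illustrative wholesale price of $70/MWh:

```python
# Annual cost of facility overhead (everything above PUE 1.0).
# The $70/MWh price is an assumed round number for illustration.
def annual_overhead_cost_usd(it_mw, pue, usd_per_mwh=70.0):
    overhead_mw = it_mw * (pue - 1.0)        # cooling, power conversion, etc.
    return overhead_mw * 8760 * usd_per_mwh  # 8,760 hours in a year

# Moving a 100 MW IT load from PUE 1.4 to PUE 1.1 saves ~$18.4M per year.
savings = annual_overhead_cost_usd(100, 1.4) - annual_overhead_cost_usd(100, 1.1)
```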
Part IV — The Power Supply Issue
Scale of the problem
The IEA estimated global datacenter electricity consumption at approximately 415 TWh in 2024 — about 1.5% of world electricity use — growing at 12% per year. By 2030, this is projected to roughly double to 945 TWh in the agency’s base case, approaching 3% of global electricity consumption. In the United States alone, datacenter electricity consumption reached 183 TWh in 2024 and is projected to surge to over 400 TWh by 2030 — more than all energy-intensive manufacturing combined, including aluminium, steel, cement, and chemicals.
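As a sanity check, the growth rate implied by these endpoints can be computed directly:

```python
def implied_cagr(start, end, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end / start) ** (1.0 / years) - 1.0

# 415 TWh (2024) to 945 TWh (2030) implies ~14.7% compound annual growth,
# i.e. the IEA base case assumes growth accelerates beyond the recent 12%/yr.
```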
Key Statistic: In the US state of Virginia, datacenters already consume approximately 26% of total electricity. In Dublin, the figure reaches 79%, according to analysis by the Öko-Institut. Half of all projected US electricity demand growth through 2030 is expected to come from datacenters. (Sources: Carbon Brief, September 2025; Pew Research Center, October 2025)
The concentration effect makes this more than a national-level concern. Datacenter development clusters in specific locations — Northern Virginia, Dallas, Phoenix, Amsterdam, Dublin, Frankfurt — creating intense local grid pressure. In the PJM Interconnection market (Illinois to North Carolina), datacenter demand contributed to an estimated $9.3 billion price increase in the 2025–26 capacity market, raising average residential electricity bills by $16–18 per month in affected areas.
Figure 6 — Global datacenter electricity consumption, historical and projected. Consumption is expected to more than double between 2024 and 2030, driven primarily by AI workloads. Source: IEA Energy and AI report (2025).
Grid interconnection and utility partnerships
Securing a grid connection for a large AI datacenter has become one of the longest lead-time items in the entire supply chain. In major markets, the queue for new high-voltage connections stretches years — in some regions, five or more years — because connecting a 500 MW campus requires substation upgrades or entirely new transmission lines. This has led operators to pursue alternative strategies: behind-the-meter generation (building power plants directly on site), co-location near existing power sources (nuclear plants, hydroelectric dams), and long-term power purchase agreements (PPAs) that guarantee supply and help finance new generation capacity.
Energy sources and the nuclear renaissance
The most significant energy trend in AI infrastructure is the emergence of nuclear power as the preferred source for large-scale, carbon-free baseload electricity. The deals being signed are remarkable in both scale and duration:
- Microsoft / Constellation Energy: 20-year PPA to restart Three Mile Island Unit 1 (835 MW) as the Crane Clean Energy Center. $1.6 billion invested, $1 billion DOE loan secured. Targeting 2027–2028 restart.
- Meta / Constellation Energy: 20-year PPA for the entire 1.1 GW output of the Clinton nuclear plant in Illinois.
- Amazon / Talen Energy: PPA covering nearly 2 GW from the Susquehanna nuclear station in Pennsylvania.
- Google / Kairos Power: First corporate SMR fleet agreement in the US — up to 500 MW across 6–7 reactors through 2035.
Figure 7 — The evolving energy mix for AI datacenters across three time horizons. Natural gas dominates the near term, while nuclear emerges as the strategic long-term choice for baseload, carbon-free power.
Renewables play an important but more complex role. Solar and wind PPAs are widely used — and the IEA projects that renewables and nuclear together will supply nearly 60% of datacenter electricity by 2030, up from about 35% today. However, their intermittency means they cannot alone provide the 24/7 baseload that AI training demands. Battery Energy Storage Systems (BESS) can help bridge gaps, but the economics and scale of multi-hour storage remain challenging for continuous industrial loads. Hydroelectric power is highly attractive where available — Quebec, Norway, Sweden, and the US Pacific Northwest — offering both reliability and near-zero emissions.
Natural gas remains the dominant near-term source for new AI datacenter capacity, particularly in the United States. Combined-cycle gas plants can be permitted and built faster than nuclear facilities, and they provide firm, dispatchable power. The tension between the AI industry’s corporate net-zero pledges and its rapidly growing absolute energy consumption is real and unresolved: several major hyperscalers have reported rising emissions in recent years, driven directly by datacenter expansion.
Efficiency and mitigation strategies
Hardware efficiency improvements offer meaningful relief. Each new generation of AI accelerator delivers substantially more computation per watt — NVIDIA claims the Blackwell architecture is up to 25 times more energy-efficient than the H100 for certain inference workloads. Techniques such as model quantisation (reducing numerical precision from FP16 to FP8 or even FP4), sparsity-aware training, mixture-of-experts architectures, and knowledge distillation reduce the total compute (and thus energy) required to achieve a given level of model capability. On the infrastructure side, improved PUE through liquid cooling, waste heat recovery, and intelligent workload scheduling — shifting batch jobs to off-peak hours or periods of high renewable generation — all contribute.
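The memory arithmetic behind quantisation is straightforward: halving the bits per weight halves both the storage footprint and the memory traffic per token served. A minimal sketch, counting format sizes only and ignoring quantisation scales, zero-points, and activation memory:

```python
# Bytes per parameter for common formats; FP4 packs two weights per byte.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1, "fp4": 0.5}

def weight_bytes(num_params, fmt):
    """Raw weight storage for a model in the given numeric format."""
    return num_params * BYTES_PER_PARAM[fmt]

# A 70B-parameter model: 140 GB in FP16, 70 GB in FP8, 35 GB in FP4.
```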
However, these efficiency gains face a familiar adversary: Jevons’ paradox. As AI becomes more efficient and therefore cheaper and more useful, demand grows to more than offset the savings. The IEA projects that accelerated server electricity consumption will grow at roughly 30% annually through 2030, far outpacing the efficiency improvements.
Policy, regulation, and geopolitics
Governments are responding to the energy impact of AI datacenters with a mix of support and restraint. In the United States, the federal government has identified datacenter development as a national priority, committing land, streamlined permitting, and financial support (including the $1 billion DOE loan for the Crane Clean Energy Center restart). Many US states offer tax incentives and expedited permitting to attract datacenter investment.
In Europe, the picture is more conflicted. The Netherlands and Ireland — both major datacenter hubs — have imposed moratoriums or restrictions on new datacenter construction in certain areas, driven by concerns over grid strain, water use, and conflict with residential electricity needs. At the same time, EU industrial policy through EuroHPC and the European Chips Act aims to build sovereign AI compute capacity, creating a tension between environmental caution and strategic technology investment.
At the geopolitical level, access to energy — and therefore to AI computing capacity — is becoming a competitive moat and a matter of national security. The concentration of AI training infrastructure in a small number of locations and under a small number of corporate operators raises questions about resilience, strategic dependence, and the distribution of the technology’s benefits. Sovereign wealth funds from the Middle East and the Nordic countries are investing heavily in AI datacenter capacity, positioning energy access as a foundation for participation in the AI economy.
Conclusion
Artificial intelligence is often discussed as if it were purely a software phenomenon — a matter of algorithms, data, and research breakthroughs. But every parameter of every model, every token generated in every conversation, ultimately rests on a physical substrate: silicon chips, copper and optical cables, chilled water flowing through cold plates, and electrons generated by turbines, reactors, and solar panels. The infrastructure described in this article — from the global fabric of interconnected datacenters to the liquid-cooled rack consuming 120 kW of power — is what makes modern AI possible.
The constraints are real and growing. Power supply, not chip availability, is increasingly the binding constraint on AI scaling. The decisions being made today about grid connections, nuclear plant restarts, and datacenter siting will shape the trajectory of AI capability for decades. Understanding this infrastructure is no longer optional for anyone working in, investing in, or making policy about artificial intelligence.
Sources and Further Reading
Energy and Power Data
- International Energy Agency — Energy and AI special report (April 2025)
- IEA — Electricity Mid-Year Update 2025 — demand projections
- Pew Research Center — What we know about energy use at US data centers amid the AI boom (October 2025)
- Carbon Brief — AI: Five charts that put data-centre energy use into context (September 2025)
- S&P Global — Global data center power demand to double by 2030 on AI surge: IEA (April 2025)
Nuclear and Energy Deals
- Constellation Energy / Microsoft — Three Mile Island (Crane Clean Energy Center) restart, 20-year PPA, $1.6B investment
- Pennsylvania Capital-Star — Microsoft describes Three Mile Island plant as once-in-a-lifetime opportunity (June 2025)
- NucNet — Constellation Secures $1 Billion Federal Loan For Three Mile Island Restart (2025)
- Introl Blog — Nuclear power for AI: inside the data center energy deals (January 2026)
- Commonfund — AI Data Center and AI Power Demand — Will Nuclear Be the Answer? (September 2025)
Datacenter Hardware and Architecture
- NVIDIA — GB200 NVL72 product documentation and specifications
- HPE — NVIDIA GB200 NVL72 by HPE QuickSpecs (132 kW per rack)
- NVIDIA — DGX GB200 User Guide — hardware architecture
- The Register — A closer look at Nvidia’s 120kW DGX GB200 NVL72 rack system (March 2024)
- Sunbird DCIM — Is Your Data Center Ready for the NVIDIA GB200 NVL72?
Networking
- Dell’Oro Group — AI Networks for AI Workloads report (July 2025)
- Ultra Ethernet Consortium — UEC 1.0 Specification (June 2025)
- Vitex LLC — InfiniBand vs Ethernet for AI Clusters: GPU Networks 2025 (November 2025)
- TrendForce — InfiniBand vs Ethernet: Broadcom and NVIDIA Scale-Out Tech War (October 2025)
- Network World — Ethernet, InfiniBand, and Omni-Path battle for the AI-optimized data center (September 2025)
- WWT — The Battle of AI Networking: Ethernet vs InfiniBand (February 2025)