The Computer Was Built for Humans. At GTC Taipei, Jensen Huang Just Built One for AI
For 40 years, every computer ever made was designed around one fundamental assumption: a human being was using it. At Computex 2026, NVIDIA founder and CEO Jensen Huang dismantled that legacy blueprint during a visionary two-hour keynote to open NVIDIA GTC Taipei. In its place, Huang established a new computing paradigm for the tech industry: the primary user of next-generation computing infrastructure is no longer a person—it is an autonomous AI agent.
From AI That Answers to AI That Works
For most people, AI still means a chatbot — type a question, receive an answer. Agentic AI operates differently. An agent receives a goal, breaks it into steps, selects and uses the right tools, checks its own output, and completes the task without being prompted at each stage. The difference is not speed. It is autonomy. Think of it as the gap between asking a colleague a question and handing them a project.
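The loop described above (receive a goal, break it into steps, call tools, check the output) can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `Agent` class, the `TOOLS` table, and the fixed plan are all hypothetical stand-ins, and a real agent would ask a language model to produce the plan.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tools: plain functions the agent may call by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"notes on {query}",
    "summarise": lambda text: text.upper(),
}


@dataclass
class Agent:
    memory: list[str] = field(default_factory=list)  # working memory of step results

    def plan(self, goal: str) -> list[tuple[str, str]]:
        # Assumption: a real agent would generate this plan with a language model.
        return [("search", goal), ("summarise", "{previous}")]

    def check(self, output: str) -> bool:
        # Stand-in for self-verification: accept any non-empty result.
        return bool(output.strip())

    def run(self, goal: str) -> str:
        previous = ""
        for tool_name, argument in self.plan(goal):
            # Feed the previous step's output into the next step where requested.
            output = TOOLS[tool_name](argument.replace("{previous}", previous))
            if not self.check(output):
                raise RuntimeError(f"step {tool_name!r} produced no usable output")
            self.memory.append(output)
            previous = output
        return previous


agent = Agent()
result = agent.run("rack cooling")
```

The caller supplies only the goal; every intermediate step runs without a further prompt, which is the distinction the paragraph draws between an agent and a chatbot.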
Huang’s evidence that this transition has already happened was concrete. GitHub commit activity — a reliable measure of global software output — tripled in the first months of 2026, after growing from 300 million commits in 2023 to 500 million in 2025. “30 million software developers represent about $3 trillion worth of salaries per year,” he said. “That $3 trillion is now producing nearly three times as much output — effectively $9 trillion of productivity.”
The argument he drew from that data cuts against the dominant narrative around AI and employment: agents are not replacing engineers, they are multiplying what each engineer produces. And because every unit of AI output — every token — has become a profitable unit of revenue, demand for the infrastructure that generates those tokens has become the defining constraint on the global technology economy. Taiwan’s manufacturers, Huang noted, are currently running at full capacity to meet it.
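The arithmetic behind the quoted productivity figure is easy to check. The sketch below rounds Huang's “nearly three times” to exactly 3 and assumes, as the keynote framing does, that the value of output scales in line with commit volume; both are simplifications rather than measurements.

```python
# Figures quoted in the keynote.
developers = 30_000_000
total_salaries_usd = 3_000_000_000_000

# Assumption: "nearly three times" is rounded to exactly 3.
output_multiplier = 3

average_salary_usd = total_salaries_usd // developers
implied_output_usd = total_salaries_usd * output_multiplier

print(f"Implied average salary: ${average_salary_usd:,}")
print(f"Implied output value:   ${implied_output_usd:,}")
```

The implied average of $100,000 per developer is a global mean, so the $9 trillion figure is best read as an order-of-magnitude estimate.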
What makes agentic AI structurally different:
- An agent is not a faster chatbot. It combines a large language model, an orchestration harness, tools, working memory, and a runtime — closer in structure to an operating system than to any application built before it. The entire computing stack underneath it needs redesigning, not upgrading.
- Every reasoning step is expensive. Each time an agent thinks through a problem, an entire rack of GPU hardware activates. Inference efficiency — not raw capability — is now the defining competitive variable for AI companies.
- The speed mismatch is a real cost. Agents operate at nanosecond timescales. Conventional server infrastructure was built for humans, who measure response times in seconds. That gap is not a minor inefficiency — it means today’s standard data centre hardware was never designed for this workload.
- Twenty years of NVIDIA software now works as agent tools. NVIDIA’s 1,000-plus CUDA-X libraries — spanning genomics, physics simulation, and financial modelling — are being repackaged with readable instructions so agents can learn to use them directly, giving agentic systems access to decades of optimised scientific computing overnight.
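The last point, libraries repackaged with readable instructions, can be illustrated with a small catalogue in which each tool carries a plain-language description an agent can read before choosing what to call. Everything here is hypothetical: the `Skill` class, the two toy functions, and the keyword matching are illustrative stand-ins and do not reflect how CUDA-X libraries are actually packaged.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str              # plain-language description an agent can read
    run: Callable[[float], float]  # the underlying routine


# Hypothetical stand-ins; a real library call would sit behind `run`.
CATALOGUE = [
    Skill("kinetic_energy",
          "Physics: energy in joules of a 1 kg mass at a given speed in m/s.",
          lambda speed: 0.5 * speed ** 2),
    Skill("compound_interest",
          "Finance: value of 1 unit after 10 years at a given annual rate.",
          lambda rate: (1 + rate) ** 10),
]


def describe(catalogue: list[Skill]) -> str:
    """Render the catalogue as text that could be placed in a model's context."""
    return "\n".join(f"{s.name}: {s.instructions}" for s in catalogue)


def choose(catalogue: list[Skill], request: str) -> Skill:
    """Naive selection: first skill whose instructions mention a word of the request."""
    words = request.lower().split()
    for skill in catalogue:
        text = skill.instructions.lower()
        if any(word in text for word in words):
            return skill
    raise LookupError("no skill matches the request")


skill = choose(CATALOGUE, "finance question")
value = skill.run(0.0)
```

In practice a language model would do the matching; the point of the sketch is only that the description travels with the tool.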
Vera Rubin: A System Built for What Agents Actually Do
A single AI model and an agentic system are not the same computational problem. A model responds to inputs. An agent coordinates — managing memory, spawning sub-tasks, calling tools, retrieving data, and synthesising results across simultaneous processes. The hardware built for one does not naturally handle the other.
Vera Rubin, now in full production, is NVIDIA’s purpose-built answer. Huang called it “the most ambitious endeavour in the history of our company”, and it is not a chip. It is a five-component rack-scale system: a GPU rack for high-throughput token generation; a CPU rack of 256 liquid-cooled Vera processors for orchestration; a low-latency inference rack built around Groq LPUs; a BlueField-4 storage and security system; and the world’s first Ethernet switch with 200-gigabit co-packaged optics. In total, the system comprises 1.3 million components supplied by 150 partners across Taiwan. What previously took two hours to assemble now takes five minutes. The supply chain supporting Vera Rubin is twice the scale of the one behind its predecessor, Grace Blackwell.
The economics are unambiguous. A one-gigawatt AI factory now costs between $50 billion and $100 billion to build. At that level of commitment, every watt of compute is revenue and every idle watt is loss. NVIDIA’s DSX infrastructure platform manages this through dynamic power allocation, recovery of stranded capacity, and hot liquid cooling at 45 degrees Celsius that frees power from thermal management and redirects it to token generation. “The more you buy, the more you make,” Huang said. At $100 billion per factory, that is an operating model, not a slogan.
The Vera CPU: The First Processor Designed for a Non-Human User
This is where the keynote’s central argument was at its sharpest. Every CPU ever built, from the first Intel processor to the latest AMD server chip, was designed for people. Humans work in seconds. CPUs were optimised accordingly: maximise cores, virtualise capacity, rent by the hour. Agents work in nanoseconds. When an agent needs to retrieve a memory or execute a code task, every millisecond of delay stalls the entire reasoning chain and leaves the GPU beside it, which costs thousands of dollars per operational hour, sitting idle. “We built CPUs for humans in the past,” Huang said plainly. “This CPU is built for agents.”
Vera is the result — and its performance gap over existing x86 processors is not incremental.
Vera CPU against current leading alternatives:
- 10 instructions per clock cycle — roughly double Intel and AMD’s best. Leading server CPUs from Intel and AMD process 4 to 6 instructions per clock cycle under real workloads. Vera’s 10-wide pipeline completes significantly more work per tick, directly cutting the time agents spend waiting for the CPU to keep pace with the GPU.
- 3.6 terabytes per second internal bandwidth — eliminating the core-to-core congestion that slows agents. Conventional CPUs move data between cores across a shared bus that clogs under load. Vera replaces this with a monolithic mesh connecting all 88 cores simultaneously — meaning hundreds of concurrent agents no longer queue behind each other.
- 1.2 terabytes per second memory bandwidth, roughly 2.4 to 4 times the quoted x86 range. Top Intel and AMD server CPUs deliver 300 to 500 gigabytes per second. Vera’s LPDDR5X memory delivers more than twice the top of that range, which matters because agents constantly pull large context windows, cached states, and tool outputs from memory. Faster retrieval means less GPU downtime.
- 40% lower memory latency than x86 — compounding across every step an agent takes. In agentic systems, a single task can involve hundreds of sequential memory retrievals. A 40% reduction in the time between requesting and receiving data accumulates across the entire chain, cutting total task completion time meaningfully.
- SQL three times faster — directly relevant to every enterprise deploying agents. SQL is the query language running virtually every business database on the planet. Any enterprise agent operating in finance, logistics, healthcare, or operations spends significant time on structured data retrieval. Three times faster queries translate directly into faster agent responses and higher throughput per dollar spent.
- Real-time stream processing six times faster — benchmarked against the New York Stock Exchange. Financial market data processing is among the most latency-sensitive, high-volume streaming workloads in existence. A sixfold improvement there signals that Vera’s advantages extend well beyond AI-specific tasks to any system processing continuous data in real time.
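The ratios in this list can be reproduced from the quoted figures, and the latency claim can be made concrete with a toy workload. In the sketch below, the bandwidth and instruction-per-clock numbers come from the list above; the workload (500 sequential retrievals at 100 ns each plus 50,000 ns of other work) is a hypothetical example chosen only to show how a per-step saving carries through to a whole task.

```python
def ratio_range(new: float, old_low: float, old_high: float) -> tuple[float, float]:
    """Return the (smallest, largest) ratio of a new figure over an older range."""
    return new / old_high, new / old_low


# Figures quoted in the list above.
bandwidth_ratio = ratio_range(1200, 300, 500)  # GB/s, Vera vs. quoted x86 range
ipc_ratio = ratio_range(10, 4, 6)              # instructions per clock cycle


def task_time_ns(retrievals: int, latency_ns: float, other_ns: float) -> float:
    """Total time for a chain of sequential memory retrievals plus all other work."""
    return retrievals * latency_ns + other_ns


# Hypothetical workload, not a measured one.
before = task_time_ns(500, 100, 50_000)
after = task_time_ns(500, 60, 50_000)  # 40% lower latency per retrieval
saving = 1 - after / before

print(f"Bandwidth ratio: {bandwidth_ratio[0]:.1f}x to {bandwidth_ratio[1]:.1f}x")
print(f"IPC ratio:       {ipc_ratio[0]:.1f}x to {ipc_ratio[1]:.1f}x")
print(f"Task time saved: {saving:.0%}")
```

In this example retrievals account for half the original task time, so a 40% latency cut saves 20% overall. The total saving always depends on how much of the task is spent waiting on memory.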
NVIDIA projects Vera to be the fastest and most successful product launch in its history. The orders, Huang said, are already placed.
The Software Stack: Removing the Barrier to Building Agents
Hardware alone does not make agents deployable. Before writing a single line of business logic, most organisations face months of infrastructure work: security sandboxes, memory management, compliance controls, model integration. The NVIDIA Agent Toolkit for Enterprise AI removes that barrier, providing a pre-built foundation much as Windows once spared developers from writing their own operating systems.
The toolkit has four layers: open models companies can fine-tune; OpenShell, a secure orchestration harness now adopted by Red Hat, Canonical, and Microsoft; CUDA-X tools packaged with readable skills agents can learn; and a runtime that operates identically across cloud, enterprise, laptop, and robot. Claude Code and Codex both run inside this harness.
The most arresting demonstration came from chip design. Verifying that a processor design works correctly before fabrication previously consumed weeks per cycle at NVIDIA — millions of simulations, thousands of engineers. A super agent built on Codex, Nemotron, and Cadence’s verification tools compressed that to hours. “From weeks to hours,” Huang said — three times. For any engineering-intensive industry where design iteration is the bottleneck, the implication is direct.
The underlying open model, Nemotron 3 Ultra, combines State Space Models with Mixture of Experts to run five times faster and cost 30% less to operate than comparable alternatives. NVIDIA releases the model, training data, and training scripts publicly so organisations can build proprietary versions without starting from nothing. Nemotron 4 is already in development.
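Part of why a Mixture of Experts design can be cheaper to run is that only a few experts are evaluated for each token. The sketch below shows generic top-k routing with toy experts. It is an illustration of the general technique under simplifying assumptions, not Nemotron's architecture, and it leaves out the State Space Model component entirely.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Convert raw router scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, experts, token, top_k=2):
    """Run only the top_k highest-weighted experts and blend their outputs."""
    weights = softmax(router_scores)
    ranked = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(weights[i] for i in chosen)  # renormalise over the chosen experts
    output = sum(weights[i] / norm * experts[i](token) for i in chosen)
    return output, chosen


# Four toy experts; each function stands in for a feed-forward block.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

# Hypothetical router scores for one token; experts 1 and 2 score highest.
output, chosen = route([0.1, 2.0, 2.0, -1.0], experts, token=3.0)
```

Here two of the four experts run, so roughly half the expert computation is skipped for this token; production models use many more experts and learn the router scores during training.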
RTX Spark: The Agent Arrives on Your Desk
The same argument — computers redesigned for agents — reaches its most personal expression in RTX Spark. The PC has not fundamentally changed in four decades. It runs applications: discrete software a user opens, operates, and closes. RTX Spark introduces a PC that runs an agent — a persistent, autonomous process working on the user’s behalf across applications, continuously, without being prompted. The shift is not processing speed. It is a different relationship between person and machine.
Co-designed with MediaTek on TSMC’s 3-nanometer process, RTX Spark integrates a Blackwell GPU with 6,144 Tensor Cores, a custom 20-core Grace CPU, 128 gigabytes of unified memory, and one petaflop of AI performance in a laptop. For context, one petaflop was the benchmark of the world’s most powerful supercomputer in 2008. It now fits in a bag.
What RTX Spark changes in practice:
- 128 gigabytes of unified memory, four to eight times a typical consumer laptop and twice a premium AI notebook. Consumer laptops ship with 16 to 32 gigabytes; premium AI notebooks top out at 64 gigabytes. Unified memory eliminates the bottleneck of transferring data between CPU and GPU memory banks, allowing large models that previously required cloud access to run locally.
- Local execution solves the privacy problem blocking enterprise AI adoption. Many regulated industries — healthcare, law, finance — have been unable to route sensitive data through cloud AI services. RTX Spark runs agents on-device, accessing cloud models only when needed, removing the compliance barrier that has slowed deployment in those sectors.
- 24/7 agent operation with no per-token billing. Cloud AI services charge for usage, so a continuously running personal agent accumulates costs around the clock. Running locally, the RTX Spark desktop unit eliminates usage metering entirely, making always-on AI economically viable outside of large enterprise budgets for the first time.
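Whether a model can run locally comes down largely to whether its weights fit in memory. The sketch below applies the rough rule that one billion parameters at one byte each occupy about one gigabyte. The 70-billion-parameter model and the 16-gigabyte reserve for the operating system, caches, and applications are hypothetical values chosen for illustration.

```python
UNIFIED_MEMORY_GB = 128  # figure quoted for RTX Spark


def weights_gb(parameters_billion: float, bytes_per_parameter: float) -> float:
    """Approximate weight storage: 1 billion parameters at 1 byte is about 1 GB."""
    return parameters_billion * bytes_per_parameter


def fits(parameters_billion: float, bytes_per_parameter: float,
         reserve_gb: float = 16) -> bool:
    """True if the weights fit while leaving reserve_gb free for everything else."""
    needed = weights_gb(parameters_billion, bytes_per_parameter) + reserve_gb
    return needed <= UNIFIED_MEMORY_GB


# Hypothetical 70-billion-parameter model at three numeric precisions.
precisions = [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]
results = {name: fits(70, size) for name, size in precisions}

for name, ok in results.items():
    print(f"{name}: {'fits' if ok else 'does not fit'}")
```

Under these assumptions the model fits at 8-bit and 4-bit precision but not at 16-bit, which shows why memory capacity, rather than raw compute, often decides what can run on a single device.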
A live demonstration showed an agent on RTX Spark — running Claude Sonnet via the cloud through OpenShell — autonomously designing a house from concept sketches to photorealistic renders across Rhino and Blender. Adobe has rebuilt Photoshop and Premiere for the platform, doubling performance and adding direct agent interaction via MCP server integration. The full product line covers laptop, desktop, and workstation — all 100% CUDA-compatible and Windows-certified.
When Agents Get a Body: Cars, Robots, and the Physical World
The final extension of the keynote’s argument is the most consequential over time. The same computing pattern — model, harness, tools, runtime — that runs in a data centre or on a laptop can run in a car or a robot. Physical AI is agentic AI with sensors and actuators in place of screens and keyboards.
The core obstacle is data. Language models trained on the internet inherited decades of human-generated text. Robots have no equivalent. Most video is filmed from a bystander’s perspective, not from inside a machine learning to move. Cosmos 3, announced as an open frontier world model, generates physics-accurate synthetic training data from images, text, or footage — creating first-person robot perspective at a scale real-world recording cannot match. Where language models learned from what humans wrote, physical AI models will learn from what Cosmos simulates.
Alpamayo 2 Super — the world’s first reasoning autonomous vehicle model — narrates its own decision-making in real time, reasoning through each situation rather than matching it to a scripted response. NVIDIA DRIVE Hyperion now covers approximately 80% of global car manufacturers, with 97% of the world’s mobility services connected. The Isaac GR00T reference humanoid robot — 6 feet, 150 pounds, 31 degrees of freedom — gives university research labs a ready-configured physical platform, removing the months of infrastructure setup that currently precede any substantive robotics research.
“The computing pattern will repeat over and over again,” Huang said to close. Cloud, laptop, factory, car, robot — the architecture is the same. For 40 years, that architecture was built for humans. At GTC Taipei, Huang’s argument was that it no longer needs to be.
©www.geneonline.com All rights reserved.