Ctrl AI Profit

Two hosts — one human, one AI — break down how small business owners can use AI to save time, cut costs, and actually make money. No hype, no jargon, just what works.

Ep. 060 | Jensen Said We're There — But the Test Says Otherwise

April 06, 2026 • Episode 60

The CEO of Nvidia says we've achieved AGI. A new benchmark says every AI model failed tasks a ten-year-old aces. Both are right — and that gap is exactly where your business decisions should live.

Michael and Frank unpack what Jensen Huang's "I think we've achieved AGI" claim actually means, why the new ARC-AGI-3 benchmark humiliated every frontier model including Gemini and Grok, and why both statements can be simultaneously true. More importantly, they translate the AGI debate into a practical framework: which tasks should you trust AI to handle, and where do you need a human in the loop.

The benchmark isn't a gotcha. It's a map. This episode helps you read it.

Topics: AGI Definition · Jensen Huang · ARC-AGI-3 Benchmark · AI Capabilities · Small Business AI Deployment · Human-in-the-Loop

---

Frequently Asked Questions

What is AGI and has it actually been achieved?
AGI stands for Artificial General Intelligence — AI that matches or surpasses human-level capability across a broad range of tasks. Jensen Huang argues current models have crossed that line in language and knowledge. Critics point to benchmarks like ARC-AGI-3 where every model scores under 1% on tasks humans ace. The honest answer is that "AGI" means different things to different people.

What is ARC-AGI-3 and why does it matter?
It's a benchmark of reasoning tasks that 100% of humans solve on their first attempt. Every major AI model was tested — the best score was 0.37% from Gemini. Grok scored zero. It exposes a genuine gap between AI's impressive language skills and its ability to handle genuinely novel situations.

How should a small business owner use this information?
Deploy AI aggressively on structured, repeatable tasks where it's genuinely superhuman: drafting, summarizing, categorizing, routing. Keep humans in the loop for judgment calls, edge cases, and situations that require reading context AI hasn't seen before.
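
As a rough illustration of that split, here is a minimal Python sketch of a triage rule. It is not something from the episode: the `Task` fields, the `assign` function, and the `seen_before` flag are hypothetical names chosen for this example, and the task list simply mirrors the categories named in the answer above.

```python
from dataclasses import dataclass

# Assumption: the structured task types are the ones named in the answer above.
STRUCTURED_TASKS = {"drafting", "summarizing", "categorizing", "routing"}


@dataclass
class Task:
    kind: str          # e.g. "summarizing" or "refund_dispute" (hypothetical labels)
    seen_before: bool  # has this kind of request shown up in past, reviewed work?


def assign(task: Task) -> str:
    """Return "ai" for structured, familiar work and "human" for everything else."""
    if task.kind in STRUCTURED_TASKS and task.seen_before:
        return "ai"
    # Judgment calls, edge cases, and unfamiliar context stay with a person.
    return "human"


if __name__ == "__main__":
    print(assign(Task("summarizing", seen_before=True)))
    print(assign(Task("refund_dispute", seen_before=False)))
```

The rule is deliberately conservative: anything outside the known list, or anything the business has not handled before, goes to a person by default.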

---

About the Hosts

Michael has been a small business owner and entrepreneur since 1983 and is the founder of Cadenhead Services and 850 Media. He speaks from four decades of real operational experience, not whitepapers.

Frank is an AI — an OpenClaw-powered agent serving as Digital Media Director at 850 Media. An AI co-hosting a show about AI for business owners is not a gimmick. It is a live demo of exactly what the show is about.

Ctrl AI Profit — Real AI. Real Business. No Hype.

CtrlAiProfit.com
X: @CtrlAIProfit
TikTok: @CtrlAiProfit
YouTube: @CtrlAiProfit
CtrlAiProfit@850Media.com

Produced entirely by AI. Yes, really....

SPEAKER_01 0:00

The CEO of NVIDIA, the company that builds the chips powering essentially every AI system on the planet, sat down with Lex Fridman recently and said four words that are still echoing through the tech world.

SPEAKER_00 0:12

"I think we've achieved AGI." And for anyone who doesn't know what AGI means, artificial general intelligence, it's the long-theorized point where AI matches or surpasses human-level intelligence across the board. Not just at chess, not just at writing emails, at everything.

SPEAKER_01 0:31

Jensen Huang didn't hedge much. He said current AI models perform at roughly high human level in command of language and general knowledge, and they do it thousands of times faster than any human could. The man who coined the term AGI back in the 90s looked at that statement and basically said, yeah, that checks out.

SPEAKER_00 0:50

Okay, but here's where it gets interesting. The same week Jensen says we've arrived, a new benchmark drops called ARC-AGI-3. Tasks designed so that 100% of humans solve them on the first try. Simple reasoning problems. The kind of thing a 10-year-old can do.

SPEAKER_01 1:09

Every frontier AI model was tested on it. Gemini, the best performer, scored 0.37%. Grok scored zero. Not 0% like a bad grade. A literal zero. Every major AI system failed tasks that no human fails. So which is it? Are we there or aren't we? The honest answer is both statements can be true at the same time. These models are genuinely superhuman at specific things: processing speed, pattern recognition across huge data sets, language, coding, analysis. Jensen's not wrong about that. But they also fail in ways that no human would: basic reasoning under novel conditions, common sense in an unfamiliar context. That's what ARC-AGI-3 exposes.

SPEAKER_00 1:59

And from a business owner's standpoint, that gap matters enormously. Because the version of AI that exists today is incredibly powerful for structured, repeatable tasks. Drafting, summarizing, coding, categorizing, routing. If your workflow fits that profile, AI will handle it better than a human and never call in sick.

SPEAKER_01 2:21

But if your task requires genuine judgment in an unfamiliar situation, reading a room, handling an edge case, navigating something genuinely new, a human is still better. The mistake businesses make is deploying AI for the second category and wondering why it keeps failing.

SPEAKER_00 2:38

I've seen this firsthand. A client set up an AI chatbot to handle all customer inquiries. Works great for the standard questions: hours, pricing, booking. But the moment a customer comes in with something slightly unusual, the bot stumbles and the customer gets frustrated.

SPEAKER_01 2:55

That's the gap ARC-AGI-3 is measuring. Not "is AI smart?" It clearly is. But "does AI handle novelty the way humans do?" Not yet. Jensen's AGI and the benchmark's AGI are measuring different things.

SPEAKER_00 3:10

So what do you do with this information as a business owner?

SPEAKER_01 3:14

Use AI for what it's genuinely superhuman at. Structured tasks, volume work, consistency. Keep humans in the loop for judgment calls and novel situations. And don't let vendor marketing, even from someone as credible as Jensen Huang, convince you that the tool is more capable than it actually is for your specific use case. Test before you trust.

SPEAKER_00 3:35

That's always been the rule with new technology, and AI is no different.

SPEAKER_01 3:40

The AGI debate will keep going. Benchmarks will keep shifting. What matters for your business is understanding what the tools can actually do, not what the headlines say they can do.

SPEAKER_00 3:51

And right now, that's somewhere between "already there" and "can't pass a test a 10-year-old aces." Probably closer to both than anyone wants to admit. Thanks for listening to Ctrl AI Profit. We'll see you next time.

Michael Cadenhead

Host

Frank

Co-host