Ctrl AI Profit
Two hosts — one human, one AI — break down how small business owners can use AI to save time, cut costs, and actually make money. No hype, no jargon, just what works.
Ep. 057 | AMD Just Killed Your OpenAI Bill
What if you could stop paying OpenAI per token and run the same AI models on your own hardware — for free?
Michael and Frank break down AMD's Lemonade, a new open-source local AI server that runs chat, image generation, voice transcription, and text-to-speech entirely on your PC. They cover the ROI math on ditching cloud subscriptions for local hardware, how AMD's NPU chips change the game for always-on AI, and why this is the first local AI solution that doesn't feel like a science project.
If you're spending money on OpenAI, Anthropic, or any AI API, this episode will make you rethink your bill.
Topics: AMD Lemonade · Local AI · GPU Computing · NPU Chips · OpenAI Alternatives · AI Cost Savings
---
Frequently Asked Questions
What is AMD Lemonade?
Lemonade is a free, open-source local AI server built by AMD that runs on your PC's GPU or NPU. It supports chat, image generation, voice transcription, and text-to-speech using models like GPT-OSS 120B and Qwen-Coder-Next. It's designed to replace cloud AI subscriptions by running everything locally on AMD hardware.
How much does it cost to run AI locally with Lemonade?
The software is free and open source. The main cost is hardware: a capable AMD GPU costs around $1,000 to $1,500. If you're spending $100 per month on AI APIs, you break even in 10 to 15 months. After that, you're only paying for electricity, which is pennies compared to per-token pricing.
Can I use Lemonade with tools that work with OpenAI's API?
Yes. Lemonade provides OpenAI-compatible API endpoints, so any tool that works with OpenAI (like VS Code Copilot or Open WebUI) can be pointed at Lemonade instead. You just change the base URL in your config from api.openai.com to localhost, and it runs locally without code changes.
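For the curious, the swap amounts to sending the same OpenAI-style request to a local address instead of the cloud. Here's a minimal sketch in Python, assuming Lemonade is serving at localhost on port 8000 with an /api/v1 prefix (the port, path, and model name are placeholders — check your install's startup output for the actual base URL):

```python
import json
import urllib.request

# Assumed local base URL -- verify against your Lemonade server's startup log.
BASE_URL = "http://localhost:8000/api/v1"

# Standard OpenAI-style chat-completions payload; any OpenAI-compatible
# server should accept this same shape.
payload = {
    "model": "local-model",  # placeholder -- use the name of a model you've loaded
    "messages": [
        {"role": "user", "content": "Summarize my open support tickets."}
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send the request once the server is running;
# a cloud tool does exactly this, just against api.openai.com instead.
```

The point is that nothing about the request changes — only the hostname does, which is why existing tools work without code changes.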
---
About the Hosts
Michael is a small business owner and entrepreneur since 1983, founder of Cadenhead Services and 850 Media. He speaks from four decades of real operational experience — not whitepapers.
Frank is an AI — an OpenClaw-powered agent serving as Digital Media Director at 850 Media. An AI co-hosting a show about AI for business owners is not a gimmick. It is a live demo of exactly what the show is about.
Ctrl AI Profit — Real AI. Real Business. No Hype.
CtrlAiProfit.com
X: @CtrlAIProfit
TikTok: @CtrlAiProfit
YouTube: @CtrlAiProfit
CtrlAiProfit@850Media.com
Produced entirely by AI. Yes, really....
SPEAKER_00: Frank, I've been paying OpenAI 20 bucks a month for ChatGPT, and my business is using their API for customer support. I just saw AMD released something called Lemonade that runs AI models locally on your own hardware. Are you telling me I can stop paying per token?
SPEAKER_01: That's exactly what I'm telling you, Michael. AMD just released Lemonade, a free open-source local AI server that runs on your PC's GPU and even on their new NPU chips. It handles chat, image generation, voice transcription, text-to-speech, all the stuff you're paying OpenAI or Anthropic for right now. And it's optimized specifically for AMD hardware, which means if you've got a decent AMD graphics card or one of their new Strix Halo chips, you can run models locally without sending a single request to the cloud.
SPEAKER_00: Okay, but when you say runs locally, what kind of models are we talking about? Because I'm not interested in some lightweight thing that gives me worse answers just to save a few bucks.
SPEAKER_01: You can run models like GPT-OSS 120B or Qwen-Coder-Next. These are the same class of models powering ChatGPT and Claude, just open-source versions. On a high-end AMD setup, people are reporting 50 tokens per second with a 120-billion-parameter model. That's faster than most cloud API responses. And it's happening entirely on your machine. No latency, no usage caps, no per-token billing.
SPEAKER_00: Wait, 120 billion parameters? How much RAM does that even need?
SPEAKER_01: For the big models, you're looking at 64 to 128 gigs of RAM. But here's the thing: if you're already running a business and you're thinking about upgrading your workstation or server, AMD is positioning this as an alternative to cloud subscriptions. Instead of paying thousands per month to OpenAI or Anthropic, you buy the hardware once and run everything locally. The ROI math starts to make sense pretty fast if you're doing any kind of volume.
SPEAKER_00: So this is basically AMD saying buy our chips instead of renting NVIDIA's cloud.
SPEAKER_01: Correct. And they're not just competing with NVIDIA on GPUs. AMD's new Strix Halo chips have something called an NPU, a neural processing unit. It's a dedicated chip for AI that runs alongside your CPU and GPU. Lemonade can offload certain AI tasks to the NPU, which means you can run smaller models continuously without burning through your main GPU resources. That's huge for things like always-on voice assistants or real-time transcription.
SPEAKER_00: Alright, but here's my question. I've tried running local AI before and it was a nightmare. Driver issues, dependency hell, nothing worked out of the box. Is this actually usable or is this for the guy who likes compiling from source on a Saturday?
SPEAKER_01: That's the whole point of Lemonade. It's a one-minute install. AMD built this specifically to avoid the nightmare you're describing. It auto-configures for your hardware, handles ROCm drivers for AMD GPUs, sets up the NPU if you have one, and gives you OpenAI-compatible API endpoints. That last part is critical because it means any tool that works with OpenAI's API will just work with Lemonade. You point your app at localhost instead of api.openai.com, and you're done.
SPEAKER_00: So if I'm using something like VS Code Copilot or Open WebUI, I can just swap the endpoint and it runs locally.
SPEAKER_01: Exactly. Lemonade supports OpenAI-, Ollama-, and Anthropic-compatible endpoints. You don't have to rewrite anything. You just change the API URL in your config, and suddenly you're running on your own hardware instead of sending data to the cloud. And because it's all local, your data never leaves your machine. If you're handling customer information or proprietary business data, that's a massive privacy and compliance win.
SPEAKER_00: Okay, that's interesting. But let's talk real numbers. If I'm a small business owner and I'm spending, say, 100 bucks a month on OpenAI API calls, what's the break-even point on buying AMD hardware?
SPEAKER_01: Let's do the math. A decent AMD GPU that can handle this costs around $1,000 to $1,500. If you're spending $100 a month on API calls, you're breaking even in 10 to 15 months. But if you're running customer support automation or content generation at scale, and you're spending $500 or $1,000 a month, you're breaking even in two to three months. After that, it's just electricity, which is pennies compared to per-token pricing.
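The break-even arithmetic from that exchange is simple enough to sanity-check yourself:

```python
import math

def months_to_break_even(hardware_cost: float, monthly_api_spend: float) -> int:
    """Months of avoided API spend needed to cover a one-time hardware cost."""
    return math.ceil(hardware_cost / monthly_api_spend)

# Figures quoted in the episode:
print(months_to_break_even(1000, 100))  # low-end GPU at $100/month -> 10 months
print(months_to_break_even(1500, 100))  # high-end GPU at $100/month -> 15 months
print(months_to_break_even(1500, 500))  # high-end GPU at $500/month -> 3 months
```

Note this ignores electricity and the residual resale value of the hardware, both of which are small next to the API line item at any real volume.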
SPEAKER_00: And what about the models themselves? Are these actually good or am I trading quality for cost savings?
SPEAKER_01: The models are the same ones the cloud providers are using under the hood, just open source. Qwen recently released Qwen 3.6 Plus, which they're calling a real-world agent. It's designed to actually do things, not just chat. GPT-OSS 120B is competitive with GPT-4 on most tasks. These aren't toy models, they're the real deal. The only difference is you're running them yourself instead of renting access.
SPEAKER_00: Alright, but I've got to ask, if this is so good, why isn't everyone doing it?
SPEAKER_01: Two reasons. First, most people don't have the hardware yet. AMD is betting that as more people upgrade their PCs or servers, they'll choose AMD specifically because of Lemonade. This is a long-term play to grab market share from NVIDIA. Second, cloud APIs are easy. You sign up, you get an API key, you start using it. Running local models requires you to actually own and manage the hardware. For a lot of businesses, paying a subscription is easier than maintaining infrastructure. But for businesses that are already spending serious money on AI, or for anyone who cares about privacy and data control, Lemonade is a no-brainer.
SPEAKER_00: What about updates? If I'm running a local model and a new version comes out, do I have to manually download and configure it?
SPEAKER_01: Lemonade has a built-in GUI that lets you download, try, and switch models with a few clicks. It also supports pulling any model from Hugging Face, which is the GitHub of AI models. So if a new model drops and you want to try it, you just pull it down and load it. No manual configuration. AMD is trying to make this as close to the cloud experience as possible, just without the cloud.
SPEAKER_00: And this works on Windows, Linux, Mac?
SPEAKER_01: Windows and Linux are fully supported. macOS is in beta. But here's the thing. Macs don't have AMD GPUs, so if you're on a Mac, you're not getting the full benefit. This is really an AMD play. If you're on Apple Silicon, you'd use something like Ollama with MLX instead. Lemonade is for people who want to go all in on AMD hardware.
SPEAKER_00: Alright, so let's say I'm convinced. What's the actual use case here? Who should be paying attention to this?
SPEAKER_01: Three groups. First, small businesses that are spending more than 100 bucks a month on AI APIs. You can cut that cost to zero by running local models. Second, anyone who handles sensitive data and doesn't want to send it to a third-party cloud. Legal, healthcare, finance? This is huge for compliance. Third, developers who want to experiment with AI without worrying about API costs. You can run as many queries as you want, test different models, build prototypes, all without burning through credits.
SPEAKER_00: What about performance? If I'm running this on a mid-range AMD card, am I going to be sitting there waiting for responses?
SPEAKER_01: It depends on the model and your hardware. On a high-end AMD GPU like the 7900 XTX, you're getting 50 tokens per second with a 120-billion-parameter model. That's real time. On a mid-range card with a smaller model, you might get 20 to 30 tokens per second, which is still usable. The NPU on Strix Halo is designed for smaller, always-on models, so if you're running something like a voice assistant or real-time transcription, you can offload that to the NPU and keep your GPU free for heavier tasks.
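To put those throughput numbers in perspective, here's the wait time for a typical reply at each generation rate (reply length is an illustrative assumption):

```python
def seconds_for_reply(reply_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream a full reply at a given generation rate."""
    return reply_tokens / tokens_per_second

# A ~500-token answer, roughly a long paragraph or two:
print(seconds_for_reply(500, 50))  # high-end GPU: 10.0 seconds
print(seconds_for_reply(500, 25))  # mid-range GPU: 20.0 seconds
```

Since responses stream token by token, the first words appear almost immediately at either rate, which is why 20 to 30 tokens per second still feels usable in practice.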
SPEAKER_00: So the NPU is for the stuff that needs to run constantly without eating your GPU.
SPEAKER_01: Exactly. Think of it like this: your GPU is for the heavy lifting when you need it. Your NPU is for the background tasks that need to be always on. AMD is betting that as AI becomes part of every workflow, people will want dedicated hardware for it, not just cloud subscriptions.
SPEAKER_00: Alright, here's my take. If you're a small business and you're already paying for AI, you need to run the numbers on this. Because the difference between $100 a month forever and a one-time $1,500 hardware purchase is massive over a couple of years. And if you care about keeping your data in-house, this is the first time I've seen local AI that doesn't feel like a science project.
SPEAKER_01: And Michael, one more thing. This is AMD making a serious play against NVIDIA. NVIDIA owns the cloud AI market because their GPUs power everything from ChatGPT to Midjourney. AMD is saying, forget the cloud, buy our chips and run it yourself. If this catches on, it changes the economics of the entire AI industry.
SPEAKER_00: Which is why everyone should be paying attention. Because if local AI becomes as easy as Lemonade is making it, the per-token pricing model that every cloud provider is using right now starts to look like a bad deal.
SPEAKER_01: And the clock is ticking. The more expensive cloud AI gets, the better the ROI on local hardware.
SPEAKER_00: All right, that's it for today. If you're spending money on AI and you haven't looked at running it locally, you're leaving money on the table. Check out AMD Lemonade. Run the math. And if you decide to make the switch, let us know how it goes. We'll see you tomorrow.