How to Use GLM 5.2 Inside Claude Code for Pennies

Local AI models are getting close enough to test, and OpenRouter makes it easy to plug in.

Jun 30, 2026

Baroque scholar using a small telescope to reach a distant star while a larger telescope is locked away, symbolizing GLM 5.2 and open-weight AI access

One trend I keep seeing in AI right now is that the gap between local models and the top labs is getting smaller.

For a long time, the tradeoff felt obvious. If you wanted the best coding model, you paid Anthropic or OpenAI. If you wanted local control, you accepted weaker output, more setup, and a lot of hardware pain.

That tradeoff is starting to change.

And I think there is another reason people are starting to pay attention to local and open-weight models now.

Access is starting to feel less guaranteed.

Article screenshot about GPT-5.6 rollout limits and future access to top AI models

Anthropic had to disable Fable 5 and Mythos 5 after a US export-control directive. OpenAI’s GPT-5.6 rollout was also limited to a small group of trusted partners after a government request. Whatever you think about the safety reasons, the pattern is hard to ignore: the most capable models may not always be available to everyone at the same time.

That makes me uncomfortable, to be honest.

I do not want a future where the highest‑end AI models are only available to a small, approved group of people, while everyone else waits for permission. AI has democratized intelligence, but access? Not yet.

I do not know exactly where this goes, and maybe some restrictions are temporary. But if access to the best intelligence is something you care about, I think it is worth slowly learning the alternatives now.

Not because you need to abandon Claude or OpenAI tomorrow.

But because local and open-weight models are becoming good enough to deserve a place in your AI workflow. They give you another option when access changes, prices go up, or the model you rely on suddenly becomes harder to use.

That is where GLM 5.2 gets interesting.

GLM 5.2 benchmark chart comparing Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro

Z.ai says it trails Claude Opus 4.8 by only 1% on FrontierSWE, and its docs show it landing close to Opus 4.8 on Terminal-Bench too. Benchmarks are not the whole story, but they are enough to make this worth testing instead of ignoring them completely.

If you’ve been following AI news lately, you have probably seen GLM 5.2 everywhere. People are talking about it. People are FOMO-ing into it. And more broadly, more builders are starting to ask whether they should move some of their work away from closed frontier models and toward local or open-weight models.

But there’s a catch.

And the catch is hardware. Yes, GLM 5.2 is open-weight. Yes, you can run it locally. But the realistic local path still points toward expensive high-memory machines, not the 16GB computer most people already own, whether that is a laptop, Mac mini, or whatever is sitting on their desk right now.

That is why I wanted to share Gencay’s walkthrough.

If you have been reading AI Maker for a while, you may already know Gencay. He is the creator of LearnAIWithMe, and he has contributed several practical builds here already, including:

What I like about Gencay’s work is that he does not just react to AI news. He tests the thing, builds with it, and shows where it actually fits.

In this post, he tests the version most of us can actually try first: GLM 5.2 through OpenRouter and Cloudflare Workers AI, connected to the coding tools people are already using: Claude Code. You don’t need an expensive local machine. You also don’t need a giant hardware decision before you know whether the model is useful for your work.

He tested the version most of us can actually try first: GLM 5.2 through OpenRouter and Cloudflare Workers AI, connected to the coding tools people are already using. You don’t need to pay for an NVIDIA DGX that can cost you $3,000 to $5,000.

Start with the rented version. Build something small. See where it feels close enough to Opus, and where it still falls short.

Then decide whether local AI is something you actually need, or just another thing the internet has made you feel behind on.

If you want to follow more of Gencay’s own work, here are three posts worth checking out:

I’ll let Gencay walk you through the setup and the test.

Hello 👋

A friend messaged me last week, have you tried GLM 5.2?

My honest answer: GLM 5.2 needs nearly 256 GB of RAM. My Mac Mini has 16 GB. Running it locally? Not a chance.

So I almost dropped it. (That’s when I remembered you don’t need to run on your computer.)

Turns out, you can test it through the web, and you can also pipe GLM 5.2 straight into Claude Code (yes, the same terminal you already use for Opus) and get a coding agent that performs between Opus 4.7 and 4.8 capabilities—for pennies.

An API key from OpenRouter, or Ollama Cloud if you have a paid membership, will do the trick. I saw an X post suggesting you can do this for free through Cloudflare, but the demand is too high, so it might not work as it’s supposed to. It is totally free, though, so I’ll show you this too.

I’d also show you how to set it up with OpenRouter. Then I’ll build a Pomodoro timer with it, build the exact same thing with Claude Code, and compare the results.

But first…

What is GLM 5-2?

GLM 5.2 is Z.ai’s open-source model.

An open‑source model is one whose weights are public. The company that built it puts the real model online, and anyone can download it and run it on their own machine. The model is yours to keep, and nothing charges you for using it.

Other models like Opus or GPT work the other way. You never hold the model itself; you rent access to it through a key. GLM 5.2 is the open kind, which is why you can run it for free if your hardware can handle it.

But the wall is the hardware. The full model wants around 256 GB of RAM, so unless you own a server, downloading it is off the table.

Here are the benchmarks.

DeepSWE leaderboard showing GLM 5.2 compared with Claude, GPT-5, Gemini, and Kimi coding models

It’s odd to see that it is better than some frontier models, like Sonnet 4.6 and Gemini 3.5, and almost as good as Claude Opus 4.8.

Before we run tests to compare GLM 5.2 and Opus 4.8, let me show you how to set it up.

Setting Up GLM 5-2

There are two routes that get you GLM 5.2 without a 256 GB machine:

OpenRouter runs inside Claude Code for a few cents
Cloudflare runs through OpenCode for free

I’ll walk through both.

1. GLM-5.2 on OpenRouter

First, what OpenRouter is?

It is a single doorway to hundreds of AI models. With one account and one key, you can reach GLM 5.2, Opus, Gemini, and almost any model, paying only for what you use.

Instead of opening a separate account with every provider, you go through OpenRouter and pick the model by name.

Here we point it at GLM 5.2.

So you add credit, get a key, and Claude Code routes every request to GLM 5.2 instead of Opus.

Five minutes of setup and you’re coding.

Visit Openrouter and open a new account.

Click “API Keys”, then “New key”.

OpenRouter API keys page for setting up GLM 5.2 in Claude Code

Name the key and set a credit limit if you want one.

OpenRouter form for creating an API key to connect GLM 5.2 with Claude Code

Copy the key. Paste this into your terminal and swap in your token.

ANTHROPIC_BASE_URL=”https://openrouter.ai/api” \

ANTHROPIC_AUTH_TOKEN=”token-here” \

ANTHROPIC_MODEL=”z-ai/glm-5.2” \

ANTHROPIC_DEFAULT_OPUS_MODEL=”z-ai/glm-5.2” \

ANTHROPIC_DEFAULT_SONNET_MODEL=”z-ai/glm-5.2” \

ANTHROPIC_DEFAULT_HAIKU_MODEL=”z-ai/glm-5.2” \

ANTHROPIC_SMALL_FAST_MODEL=”z-ai/glm-5.2” \

CLAUDE_CODE_SUBAGENT_MODEL=”z-ai/glm-5.2” \

claude --strict-mcp-config

To be clear, this is the same Claude Code you already opened in your terminal.

The one thing that changed is the brain behind it.

Every request now runs on GLM 5.2 instead of Opus, and you can see it at the bottom of the screen, z-ai/glm-5.2.

Claude Code terminal running with z-ai/glm-5.2 through OpenRouter

Every command from here runs on GLM 5.2. The terminal looks the same. The bill does not.

2. GLM-5.2 on CloudFare (FREE)

Cloudflare is the free route. It runs GLM 5.2 through OpenCode instead of Claude Code, and it costs nothing as long as the servers hold up.

OpenCode is open source coding agent that can run inside VS Code and works with any model you connect to it. It is like a flexible version of Claude code.

We use it for the free route because Cloudflare’s GLM 5.2 plugs into it cleanly. More steps than OpenRouter, zero dollars at the end.

First, go to Cloudflare and open a new account. It is free.

Copy your account ID and save it. Type “Account ID” in the search box and click copy.

Cloudflare dashboard showing where to copy the account ID for GLM 5.2 setup

Now get your API key. Open the token page.

Click “Create a token”.

Cloudflare API tokens page for connecting Workers AI to OpenCode

Under permissions, select “Account”, “Workers AI”, and “Edit”. The rest is up to you.

Cloudflare custom token setup with Workers AI permissions for GLM 5.2

You have the account ID and the key. Install OpenCode next.

I run Claude inside VS Code and its app. Open Extensions, search “Opencode”, and install it.

VS Code marketplace showing the OpenCode extension for running GLM 5.2

Once it installs, click the OpenCode icon.

OpenCode extension opened inside VS Code before connecting GLM 5.2

Now it is ready.

OpenCode terminal ready for Cloudflare Workers AI GLM 5.2 setup

Now we need to connect Cloudflare to OpenCode. OpenCode reads its settings from a single config file.

Open this file on your computer:

~/.config/opencode/opencode.json

(On Mac, in VSCode, you can just hit Cmd + Shift + P, type “Open File”, and paste that path.)

Inside the JSON file, add a provider block for Cloudflare. Paste your Account ID into the URL and your API key below it:

{

  “$schema”: “https://opencode.ai/config.json”,

  “provider”: {

    “cloudflare”: {

      “npm”: “@ai-sdk/openai-compatible”,

      “name”: “Cloudflare Workers AI”,

      “options”: {

        “baseURL”: “https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/v1”,

        “apiKey”: “YOUR_API_KEY”

      },

      “models”: {

        “@cf/zai-org/glm-5.2”: {

          “name”: “GLM 5.2”

        }

      }

    }

  },

  “model”: “cloudflare/@cf/zai-org/glm-5.2”

}

Or just let Claude Code do it.

If you don’t feel like editing JSON by hand, you don’t have to. Open Claude Code and paste this prompt, fill in your two values:

Hi Claude,

Here is my Cloudflare account ID and API key:

Account ID: <paste your account ID>

API key: <paste your Workers AI key>

Set up my OpenCode config to use Cloudflare Workers AI as a provider.

Open ~/.config/opencode/opencode.json, add a “cloudflare” provider that

uses the @ai-sdk/openai-compatible npm package, point its baseURL at

https://api.cloudflare.com/client/v4/accounts/<my account ID>/ai/v1,

register the model @cf/zai-org/glm-5.2 (call it “GLM 5.2”), and set it

as the default model. Don’t touch any of my other providers.

After that, you are ready.

Building Pomodoro App

I used to pay for a timer app. Not anymore. A Pomodoro timer is small enough to build in one prompt, which makes it a clean test for a coding model.

So I built the same one three times. Claude Code, GLM 5.2 on Cloudflare, and GLM 5.2 on OpenRouter. Same prompt for all three.

Here’s the prompt I used that you can copy:

Build a Pomodoro focus timer as a single-page web app (HTML, CSS, JS in

separate files, no frameworks). Requirements:

- 25-minute work sessions and 5-minute breaks, auto-switching between them

- Start / pause / reset controls

- A settings panel to customize work and break lengths

- A sound when a session ends

- Session count and timer state persist across page refresh (localStorage)

- Clean, modern UI with a circular progress ring

Build the whole thing, then tell me how to run it.

Building with Claude Code

As my first test, I built a Pomodoro app using Claude Code. I started Claude Code from the terminal with the ‎⁠claude⁠ command and pasted the prompt that I shared earlier.

Claude Code terminal used as the Opus baseline for the Pomodoro app test

Two minutes later, the app was ready. Here’s the report result built by Claude.

Claude Code build output for a Pomodoro timer web app

Let me show you how it looks.

Pomodoro timer app built with Claude Code using Opus

And here are the settings.

Settings panel in the Pomodoro timer app built with Claude Code

Building with GLM 5-2 on Cloudflare

Onto the next test. I pasted the same prompt, and it activates plan mode after thinking about it.

OpenCode planning a Pomodoro timer build with GLM 5.2 on Cloudflare Workers AI

It asked clarifying questions. I picked the recommended answer every time, so the next two builds stay unbiased.

GLM 5.2 asking setup questions before building the Pomodoro app in OpenCode

The plan was set. It wanted one more approval before building.

GLM 5.2 plan for creating a Pomodoro focus timer in OpenCode

I approved and switched to build mode with by clicking Tab on my keyboard.

OpenCode waiting for approval before building the Pomodoro app with GLM 5.2

Five minutes later, the app is done.

GLM 5.2 on Cloudflare Workers AI finishing the Pomodoro timer app build

And it works.

Pomodoro timer app built with GLM 5.2 through Cloudflare Workers AI

Settings included.

Settings panel in the Pomodoro app built with GLM 5.2 on Cloudflare

Five minutes, a few clarifying questions, working app. Slower than Claude Code, but it cost nothing.

Building with GLM 5-2 on OpenRouter

Now, let’s rebuild the same app with OpenRouter.

I pasted the exact same prompt.

After 2 minutes, it finished the app.

GLM 5.2 through OpenRouter finishing the Pomodoro timer app build

OpenRouter runs on credit, so this one has a price tag.

Open the activity page to see it.

OpenRouter activity dashboard showing GLM 5.2 usage for the coding test

Here is what the whole build cost: $0.354.

OpenRouter cost chart showing the GLM 5.2 Pomodoro app build cost about 35 cents

And here is the app.

Pomodoro timer app built with GLM 5.2 inside Claude Code through OpenRouter

Settings too.

Settings panel in the Pomodoro app built with GLM 5.2 through OpenRouter

Two minutes, working app, a few cents on the meter. Same speed as Claude Code, a fraction of the price.

Claude Code vs GLM 5.2 Comparison

All three apps work. The frontends are close enough that you would not know which model built which. Claude Code added a small label above the settings link. That is the only visible difference between the three.

So the output quality is a tie. Speed and price are where they split:

Comparison table for Claude Code Opus, GLM 5.2 OpenRouter, and GLM 5.2 Cloudflare

Here’s the conclusion:

For a Pomodoro timer, none of this matters.

For a real project where you burn through tokens all day, it does.

GLM 5.2 gives you Opus-adjacent coding without the Opus bill.

Next Steps

Pick the route that fits your wallet. Cloudflare to test for free, OpenRouter when you want speed for a few cents.

Then point it at something that actually costs you money. A real repo, a real feature, a day of work you would normally run on Opus. That is where the savings show up.

Keep Claude Code on Opus for the hard problems. Switch to GLM 5.2 for the volume. One terminal, two price points, and you decide which one each task deserves.

A guest post by

Gencay

Building AI systems that actually work.

ToxSec

this is really awesome. i think some of the larger frontier model companies are starting to treat local models like a real threat now that we're seeing some great product utilization

1 reply by Wyndo

Nesibe Kiris Can

The OpenRouter integration layer here is the key insight. Plugging a capable open-weight model into the Claude Code interface without rewriting the entire workflow is the kind of practical bridge that makes local AI adoption realistic for developers who already have Claude-based tooling. The cost delta is meaningful, especially for high-volume inference tasks like code review or refactoring loops. Worth watching how GLM continues to close the gap on reasoning benchmarks.

2 more comments...

Discussion about this post

Ready for more?