I Built My Own AI Creative Engine (Here's How You Can Too)
Warning: you’ll find a lot of infographics in this post.
I think I’ve always been a sucker for AI tools that let me do bigger things than were possible before.
That’s why AI tools for vibe-coding and automation have a special place for me: they let me go down the rabbit hole like a kid with a new toy. Tools like Cursor, Make, n8n. I like to think of them as Lego blocks I can play around with, plug in, detach, and reconfigure.
They let me build things that solve my own problems and automate the boring, repetitive parts of my work. With AI automation tools especially, there’s something deeply satisfying about dissecting a complex system and turning it into a step-by-step workflow that streamlines my work, saves time, and frees me up for the challenges I actually enjoy.
But if we take a step back from this, I think we’re witnessing an evolution in how we all use AI: from talking with chatbots to building systems that turn imagination into reality, especially for the creative process.
Here’s what I mean.
Using individual AI tools works great—until you need them to work together.
ChatGPT is excellent for text. Midjourney creates stunning images. ElevenLabs does amazing voice work. Kling AI does a really good job at generating videos. Each tool is powerful on its own.
But what happens when your creative process spans multiple models? When you need text and images and video and narration to generate a short-form movie?
Right now, you’re manually stitching them together:
Write text-to-image prompt in ChatGPT → copy output → paste into Midjourney → download the image → upload to video tool → generate voice separately → import everything → edit → export.
Yet despite the abundance of AI tools, our workflows are still isolated inside each tool. It’s inefficient.
What if we could automate them inside a unified space?
What I mean is when you can take text, transform it into an image, enhance that image with a different model, turn it into a video, add narration, and package it—all in one automated sequence—that’s when things change.
This is what separates someone who uses a single AI model from someone who orchestrates multiple AI models to achieve a goal. You’re not just using AI anymore; you’re conducting it.
This is what I’ve been experiencing with Glif.app, and it’s fundamentally changed how I think about what’s possible with AI workflows.
At its simplest, Glif lets you build AI workflows and agents by connecting multimodal AI models: text, image, video, and audio. Then you access them through a simple UI you can actually use without wrestling with complexity. It works like Make, n8n, or Zapier, but it’s purpose-built for AI from the ground up, which means it’s dramatically easier to build and deploy.
Here’s what that looks like in practice:
I had a recurring problem: every newsletter I write could benefit from visual infographics to break up the text and make complex ideas more digestible. But creating custom infographics for every post? That’s hours of design work I don’t have the time or skill for.
So I built a glif that solves it. I paste in a paragraph from my newsletter. I select my visual style and size. The workflow analyzes the text, extracts the key concepts, structures them into an infographic layout, and generates a polished visual using AI image models—all in about 30 seconds, as you can see in the image above.
What used to take me an hour (or require hiring a designer) now happens while I’m grabbing coffee.
That’s the shift: AI is no longer just a tool that helps me write better, it also helps me work differently.
And once you start thinking this way—once you stop seeing AI as a chatbot and start seeing it as composable building blocks—the possibilities multiply fast.
I can see what you might be thinking: couldn’t this also be done inside Gemini using Nano Banana Pro? Yes, it could. But then you’re limited to Nano Banana Pro, and the result includes the Gemini watermark in the bottom-right corner of the image, which most people don’t want.
And what if you want to try another image model like GPT 1.5, Seedream, or Flux Pro? Your options are limited. Glif unlocks them for you.
Glif Framework - How multimodal workflows work
So if AI orchestration is the shift we’re making—from using individual tools to conducting them—the obvious question is: how does this actually work?
Because here’s the thing: Most AI automation tools were built for connecting business apps. Zapier connects your email to your CRM. Make.com connects your forms to your database. They’re incredible at that. But they weren’t designed for the thing we actually need now: seamlessly connecting text models to image models to video models to audio models.
That’s the gap Glif fills.
It’s not trying to be another general automation platform. It’s purpose-built for one thing: making multimodal AI workflows as easy to build as a Lego set.
What Makes Glif Different
Let’s explore what this actually means.
1. Multimodal Connections Are Native, Not Hacked Together
In traditional automation tools, connecting AI models is painful:
Image outputs from one model need manual URL formatting for the next
Audio files require custom storage solutions
Every model has different API requirements
Glif treats multimodal as the default. Text flows into image models. Images flow into video generators. Audio layers on top. It’s all native.
What this means in practice: You can build a workflow that takes text, generates an image, enhances that image with a different model, turns it into a video, and adds AI voiceover—without writing a single line of code or setting up file storage.
The models just... talk to each other.
And there are a lot of models to choose from.
For example, the image above shows a few options for each multimodal feature available on Glif.
2. Visual Builder That Actually Makes Sense
I’ve used visual workflow builders before. Most of them feel like they’re trying to make complex things look simple without actually making them simple.
Glif’s different. The visual canvas maps directly to how you’d think about the workflow:
Input blocks: What data starts the process? (text, image, URL, number)
Processing blocks: What AI models transform that data? (LLMs, image & audio generators, video models, etc)
Output blocks: What do you want at the end? (image, text, video, structured data)
You drag, connect, and configure. No JSON formatting. No API authentication. No webhook debugging.
3. Remix Culture: Learn by Reverse-Engineering
This might be the most underrated feature.
Glif has a community gallery where people share their workflows publicly. But here’s the powerful part: You can “remix” any workflow, which means you can:
See exactly how someone built something
Copy their entire workflow as a starting point and modify it
Learn patterns by studying what works
It’s like open-source for AI workflows.
I’ve learned more about effective workflow design from remixing other people’s glifs than I would from any tutorial. You see how someone structured a complex video generation workflow, and suddenly you understand the pattern you can apply to your own use case.
The Two Ways to Build: Workflows vs Agents
Now, let’s explore two main approaches to using Glif, depending on what you need:
Workflows: Input → Process → Output
These are straightforward, linear sequences:
You define input blocks (what data the user provides)
You chain processing steps (AI models that transform the data)
You specify output (what gets generated at the end)
Think of it like a recipe: Same ingredients (inputs), same steps (processing), same dish every time (output).
Example from the community: a YouTube thumbnail generator.
Input: Brief description of the thumbnail
Processing: Send the brief description to Claude → Enhance the input into a text-to-image prompt → Generate the image using the Flux model
Output: Polished thumbnail ready to upload
Every time you run it with new episode details, it follows the exact same sequence. Predictable, reliable, fast.
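The thumbnail example above can be sketched as a simple function chain. This is an illustrative Python sketch, not Glif’s actual API: the step functions are stand-ins for the LLM and image-model calls each block would make.

```python
from functools import reduce
from typing import Callable

def build_workflow(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Compose steps left-to-right into one callable workflow:
    input -> step1 -> step2 -> ... -> output."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

# Stand-in "blocks" -- in Glif these would be a Claude call
# and a Flux image-generation call.
def enhance_prompt(brief: str) -> str:
    return f"detailed text-to-image prompt for: {brief}"

def generate_image(prompt: str) -> str:
    return f"thumbnail.png generated from [{prompt}]"

thumbnail_workflow = build_workflow(enhance_prompt, generate_image)
print(thumbnail_workflow("podcast episode on AI agents"))
```

Same inputs, same steps, same kind of output every run, which is exactly what makes a workflow predictable.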
When to use workflows: When you have a clear input-to-output transformation that follows the same steps every time. Social graphics, content repurposing, format conversions.
Agents: Workflows with Intelligence and Tools
Agents are workflows that can make decisions and use additional capabilities:
Decision-making: The agent chooses which path to take based on the input
Tool access: Can search the web, analyze URLs, access workflows
Multi-step reasoning: Can break down complex tasks into sub-tasks
Think of it like a junior teammate: you give them instructions, they figure out what to do, gather information, and come back with results.
Example from the community: A short-form video agent.
Input: Tell the agent what short-form video you want, including the video format, the narrative, and the visual style
Processing:
Run a web search to conduct research
Generate an image using Seedream
Combine images with Nano Banana Pro
Create text-to-speech using ElevenLabs
Create the video using Kling AI
Add subtitles
And many more
Output: A complete short-form video package you can post on YouTube or TikTok
The agent decides what to do based on the request, following its system instructions to do its job.
When to use agents: When the task requires research, decision-making, or adaptive behavior. Not only can the agent do the creative work, it can also handle content analysis, SEO research, data gathering, and strategic recommendations.
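The difference from a fixed workflow can be sketched as a tiny dispatcher: instead of one hard-coded chain, the agent picks which tools to run based on the request. This is again an illustrative Python sketch with stand-in tools, not Glif’s internals; a real agent would ask an LLM to make the decision that the keyword check stands in for here.

```python
# Stand-in tools the agent can call; in Glif these would be
# web search, image models, text-to-speech, video models, etc.
TOOLS = {
    "research": lambda req: f"web results for '{req}'",
    "image": lambda req: f"image for '{req}'",
    "voiceover": lambda req: f"narration for '{req}'",
}

def plan(request: str) -> list[str]:
    """Decide which tools to run for this request."""
    steps = []
    if "research" in request or "facts" in request:
        steps.append("research")
    steps.append("image")  # the creative step always runs
    if "narrated" in request:
        steps.append("voiceover")
    return steps

def run_agent(request: str) -> dict[str, str]:
    """Run the chosen tools and collect their outputs."""
    return {tool: TOOLS[tool](request) for tool in plan(request)}

result = run_agent("a narrated explainer, research the facts first")
print(list(result))
```

The same input schema produces different execution paths, which is the practical difference between a workflow and an agent.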
What I’ve Built with Glif
Enough with the concepts; let me show you what this actually looks like when you start building.
I’ve built three glifs over the past few weeks. Each one solves a specific recurring problem I had with my newsletter workflow. More importantly, each one changed how I work, not just how fast I work.
Build #1: The Infographic Generator (Workflow)
The Problem I Had
Every newsletter I write is dense with ideas. Frameworks, mental models, step-by-step processes. The kind of content that should have visual breakdowns to make it digestible.
I’d finish writing a post, realize a section would work better as an infographic, and then... just not do it. Because creating a custom infographic meant:
Opening Canva or Figma
Staring at a blank canvas trying to figure out the layout
Manually structuring the text from my newsletter into visual hierarchy
Picking colors, fonts, spacing
Exporting and formatting
Reality: I’d skip it 99% of the time. The friction was too high. My newsletter went out with walls of text instead of visual aids.
What I Built
A workflow that turns any paragraph from my newsletter into a publication-ready infographic in 30 seconds.
How it works:
Input: I paste the paragraph I want visualized + select visual style (Pixel Art, LEGO, realistic whiteboard, futuristic, etc) + aspect ratio (5:4, 16:9, 9:16, etc) + visual complexity + color scheme + text density
Processing: The workflow analyzes the text, extracts key concepts, structures them into a logical visual hierarchy, and generates the infographic using Nano Banana Pro image model
Output: A polished, on-brand infographic ready to drop into my newsletter
What Actually Changed
I’m not skipping visuals anymore.
Now when I’m editing a newsletter draft and see a dense section, I copy the paragraph, paste it into the glif, customize it, hit generate. 30 seconds later, I’ve got an infographic.
Build #2: The SEO Optimizer (Agent)
The Problem I Had
I know SEO matters. I know my newsletter posts could rank better on Google. I know there are keywords I should be targeting.
But doing SEO takes so much time.
The process looked like this:
Identify what I want the post to rank for
Research relevant keywords using an SEO tool
Check what’s already ranking for those keywords
Analyze competitor content structure
Identify gaps and opportunities
Go back and optimize my post
Time investment: 1-2 hours of research and analysis.
Reality: I published posts and hoped for the best. SEO became something I’d “get to eventually” but never did.
What I Built
An agent that takes my newsletter URL and generates a comprehensive SEO optimization report with actionable recommendations.
How it works:
Input: Newsletter URL
Processing: The agent extracts my content, identifies core topics, searches the web for relevant high-ranking content, analyzes keyword opportunities, compares my structure against top performers, and identifies what I’m missing
Output: A detailed SEO report with specific keywords to add, content structure improvements, and topic gaps to address
What Actually Changed
SEO is now part of my newsletter publishing process.
Now before I hit publish, I paste my newsletter URL into the agent. It runs its analysis while I’m doing final edits. By the time I’m ready to publish, I have a report telling me exactly what to optimize.
Build #3: Tutorial Visual Generator (Agent)
The Problem I Had
I write a lot of how-to content. Step-by-step guides. Framework breakdowns. Implementation tutorials.
But the challenge is that text-based tutorials are hard to scan. People want to see the process visually. They want a diagram that shows “these are the steps, this is the flow, this is how it all connects.”
Then reality would hit: opening Figma, Canva, Excalidraw, or Miro; manually mapping each step, creating icons, and designing the flow.
Reality: I’d skip it unless the tutorial was absolutely critical. Most of my how-to content went out as text-only, less engaging.
What I Built
An agent that takes my tutorial text and automatically generates a professional step-by-step visual guide.
The agent receives a brief input from me, then runs research using Perplexity and Wikipedia to ground its answer with fact-based information, structures the tutorial into clear, sequential steps, and presents the structure and style options I can use.
In this example, I asked the agent to visualize how LLMs actually respond to their users. For visual style, I chose Tech/SaaS, Circular Process, 16:9, and Detailed Information.
Here’s the result:
As you can see, this is a polished tutorial visual that turns my written guide into a scannable, shareable diagram.
What Actually Changed
I’m not writing text-only tutorials anymore.
Now when I finish a how-to post, I paste the content into the agent and turn the ideas in my head into clear visuals that other people can easily consume and digest.
DIY your creative process
If you’re like me, you can see how many new possibilities this tool unlocks:
You can DIY every creative process you’ve been carrying in your head
You can experiment from one AI model to another
And you don’t have to subscribe to specific models just to use their multimodal features
Let me tell you how I had been generating AI Maker’s newsletter thumbnails for the last 8 months.
It was generated using GPT image. My old workflow was to upload the newsletter draft to my Claude project with the goal of generating a text‑to‑image prompt. Once finished, I copied that prompt to ChatGPT so it could generate the thumbnail. If I wasn’t satisfied with the result, I would do it all over again until I got what I wanted.
But now?
I have a special Glif agent that turns my newsletter draft into the thumbnail instantly by connecting to GPT‑1.5’s new image generation, all in one place.
I no longer need to go to ChatGPT.
The same goes for Seedream, Midjourney, and Nano Banana Pro, other top‑tier image generation models.
Now it’s up to you to figure out your creative process, DIY it, and find the best workflow with reliable results you can repeat every single time.
Getting Started (And What You Should Know)
If you’re thinking “I want to try this,” here’s what you need to know:
How to Start: Three Paths
Path 1: Explore First (Recommended)
Don’t build anything yet. Just explore what’s already there.
Go to glif.app
Browse the community gallery
Run 5-10 workflows that look interesting
Pay attention to what catches your attention
The goal isn’t to find the perfect workflow. It’s to see what’s possible. Once you understand the range of what people are building, you’ll start thinking:
“Oh, I could adapt that for my use case.”
Path 2: Remix and Modify
Found a workflow that’s close to what you need but not quite right?
Hit “Remix” and you get a copy of the entire workflow you can modify. Change the inputs. Swap out AI models. Adjust the outputs. Make it yours.
This is how I learned. I remixed a thumbnail generator, modified it for newsletter infographics, and suddenly understood the pattern I could apply to other visual workflows.
Path 3: Build From Scratch
Once you understand the patterns, build your own.
Start simple: One input → One AI model → One output.
Example: Text description → Image generator → Social media graphic.
Get that working. Then add complexity: Maybe chain two image models (generate, then enhance). Then add decision logic if you need an agent.
The visual builder makes this surprisingly intuitive. You’re literally dragging blocks and connecting them.
The Pricing Model (And Why I Actually Like It)
I’m drowning in $20–30/month AI subscriptions I use twice a month.
Glif flips that: it’s credit-based, not a subscription. I buy a small pack ($2–10), then pay per run depending on the model: text is ~2–4 credits, my infographic workflow is 17, and video can be 50–100+.
When I’m experimenting, I top up $10–20 and burn through it. When I’m heads-down on other projects, those credits just wait. I’m not locked into a monthly payment for sporadic use; I pay only when I need it, and credits don’t expire.
For creative, project-driven work, this model actually makes total sense.
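To see why this suits bursty usage, here’s a quick back-of-envelope calculation in Python. The per-run credit costs come from the paragraph above; the monthly usage numbers are hypothetical examples of mine, not Glif’s figures.

```python
# Per-run credit costs quoted above (video taken at the low end of 50-100+).
COST = {"text": 4, "infographic": 17, "video": 100}

def credits_needed(runs: dict[str, int]) -> int:
    """Total credits for a given amount of usage."""
    return sum(COST[kind] * n for kind, n in runs.items())

# A light month: a few text runs and infographics, no video.
light = credits_needed({"text": 10, "infographic": 4})
# A heavy, experiment-filled month.
heavy = credits_needed({"text": 50, "infographic": 20, "video": 5})

print(light, heavy)  # usage-proportional cost, vs. a flat subscription
```

A light month costs roughly a tenth of a heavy one, which is exactly the pattern a flat monthly subscription punishes and pay-per-run rewards.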
The Honest Limitation (And Why It Actually Matters)
Let me be clear about what Glif can’t do:
You can’t schedule or fully automate these workflows.
This isn’t Zapier, where you set up a trigger (“when I publish a blog post, automatically create social graphics”) and walk away.
This isn’t Make.com or n8n where you can schedule a workflow to run every morning at 8am.
Glif workflows are on-demand. You go to the workflow, input your data, hit run, get your output.
At first, I thought this was a limitation. Now I think it’s the point.
Because here’s what I realized: The workflows I’ve built aren’t things I want to fully automate. They’re creative tools that require my judgment.
These workflows aren’t replacing my decision-making. They’re augmenting my creative process.
And that requires me to be there.
This is why Glif positions itself differently from traditional automation tools. It’s not a background task scheduler. It’s your creative engine.
You’re still in the driver’s seat. The workflows just give you superpowers while you’re driving.
Glif also comes with its own MCP server, allowing you to connect your agents and workflows to Claude and ChatGPT.
Read more about MCP here and learn how to connect them.
So that’s Glif: a platform for building multimodal AI workflows that amplify your creative work rather than replacing it.
But let’s bring this back to what actually matters.
What This Really Means
I started this newsletter talking about evolution, how we’re moving from chatting with chatbots to building systems that turn imagination into reality. This becomes possible because AI has democratized execution.
But here’s what I think is actually happening:
We’re learning to think in systems instead of tasks.
Five years ago, if you wanted to create infographics for your newsletter, you had two options: Learn design or hire a designer.
Over the past year or so, AI has given you a third option: prompt an image generator and hope it understands what you want.
Today, you have a fourth option: Build a system that codifies your creative process.
That’s the shift. Not just “AI can do this task for me” but “I can build tools that work exactly how I think.”
And the people who figure this out first—who learn to think in workflows instead of prompts, who build creative engines instead of collecting AI tools—those are the people who will have an unfair advantage.
Not because they’re faster. Because they’re more capable.
Your Next Steps
Here’s what you should do if you’re serious about becoming more capable at building your creative system:
Go explore glif.app and run a few community workflows
Think about one repetitive creative task in your work that could become a workflow
Ask yourself: “What would I actually do if the friction was zero?”
If you’re ready to build with me in the AI Maker Lab: I’m giving paid members instant access to the three production‑ready Glif workflows I built for this post.
📊 Infographic Builder
Turn any paragraph into publication-ready infographics in 30 seconds. Choose your style, pick your size, generate. Simple as that.
🔍 SEO Optimizer Agent
Paste your newsletter URL, get a comprehensive SEO report with keyword opportunities, content structure improvements, and strategic recommendations.
💡 Tutorial Visual Generator Agent
Submit your how-to content or tutorial text, and the agent automatically creates professional step-by-step visual guides with icons, flow diagrams, and alternative design variations.
You’ll get:
Direct access to all three workflows and agents
Complete breakdowns showing exactly how they work
The patterns I learned so you can remix and build your own glifs
The future of AI isn’t about who has access to the best models.
It’s about who builds the best systems with them.
Let’s build.
In the meantime, I’m going to experiment more with generating short-form 9:16 video, because I want to learn how to build faceless storytelling and explainer videos during this holiday season.
Until then, see you next week 👋🏻
Wyndo
P.S. For AI Maker Labs members, you can access all three Glif workflows and agents on the Maker Access page. Feel free to use, remix, and modify them based on your needs.