OpenCode + DDEV: how I built a Drupal development environment with 16 AI agents
I've been working with AI development tools for months now. I've tried Claude Code, Cursor, Copilot, and several others I don't even remember. They all have the same problem: either you're locked into a single provider, or agent customization is limited. So I decided to build my own setup.
What I'm going to walk you through here is a full configuration of OpenCode (an open source alternative to Claude Code) integrated with DDEV through custom Docker containers, with 16 specialized agents, overnight autonomous execution, an LLM model proxy and visual testing with Playwright. It's over 7,000 lines of configuration across Markdown, JSON, YAML, Shell and Dockerfile. It's not something you'll set up in an afternoon, but the result is worth it.
What is OpenCode and why not Claude Code
OpenCode is an open source CLI/TUI tool for AI-assisted development. If you've used Claude Code, the idea is similar: an agent that reads your code, runs commands and proposes changes. The key difference is that OpenCode is provider-agnostic. You can connect Anthropic, OpenAI, Fireworks, AWS Bedrock or any OpenAI-compatible API.
Why not Claude Code directly? Cost and flexibility. Anthropic's API is expensive. Yes, in Claude Code you can try using Haiku or Sonnet for some tasks, but even those models are significantly more expensive than open source models running on third-party servers like Fireworks. With OpenCode I can use open source models that are perfectly viable for many tasks and cost a fraction of the price. On top of that, I can create specialized agents with different instructions, tools and models for each type of task.
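To make "provider-agnostic" concrete: OpenCode points agents at providers through its opencode.json config. The fragment below is a sketch from memory of that schema (double-check it against the OpenCode docs); the gateway name, URL and model ids are invented, but the idea is that any OpenAI-compatible endpoint can be plugged in as a provider:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "my-gateway": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "My OpenAI-compatible gateway",
      "options": {
        "baseURL": "https://gateway.example.com/v1"
      },
      "models": {
        "fast-model": {},
        "smart-model": {}
      }
    }
  }
}
```

Once a provider is declared like this, agents can reference its models by name, which is what makes swapping providers cheap.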
From experience I can tell you that the cost difference isn't trivial. When you leave an agent running tasks overnight (we'll get to that), the bill using only Anthropic skyrockets. With open source models on cheap providers, those same sessions cost way, way less.
Three containers, one ecosystem
Beyond choosing OpenCode as a tool, I've built a custom integration with DDEV so everything runs inside Docker containers. This isn't something that comes out of the box with OpenCode (I could've done the same with Claude Code or any other tool). The idea is to have a reproducible environment where the AI tool, the web server and the testing browser all live on the same Docker network without installing anything on the host machine.
The architecture is based on three containers. The OpenCode container is where the AI agents run; it has access to the project code and can execute Drush and Composer commands inside the web container. The Web container is the standard DDEV one: Drupal, Drush, code quality tools, the usual. And the Playwright container is a headless browser that the agents can control for visual testing, screenshots and form navigation.
All three share the same Docker network, so they can see each other. OpenCode can use Playwright to visually validate a change and at the same time run a drush cr in the web container. All without leaving the environment.
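DDEV lets you add extra services by dropping docker-compose fragments into the .ddev/ directory, and extra services automatically join the project's Docker network. A minimal sketch of what the two additional containers could look like (the image names, mounts and file name are assumptions for illustration, not my actual files):

```yaml
# .ddev/docker-compose.opencode.yaml — extra services join the project's
# network, so "web", "opencode" and "playwright" see each other by name.
services:
  opencode:
    image: my-org/opencode:latest      # hypothetical custom image with the OpenCode CLI
    working_dir: /var/www/html
    volumes:
      - ../:/var/www/html              # same project code the web container serves
  playwright:
    image: mcr.microsoft.com/playwright:v1.48.0-noble   # browsers preinstalled
    command: sleep infinity            # kept alive; the agents drive it on demand
```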
I've created custom DDEV commands (files in .ddev/commands/host/) to simplify the day-to-day. With ddev opencode I open the interactive interface, with ddev opencode ralph I launch autonomous mode, and with ddev phpunit or ddev phpstan I run the quality tools directly. No more manually entering containers or remembering paths.
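A DDEV host command is just a shell script dropped into .ddev/commands/host/. A sketch of what the opencode wrapper could look like — the service name and the in-container ralph-loop.sh path are assumptions, and the `## Description`/`## Usage` comments are DDEV's convention for command help:

```shell
#!/usr/bin/env bash
## Description: Run OpenCode in the AI container (interactive TUI or Ralph mode)
## Usage: opencode [ralph] [args]
## Example: "ddev opencode" or "ddev opencode ralph"

# Build the command to run, so "ddev opencode ralph" launches the autonomous
# loop and anything else opens the normal OpenCode interface.
build_cmd() {
  if [ "${1:-}" = "ralph" ]; then
    echo "ddev exec -s opencode /usr/local/bin/ralph-loop.sh"
  else
    echo "ddev exec -s opencode opencode $*"
  fi
}

# Only execute for real when ddev is actually on the PATH.
if command -v ddev >/dev/null 2>&1; then
  eval "$(build_cmd "$@")"
fi
```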
16 agents, each with its own job
OpenCode uses an orchestrator + sub-agents architecture. The main agent analyzes what you're asking for and delegates to specialized agents. I've set up 16, split into four groups.
The development agents: drupal-dev for backend, drupal-theme for frontend and Twig, drupal-test for PHPUnit, drupal-perf for performance and cache, drupal-update for Composer updates and DB migrations, and twig-audit for auditing templates.
The quality agents: three-judges is a quality gate with three judges (Architect, Security, Performance) that must approve any significant change. output-verifier validates the accuracy of generated outputs.
The content agents: blog-writer for technical articles, sales-copy for landing pages, email-responder for professional emails.
And the utility agents: applier for mechanically applying code changes, code-explorer for quick codebase exploration, deep-research for deep investigation, ralph-planner for generating requirements, and visual-test for Playwright testing.
Each agent with its own model
Not all agents need the same model behind them. The quality agents (three-judges, output-verifier) and the orchestrator currently use Claude Opus 4-6 because it's the most capable model for tasks that require complex reasoning: evaluating architecture, detecting security issues, making design decisions. But that doesn't mean I'll still be using it a month from now. Models change every few weeks, and what's smartest today might not be smartest tomorrow.
The rest of the agents (development, content, utilities) run on cheaper models. I'm not going to specify which ones because I don't think it's relevant. The point is that generating code, applying mechanical changes or writing content are tasks where a less powerful model works perfectly fine and costs a fraction. The key to the system is that separation: smart models for smart tasks, cheap models for everything else.
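In config terms, that separation is just a model field per agent. Again a sketch from memory of OpenCode's agent config — the "my-gateway" provider and every model id here are placeholders, not my actual picks:

```json
{
  "agent": {
    "three-judges": {
      "description": "Quality gate: Architect, Security and Performance",
      "model": "anthropic/claude-opus-4-5"
    },
    "drupal-dev": {
      "description": "Drupal backend development",
      "model": "my-gateway/smart-model"
    },
    "applier": {
      "description": "Mechanically applies SEARCH/REPLACE proposals",
      "model": "my-gateway/fast-model",
      "temperature": 0
    }
  }
}
```

Changing the model behind an agent is a one-line edit, which is exactly what makes re-evaluating models every few weeks cheap.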
The Applier pattern
This is an architectural decision that took me a while to arrive at, but it's made a real difference. The idea is simple: separate the agents that analyze from the ones that write.
The development agents (drupal-dev, drupal-theme, etc.) can only read files and execute commands. When they propose a change, they generate SEARCH/REPLACE blocks as a proposal. They don't touch the code directly.
The applier agent receives those blocks and applies them mechanically. It uses a small, cheap model with temperature 0.0 for maximum precision and minimum cost.
What do you get from this? An analysis agent can't accidentally modify files it shouldn't. And you can review proposals before they're applied. I've seen this many times with AI tools: the agent analyzes well, but when it writes it gets too clever and changes things you didn't ask for. With the Applier pattern that doesn't happen.
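For reference, a proposal in this style looks roughly like the block below. The file and the change are invented for illustration, and the exact delimiters depend on how you instruct the agents; this is the common SEARCH/REPLACE convention:

```
web/modules/custom/my_module/src/FooService.php
<<<<<<< SEARCH
    $config = \Drupal::config('my_module.settings');
=======
    $config = $this->configFactory->get('my_module.settings');
>>>>>>> REPLACE
```

The analysis agent only emits blocks like this; the applier's entire job is to find the SEARCH text verbatim and swap in the REPLACE text, nothing more.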
Ralph Loop: overnight autonomous execution
The Ralph loop is a system that allows OpenCode to work independently on complex tasks, without human intervention.
The flow has two phases. In the planning phase you pass it a requirements.md file with the objective, requirements and success criteria. Ralph analyzes the existing code and creates tasks with priorities (P0 to P3) using Beads, a persistent task tracking system based on Git. In the execution phase it enters a loop: checks pending tasks, works on the highest priority one, closes it, and repeats. If it discovers additional work, it creates new tasks. When there are no pending tasks left, it finishes.
What nobody tells you is that the key to the Ralph loop isn't the loop itself, but the combination of opencode run (non-interactive mode) with permissions in allow mode (no confirmation prompts). That's what enables fully autonomous execution.
Isn't that dangerous? That's the nuance. The agents can't do git commit, git push, or any destructive Git operation. It's blocked in the permissions config. So you leave Ralph working all night, and in the morning you do a git diff, review the result and commit yourself. The human always has the last word.
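Stripped of the project-specific details, the core of a Ralph loop fits in a few lines of shell. This is a sketch, not my actual 500-line script: the bd (Beads) and opencode run invocations are assumptions about those CLIs, so adjust the flags to your installed versions.

```shell
#!/usr/bin/env bash
# Minimal Ralph loop sketch: drain the Beads backlog one task at a time.
set -u

MAX_ITERATIONS="${MAX_ITERATIONS:-50}"   # hard stop so a stuck agent can't spin forever

i=0
while [ "$i" -lt "$MAX_ITERATIONS" ]; do
  # Ask Beads for the next ready task; empty output means the backlog is done.
  task="$(bd ready 2>/dev/null | head -n 1 || true)"
  if [ -z "$task" ]; then
    break
  fi

  # One non-interactive OpenCode session per task. Permissions are in allow
  # mode so nothing prompts; destructive git stays denied in the config.
  opencode run "Work on this Beads task, then close it with bd close: $task"
  i=$((i + 1))
done

echo "ralph loop finished after $i iteration(s)"
```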
I've used the Ralph loop for large refactors, massive PHPCS and PHPStan error fixes, and for implementing complete features starting from a requirements file. It works especially well with the audit fixers: specific instruction files for automatically correcting code quality errors.
The LLM proxy that saves you money
Using a single AI provider for everything is kidding yourself on costs. A premium model is incredible for reasoning, but using it to apply a mechanical SEARCH/REPLACE is throwing money away.
The solution is LiteLLM as a proxy. I have a server that acts as a unified gateway (compatible with the OpenAI API) and routes each request to the most appropriate model. Complex planning and quality evaluation tasks go to the premium model of the moment. Development and code analysis go to a mid-tier model. Mechanical change application goes to the cheapest model I can find that works well with temperature 0.0. And for overnight autonomous execution I choose based on the budget.
The important thing is that I have models configured from Anthropic, OpenAI, Fireworks, AWS Bedrock and other providers. If tomorrow a new open source model comes out that performs better than what I'm using today, I add it to LiteLLM and tell each agent in the OpenCode config which model to use. It's a very simple configuration, you can change it in no time.
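In LiteLLM's proxy config, that routing is just a model_list of aliases: each agent talks to a stable alias (judge, dev, applier) and the model backing it is a one-line swap. The model ids below are placeholders for illustration, not my actual picks:

```yaml
# config.yaml for the LiteLLM proxy (model ids are placeholders)
model_list:
  - model_name: judge            # premium reasoning: orchestrator, three-judges
    litellm_params:
      model: anthropic/claude-opus-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: dev              # mid-tier: development and content agents
    litellm_params:
      model: fireworks_ai/accounts/fireworks/models/some-open-model
      api_key: os.environ/FIREWORKS_API_KEY
  - model_name: applier          # cheapest viable model, used at temperature 0.0
    litellm_params:
      model: openai/gpt-4.1-mini
      api_key: os.environ/OPENAI_API_KEY
```

Because the agents only ever see the aliases, a new open source model really is a config change in one place.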
How much do you save? It depends a lot on the type of task and whether it's something short or a long session you leave running for hours. But you can easily be talking about two or three times cheaper than using Anthropic's API for everything. That said, the best value for money today is still the Claude subscription. The real savings come when you need to go beyond what the subscription allows, especially in long Ralph loop sessions where consumption spikes.
Skills: recipes for repetitive tasks
Skills are detailed instructions that are invoked as commands inside OpenCode. Think of them as step-by-step recipes for common Drupal operations. I've created 12 skills, but the ones I use most daily are these.
drupal-audit runs the Drupal Audit module (drush audit:run with per-module filtering) for PHPCS, PHPStan, PHPUnit, Twig and cyclomatic complexity. It's the central point for code quality. run-quality-checks is the fallback with PHPCS + PHPStan + PHPUnit for when the Audit module isn't installed. drupal-module-scaffold generates the complete skeleton for a custom module. playwright-browser-testing does visual testing with screenshots and form navigation. And drush-commands provides quick access to common Drush commands: cache rebuild, config export/import, user login.
Each skill has the complete procedure, including which tool to use, what output format to expect and how to handle errors. The agents invoke them automatically when they detect that a task fits an available skill.
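A skill is ultimately just an instruction file. A hypothetical sketch of what one of these recipes could look like — the steps and the phpcs command are invented for illustration, not copied from my setup:

```markdown
# Skill: run-quality-checks

Run the quality toolchain on a custom module and report failures.

## Steps
1. `ddev phpcs web/modules/custom/<module>` — coding standards.
2. `ddev phpstan web/modules/custom/<module>` — static analysis.
3. `ddev phpunit web/modules/custom/<module>` — unit and kernel tests.

## Output format
Summarize per tool: pass/fail, error count, and the first 10 errors verbatim.

## Error handling
If a tool is missing, say so explicitly instead of skipping it silently.
```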
Permissions: autonomy with a safety net
The permissions philosophy is clear: agents can read and write code freely, execute any development command, but they can never perform destructive Git operations. They can do git diff, git log and git status, but git commit, git push, git pull, git checkout and git reset are blocked. The user always reviews and commits manually.
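In OpenCode this maps to the permission section of opencode.json. The key names and wildcard matching below are from memory of the docs, so verify them before relying on this, but the shape is roughly:

```json
{
  "permission": {
    "edit": "allow",
    "bash": {
      "git status": "allow",
      "git diff*": "allow",
      "git log*": "allow",
      "git commit*": "deny",
      "git push*": "deny",
      "git pull*": "deny",
      "git checkout*": "deny",
      "git reset*": "deny",
      "*": "allow"
    }
  }
}
```

Everything defaults to allow (which is what makes the Ralph loop possible), with the destructive Git subcommands carved out as hard denies.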
I've also set up a desktop notification system. The OpenCode container sends notifications to the host when it needs attention, completes a task or detects an error. That way I can leave Ralph running and find out when it finishes or when something fails, without being glued to the terminal.
The actual workflow
So this doesn't stay theoretical, here's how I work day to day.
In interactive mode I open ddev opencode, ask for what I need ("implement service X", "fix the PHPStan errors in this module"), the orchestrator delegates to the right agent, the applier applies the changes, and the three-judges validate. I review and commit.
In Ralph mode I write a requirements.md with what I want, launch ddev opencode ralph, and go do something else (or go to sleep). In the morning, git diff and review.
After months working with this setup, the reality is I wouldn't go back. Not because AI generates perfect code (it doesn't), but because automating repetitive tasks and the quality enforced by the three-judges have changed the way I work.
What brings me the most value isn't the code generation itself. It's the combination of specialized agents with automatic quality gates. Before, code review was something I did alone, with all the limitations that entails. Now I have three "judges" reviewing architecture, security and performance before I even see anything.
Is it perfect? No. Agents sometimes get stuck in loops, cheap models generate mediocre code that the judges reject, and setting all this up takes time. But if you work with Drupal daily and you're looking for how to actually integrate AI into your workflow, this is the most complete setup I've managed to build. And I say that being fully aware that I built it, so I might not be entirely objective.
Why I'm not sharing it (for now)
I know someone's thinking "so where do I download all this?". The reality is that the configuration as I have it right now contains a lot of personal stuff: credentials, paths specific to my projects, agent instructions with client context. It's not something I can just push to a repository as-is. I'd need to do a significant cleanup to get it into a shareable state, and right now that's not my priority.
What I can say is that the general idea isn't hard to replicate if you already work with DDEV and have experience with Docker. OpenCode has good documentation, LiteLLM does too, and the agents are ultimately just Markdown files with instructions. The heavy lifting is in fine-tuning each agent's instructions and in the Ralph loop, which is a bash script of about 500 lines.
How my way of working has changed
After over a decade writing code the old-fashioned way, the way I work has changed enormously. And I'm not talking about two years ago. I'm talking about the last few months. Every new model that comes out shifts the ground a bit more, and what worked three months ago can already be done more efficiently today.
As a freelancer, this has a direct impact. The hours I need to reach the same solution as before have dropped significantly. Tasks that used to take me an entire afternoon of writing code, debugging and testing, I now solve in a fraction of that time. And it's not that AI does the work for me: I still review, decide and commit. But the mechanical part, the repetitive part, I don't do that anymore.
And this is going to keep changing. Every month that passes the models improve, new tools come out, and what today seems like a complex setup will probably get simpler. The only thing I'm sure of is that staying stuck working the way we did five years ago is no longer an option.