DDEV AI Workspace: I've published my full Drupal + AI development setup

A while back I wrote an article explaining how I had a system set up to work with OpenCode inside DDEV containers on my Drupal projects. Several people reached out asking if I could share it. I couldn't. It had client data in it, tasks specific to particular projects, references to private repos, and it wasn't as simple as copying and pasting.

Then came the Anthropic change: they pulled the Claude API integration with OpenCode from the Max subscription. If I wanted to keep using Opus and Sonnet with my Max plan, I had to go back to Claude Code for those tasks. I took that forced refactor as an opportunity to reorganize my entire DDEV AI workflow, extract the generic parts, and make them public.

I've been working in a similar way for a long time, but with everything wired up pretty roughly. Spinning up a new project and getting it configured with all the tools took longer than I'd like to admit. Unifying and publishing it let me clean up the architecture in one go and, while I was at it, leave it ready for anyone else to use.

The result is DDEV AI Workspace, a meta add-on that installs the whole stack with a single command, plus seven standalone add-ons in case you only want one of the components. In this article I'll walk through what's in there, how it all fits together, and what to expect from each part.

What you get with one command

Inside your Drupal project directory, this is all you need:

ddev add-on get trebormc/ddev-ai-workspace
ddev restart

From there you've got seven add-ons running and wired together:

  • ddev-opencode: OpenCode in a container, ready to use any provider (Anthropic, OpenAI, OpenCode Zen, or even free models).
  • ddev-claude-code: Claude Code in a container, for when you want to lean on Anthropic's Max plan.
  • ddev-ralph: an autonomous orchestrator that hands off tasks to OpenCode or Claude Code and runs them unattended. Designed for overnight runs.
  • ddev-beads: a git-backed task tracker. Agents log task state there, and you can check progress.
  • ddev-playwright-mcp: headless Chromium behind an MCP endpoint, so agents can take screenshots or run visual tests.
  • ddev-agents-sync: the glue for the agents. Syncs the agent repo, resolves model tokens, and generates CLI-specific configurations.
  • ddev-ai-ssh: the glue for communications. Installs sshd in the web container and generates a project-specific SSH key pair, so the other containers can connect without needing access to the host's Docker socket.

All of these containers live inside the DDEV project's internal network. They can see each other and talk to Drupal's web container, but they have no access to your host machine. The AI operates inside a sandbox, not on your system.

Why install both CLIs

A question I get a lot: why two AI tools if they basically do the same thing?

Well, precisely because they don't. The short answer is that I've ended up needing both for a reason that's more economic than technical.

The models that work best for me for Drupal development right now are Anthropic's: Opus and Sonnet. And the cheapest way to use them, by a long shot, is the Max subscription. Paying for the direct API gets expensive fast once you start pushing the larger models in long sessions.

Until not long ago I could use my Anthropic Max subscription inside OpenCode, so I had the best of both worlds: the interface I prefer and the models that work well for me, without paying twice. Anthropic closed that access, and now the subscription only works from Claude Code. If I wanted to keep using it without shelling out for the API, I had to go back to Claude Code for tasks that need Opus or Sonnet. That's what I'm doing now.

Personally, I like OpenCode quite a bit more. The interface feels more polished, the workflow more flexible, and the agents-and-subagents system fits better with how I think about tasks. If Anthropic hadn't closed the door, I'd still be running a single CLI. But things are what they are, and Claude Code with the Max subscription is much more cost-effective than paying for the API, so there it is.

For everything that doesn't require Opus or Sonnet, I use OpenCode with other providers. There's OpenCode Zen (significantly cheaper than Anthropic's direct API) and any other provider I want to try. For more mechanical work (generating tests, applying PHPCS fixes, exploring code) it works fine, so it's not worth loading every request onto the Max subscription when there are cheaper options that do the same job.

OpenCode also comes with access to free models. They're pretty limited and fall short when you want to get real work done, but for tinkering or occasional tests they're enough. One heads-up though: most of these free models use the data you send them for training. Depending on the kind of clients you work with (an agency with big clients, code under NDA, sensitive data), using them directly on client projects is a bad idea. I use them for my own tests or on projects where the client already knows and doesn't care. Otherwise, better to go with a paid provider with a clear privacy policy.

If your case is different from mine (you're going to pay for the direct API, you only use non-Anthropic providers, or the other way around, you only use the Max subscription), you can install just the CLI you need and skip the other one.

Modular install, if your workflow needs it

The meta add-on is what I recommend, but each component can be installed separately if your way of working requires it:

# Only OpenCode (pulls in Playwright, Beads, Agents Sync, and AI SSH)
ddev add-on get trebormc/ddev-opencode

# Only Claude Code
ddev add-on get trebormc/ddev-claude-code

# Only Ralph (requires OpenCode or Claude Code already installed)
ddev add-on get trebormc/ddev-ralph

That said, my recommendation is still the meta add-on. One command and you have everything installed and configured, without having to worry about dependencies or install order. If you end up only using Claude Code, you'll have OpenCode starting up with each ddev start without using it. It uses some RAM (little, but not zero), so keep that in mind if you work with tight memory budgets or with several DDEV projects running at the same time. On the plus side, it's there if you ever need it. Going component by component is more cumbersome to configure and to keep aligned when updates come out. Unless you have a specific, solid reason to separate things out, a single command is much simpler and less work.

How the pieces fit together

The architecture diagram in the repo explains it pretty well, but to summarize how the containers talk to each other:

The CLIs (OpenCode and Claude Code) talk to the Drupal web container over SSH, using DDEV's internal network. That's where they run Drush, Composer, PHPUnit, or whatever else they need. They talk to Playwright over HTTP/MCP for anything browser-related. And they delegate task tracking to Beads, also over SSH.

At first I set up this communication with docker exec, but that meant giving some containers access to the host's Docker socket, and that felt like an unnecessary risk. What enables the current scheme is the ddev-ai-ssh add-on: it installs sshd in the web container and generates a project-specific SSH key pair. So, for example, Ralph connects over SSH to the Claude Code container to hand off a task, and Claude Code in turn connects over SSH to the web container to run a Drush command. All inside the same perimeter, without touching the host.
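As a rough sketch of those mechanics (the file names, user, and hostname below are illustrative assumptions, not the add-on's actual values), the scheme boils down to generating a dedicated key pair per project and using it for container-to-container hops:

```shell
# Generate a per-project key pair, roughly what ddev-ai-ssh does.
# The filename and comment here are hypothetical.
ssh-keygen -t ed25519 -N "" -C "ddev-ai-ssh example" -f ./ai_ssh_key

# A container would then reach the web container with something like
# (the "web" hostname and user are assumptions):
#   ssh -i ./ai_ssh_key user@web 'drush status'

# The pair now exists: private key plus public key.
ls ai_ssh_key ai_ssh_key.pub
```

Because the key never leaves the project's containers, nothing in the chain needs the host's Docker socket.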

Because the keys are unique per project, if you have several DDEV projects running at the same time they stay isolated from each other. Containers from project A can't talk to containers from project B. Each environment is a closed sandbox, no cross-talk.

Ralph has the most interesting role: it doesn't run AI itself. Instead, it launches OpenCode or Claude Code sessions with a tightly structured prompt and keeps an eye on the task until it's done. It's designed to be left running overnight with a list of issues in Beads, so by morning it's worked through all of them. After that you always need human review.

Agents Sync is the least flashy piece but the most important one day to day. On each ddev start it clones or updates the agent repo, runs envsubst over the model tokens, and generates two separate folders: one for OpenCode and one for Claude Code, with the agents and rules formatted the way each tool expects. That lets me maintain a single set of agents and have them work in both CLIs without duplication.

The drupal-ai-agents repository

Within the whole stack there's a piece that isn't a container. It's drupal-ai-agents, a Git repository with the agents, rules, and skills specifically designed for Drupal 10/11 development. It doesn't install as a DDEV add-on: ddev-agents-sync syncs it automatically on each ddev start, pulling down the latest version and leaving it ready for the CLIs to use.

This is what makes OpenCode or Claude Code, which are generic tools on their own, behave as Drupal-specialized CLIs from the start. Without this repo, you'd have to set up Drupal-specific agents, rules, and skills yourself on every new project.

It contains three things.

First, 10 specialized agents. Each with a specific role: drupal-dev for backend, drupal-theme for frontend, drupal-test-generator for generating tests, code-review as a quality gate, applier for mechanical changes with SEARCH/REPLACE blocks, ralph-planner for generating work plans, and a few more.

Second, 12 rule sets. Some global, others scoped by file type. For example, drupal-coding-standards.md activates only with *.php and enforces strict types, 2-space indentation, dependency injection, and proper cache metadata. twig-patterns.md activates with *.twig and covers the basics: presentation only, rendering full fields, cache bubbling, common anti-patterns. There's another explicit rule that blocks agents from doing commits or pushes. Commits are always mine; that's intentional, and I explain why later.

Third, 24 reusable skills. Skills are recipes the agents invoke when they need them. There are skills for each kind of Drupal test (unit, kernel, functional, functionaljs, behat, playwright), for module scaffolding, for config management, for debugging, for Drush, for performance audits, for Xdebug profiling, and more.

All of this works with a model token system. Instead of hardcoding specific model names (Anthropic's Opus, OpenAI's latest GPT, whichever model is current at the time) in every agent, I use tokens like ${MODEL_SMART}, ${MODEL_NORMAL}, ${MODEL_CHEAP}, ${MODEL_APPLIER}. In a single file (.env.agents) I define which real model corresponds to each token, per CLI separately. If a better or cheaper model comes out tomorrow, I change one line and all the agents update. No rewriting ten agents one by one.

The other useful trick is the fat frontmatter. Each agent .md has a single frontmatter with fields specific to OpenCode and to Claude Code mixed together. The sync script generates two versions, one for each CLI, with only the fields each one understands.
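As a sketch of what such a file might look like (every field name below is an assumption for illustration; the actual schema is defined in drupal-ai-agents):

```yaml
---
# Illustrative fat frontmatter: OpenCode and Claude Code fields mixed
# in one file. The sync script emits a per-CLI copy keeping only the
# fields that CLI understands. All field names here are hypothetical.
description: Backend Drupal development agent
model: ${MODEL_SMART}        # shared token, resolved per CLI
opencode:
  mode: subagent             # hypothetical OpenCode-only field
claude:
  allowed-tools: Bash, Edit  # hypothetical Claude Code-only field
---
```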

The important point isn't the technical detail, it's that it lets me have a single source of truth for each agent, with no duplication. If I improve a skill, I edit it once and it's applied in both CLIs automatically. If an interesting new CLI shows up tomorrow, I just extend the frontmatter and keep going. For an agency with a team where some people prefer Claude Code and others OpenCode, or that's considering switching tools at some point, this is key: the agent repo isn't locked into a specific technology. If you have to switch CLI, you switch CLI, but your library of agents and skills stays the same.

Your own agent repository

The agents, skills, and rules that come by default are the ones I use. And as I've been saying, they reflect how I work and the patterns of the Drupal projects I'm on. I'm well aware that other people work differently, and that there are cases where mine just won't fit.

Think for example of someone working with newer Drupal versions, or even with development branches that aren't stable yet. Patterns and APIs change, and it makes sense to maintain an agent repo specific to that context. Or think of an agency with a team of several developers: everyone should be working with the same agents and skills, and those agents should evolve with the team, not with whatever I decide to publish in mine.

For this there's an environment variable, AGENTS_REPOS, where you define the list of repositories the agents and skills come from. You can use only mine (the default), only yours, or combine several. When you combine, the repos listed later override the earlier ones if there are files with the same name. So you can start from mine as a base and only replace the specific agents you want to change, without maintaining a full fork.
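A minimal sketch of what combining repos might look like (the exact variable format and separator are documented in the add-on README; the second URL is a hypothetical agency repo):

```shell
# Hypothetical combination: the default repo first, an agency repo
# second. Later repos win on filename collisions, per the rule above.
export AGENTS_REPOS="https://github.com/trebormc/drupal-ai-agents https://github.com/example-agency/our-agents"

# The sync step would process them in order; shown here as a loop sketch.
for repo in $AGENTS_REPOS; do
  echo "would sync: $repo"
done
```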

The repo structure is what's in drupal-ai-agents: you can clone it and adapt it, or build a new one from scratch following the same convention. An agency that sets up its own repo has full control: it updates its agents, its skills, its rules, and the whole team syncs automatically on each developer's next ddev start.

Personally, mine work for what I do, but I make no claim that they're the right ones for everyone. This is the path to adapt it to your case.

A real use case

An example of how I use all this day to day.

I open Claude Code with ddev cc and start by telling it what I want to implement. I don't stop at a quick description: I try to document as thoroughly as I can how I would do it as a programmer, what patterns I tend to use, what dependencies I want, how I want the thing structured. The more detail I give it up front, the fewer corrections I have to make later.

Sometimes it proposes a different approach from mine. Then we spend some time discussing it. If what it's proposing makes sense and improves what I had in mind, I change course. If it doesn't, I explain why I prefer my approach and we keep going. This back and forth before implementing anything saves me having to revert code later.

Once we have a clear picture of how we're going to build it, I tell it to go ahead. From there, Claude Code invokes the agents that fit at each moment: the backend one for services and modules, the frontend one when Twig or CSS is involved, the review one for validating what's been written. I also tell it not to forget about generating tests, which triggers the matching agent for whatever test type fits the code we just wrote.

When it's done, in the working directory I have all the new and modified files. But there's no commit. That's intentional, and in the next section I explain why.

Commits are always mine

I mention it above in passing, but it deserves its own section, because it's a conscious decision and not an oversight.

The agent repo has a rule that blocks any agent from running git commit or git push. Whatever the AI does stays in the working directory until I review the diff with my own eyes and commit it myself. It doesn't matter if I'm working interactively with OpenCode or Claude Code, or if I'm letting Ralph run autonomously: commits are never automated.

The reason is that, as of today, I don't trust AI output enough. I've run into cases many times where the implementation isn't correct, or isn't how I'd do it. I explain how I'd do it, the AI agrees with me (almost always, because of how these models are trained), it modifies the code, and sometimes wrecks something that was already fine. If that had been committed automatically, the damage would already be done.

Reviewing before committing forces me to read what the AI produced, think about whether I'm convinced, and step in if I need to. It's the human step I'm not ready to give up yet. Maybe in a couple of years, with more reliable models, I'll reconsider. Not today.

What to expect (and what not to)

I'm not selling this as a silver bullet. A few things are worth keeping in mind.

It's actively in development. I use it daily on my projects and I keep refining it as things come up. It's very likely that there are rough edges I haven't found yet. If you run into any, drop me a line on LinkedIn or open an issue on the matching repo, and I'll look into it when I can.

Playwright is the most fragile part. When I reorganized the architecture to make it public, things that used to work in my private setup broke, and Playwright is by far where I've had the most issues. It disconnects sometimes and I have to restart DDEV to get it back. I had to wrestle with permission issues so it could generate screenshots and make them accessible to the project. And the latest Playwright update changed some things I've been fixing on the fly. More problems of this kind will probably show up.

Models still hallucinate. If you leave Ralph running overnight, by morning you can find changes that compile but do something different from what you asked for. I use it for scoped, repetitive tasks (fixing PHPStan warnings in a list of files, for example), not for architecture decisions. And as I said, nothing gets committed without me reviewing it.

Model tokens need calibration. The system is convenient but you have to test each new model before assigning it the SMART token. A model that looks great on general benchmarks sometimes generates PHP that doesn't compile. What makes the difference for Drupal agents isn't the general benchmark, it's how it handles strict typing, dependency injection conventions, and the specific patterns we use.

It's designed by me and for how I work. I tried to make it generic, but many decisions (prioritizing the Audit module, specific testing patterns, commit rules) reflect my workflow. If yours is different, remember that you can use your own agent repo as I explained earlier.

Where to find it

Everything is on GitHub under my username, Apache-2.0 licensed:

  • ddev-ai-workspace: the meta add-on that installs everything at once.
  • drupal-ai-agents: the configuration repo with the agents, rules, and skills.
  • The individual add-ons also have their own READMEs with specific details.

If you find this useful, I'd really appreciate a star on the repos. It gives me a real signal of whether this work is reaching people and resonating, and it helps more Drupal devs discover it on GitHub. I think this is a useful tool for the sector, and the more people use it on real projects, the better it'll get and the more we all gain as a community.

Feedback is welcome. If you try it and something doesn't work, if you notice weird behavior, or if you're missing a skill or agent for a specific Drupal use case, reach out on LinkedIn or open an issue on the matching repo. As I said, I keep this thing alive and I refine it with whatever comes up. The more real feedback, the better it'll work.

Need a Drupal Expert?

Senior Drupal developer, freelance, specialized in what's hardest: migrations, multilingual sites, SaaS platforms and Stripe integration. I leverage AI to cut delivery times and costs, with expert review on every line of code.

No agency, no middlemen. Direct contact with the person who does the work.