Gemini CLI in Production: A Safe Workflow That Doesn't Break Your Repo
If you've ever let an agent refactor a real codebase, you know the pattern:
it turns "green" -- but at the cost of architecture, types, and lint discipline.
This guide shows a 3-gate workflow that keeps Gemini fast without handing it the keys to your repo.
TL;DR (30 Seconds)
- Gate 1 -- Plan: Start in read-only mode and have it generate a plan (--approval-mode plan)
- Gate 2 -- Execute: Only run micro-scoped plans (/cook @plans/<micro-plan>.md) -- after a checkpoint commit
- Gate 3 -- Verify: After every run: build + lint + test + sabotage scan
- If the sabotage scan hits: immediate reset (git reset --hard HEAD)
The Problem (20 Seconds)
Unconstrained agents often fall into the same spiral:
1) Type errors appear
2) Agent tries "something"
3) Lint gets in the way -- gets disabled
4) Types stay red -- any gets sprinkled in
5) Build is green -- architecture is worse
Gemini isn't "bad". It needs tight boundaries and small tasks.
The 3-Gate Workflow (Anchor)
Everything in this article is just detail around one of these gates.
Gate 1 -- Plan (read-only)
Goal: deterministic plan, no writes.
Gate 2 -- Execute (micro-scope + approvals)
Goal: exactly one small plan, clear scope, minimal blast radius.
Gate 3 -- Verify (ruthless)
Goal: prove quality. If not provable -- reset.
If a plan has more red flags than green flags: back to Gate 1.
Gate 1: Plan
Start Gemini in read-only. It can analyze, but not edit.

    gemini --sandbox --approval-mode plan -e none
    # then in chat: /plan

The output is a proposal: files, steps, verification. Don't execute yet.
Gemini Modes: When to Use Which?
- --approval-mode plan -- always first (read-only)
- --approval-mode default -- execution with explicit approval per edit
- --approval-mode auto_edit -- only after checkpoint and only for trusted micro-plans
- --approval-mode yolo / -y -- never for refactors (max risk)
Gate 1.5: GEMINI.md -- Your AI Constitution
Before you seriously plan, your repo needs a GEMINI.md at the root. This is the single highest-leverage thing in the entire workflow.
Must Include
- Hard Non-Negotiables (no any, no lint-disable, no tsconfig weakening)
- Verification Commands (build/lint/test for your project)
- Scope Boundaries (what can be touched, what never)
- STOP Conditions (when the agent must stop and propose options)
- Plan-first Mandate (Gate 1 is not optional)
Preview (short)

    ## Hard Non-Negotiables
    - NEVER introduce any, as any, unknown as any, or broad casts.
    - NEVER add eslint-disable or weaken lint rules.
    - NEVER weaken strictness in tsconfig or type-check settings.
    - If types/lint fail: STOP and propose solutions -- don't bypass.

    ## STOP Conditions
    STOP if:
    - You would need any / lint disabling / config weakening.
    - You are unsure about intended public API behavior.
    - The change is breaking without explicit approval.

    ## Verification Commands
    Run after every execution:
    - <package-manager> run build
    - <package-manager> run lint
    - <package-manager> run test

Gate 2: Execute
Why You Should Almost Never "Cook" the First Plan
The first agent plan is usually too broad, e.g.:
- Meta/registry edits (large blast radius)
- Rename/move "for cleanup" (breaks imports)
- Large conversions without runtime proof (breaks later)
Example (Universal): One Refactor Becomes Three Micro-Plans
Take any cross-cutting refactor (API migration, module split, build tooling, design system tokens, etc.). Split it into:
- Plan A -- Mechanical groundwork: adapters/wrappers so old + new can coexist.
- Plan B -- Isolation: legacy paths behind a clear boundary (deprecations, internal exports, feature flag).
- Plan C -- Migration pass: migrate a small, representative set of call sites/modules and prove it with tests/build.
Each plan:
- has a small file list
- has acceptance criteria
- ends with the same verify commands
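One way to make sure every micro-plan really ends with the same verify commands is to keep them in a single shell function that each plan refers to. The sketch below is an illustration, not part of Gemini CLI: the function name run_verify and the PM variable (defaulting to npm here) are assumptions, and the build/lint/test script names must exist in your project.

```shell
#!/bin/sh
# Hypothetical helper: run the same three checks after every agent run.
# PM is an assumption: set it to your package manager (npm, pnpm, yarn, ...).
PM="${PM:-npm}"

run_verify() {
  for step in build lint test; do
    echo "verify: $PM run $step"
    # Stop at the first failing step so a red build is not buried by later output.
    "$PM" run "$step" || { echo "verify: $step failed" >&2; return 1; }
  done
  echo "verify: all green"
}
```

Source the file and call run_verify after each run; a non-zero exit status means the plan's verification did not pass.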
A "Cook-Safe" Plan Contract (Checklist)
A plan is only "cook-safe" if it explicitly contains:
- Scope: exact list of files/folders to change
- Do Not Touch: areas that are off-limits
- Forbidden changes: no any, no lint-disable, no config weakening
- Steps: max 5-10 steps, each one measurable
- Acceptance criteria: "How do I know step 3 is done?"
- Rollback: how to revert if things go south
- Verification: which commands must be green
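The checklist above can be checked mechanically before a plan is ever executed. The following is a minimal sketch under one assumption that is not from the article: each contract item appears in the plan file as a label at the start of a line, optionally after Markdown heading or list markers (for example "## Scope" or "- Rollback:"). The function name check_plan is hypothetical.

```shell
# check_plan FILE: succeed only if the plan names every section of the contract.
# Assumption: section labels start a line, optionally after '#', '*', '-' or spaces.
check_plan() {
  plan_file="$1"
  missing=0
  for section in "Scope" "Do Not Touch" "Forbidden changes" "Steps" \
                 "Acceptance criteria" "Rollback" "Verification"; do
    # -i: ignore case, -q: only the exit status matters.
    if ! grep -qiE "^[#*[:space:]-]*${section}" "$plan_file"; then
      echo "plan contract: missing section '$section'" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

This only proves the sections exist, not that they are any good; judging whether the scope is small enough is still a review task.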
Gate 2.5: The Checkpoint Moment (Don't Skip This)
Before you run /cook:

    git status
    git add -A
    git commit -m "checkpoint: before agent run"

Only then:

    /cook @plans/<micro-plan>.md

If you can't stomach a reset, the task is too big.
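A small guard can make the checkpoint harder to skip. The sketch below assumes you run it in the repository root right before starting the agent; the function name require_checkpoint is hypothetical. It relies only on git status --porcelain, which prints nothing when the working tree is clean.

```shell
# require_checkpoint: succeed only when nothing is uncommitted, i.e. the
# checkpoint commit really captured the current state of the working tree.
require_checkpoint() {
  # --porcelain prints one line per changed or untracked file, nothing when clean.
  if [ -n "$(git status --porcelain)" ]; then
    echo "checkpoint: uncommitted changes found; commit before running the agent" >&2
    return 1
  fi
  echo "checkpoint: clean at $(git rev-parse --short HEAD)"
}
```

If the guard fails, commit (or stash) first. The printed short hash is the state a later reset will return you to.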
Gate 3: Verify (Ruthless)
After every run:

    <package-manager> run build
    <package-manager> run lint
    <package-manager> run test

Then:

    git diff --stat

If the diff is larger than expected: scope was too big -- back to Gate 1.
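"Larger than expected" can be made concrete by comparing the changed files against the plan's Scope list. This is a rough sketch, assuming the checkpoint commit is HEAD and that scope is expressed as path prefixes; the function name out_of_scope and the example path src/billing/ are hypothetical.

```shell
# out_of_scope PREFIX...: print changed files outside the given path prefixes;
# exit status is 1 if any such file exists.
out_of_scope() {
  offenders=$(
    # Lists tracked files that differ from the checkpoint commit.
    # Untracked new files are not included; git status --porcelain shows those.
    git diff --name-only HEAD | while IFS= read -r changed; do
      allowed=no
      for prefix in "$@"; do
        case "$changed" in "$prefix"*) allowed=yes ;; esac
      done
      [ "$allowed" = yes ] || printf '%s\n' "$changed"
    done
  )
  if [ -n "$offenders" ]; then
    printf 'out of scope:\n%s\n' "$offenders" >&2
    return 1
  fi
}
```

For example, out_of_scope src/billing/ fails as soon as the agent has touched tsconfig.json or any other file outside that folder.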
Sabotage Gate (Automatic Red Flag Scanner)
Many "agent fixes" are hidden quality debt: any, eslint-disable, @ts-ignore, aggressive casts.
Scan the working area:

    rg -n ":\s*any\b|as any|unknown as any|eslint-disable|@ts-ignore|@ts-expect-error" <path>

If Hits: Immediate Protocol
1) Stop (don't "just quickly fix it")
2) Revert to checkpoint:
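
    git reset --hard HEAD

Note that this restores tracked files only. Files the agent newly created stay in the working tree as untracked files until you delete them (for example with git clean -fd, after checking git status).
3) Tighten scope, rewrite plan, back to Gate 1.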
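To make the scan a real gate rather than something you eyeball, it can be wrapped in a function whose exit status a script or CI step can act on. This is a sketch under stated assumptions: it uses grep -E instead of rg so it also runs where ripgrep is not installed, which means \s and \b are spelled out in POSIX form, and the function name sabotage_scan is hypothetical.

```shell
# sabotage_scan PATH: print red-flag hits under PATH; exit status 1 if any exist.
# Assumption: plain grep -E instead of rg. "unknown as any" is already covered
# by the "as any" alternative, so it is not listed separately.
sabotage_scan() {
  pattern=':[[:space:]]*any([^[:alnum:]_]|$)|as any|eslint-disable|@ts-ignore|@ts-expect-error'
  # grep exits 0 when it finds at least one match, which here means failure.
  if grep -rnE "$pattern" "$1"; then
    echo "sabotage scan: red flags found, reset to the checkpoint" >&2
    return 1
  fi
  echo "sabotage scan: clean"
}
```

The function deliberately does not reset anything itself; it only reports. Point it at the folders named in the plan's scope rather than the repository root, so dependencies and build output are not scanned.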
Red Flags vs Green Flags
Red Flags (back to Gate 1)
- "I had to add any, otherwise it was red"
- Lint/TS was weakened or disabled
- The plan says "refactor everything" instead of "touch these 6 files"
- Config/tooling changes without explicit goal and tests
Green Flags (safe to execute)
- Small scope + clear taboos
- Every step has acceptance criteria
- Verify commands are part of the plan
- Rollback is trivial
Bonus: Two-Model Workflow (Quality Lever)
An agent is good at producing -- but not always good at constraining.
Deliberately use a second "reviewer" (human or second model) to:
- Shrink scope
- Spot red flags
- Sharpen acceptance criteria
- Enforce verification
The Agent Rule
Make Gemini your power tool, not your architect. Define constraints in GEMINI.md. Force plan-first in read-only mode. Refine plans into micro-scope tasks. Checkpoint before execution. Verify after every run. Use a second reviewer to remove risky steps. That's how you get fast automation without losing control.
The best results come from treating agent-assisted refactoring like any other engineering discipline: plan, constrain, execute in small increments, and verify ruthlessly. The agent does the tedious mechanical work. You own the architecture.