Tech & Engineering / 14 min read

Gemini CLI in Production: A Safe Workflow That Doesn't Break Your Repo

Artiphishle
Tags: gemini, ai-tooling, refactoring, workflow, devops

If you've ever let an agent refactor a real codebase, you know the pattern: the build turns "green", but at the cost of architecture, types, and lint discipline.

This guide shows a 3-gate workflow that keeps Gemini fast without handing it the keys to your repo.


TL;DR (30 Seconds)

  • Gate 1 -- Plan: Start in read-only mode and have Gemini generate a plan (--approval-mode plan)
  • Gate 2 -- Execute: Only run micro-scoped plans (/cook @plans/<micro-plan>.md) -- after a checkpoint commit
  • Gate 3 -- Verify: After every run: build + lint + test + sabotage scan
  • If the sabotage scan hits: immediate reset (git reset --hard HEAD)

The Problem (20 Seconds)

Unconstrained agents often fall into the same spiral:

1) Type errors appear

2) Agent tries "something"

3) Lint gets in the way -- gets disabled

4) Types stay red -- any gets sprinkled in

5) Build is green -- architecture is worse

Gemini isn't "bad". It needs tight boundaries and small tasks.


The 3-Gate Workflow (Anchor)

Everything in this article is just detail around one of these gates.

Gate 1 -- Plan (read-only)

Goal: a concrete, reviewable plan, no writes.

Gate 2 -- Execute (micro-scope + approvals)

Goal: exactly one small plan, clear scope, minimal blast radius.

Gate 3 -- Verify (ruthless)

Goal: prove quality. If not provable -- reset.
If a plan has more red flags than green flags: back to Gate 1.

Gate 1: Plan

Start Gemini in read-only mode. It can analyze, but not edit.

gemini --sandbox --approval-mode plan -e none
# then in chat: /plan

The output is a proposal: files, steps, verification. Don't execute yet.

Gemini Modes: When to Use Which?

  • --approval-mode plan -- always first (read-only)
  • --approval-mode default -- execution with explicit approval per edit
  • --approval-mode auto_edit -- only after checkpoint and only for trusted micro-plans
  • --approval-mode yolo / -y -- never for refactors (max risk)
Rule: YOLO on cross-cutting refactors is the fastest path to "40 files full of as any".
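
To make the "never YOLO" rule mechanical rather than a matter of discipline, a small shell guard can sit in front of the CLI. This is a minimal sketch under assumptions: the function names are hypothetical, and the flag spellings it checks (-y, --yolo, and both forms of --approval-mode yolo) should be compared against the output of gemini --help for your installed version.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse to launch the agent in YOLO mode.
# Function names and messages are illustrative, not part of Gemini CLI.

# Returns 0 if the given arguments are acceptable, 1 if they request YOLO.
approval_args_ok() {
  local prev=""
  local arg
  for arg in "$@"; do
    case "$arg" in
      -y|--yolo|--approval-mode=yolo)
        return 1 ;;
    esac
    # Catch the two-token form: --approval-mode yolo
    if [ "$prev" = "--approval-mode" ] && [ "$arg" = "yolo" ]; then
      return 1
    fi
    prev="$arg"
  done
  return 0
}

# Wrapper: forwards to the real CLI only when the arguments pass the guard.
gemini_guarded() {
  if ! approval_args_ok "$@"; then
    echo "refusing to start: YOLO mode is disabled for this repo" >&2
    return 1
  fi
  command gemini "$@"
}
```

Sourced from a shell profile, gemini_guarded --sandbox --approval-mode plan -e none behaves like the plain command, while any YOLO variant exits before the agent starts.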

Gate 1.5: GEMINI.md -- Your AI Constitution

Before you seriously plan, your repo needs a GEMINI.md at the root. This is the single highest-leverage thing in the entire workflow.

Must Include

  • Hard Non-Negotiables (no any, no lint-disable, no tsconfig weakening)
  • Verification Commands (build/lint/test for your project)
  • Scope Boundaries (what can be touched, what never)
  • STOP Conditions (when the agent must stop and propose options)
  • Plan-first Mandate (Gate 1 is not optional)

Preview (short)

## Hard Non-Negotiables
- NEVER introduce any, as any, unknown as any, or broad casts.
- NEVER add eslint-disable or weaken lint rules.
- NEVER weaken strictness in tsconfig or type-check settings.
- If types/lint fail: STOP and propose solutions -- don't bypass.

## STOP Conditions
STOP if:
- You would need any / lint disabling / config weakening.
- You are unsure about intended public API behavior.
- The change is breaking without explicit approval.

## Verification Commands
Run after every execution:
- <package-manager> run build
- <package-manager> run lint
- <package-manager> run test
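
Because the constitution only helps if it stays intact, it can be worth checking that nobody (human or agent) has quietly dropped a section. The following is a sketch, not a standard tool: check_constitution is a hypothetical helper, and it only looks for the three headings shown in the preview above. Extend the list if your GEMINI.md uses other section names.

```shell
#!/usr/bin/env bash
# Sketch: fail fast when GEMINI.md is missing one of the preview's sections.
# check_constitution is a hypothetical helper name.

check_constitution() {
  local file="${1:-GEMINI.md}"
  local missing=0
  local section

  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi

  for section in "Hard Non-Negotiables" "STOP Conditions" "Verification Commands"; do
    # -F: fixed string, -q: quiet. Looks for a level-2 markdown heading.
    if ! grep -qF "## $section" "$file"; then
      echo "missing section: $section" >&2
      missing=1
    fi
  done

  return "$missing"
}
```

Running check_constitution at the start of a session, or in CI, turns a silently weakened GEMINI.md into a visible failure.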

Gate 2: Execute

Why You Should Almost Never "Cook" the First Plan

The first agent plan is usually too broad, e.g.:

    1. Meta/registry edits (large blast radius)
    2. Rename/move "for cleanup" (breaks imports)
    3. Large conversions without runtime proof (breaks later)
Fix: Refine into micro-plans with hard scope.

Example (Universal): One Refactor Becomes Three Micro-Plans

Take any cross-cutting refactor (API migration, module split, build tooling, design system tokens, etc.). Split it into:

  • Plan A -- Mechanical groundwork: adapters/wrappers so old + new can coexist.
  • Plan B -- Isolation: legacy paths behind a clear boundary (deprecations, internal exports, feature flag).
  • Plan C -- Migration pass: migrate a small, representative set of call sites/modules and prove it with tests/build.

Each plan:

    1. has a small file list
    2. has acceptance criteria
    3. ends with the same verify commands

A "Cook-Safe" Plan Contract (Checklist)

A plan is only "cook-safe" if it explicitly contains:

  • Scope: exact list of files/folders to change
  • Do Not Touch: areas that are off-limits
  • Forbidden changes: no any, no lint-disable, no config weakening
  • Steps: max 5-10 steps, each one measurable
  • Acceptance criteria: "How do I know step 3 is done?"
  • Rollback: how to revert if things go south
  • Verification: which commands must be green
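
One way to keep the contract from being forgotten is to start every micro-plan from a skeleton that already contains all seven fields. The sketch below assumes plans live as markdown files in a plans/ directory; new_plan and the heading layout are illustrative choices, not something the CLI prescribes.

```shell
#!/usr/bin/env bash
# Sketch: create a micro-plan skeleton with every field of the contract.
# new_plan and the plans/ directory layout are assumptions for illustration.

new_plan() {
  local name="$1"
  local dir="${2:-plans}"
  local file="$dir/$name.md"

  if [ -z "$name" ]; then
    echo "usage: new_plan <name> [dir]" >&2
    return 2
  fi
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi

  mkdir -p "$dir" || return 1
  cat > "$file" <<EOF
# Plan: $name

## Scope
- (exact files/folders to change)

## Do Not Touch
- (off-limits areas)

## Forbidden changes
- no any, no lint-disable, no config weakening

## Steps
1. (max 5-10 steps, each one measurable)

## Acceptance criteria
- (how do I know each step is done?)

## Rollback
- (how to revert)

## Verification
- (commands that must be green)
EOF
  # Print the path so the caller can open or pass it on.
  echo "$file"
}
```

Calling new_plan adapter-groundwork creates plans/adapter-groundwork.md. A plan with any section still holding its placeholder is not cook-safe yet.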

Gate 2.5: The Checkpoint Moment (Don't Skip This)

Before you run /cook:

git status
git add -A
git commit -m "checkpoint: before agent run"

Only then:

/cook @plans/<micro-plan>.md

If you can't stomach a reset, the task is too big.
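
The checkpoint and the matching reset can be wrapped into two small functions. This is a sketch under assumptions: the function names and the tag name agent-checkpoint are made up for illustration. Tagging the checkpoint means the rollback target stays fixed even if the agent creates commits of its own, and git clean also removes untracked files added during the run.

```shell
#!/usr/bin/env bash
# Sketch: checkpoint the working tree and remember the exact commit to return to.
# The function names and the tag name "agent-checkpoint" are assumptions.

checkpoint() {
  git add -A || return 1
  # Commit only when something is staged; a clean tree is already a valid checkpoint.
  if ! git diff --cached --quiet; then
    git commit -q -m "checkpoint: before agent run" || return 1
  fi
  # Move the tag to the checkpoint commit so rollback does not depend on HEAD.
  git tag -f agent-checkpoint >/dev/null
}

rollback() {
  # Restore tracked files to the checkpoint, even if the agent committed in between.
  git reset -q --hard agent-checkpoint || return 1
  # Remove untracked files and directories created since the checkpoint.
  git clean -q -fd
}
```

Run checkpoint immediately before /cook and rollback whenever Gate 3 fails. Ignored files (build output, node_modules) are left alone by git clean -fd.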

Gate 3: Verify (Ruthless)

After every run:

<package-manager> run build
<package-manager> run lint
<package-manager> run test
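
These three commands can be chained so that the first failure stops the gate. A minimal sketch, assuming the scripts are named build, lint, and test as above: verify_gate is a hypothetical name, and the PM variable (defaulting to npm here) stands in for whichever package manager the project uses.

```shell
#!/usr/bin/env bash
# Sketch: run the three verification commands in order, stop at the first failure.
# verify_gate is a hypothetical name; defaulting PM to npm is an assumption.

verify_gate() {
  local pm="${PM:-npm}"
  local step
  for step in build lint test; do
    echo "== $pm run $step =="
    if ! "$pm" run "$step"; then
      echo "verify failed at: $step" >&2
      return 1
    fi
  done
  echo "verify: all green"
}
```

For example, PM=pnpm verify_gate runs the same sequence with pnpm. A non-zero exit status is the signal to stop and reset rather than to patch forward.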

Then:

git diff --stat

If the diff is larger than expected: scope was too big -- back to Gate 1.
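
"Larger than expected" can be made concrete by comparing the changed files against the plan's Scope list. The helper below is a sketch: out_of_scope is a hypothetical name, it reads file paths on stdin, and it treats its arguments as allowed path prefixes.

```shell
#!/usr/bin/env bash
# Sketch: report changed files that fall outside the plan's allowed scope.
# out_of_scope is a hypothetical helper. Paths come in on stdin,
# allowed path prefixes are passed as arguments.

out_of_scope() {
  local path prefix allowed
  local violations=0
  while IFS= read -r path; do
    [ -z "$path" ] && continue
    allowed=0
    for prefix in "$@"; do
      case "$path" in
        "$prefix"*) allowed=1; break ;;
      esac
    done
    if [ "$allowed" -eq 0 ]; then
      # Print each offending path and remember that the check failed.
      echo "$path"
      violations=1
    fi
  done
  return "$violations"
}
```

A typical call would be git diff --name-only HEAD | out_of_scope src/billing/ tests/billing/, where the two prefixes are placeholders for your plan's scope. Note that git diff does not list untracked files, so new files need a second pass with git ls-files --others --exclude-standard.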


Sabotage Gate (Automatic Red Flag Scanner)

Many "agent fixes" are hidden quality debt: any, eslint-disable, @ts-ignore, aggressive casts.

Scan the working area:

rg -n ":\s*any\b|as any|unknown as any|eslint-disable|@ts-ignore|@ts-expect-error" <path>
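
In a codebase that already contains some of these patterns, scanning a whole path reports old debt alongside new debt. A variant that only inspects lines added by the run is sketched below. It is an assumption-laden alternative to the rg command, not a replacement for it: sabotage_scan is a hypothetical helper, it uses grep -E with POSIX character classes instead of ripgrep, and it omits "unknown as any" because "as any" already matches it.

```shell
#!/usr/bin/env bash
# Sketch: flag red-flag patterns only on lines the run added.
# sabotage_scan is a hypothetical helper; it reads a unified diff on stdin.

sabotage_scan() {
  local pattern=':[[:space:]]*any([^[:alnum:]_]|$)|as any|eslint-disable|@ts-ignore|@ts-expect-error'
  local hits
  # Keep added lines ("+..."), drop the "+++ b/file" headers, then match.
  hits="$(grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -E "$pattern")"
  if [ -n "$hits" ]; then
    echo "$hits"
    return 1
  fi
  return 0
}
```

Used as git diff -U0 HEAD | sabotage_scan, it prints the offending added lines and returns a non-zero status, which is the cue for the immediate protocol that follows.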

If Hits: Immediate Protocol

1) Stop (don't "just quickly fix it")

2) Revert to checkpoint:

git reset --hard HEAD
git clean -fd   # also remove untracked files the agent created

3) Tighten scope, rewrite plan, back to Gate 1.


Red Flags vs Green Flags

Red Flags (back to Gate 1)

    1. "I had to add any, otherwise it was red"
    2. Lint/TS was weakened or disabled
    3. The plan says "refactor everything" instead of "touch these 6 files"
    4. Config/tooling changes without explicit goal and tests

Green Flags (safe to execute)

    1. Small scope + clear taboos
    2. Every step has acceptance criteria
    3. Verify commands are part of the plan
    4. Rollback is trivial

Bonus: Two-Model Workflow (Quality Lever)

An agent is good at producing -- but not always good at constraining.

Deliberately use a second "reviewer" (human or second model) to:

    1. Shrink scope
    2. Spot red flags
    3. Sharpen acceptance criteria
    4. Enforce verification


The Agent Rule

Make Gemini your power tool, not your architect. Define constraints in GEMINI.md. Force plan-first in read-only mode. Refine plans into micro-scope tasks. Checkpoint before execution. Verify after every run. Use a second reviewer to remove risky steps. That's how you get fast automation without losing control.

The best results come from treating agent-assisted refactoring like any other engineering discipline: plan, constrain, execute in small increments, and verify ruthlessly. The agent does the tedious mechanical work. You own the architecture.