AI-Assisted Development
Internal Policy · v1.0 · Last updated 21 May 2026

Engineering with AI agents, not around them.

A practical playbook for how we use Claude Code and similar coding agents at the company. Twenty principles that keep us fast, safe, and accountable, without slipping into the bad habits of unsupervised autonomous generation.

20 principles
9 categories
~15 min read
10 rules to live by
The short version

Ten rules to live by.

01

Developers remain responsible for all generated code.

02

Keep sessions focused. Restart when context degrades.

03

Avoid extremely large contexts above ~250k tokens.

04

Keep plugins and skills thin, composable, deterministic.

05

Use worktrees so parallel work doesn't contaminate.

06

Never grant unrestricted production or root access.

07

Never expose broad secrets or master API keys.

08

Review all AI-generated code with the same rigor as human-written code.

09

Maintain strong, AI-oriented project documentation.

10

Operate under explicit organizational AI policy.

I

Foundations

The non-negotiable starting point: AI is an accelerator, never an autonomous owner.

01

Developers remain responsible

AI systems are accelerators, not autonomous owners of software systems.

Required

The developer who commits the code owns it — correctness, security, maintainability, performance, compliance, testing, and production impact. That ownership is the non-negotiable part. How much review effort it takes to discharge it is a judgment call, and that judgment scales with experience and with the risk of the change.

Calibrate review to risk and seniority. Juniors should lean toward thorough, careful review — it's how the instinct is built. Mid and senior developers are trusted to judge how much scrutiny a change actually warrants: a one-line copy tweak or a well-covered refactor doesn't need the same depth as an auth path or a migration. This is the same judgment we already apply to ordinary code reviews — for very small, low-risk changes, an experienced developer can reasonably decide that an AI code review is enough.
Scrutinize closely
  • Edge cases and error handling
  • Concurrency and race behavior
  • Authorization & authentication paths
  • SQL correctness and migrations
  • Infrastructure impact and rollback
  • Backward compatibility
Don't
  • Assume code is correct just because it "looks good"
  • Ship anything risky on AI review alone
  • Let deadline pressure collapse review on high-risk changes
  • Confuse "small for me" with "low risk for the system"
II

Context Management

Large-context agents degrade when context becomes noisy, contradictory, or oversized. Discipline beats raw token count.

02

Context window discipline

Keep active context focused. Compact when it grows, start fresh when it drifts.

Recommended · Hard limit ~250k tokens

Don't carry unrelated tasks in the same session. Start a new session whenever the architectural concern changes — new feature, debugging a different system, repo switch.

Warning signs of context degradation: contradictory outputs, forgotten constraints, repeated mistakes, stale assumptions, unnecessary rewrites, degraded reasoning quality. When you see them, use /new or /clear.
Reach for /compact before you reset. When a session is getting long but the thread is still worth keeping, /compact summarizes the history into a condensed form and frees up the window — you keep the task's intent without dragging every stale token along. Use it proactively as you approach the limit rather than waiting for quality to drop; reserve /clear and /new for a genuine change of task.
Restart when
  • Changing tasks or repos
  • Switching architectural concerns
  • The model repeats itself
  • Hallucinations start appearing
  • Earlier instructions get forgotten
Don't
  • Push past ~250k tokens hoping it holds
  • Pile unrelated debugging into one chat
  • Re-paste large logs into every prompt
  • Treat a long session as "investment" worth preserving
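
The compact-versus-reset guidance above can be read as a small decision rule. The sketch below is illustrative only: the function name, the 80% reading of "approaching the limit", and the idea of passing degradation in as a flag are assumptions, not part of Claude Code. Only the ~250k ceiling and the /compact, /clear, and /new commands come from this policy.

```python
HARD_LIMIT_TOKENS = 250_000  # the ~250k ceiling from this policy
COMPACT_FRACTION = 0.8       # assumption: "approaching the limit" means 80% of it


def next_action(tokens_used: int, task_changed: bool, degraded: bool) -> str:
    """Suggest what to do with the current session.

    Returns "/clear" (or equivalently /new), "/compact", or "continue".
    """
    # A change of task, or visible warning signs, calls for a fresh session.
    if task_changed or degraded:
        return "/clear"
    # Same task, long session: compact proactively before quality drops.
    if tokens_used >= HARD_LIMIT_TOKENS * COMPACT_FRACTION:
        return "/compact"
    return "continue"
```

For example, `next_action(210_000, task_changed=False, degraded=False)` suggests `/compact`, while any call with `task_changed=True` suggests `/clear` regardless of size.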
III

Skills, Plugins, and Tooling

Thin, composable, deterministic. Resist the urge to build one giant autonomous engineering agent.

03

Keep skills and plugins thin

Small, single-purpose tools beat one mega-agent every time.

Recommended

Thin tools reduce hallucinations, improve predictability, simplify debugging, and reduce hidden side effects.

Good shape
  • Database migration helper
  • Frontend review skill
  • Code review assistant
  • Deployment validator
Bad shape
  • One giant autonomous engineering agent
  • "Do everything" skill with sprawling tool access
  • Skills with hidden side effects you can't audit
04

Marketplace usage

Community skills are powerful — and dangerous. Vet before you install.

Required

Before adopting a community plugin: review source, validate permissions, understand external API access, verify maintenance quality, evaluate security implications.

Maintain an internal catalog so the same vetting isn't redone in every team:

Category     Status
Approved     Safe for general use
Restricted   Requires review
Prohibited   Not allowed
Do not install
  • Unmaintained plugins
  • Opaque closed-source tools
  • Plugins requesting excessive permissions
  • Anything with unclear telemetry behavior
Always check
  • Source code on GitHub
  • Permission scope (filesystem, network, secrets)
  • Last commit and issue response time
  • Who's behind it and what's their incentive
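
One possible shape for the internal catalog is a simple lookup that tooling can consult before an install. This is a minimal sketch under stated assumptions: the plugin names are placeholders, the `may_install` helper is hypothetical, and treating uncatalogued plugins as Restricted is a choice made here, not something the policy states.

```python
from enum import Enum


class Status(Enum):
    APPROVED = "Safe for general use"
    RESTRICTED = "Requires review"
    PROHIBITED = "Not allowed"


# Hypothetical catalog entries; the plugin names are placeholders.
CATALOG = {
    "example-migration-helper": Status.APPROVED,
    "example-deploy-validator": Status.RESTRICTED,
    "example-unclear-telemetry": Status.PROHIBITED,
}


def may_install(plugin: str, reviewed: bool = False) -> bool:
    """Return True if the plugin may be installed under the catalog."""
    # Assumption: anything not yet catalogued is treated as Restricted,
    # so it needs a review before first use.
    status = CATALOG.get(plugin, Status.RESTRICTED)
    if status is Status.APPROVED:
        return True
    if status is Status.RESTRICTED:
        return reviewed
    return False  # Prohibited stays prohibited, reviewed or not
```

Defaulting unknown plugins to Restricted keeps the vetting step in the loop without blocking teams from proposing new tools.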
05

Internal skills we want

Three high-value categories to invest in first.

Recommended

Frontend design skill. Component generation, accessibility checks, responsive layouts, design-system compliance. Enforces tokens, approved components, a11y standards.

Framework-specific skills (Laravel, Next.js, React, Terraform, Kubernetes). Pin framework versions, maintain internal guidance docs, review generated migrations carefully — outdated patterns and version mismatch are the main risks.

Code review skill. Focus on security risks, missing tests, dead code, code smells, performance anti-patterns, naming, architectural violations.

AI review is supplementary. It does not replace human architectural review, security review, or production readiness validation.
IV

Documentation

Agents perform dramatically better with curated, high-signal docs. Poor docs create hallucinations — full stop.

06

Project documentation is mandatory

Every project ships with the docs an agent needs to be useful on day one.

Required

Every project includes: architecture overview, coding conventions, deployment flow, environment setup, business-domain glossary, API contracts, and testing strategy.

Recommended files: AI_CONTEXT.md, ARCHITECTURE.md, CONTRIBUTING.md.

07

Base your CLAUDE.md on Karpathy's

Don't invent your own structure — start from the canonical file and adapt it.

Required
The canonical base: every project's CLAUDE.md starts from the Andrej Karpathy CLAUDE.md. Copy it, then adapt the project-specific details. Keep its structure intact so every repo feels familiar to both people and agents.

Adapt these sections to your project: architecture summary, key domain rules, coding standards, forbidden patterns, deployment constraints, common pitfalls, testing expectations.

Less is more. Large low-quality documentation actively harms model performance. Keep these files concise, curated, and updated when they drift.
V

Workflow

Where the day-to-day rubber meets the road: worktrees, parallel sessions, incremental generation.

08

Use git worktrees with Claude Code

One worktree per task, one Claude Code session per worktree.

Recommended

AI agents touch many files, explore alternative implementations, and create broad diffs. A worktree gives each session its own checkout on its own branch — sharing the same .git history but with a fully isolated working directory. That isolation is what makes running sessions in parallel (∥) safe: edits in one session never touch the files of another.

Claude Code creates worktrees for you — you don't drive git worktree by hand. Pass --worktree (or -w) and it creates an isolated worktree under .claude/worktrees/<name>/ on branch worktree-<name>, then starts the session inside it:

  1. First time in a repo, run claude once and accept the workspace-trust prompt.
  2. Start an isolated session: claude --worktree feature-auth (omit the name and Claude generates one like bright-running-fox).
  3. Open another terminal and run claude --worktree bugfix-123 for parallel, non-colliding work. Or just ask Claude to "work in a worktree" mid-session.
  4. On exit, Claude auto-removes the worktree if it's clean, or prompts you to keep/remove it when there are changes.
Two setup steps that save pain: add .claude/worktrees/ to .gitignore so worktree contents don't show as untracked, and add a .worktreeinclude file (gitignore syntax) listing the gitignored files — .env, local certs, config/secrets.json — that should be copied into each new worktree, since a fresh checkout won't have them.
Best practices
  • Prefer --worktree over hand-rolled git worktree
  • Name worktrees after the task; one task per worktree
  • Use .worktreeinclude for env/secret files
  • Branch a session straight off a PR: claude --worktree "#1234"
  • Set isolation: worktree on subagents that edit in parallel
  • Re-init each worktree's dev env (deps, venv) — it's a fresh checkout
Avoid
  • Using --worktree before accepting repo trust (it errors out)
  • Committing the .claude/worktrees/ directory
  • Assuming .env carries over without .worktreeinclude
  • Leaving -p (non-interactive) worktrees uncleaned — remove with git worktree remove
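
The two setup steps above (ignoring .claude/worktrees/ and listing local files in .worktreeinclude) can be scripted once per repo. The helper names below are hypothetical and the file list is only an example; the sketch simply appends missing lines to the two files and is safe to run repeatedly.

```python
from pathlib import Path


def ensure_lines(path: Path, wanted: list[str]) -> list[str]:
    """Append any of `wanted` missing from the file; return the lines added."""
    existing = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in existing}
    missing = [line for line in wanted if line not in present]
    if missing:
        path.write_text("\n".join(existing + missing) + "\n")
    return missing


def prepare_repo(repo: Path, local_files: list[str]) -> None:
    """Apply both setup steps to the repository at `repo`."""
    # Keep worktree contents from showing up as untracked files.
    ensure_lines(repo / ".gitignore", [".claude/worktrees/"])
    # List the gitignored files that each new worktree should receive.
    ensure_lines(repo / ".worktreeinclude", local_files)
```

A typical call would be `prepare_repo(Path("."), [".env", "config/secrets.json"])`, run from the repository root.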

∥

Run sessions in parallel

The biggest throughput multiplier is also the most under-used: stop waiting on a single thread.

New · Recommended

AI coding agents are most powerful when treated as a fleet, not a chat partner. Generation has unpredictable latency, and most coding work blocks on review rather than on the model. Sitting and watching a single session is the easiest way to leave throughput on the floor.

Pair this with Worktrees (08): each parallel task gets its own filesystem so the agents don't trip over each other.

Do
  • Fan out across multiple worktrees / chats
  • Use background subagents for independent investigation
  • Spawn parallel sessions for orthogonal work
  • Return to each when it's ready — context-switch freely
  • Treat agents like async coworkers, not chat windows
Don't
  • Sit and wait for one response before starting the next thing
  • Stack unrelated tasks into one session for "efficiency"
  • Hand-hold step by step when the spec is already clear
  • Use parallelism to skip review — each thread still gets reviewed
Rule of thumb: if you're idle while a session "thinks", you should already be in a second session. The cost of context-switching is far smaller than the cost of serial blocking.
11

The recommended dev flow

Ten checkpoints from task to merge — and how to run many of them at once.

Recommended

There are two things to hold in your head. First, the path a single change travels — ten steps, grouped into four phases. Second, that you don't walk one task down that path at a time: you start several and let them overlap.

One task, ten steps
Plan
  1. Define the task
  2. Provide context
  3. Generate a plan
Build
  4. Generate incrementally
Verify
  5. Review the diff
  6. Run tests
  7. Lint & static analysis
  8. Security check
Land
  9. Human review
  10. Merge

Steps 1–2, 5, and 9–10 are yours; steps 3–4 and 6–8 run on the agent. Notice the shape: you bookend the work and check the diff, while the agent owns the long middle stretches.

Why you run several at once

Because your steps are sparse, a single task leaves you idle while the agent generates and runs checks. So give each task its own worktree + session (see Worktrees (08)) and stagger them. When the agent is busy in one lane, you're reviewing or signing off in another — you become the scheduler, and the "needs-you" moments naturally fan out so you're rarely pulled two ways at once.

[Figure: timeline of three staggered worktree lanes (feature-auth, bugfix-123, refactor-api), each running plan + build, then tests + checks. Legend: "needs you" marks kick off, review, and sign off (steps 1–2, 5, 9–10); "agent working" marks the stretches where you're free (steps 3–4, 6–8). Time runs left to right.]

Three worktrees, staggered. Scan any vertical slice: at most one lane needs you, the others run on the agent. Eliminating that idle time is the whole point — see Run sessions in parallel (∥).
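
The lane picture can also be expressed as data. The sketch below encodes the ten steps with the owners given in the legend (steps 1–2, 5, 9–10 need you; 3–4 and 6–8 run on the agent) and answers the scheduler's question: which lanes are waiting on a human right now? The function names and the idea of tracking each lane by its current step number are assumptions made for illustration.

```python
# (step number, phase, name, owner); owners follow the legend in this section.
STEPS = [
    (1, "Plan", "Define the task", "you"),
    (2, "Plan", "Provide context", "you"),
    (3, "Plan", "Generate a plan", "agent"),
    (4, "Build", "Generate incrementally", "agent"),
    (5, "Verify", "Review the diff", "you"),
    (6, "Verify", "Run tests", "agent"),
    (7, "Verify", "Lint & static analysis", "agent"),
    (8, "Verify", "Security check", "agent"),
    (9, "Land", "Human review", "you"),
    (10, "Land", "Merge", "you"),
]


def steps_needing_you() -> list[int]:
    """Step numbers that block on a human."""
    return [n for n, _phase, _name, owner in STEPS if owner == "you"]


def lanes_waiting_on_you(lane_positions: dict[str, int]) -> list[str]:
    """Given each lane's current step number, list the lanes blocked on a human."""
    owner_by_step = {n: owner for n, _phase, _name, owner in STEPS}
    return [lane for lane, step in lane_positions.items()
            if owner_by_step[step] == "you"]
```

With lanes at `{"feature-auth": 5, "bugfix-123": 3, "refactor-api": 7}`, only `feature-auth` is waiting on you; the other two are still running on the agent.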

Across multiple projects it's the same picture one level up: a set of lanes per repo, each with its own CLAUDE.md. Start a fresh session per project so contexts stay clean (see Context discipline (02)), and rotate your attention across projects exactly as you rotate across lanes.
12

Incremental generation over massive generation

Small steps improve correctness, reduce hallucinations, and stay reviewable.

Recommended
Prefer
  • Small iterations
  • Focused prompts
  • Narrow diffs
  • Plan → implement → review loops
Avoid
  • "Build the entire system" prompts
  • Giant autonomous sessions
  • Multi-hour unsupervised coding
VI

Security & Infrastructure Access

The blast radius of an autonomous agent with prod credentials is enormous. Default to least privilege, always.

09

Server & database access

Default policy: no unrestricted production, DB, or cloud-admin access.

Required
Strongly recommended
  • Read-only access by default
  • Sandbox environments
  • Scoped, temporary credentials
  • Audited operations
Never allow
  • Unrestricted root access
  • Unrestricted production write access
  • Unrestricted secret access

If production access is genuinely required: minimize permissions, require human approval, enable full audit logging, restrict destructive operations, isolate environments.
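
As one illustration of "require human approval" and "restrict destructive operations", a thin gate can sit between an agent and a database client. This is a deliberately coarse sketch: the statement lists and the `gate` function are assumptions, and a keyword prefilter like this complements read-only database roles and scoped credentials rather than replacing them.

```python
import re

# Assumption: coarse keyword lists, matched against the start of the statement.
READ_ONLY = re.compile(r"^\s*(SELECT|EXPLAIN|SHOW)\b", re.IGNORECASE)
WRITES = re.compile(r"^\s*(INSERT|UPDATE|DELETE|DROP|TRUNCATE|ALTER)\b", re.IGNORECASE)


def gate(sql: str, human_approved: bool = False) -> str:
    """Classify one SQL statement as "allow", "needs_approval", or "deny"."""
    # Refuse anything that looks like several statements in one string.
    if ";" in sql.strip().rstrip(";"):
        return "deny"
    if READ_ONLY.match(sql):
        return "allow"
    if WRITES.match(sql):
        return "allow" if human_approved else "needs_approval"
    # Anything unrecognized is denied rather than guessed at.
    return "deny"
```

Denying unrecognized statements keeps the default at least privilege: new statement types have to be added to the lists on purpose.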

10

API key management

AI tools never receive broad, unrestricted secrets.

Required

Never place production secrets, master API keys, or unrestricted cloud credentials directly into prompts or persistent agent memory.

Preferred
  • Scoped tokens
  • Ephemeral credentials
  • Secret managers
  • Environment-specific permissions
Also
  • Rotate credentials regularly
  • Monitor agent actions
  • Log sensitive operations
  • Separate dev and prod environments
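
To help keep secrets out of prompts, a pre-send filter can redact strings that look like credentials. The two patterns below are illustrative assumptions (the general shape of an AWS access key ID, and `name=value` pairs with secret-sounding names); a real deployment would rely on a maintained secret-scanning rule set, and the `redact` helper is hypothetical.

```python
import re

# Assumption: two illustrative patterns, not a complete rule set.
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # general shape of an AWS access key ID
    re.compile(r"(?i)\b(api[_-]?key|secret|token|password)\s*[=:]\s*\S+"),
]


def redact(prompt: str) -> tuple[str, int]:
    """Replace likely secrets with a placeholder; return (clean_text, hit_count)."""
    hits = 0
    for pattern in PATTERNS:
        prompt, n = pattern.subn("[REDACTED]", prompt)
        hits += n
    return prompt, hits
```

A non-zero hit count is also a useful signal to log, since it means a secret nearly reached the agent.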
VII

Governance & Policy

Explicit rules, classified data, and audit trails — so we can answer "what happened" and "who decided".

13

Clear company policy is required

AI usage operates under explicit organizational policy — not folk wisdom.

Required

The policy defines: approved tools, prohibited tools, data handling rules, security requirements, review requirements, production access rules, compliance obligations, and accountability expectations.

14

Data classification rules

Not all data may be shared with AI systems.

Required
Classification   AI usage
Public           Allowed
Internal         Allowed with approved tools
Confidential     Restricted
Regulated        Requires explicit approval

Special attention required for: customer data, credentials, healthcare data, financial information, legal documents, proprietary algorithms.
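
The classification table maps naturally onto a small check that tooling can run before data is shared with an agent. This is a sketch, not the policy itself: the function and its return values are hypothetical, and because the table says only "Restricted" for Confidential data, the sketch assumes such cases are escalated to a human rather than decided automatically.

```python
from enum import Enum


class DataClass(Enum):
    PUBLIC = "Public"
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"
    REGULATED = "Regulated"


def ai_usage(data: DataClass, tool_approved: bool = False,
             explicit_approval: bool = False) -> str:
    """Return "allow", "deny", or "escalate" for sharing data with an AI tool."""
    if data is DataClass.PUBLIC:
        return "allow"
    if data is DataClass.INTERNAL:
        return "allow" if tool_approved else "deny"
    if data is DataClass.REGULATED:
        return "allow" if explicit_approval else "deny"
    # Confidential is "Restricted" in the table, which does not spell out the
    # conditions, so this sketch assumes a human decides case by case.
    return "escalate"
```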

15

Logging & auditability

If we can't see what an agent did, we can't trust what it did.

Recommended

Maintain repository audit trails, commit attribution, AI-assisted PR tagging, infrastructure action logs, and credential usage tracking.

VIII

Quality & Measurement

Same standards as human-written code. And actually measure whether AI is paying off.

16

AI code meets normal engineering standards

No exemption for being "AI-generated". Same tests, linting, typing, docs, observability.

Required

Required: tests, linting, typing, documentation, observability, security review, performance validation.

17

Benchmark AI usage

Don't assume productivity gains. Measure them.

Recommended

Track: delivery speed, defect rates, rollback frequency, review time, onboarding speed, developer satisfaction. If the numbers don't move, the workflow needs to change — not the policy.
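
Two of the tracked metrics reduce to simple ratios that can be compared across periods. The sketch below is an assumption about how one might compute them; the field names are hypothetical, and which counts go into each period is left to the team.

```python
from dataclasses import dataclass


@dataclass
class Period:
    deploys: int
    rollbacks: int
    defects: int
    merged_prs: int

    @property
    def rollback_rate(self) -> float:
        """Rollbacks per deploy; 0.0 when there were no deploys."""
        return self.rollbacks / self.deploys if self.deploys else 0.0

    @property
    def defects_per_pr(self) -> float:
        """Defects per merged PR; 0.0 when nothing was merged."""
        return self.defects / self.merged_prs if self.merged_prs else 0.0


def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after; negative means a decrease."""
    if before == 0:
        raise ValueError("baseline is zero; relative change is undefined")
    return (after - before) / before
```

Comparing `relative_change(before.rollback_rate, after.rollback_rate)` across a baseline period and an AI-assisted period shows whether the number actually moved, which is the question this principle asks.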

IX

Organizational Health

Shared knowledge and the long game: keeping our engineers genuinely capable, not just AI-adjacent.

18

Share patterns across teams

Useful prompts, internal skills, workflow patterns, context files, and review checklists belong in a shared library.

Recommended

A working prompt is institutional knowledge. So is a context file that's been iterated on. So is a worktree convention. Surface these so the next team gets the benefit.

19

Maintain human expertise

Avoid the long-tail risk: shallow understanding, atrophied debugging, architectural weakness.

Required

Keep strengthening system design, debugging, performance analysis, security engineering, and code comprehension. AI is a multiplier on capability — it does not create it.