AI-Assisted Development — Internal Guidance

I

Foundations

The non-negotiable starting point: AI is an accelerator, never an autonomous owner.

01

Developers remain responsible

AI systems are accelerators, not autonomous owners of software systems.

Required

The developer who commits the code owns it — correctness, security, maintainability, performance, compliance, testing, and production impact. That ownership is the non-negotiable part. How much review effort it takes to discharge it is a judgment call, and that judgment scales with experience and with the risk of the change.

Calibrate review to risk and seniority. Juniors should lean toward thorough, careful review — it's how the instinct is built. Mid and senior developers are trusted to judge how much scrutiny a change actually warrants: a one-line copy tweak or a well-covered refactor doesn't need the same depth as an auth path or a migration. This is the same judgment we already apply to ordinary code reviews — for very small, low-risk changes, an experienced developer can reasonably decide that an AI code review is enough.

Scrutinize closely

Edge cases and error handling
Concurrency and race behavior
Authorization & authentication paths
SQL correctness and migrations
Infrastructure impact and rollback
Backward compatibility

Don't

Assume code is correct just because it "looks good"
Ship anything risky on AI review alone
Let deadline pressure collapse review on high-risk changes
Confuse "small for me" with "low risk for the system"
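The calibration above is a judgment call, but its shape can be sketched as a decision rule. This is a minimal illustration that assumes a hypothetical `Change` record and a hypothetical list of path markers for the high-risk areas under "Scrutinize closely". The five-line threshold is illustrative, not policy.

```python
from dataclasses import dataclass

# Hypothetical path fragments for the high-risk areas listed above. Substring
# matching is naive (e.g. "auth" also matches "author.py"); tune for your repo.
HIGH_RISK_MARKERS = ("auth", "migration", "sql", "infra", "terraform", "concurrency")

@dataclass
class Change:
    paths: list[str]       # files touched by the change
    lines_changed: int     # added + removed lines
    author_level: str      # "junior", "mid", or "senior"
    well_covered: bool     # existing tests exercise the touched code

def touches_high_risk(change: Change) -> bool:
    """True if any touched path contains a high-risk marker (case-insensitive)."""
    return any(m in p.lower() for p in change.paths for m in HIGH_RISK_MARKERS)

def review_depth(change: Change) -> str:
    """Return 'thorough', 'standard', or 'ai_review_ok' (illustrative thresholds)."""
    if touches_high_risk(change):
        return "thorough"        # risky paths never ship on AI review alone
    if change.author_level == "junior":
        return "thorough"        # juniors lean toward careful review
    if change.lines_changed <= 5 and change.well_covered:
        return "ai_review_ok"    # tiny, covered change: an experienced dev's call
    return "standard"
```

The high-risk check runs first on purpose. Seniority never downgrades review on an auth path or a migration, which is the "small for me" versus "low risk for the system" distinction expressed in code.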

II

Context Management

Large-context agents degrade when context becomes noisy, contradictory, or oversized. Discipline beats raw token count.

02

Context window discipline

Keep active context focused. Compact when it grows, start fresh when it drifts.

Recommended

Hard limit: ~250k tokens

Don't carry unrelated tasks in the same session. Start a new session whenever the architectural concern changes — new feature, debugging a different system, repo switch.

Warning signs of context degradation: contradictory outputs, forgotten constraints, repeated mistakes, stale assumptions, unnecessary rewrites, degraded reasoning quality. When you see them, use /new or /clear.

Reach for /compact before you reset. When a session is getting long but the thread is still worth keeping, /compact summarizes the history into a condensed form and frees up the window — you keep the task's intent without dragging every stale token along. Use it proactively as you approach the limit rather than waiting for quality to drop; reserve /clear and /new for a genuine change of task.

Restart when

Changing tasks or repos
Switching architectural concerns
The model repeats itself
Hallucinations start appearing
Earlier instructions get forgotten

Don't

Push past ~250k tokens hoping it holds
Pile unrelated debugging into one chat
Re-paste large logs into every prompt
Treat a long session as "investment" worth preserving
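The compact-versus-reset guidance can be read as a small decision rule. Here is a minimal sketch that assumes the ~250k hard limit above and a hypothetical 80% compaction threshold. `next_action` and its inputs are illustrative names, and judging whether a session has degraded is still up to you.

```python
from enum import Enum

HARD_LIMIT_TOKENS = 250_000   # the ~250k guidance above
COMPACT_AT = 0.8              # hypothetical: compact at 80% of the limit

class Action(Enum):
    CONTINUE = "continue"
    COMPACT = "/compact"
    RESET = "/clear or /new"

def next_action(tokens_used: int, task_changed: bool, degraded: bool) -> Action:
    """Pick the context-hygiene action for the current session."""
    if task_changed or degraded:
        return Action.RESET       # new task or warning signs: start fresh
    if tokens_used >= COMPACT_AT * HARD_LIMIT_TOKENS:
        return Action.COMPACT     # same task, growing history: summarize it
    return Action.CONTINUE
```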

III

Skills, Plugins, and Tooling

Thin, composable, deterministic. Resist the urge to build one giant autonomous engineering agent.

03

Keep skills and plugins thin

Small, single-purpose tools beat one mega-agent every time.

Recommended

Thin tools reduce hallucinations, improve predictability, simplify debugging, and reduce hidden side effects.

Good shape

Database migration helper
Frontend review skill
Code review assistant
Deployment validator

Bad shape

One giant autonomous engineering agent
"Do everything" skill with sprawling tool access
Skills with hidden side effects you can't audit

04

Marketplace usage

Community skills are powerful — and dangerous. Vet before you install.

Required

Before adopting a community plugin: review source, validate permissions, understand external API access, verify maintenance quality, evaluate security implications.

Maintain an internal catalog so the same vetting isn't redone in every team:

Status	Meaning
Approved	Safe for general use
Restricted	Requires review
Prohibited	Not allowed

Do not install

Unmaintained plugins
Opaque closed-source tools
Plugins requesting excessive permissions
Anything with unclear telemetry behavior

Always check

Source code on GitHub
Permission scope (filesystem, network, secrets)
Last commit and issue response time
Who's behind it and what's their incentive
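The catalog and the pre-install checks can be combined into one gate. This minimal sketch assumes a hypothetical `CATALOG` dictionary, a hypothetical `Plugin` record, a one-year staleness cutoff, and that secret access counts as excessive by default. None of these names come from a real plugin manager.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum

class Status(Enum):
    APPROVED = "Safe for general use"
    RESTRICTED = "Requires review"
    PROHIBITED = "Not allowed"

# Hypothetical internal catalog entries.
CATALOG = {
    "frontend-review": Status.APPROVED,
    "db-migrations": Status.RESTRICTED,
    "mystery-agent": Status.PROHIBITED,
}

STALE_AFTER = timedelta(days=365)   # hypothetical "unmaintained" cutoff
EXCESSIVE = {"secrets"}             # hypothetical: scopes never granted by default

@dataclass
class Plugin:
    name: str
    source_available: bool
    permissions: set[str]           # e.g. {"filesystem", "network", "secrets"}
    last_commit: date
    telemetry_documented: bool

def vetting_problems(p: Plugin, today: date) -> list[str]:
    """Reasons a plugin fails the pre-install checks listed above."""
    problems = []
    if not p.source_available:
        problems.append("opaque closed-source")
    if p.permissions & EXCESSIVE:
        problems.append("excessive permissions")
    if today - p.last_commit > STALE_AFTER:
        problems.append("unmaintained")
    if not p.telemetry_documented:
        problems.append("unclear telemetry")
    return problems

def install_decision(p: Plugin, today: date) -> str:
    """'install', 'blocked', or 'needs review' based on catalog status, then checks."""
    status = CATALOG.get(p.name)
    if status is Status.APPROVED:
        return "install"
    if status is Status.PROHIBITED:
        return "blocked"
    # Restricted or not yet catalogued: run the checks before anyone installs it.
    return "blocked" if vetting_problems(p, today) else "needs review"
```

Catalogued decisions short-circuit the checks, so vetting happens once per plugin rather than once per team.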

05

Internal skills we want

Three high-value categories to invest in first.

Recommended

Frontend design skill. Component generation, accessibility checks, responsive layouts, design-system compliance. Enforces design tokens, approved components, and a11y standards.

Framework-specific skills (Laravel, Next.js, React, Terraform, Kubernetes). Pin framework versions, maintain internal guidance docs, review generated migrations carefully — outdated patterns and version mismatch are the main risks.

Code review skill. Focus on security risks, missing tests, dead code, code smells, performance anti-patterns, naming, architectural violations.

AI review is supplementary. It does not replace human architectural review, security review, or production readiness validation.

IV

Documentation

Agents perform dramatically better with curated, high-signal docs. Poor docs create hallucinations — full stop.

06

Project documentation is mandatory

Every project ships with the docs an agent needs to be useful on day one.

Required

Every project includes: architecture overview, coding conventions, deployment flow, environment setup, business-domain glossary, API contracts, and testing strategy.

Recommended files: AI_CONTEXT.md, ARCHITECTURE.md, CONTRIBUTING.md.
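A repository check for the recommended files is cheap to run in CI. This minimal sketch assumes the files live at the repository root. `missing_docs` is a hypothetical helper, and treating a whitespace-only file as missing is an illustrative choice.

```python
from pathlib import Path

# The recommended files listed above, expected at the repository root.
REQUIRED_FILES = ("AI_CONTEXT.md", "ARCHITECTURE.md", "CONTRIBUTING.md")

def missing_docs(repo_root: str) -> list[str]:
    """Return the required doc files that are absent or empty in repo_root."""
    root = Path(repo_root)
    missing = []
    for name in REQUIRED_FILES:
        path = root / name
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(name)
    return missing
```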

07

Base your `CLAUDE.md` on Karpathy's

Don't invent your own structure — start from the canonical file and adapt it.

Required

The canonical base: every project's CLAUDE.md starts from the Andrej Karpathy CLAUDE.md. Copy it, then adapt the project-specific details. Keep its structure intact so every repo feels familiar to both people and agents.

Adapt these sections to your project: architecture summary, key domain rules, coding standards, forbidden patterns, deployment constraints, common pitfalls, testing expectations.

Less is more. Large low-quality documentation actively harms model performance. Keep these files concise, curated, and updated when they drift.
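Keeping CLAUDE.md small and complete can also be checked mechanically. This minimal sketch assumes each adapted section appears as a Markdown heading using the names listed above (the base file's real headings may differ) and a hypothetical 200-line budget. `lint_claude_md` is an illustrative name.

```python
import re

# Sections listed above; matching them as heading text is an assumption about layout.
EXPECTED_SECTIONS = (
    "architecture summary", "key domain rules", "coding standards",
    "forbidden patterns", "deployment constraints", "common pitfalls",
    "testing expectations",
)
MAX_LINES = 200   # hypothetical size budget: "less is more"

HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")

def lint_claude_md(text: str) -> list[str]:
    """Return findings for missing sections or an oversized CLAUDE.md."""
    lines = text.splitlines()
    headings = set()
    for line in lines:
        m = HEADING.match(line)
        if m:
            headings.add(m.group(1).strip().lower())
    findings = [f"missing section: {s}" for s in EXPECTED_SECTIONS if s not in headings]
    if len(lines) > MAX_LINES:
        findings.append(f"too long: {len(lines)} lines (budget {MAX_LINES})")
    return findings
```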

V

Workflow

Where the day-to-day rubber meets the road: worktrees, parallel sessions, incremental generation.

08

Use git worktrees with Claude Code

One worktree per task, one Claude Code session per worktree.

Recommended

AI agents touch many files, explore alternative implementations, and create broad diffs. A worktree gives each session its own checkout on its own branch — sharing the same .git history but with a fully isolated working directory. That isolation is what makes running sessions in parallel (∥) safe: edits in one session never touch the files of another.

Claude Code creates worktrees for you — you don't drive git worktree by hand. Pass --worktree (or -w) and it creates an isolated worktree under .claude/worktrees/<name>/ on branch worktree-<name>, then starts the session inside it:

First time in a repo, run claude once and accept the workspace-trust prompt.
Start an isolated session: claude --worktree feature-auth (omit the name and Claude generates one like bright-running-fox).
Open another terminal and run claude --worktree bugfix-123 for parallel, non-colliding work. Or just ask Claude to "work in a worktree" mid-session.
On exit, Claude auto-removes the worktree if it's clean, or prompts you to keep/remove it when there are changes.

Two setup steps that save pain: add .claude/worktrees/ to .gitignore so worktree contents don't show as untracked, and add a .worktreeinclude file (gitignore syntax) listing the gitignored files — .env, local certs, config/secrets.json — that should be copied into each new worktree, since a fresh checkout won't have them.
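These conventions are easy to verify before anyone starts a session. This minimal sketch assumes the `.claude/worktrees/<name>/` path and `worktree-<name>` branch naming described above. The helpers are hypothetical, and `worktreeinclude_entries` only drops blank lines and comments rather than implementing full gitignore matching.

```python
from pathlib import PurePosixPath

WORKTREE_ROOT = PurePosixPath(".claude/worktrees")

def worktree_for(name: str) -> tuple[str, str]:
    """Path and branch for a named worktree, following the convention above."""
    return str(WORKTREE_ROOT / name), f"worktree-{name}"

def gitignore_covers_worktrees(gitignore_text: str) -> bool:
    """Naive check for an exact .claude/worktrees/ (or broader .claude/) entry."""
    wanted = {".claude/worktrees/", ".claude/worktrees", ".claude/", ".claude"}
    entries = {line.strip().lstrip("/") for line in gitignore_text.splitlines()}
    return bool(entries & wanted)

def worktreeinclude_entries(text: str) -> list[str]:
    """Non-blank, non-comment patterns from a .worktreeinclude file."""
    return [
        line.strip() for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
```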

Best practices

Prefer --worktree over hand-rolled git worktree
Name worktrees after the task; one task per worktree
Use .worktreeinclude for env/secret files
Branch a session straight off a PR: claude --worktree "#1234"
Set isolation: worktree on subagents that edit in parallel
Re-init each worktree's dev env (deps, venv) — it's a fresh checkout

Avoid

Using --worktree before accepting repo trust (it errors out)
Committing the .claude/worktrees/ directory
Assuming .env carries over without .worktreeinclude
Leaving -p (non-interactive) worktrees uncleaned — remove with git worktree remove
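Git's own bookkeeping can surface leftover worktrees from non-interactive runs. This minimal sketch parses `git worktree list --porcelain` output, which consists of records of `key value` lines separated by blank lines, and lists the worktrees under `.claude/worktrees/`. The helper names are hypothetical, and the path check assumes forward slashes.

```python
import subprocess

def parse_worktrees(porcelain: str) -> list[dict[str, str]]:
    """Parse `git worktree list --porcelain` output into one dict per worktree."""
    records, current = [], {}
    for line in porcelain.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value
    if current:
        records.append(current)
    return records

def leftover_claude_worktrees(porcelain: str) -> list[str]:
    """Paths of still-registered worktrees under .claude/worktrees/."""
    return [
        r["worktree"] for r in parse_worktrees(porcelain)
        if "/.claude/worktrees/" in r.get("worktree", "")
    ]

def list_worktrees_porcelain(repo: str = ".") -> str:
    """Run git against `repo`; requires git on PATH."""
    return subprocess.run(
        ["git", "-C", repo, "worktree", "list", "--porcelain"],
        check=True, capture_output=True, text=True,
    ).stdout
```

The sketch only lists candidates. Review each one, then run `git worktree remove <path>` yourself. Git refuses to remove a worktree with uncommitted changes unless you pass `--force`, which is a useful safety net here.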

∥

Run sessions in parallel

The biggest throughput multiplier is also the most under-used: stop waiting on a single thread.

Recommended (new)

AI coding agents are most powerful when treated as a fleet, not a chat partner. Generation has unpredictable latency, and most coding work blocks on review rather than on the model. Sitting and watching a single session is the easiest way to leave throughput on the floor.

Pair this with Worktrees (08): each parallel task gets its own filesystem so the agents don't trip over each other.

Do

Fan out across multiple worktrees / chats
Use background subagents for independent investigation
Spawn parallel sessions for orthogonal work
Return to each when it's ready — context-switch freely
Treat agents like async coworkers, not chat windows

Don't

Sit and wait for one response before starting the next thing
Stack unrelated tasks into one session for "efficiency"
Hand-hold step by step when the spec is already clear
Use parallelism to skip review — each thread still gets reviewed

Rule of thumb: if you're idle while a session "thinks", you should already be in a second session. The cost of context-switching is far smaller than the cost of serial blocking.
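The fan-out pattern has the same shape as ordinary async code: start everything, then handle results in completion order. This minimal sketch assumes a hypothetical `run_agent` coroutine whose `asyncio.sleep` stands in for an agent session. The latencies are made up.

```python
import asyncio

async def run_agent(task: str, latency: float) -> str:
    """Stand-in for an agent session; the sleep models unpredictable generation time."""
    await asyncio.sleep(latency)
    return f"diff for {task}"

async def fan_out(tasks: dict[str, float]) -> list[str]:
    """Start every task at once, then review each result as soon as it lands."""
    pending = [asyncio.create_task(run_agent(name, lat)) for name, lat in tasks.items()]
    reviewed = []
    for finished in asyncio.as_completed(pending):
        diff = await finished
        reviewed.append(diff)   # a human still reviews every thread
    return reviewed

order = asyncio.run(fan_out({"feature-auth": 0.05, "bugfix-123": 0.01, "refactor-api": 0.03}))
```

`asyncio.as_completed` hands back whichever session finishes first, which is the "return to each when it's ready" habit. The review step still runs once per thread.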

11

The recommended dev flow

Ten checkpoints from task to merge, and how to run several tasks through them at once.

Recommended

There are two things to hold in your head. First, the path a single change travels — ten steps, grouped into four phases. Second, that you don't walk one task down that path at a time: you start several and let them overlap.

One task, ten steps

Plan

1. Define the task
2. Provide context
3. Generate a plan

Build

4. Generate incrementally

Verify

5. Review the diff
6. Run tests
7. Lint & static analysis
8. Security check

Land

9. Human review
10. Merge

Steps 1–2, 5, 9, and 10 are yours; steps 3–4 and 6–8 run on the agent. Notice the shape: you bookend the work and check the diff, while the agent owns the long middle stretches.

Why you run several at once

Because your steps are sparse, a single task leaves you idle while the agent generates and runs checks. So give each task its own worktree + session (see Worktrees (08)) and stagger them. When the agent is busy in one lane, you're reviewing or signing off in another — you become the scheduler, and the "needs-you" moments naturally fan out so you're rarely pulled two ways at once.

[Figure: three staggered lanes over time (feature-auth, bugfix-123, refactor-api), each moving through "plan + build" and then "tests + checks". Legend: needs you (kick off, review, sign off: steps 1–2, 5, 9–10); agent working, you're free (steps 3–4, 6–8).]

Three worktrees, staggered. Scan any vertical slice: at most one lane needs you, the others run on the agent. Eliminating that idle time is the whole point — see Run sessions in parallel (∥).

Across multiple projects it's the same picture one level up: a set of lanes per repo, each with its own CLAUDE.md. Start a fresh session per project so contexts stay clean (see Context discipline (02)), and rotate your attention across projects exactly as you rotate across lanes.
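A toy model can check the staggering argument. This minimal sketch assumes the step ownership given above (steps 1–2, 5, and 9–10 need you) and hypothetical step durations in arbitrary time units. `needs_you_slots` and `peak_demand` are illustrative names.

```python
# The ten steps: (number, name, owner). "you" steps need a human.
STEPS = [
    (1, "Define the task", "you"), (2, "Provide context", "you"),
    (3, "Generate a plan", "agent"), (4, "Generate incrementally", "agent"),
    (5, "Review the diff", "you"), (6, "Run tests", "agent"),
    (7, "Lint & static analysis", "agent"), (8, "Security check", "agent"),
    (9, "Human review", "you"), (10, "Merge", "you"),
]
# Hypothetical durations in arbitrary time units; real ones vary per task.
DURATION = {1: 1, 2: 1, 3: 2, 4: 4, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1}

def needs_you_slots(offset: int) -> set[int]:
    """Time slots in which a lane started at `offset` is waiting on a human."""
    slots, t = set(), offset
    for number, _name, owner in STEPS:
        if owner == "you":
            slots.update(range(t, t + DURATION[number]))
        t += DURATION[number]
    return slots

def peak_demand(offsets: list[int]) -> int:
    """Largest number of lanes that need you in the same time slot."""
    lanes = [needs_you_slots(o) for o in offsets]
    all_slots = set().union(*lanes)
    return max(sum(slot in lane for lane in lanes) for slot in all_slots)
```

With these made-up durations, starting three lanes together means all three need you at once, while offsets of two units keep demand to one lane at a time. Real durations vary, so the offsets are something you adjust as you go.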

12

Incremental generation over massive generation

Small steps improve correctness, reduce hallucinations, and stay reviewable.

Recommended

Prefer

Small iterations
Focused prompts
Narrow diffs
Plan → implement → review loops

Avoid

"Build the entire system" prompts
Giant autonomous sessions
Multi-hour unsupervised coding
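A size check on each iteration can back up the narrow-diff habit. This minimal sketch sums added and deleted lines from `git diff --numstat` output, which is tab-separated and reports `-` for binary files (skipped here). The 400-line budget and the helper names are hypothetical.

```python
import subprocess

MAX_CHANGED_LINES = 400   # hypothetical review budget per iteration

def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat`; binary files are skipped."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue
        total += int(parts[0]) + int(parts[1])
    return total

def diff_is_reviewable(numstat: str, budget: int = MAX_CHANGED_LINES) -> bool:
    """True if the diff fits the per-iteration review budget."""
    return changed_lines(numstat) <= budget

def numstat_against(base: str = "HEAD") -> str:
    """Run git in the current directory; requires git on PATH."""
    return subprocess.run(["git", "diff", "--numstat", base],
                          check=True, capture_output=True, text=True).stdout
```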

VI

Security & Infrastructure Access

The blast radius of an autonomous agent with prod credentials is enormous. Default to least privilege, always.

09

Server & database access

Default policy: no unrestricted production, DB, or cloud-admin access.

Required

Strongly recommended

Read-only access by default
Sandbox environments
Scoped, temporary credentials
Audited operations

Never allow

Unrestricted root access
Unrestricted production write access
Unrestricted secret access

If production access is genuinely required: minimize permissions, require human approval, enable full audit logging, restrict destructive operations, isolate environments.
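If an agent must reach a database, a statement gate can enforce "read-only by default, human approval for the rest". This minimal sketch assumes a hypothetical `authorize` hook in whatever tool proxies the agent's queries. Classifying SQL by its first keyword is deliberately naive and no substitute for a read-only database role.

```python
import re

# Naive first-keyword classification for illustration only. A real guard should
# rest on database permissions (e.g. a read-only role), not on parsing SQL text.
READ_ONLY = {"select", "show", "describe"}
DESTRUCTIVE = {"drop", "truncate", "delete", "alter", "grant", "revoke"}

def first_keyword(sql: str) -> str:
    """Lower-cased first word of the statement, or '' if there is none."""
    m = re.match(r"\s*([A-Za-z]+)", sql)
    return m.group(1).lower() if m else ""

def authorize(sql: str, human_approved: bool = False) -> str:
    """Return 'allow', 'needs-approval', or 'deny' for an agent-issued statement."""
    if ";" in sql.strip().rstrip(";"):
        return "deny"   # refuse stacked statements (naive: also trips on ';' in literals)
    keyword = first_keyword(sql)
    if keyword in READ_ONLY:
        return "allow"
    if keyword in DESTRUCTIVE:
        return "allow" if human_approved else "deny"
    # Any other write (INSERT, UPDATE, ...) waits for a human.
    return "allow" if human_approved else "needs-approval"
```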

10

API key management

AI tools never receive broad, unrestricted secrets.

Required

Never place production secrets, master API keys, or unrestricted cloud credentials directly into prompts or persistent agent memory.

Preferred

Scoped tokens
Ephemeral credentials
Secret managers
Environment-specific permissions

Also

Rotate credentials regularly
Monitor agent actions
Log sensitive operations
Separate dev and prod environments
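A pre-send check keeps obvious secrets out of prompts. This minimal sketch uses three illustrative patterns: AWS access key IDs, PEM private-key headers, and `key = value` style assignments. The pattern set and `safe_to_send` are hypothetical and far from exhaustive, so a dedicated secret scanner should do the real work.

```python
import re

# Illustrative patterns only; a dedicated secret scanner covers far more formats.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|password|token)\s*[:=]\s*['\"]?[^\s'\"]{8,}"
    ),
}

def find_secrets(text: str) -> list[str]:
    """Names of the patterns that match; empty means nothing obvious was found."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]

def safe_to_send(prompt: str) -> bool:
    """True if no illustrative secret pattern matches the prompt."""
    return not find_secrets(prompt)
```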

VII

Governance & Policy

Explicit rules, classified data, and audit trails — so we can answer "what happened" and "who decided".

13

Clear company policy is required

AI usage operates under explicit organizational policy — not folk wisdom.

Required

The policy defines: approved tools, prohibited tools, data handling rules, security requirements, review requirements, production access rules, compliance obligations, and accountability expectations.

14

Data classification rules

Not all data may be shared with AI systems.

Required

Classification	AI usage
Public	Allowed
Internal	Allowed with approved tools
Confidential	Restricted
Regulated	Requires explicit approval

Special attention required for: customer data, credentials, healthcare data, financial information, legal documents, proprietary algorithms.

15

Logging & auditability

If we can't see what an agent did, we can't trust what it did.

Recommended

Maintain repository audit trails, commit attribution, AI-assisted PR tagging, infrastructure action logs, and credential usage tracking.
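AI-assisted PR tagging can piggyback on commit trailers. This minimal sketch assumes a hypothetical team convention: either an `AI-Assisted: true` trailer or a `Co-Authored-By` trailer naming Claude. The parser is simplified and does not implement every rule of `git interpret-trailers`.

```python
def trailers(message: str) -> dict[str, list[str]]:
    """Parse 'Key: value' lines from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    result: dict[str, list[str]] = {}
    if len(paragraphs) < 2:
        return result                     # a subject line alone has no trailers
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and " " not in key.strip():
            result.setdefault(key.strip().lower(), []).append(value.strip())
    return result

def is_ai_assisted(message: str) -> bool:
    """True for the hypothetical AI-Assisted trailer or a Claude co-author trailer."""
    t = trailers(message)
    if any(v.lower() == "true" for v in t.get("ai-assisted", [])):
        return True
    return any("claude" in v.lower() for v in t.get("co-authored-by", []))
```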

VIII

Quality & Measurement

Same standards as human-written code. And actually measure whether AI is paying off.

16

AI code meets normal engineering standards

No exemption for being "AI-generated". Same tests, linting, typing, docs, observability.

Required

Required: tests, linting, typing, documentation, observability, security review, performance validation.

17

Benchmark AI usage

Don't assume productivity gains. Measure them.

Recommended

Track: delivery speed, defect rates, rollback frequency, review time, onboarding speed, developer satisfaction. If the numbers don't move, the workflow needs to change — not the policy.
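Comparing AI-assisted PRs with other merged PRs is one simple way to start measuring. This minimal sketch assumes a hypothetical `MergedPR` export from your tracker with the fields shown. It covers defect rate, rollback frequency, and review time, but not softer signals like onboarding speed or satisfaction.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class MergedPR:
    ai_assisted: bool
    review_hours: float
    caused_defect: bool
    rolled_back: bool

def cohort_metrics(prs: list[MergedPR], ai_assisted: bool) -> dict:
    """Defect rate, rollback rate, and median review time for one cohort."""
    cohort = [p for p in prs if p.ai_assisted == ai_assisted]
    if not cohort:
        return {}
    n = len(cohort)
    return {
        "count": n,
        "defect_rate": sum(p.caused_defect for p in cohort) / n,
        "rollback_rate": sum(p.rolled_back for p in cohort) / n,
        "median_review_hours": median(p.review_hours for p in cohort),
    }
```

Cohort averages are confounded by change size and type, so treat a gap as a prompt to look closer rather than as proof either way.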

IX

Organizational Health

Shared knowledge and the long game: keeping our engineers genuinely capable, not just AI-adjacent.

18

Share patterns across teams

Useful prompts, internal skills, workflow patterns, context files, and review checklists belong in a shared library.

Recommended

A working prompt is institutional knowledge. So is a context file that's been iterated on. So is a worktree convention. Surface these so the next team gets the benefit.

19

Maintain human expertise

Avoid the long-tail risk: shallow understanding, atrophied debugging, architectural weakness.

Required

Keep strengthening system design, debugging, performance analysis, security engineering, and code comprehension. AI is a multiplier on capability — it does not create it.

Engineering with AI agents, not around them.

Run sessions in parallel. Don't wait on a single thread.
