Research · April 5, 2026 · 16 min read

AGENTS.md at Scale: Enterprise Guide for 180+ Engineers and Hundreds of Repos

Why most AGENTS.md files make agents worse, the 4-question filter for context files, and how to manage agent context across hundreds of repos

Tags: agentic-ai, agents-md, enterprise, context-management, claude-code, codex



Context: VP Engineering, 180 devs, React + Golang, GCP/K8s/Spanner/Kafka, 10+ deploys/day
Tools: Claude Code, Codex, OpenAI SDK, Claude Agent SDK, GCP Agentic Workloads


The Headlines

  1. AGENTS.md works - but badly written ones make agents worse. ETH Zurich research: context files reduced task success rates while making agents slower and more expensive - unless they're well-crafted.
  2. Most AGENTS.md files are junk drawers. Generic rules, folder structures the agent can see, linter-enforced style. Delete that noise.
  3. The 4-question filter: For every line, ask: (1) Failure-backed? (2) Tool-enforceable? (3) Decision-encoding? (4) Triggerable? If no to all, delete.
  4. Centralize what's common, specialize what's unique. Organization-wide template for conventions, repo-specific for architecture and commands.
  5. Lint your AGENTS.md files. They drift. They rot. They contradict each other. Automate quality checks.

Part 1: Why AGENTS.md Matters (The Research)

The ETH Zurich Study

In February 2026, ETH Zurich published a study of 2,303 agent context files across 1,925 repositories. The findings were stark:

"Context files reduce task success rates compared to providing no repository context, while increasing inference cost by over 20%."

Let that land. Bad context files make agents perform worse than nothing - and cost more.

But there's nuance. The study found that well-crafted context files improved performance by ~4%. The problem: most aren't well-crafted.

What they found:

| Metric | Finding |
| --- | --- |
| Median length | 335-535 words, depending on tool |
| Readability | "Very difficult" (FRE 16-40, academic/legal level) |
| Update frequency | 59-67% modified multiple times |
| Update interval | 22-70 hours (short bursts) |
| Deletions | Minimal (files grow, never shrink) |

Content analysis:

| Category | % of Files |
| --- | --- |
| Testing | 75.0% |
| Implementation Details | 69.9% |
| Architecture | 67.7% |
| Development Process | 63.3% |
| Build and Run | 62.3% |
| System Overview | 59.0% |
| Security | 14.5% |
| Performance | 14.5% |

The gap is obvious: teams optimize for making agents functional, but few provide guardrails for security or performance.

Source: Agent READMEs: An Empirical Study (arXiv)


The Augment Analysis

Augment analyzed AGENTS.md files across the ecosystem. Their diagnosis:

"Your AGENTS.md has how many instructions? More rules, worse output. Because we don't trust the agent."

The pattern: developers write 200-line AGENTS.md files explaining folder structure because they don't believe the agent can figure it out. But modern agents can see the codebase. They don't need you to explain what's already visible.

What agents can already see:

  • Code structure (file tree, imports, dependencies)
  • Tech stack (package.json, go.mod, Cargo.toml)
  • Existing patterns (by reading the code)
  • Git history
  • Linter configs

What agents can't see:

  • Build and test commands (unless documented)
  • Deploy steps
  • Team conventions that live in heads, not files
  • Why that weird architecture decision was made
  • Known gotchas

The mistake: spending AGENTS.md lines on the first category - explaining what the agent could discover by reading the repo - instead of the second.

Source: Augment: Your Agent's Context Is a Junk Drawer


The Vercel Evals

Vercel ran benchmarks on Next.js 16 API tasks. They compared two approaches:

  1. Skills (on-demand retrieval): Agent has access to docs, retrieves what it needs.
  2. AGENTS.md (passive context): Compressed docs index in a single file.

Result: Skills produced zero improvement. The agent never bothered to look at the docs.

Then they tried the "dumb" approach: compressed the entire docs index into an 8KB AGENTS.md file. Not full documentation - just an index pointing to retrievable files.

100% pass rate across build, lint, and test.

40KB compressed to 8KB. Perfect score. The dumb approach won.

Lesson: Agents are lazy. They won't retrieve unless you put it in their face. A well-structured index beats comprehensive documentation.

Source: Vercel: AGENTS.md Outperforms Skills
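
A compressed index of that sort is just pointers, not prose. It might look something like this (file names and topics are hypothetical, sketching the Vercel approach):

```markdown
## Docs Index
Retrieve these files before writing code:
- Routing (app router, layouts): docs/02-routing.md
- Data fetching and caching: docs/04-data-fetching.md
- Middleware (auth, rewrites): docs/07-middleware.md
```

Each line costs a few tokens but puts the retrieval path directly in the agent's face.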


Part 2: What Belongs in AGENTS.md (The Filter)

The 4-Question Test

From Jan-Niklas Wortmann's analysis - he went from 80+ lines of rules to 30 lines and got "dramatically better behavior":

For each line in your AGENTS.md, ask:

  1. Failure-backed? - Can you point to a specific failure this prevents? If no, delete.
  2. Tool-enforceable? - Could a linter, formatter, or CI check enforce this? If yes, move it there, don't duplicate in AGENTS.md.
  3. Decision-encoding? - Does this encode a team decision that isn't obvious from the code? If no, delete.
  4. Triggerable? - Is this actionable at a specific moment, or is it generic advice? If generic, delete.

If a line fails all four, delete it.

Source: Wordman: Agent Instructions
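
Applied to two typical lines, the filter looks like this (hypothetical examples, not from the analysis):

```markdown
<!-- Fails all four questions: delete -->
- Write clean, well-documented code.

<!-- Failure-backed and decision-encoding: keep -->
- Never call the pricing service directly; go through the cache layer
  (direct calls caused a rate-limit outage).
```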


What to DELETE

These almost never belong:

| Delete | Why |
| --- | --- |
| Folder structure descriptions | The agent can see it by reading the repo |
| Tech stack restatements | It's in package.json, go.mod, etc. |
| Linter-enforced style rules | "Use tabs" when .editorconfig says spaces - the agent sees the config |
| Generic best practices | "Write clean code" - the agent was trained on the internet |
| SOLID principles, DRY, etc. | Trained on these. Redundant. |
| API patterns visible in code | The agent can read existing implementations |

The Augment rule:

"Never send an LLM to do a linter's job."


What to KEEP

These belong in AGENTS.md:

| Keep | Example |
| --- | --- |
| Build/test/lint commands | `make test`, `npm run build:prod` |
| Deploy steps | How to deploy to staging and production |
| Environment setup | Dev environment gotchas, secrets management |
| Team conventions in heads | "We always use Result types for errors in Go services" |
| Architecture decisions | Why the trading engine is separate from the API layer |
| Known gotchas | "Don't touch the legacy pricing module - it's fragile" |
| Security requirements | "All new endpoints must use the auth middleware" |
| Performance constraints | "Trading API must respond in under 50ms p99" |

The Structure That Works

From the ETH Zurich study: successful AGENTS.md files follow a shallow hierarchy:

  • Single H1 heading (treat as unified document)
  • 6-7 H2 sections for major topics
  • Some H3/H4 for detail
  • Rarely deeper

Recommended structure:

```markdown
# Project Name AGENTS.md

## Build & Run
[Commands, scripts, environment setup]

## Test
[How to run tests, coverage requirements]

## Architecture
[High-level design, key components, why decisions were made]

## Conventions
[Team-specific patterns not visible in linter configs]

## Guardrails
[Things not to touch, security requirements, performance constraints]

## Deploy
[Staging, production steps, CI/CD pointers]
```

Keep it under 300 lines. Under 200 is better. Every line costs attention budget.


Part 3: Managing AGENTS.md Across Hundreds of Repos

The Problem

You have 100+ repos. Maybe 200+. Each needs an AGENTS.md. But:

  • Consistency problem: Different teams write different conventions
  • Drift problem: Files rot, contradict each other, reference obsolete commands
  • Maintenance problem: Who updates all 200 files when a convention changes?
  • Discovery problem: How do you know what's in each AGENTS.md?

The Solution: Template Inheritance

Three-layer model:

```
┌─────────────────────────────────────────────────────────┐
│ Layer 1: ORG-AGENTS.md (global template)                │
│ - Organization-wide conventions                         │
│ - Security requirements                                 │
│ - Performance standards                                 │
│ - Tool versions, CI pointers                            │
│ - One file, maintained by platform team                 │
└─────────────────────────────────────────────────────────┘
                           │
                           │ imported by
                           ▼
┌─────────────────────────────────────────────────────────┐
│ Layer 2: AGENTS.md (repo-specific)                      │
│ - Build/test/deploy commands                            │
│ - Architecture overview                                 │
│ - Repo-specific conventions                             │
│ - Known gotchas                                         │
│ - One per repo, maintained by repo owner                │
└─────────────────────────────────────────────────────────┘
                           │
                           │ references
                           ▼
┌─────────────────────────────────────────────────────────┐
│ Layer 3: docs/ (detailed references)                    │
│ - Architecture decision records (ADRs)                  │
│ - API documentation                                     │
│ - Runbooks                                              │
│ - Detailed procedures                                   │
│ - Linked from AGENTS.md, not embedded                   │
└─────────────────────────────────────────────────────────┘
```
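
On disk, the three layers might look like this (repo and file names are illustrative):

```
org-templates/                    # platform-team repo
└── ORG-AGENTS.md                 # Layer 1: org-wide conventions

payments-service/                 # one of the ~200 repos
├── AGENTS.md                     # Layer 2: lean, links up and down
└── docs/
    ├── architecture.md           # Layer 3: detailed references
    └── adr/0001-kafka-events.md
```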

Layer 1: ORG-AGENTS.md (The Global Template)

What goes here:

  • Organization-wide coding standards
  • Security requirements (all endpoints must use auth middleware)
  • Performance standards (all trading APIs must respond under 50ms)
  • Approved tool versions (Go 1.24, Node 22, React 19)
  • CI/CD pointers (all repos use GitHub Actions, here's the workflow)
  • Conventional commit format
  • PR review requirements

What does NOT go here:

  • Repo-specific commands (each repo has different build)
  • Repo-specific architecture (trading engine vs web frontend)
  • Repo-specific gotchas

How it works:

Option A: Monorepo approach - One AGENTS.md at root, per-package sections

  • Works if you already run a monorepo
  • Requires careful organization

Option B: Template inheritance - Each repo imports org template

  • ORG-AGENTS.md lives in a template repo
  • Each repo's AGENTS.md starts with: "See ORG-AGENTS.md for org-wide conventions. Repo-specific below:"
  • Agent reads both

Option C: Centralized context server - Agents fetch org context at runtime

  • ORG-AGENTS.md served via HTTP
  • Agents configured to fetch at session start
  • Works well with GCP Agentic Workloads and Claude/OpenAI SDKs

Recommendation for your stack (GCP Agentic Workloads):

Use Option C. Configure your Vertex AI agents to fetch org context from a central location:

Agent initialization:
1. Fetch https://internal.yourcompany.com/agents/org-context.md
2. Read repo's local AGENTS.md
3. Merge: org context first, repo context overlays

This way:

  • One source of truth for org conventions
  • Repo AGENTS.md files stay lean
  • Updates propagate immediately

Layer 2: AGENTS.md (The Repo-Specific File)

Template for each repo:

```markdown
# [repo-name] AGENTS.md

> Org-wide conventions: See [ORG-AGENTS.md](link). This file is repo-specific.

## Build & Run
[Specific commands for this repo]

## Test
[Specific test commands, coverage targets]

## Architecture
[This repo's architecture, key components]

## Conventions
[Repo-specific conventions that differ from or extend org conventions]

## Guardrails
[Things not to touch in this repo, security gotchas]

## Deploy
[Staging/production steps for this repo]

## Gotchas
[Known issues, legacy code to avoid, etc.]
```

Size target: 100-200 lines. If larger, split into docs/ and link.


Layer 3: docs/ (Detailed References)

What goes here:

  • Architecture Decision Records (ADRs) - why decisions were made
  • API documentation
  • Runbooks for operations
  • Detailed procedures that would bloat AGENTS.md

How to link:

```markdown
## Architecture
See [docs/architecture.md](docs/architecture.md) for full architecture overview.
Key points:
- Trading engine is separate from API layer for latency reasons
- All state lives in Cloud Spanner
- Kafka for event streaming
```

Why this matters: AGENTS.md is the index. docs/ is the library. Agents are lazy - they'll read what's in front of them. Put the index in AGENTS.md, not the full documentation.


Part 4: Maintaining Consistency

The Drift Problem

From the ETH Zurich study:

  • Agent context files evolve through additions, not deletions
  • Median update interval: 22-70 hours (short bursts)
  • Files grow over time, never shrink

Result: AGENTS.md files rot. They accumulate obsolete commands, reference deleted files, contradict newer conventions.

Solution: Lint Your AGENTS.md

What to check:

| Check | How | Why |
| --- | --- | --- |
| Line count | Fail if > 300 lines | Force pruning |
| Word count | Warn if > 1000 words | Context window cost |
| File references | Check that referenced files exist | Commands may reference deleted files |
| Generic phrases | Flag "follow best practices", "be helpful" | Weak instruction, no value |
| Contradictions | Flag "use tabs" + "use spaces" in the same file | Agents silently pick one |
| Required sections | Fail if missing Build, Test, Deploy | Essential context |
| Security section | Warn if missing | Only 14.5% of files have one - should be higher |

Tooling:

Option A: Vale (prose linter)

  • Define patterns for weak phrases
  • Configure severity (suggestion/warning/error)
  • Run in CI

Option B: Custom script

  • Shell script for structural checks
  • Add to pre-commit hooks and CI

Option C: Existing linters

  • markdownlint for structure
  • Custom rules for AGENTS.md-specific checks

Example script:

```bash
#!/bin/bash
# AGENTS.md linter: size, required sections, and weak phrasing
set -u

FILE="AGENTS.md"
MAX_LINES=300

if [ ! -f "$FILE" ]; then
  echo "ERROR: $FILE not found"
  exit 1
fi

# Line count
line_count=$(wc -l < "$FILE" | tr -d ' ')
if [ "$line_count" -gt "$MAX_LINES" ]; then
  echo "ERROR: AGENTS.md is $line_count lines (max $MAX_LINES)"
  echo "Suggestion: move detailed procedures into docs/ and link from AGENTS.md"
  exit 1
fi

# Required sections (anchored to H2 headings)
required=("Build" "Test" "Architecture" "Deploy")
for section in "${required[@]}"; do
  if ! grep -q "^## .*$section" "$FILE"; then
    echo "ERROR: Missing required section: $section"
    exit 1
  fi
done

# Generic phrases (weak instruction; warns but does not fail)
if grep -qiE "(follow best practices|be helpful|be concise|write clean code)" "$FILE"; then
  echo "WARNING: Generic phrases detected. Replace with specific instructions."
fi

echo "PASS: AGENTS.md checks passed"
```
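
The shell script covers size, sections, and phrasing. The file-reference and contradiction checks from the table above are easier in a short Python sketch (regex, function names, and the tab/space heuristic are illustrative, not a published tool):

```python
# Complements the shell linter: stale links and tab/space contradictions.
import re
from pathlib import Path

def broken_references(text: str, repo_root: str = ".") -> list[str]:
    """Return markdown-linked local paths that no longer exist."""
    links = re.findall(r"\]\(([^)#\s]+)\)", text)
    local = [l for l in links if not l.startswith(("http://", "https://"))]
    return [l for l in local if not (Path(repo_root) / l).exists()]

def has_indent_contradiction(text: str) -> bool:
    """Flag a file that demands both tabs and spaces."""
    lowered = text.lower()
    return "use tabs" in lowered and "use spaces" in lowered
```

Run it over every repo's AGENTS.md in the same CI job as the shell checks.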

Run in CI:

  • Add to GitHub Actions / Cloud Build
  • Fail PR if AGENTS.md doesn't pass
  • Prevents drift from entering

Source: DEV: Practical Linting for Agent Context Files


The Update Cadence

When to update AGENTS.md:

| Trigger | Who | What |
| --- | --- | --- |
| New build command | Dev making the change | Add to Build section |
| Architecture decision | Tech lead | Add to Architecture, create an ADR |
| Incident caused by an agent mistake | Anyone | Add a guardrail to prevent recurrence |
| Security requirement | Security team | Update ORG-AGENTS.md |
| Quarterly audit | Platform team | Review all AGENTS.md files for drift |

The rule: Every time an agent makes a mistake that a better instruction would have prevented, update AGENTS.md. If an instruction didn't prevent the mistake, delete or rewrite it.


Part 5: What to Generalize vs. Specialize

Generalize (ORG-AGENTS.md)

| Category | Example |
| --- | --- |
| Security requirements | All endpoints must use the auth middleware |
| Performance standards | All trading APIs must respond under 50ms p99 |
| Coding conventions | Use conventional commits (feat/fix/refactor) |
| Tool versions | Go 1.24, Node 22, React 19 |
| CI/CD pointers | All repos use GitHub Actions |
| Review requirements | All PRs require 2 approvals |
| Testing standards | All repos must have >80% coverage |
| Documentation standards | All repos must have an AGENTS.md |

Specialize (repo AGENTS.md)

| Category | Example |
| --- | --- |
| Build commands | `make build`, `npm run build:prod` |
| Test commands | `make test`, `go test ./...`, `npm test` |
| Architecture | Trading engine is separate from the API layer |
| Repo-specific conventions | Use Result types for errors in this repo |
| Known gotchas | Don't touch the legacy pricing module |
| Deploy steps | `kubectl apply -f staging.yaml` |
| Environment setup | Run `scripts/setup-env.sh` first |

The Test

Ask: "Does this apply to every repo in the org?"

  • Yes → ORG-AGENTS.md
  • No → repo AGENTS.md
  • Depends → ORG-AGENTS.md with override capability in repo AGENTS.md

Part 6: Tool-Specific Notes

Claude Code

  • Reads CLAUDE.md from repo root
  • Also reads ~/.claude/CLAUDE.md for user-level context
  • Reads recursively up directory tree (can have CLAUDE.md in subdirectories)
  • Priority: repo > parent directory > user home

For your stack: Create CLAUDE.md as a symlink to AGENTS.md, or maintain both if conventions differ.
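
If the two files should stay identical, the symlink is one command, run from the repo root:

```shell
# CLAUDE.md becomes an alias of AGENTS.md; -f replaces any existing link
ln -sf AGENTS.md CLAUDE.md
```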

OpenAI Codex

  • Reads AGENTS.md from repo root
  • Official guidance: describe architecture, workflows, commands
  • Works with GitHub Actions for CI integration

For your stack: AGENTS.md is the primary file. Codex will use it directly.

GCP Agentic Workloads / Vertex AI Agent Builder

  • Can fetch context from external sources at runtime
  • Configure agents to pull from central org context
  • Supports multiple context files merged at inference

For your stack:

  1. Host ORG-AGENTS.md on internal HTTP endpoint
  2. Configure Vertex AI agents to fetch at session start
  3. Repo AGENTS.md files remain lean and repo-specific

Claude Agent SDK / OpenAI SDK

  • Context passed programmatically
  • Can inject org context before repo context
  • Full control over context assembly

For your stack:

```python
# Sketch: assemble org + repo context before invoking an agent.
# The URL is internal/hypothetical; agent.run stands in for your SDK's call.
import urllib.request
from pathlib import Path

with urllib.request.urlopen("https://internal.yourcompany.com/agents/org-context.md") as resp:
    org_context = resp.read().decode("utf-8")

repo_context = Path("AGENTS.md").read_text(encoding="utf-8")
full_context = org_context + "\n\n" + repo_context

agent.run(prompt, context=full_context)  # placeholder for the actual SDK call
```

Part 7: Action Plan

Week 1: Audit

  • Find all existing AGENTS.md / CLAUDE.md files across repos
  • Run the 4-question filter on each
  • Identify common sections (candidates for org template)
  • Identify contradictions between repos
  • Create inventory of what's in each file

Week 2: Create ORG-AGENTS.md

  • Draft org-wide template with platform team
  • Include: security requirements, performance standards, tool versions, CI pointers
  • Host on internal HTTP endpoint (for GCP Agentic Workloads)
  • Configure Vertex AI agents to fetch at session start

Week 3: Update Repo AGENTS.md Files

  • Prioritize: highest-traffic repos first
  • Apply template: org context link + repo-specific sections
  • Run through 4-question filter
  • Target: 100-200 lines each

Week 4: Add Linting

  • Create AGENTS.md linter script
  • Add to CI (GitHub Actions / Cloud Build)
  • Fail PRs if AGENTS.md doesn't pass
  • Add to pre-commit hooks for local dev
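
Wired into GitHub Actions, the check could look like this (workflow name and script path are assumptions, not an existing setup):

```yaml
# .github/workflows/agents-md-lint.yml
name: agents-md-lint
on:
  pull_request:
    paths: ["AGENTS.md", "docs/**"]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/lint-agents-md.sh
```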

Ongoing

  • Quarterly audit of all AGENTS.md files
  • Update ORG-AGENTS.md when conventions change
  • Add guardrails when agent mistakes occur
  • Delete instructions that don't prevent failures

Build Queue for Tango

  1. AGENTS.md Linter

    • Input: AGENTS.md file path
    • Output: Pass/fail with specific issues
    • Checks: line count, required sections, generic phrases, file references
    • Tech: Golang CLI, runs in CI
  2. AGENTS.md Generator

    • Input: Repo URL or local path
    • Output: Draft AGENTS.md extracted from codebase
    • Extract: build commands, test commands, architecture hints
    • Tech: Golang CLI, static analysis
  3. ORG-AGENTS.md Server

    • Input: HTTP request from Vertex AI agents
    • Output: Current org-wide context
    • Features: versioning, audit log, update API
    • Tech: Golang HTTP server, GCP Cloud Run
  4. AGENTS.md Dashboard

    • Input: GitHub org, scans all repos
    • Output: Inventory of AGENTS.md files, compliance status, drift detection
    • Tech: React frontend, Golang backend



Research by Cash | April 2026