claude code is anthropic's terminal-based AI coding agent. you type a prompt, it reads your codebase, runs commands, edits files. it's good. i use it daily.
today, the full source (2,203 files, ~30MB of typescript) landed in my hands. i spent a few hours reading it, tracing execution paths, grepping for patterns. not to hate on it, but because i wanted to understand how one of the best AI dev tools in the world is actually built.
and the thing is, the engineering is genuinely impressive in places. the query loop architecture, the tool concurrency system, the permission classifier. real thought went into these.
but some parts made me stop and go "wait, really?"
this isn't a hit piece. every codebase has debt. but claude code is a product that runs arbitrary commands on your machine, and it's built by a company with $10B+ in funding. some of these choices are worth examining out loud.
every claim below links to the actual file and line. no vibes. just code.
1. one react component. 5,005 lines.
the main interface you interact with when you use claude code is a single react component called REPL, defined in screens/REPL.tsx.
it is 5,005 lines long.
the main REPL function starts at line 572 and runs to the end of the file. the file also has a few small helper components above it, but the bulk is one function. across the file you'll find:
- 68 useState calls
- 43 useEffects
- 54 useRefs
- 44 useCallbacks
- 18 useMemos
that's 227 hook calls in one file, the vast majority inside that single REPL component.
the JSX nesting goes 22 spaces deep (line 4604). there are over 300 conditional branches. the import section alone contains 244 import statements pulling from 235 distinct modules.
here's what part of it looks like. this is the actual import block, trimmed for readability:
notice the "external" === 'ant' comparisons. these get evaluated at build time to strip anthropic-internal-only features from the public build. clever, but it means the same file is serving two completely different products.
why i think this matters: this isn't a philosophical "big files are bad" take. a file with 227 hook calls, dominated by one component, is functionally untestable in isolation. every useEffect interacts with every useState. the dependency arrays become impossible to reason about. the // TODO: fix this on line 4114, next to an eslint-disable-next-line react-hooks/exhaustive-deps, is the team admitting they know it.
what i think should have been done: a state machine (something like XState or even a simple reducer) driving 15-20 focused components. the REPL has clear states: initializing, waiting for input, streaming response, executing tools, awaiting permission, compacting context, showing results. each state maps to a component. the 68 useStates become one state object with typed transitions. this is standard react architecture for complex UIs. i'm not sure why it wasn't done here.
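to make that concrete, here's a rough sketch of the shape i mean. the state and event names are mine, not claude code's actual states:

```typescript
// hypothetical: a discriminated union instead of 68 independent useStates.
// state and event names are illustrative, not claude code's actual ones.
type ReplState =
  | { mode: "idle" }
  | { mode: "streaming"; partial: string }
  | { mode: "awaitingPermission"; tool: string }
  | { mode: "executingTools"; pending: string[] };

type ReplEvent =
  | { type: "SUBMIT"; prompt: string }
  | { type: "CHUNK"; text: string }
  | { type: "TOOL_REQUESTED"; tool: string }
  | { type: "PERMISSION_GRANTED" }
  | { type: "DONE" };

// every transition lives in one place, so it can be unit-tested
// without rendering a single component.
function replReducer(state: ReplState, event: ReplEvent): ReplState {
  switch (state.mode) {
    case "idle":
      return event.type === "SUBMIT" ? { mode: "streaming", partial: "" } : state;
    case "streaming":
      if (event.type === "CHUNK") return { mode: "streaming", partial: state.partial + event.text };
      if (event.type === "TOOL_REQUESTED") return { mode: "awaitingPermission", tool: event.tool };
      if (event.type === "DONE") return { mode: "idle" };
      return state;
    case "awaitingPermission":
      return event.type === "PERMISSION_GRANTED"
        ? { mode: "executingTools", pending: [state.tool] }
        : state;
    case "executingTools":
      return event.type === "DONE" ? { mode: "idle" } : state;
  }
}
```

each mode then maps to one focused component via useReducer, and every transition is a pure function.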
the counterargument: they might have started small and grown. that's how god components always happen. nobody writes 5,000 lines on purpose day one.
2. 89 feature flags. 960 references. scattered everywhere.
claude code uses bun's compile-time feature() function for feature gating. there are 89 distinct feature flags referenced 960 times across the codebase.
here's the full list (yes, all of them):
some of these are clearly experiments (ABLATION_BASELINE, OVERFLOW_TEST_TOOL). some are entire product directions (KAIROS, COORDINATOR_MODE, BRIDGE_MODE). some sound like they should have shipped or been deleted months ago (EXPERIMENTAL_SKILL_SEARCH, NEW_INIT).
and on top of this, there are 472 distinct environment variables referenced across 1,425 call sites.
why i think this matters: 89 flags means the team doesn't know what shape the product is. feature flags are great for gradual rollouts, but when you have KAIROS, KAIROS_BRIEF, KAIROS_CHANNELS, KAIROS_DREAM, KAIROS_GITHUB_WEBHOOKS, and KAIROS_PUSH_NOTIFICATION as separate flags, you're not doing gradual rollout. you're building an entire parallel product inside the same codebase behind conditional requires. that's not a feature flag system. that's a monorepo without the monorepo tooling.
the counterargument: bun's feature() is compile-time, so dead code gets eliminated from the build. the runtime never sees the unused paths. performance-wise, this is fine. the cost is purely in developer experience and code readability. but that cost is real when 960 feature checks are scattered across your codebase and nobody knows which ones are still alive.
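for illustration, the mechanism behaves roughly like a compile-time constant the bundler folds, which is what lets it strip dead branches. this is a runtime stand-in, not bun's actual implementation:

```typescript
// runtime stand-in for a compile-time feature check — in the real build,
// each feature() call is replaced with a literal true/false, so the
// bundler can delete the dead branch entirely. flag values here are mine.
const FLAGS: Record<string, boolean> = { KAIROS: false, NEW_INIT: true };

function feature(name: string): boolean {
  return FLAGS[name] ?? false;
}

function initCommand(): string {
  if (feature("NEW_INIT")) {
    return "new init flow"; // kept in builds where the flag is constant-true
  }
  return "legacy init flow"; // eliminated as dead code in those builds
}
```

the elimination keeps binaries small, but nothing in the mechanism tells you which of the 89 flags are still load-bearing.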
3. 61 files with circular dependency workarounds
grep the codebase for "break import cycle", "avoid circular dependency", or "circular dependency" and you get hits in 61 different files.
the developers aren't hiding it. they're commenting about it:
the pattern is always the same: extract types to a separate file, use lazy requires, or inline code that should be imported. the types/permissions.ts file exists entirely to break cycles. so does schemas/hooks.ts. entire files created purely as import cycle band-aids.
why i think this matters: 61 files is not "a few places where we hit a cycle." 61 files means the module graph was never designed. it grew organically and now has deep tangles. every lazy require is a place where typescript can't help you at compile time. every "extracted to break cycles" file is architectural debt that makes the codebase harder to navigate.
what i think the root cause is: the Tool.ts type tries to do too much. it imports from permission types, message types, analytics, MCP types, agent types, progress types, hooks, and more. it's a 792-line type definition file at the center of the dependency graph. when the central type in your architecture imports from everything, everything imports from it, and you get cycles.
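a sketch of the alternative i'd want: narrow capability interfaces in leaf modules that never import the composed type. the names here are mine, not claude code's:

```typescript
// hypothetical decomposition — not claude code's actual Tool type.
// each leaf interface lives in its own module and imports nothing,
// so the composed Tool can depend on them without creating cycles.
interface ToolInput {
  [key: string]: unknown;
}

// leaf: permission concerns only
interface PermissionCheck {
  isReadOnly(input: ToolInput): boolean;
}

// leaf: rendering concerns only
interface ResultRenderer {
  renderResult(output: string): string;
}

// Tool composes the leaves; the leaves never import Tool, so no cycle.
interface Tool extends PermissionCheck, ResultRenderer {
  name: string;
  call(input: ToolInput): Promise<string>;
}

// a minimal tool implementing the composed interface
const echoTool: Tool = {
  name: "echo",
  isReadOnly: () => true,
  renderResult: (output) => output,
  call: async (input) => String(input.text ?? ""),
};
```

the point is directional dependencies: permission code imports PermissionCheck, rendering code imports ResultRenderer, and neither ever needs the whole Tool.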
the counterargument: circular dependencies are a fact of life in large typescript projects. the team is at least aware of them and documenting the workarounds. many large codebases have this problem. but 61 files is a lot. at some point you need to redesign the module boundaries, not add more lazy requires.
4. the type name that appears 1,193 times
every analytics call in claude code requires a type cast:
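i won't reproduce exact call sites; here's a sketch of the shape, modeling the type as a branded string (the real definition and the track() signature may differ):

```typescript
// sketch of the pattern — modeling the type as a branded string;
// the real definition and track() signature are my assumptions.
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = string & {
  readonly __verified: "yes";
};

const sent: Array<Record<string, string>> = [];

// stub analytics sink standing in for the real pipeline
function track(
  event: string,
  metadata: Record<string, AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS>,
): void {
  sent.push({ event, ...metadata });
}

track("tool_use", {
  // without the cast, a plain string fails to type-check — that's the guardrail
  toolName: "Bash" as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
});
```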
AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS appears 1,193 times across the codebase — over 1,000 of those are explicit as type casts, the rest are imports and type definitions.
the intent is admirable. claude code runs on people's actual codebases. you don't want to accidentally log file paths, source code, or secrets to your analytics pipeline. so they made a type that forces developers to manually confirm "yes, this string is safe to log."
but when you're writing this 58-character type name over a thousand times, it stops being a guardrail. it becomes a ritual. you stop reading it after the first week.
what i think should have been done: a builder pattern or validated helper function:
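something like this sketch — the heuristics and the safeMetadata name are mine, not a drop-in for their API:

```typescript
// hedged sketch: a helper that validates at runtime instead of relying
// on a ceremonial cast. the detection heuristics here are illustrative.
function safeMetadata(value: string): string {
  const looksLikePath = /[\\/]/.test(value) || value.includes(".ts");
  const looksLikeCode = /[;{}()=]/.test(value);
  if (looksLikePath || looksLikeCode) {
    throw new Error(`refusing to log potentially sensitive value: ${value.slice(0, 20)}`);
  }
  return value;
}

// callers write safeMetadata("fast") instead of "fast" as AnalyticsMetadata_...
```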
runtime validation catches the thing the type cast claims to prevent. the long type name catches nothing; it just asks nicely.
the counterargument: you could argue the friction is the point. making it annoying to log analytics means developers think before adding metadata. but over a thousand usages suggests the friction isn't stopping anyone. it's just adding noise.
5. String.fromCharCode to spell "duck"
this one is genuinely funny.
in buddy/types.ts, the companion pet system (yes, claude code has a hidden pet system, more on that in a future post) defines its species list like this:
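i won't paste the real list, but the pattern looks like this. "duck" is real per the title; the second entry is a placeholder of mine, not one of the actual 18 species:

```typescript
// sketch of the pattern: species names assembled from char codes so the
// raw strings never appear in the build output. "duck" is real per the
// section title; "frog" is a stand-in, not from the actual list.
const SPECIES: string[] = [
  String.fromCharCode(0x64, 0x75, 0x63, 0x6b), // "duck"
  String.fromCharCode(0x66, 0x72, 0x6f, 0x67), // "frog"
];
```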
the explanation is in the comment: one of the species names (probably one of the more exotic ones) collides with an internal model codename. anthropic's CI greps the build output for these codenames as a security canary. a string literal like "axolotl" or "capybara" (speculating) would trigger the check.
instead of adding an exception to the CI check, they hex-encoded all 18 species names.
why this matters (a little): it doesn't, really. it's a pet system behind a feature flag. but it's a perfect example of how local workarounds compound. the CI check is probably correct and important. the fix should have been a regex exclusion for the buddy module, not encoding every species as hex values. future developers reading this file will be baffled by why "duck" can't just be "duck".
why it's also kind of great: the fact that anthropic engineers spent time building a procedurally generated pet system with rarity tiers (common through legendary), named species, hats, eye styles, and stat distributions, all inside a terminal coding tool, is honestly charming.
6. main.tsx: the 4,683-line entry point
main.tsx is the CLI entry point. it's 4,683 lines and contains:
- every CLI command definition (claude, init, config, mcp, doctor, etc.)
- all argument/flag parsing via Commander.js
- the complete OAuth login flow
- session resume logic
- remote session management
- profile startup benchmarking
- plugin loading
- MDM (mobile device management) configuration
the comments explain why:
so the architectural choice is intentional: everything is in one file to minimize the import graph depth. bun evaluates imports eagerly. deeper import trees = more startup latency. keeping everything in main.tsx means one level of imports instead of three or four.
why i think this is a problematic trade-off: they're saving ~135ms at startup by making the entry point unreadable. a lazy-loading command registry would achieve the same thing. only load the init command module when someone runs claude init. only load OAuth when authentication is needed. this is how every other CLI tool works (oclif, yargs with command modules, even basic commander subcommands in separate files).
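a sketch of that registry idea. the loaders here are inlined stubs so the example is self-contained; in a real CLI each entry would be something like `() => import("./commands/init")`:

```typescript
// hypothetical lazy command registry — each entry loads its command only
// when that command actually runs. loaders are inlined stubs here; real
// ones would use dynamic import() so module cost is paid per code path.
type Command = { run: (args: string[]) => Promise<string> };

const commands: Record<string, () => Promise<Command>> = {
  init: async () => ({ run: async (args) => `init ${args.join(" ")}`.trim() }),
  doctor: async () => ({ run: async () => "all checks passed" }),
};

async function dispatch(name: string, args: string[]): Promise<string> {
  const load = commands[name];
  if (!load) throw new Error(`unknown command: ${name}`);
  const cmd = await load(); // the import cost lands only on this path
  return cmd.run(args);
}
```

startup only ever touches the registry keys; the bodies of unused commands are never loaded.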
the counterargument (and it's a real one): bun's module loading might have quirks that make lazy loading less predictable than in node. and 135ms matters for a tool you invoke constantly. i don't use bun enough to know if their approach is the only way to get fast startup. but i suspect there are alternatives.
7. the conditional require pattern
this is a consequence of #2 and #6. throughout the codebase, especially in REPL.tsx and query.ts, you'll find code like this:
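a reconstruction of the shape, not a verbatim quote — the flag name and module path are stand-ins:

```typescript
// reconstruction of the pattern, not actual claude code. feature() here
// is a runtime stub for what is a compile-time check in the real build;
// it returns false, so the require branch never executes in this sketch.
const feature = (_name: string): boolean => false;
const messages: unknown[] = [];

if (feature("REACTIVE_COMPACT")) {
  // the real code writes `as typeof import("./reactiveCompact.js")` to
  // recover the types require() loses; spelled structurally here so the
  // sketch compiles without that module present.
  const { maybeCompact } = require("./reactiveCompact.js") as {
    maybeCompact: (m: unknown[]) => void;
  };
  maybeCompact(messages);
}
```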
this is typescript code using require() inside an ES module, wrapped in a compile-time feature check, with a type assertion to recover the types that require() loses.
REPL.tsx has 17 of these patterns. query.ts has 6. they exist because:
1. import is hoisted and always evaluates at module load time
2. feature() checks need to prevent the import entirely for dead code elimination
3. require() is the only way to conditionally import in the module body
4. but require() loses type information, so you need as typeof import(...)
why i think this matters: each of these is a place where the type system has a gap. the as typeof import(...) cast tells typescript "trust me, this is the right type." if someone changes the export shape of reactiveCompact.js, the cast silently lies. there's no compiler error. you find out at runtime.
what alternatives exist: dynamic import() returns a promise and preserves types. it's slightly more awkward (you need to await it) but it's the standard solution for conditional module loading in modern JS. bun supports it.
the pattern underneath all of this
if you zoom out, most of these issues come from the same root cause: claude code grew faster than its architecture could keep up.
you can see the layers of history. a simple terminal REPL grew into a multi-agent coordinator with voice mode, companion pets, vim bindings, and remote sessions. features got added behind flags faster than old flags got cleaned up. the module graph grew connections faster than anyone drew boundaries.
this is not unique to anthropic. every fast-moving company has codebases like this. the reason claude code's case is interesting is the scale: this is one of the most important AI products in the world, and its source reveals the same messy engineering trade-offs that exist at every startup.
the code ships. it works. lots of developers rely on it daily. that matters more than clean architecture. but it's worth being honest about the cost.
*next in this series: what anthropic got right in claude code. the query loop, the tool concurrency system, and the permission classifier are genuinely well-designed. i'll break those down next.*
about me: i'm rohan, building getheadcount.io — an AI staffing company deploying domain-specific AI agents as workers.
