The new /goals feature in Claude Code and Codex is useful for small, contained jobs. It gives a single agent a goal, lets it run in one context window, and continues until the model decides it is done. For quick work, that is convenient. For large, multi-step, expensive, or important work, the same design breaks down.
/goals is a convenient loop. Abracapocus is governed execution infrastructure.
The core weakness is that /goals is unbounded. It has no external verification layer, no structured handoff between units of work, and no enforced cost ceiling. The run continues until the model stops, fails, drifts, or the user intervenes. That creates a dangerous combination: a run can cost more to solve the same problem while being less able to handle a large project.
Where single-goal loops break down
Context rot is the first scaling problem. A long autonomous run inside one context window accumulates noise. Earlier decisions get buried, contradicted, or forgotten. The model may still sound confident, but coherence degrades as the run stretches on. There is no reliable mechanism to reset the context, preserve only the important state, or stop the model from drifting from the original intent.
Completion is the next problem. With /goals, the model decides when the work is done. So “complete” usually means “the model said it was complete.” There is no deterministic acceptance gate, no required evidence object, no structured diff analysis, and no verification step outside the model’s own judgment.
With /goals, completion depends on the same context and model judgment that performed the work.
With Abracapocus, completion depends on explicit criteria, evidence, verification, and a durable handoff.
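To make the idea of a deterministic acceptance gate concrete, here is a minimal sketch. It is not Abracapocus's actual API: the names Evidence, Criterion, and acceptance_gate are hypothetical, chosen only for illustration. The point is that the model's own "done" verdict is recorded but never consulted by the gate.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evidence:
    """Hypothetical evidence object produced by one task run."""
    changed_files: list[str]
    tests_passed: bool
    model_claims_done: bool  # the model's own verdict; the gate never reads it


# A criterion is a named, deterministic check over the evidence.
Criterion = tuple[str, Callable[[Evidence], bool]]


def acceptance_gate(evidence: Evidence, criteria: list[Criterion]) -> tuple[bool, list[str]]:
    """Return (accepted, names of the criteria that failed)."""
    failed = [name for name, check in criteria if not check(evidence)]
    return (not failed, failed)


# Illustrative criteria; a real project would define its own.
criteria: list[Criterion] = [
    ("tests pass", lambda e: e.tests_passed),
    ("touches only src/", lambda e: all(f.startswith("src/") for f in e.changed_files)),
    ("changes something", lambda e: len(e.changed_files) > 0),
]

# The model says it is done, but it edited a file outside the allowed scope.
claimed = Evidence(
    changed_files=["src/app.py", "deploy/prod.yaml"],
    tests_passed=True,
    model_claims_done=True,
)
accepted, failed = acceptance_gate(claimed, criteria)
print(accepted, failed)
```

In this sketch the run is rejected even though the model reported completion, because one explicit criterion failed.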
Cost is where this really bites. A single complex /goals run can burn a large amount of model time before you know whether it succeeded. Because the work is not broken into sized, inspectable units, there is no task-level cost ceiling. You cannot reliably say: “Run this bounded slice, stop, preserve the evidence, then decide whether to continue.” One wandering goal can cost as much as many smaller tasks, but without the checkpoints, audit trail, or ability to rerun only the broken part.
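A task-level cost ceiling can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how either tool meters usage: TaskBudget, BudgetExceeded, and run_bounded are hypothetical names, and costs are tracked in integer cents to keep the arithmetic exact.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a charge would push a task past its ceiling."""


@dataclass
class TaskBudget:
    ceiling_cents: int
    spent_cents: int = 0

    def charge(self, cost_cents: int) -> None:
        # Refuse the charge before spending, so the ceiling is never crossed.
        if self.spent_cents + cost_cents > self.ceiling_cents:
            raise BudgetExceeded(
                f"{self.spent_cents} + {cost_cents} exceeds {self.ceiling_cents} cents"
            )
        self.spent_cents += cost_cents


def run_bounded(step_costs: list[int], budget: TaskBudget) -> tuple[int, str]:
    """Run steps in order; stop cleanly when the next one would break the ceiling."""
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            return completed, "stopped_at_ceiling"
        completed += 1
    return completed, "finished"


budget = TaskBudget(ceiling_cents=100)
completed, status = run_bounded([40, 40, 40], budget)
print(completed, status, budget.spent_cents)
```

The third step is refused rather than run, so the caller gets a stopping point with the spend known and can decide whether to continue.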
It also uses one model for everything. The model you start with handles the simple parts, the risky parts, the repair work, and the final judgment. There is no profile-aware routing, where cheap models handle simple tasks and stronger backends handle high-blast-radius changes. That makes /goals not just less structured, but more expensive than it needs to be.
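Profile-aware routing can be pictured as a small function from a task profile to a backend tier. The sketch below is hypothetical: the TaskProfile fields, the 1 to 5 scales, the thresholds, and the tier names are all assumptions made for illustration, not real model identifiers or a documented routing policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    complexity: int    # assumed scale: 1 (trivial) to 5 (hard)
    blast_radius: int  # assumed scale: 1 (one file) to 5 (shared or production code)


def route(profile: TaskProfile) -> str:
    """Pick a backend tier from the profile; tier names are placeholders."""
    # High-risk or hard work always goes to the strongest tier.
    if profile.blast_radius >= 4 or profile.complexity >= 4:
        return "strong-backend"
    if profile.complexity >= 2:
        return "mid-backend"
    return "cheap-backend"


print(route(TaskProfile(complexity=1, blast_radius=1)))
print(route(TaskProfile(complexity=3, blast_radius=2)))
print(route(TaskProfile(complexity=1, blast_radius=5)))
```

Note that a simple change with a high blast radius still routes to the strong tier: in this sketch risk, not just difficulty, decides where the work goes.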
What Abracapocus changes
Abracapocus is built for the scale case. It breaks work into bounded tasks with explicit acceptance criteria. Each task starts with fresh context, so a 20-task phase never becomes one long, decaying conversation. BuildMemory and RAG supply the relevant prior state without dragging the entire run history forward.
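The difference between carrying a whole conversation forward and starting fresh with selected state can be shown with a toy example. This is a deliberately naive stand-in, not how BuildMemory or RAG work: build_fresh_context is a hypothetical helper, and plain word overlap replaces real retrieval.

```python
def build_fresh_context(task_spec: str, memory: list[str], k: int = 2) -> list[str]:
    """Start from the task spec alone and attach at most k relevant memory notes."""
    task_words = set(task_spec.lower().split())
    # Score each note by how many words it shares with the task spec.
    scored = [(len(task_words & set(note.lower().split())), note) for note in memory]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    relevant = [note for score, note in ranked if score > 0]
    return [task_spec] + relevant[:k]


# Notes accumulated by earlier tasks (illustrative data).
memory = [
    "auth module uses JWT tokens",
    "CSS palette is now blue",
    "token refresh endpoint lives in auth/refresh.py",
]

context = build_fresh_context("add token expiry check to auth refresh", memory)
for line in context:
    print(line)
```

The unrelated styling note never enters the new task's context, while the two notes about auth do.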
When something fails, Abracapocus does not stall. It classifies the failure, generates a focused repair task with fresh context, and escalates only when necessary. Each task produces a structured handoff for the next: task intent, changed files, evidence, acceptance result, and the next runnable unit of work. The system advances by evidence-backed execution records, not by one long, fragile conversation.
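The handoff and repair flow described above can be sketched as a record plus a small decision function. Everything here is an assumption for illustration: the Handoff fields mirror the list in the paragraph, but the field names, the FailureKind categories, the string-matching classifier, and the retry limit are hypothetical and not taken from Abracapocus itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    TEST_FAILURE = "test_failure"
    TIMEOUT = "timeout"
    UNKNOWN = "unknown"


@dataclass
class Handoff:
    """Hypothetical structured record passed from one task to the next."""
    task_intent: str
    changed_files: list[str]
    evidence: dict[str, str]
    accepted: bool
    next_task: Optional[str]


def classify(log: str) -> FailureKind:
    """Naive classifier over a log string; a real one would be far richer."""
    text = log.lower()
    if "assertionerror" in text or "failed" in text:
        return FailureKind.TEST_FAILURE
    if "timed out" in text:
        return FailureKind.TIMEOUT
    return FailureKind.UNKNOWN


def next_step(handoff: Handoff, attempts: int, max_attempts: int = 2) -> str:
    """Decide what runs next: the next task, a focused repair, or escalation."""
    if handoff.accepted:
        return handoff.next_task or "phase-complete"
    kind = classify(handoff.evidence.get("log", ""))
    # Escalate only when the failure is unrecognized or repairs are exhausted.
    if kind is FailureKind.UNKNOWN or attempts >= max_attempts:
        return "escalate"
    return f"repair:{kind.value}:{handoff.task_intent}"


failed_run = Handoff(
    task_intent="add retry to client",
    changed_files=["client.py"],
    evidence={"log": "2 tests FAILED"},
    accepted=False,
    next_task=None,
)
print(next_step(failed_run, attempts=0))
print(next_step(failed_run, attempts=2))
```

The first call yields a repair task scoped to the failed intent; the second escalates because the assumed retry limit has been reached.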
The result is better cost control and better scale. Tasks are shaped and sized before execution. Simple tasks go to cheaper models; harder tasks route to stronger backends. Failed tasks are repaired directly instead of rerunning the whole goal. Every accepted task produces evidence: changed files, execution records, and acceptance data. You can see what happened, why it was accepted, and where to resume if something breaks.
The honest limitation
/goals is fine for small, well-scoped work: it is fast, direct, and convenient. But the wheels come off when the work has more than a handful of steps, when context matters, when cost matters, or when you need to know exactly what changed and why.
/goals is optimized for completing a task. Abracapocus is optimized for completing a project.