Lab / deep-research

Research that audits itself.

Open source Claude Code skill. Type /research in any project. Get a multi-agent run with adversarial critic gate and citation audit. Zero API keys. Thirty-second install.

Press release

deep-research turns /research into a multi-agent run that fact-checks itself.

Most research agents hallucinate confidently and ship a paragraph that looks right. deep-research dispatches parallel subagents, then runs an adversarial critic and a citation auditor over the draft before the synthesis ships. Every claim has to trace back to a source. Every source has to actually exist.

The problem. Single-agent research loops tour the web in a single browser tab and synthesize a confident-sounding answer. There's nothing checking the answer. Hallucinated SKUs, drifted specs, and citations that point to the wrong page all survive because nothing inside the loop is paid to disagree with the synthesis. Add a critic? Now you need API keys, a Python service, a vector store, and a config file. The friction kills the install.

The solution. deep-research is a single-folder Claude Code skill. Paste one line, type /research, and the main session orchestrates: plan, dispatch parallel dr-subagent-researcher workers, draft a synthesis, then hand it to dr-critic for adversarial review and dr-citation-checker for claim→source verification. All four use Claude Code's native WebSearch and WebFetch. No Tavily, no Firecrawl, no Exa, no Python service. The output is a structured directory you can git-track — synthesis, audit, claims, sources, plan, and per-subagent notes — that compounds into a research asset rather than a one-shot answer.

Sample run · 8 subagents · 136 sources · 58 claims · critic-fix · citation audit
0 API keys · 30s install time · 3 worker subagent types · 8 artifacts per run

"The wedge isn't a smarter model. It's accountability inside the loop. A critic that's paid to disagree with the synthesis. A citation auditor that's paid to fail every claim that doesn't trace. The model already does the work — the framework decides whether the work counts."

Agam Arora · builder

Availability. Public today at github.com/agamarora/deep-research (v0.2.2). MIT licensed. Native to Claude Code — no monthly fee on top of your existing subscription.

FAQ

Questions you'd ask before installing.

What is this, exactly?
A Claude Code skill: one slash command (/research) plus three worker subagent definitions. Plain markdown files. No runtime, no service, no daemon. Install drops a folder under ~/.claude/skills/; setup.mjs writes the slash command and agents into ~/.claude/commands/ and ~/.claude/agents/. Restart Claude Code; the agents are now available globally in every project.
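What that leaves on disk, roughly. The agent file names follow the subagent names from the press release and Claude Code's one-markdown-file-per-agent convention; the shipped layout may differ slightly:

```
~/.claude/skills/deep-research/              # the cloned repo (skill source)
~/.claude/commands/research.md               # the /research slash command
~/.claude/agents/dr-subagent-researcher.md
~/.claude/agents/dr-critic.md
~/.claude/agents/dr-citation-checker.md
```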
Why not just ask Claude Code directly?
Direct asks give you a single agent pulling sources serially with no internal disagreement. /research dispatches multiple subagents in parallel from a single message — different angles in flight at once — then runs a separate critic that's prompted to find weaknesses, then a separate citation auditor that fails every claim that can't trace. The structure is the value, not the smarter prompt.
Do I need any API keys?
No. The framework uses Claude Code's bundled WebSearch and WebFetch. No Tavily, no Firecrawl, no Exa, no SerpAPI, no MCP search server. The constraint is deliberate — the install only stays under thirty seconds if there's nothing to sign up for.
How long does a run take? What's the token cost?
Five to ten minutes for a complex run. Token cost is roughly fifteen times a normal chat — multiple subagents in parallel, each making ten-to-fifteen tool calls, plus the critic and citation passes. Don't /research "what's today's date." The complexity tier is auto-selected: simple fact (1 subagent), direct comparison (2-4), complex research (5-10+).
What does the output look like?
A directory at reports/<date>-<slug>/ with eight files plus a notes/ folder: README.md (a cover page GitHub renders), query.md, plan.md, sources.md (T1/T2/T3 credibility tiers), notes/ (one file per subagent), claims.md (atomic claims with citation refs), synthesis.md (the report), audit.md (critic verdict + citation audit), and meta.json. Every claim in synthesis links to a source in sources.md. Every source has a credibility tier. The audit shows which claims passed, which got flagged, and what the critic forced to revise.
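As a tree, with a placeholder date and slug:

```
reports/2025-06-01-example-topic/
├── README.md       # cover page GitHub renders
├── query.md
├── plan.md
├── sources.md      # T1/T2/T3 credibility tiers
├── notes/          # one file per subagent
├── claims.md       # atomic claims with citation refs
├── synthesis.md    # the report
├── audit.md        # critic verdict + citation audit
└── meta.json
```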
How is this different from gpt-researcher / langchain open_deep_research?
Both are excellent and both are Python services with a runtime to manage and search-API keys to provision. deep-research is markdown-only, runs inside Claude Code, requires zero external accounts, and ships an adversarial critic plus citation audit in the base framework rather than as optional reflection. The trade-off: deep-research only runs where Claude Code runs. The full comparison table is in the README FAQ.
What if I install mid-session?
/research works immediately — slash commands hot-reload. The worker subagents need a Claude Code restart to register in the subagent_type enum; until then they fall back to general-purpose with their role prompt prepended. Output quality is unaffected; synthesis.md carries a dispatch_mode: fallback note for transparency.
Are reports tracked in my repo?
Your call. The framework's own repo gitignores reports/; in your project, you choose. Track them and research compounds — searchable, revisitable, source of truth for future runs. Keep them local and the repo stays small. The directory structure is git-friendly either way.
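To keep runs local, one line in your project's .gitignore is enough (assuming runs land under reports/ at the repo root, as described above):

```
reports/
```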
What if Anthropic ships an official deep-research feature?
deep-research adopts whichever path Anthropic publishes. The orchestrator-subagents pattern is portable — agents and the slash command are plain markdown. The framework rides on top of Claude Code's primitives; if those primitives change, the framework follows. Until then, this is the path.
Why not bundle Tavily, Firecrawl, or Exa?
Each one is a signup, a key, a billing relationship. Each is great in isolation; none is great as a default. The thirty-second install only works if there's nothing to provision. Forks are welcome to add them — the architecture doesn't care which retrieval tool a subagent uses.
What's next?
v0.3 is on the roadmap: configurable run path, richer eval harness on sample runs, an opt-in second-source confirmation pass on T2 claims. See the CHANGELOG for what shipped and the issues for what's open.
Get started

One line. Paste into Claude Code.

It clones the repo to ~/.claude/skills/deep-research/, runs setup.mjs, and registers /research globally. Restart Claude Code so the worker subagents load natively. Thirty seconds end-to-end.

Install deep-research: run `git clone --single-branch --depth 1 https://github.com/agamarora/deep-research.git ~/.claude/skills/deep-research && cd ~/.claude/skills/deep-research && node setup.mjs` — then I can use `/research <query>` immediately. (Restart Claude Code afterward to load the worker subagents natively; the framework works either way via a fallback path.)

Prefer to run it yourself? Manual install in the README. Same outcome, one extra terminal.

Architecture

Main session orchestrates. Workers run in parallel.

The /research slash command body is the orchestrator — it runs on the main thread, plans the decomposition, and dispatches workers. Earlier versions used a separate lead-researcher subagent, but Claude Code blocks subagent→subagent dispatch, so the lead could never spawn workers. v0.2.2 collapses the lead into the main session; the worker subagents are the only subagents now.
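A paraphrased sketch of the shape of that orchestrator body, not the shipped file. The steps are condensed from the pipeline described above; $ARGUMENTS is Claude Code's placeholder for the slash-command argument:

```markdown
<!-- ~/.claude/commands/research.md (abridged, illustrative) -->
Research the topic in $ARGUMENTS.

1. Write query.md and plan.md; pick a complexity tier (1, 2-4, or 5-10+ subagents).
2. In a single message, dispatch one dr-subagent-researcher task per angle in the plan.
3. Merge their notes into sources.md, claims.md, and a draft synthesis.md.
4. Hand the draft to dr-critic for adversarial review; revise whatever it flags.
5. Hand the claims to dr-citation-checker; fail any claim that does not trace to a source.
6. Write audit.md and meta.json.
```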

Three worker types, each with one job: dr-subagent-researcher gathers evidence for one angle of the plan over WebSearch and WebFetch, dr-critic runs adversarial review of the draft synthesis, and dr-citation-checker verifies that every claim traces back to a source that exists.
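Each worker is a plain markdown file with YAML frontmatter; the name/description/tools fields are Claude Code's subagent format, while the body below is an illustrative sketch rather than the shipped prompt:

```markdown
---
name: dr-citation-checker
description: Verifies that every claim in claims.md traces to a source in sources.md.
tools: WebFetch, Read
---

For each claim, fetch the cited source and confirm it actually supports the claim.
Mark the claim pass or fail; a missing, dead, or mismatched source is a fail.
```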

The architecture is Anthropic's published multi-agent pattern — the same orchestrator-subagents shape that beat single-agent by 90.2% on internal evals.

Resources

Everything is public. The repo, README, CHANGELOG, and sample runs all live at github.com/agamarora/deep-research.
