TypeClaw
Internals

Web search

Why DDG goes through curl-impersonate, the lexiforest pin, failure modes

src/agent/tools/websearch.ts exposes general web search to the agent. The default source goes through src/agent/tools/ddg.ts, which scrapes DuckDuckGo's lite.duckduckgo.com/lite/ SERP (the only major engine that serves a parseable, key-free SERP). The wikipedia source is a separate, narrower path.

Why curl-impersonate, not fetch

The DDG client shells out to curl-impersonate rather than Bun's native fetch. Seeing Bun.spawn(['curl_chrome136', ...]) is intentional; do not modernize it. DDG fingerprints the TLS handshake (JA3/JA4) and the HTTP/2 SETTINGS frame before any header is read. Bun's fetch can't match Chrome's handshake (upstream Bun #11368), so its requests get a 202 response or a CAPTCHA gate on residential IPs. This was verified empirically, and it matches what the Python (primp/ddgs) and JS (node-curl-impersonate) DDG-scraping ecosystems converged on.
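As a rough illustration of the shell-out described above, the sketch below builds the argument vector for one search request. The helper name buildDdgArgv, the choice of a form-encoded POST with a q field, and the specific curl flags are assumptions for illustration, not the contents of ddg.ts; only the curl_chrome136 binary and the lite endpoint come from the text.

```typescript
// Binary and endpoint are named in the text; everything else here is a hypothetical sketch.
const CURL_BINARY = "curl_chrome136";
const DDG_LITE_URL = "https://lite.duckduckgo.com/lite/";

// Build the argv for one search request.
// Deliberately no -H flags: the wrapper supplies Chrome's headers itself.
function buildDdgArgv(query: string, timeoutSeconds: number): string[] {
  return [
    CURL_BINARY,
    "--silent", // suppress the progress meter
    "--show-error", // still print an error message on failure
    "--max-time",
    String(timeoutSeconds), // hard cap on the whole transfer
    "--data-urlencode",
    `q=${query}`, // assumption: the query is sent as a form-encoded POST body
    DDG_LITE_URL,
  ];
}

const argv = buildDdgArgv("bun tls fingerprint", 30);
console.log(argv.join(" "));
// Under Bun, the array would be handed over unchanged, e.g. Bun.spawn(argv, { stdout: "pipe" }).
```

Keeping argv construction in a pure function makes it easy to check in a unit test that no header flags slip in, without touching the network.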

The pin

Pinned to lexiforest's actively maintained fork (Chrome 136+ profiles, v1.5.6, May 2026), not the original lwthiker/curl-impersonate, whose last release carries only Chrome ≤116 profiles (two years stale and useless against current DDG fingerprinting). The constant block in dockerfile.ts calls this out by name.

Bumping the pin

Edit the four CURL_IMPERSONATE_* constants in src/init/dockerfile.ts, verify that the new release ships the wrapper named in CURL_IMPERSONATE_PROFILE, then run typeclaw start in any agent folder. The build smoke-tests the binary via curl_${PROFILE} --version.
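The relationship between the profile constant and the smoke test can be sketched as follows. This is a minimal illustration, not the actual dockerfile.ts: the value "chrome136" is inferred from the curl_chrome136 binary name, the helpers wrapperName and smokeTestLine are hypothetical, the use of a Dockerfile RUN line is an assumption, and the other three CURL_IMPERSONATE_* constants are not shown.

```typescript
// Inferred from the curl_chrome136 binary name; the real constant lives in src/init/dockerfile.ts.
const CURL_IMPERSONATE_PROFILE = "chrome136";

// Hypothetical helper: the wrapper binary a release must ship for a given profile.
function wrapperName(profile: string): string {
  return `curl_${profile}`;
}

// Hypothetical helper: a build step that fails if the wrapper is missing or broken.
// Assumption: the smoke test is emitted as a Dockerfile RUN line.
function smokeTestLine(profile: string): string {
  return `RUN ${wrapperName(profile)} --version`;
}

console.log(smokeTestLine(CURL_IMPERSONATE_PROFILE));
// prints "RUN curl_chrome136 --version"
```

Because the wrapper name is derived from the profile, bumping to a profile the release does not ship should fail at build time rather than at the first search.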

No -H overrides on the curl call

curl_chrome136 already sends the full Chrome 136 header set in the correct order. Adding our own headers corrupts the impersonation, because DDG fingerprints header order. The previous BROWSER_HEADERS const was deleted for this reason.
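One way to keep this rule from regressing is a small guard over the argv before it is spawned. The sketch below is hypothetical (findHeaderOverrides is not a function in the codebase) and deliberately crude: it flags curl's -H and --header options, including the attached short form such as -HAccept:text/html.

```typescript
// Hypothetical guard: return every header-override flag found in a curl argv.
function findHeaderOverrides(argv: string[]): string[] {
  return argv.filter(
    // "-H" alone, "-H" with an attached value, or the long form "--header"
    (arg) => arg === "--header" || arg.startsWith("-H"),
  );
}

const clean = ["curl_chrome136", "--silent", "https://lite.duckduckgo.com/lite/"];
const tainted = ["curl_chrome136", "-H", "Accept: text/html", "https://lite.duckduckgo.com/lite/"];

console.log(findHeaderOverrides(clean).length); // 0
console.log(findHeaderOverrides(tainted)); // [ "-H" ]
```

A test or assertion built on a check like this would fail loudly if someone reintroduced a header list.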

Failure modes

  • curl_chrome136 not on PATH → ENOENT → tool error. Container is broken; typeclaw start --build is the fix.
  • CAPTCHA despite impersonation → DdgCaptchaError. Mitigations: cookie warmup, vqd preflight, request jitter.
  • Network timeout (>30s) → curl exits non-zero. Cap is REQUEST_TIMEOUT_SECONDS.
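The three failure modes above could be told apart roughly as sketched below. The CurlOutcome shape and the classify function are hypothetical, and treating an HTTP 202 as the CAPTCHA signal is an assumption drawn from the fingerprinting notes on this page; the real detection in ddg.ts may differ. Exit code 28 is curl's documented "operation timed out" code.

```typescript
// Hypothetical summary of one curl invocation.
type CurlOutcome = {
  spawnErrorCode?: string; // e.g. "ENOENT" when the binary is not on PATH
  exitCode?: number; // curl's process exit code, if it ran
  httpStatus?: number; // status of the response, if one arrived
};

type Failure = "missing-binary" | "captcha" | "timeout" | "other" | null;

function classify(outcome: CurlOutcome): Failure {
  // Binary not found: the container is broken and needs a rebuild.
  if (outcome.spawnErrorCode === "ENOENT") return "missing-binary";
  // curl exit code 28 means the --max-time cap was hit.
  if (outcome.exitCode === 28) return "timeout";
  // Any other non-zero exit is an unclassified curl failure.
  if (outcome.exitCode !== undefined && outcome.exitCode !== 0) return "other";
  // Assumption: a 202 marks the CAPTCHA gate (where DdgCaptchaError would be raised).
  if (outcome.httpStatus === 202) return "captcha";
  return null; // no failure detected
}

console.log(classify({ spawnErrorCode: "ENOENT" })); // "missing-binary"
console.log(classify({ exitCode: 28 })); // "timeout"
console.log(classify({ exitCode: 0, httpStatus: 202 })); // "captcha"
console.log(classify({ exitCode: 0, httpStatus: 200 })); // null
```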
