
Code Execution

Code execution lets an agent run JavaScript or TypeScript source on demand inside an isolated sandbox. The sandbox has no global access to its host — every capability (fs, fetch, console, custom business logic) is explicitly bridged in through imports and globals. Results flow back via the export selected by options.execute or through bridged “report” callbacks.

This page is the caller-facing contract: how to invoke runCode, what options it accepts, what shape the result has, and the limitations agent authors need to know when writing code that will run inside the sandbox. For the runtime-side mechanics (isolate setup, enforcement, implementation approaches), see Runtime §13 Code Execution Runtime.

1. What You Get

When you call state.runCode(source, options):

  • The source runs as an ES module in a fresh sandbox with no host capabilities.
  • Anything you pass through options.imports is resolvable as a bare-specifier import inside the sandbox.
  • Anything you pass through options.globals is resolvable as a free identifier inside the sandbox.
  • options.execute chooses which module export to run and which arguments to pass. It defaults to { fn: 'default', args: [] }.
  • The call returns a CodeExecution handle — awaitable, terminatable, with a live reports view.
  • The spec does not mandate a wall-clock timeout; callers impose their own by calling handle.terminate().

Code execution is not a Node.js runtime, a web runtime, or a Worker runtime. It is a plain ECMAScript evaluator. The surface is identical across host platforms.

2. API Surface

Code execution is invoked through ThreadState.runCode(), which returns a CodeExecution handle. The handle is awaitable and exposes a terminate() method for caller-initiated cancellation.

2.1 Signature

runCode(
  source: string,
  options?: CodeExecutionOptions,
): CodeExecution;

| Parameter | Type | Description |
| --- | --- | --- |
| source | string | JavaScript or TypeScript source text. Treated as an ECMAScript module. |
| options | CodeExecutionOptions | Export selection, imports, globals, memory cap, and language hints. |

2.2 CodeExecution Handle

interface CodeExecution extends PromiseLike<CodeExecutionResult> {
  /** Stop the run. Idempotent. Settles the handle with `status: 'terminated'`. */
  terminate(reason?: string): void;
  /** `true` until the handle settles. */
  readonly running: boolean;
  /** Live, append-only view of values emitted via the `report` bridge so far, in call order. */
  readonly reports: readonly unknown[];
}

  • The handle is a PromiseLike<CodeExecutionResult>; awaiting it (await run, where run is the value returned by runCode) yields the final result.
  • terminate(reason?) stops the run. Subsequent calls are no-ops. When reason is provided, it surfaces in result.error.message.
  • running flips to false once the handle settles.
  • reports is a live, append-only view of values emitted via the report bridge. The same values appear in CodeExecutionResult.reports when the run ends.

Callers that need a time budget layer one on themselves:

const run = state.runCode(source, options);
const budget = setTimeout(() => run.terminate('30s budget'), 30_000);
const result = await run;
clearTimeout(budget);

2.3 Example

const run = state.runCode(
  `
    import { readFile } from 'fs';
    import { report } from 'supervisor';

    const message = await getMessage();
    if (/username/.test(message)) {
      report({ topic: 'username', message });
    }

    export async function scan() {
      return { scanned: true };
    }
  `,
  {
    execute: { fn: 'scan', args: [] },
    imports: {
      fs: {
        readFile: async (path: string) => state.readFile(path),
      },
      supervisor: {
        report: (payload: unknown) => {
          // host-side sink
        },
      },
    },
    globals: {
      console: { log: (...args: unknown[]) => {} },
      getMessage: async () => 'latest user message',
    },
  },
);

const result = await run;

if (result.status === 'success') {
  // result.result is the resolved scan() return value: { scanned: true }
}

3. CodeExecutionOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| execute | { fn?: string; args?: unknown[] } | { fn: 'default', args: [] } | Export to execute and arguments to pass if the export is a function. Use fn: 'default' for the default export. |
| imports | Record<string, Record<string, unknown>> | {} | Map of module specifier → named exports exposed to the sandbox. Resolves bare-specifier imports only. |
| modules | Record<string, string> | {} | Additional relative ES modules available to the sandbox, keyed by relative specifier such as './helpers.js'. |
| globals | Record<string, unknown> | {} | Map of identifier → value installed on the sandbox’s module scope. |
| language | 'javascript' \| 'typescript' | 'typescript' | Source language. TypeScript source is stripped of types before evaluation; no type checking is performed. |
| memoryLimitBytes | number | runtime-defined | Maximum sandbox heap in bytes. Exceeding this settles the handle with status: 'memory'. |
| filename | string | '<runCode>' | Label used in stack traces and error sources. |
| report | (value: unknown) => void | | Optional host-side sink invoked for every value emitted through the built-in report bridge. Collected values are also exposed on the handle’s live reports array and in the final CodeExecutionResult.reports. |

Unknown keys on options are rejected.

Deadlines are not an option. The caller implements their own budget by calling handle.terminate() (§2.2).

3.1 Execute

execute selects the module export that becomes CodeExecutionResult.result.

const run = state.runCode(
  `
    export function increment(n: number): number {
      return n + 1;
    }

    export default function fallback() {
      return 123;
    }
  `,
  {
    execute: { fn: 'increment', args: [100] },
  },
);

Rules:

  • When execute is omitted, the runtime behaves as if { fn: 'default', args: [] } was supplied.
  • fn names the export to execute. Use 'default' for the default export.
  • If the selected export is a function, the runtime calls it with args inside the sandbox and awaits any returned thenable.
  • If the selected export is not a function, args MUST be omitted or empty and the export value itself becomes the result.
  • Missing selected exports fail the run with status: 'link_error'.
  • args use the same marshaling rules as other sandbox boundary values.

3.2 Imports

imports maps a bare specifier to an object of named exports.

imports: {
  fs: {
    readFile: async (path) => { /* ... */ },
    writeFile: async (path, data) => { /* ... */ },
  },
}

Inside the sandbox:

import { readFile, writeFile } from 'fs';
import * as fs from 'fs';          // namespace import of the same object

The default export of a bridged module is the value at key default:

imports: { greeter: { default: (name) => `hi, ${name}` } }
// inside: import greet from 'greeter';

Rules:

  • Only bare specifiers are resolvable from options.imports ('fs', 'supervisor', '@scope/pkg'). Relative specifiers resolve from options.modules; URL ('https://…') specifiers fail at module link time.
  • Specifiers are not resolved from any host package resolver, node_modules, CDN, or virtual module registry — only from options.imports or options.modules.
  • Importing an unknown specifier fails linkage.
  • Named-export mismatches (import { x } where x is missing) fail linkage.
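
The lookup rules above can be pictured as a small classifier. This is a sketch only: classifySpecifier and canResolve are hypothetical helper names, and the exact tests the runtime applies (for example, how it treats absolute paths or scheme-prefixed specifiers such as node:fs) are assumptions rather than part of this contract.

```typescript
// Hypothetical helpers illustrating the lookup rules in §3.2; not a runtime API.
type SpecifierKind = 'bare' | 'relative' | 'url' | 'absolute';

function classifySpecifier(specifier: string): SpecifierKind {
  if (specifier.startsWith('./') || specifier.startsWith('../')) return 'relative';
  if (specifier.startsWith('/')) return 'absolute'; // assumption: never resolvable
  // Assumption: anything shaped like "<scheme>:..." is treated as a URL.
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(specifier)) return 'url';
  return 'bare'; // 'fs', 'supervisor', '@scope/pkg'
}

interface ResolutionTables {
  imports?: Record<string, Record<string, unknown>>;
  modules?: Record<string, string>;
}

const hasOwn = (table: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(table, key);

// True when the specifier is present in the table that owns its kind.
// Simplification: relative specifiers are looked up as if imported from the graph root.
function canResolve(specifier: string, tables: ResolutionTables): boolean {
  switch (classifySpecifier(specifier)) {
    case 'bare':
      return hasOwn(tables.imports ?? {}, specifier);
    case 'relative':
      return hasOwn(tables.modules ?? {}, specifier);
    default:
      return false; // URL and absolute specifiers fail at link time
  }
}
```

A caller that already knows which specifiers a piece of source imports could run a check like this before invoking runCode, to report an unknown specifier without starting a sandbox.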

3.3 Modules

modules provides local relative ES modules to the sandbox.

const run = state.runCode(
  `
    import { add } from './math.js';
    export const result = add(1, 2);
  `,
  {
    execute: { fn: 'result' },
    modules: {
      './math.js': 'export const add = (a, b) => a + b;',
    },
  },
);

Rules:

  • Keys MUST be relative specifiers from the module graph root, such as './helpers.js' or './lib/math.js'.
  • Relative imports in the entry source resolve against filename when one is supplied.
  • Entry and module source MAY import sibling or parent modules with ./ and ../ specifiers when those imports resolve within the supplied module graph.
  • Values are JavaScript or TypeScript module source and are evaluated inside the same sandbox.
  • modules are not host filesystem reads and do not grant package, URL, CDN, or node_modules access.
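
As a rough mental model of the ./ and ../ rule, the sketch below resolves a relative specifier against the importing module's root-relative key and refuses anything that would climb above the graph root. resolveInGraph and lookupModule are hypothetical names, and the normalization shown is an assumption; this page does not specify the runtime's exact algorithm.

```typescript
// Hypothetical resolver sketch for §3.3; the runtime's real algorithm may differ.
// `importer` and the returned key are root-relative, e.g. './lib/math.js'.
function resolveInGraph(importer: string, specifier: string): string | null {
  // Start from the importer's directory: drop the leading '.' and the file name.
  const stack = importer.split('/').slice(1, -1);
  for (const part of specifier.split('/')) {
    if (part === '.' || part === '') continue;
    if (part === '..') {
      if (stack.length === 0) return null; // would escape the supplied graph
      stack.pop();
    } else {
      stack.push(part);
    }
  }
  return './' + stack.join('/');
}

// Returns the module source for an import, or null when it is outside the graph.
function lookupModule(
  importer: string,
  specifier: string,
  modules: Record<string, string>,
): string | null {
  const key = resolveInGraph(importer, specifier);
  if (key === null) return null;
  return Object.prototype.hasOwnProperty.call(modules, key) ? modules[key] : null;
}
```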

3.4 Globals

globals installs identifiers at module scope. They are not installed on globalThis.

globals: {
  console: { log: (...args) => {} },
  getMessage: async () => '...',
}

Inside the sandbox:

console.log('hi');              // works
const m = await getMessage();   // works
globalThis.console;             // undefined — globals are not on globalThis

This distinction matters for the limitations — sandbox code cannot enumerate bridged capabilities via Object.keys(globalThis).

3.5 Bridged Value Marshaling

Values cross the sandbox boundary in both directions (host → sandbox via imports/globals, sandbox → host via call arguments and return values).

| Host value | Sandbox view | Notes |
| --- | --- | --- |
| string, number, boolean, null, undefined, bigint | Same primitive | Cloned by value. |
| Plain object / array / Map / Set / Date / typed array | Structured clone | Deep copy; no shared reference. |
| Function | Callable proxy | Calls forward serialized args to host; return value (or resolved promise) marshaled back. |
| Promise | Promise | Resolves/rejects asynchronously inside sandbox. |
| ArrayBuffer / Uint8Array | Same binary view | Cloned. |
| Class instance, WeakMap, WeakRef, symbol with host identity | Not transferable | May throw a SerializationError. |

Bridged functions do not expose host this or host closure state to the sandbox beyond their explicit arguments.

Asynchronous bridges are supported: a bridged function may return a Promise, and the sandbox’s await unwraps it.
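
The deep-copy rows of the table can be illustrated on the host alone. In the sketch below the standard structuredClone() function stands in for the runtime's marshaler. That substitution is an assumption made for illustration: the real marshaler may differ in detail, for example in how it reports non-transferable values.

```typescript
// Illustration only: structuredClone() stands in for the runtime's marshaler.
const hostValue = {
  ids: [1, 2, 3],
  seen: new Set(['a']),
  createdAt: new Date(0),
};

// What the sandbox receives behaves like a deep copy...
const sandboxView = structuredClone(hostValue);
sandboxView.ids.push(4);
sandboxView.seen.add('b');

// ...so sandbox-side mutation never reaches the host-side original.
const hostUnchanged = hostValue.ids.length === 3 && hostValue.seen.size === 1;

// Functions are not cloneable data: structuredClone rejects them. This is consistent
// with the table listing functions as callable proxies rather than copies.
let functionCloneFailed = false;
try {
  structuredClone({ fn: () => 1 });
} catch {
  functionCloneFailed = true;
}
```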

4. CodeExecutionResult

interface CodeExecutionResult {
  status: 'success' | 'error' | 'memory' | 'terminated' | 'link_error';
  result?: unknown;
  reports: unknown[];
  logs: CodeExecutionLog[];
  error?: CodeExecutionError;
  durationMs: number;
  memoryUsedBytes?: number;
}

| Field | Type | Description |
| --- | --- | --- |
| status | enum | Outcome classification. See §4.1. |
| result | unknown | Resolved selected export, after optionally calling it with execute.args and awaiting top-level promises. Omitted on non-success outcomes. |
| reports | unknown[] | Values emitted through the report bridge, in call order. Empty array if unused. |
| logs | CodeExecutionLog[] | Captured console.* calls when the runtime installs a capturing console. Empty when the caller supplies its own. |
| error | CodeExecutionError | Present iff status is not 'success'. |
| durationMs | number | Wall-clock execution time in milliseconds, from runCode() invocation to handle settlement. |
| memoryUsedBytes | number | Peak sandbox heap in bytes when the engine exposes heap-usage measurement; omitted otherwise. |

4.1 Status Values

| Status | Meaning |
| --- | --- |
| success | Module evaluated; selected export resolved. |
| error | Uncaught runtime error inside the sandbox. |
| memory | memoryLimitBytes exceeded. |
| terminated | Caller called handle.terminate(), or the runtime’s own safety cap fired. error.message distinguishes the two (including any reason the caller passed). |
| link_error | Module graph failed to link (unknown specifier, missing named export, parse error), or the export selected by execute is missing (§3.1). |
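
Because status is a closed union, callers can handle every outcome with an exhaustive switch. The sketch below is one way to do that. describeOutcome and the trimmed CodeExecutionResultLike type are hypothetical names introduced here, and the message strings are illustrative.

```typescript
// Types copied from §4 and trimmed to the fields used here.
type CodeExecutionStatus = 'success' | 'error' | 'memory' | 'terminated' | 'link_error';

interface CodeExecutionResultLike {
  status: CodeExecutionStatus;
  result?: unknown;
  error?: { name: string; message: string; specifier?: string };
}

// Hypothetical helper: turns a settled result into a one-line summary.
function describeOutcome(r: CodeExecutionResultLike): string {
  const detail = r.error?.message ?? 'no details';
  switch (r.status) {
    case 'success':
      return 'ok';
    case 'error':
      return `runtime error: ${detail}`;
    case 'memory':
      return `memory limit exceeded: ${detail}`;
    case 'terminated':
      return `terminated: ${detail}`;
    case 'link_error':
      return r.error?.specifier
        ? `link error in '${r.error.specifier}': ${detail}`
        : `link error: ${detail}`;
    default: {
      // Compile-time exhaustiveness check: stops type-checking if a status is added.
      const unreachable: never = r.status;
      return unreachable;
    }
  }
}
```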

4.2 CodeExecutionError

interface CodeExecutionError {
  name: string;
  message: string;
  stack?: string;
  /** Module specifier that failed to link, when applicable. */
  specifier?: string;
  /** Source filename associated with line and column, when available. */
  filename?: string;
  /** 1-based line in source, when available. */
  line?: number;
  /** 1-based column in source, when available. */
  column?: number;
}

Errors raised by bridged functions are reflected into the sandbox as a regular thrown error. Host-side stack traces do not leak into the sandbox result.

4.3 CodeExecutionLog

interface CodeExecutionLog {
  level: 'log' | 'info' | 'warn' | 'error' | 'debug';
  args: unknown[];
  timestamp: number;
}

CodeExecutionResult.logs is populated only when the runtime installs a capturing console (typically when the caller does not provide one via options.globals).

5. Result Channels

The sandbox has two channels for returning data. They coexist and can both be used in the same run.

5.1 Configured Export

The canonical success result is the module export selected by options.execute. By default this is the module’s default export:

export default { summary: 'ok', matches: 3 };

After module evaluation, the runtime resolves the configured export as follows:

  1. Read module.namespace[execute.fn], where execute.fn defaults to 'default'.
  2. If it is a function (including an async function), call it with execute.args inside the sandbox. The result of that call is then processed by step 3.
  3. If the current candidate is a Promise (or thenable), await it. Repeat this step until the value is no longer a thenable.
  4. Marshal the final value into CodeExecutionResult.result.

All four of these produce the same surface for callers:

export default 42;                          // result === 42
export default async () => 42;              // result === 42
export default () => Promise.resolve(42);   // result === 42
export default Promise.resolve(42);         // result === 42

Named exports are selected with execute.fn:

export function increment(n: number) {
  return n + 1;
}

// state.runCode(source, { execute: { fn: 'increment', args: [100] } })
// result === 101

If the selected export is missing, the run settles with status: 'link_error'. If the selected export is not a function and execute.args is non-empty, the run settles with status: 'error'. If the selected export function throws (or its returned promise rejects), the run settles with status: 'error' and the thrown value in error.message — identical to any other uncaught runtime error.
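
The resolution steps and failure cases in this section can be modeled on the host as a short function. This is a sketch of the described behavior, not the runtime's implementation: resolveExport and MissingExportError are hypothetical names, and a plain object stands in for the module namespace.

```typescript
// Hypothetical host-side model of the §5.1 resolution steps.
type ModuleNamespace = Record<string, unknown>;

// Stands in for the condition that the runtime reports as status: 'link_error'.
class MissingExportError extends Error {}

function isThenable(value: unknown): value is PromiseLike<unknown> {
  return (
    (typeof value === 'object' || typeof value === 'function') &&
    value !== null &&
    typeof (value as { then?: unknown }).then === 'function'
  );
}

async function resolveExport(
  ns: ModuleNamespace,
  fn: string = 'default',
  args: unknown[] = [],
): Promise<unknown> {
  // Step 1: read the selected export.
  if (!Object.prototype.hasOwnProperty.call(ns, fn)) {
    throw new MissingExportError(`export '${fn}' not found`);
  }
  let candidate: unknown = ns[fn];
  // Step 2: call it when it is a function; otherwise args must be empty.
  if (typeof candidate === 'function') {
    candidate = candidate(...args);
  } else if (args.length > 0) {
    throw new TypeError(`export '${fn}' is not a function but args were supplied`);
  }
  // Step 3: keep awaiting until the value is no longer a thenable.
  while (isThenable(candidate)) {
    candidate = await candidate;
  }
  // Step 4: the caller would marshal this value into CodeExecutionResult.result.
  return candidate;
}
```

Passing each of the four default-export forms shown above through resolveExport yields the same value, which is the property callers rely on.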

5.2 Bridge Reports

For streaming or multi-value output, pass a report callback into the sandbox as a bridged function:

const run = state.runCode(source, {
  imports: {
    supervisor: {
      report: (payload) => telemetry.push(payload),
    },
  },
});

Every report(x) call runs the host sink synchronously. If the sink returns a promise, the sandbox call returns that promise, which the sandbox can await (§3.5).

When options.report is supplied, the runtime installs a built-in report global that forwards values to it, and pushes each reported value into CodeExecutionResult.reports in call order.

Use whichever channel fits: configured export for “this code computes one answer”; reports for “this code scans and flags N things.”
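
Reported values arrive on the host as unknown, so a sink usually wants to check their shape before storing them. The following sketch groups reports by topic, reusing the { topic, message } payload from §2.3. createReportCollector and isTopicReport are hypothetical helpers, not part of the API.

```typescript
// Hypothetical helper; `topic` and `message` mirror the payload used in §2.3.
interface TopicReport {
  topic: string;
  message: string;
}

function isTopicReport(value: unknown): value is TopicReport {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.topic === 'string' && typeof v.message === 'string';
}

function createReportCollector() {
  const byTopic = new Map<string, string[]>();
  const rejected: unknown[] = [];
  // Same shape as options.report: (value: unknown) => void.
  const sink = (value: unknown): void => {
    if (!isTopicReport(value)) {
      rejected.push(value); // keep malformed payloads for inspection
      return;
    }
    const messages = byTopic.get(value.topic) ?? [];
    messages.push(value.message);
    byTopic.set(value.topic, messages);
  };
  return { sink, byTopic, rejected };
}
```

The returned sink could be passed either as options.report or as a bridged supervisor.report function. In both cases the collector, not the sandbox, decides what counts as a well-formed report.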

6. Limitations

These are the things your sandboxed source code cannot do. They are a design property of code execution, not a configuration knob.

6.1 No Host Capabilities on globalThis

The sandbox’s globalThis contains only ECMAScript intrinsics (Object, Array, Promise, Math, JSON, Map, Set, Date, RegExp, Error, typed arrays, BigInt, Symbol, Reflect, Proxy), the structuredClone function, and the value properties globalThis, undefined, NaN, and Infinity.

It does not contain process, global, window, self, document, require, Deno, Bun, fetch, Request, Response, URL, URLSearchParams, WebSocket, WebAssembly, crypto, setTimeout, setInterval, setImmediate, performance, atob, btoa, TextEncoder, or TextDecoder. A caller can make any of these names available through globals, but they then exist as module-scope identifiers, not as properties of globalThis (§3.4).

If your code needs any of those, bridge them in.

6.2 No Dynamic Code Loading

Inside the sandbox:

  • eval is absent or throws.
  • Function, AsyncFunction, and GeneratorFunction constructors throw when called with source strings.
  • Dynamic import(...) only resolves specifiers present in options.imports or options.modules; URL imports reject.

6.3 No Shared Memory With the Host

  • No SharedArrayBuffer, no Atomics, no postMessage-style channels.
  • Bridged values cross the boundary by deep clone (§3.5). A sandbox mutation of a received object does not affect the host-side source object.

6.4 No Runtime-Imposed Deadline

The spec does not enforce a wall-clock timeout. handle.terminate() signals the sandbox to stop at the next yield point — usually a few milliseconds, but a pure-synchronous tight loop may run until the runtime’s own safety cap fires.

Callers SHOULD:

  • wrap untrusted code with their own setTimeout + handle.terminate(), as shown in §2.2; and
  • where the run may be CPU-bound, treat the platform safety cap as the worst-case upper bound.

6.5 Memory Is Capped

If you set memoryLimitBytes, exceeding it settles the handle with status: 'memory'. If you omit it, a runtime-defined default applies. Plan allocations accordingly.

6.6 Timers Are Not Bridged Automatically

The sandbox has a working microtask queue — Promise, await, and queueMicrotask behave normally. But setTimeout / setInterval are absent unless the caller bridges them.
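
One way to give sandbox code a delay primitive is to bridge a promise-returning sleep function rather than setTimeout itself. This avoids passing a sandbox callback to the host, which this page does not describe. The sketch below is an assumption-laden example: makeSleepBridge, clampDelay, and the maxMs cap are hypothetical names chosen for illustration.

```typescript
// Hypothetical bridge; `sleep` and `maxMs` are illustrative, not part of the API.
function clampDelay(ms: unknown, maxMs: number): number {
  // Sandbox arguments are untrusted: coerce anything unusable to 0.
  if (typeof ms !== 'number' || !Number.isFinite(ms) || ms < 0) return 0;
  return Math.min(ms, maxMs);
}

function makeSleepBridge(maxMs: number): (ms: unknown) => Promise<void> {
  // The timer runs on the host; the sandbox only sees the resulting promise.
  return (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(resolve, clampDelay(ms, maxMs));
    });
}

// Host side:    globals: { sleep: makeSleepBridge(1_000) }
// Sandbox side: await sleep(250);
```

Capping the delay keeps a single sleep call from outliving whatever budget the caller enforces with handle.terminate().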

6.7 Nondeterministic Intrinsics Are Present

Date.now() and Math.random() work inside the sandbox and are nondeterministic. If you need determinism, bridge deterministic replacements through globals.
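
A minimal sketch of such replacements follows, assuming the sandbox code calls random() and now() in place of Math.random() and Date.now(). The names random, now, and fixedClock are hypothetical; mulberry32 is a small, widely used seeded generator that is not suitable for cryptographic use. Whether a bridged call returns synchronously or must be awaited inside the sandbox is runtime-defined (§3.5).

```typescript
// mulberry32: a small seeded PRNG. Returns floats in [0, 1). Not cryptographically secure.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fixed clock: returns `startMs` on the first call and advances by `stepMs` per call.
function fixedClock(startMs: number, stepMs: number = 1): () => number {
  let current = startMs - stepMs;
  return () => {
    current += stepMs;
    return current;
  };
}

// Candidate value for options.globals; the same seed and start time replay identically.
const deterministicGlobals = {
  random: mulberry32(42),
  now: fixedClock(1_700_000_000_000),
};
```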

6.8 Each Run Is Fresh

Every call to runCode gets its own isolate. State — module caches, intrinsic mutations, top-level variables — does not carry over between runs. If you need persistence, write to thread storage through a bridged function.

7. TypeScript Support

When language is 'typescript' (the default):

  • The runtime accepts TypeScript source.
  • Types are erased before evaluation. Type errors are ignored — only syntax errors block linkage.
  • import type, export type, satisfies, as, generics, enums, and namespaces are all accepted at the erasure layer.
  • tsconfig.json is not consulted. No path aliases, no paths, no baseUrl.

When language is 'javascript', the source is parsed as standard ECMAScript modules with no transformation.

8. Module Semantics

  • Source is evaluated as an ES module. import, export, and top-level await are available.
  • There is one entry module per runCode call plus any caller-supplied options.modules.
  • imports entries are synthetic modules: their namespace is a frozen copy of the host-provided object.
  • modules entries are source-backed ES modules available by relative specifier.
  • import.meta exposes only { url: string } where url is a synthetic sandbox:<filename> URL. Host paths are not exposed.

9. Interaction With Thread State

runCode is a method on ThreadState. It runs on behalf of a thread, but the sandbox does not receive state implicitly. If you want the sandbox to read thread messages, files, or env, bridge the specific capabilities you want to expose:

await state.runCode(source, {
  imports: {
    thread: {
      readFile: (path) => state.readFile(path),
      getMessages: (opts) => state.getMessages(opts),
      emit: (event, data) => state.emit(event, data),
    },
  },
});

This is intentional: bridges are a capability boundary. Leaking state would let sandboxed code invoke arbitrary tools, mutate thread env, or terminate the thread — the opposite of what isolation means.

Bridged functions run on the host side and can await thread operations, schedule effects, emit events, or call tools.

10. Usage Patterns

10.1 Single-Value Computation

const { result } = await state.runCode(
  `
    const n = input.reduce((a, b) => a + b, 0);
    export default n;
  `,
  { globals: { input: [1, 2, 3] } },
);
// result === 6

10.2 Flagging Multiple Items

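const flaggedIds = new Set<unknown>(); // host-side collection filled by the bridge

const result = await state.runCode(
  `
    import { report } from 'supervisor';
    let flagged = 0;
    for (const row of rows) {
      if (row.flagged) {
        report(row.id);
        flagged += 1;
      }
    }
    // execute defaults to { fn: 'default' }, so the module needs a default export;
    // without one the run settles with status: 'link_error' (§3.1).
    export default flagged;
  `,
  {
    globals: { rows: await loadRows() }, // loadRows() is the caller's own loader
    imports: {
      supervisor: { report: (id: unknown) => flaggedIds.add(id) },
    },
  },
);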

10.3 Running Model-Authored Code With a Deadline

const userCode = llmResponse.code; // untrusted, produced by an LLM
const run = state.runCode(userCode, {
  memoryLimitBytes: 64 * 1024 * 1024,
  globals: {
    input: await state.readFile('/data/input.json'),
  },
});

const deadline = setTimeout(() => run.terminate('2s budget'), 2_000);
const result = await run;
clearTimeout(deadline);

if (result.status !== 'success') {
  // surface result.error to the model as a tool error
}

10.4 Composing With Tools

Custom tools can be thin wrappers around runCode:

export default defineTool({
  description: 'Evaluate an expression against thread data',
  args: z.object({ code: z.string() }),
  execute: async (state, { code }) => {
    const run = state.runCode(code, {
      imports: {
        thread: { readFile: (p) => state.readFile(p) },
      },
    });
    const deadline = setTimeout(() => run.terminate('2s budget'), 2_000);
    const result = await run;
    clearTimeout(deadline);
    return result.status === 'success'
      ? { status: 'success', result: JSON.stringify(result.result) }
      : { status: 'error', error: result.error?.message ?? 'run failed' };
  },
});

11. Security Considerations

  • Untrusted source. Source from an LLM or remote user is untrusted. The sandbox and its limitations (§6) are the barriers against escape. Do not relax them based on source inspection.
  • Bridged callables are the attack surface. Once a function crosses the boundary, the sandbox may call it repeatedly with any serializable arguments. Bridged functions MUST validate arguments against an explicit schema and SHOULD enforce their own call-rate budget when they perform expensive work.
  • Host-side data exposure. Bridged functions run on the host and can reach any state the host closure captures. They MUST NOT return values that were not deliberately exposed — secrets, other threads’ state, and unrelated environment variables MUST NOT be reachable through a bridge unless the caller explicitly passed them.
  • Runaway execution. Callers evaluating untrusted code MUST enforce their own deadline via setTimeout + handle.terminate(). terminate() settles at the next yield point; for CPU-bound loops, the platform safety cap is the effective worst case.
  • Memory exhaustion. Callers SHOULD set memoryLimitBytes explicitly when evaluating untrusted code.
  • Log redaction. Captured logs may contain values derived from globals. Treat logs with the same confidentiality level as the most sensitive input passed into globals or imports.
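
The validation and call-budget requirements for bridged callables can be combined in one wrapper. The sketch below is only one possible shape: guardBridge, GuardOptions, and readFileArgs are hypothetical names, the /data/ prefix is an example policy, and the path check is deliberately simplistic.

```typescript
// Hypothetical wrapper; names, limits, and error messages are illustrative.
interface GuardOptions<A> {
  /** Returns parsed arguments, or null to reject the call. */
  validate: (args: unknown[]) => A | null;
  /** Maximum number of calls allowed during one run, valid or not. */
  maxCalls: number;
}

function guardBridge<A, R>(
  fn: (args: A) => R,
  options: GuardOptions<A>,
): (...raw: unknown[]) => R {
  let calls = 0;
  return (...raw: unknown[]): R => {
    calls += 1;
    if (calls > options.maxCalls) throw new Error('bridge call budget exceeded');
    const parsed = options.validate(raw);
    if (parsed === null) throw new TypeError('invalid bridge arguments');
    return fn(parsed);
  };
}

// Example policy: exactly one string path under /data/, with no '..' anywhere in it.
const readFileArgs = (args: unknown[]): { path: string } | null => {
  const [path] = args;
  if (args.length !== 1 || typeof path !== 'string') return null;
  if (!path.startsWith('/data/') || path.includes('..')) return null;
  return { path };
};
```

A bridge built this way would be created per run, for example readFile: guardBridge(({ path }) => state.readFile(path), { validate: readFileArgs, maxCalls: 50 }), so the budget resets with each fresh sandbox.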

12. TypeScript Reference

interface CodeExecutionOptions {
  execute?: {
    fn?: string;
    args?: unknown[];
  };
  imports?: Record<string, Record<string, unknown>>;
  modules?: Record<string, string>;
  globals?: Record<string, unknown>;
  language?: 'javascript' | 'typescript';
  memoryLimitBytes?: number;
  filename?: string;
  report?: (value: unknown) => void;
}

interface CodeExecution extends PromiseLike<CodeExecutionResult> {
  terminate(reason?: string): void;
  readonly running: boolean;
  readonly reports: readonly unknown[];
}

interface CodeExecutionResult {
  status: 'success' | 'error' | 'memory' | 'terminated' | 'link_error';
  result?: unknown;
  reports: unknown[];
  logs: CodeExecutionLog[];
  error?: CodeExecutionError;
  durationMs: number;
  memoryUsedBytes?: number;
}

interface CodeExecutionLog {
  level: 'log' | 'info' | 'warn' | 'error' | 'debug';
  args: unknown[];
  timestamp: number;
}

interface CodeExecutionError {
  name: string;
  message: string;
  stack?: string;
  specifier?: string;
  filename?: string;
  line?: number;
  column?: number;
}