openquack

SPEC-031 — Agent kickoff (one-shot voice-to-action)

Status: draft (M2 — adoption-band demo feature) Owner: OpenQuackKit/Agents/ + OpenQuackApp/{RecordingOverlay,ResponseWindow}.swift Last updated: 2026-05-23 (v3 — claude –bg + daemon-managed sessions)

Goal

A second, separately-bindable hotkey that takes the user’s spoken utterance and dispatches it to a fresh background Claude Code session managed by claude’s own supervisor daemon, via the claude --bg "<prompt>" CLI. The session runs detached — no Terminal pops up, no workspace-trust dialog, no focus stolen. The session is centrally tracked: it appears in claude agents --json (and the claude agents TUI) alongside any other background agents the user is running. When the agent finishes (or asks for input), OpenQuack posts a macOS notification; clicking it opens a small floating window with the full response and a button to drop into the live session via claude attach <id>.

Voice → action → notification → optional drop-in.

The user doesn’t have to be in any particular app, doesn’t position a cursor, doesn’t paste anything, and isn’t interrupted by a Terminal window appearing during whatever they were doing. The session persists after dispatch — closing OpenQuack does not kill it; the claude daemon does. Re-entering the session uses claude attach, which connects to the live daemon-owned PTY rather than starting a fresh process.

User story

Two shapes, both real:

  1. Larry on a Tuesday morning, no editor open: “Set me a timer for ten o’clock and put a Notification Centre entry on it.” → presses the kickoff hotkey, speaks, releases. Overlay confirms “Agent launched ✓ (claude)” for ~700 ms and dismisses. Five seconds later, a notification slides in: “claude finished — Timer set for 10:00. macOS reminder scheduled.” Larry keeps doing whatever he was doing.
  2. Larry mid-coding-session, OpenQuack repo in focus, sees a side task: “In a scratch dir, sketch what a SwiftUI view that visualises the FnHotkeyMonitor’s flag transitions would look like — just a one-file prototype, don’t touch the repo.” → kickoff hotkey, speaks, releases. Larry stays in the editor. ~90 seconds later a notification fires; clicking it opens the response window with the file path and a brief summary. The “Continue in Terminal” button drops him into the live session via claude --resume <id> if he wants to iterate.

Both work because the kickoff hotkey is not the dictation hotkey. Dictation still pastes at the cursor of whatever app is in focus (the SPEC-005 path). Kickoff bypasses focus entirely, runs the agent in the background, and surfaces the result through native notification mechanics that the user can ignore or click into at their own pace.

Why now, and why distinct from SPEC-006

SPEC-006 (Agent dispatch, draft v2) covers the multi-turn closed-loop session vision: a conversation panel, approval prompts, side-effect chips, voice approval, and a session that lives across utterances. That is the long-game design and it’s in the deferred-feature band per docs/ROADMAP.md (adoption pivot 2026-05-14).

This spec is a distinct one-shot kickoff. The two compose later (SPEC-006’s ClaudeCodeAgent can adopt the same spawn primitive), but neither needs to absorb the other. Carving the one-shot kickoff out gives us:

The relationship is stated and capped: SPEC-006 owns the multi-turn / session-reuse / approval-routing surface; SPEC-031 owns the one-utterance-spawns-one-session surface. The two specs cite each other; neither grows.

Non-goals (explicit, to keep MVP shippable)

UX

Two hotkeys, separately bindable

Dictation hotkey  (SPEC-003, default ⌃⇧Space)  ──▶  Whisper → paste at cursor
Kickoff hotkey    (SPEC-031, default unset)    ──▶  Whisper → background `claude`

The kickoff hotkey is bound under Settings → Shortcut → Agent kickoff, opt-in. Default: unset.

The recording overlay distinguishes the two modes

Same as before: a globe-icon “claude” mode chip appears in the pill during a kickoff-mode recording. This is the user’s only confirmation that the right hotkey fired before they start speaking. Cancel (Esc / cancel hotkey) works the same in both modes.

After release — dispatch + brief confirmation

Dictation:  Transcribe ──▶ PasteService.paste(transcript)        [SPEC-005]
Kickoff:    Transcribe ──▶ AgentKickoffService.startClaudeKickoff(…)
                            └─ short-lived Process spawns:
                                  claude --bg \
                                    --permission-mode bypassPermissions \
                                    --name "OpenQuack: <prefix>" \
                                    <prompt>
                                  (cwd=~/OpenQuackAgent/)
                            └─ stdout parsed for banner:
                                  "backgrounded · <short-id> (idle — …)"
                            └─ short-id captured; daemon owns the session
                            └─ overlay flashes "Agent launched ✓ (claude)"
                            └─ overlay dismisses after ~700 ms
                            └─ spawn process exits; agent keeps running

No Terminal window appears. The recording overlay flashes the launched-state for the same dwell time the dictation path uses for “Pasted ✓”, then dismisses. The kickoff session is owned by the claude daemon; OpenQuack only holds a reference (short-id + workspace + prompt).

If the agent CLI is missing (claude not on PATH), the kickoff fails loudly: an error banner in the overlay with a one-click “Install Claude Code” link to claude.com/claude-code, and the transcript is copied to the clipboard as a fallback so the user doesn’t lose what they said.

If the daemon disclaimer hasn’t been accepted (claude prints ”–bg with bypassPermissions requires accepting the disclaimer first. Run claude --dangerously-skip-permissions once interactively.”), OpenQuack opens a .command Terminal with that exact command so the user can accept the one-time disclaimer, and the current transcript is stashed on the clipboard. The next kickoff press dispatches normally.

Completion signal — FSEventStream on state.json

The claude daemon writes per-session state to ~/.claude/jobs/<short-id>/state.json with shape:

{
  "state": "working" | "blocked" | "done" | "idle",
  "detail": "<one-line summary the agent wrote>",
  "tempo": "...",
  "inFlight": { "tasks": 0, "queued": 0, "kinds": [] },
  "needs": "<what the agent is waiting on, when blocked>",
  "output": "<final response, when done>",
  ...
}

OpenQuack watches each session’s state.json via FSEventStream and notifies on transitions:

State transition Notification
working → done “claude finished” + detail (or output first line)
working → blocked “claude needs input” + needs
→ idle after working Same as done — final state
anything → exit (file deleted) “claude session ended” + last detail

Title: claude finished / claude needs input / claude failed. Body: agent-written detail field if present, else first ~150 chars of output, word-boundary trimmed. Notifications use a kickoffResult category whose default action opens the response window (below).

Permission: UNUserNotificationCenter.current().requestAuthorization is called the first time a kickoff completes — never on app launch — so the prompt arrives in context. If the user denies notification permission, completed kickoffs surface as a dot on the menu-bar duck; clicking the duck opens the response window directly.

Response window (click handler)

Clicking the notification opens a small floating window (ResponseWindow):

┌── claude — your kickoff result ───────────────────────┐
│                                                       │
│  You said:                                             │
│    "Set me a timer for ten o'clock and put a          │
│    notification centre entry on it."                   │
│                                                       │
│  claude finished (state: done, 5.2s)                   │
│  ─────────────────────────────────────────            │
│    Timer set for 10:00.                                │
│    Created reminder via osascript:                     │
│      tell application "Reminders" …                    │
│    macOS reminder scheduled; the system will fire     │
│    a notification at 10:00 sharp.                      │
│                                                       │
│  [Copy response] [Continue in Terminal]                │
│  [Show all kickoffs] [Stop session] [Close]            │
│                                                       │
└────────────────────────────────────────────────────────┘

The response window does NOT auto-open on completion — that would be intrusive. It only opens via notification click or menu-bar fallback.

What about Claude.app (the desktop app)?

The Claude desktop app currently does not register a claude:// URL scheme route for opening a specific Claude Code session by ID (verified in Claude.app/Contents/Resources/app.asar — only claude://cowork/shared-artifact?uuid= is registered). Until Anthropic adds something like claude://session/<id>, the only deep-link continuation surface is Terminal + claude attach.

Filing a feature request with Anthropic for a session-deep-link URL is a follow-up. When/if it lands, the response window can grow a fourth button (“Open in Claude app”) without other changes.

Backend

Public surface

// Sources/OpenQuackKit/Agents/AgentKickoffService.swift

public enum AgentKickoffService {
    /// Spawn `claude --bg <prompt>` in the default workspace.
    /// The spawn process exits within ~seconds after the daemon
    /// accepts the dispatch; the actual agent session is owned by
    /// the claude daemon and persists after this returns. Parses
    /// stdout for the dispatch banner to capture the short session
    /// ID.
    public static func startClaudeKickoff(prompt: String) throws -> KickoffSession

    /// Open the user's default terminal at the workspace with
    /// `claude attach <short-id>` running. Attaches to the LIVE
    /// daemon-owned session, not a fresh process. Used by the
    /// "Continue in Terminal" button.
    public static func continueInTerminal(shortID: String, workspace: URL) throws

    /// Open the user's default terminal with `claude agents` (the
    /// full background-sessions TUI). Used by the "Show all
    /// kickoffs" button.
    public static func showAgentsTUI() throws

    /// Run `claude stop <short-id>` via Process. Daemon
    /// terminates the session. Used by the "Stop session" button.
    public static func stopSession(shortID: String) throws

    /// Spawn a Terminal with `claude --dangerously-skip-permissions`
    /// for the user to accept the one-time disclaimer that
    /// `--bg --permission-mode bypassPermissions` requires. Called
    /// from the consent flow OR auto-triggered if dispatch fails
    /// with the "disclaimer not accepted" error.
    public static func openDisclaimerTerminal() throws

    /// `~/OpenQuackAgent/`. Created on first use with mode 0700.
    public static var defaultWorkspace: URL { get }

    /// `claude` resolvable on PATH plus common manual-install locs.
    public static func isClaudeAvailable() -> Bool

    public enum Error: Swift.Error, Equatable {
        case claudeCLIMissing
        case emptyPrompt
        case invalidPrompt
        case workspaceUnavailable
        case launchFailed
        /// `claude --bg` printed the disclaimer error. Caller should
        /// call `openDisclaimerTerminal()` and stash the transcript.
        case disclaimerNotAccepted
        /// Couldn't parse the short-id banner from stdout.
        case bannerParseFailed(stdout: String)
        case scriptWriteFailed
        case terminalDispatchFailed(exitCode: Int32)
    }
}

/// Reference to a kickoff dispatched via `claude --bg`. The session
/// itself is owned by the claude daemon — OpenQuack only holds
/// metadata + paths for watching its state.
public struct KickoffSession: Sendable, Identifiable, Equatable {
    /// 8-char short ID printed in the dispatch banner; used by
    /// `claude agents`, `claude attach`, `claude stop`, and as
    /// the directory name in `~/.claude/jobs/<short>/`.
    public let shortID: String
    public let workspace: URL
    public let prompt: String
    public let startedAt: Date
    public let displayName: String

    public var id: String { shortID }
    public var stateFileURL: URL {
        let home = FileManager.default.homeDirectoryForCurrentUser
        return home.appendingPathComponent(".claude/jobs/\(shortID)/state.json")
    }
    public var timelineFileURL: URL {
        let home = FileManager.default.homeDirectoryForCurrentUser
        return home.appendingPathComponent(".claude/jobs/\(shortID)/timeline.jsonl")
    }
}

/// One snapshot of a kickoff's state. Read from
/// `~/.claude/jobs/<short>/state.json` by `StateFileWatcher`.
public struct KickoffState: Sendable, Equatable {
    public enum Kind: String, Sendable {
        case working, blocked, done, idle, unknown
    }
    public let kind: Kind
    public let detail: String?  // one-line agent-written summary
    public let output: String?  // final response (when done)
    public let needs: String?   // what agent waits on (when blocked)
}

public extension KeyboardShortcuts.Name {
    static let agentKickoff = Self("openquack.agentKickoff")
}

How dispatch actually happens

let workspace = try ensureWorkspace(at: defaultWorkspace)
let claudeBin = resolveClaudePath()!  // URL

let task = Process()
task.executableURL = claudeBin
task.currentDirectoryURL = workspace
task.arguments = [
    "--bg",
    "--permission-mode", "bypassPermissions",
    "--name", "OpenQuack: \(prompt.prefix(40))",
    prompt,
]

let stdout = Pipe()
let stderr = Pipe()
task.standardOutput = stdout
task.standardError = stderr
task.standardInput = FileHandle.nullDevice  // explicitly no TTY

let start = Date()
try task.run()
task.waitUntilExit()  // --bg returns in seconds after daemon accept

let stdoutData = stdout.fileHandleForReading.readDataToEndOfFile()
let stderrData = stderr.fileHandleForReading.readDataToEndOfFile()
let stdoutStr  = String(data: stdoutData, encoding: .utf8) ?? ""
let stderrStr  = String(data: stderrData, encoding: .utf8) ?? ""

if stderrStr.contains("requires accepting the disclaimer") {
    throw Error.disclaimerNotAccepted
}
if task.terminationStatus != 0 {
    throw Error.launchFailed
}

// Parse: "backgrounded · <short> (idle — send a prompt to start)"
guard let shortID = parseBackgroundedBanner(stdoutStr) else {
    throw Error.bannerParseFailed(stdout: stdoutStr)
}

return KickoffSession(
    shortID: shortID,
    workspace: workspace,
    prompt: prompt,
    startedAt: start,
    displayName: "OpenQuack: \(prompt.prefix(40))"
)

Banner parser:

/// Matches "backgrounded · <8-hex-id> (idle — …)" with tolerance for
/// surrounding whitespace, ANSI escapes, and version-to-version drift
/// in the parenthetical suffix.
static func parseBackgroundedBanner(_ stdout: String) -> String? {
    let pattern = #"backgrounded\s*[·•]\s*([0-9a-f]{6,12})"#
    let cleaned = stripAnsi(stdout)
    guard let match = cleaned.range(of: pattern, options: .regularExpression)
    else { return nil }
    let snippet = String(cleaned[match])
    let idPattern = #"[0-9a-f]{6,12}"#
    guard let idMatch = snippet.range(of: idPattern, options: .regularExpression)
    else { return nil }
    return String(snippet[idMatch])
}

The dispatch process completes quickly because the daemon acknowledges acceptance once the session is spawned; the agent’s actual work happens inside the daemon-owned worker process. Nothing holds the OpenQuack-side spawn open after that.

Why claude --bg rather than claude -p

The v2 design used claude -p (print mode). v3 abandons that because:

This matches the spec’s “centralized session management” requirement (user feedback 2026-05-23): voice-launched sessions visible alongside any other background work in the claude TUI.

Why --permission-mode bypassPermissions

The agent runs unattended — there’s no UI for per-action approvals during the run. Bypass mode is required for actuator tasks (“set a timer”, “move these files”) which need bash. The cost is real and called out in the consent prompt (Privacy contract below).

This mode requires a one-time disclaimer acceptance via claude --dangerously-skip-permissions (the system-level claude opt-in). OpenQuack’s consent flow surfaces this as a guided two-step: first OpenQuack’s own consent modal, then a Terminal popup for the claude-side disclaimer.

Session manager (file-watcher-based)

@MainActor
public final class AgentSessionManager: ObservableObject {
    @Published public private(set) var liveSessions: [String: KickoffSession] = [:]
    @Published public private(set) var liveStates:   [String: KickoffState]   = [:]
    @Published public private(set) var completedResults: [KickoffResult] = []

    private var watchers: [String: StateFileWatcher] = [:]

    public func track(_ session: KickoffSession) {
        liveSessions[session.shortID] = session
        let watcher = StateFileWatcher(url: session.stateFileURL) { [weak self] state in
            Task { @MainActor in self?.handle(state: state, session: session) }
        }
        watchers[session.shortID] = watcher
        watcher.start()
    }

    private func handle(state: KickoffState, session: KickoffSession) {
        liveStates[session.shortID] = state
        let isTerminal = state.kind == .done || state.kind == .idle || state.kind == .blocked
        if isTerminal {
            // Convert to result, archive, notify.
            let result = KickoffResult(from: state, session: session)
            completedResults.append(result)
            cap(&completedResults, at: 20)
            watchers[session.shortID]?.stop()
            watchers[session.shortID] = nil
            liveSessions[session.shortID] = nil
            postNotification(result: result)
        }
    }
}

StateFileWatcher wraps FSEventStream for one state.json path, parses JSON on each write, calls back with the parsed state. Has a small (~100ms) debounce because state writes can come in bursts during fast tool sequences.

Workspace lifecycle + agent environment context

~/OpenQuackAgent/ is created on first use with mode 0700. On every dispatch, OpenQuack (re)writes two files in the workspace:

Why CLAUDE.md rather than --append-system-prompt:

Content focuses on teaching the agent to reach for bash + osascript for macOS-flavored tasks rather than reflexively refusing with “I don’t have access to that”. Recurring user-feedback failure mode: the agent says it has no clock / browser / etc. access, when bash + osascript would have handled it.

MCP server inheritance: the dispatch does NOT pass --strict-mcp-config, so any MCP servers the user has registered via claude mcp add are loaded into the kickoff session by default. (Most v1 users won’t have local MCP servers configured beyond Anthropic-managed ones, but those who have computer-use / browser MCPs get them for free.) A future “Quack capabilities” Settings pane could let users opt-in to specific MCP server bundles per-session.

Tests

How the spawn actually happens (shell-injection safe)

Dispatch writes the shell command into a .command file and hands the file to /usr/bin/open. macOS LaunchServices recognises the .command extension as a terminal-executable script and opens it in the user’s default terminal (Terminal.app on a stock install, respecting an override like iTerm or Warp if the user set one). No AppleEvents are involved, so this path does not trigger the Automation TCC prompt that osascript-based dispatch would — open is a launchd helper, callable by any process.

Sketch:

// All quoting is in Swift; bash receives the prompt as a literal arg.
let command = "cd " + shellQuote(workspace.path) + " && claude " + shellQuote(prompt)
let script = """
#!/bin/bash
set -e
\(command)
"""

let url = NSTemporaryDirectory()
    .appending("openquack-kickoff-\(UUID().uuidString).command")
try script.write(toFile: url, atomically: true, encoding: .utf8)
try FileManager.default.setAttributes([.posixPermissions: 0o700], ofItemAtPath: url)

let task = Process()
task.executableURL = URL(fileURLWithPath: "/usr/bin/open")
task.arguments = [url]
try task.run()
task.waitUntilExit()  // open returns nearly instantly; we just check exit code.
// Best-effort delete the script ~30s later (Terminal reads it at open time;
// the file isn't needed once the spawned shell has it).

Why .command files rather than osascript-driven do script:

Why we still want a visible window (vs. invoking claude headless via Process): the user must see the agent work to follow approval prompts, abort, type follow-ups. Headless dispatch hides the agent.

Tests: unit-test shellQuote, buildShellCommand, buildCommandScript, and writeCommandScript against a corpus of prompts that should not break out of the quoted argument — backticks, double quotes, single quotes, $(rm -rf /), newlines, NUL, Unicode/emoji. The file write is verified to produce an executable file at mode 0700, and the tricky shell command is verified to round-trip through write → read intact.

Workspace lifecycle

~/OpenQuackAgent/ is created on first kickoff with mode 0700. It is not wiped between kickoffs — the user may have files there from a prior session. A small README.md is written on creation explaining:

This directory is OpenQuack’s default workspace for voice-launched agent sessions. Each kickoff opens here. It’s gitignored by default — files you want to keep, move out. Files you don’t, delete. Nothing here is touched by OpenQuack itself.

A future Settings pane (M3) lets users override this. For v1 the path is hardcoded in AgentKickoffService.defaultWorkspace.

Privacy contract (binding)

This is the load-bearing section. OpenQuack’s privacy promise has been “audio and transcript stay on your Mac.” Agent kickoff to Claude Code changes that for kickoff-mode utterances — the transcript is handed to claude, which routes through Anthropic’s API under the user’s existing Claude Code auth. Additionally, kickoff runs the agent with --permission-mode bypassPermissions, meaning the agent executes shell commands, edits, and side effects without per-action prompts.

Constraints derived from docs/VISION.md and AGENTS.md’s hard rules:

  1. Default off. The kickoff hotkey ships unbound. A user opting in sees a one-time consent modal naming both destinations:

    “Agent kickoff sends what you say to Claude Code, which routes through Anthropic’s API under your Claude Code credentials. The agent then runs unattended with permission bypass — it will execute commands, edit files, and take system actions in ~/OpenQuackAgent/ without asking you first. Your normal dictation hotkey is unaffected and continues to paste locally. Continue?” Stored as a UserDefaults flag, revocable by clearing the kickoff hotkey in Settings → Shortcut.

  2. Recording overlay shows a network indicator any time the kickoff hotkey is the one that started recording. Same indicator SPEC-006 reserves for requiresNetwork == true agents.
  3. AGENTS.md hard rule. “Adding a network call in the dictation or agent-dispatch hot path that wasn’t there before” — this spec adds a network hop on the kickoff hot path. Explicit human approval is required before the implementation PR merges. The spec PR (this file) does not add a network call by itself.
  4. Audio is still deleted by AudioRecorder after transcription, identical to dictation. The agent never sees audio, only the transcript.
  5. Workspace isolation. The agent runs with cwd ~/OpenQuackAgent/. It can still touch the wider filesystem (claude itself doesn’t sandbox tool calls), but the default cwd is a contained location away from user repos. The README in that dir tells the user as much. Per-app or per-project workspace override is deferred (M3).
  6. No second-party retention by OpenQuack. We don’t log kickoff prompts or responses to disk beyond the existing HistoryStore (SPEC-014), which the user can disable; the same toggle covers both dictation and kickoff transcripts. The session itself is persisted by Claude Code (so --resume works), not by us.
  7. Notification permission is requested in-context — the first time a kickoff completes — rather than at app launch. Declining keeps kickoff functional; completion surfaces as a menu-bar dot instead.

Settings (deferred; v1 ships minimal)

For v1 the Settings → Shortcut pane gains exactly one new row: a recorder for the kickoff hotkey, plus the one-time consent prompt the first time it’s bound. No agent picker, no workspace picker, no “always allow approvals” toggle. Those are M3 material.

Implementation order (atomic PRs)

# Title SPEC cite Effort Notes
1 docs(SPEC-031): one-shot agent kickoff SPEC-031 XS this spec; merges before any impl
2 feat(agents): AgentKickoffService + Claude Code spawn SPEC-031 S adds OpenQuackKit/Agents/, unit tests on AppleScript escaping; no UI yet
3 feat(hotkey,settings): kickoff hotkey + consent prompt SPEC-031 S wires the new KeyboardShortcuts.Name, Settings row, consent modal, overlay mode chip
4 docs(integrations): one-line note on kickoff in claude-code.md SPEC-031 XS the integration doc gains a “Quick voice-to-action” section pointing at the kickoff hotkey

PR #2 is shippable on its own — it’s a service with tests, no user- visible surface, no network call yet (it would only run when invoked, which the absent hotkey prevents). PR #3 is the one that wires up the hot path; PR #3 is the one that needs Larry’s explicit OK per AGENTS.md’s network hard rule, called out in the PR description and blocked on his comment.

PR #4 lands last, after the build is validated.

Codex (codex) is a follow-up: PR #5 is feat(agents): CodexKickoff + agent picker. Spec-side, it’s already named in non-goals; an addendum to this spec covers the picker UX when we open that PR.

Open questions

Acceptance criteria (M2 ship gate)

A reviewer can validate the feature by running through the following on an M-series Mac with claude installed and authenticated:

References