openquack

SPEC-031a — Voice reply to live agent sessions

Status: draft
Extends: SPEC-031
Owner: OpenQuackKit/Agents/ + OpenQuackApp/{AgentSessionManager,RecordingOverlay,ResponseWindow}.swift
Last updated: 2026-05-24

Goal

After a kickoff is in flight (or has just blocked asking the user for input), let the user reply by voice to that specific session without typing or opening a terminal. Two trigger surfaces:

  1. Modifier-key reply: pressing the kickoff hotkey while holding ⌥ (Option) targets the most recent live session instead of spawning a new one. The recording overlay shows “↳ replying to” followed by the session’s display name, so the user can confirm the target before speaking.
  2. Notification voice-reply action: the macOS notification that fires on completion (or “needs input”) gains a second action, “Voice reply”, alongside “Open response”. Click → OpenQuack starts recording in reply-to-<short-id> mode → release → transcript appended to that specific session as the next turn.

Voice → action → notification → voice → continued action. The full loop without leaving wherever the user is.

Why now, and why not SPEC-006

SPEC-006 (multi-turn closed-loop sessions) is the long-game design for in-app conversation panels, approval routing, voice-approval, and session reuse from the dictation hotkey. It’s a big surface and remains deferred per the adoption-pivot ROADMAP.

SPEC-031a is much smaller in scope. It composes with SPEC-031’s existing primitives (live sessions on the claude daemon, FSEventStream watcher, response window) and adds only the reply triggers and the reply-injection call described below.

No conversation panel, no in-app approval routing, no session-reuse from the dictation hotkey. Those stay with SPEC-006.

Non-goals (explicit)

UX

Trigger 1: modifier-key reply

                 ┌─ ⌃⇧Space alone → existing SPEC-031 kickoff (new session)
Kickoff hotkey ──┤
                 └─ ⌃⇧Space + ⌥   → SPEC-031a voice reply (most recent live session)

Implementation: the kickoff hotkey handler reads NSEvent.modifierFlags.contains(.option) at handler-call time to decide which path. No second registered shortcut name — the modifier is overlaid on the existing kickoff binding. (KeyboardShortcuts package fires only the bound combo, but inside the handler we can introspect modifiers via NSEvent.modifierFlags synchronously.)

If the user has rebound the kickoff hotkey to something that doesn’t include a free ⌥ slot (e.g. they bound it to ⌥+something), the modifier-overlay scheme falls back to: any kickoff-press while at least one live session exists shows a brief overlay hint “⌥+press to reply instead”; modifier-key reply remains opt-in.
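The branch described above can be kept as a pure function so it is testable without AppKit. This is a minimal sketch, not the shipped handler: `KickoffPath`, `KickoffDecision`, and `decideKickoff` are hypothetical names, `optionHeld` is assumed to be the result of `NSEvent.modifierFlags.contains(.option)` read inside the hotkey handler, and the fallback arm is only one possible reading of the rebound-hotkey rule.

```swift
/// Which path a kickoff-hotkey press takes (hypothetical type).
enum KickoffPath: Equatable {
    case newSession   // existing SPEC-031 kickoff
    case voiceReply   // SPEC-031a reply to a live session
}

/// Result of the decision: the path, plus whether to show the
/// "⌥+press to reply instead" overlay hint (hypothetical type).
struct KickoffDecision: Equatable {
    var path: KickoffPath
    var showReplyHint: Bool
}

/// Pure decision function.
/// - optionHeld: assumed to come from NSEvent.modifierFlags.contains(.option).
/// - bindingUsesOption: true when the user's kickoff binding already includes ⌥.
/// - hasLiveSession: true when at least one live session exists.
func decideKickoff(optionHeld: Bool,
                   bindingUsesOption: Bool,
                   hasLiveSession: Bool) -> KickoffDecision {
    if bindingUsesOption {
        // Assumption: ⌥ is part of the binding, so it cannot select the
        // reply path. Start a new session and hint only if a reply is possible.
        return KickoffDecision(path: .newSession, showReplyHint: hasLiveSession)
    }
    return KickoffDecision(path: optionHeld ? .voiceReply : .newSession,
                           showReplyHint: false)
}
```

Keeping the AppKit read at the call site means the branch logic itself can be covered by plain unit tests.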

Auto-target selection:

  1. If there’s a live session in state blocked → reply to it
  2. Else if there’s at least one live session in state working → reply to the most-recently-started
  3. Else → show a brief overlay error “No live session to reply to” and fall through to spawning a new session (SPEC-031 default behaviour)
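The three selection rules can be sketched as a single function over the live-session list. `LiveSession`, `SessionState`, and `selectReplyTarget` are hypothetical names for illustration. The spec does not say which session wins when several are blocked, so the sketch assumes the most recently started one.

```swift
import Foundation

/// Session states named in this spec (hypothetical Swift mirror).
enum SessionState { case working, blocked, done }

/// Minimal stand-in for a tracked session (hypothetical type).
struct LiveSession {
    let shortID: String
    let state: SessionState
    let startedAt: Date
}

/// Returns the session a modifier-key reply should target, or nil when the
/// caller should show "No live session to reply to" and spawn a new session.
func selectReplyTarget(from sessions: [LiveSession]) -> LiveSession? {
    // Rule 1: a blocked session wins.
    // Tie-break is an assumption: the most recently started blocked session.
    let blocked = sessions.filter { $0.state == .blocked }
    if let target = blocked.max(by: { $0.startedAt < $1.startedAt }) {
        return target
    }
    // Rule 2: otherwise the most recently started working session.
    let working = sessions.filter { $0.state == .working }
    // Rule 3: nil when neither exists.
    return working.max(by: { $0.startedAt < $1.startedAt })
}
```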

The recording overlay shows the target while recording:

[●●● Recording 2.4s · ↳ replying to "OpenQuack: set timer for 10am" (a3c4272d)]

so the user can confirm before speaking. Esc / cancel hotkey aborts the reply without sending.
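For illustration, the overlay text above could be produced by a small formatter. `replyOverlayLabel` is a hypothetical helper; the layout is copied from the example line, and the one-decimal elapsed time is an assumption based on it.

```swift
import Foundation

/// Builds the reply-mode overlay text shown while recording.
/// Layout mirrors the example in this spec; the function name is hypothetical.
func replyOverlayLabel(elapsed: Double, displayName: String, shortID: String) -> String {
    // One decimal place, matching "2.4s" in the example.
    let seconds = String(format: "%.1f", elapsed)
    return "[●●● Recording \(seconds)s · ↳ replying to \"\(displayName)\" (\(shortID))]"
}
```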

Trigger 2: notification voice-reply action

The existing notification (SPEC-031) gains a second action button beside the default “Open response”:

┌──────────────────────────────────────────────────────┐
│ 🦆 OpenQuack                                          │
│ ──────────────────────────────────────────────────── │
│ claude needs input                                    │
│ "I see two main.py files — which one do you want me  │
│  to fix?"                                             │
│ [Open response]  [Voice reply]                        │
└──────────────────────────────────────────────────────┘

Clicking “Voice reply”:

  1. Notification dismisses
  2. OpenQuack focuses (briefly — accessory app, doesn’t steal full focus)
  3. Recording starts in reply-to-<short-id> mode (overlay’s mode chip shows the target session)
  4. User speaks, releases (or cancels)
  5. Transcript injected into the target session

The target session ID is carried in the notification’s userInfo["shortID"], so the reply is routed unambiguously even if multiple sessions have notifications outstanding.

Response window also gets a voice-reply button

When the response window is showing a session that’s still in blocked state, the button row gains “Voice reply” alongside the existing buttons:

[Copy] [Continue in Terminal] [All kickoffs] [Voice reply] [Stop] [Close]

Click → window closes (or stays open with a small recording chip) → OpenQuack starts recording in reply mode → same path as Trigger 2.

What the user sees during reply

The recording overlay’s mode chip changes from:

[claude]              ← new kickoff (existing SPEC-031)

to:

[↳ a3c4272d]          ← voice reply targeting that session

Same colour palette / globe icon (still network-bound), with a small return-arrow glyph signaling “reply” semantics. Hovering or long-pressing the chip surfaces the session’s displayName.

Backend

Public surface

// Sources/OpenQuackKit/Agents/AgentKickoffService.swift

public extension AgentKickoffService {
    /// Append `prompt` as the next turn in an already-running
    /// daemon-managed session. The session continues; its state.json
    /// transitions back to `working` (then to `done` or `blocked` on
    /// completion). The same FSEventStream watcher established at
    /// kickoff time picks up the new transition and fires the next
    /// notification.
    ///
    /// Mechanism: TBD at impl time — verify whether
    /// `claude --resume <id> -p <text>` cleanly appends to a live
    /// bg session or whether we need a daemon-socket call. See open
    /// questions.
    static func voiceReply(prompt: String, to shortID: String) throws
}

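// Cases added to the existing AgentKickoffService.Error enum. Swift
// extensions cannot add cases, so these go in the enum's own declaration.
public enum Error /* nested in AgentKickoffService; existing cases elided */ {
    /// Voice reply targeted a session that's no longer live.
    case noLiveSession(shortID: String)
    /// Reply injection failed (subprocess error / daemon protocol).
    case replyInjectionFailed(underlying: Swift.Error?)
}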

Mode coordination

AppState.RecordingMode gains a third case so the recording pipeline knows where to route the transcript:

public enum RecordingMode: Equatable {
    case dictation
    case agentKickoff
    case agentReply(shortID: String, displayName: String)
}

The post-transcribe dispatch fork (currently a switch over .dictation / .agentKickoff) gains a third arm for .agentReply that calls voiceReply(prompt:to:). Failure handling: stash the transcript on the clipboard with a clear error label (same shape as existing kickoff failures).
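A minimal sketch of the three-arm dispatch, with side effects injected as closures so the routing can be tested in isolation. `DispatchActions` and `dispatch(transcript:mode:actions:)` are hypothetical names, and the error labels are placeholders; only the `RecordingMode` cases come from this spec.

```swift
/// Copy of the spec's RecordingMode, repeated so the sketch is self-contained.
enum RecordingMode: Equatable {
    case dictation
    case agentKickoff
    case agentReply(shortID: String, displayName: String)
}

/// Side effects the dispatcher needs (hypothetical seam for testing).
struct DispatchActions {
    var insertDictation: (String) -> Void
    var kickoff: (String) throws -> Void
    var voiceReply: (_ prompt: String, _ shortID: String) throws -> Void
    var stashOnClipboard: (_ transcript: String, _ errorLabel: String) -> Void
}

/// Routes a finished transcript according to the recording mode.
func dispatch(transcript: String, mode: RecordingMode, actions: DispatchActions) {
    switch mode {
    case .dictation:
        actions.insertDictation(transcript)
    case .agentKickoff:
        do { try actions.kickoff(transcript) }
        catch { actions.stashOnClipboard(transcript, "Kickoff failed") }
    case .agentReply(let shortID, _):
        // New arm: append the transcript to the targeted live session.
        // On failure, keep the user's words by stashing them on the clipboard.
        do { try actions.voiceReply(transcript, shortID) }
        catch { actions.stashOnClipboard(transcript, "Reply failed") }
    }
}
```

In the app, the `voiceReply` closure would presumably wrap `AgentKickoffService.voiceReply(prompt:to:)`.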

Notification action registration

Update AgentSessionManager.registerNotificationCategory() to register a second UNNotificationAction with identifier "openquack.kickoff.voiceReply". The category becomes:

let openAction = UNNotificationAction(
    identifier: "openquack.kickoff.open",
    title: "Open response",
    options: [.foreground]
)
let replyAction = UNNotificationAction(
    identifier: "openquack.kickoff.voiceReply",
    title: "Voice reply",
    options: [.foreground, .authenticationRequired]  // require the device to be unlocked
)
let category = UNNotificationCategory(
    identifier: notificationCategory,
    actions: [openAction, replyAction],
    intentIdentifiers: [],
    options: []
)

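The click handler in AppDelegate’s UNUserNotificationCenterDelegate gains a switch on response.actionIdentifier: "openquack.kickoff.open" keeps the existing open-response behaviour, and "openquack.kickoff.voiceReply" starts recording in reply mode for the session named in userInfo["shortID"].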
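The routing inside that handler might look like the sketch below, written as a pure function over the action identifier and `userInfo` so it can be tested without a live notification. `NotificationRoute` and `route(actionIdentifier:userInfo:)` are hypothetical; the two identifiers and the `shortID` key are the ones used in this spec. How the system default tap and dismiss actions are handled is not modelled here.

```swift
/// What the delegate should do for a notification response (hypothetical type).
enum NotificationRoute: Equatable {
    case openResponse(shortID: String)
    case startVoiceReply(shortID: String)
    case ignore
}

/// Maps a response's actionIdentifier and userInfo to a route.
func route(actionIdentifier: String, userInfo: [AnyHashable: Any]) -> NotificationRoute {
    // Without a session ID there is nothing to target.
    guard let shortID = userInfo["shortID"] as? String else { return .ignore }
    switch actionIdentifier {
    case "openquack.kickoff.voiceReply":
        return .startVoiceReply(shortID: shortID)
    case "openquack.kickoff.open":
        return .openResponse(shortID: shortID)
    default:
        // Assumption: other identifiers (default tap, dismiss) keep whatever
        // handling SPEC-031 already gives them and are out of scope here.
        return .ignore
    }
}
```

The caller would still check that the session is live before starting a recording.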

If the session is no longer live by the time the action fires (user ignored the notification for hours, session ended), the action shows a brief overlay error and dismisses.

Reply-injection mechanism

This is the load-bearing technical question for v1: how do we append a turn to a live daemon-managed session?

Three paths to evaluate at impl time:

Path 1: claude --resume <id> -p <text> (documented).

Path 2: Daemon control socket (undocumented).

Path 3: Write to the session’s JSONL transcript directly

Recommendation (v1): try Path 1 first — empirically test whether claude --resume <id> -p "<reply>" cleanly appends and the bg worker picks it up. If it does, ship. If it conflicts, fall back to Path 2 with explicit version-pinning + a brittleness note. Path 3 is the last resort.
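A sketch of what the Path 1 probe could look like, assuming the `claude --resume <id> -p <text>` form quoted above behaves as hoped. Whether it actually appends cleanly to a live background session is the open question this probe is meant to answer. `replyArguments`, `injectReply`, and `ReplyInjectionFailed` are hypothetical names, and the executable location is supplied by the caller rather than guessed.

```swift
import Foundation

/// Thrown when the subprocess exits non-zero (hypothetical error type).
struct ReplyInjectionFailed: Error {
    let exitStatus: Int32
}

/// Argument vector for Path 1. The prompt is passed as its own argv
/// element, so no shell quoting or escaping is involved.
func replyArguments(shortID: String, prompt: String) -> [String] {
    ["--resume", shortID, "-p", prompt]
}

/// Runs the claude binary at `claudeURL` with the Path 1 arguments and
/// waits for it to exit. Blocking; call off the main thread.
func injectReply(claudeURL: URL, shortID: String, prompt: String) throws {
    let process = Process()
    process.executableURL = claudeURL
    process.arguments = replyArguments(shortID: shortID, prompt: prompt)
    try process.run()
    process.waitUntilExit()
    guard process.terminationStatus == 0 else {
        throw ReplyInjectionFailed(exitStatus: process.terminationStatus)
    }
}
```

Keeping argv assembly separate from process launch lets the assembly be unit-tested without the CLI installed.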

State coordination after reply

After a reply lands:

  1. Session’s state.json transitions: blocked → working (or working → working if reply arrived during work)
  2. OpenQuack’s existing StateFileWatcher for that session picks up the transition
  3. Watcher logic (AgentSessionManager.handle(state:session:)) treats the working re-entry as “session resumed” — clears any cached terminal KickoffResult for that session
  4. When the session next reaches a terminal state (done/blocked), notification fires as usual

No new watcher needed. No new file paths. The reply just unblocks the existing observation loop.
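The steps above can be sketched as a small state tracker. `SessionTracker` is a hypothetical stand-in for the relevant part of `AgentSessionManager.handle(state:session:)`; suppressing a repeated notification for an unchanged terminal state is an assumption, not something this spec states.

```swift
/// Session states named in this spec (hypothetical Swift mirror).
enum SessionState: Equatable { case working, blocked, done }

/// Tracks the last terminal state seen per session and decides when to notify.
struct SessionTracker {
    /// Cached terminal state per shortID, standing in for a cached KickoffResult.
    private(set) var cachedResults: [String: SessionState] = [:]

    /// Feeds one observed state.json transition.
    /// Returns true when a notification should fire.
    mutating func handle(state: SessionState, shortID: String) -> Bool {
        switch state {
        case .working:
            // Re-entering `working` means the session resumed (for example
            // after a voice reply), so drop the stale terminal result.
            cachedResults[shortID] = nil
            return false
        case .done, .blocked:
            // Assumption: notify once per terminal state, not on re-reads.
            if cachedResults[shortID] == state { return false }
            cachedResults[shortID] = state
            return true
        }
    }
}
```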

Failure modes

| Failure | Behaviour |
| --- | --- |
| Modifier-key reply, no live session | Brief overlay error “No live session to reply to”; falls through to new kickoff (per SPEC-031 default) |
| Notification action, session ended between notify + click | Brief overlay error “Session a3c4272d has ended”; transcript stashed on clipboard |
| Reply injection fails (subprocess error) | Clipboard fallback; overlay shows “Reply failed — transcript on clipboard” |
| User Esc / cancels mid-recording | No reply sent; session remains in its previous state |
| Reply transcript empty after trimming | No-op; brief overlay error “Nothing to say” |
| Two replies dispatched in rapid succession | Both queue against the same session; second waits for first to land. Document but don’t prevent. |
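As an example of one of these checks, the empty-transcript case can be handled before any injection is attempted. `ReplyPreflight` and `preflight` are hypothetical names, and sending the trimmed text rather than the raw transcript is an assumption.

```swift
import Foundation

/// Outcome of checking a transcript before it is sent as a reply.
enum ReplyPreflight: Equatable {
    case send(String)    // trimmed text to inject
    case nothingToSay    // caller shows the "Nothing to say" overlay error
}

/// Trims whitespace and newlines and rejects empty replies.
func preflight(_ transcript: String) -> ReplyPreflight {
    let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
    return trimmed.isEmpty ? .nothingToSay : .send(trimmed)
}
```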

No new consent prompt. Voice reply uses the same network hop + permission-mode bypass that the user already consented to at kickoff time. The recording overlay’s mode chip + the notification action title (“Voice reply”, not generic “Reply”) keep the user aware that voice is going to a network agent.

Open questions

Implementation order

| # | Title | SPEC | Effort | Notes |
| --- | --- | --- | --- | --- |
| 1 | docs(SPEC-031a): voice reply to live sessions | SPEC-031a | XS | this spec; merges before any impl |
| 2 | feat(agents): voiceReply + injection probe | SPEC-031a | S | adds voiceReply(prompt:to:); resolves Path 1 vs 2 vs 3 empirically; ships standalone with tests for argv assembly + state-coordination |
| 3 | feat(hotkey,overlay): modifier-key reply trigger + overlay chip | SPEC-031a | S | reads NSEvent.modifierFlags in the kickoff handler; adds .agentReply recording mode; overlay chip changes to “↳ shortID” |
| 4 | feat(notification,response-window): voice-reply action + button | SPEC-031a | S | registers the action, wires the click handler, adds the button to the response window for blocked sessions |

PR #2 lands first (mechanism settled), then #3 + #4 in either order.

Acceptance criteria (M2 ship gate)

Reviewer validates on an M-series Mac with claude ≥ 2.1.143 and notification permission granted:

References