Status: draft
Extends: SPEC-031
Owner: OpenQuackKit/Agents/ + OpenQuackApp/{AgentSessionManager,RecordingOverlay,ResponseWindow}.swift
Last updated: 2026-05-24
After a kickoff is in flight (or has just blocked asking the user for input), let the user reply by voice to that specific session without typing or opening a terminal. Two trigger surfaces:
Voice → action → notification → voice → continued action. The full loop without leaving wherever the user is.
SPEC-006 (multi-turn closed-loop sessions) is the long-game design for in-app conversation panels, approval routing, voice-approval, and session reuse from the dictation hotkey. It’s a big surface and remains deferred per the adoption-pivot ROADMAP.
SPEC-031a is much smaller in scope. It composes with SPEC-031’s existing primitives (live sessions on the claude daemon, FSEventStream watcher, response window) and adds:
No conversation panel, no in-app approval routing, no session-reuse from the dictation hotkey. Those stay with SPEC-006.
- --permission-mode bypassPermissions;
  reply doesn’t change that.
- claude attach <id> in Terminal as today. v3 of SPEC-031
  considered inline text in the response window and deferred it.

                 ┌─ ⌃⇧Space alone → existing SPEC-031 kickoff (new session)
Kickoff hotkey ──┤
                 └─ ⌃⇧Space + ⌥ → SPEC-031a voice reply (most recent live session)
Implementation: the kickoff hotkey handler reads
NSEvent.modifierFlags.contains(.option) at handler-call time to
decide which path. No second registered shortcut name — the modifier
is overlaid on the existing kickoff binding. (KeyboardShortcuts
package fires only the bound combo, but inside the handler we can
introspect modifiers via NSEvent.modifierFlags synchronously.)
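The modifier check is easier to unit-test if the routing decision is separated from the AppKit read. A minimal sketch, assuming a hypothetical `KickoffRoute` type and `routeForKickoffPress` helper (neither name exists in the codebase); in the app, the caller would pass `NSEvent.modifierFlags.contains(.option)` as `optionHeld`:

```swift
/// Hypothetical route type for this sketch; not an existing OpenQuack API.
enum KickoffRoute: Equatable {
    case newKickoff
    case voiceReply
}

/// Decides which path a kickoff-hotkey press takes.
/// - Parameters:
///   - optionHeld: in the app this would be
///     `NSEvent.modifierFlags.contains(.option)`, read synchronously
///     inside the hotkey handler.
///   - hasLiveSession: whether at least one live session exists.
func routeForKickoffPress(optionHeld: Bool, hasLiveSession: Bool) -> KickoffRoute {
    // ⌥ held and something to reply to: voice reply.
    // Anything else falls through to a new kickoff.
    return (optionHeld && hasLiveSession) ? .voiceReply : .newKickoff
}
```

The sketch only covers the routing decision; the "no live session" overlay error would be raised by the caller when `optionHeld` is true and the result is `.newKickoff`.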
If the user has rebound the kickoff hotkey to something that doesn’t include a free ⌥ slot (e.g. they bound it to ⌥+something), the modifier-overlay scheme falls back to: any kickoff-press while at least one live session exists shows a brief overlay hint “⌥+press to reply instead”; modifier-key reply remains opt-in.
Auto-target selection:
- blocked → reply to it
- working → reply to the most-recently-started

The recording overlay shows the target while recording:
[●●● Recording 2.4s · ↳ replying to "OpenQuack: set timer for 10am" (a3c4272d)]
so the user can confirm before speaking. Esc / cancel hotkey aborts the reply without sending.
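The auto-target rule (blocked before working, otherwise nothing) could be expressed as a small pure function. This is a sketch only: `LiveSession`, its fields, and `autoTarget(from:)` are hypothetical names, and breaking ties among several blocked sessions by start time is an assumption the spec does not state.

```swift
import Foundation

/// Hypothetical session snapshot for this sketch; field names are assumptions.
struct LiveSession: Equatable {
    enum State { case blocked, working }
    let shortID: String
    let state: State
    let startedAt: Date
}

/// Picks the reply target: blocked sessions win over working ones,
/// and within a group the most recently started session is chosen.
/// Returns nil when there is no live session at all.
func autoTarget(from sessions: [LiveSession]) -> LiveSession? {
    let newestFirst = sessions.sorted { $0.startedAt > $1.startedAt }
    return newestFirst.first(where: { $0.state == .blocked })
        ?? newestFirst.first(where: { $0.state == .working })
}
```

Keeping the selection free of UI and daemon calls means the "blocked > working > none" ordering can be covered by plain unit tests.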
The existing notification (SPEC-031) gains a second action button beside the default “Open response”:
┌──────────────────────────────────────────────────────┐
│ 🦆 OpenQuack │
│ ──────────────────────────────────────────────────── │
│ claude needs input │
│ "I see two main.py files — which one do you want me │
│ to fix?" │
│ [Open response] [Voice reply] │
└──────────────────────────────────────────────────────┘
Click “Voice reply” →
<short-id> mode (overlay’s mode
chip shows the target session)

The target session ID is carried in the notification’s
userInfo["shortID"], so the reply is routed unambiguously even if
multiple sessions have notifications outstanding.
When the response window is showing a session that’s still in
blocked state, the button row gains “Voice reply” alongside the
existing buttons:
[Copy] [Continue in Terminal] [All kickoffs] [Voice reply] [Stop] [Close]
Click → window closes (or stays open with a small recording chip) → OpenQuack starts recording in reply mode → same path as Trigger 2.
The recording overlay’s mode chip changes from:
[claude] ← new kickoff (existing SPEC-031)
to:
[↳ a3c4272d] ← voice reply targeting that session
Same colour palette / globe icon (still network-bound), with a small return-arrow glyph signaling “reply” semantics. Hovering or long-pressing the chip surfaces the session’s displayName.
// Sources/OpenQuackKit/Agents/AgentKickoffService.swift
public extension AgentKickoffService {
    /// Append `prompt` as the next turn in an already-running
    /// daemon-managed session. The session continues; its state.json
    /// transitions back to `working` (then to `done` or `blocked` on
    /// completion). The same FSEventStream watcher established at
    /// kickoff time picks up the new transition and fires the next
    /// notification.
    ///
    /// Mechanism: TBD at impl time. Verify whether
    /// `claude --resume <id> -p <text>` cleanly appends to a live
    /// bg session or whether we need a daemon-socket call. See open
    /// questions.
    static func voiceReply(prompt: String, to shortID: String) throws
}

public enum AgentKickoffService.Error /* extended */ {
    /// Voice reply targeted a session that's no longer live.
    case noLiveSession(shortID: String)
    /// Reply injection failed (subprocess error / daemon protocol).
    case replyInjectionFailed(underlying: Swift.Error?)
}
AppState.RecordingMode gains a third case so the recording
pipeline knows where to route the transcript:
public enum RecordingMode: Equatable {
    case dictation
    case agentKickoff
    case agentReply(shortID: String, displayName: String)
}
The post-transcribe dispatch fork (currently a switch over
.dictation / .agentKickoff) gains a third arm for .agentReply
that calls voiceReply(prompt:to:). Failure handling: stash the
transcript on the clipboard with a clear error label (same shape as
existing kickoff failures).
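The three-arm fork might look like the following. This is a sketch under assumptions: `TranscriptRoute` and `route(transcript:mode:)` are hypothetical names used to keep the example self-contained, and the real fork would call the dictation, kickoff, and reply services directly rather than return a value.

```swift
/// Mirrors the RecordingMode enum from this spec.
enum RecordingMode: Equatable {
    case dictation
    case agentKickoff
    case agentReply(shortID: String, displayName: String)
}

/// Hypothetical description of where a transcript goes.
enum TranscriptRoute: Equatable {
    case insertText(String)
    case kickoff(prompt: String)
    case reply(prompt: String, shortID: String)
}

/// Maps the recording mode to the destination of the finished transcript.
func route(transcript: String, mode: RecordingMode) -> TranscriptRoute {
    switch mode {
    case .dictation:
        return .insertText(transcript)
    case .agentKickoff:
        return .kickoff(prompt: transcript)
    case .agentReply(let shortID, _):
        // displayName is only needed by the overlay, not by the injection call.
        return .reply(prompt: transcript, shortID: shortID)
    }
}
```

Because the switch is exhaustive, adding a fourth recording mode later would fail to compile until the fork handles it.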
Update AgentSessionManager.registerNotificationCategory() to
register a second UNNotificationAction with identifier
"openquack.kickoff.voiceReply". The category becomes:
import UserNotifications

let openAction = UNNotificationAction(
    identifier: "openquack.kickoff.open",
    title: "Open response",
    options: [.foreground]
)
let replyAction = UNNotificationAction(
    identifier: "openquack.kickoff.voiceReply",
    title: "Voice reply",
    // .authenticationRequired: the device must be unlocked before the action runs
    options: [.foreground, .authenticationRequired]
)
let category = UNNotificationCategory(
    identifier: notificationCategory,
    actions: [openAction, replyAction],
    intentIdentifiers: [],
    options: []
)
The click handler in AppDelegate’s UNUserNotificationCenterDelegate
gains a switch on response.actionIdentifier:
- openquack.kickoff.open (or default) → open response window
  (existing path)
- openquack.kickoff.voiceReply → look up session by shortID, set
  recordingMode = .agentReply(shortID, displayName), start recording

If the session is no longer live by the time the action fires (user ignored the notification for hours, session ended), the action shows a brief overlay error and dismisses.
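The switch on the action identifier can be sketched as a pure function over the identifier string and the notification's `userInfo`. `NotificationRoute` and `routeNotification` are hypothetical names, and returning `.ignore` when `shortID` is missing is an assumption, not behaviour described in this spec.

```swift
/// Hypothetical routing result for this sketch.
enum NotificationRoute: Equatable {
    case openResponse(shortID: String)
    case voiceReply(shortID: String)
    case ignore
}

/// Maps a notification response to a route.
/// `actionIdentifier` and `userInfo` correspond to
/// `response.actionIdentifier` and
/// `response.notification.request.content.userInfo`.
func routeNotification(actionIdentifier: String,
                       userInfo: [AnyHashable: Any]) -> NotificationRoute {
    // Assumption: a notification without a shortID cannot be routed.
    guard let shortID = userInfo["shortID"] as? String else { return .ignore }
    switch actionIdentifier {
    case "openquack.kickoff.voiceReply":
        return .voiceReply(shortID: shortID)
    default:
        // Covers "openquack.kickoff.open" and the system default action
        // (a plain click on the notification).
        return .openResponse(shortID: shortID)
    }
}
```

The liveness check for the target session would happen after routing, so that a stale `voiceReply` can surface the overlay error instead of starting a recording.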
This is the load-bearing technical question for v1: how do we append a turn to a live daemon-managed session?
Three paths to evaluate at impl time:
Path 1: claude --resume <id> -p <text> (documented).
- --bg.
- -p semantics from v2 of SPEC-031.
- -p is “print and exit” semantics: does the bg worker
  pick up the appended turn and continue?

Path 2: Daemon control socket (undocumented).
- /tmp/cc-daemon-501/<id>/control.sock accepts
  ops like dispatch (per subagent 1’s reverse-engineering).
- There likely exists an inject / submit / turn op for the
  agent view’s “user types into a live session” path.

Path 3: Write to the session’s JSONL transcript directly.
- ~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl is the
  source of truth for the session’s conversation history.

Recommendation (v1): try Path 1 first. Empirically test whether
claude --resume <id> -p "<reply>" cleanly appends and the bg worker
picks it up. If it does, ship. If it conflicts, fall back to Path 2
with explicit version-pinning + a brittleness note. Path 3 is the
last resort.
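For the Path 1 probe, the argv assembly is the part that can be tested without a daemon. A minimal sketch, assuming a hypothetical `path1Arguments` helper and that `claude` is resolved via PATH; whether this invocation actually appends to a live bg session is exactly the open question above.

```swift
/// Builds the argument vector for the Path 1 probe:
/// `claude --resume <id> -p <text>`.
/// Assumption: `claude` is found on PATH (e.g. launched through /usr/bin/env).
func path1Arguments(shortID: String, reply: String) -> [String] {
    // Each element becomes its own argv entry, so the reply text is passed
    // verbatim and needs no shell quoting or escaping.
    return ["claude", "--resume", shortID, "-p", reply]
}
```

The resulting array could be handed to Foundation's `Process` (with `executableURL` set to `/usr/bin/env` and `arguments` set to this array), which avoids building a shell command string from a spoken transcript.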
After a reply lands:
- state.json transitions: blocked → working (or
  working → working if reply arrived during work)
- StateFileWatcher for that session picks up
  the transition
- AgentSessionManager.handle(state:session:)) treats
  the working re-entry as “session resumed” and clears any cached
  terminal KickoffResult for that session
- done/blocked),
  notification fires as usual

No new watcher needed. No new file paths. The reply just unblocks the existing observation loop.
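The re-entry handling can be illustrated with a small state tracker. This is a sketch, not the manager's actual code: `SessionTracker` is a hypothetical stand-in, a `String` stands in for `KickoffResult`, and the `notifications` array stands in for posting a user notification.

```swift
/// Session states as named in this spec.
enum SessionState: Equatable { case working, blocked, done }

/// Hypothetical stand-in for the manager's per-session bookkeeping.
struct SessionTracker {
    /// Cached terminal result keyed by short ID.
    private(set) var cachedResults: [String: String] = [:]
    /// Short IDs for which a notification was fired, in order.
    private(set) var notifications: [String] = []

    mutating func handle(state: SessionState, shortID: String, result: String = "") {
        switch state {
        case .working:
            // Re-entering `working` means the session resumed:
            // drop any stale terminal result.
            cachedResults[shortID] = nil
        case .blocked, .done:
            // Session settled again: cache the result and notify as usual.
            cachedResults[shortID] = result
            notifications.append(shortID)
        }
    }
}
```

A blocked → working → done sequence therefore produces two notifications and leaves only the final result cached.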
| Failure | Behaviour |
|---|---|
| Modifier-key reply, no live session | Brief overlay error “No live session to reply to”; falls through to new kickoff (per SPEC-031 default) |
| Notification action, session ended between notify + click | Brief overlay error “Session a3c4272d has ended”; transcript stashed on clipboard |
| Reply injection fails (subprocess error) | Clipboard fallback; overlay shows “Reply failed — transcript on clipboard” |
| User Esc / cancels mid-recording | No reply sent; session remains in its previous state |
| Reply transcript empty after trimming | No-op; brief overlay error “Nothing to say” |
| Two replies dispatched in rapid succession | Both queue against the same session; second waits for first to land. Document but don’t prevent. |
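Two of the rows above (empty transcript, session ended) can be checked before any injection is attempted. A minimal sketch with hypothetical names (`ReplyPreflight`, `preflight`); checking for an empty transcript before checking liveness is an assumption about ordering.

```swift
import Foundation

/// Hypothetical outcome of the checks run before injecting a reply.
enum ReplyPreflight: Equatable {
    case send(prompt: String)
    case nothingToSay
    case sessionEnded(shortID: String)
}

/// Validates a transcript and its target before a reply is dispatched.
func preflight(transcript: String,
               target shortID: String,
               liveShortIDs: Set<String>) -> ReplyPreflight {
    let prompt = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
    // Empty after trimming: no-op with a brief overlay error.
    if prompt.isEmpty { return .nothingToSay }
    // Target no longer live: the caller shows the error and stashes the transcript.
    guard liveShortIDs.contains(shortID) else {
        return .sessionEnded(shortID: shortID)
    }
    return .send(prompt: prompt)
}
```

Only the `.send` case would reach `voiceReply(prompt:to:)`; the other two map directly onto overlay messages.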
No new consent prompt. Voice reply uses the same network hop + permission-mode bypass that the user already consented to at kickoff time. The recording overlay’s mode chip + the notification action title (“Voice reply”, not generic “Reply”) keep the user aware that voice is going to a network agent.
- done. What if the
  user replies to a session that has already settled to done?
  Probably the session takes the reply and transitions back to
  working (claude sessions can always be re-entered). Verify; if
  not, voice-reply against a done session falls back to spawning a
  new session that includes the prior transcript as context (i.e.
  claude --resume <id> style).
- claude agents --json
  list is per-user, so we’re fine.

| # | Title | SPEC | Effort | Notes |
|---|---|---|---|---|
| 1 | docs(SPEC-031a): voice reply to live sessions | SPEC-031a | XS | this spec; merges before any impl |
| 2 | feat(agents): voiceReply + injection probe | SPEC-031a | S | adds voiceReply(prompt:to:); resolves Path 1 vs 2 vs 3 empirically; ships standalone with tests for argv assembly + state-coordination |
| 3 | feat(hotkey,overlay): modifier-key reply trigger + overlay chip | SPEC-031a | S | reads NSEvent.modifierFlags in the kickoff handler; adds .agentReply recording mode; overlay chip changes to “↳ shortID” |
| 4 | feat(notification,response-window): voice-reply action + button | SPEC-031a | S | registers the action, wires the click handler, adds the button to the response window for blocked sessions |

PR #2 lands first (mechanism settled), then #3 + #4 in either order.
Reviewer validates on an M-series Mac with claude ≥ 2.1.143 and
notification permission granted:
- blocked (e.g. “ask me a question”). Without
  releasing the previous notification, press ⌃⇧⌥Space, speak
  the answer, release. Overlay shows *“↳ replying to
- claude stop <id>. Click the still-visible
  “Voice reply” action on the prior notification. Overlay
  shows *“Session
- swift build && swift test green; new test
  cases cover modifier-key dispatch logic, target-session
  auto-selection (blocked > working > none), and notification
  action routing.