Status: draft v2 (M2 — the differentiating feature)
Owner: OpenQuackKit/Agents/
Last updated: 2026-04-27 (rewritten for multi-turn sessions; original
v1 modelled one-shot dispatch which lost the multi-turn nature of Claude
Code and similar agents — see “Why the rewrite” below.)
After the user speaks, dispatch the transcript to a configured agent backend (Claude Code, Ollama, MLX-LM, …) which works on the request and streams events back — status, partial output, tool-use, approval prompts, side effects — that OpenQuack surfaces in a persistent conversation panel. Subsequent voice presses go to the same session by default, so multi-turn interaction works naturally.
Voice → action, sustained — not voice → action, fire-and-forget.
The earlier v1 of this spec modelled dispatch as a single call:
agent.dispatch(_ utterance: String) async throws -> AgentResult
That works for “type this” or a simple “run that” command. It does NOT work for the actual high-leverage flows the product is for:
The user explicitly flagged this on 2026-04-27: “for agent dispatch, i think some thoughts can be put into it for a close loop agent experience. otherwise it seems to be only one way, single shot and how do we get back or interactive from the session.”
This v2 design accepts that the unit is a session, not a request.
public protocol Agent: AnyObject {
static var agentName: String { get }
/// Whether dispatch involves any network IO. Surfaced in the overlay.
var requiresNetwork: Bool { get }
/// Begin a session. Long-lived; multiple turns submitted into it.
func startSession(context: AgentContext) async throws -> any AgentSession
}
public protocol AgentSession: AnyObject {
var id: String { get }
var startedAt: Date { get }
var agentName: String { get }
/// Submit a turn. Yields events as the agent works; completes (with
/// either `.done` or a thrown error) when the turn finishes. The session
/// remains open after — call `submit` again for the next turn.
func submit(_ utterance: String) -> AsyncThrowingStream<AgentEvent, Error>
/// Reply to an `.approvalRequest` event from the current or prior turn.
func reply(approval: ApprovalReply, for requestID: String) async throws
/// Cancel the in-flight turn (e.g. user pressed cancel hotkey while
/// agent was working). Session stays open.
func cancelCurrentTurn() async
/// End the session. Releases subprocess, frees resources, makes the
/// session no longer usable.
func close() async
}
public enum AgentEvent: Sendable {
/// Free-form status string (e.g. "Reading main.py", "Searching for usages").
case status(String)
/// A streamed text chunk to append to the agent's response. Multiple
/// `partialText` events per turn are normal.
case partialText(String)
/// The agent invoked a tool. Surfaced for transparency.
case toolUse(name: String, summary: String)
/// The agent wants approval before doing something. The session is
/// PAUSED until `reply(approval:for:)` is called.
case approvalRequest(prompt: String, id: String)
/// The complete final answer for this turn (also reachable by
/// concatenating `partialText` chunks; emitted for convenience).
case finalText(String)
/// The agent did something off-screen (opened a PR, ran a command).
/// Surfaced as a chip in the conversation panel.
case sideEffect(summary: String)
/// Turn complete. The next `submit` starts a new turn in the same session.
case done
}
public enum ApprovalReply: Sendable {
case approve
case deny
case approveAlways // remember for this session
}
public struct AgentContext: Sendable {
public let foregroundApp: String? // best-effort
public let workspaceDirectory: URL? // configured per agent
public let timestamp: Date
}
public actor AgentRouter {
public init(active: AgentKind, context: AgentContext)
public var activeKind: AgentKind { get }
public var currentSession: (any AgentSession)? { get }
/// Submit a turn into the current session, starting one if none exists.
public func submitTurn(_ utterance: String) -> AsyncThrowingStream<AgentEvent, Error>
/// Tear down the current session and start fresh on next submit.
public func endCurrentSession() async
/// Switch agent backend. Ends the current session if any.
public func setActive(_ kind: AgentKind) async
}
public enum AgentKind: String, Sendable, CaseIterable {
case passthrough // no agent — paste at cursor (current dictation behaviour)
case claudeCode // local `claude` CLI subprocess
case ollama // local Ollama HTTP
case mlxLM // in-process via mlx-swift-lm (M3+)
}
PassthroughAgent — default, no networkModels a session as “every utterance becomes a single .finalText event,
followed by .done.” There’s no state across turns. This is OpenQuack-
as-dictation and remains the default until the user explicitly picks an
agent. requiresNetwork == false.
ClaudeCodeAgent — M2 targetclaude (the user’s CLI) as a long-running subprocess in
context.workspaceDirectory. The session owns the process for its
lifetime; close() terminates it cleanly.claude --output-format
stream-json or the equivalent at implementation time) so each event
the agent emits maps onto an AgentEvent..toolUse events; surfaces approval prompts as
.approvalRequest (the session pauses until reply(approval:for:)
unblocks it).requiresNetwork == true. Network indicator is mandatory in the
conversation panel and the overlay.OllamaAgent — M3http://localhost:11434).requiresNetwork == false (loopback is treated as not-network).MLXLMAgent — M3+mlx-swift-lm. Best privacy story for “fully local”.Three new surfaces compose the closed-loop experience. None of these exist yet; each gets its own follow-up spec.
Conversation panel (new, M2) — a separate floating window (NOT the menu-bar popover) that shows the full turn history of the active session. Opens automatically when a non-passthrough agent has been used. Closing the window does not end the session; an explicit “End conversation” button does.
Layout: chat-bubble style. Each turn shows the user’s transcribed utterance, then the agent’s status updates, tool-uses, partial text that aggregates into the final answer, and any side-effect chips. Approval prompts get an inline Approve / Deny / Always button row, keyboard-navigable, and a “yes / no” voice command also resolves them (see voice-approval below).
Compact reply overlay — when the agent is mid-turn and emits a
.approvalRequest event, the floating recording overlay (SPEC-004)
morphs into an “approval pill” with the prompt + buttons. Provides
keyboard- or voice-driven approval without opening the conversation
panel.
Voice approval (M3) — while an approval is pending, pressing the
hotkey and saying “yes” / “no” / “approve” / “deny” / “always” maps
to the corresponding ApprovalReply. Off by default; opt-in via
Settings → Agent. When off, only buttons / keyboard work.
no-session ──hotkey+speak──▶ session A starts ──hotkey+speak──▶ turn 2 in A
──"end conv"───▶ session closed
──switch agent─▶ A closed → B starts
--resume <session_id> so a hot restart re-attaches.passthrough. Fresh installs do not
route transcripts anywhere external.requiresNetwork == true.localhost is NOT treated as network. (Ollama/MLX local servers don’t
trigger the indicator.)AudioRecorder after
transcription, same as dictation mode. The agent never sees raw
audio, only the transcript.claude --output-format stream-json or its successor). The
AgentEvent mapping must be deterministic and testable.--permission-mode mode flag
let us route approvals through OpenQuack’s UI vs. its own CLI prompt?
Settle before the agent ships.PassthroughAgent + AgentRouter + conversation panel skeleton —
ratify the protocol against the simplest agent. The conversation panel
shows just user turns (no agent reply pane) since passthrough has no
meaningful response. (S, separate spec for the panel.)ClaudeCodeAgent MVP — single-turn-at-a-time, no approvals yet,
surface partial text + tool-use. Confirm session reuse works across
hotkey presses. (M)OllamaAgent — once the protocol shape is proven by Claude Code. (S)MLXLMAgent — last; needs the LLM weights story. (M)thinker.py had a one-shot LLM-call pattern; concepts only,
not code.Sources/OpenQuackKit/ will gain Agents/ once this spec ratifies.