Status: draft (M2.5 — companion to SPEC-007 polish engine)
Owner: OpenQuackApp/SettingsView.swift + OpenQuackApp/RecordingOverlay.swift
Last updated: 2026-05-03
Define the user-facing experience for the optional LLM polish step (SPEC-007 + SPEC-007a + SPEC-016). The pipeline now works at acceptable quality on a stock 4.6B Gemma 4 model with a tight formatting prompt; the remaining work is making it discoverable, controllable, and trust- worthy from the user’s perspective.
The current llmPolishEnabled boolean is too coarse. Replace with a
4-level picker:
| Setting | What runs | Latency add | Best for |
|---|---|---|---|
| Off | Regex TextPolisher only (current SPEC-007 baseline) |
0 ms | Default; users who want exact Whisper output |
| Light | Regex polish + LLM only on inputs that match a specific pattern (long, multi-sentence, list-shaped) | 0–700 ms (most cases skip LLM entirely) | The recommended default once Light mode is shipped |
| Standard | Regex polish + LLM on every dictation | 500–900 ms | Users who want consistent formatting on every dictation |
| Preview | Same as Standard, but show raw vs polished side-by-side; user picks which to paste | 500–900 ms + user decision | High-stakes work; also the data-capture mode (SPEC-007c future) |
Default: Off at v0.2.0 ship; recommend Light once the gating logic lands. Preview becomes opt-in for users who want the safety net.
┌── Polish ─────────────────────────────────────────────────┐
│ │
│ Mode: ○ Off ◉ Light (recommended) ○ Standard ○ Preview │
│ │
│ Light mode applies LLM polish only when the input would │
│ benefit (long text, lists, multi-sentence). Short inputs │
│ paste immediately without going through the LLM. │
│ │
│ Model: [gemma4-textonly:Q4_K_M ▾] │
│ • ~3 GB resident, ~0.7 s polish on M-series │
│ • Required: Ollama running with this model │
│ • Status: ✓ available │
│ │
│ ✓ Show "polish applied" indicator in overlay │
│ │
│ When polish fails: │
│ ◉ Paste raw text silently (recommended) │
│ ○ Show error notification │
│ ○ Cancel paste, copy raw to clipboard │
│ │
│ ───────────────────────────────────────────────────── │
│ Advanced │
│ │
│ Custom system prompt: [Edit...] │
│ Trigger threshold (Light mode): [Auto ▾] │
│ │
└────────────────────────────────────────────────────────────┘
Today the overlay disappears at “Pasted.” The polish step (700ms or so) is invisible. Add a brief “Polishing…” intermediate state:
[●●● Recording 3.4s]
↓ (user releases hotkey)
[Transcribing 0.8s] ← already present
↓
[Polishing 0.7s] ← NEW — shows when LLM polish runs
↓
[Pasted ✓] ← already present
Sub-1-second states should still flash visibly so the user understands what’s happening. The “Polishing” state never appears in Off mode and appears only when the LLM is actually invoked (in Light mode).
In Preview mode, after Whisper + polish complete, instead of pasting, show a small comparison overlay:
┌── Polish preview ──────────────────────────────────────────┐
│ │
│ Raw: │
│ the build is failing on CI again because of the test │
│ that flakes │
│ │
│ Polished: │
│ The build is failing on CI again because of the test │
│ that flakes. │
│ │
│ [⌘1 Use polished] [⌘2 Use raw] [⌘3 Edit & paste] │
│ │
└────────────────────────────────────────────────────────────┘
Whichever the user picks gets logged locally as a (raw, chosen, polished)
triple in ~/Library/Application Support/OpenQuack/polish-pairs.jsonl.
Stays on the user’s machine — never leaves. After ~100-200 real triples
accumulate, that becomes the v4 training data: real preferences > our
synthetic guesses.
(This is SPEC-007c material — separate spec when we’re ready to ship.)
Per SPEC-007a’s tier matrix, the default mode should depend on RAM:
| Detected RAM | Default mode | Rationale |
|---|---|---|
| 8 GB | Off (toggle to Light if user opts in) | 3 GB Gemma 4 + 1.5 GB Whisper + macOS overhead = pressure-warning; users should opt in only after seeing the resident impact |
| 16 GB | Light | Comfortable headroom; “polish only when useful” matches expected behavior |
| 24 GB+ | Standard | Memory headroom is generous; users can always step down |
The Settings → Polish pane should display the detected tier and explain the recommendation. Override is always available.
The polish step has three failure modes:
Current behavior: silent fall-through to regex-only paste (good — paste must never block). Improvement: log to system log + accumulate counts for the Settings pane so users can see if polish is failing often:
Polish status: ✓ available (last failure: never in last 100 calls)
⚠ frequent failures (38 of last 100 calls failed) [Diagnose…]
✗ unavailable (Ollama not running) [Open Ollama docs]
The “Diagnose…” button surfaces:
curl localhost:11434/api/tags)The LLM polish runs over loopback HTTP to a local Ollama daemon. The existing privacy contract (nothing leaves the device) holds, but be explicit about it in the Settings pane:
Polish runs entirely on your Mac via a local Ollama daemon at http://localhost:11434. The model never sends your transcript over the network. You can verify with Little Snitch or by reading
Sources/OpenQuackKit/Polish/OllamaPolishEngine.swift.
llmPolishMode UserDefault replaces llmPolishEnabled booleanbench/distill_corpus/runtime_cases.jsonl
pass with the shipped default model + prompt