openquack

SPEC-008 — In-context transcript rewrite

Status: draft (M3 candidate; benched alongside SPEC-007 in M2.5)
Owner: OpenQuackKit/Polish/ (extends TextPolishEngine from SPEC-007)
Last updated: 2026-04-29

Goal

Augment the LLM polish step with active-app context so the polished transcript reads correctly for the place it is about to be pasted. The same spoken sentence should produce a Slack-shaped DM in Slack, prose in Pages, and code/comment style in VS Code. Domain terms should also resolve correctly given the context: for example, “income tax”, Whisper's mishearing of “in-context”, should be corrected when the foreground app is VS Code.

This is the M3 row from docs/ROADMAP.md:

Active-app context: feed the foreground app + focused input field’s surrounding text into Whisper’s prompt bias and the polish/agent prompt, so domain terms resolve correctly and the agent has the same context the user does.

Why now (benched in M2.5, shipped in M3)

Picking the SPEC-007 default model without testing in-context behaviour risks shipping a model that polishes well in isolation but ignores context when it is supplied. We bench in-context cases now so the model recommendation considers all three dimensions (see SPEC-007 §Quality gates). Implementation lands in M3, once SPEC-007 has shipped its base polish UI.

Non-goals

Public surface

Extend PolishContext from SPEC-007:

public struct PolishContext: Sendable {
    public let language: String?
    public let foregroundApp: AppContext?  // nil if user disabled in Settings
    public let timestamp: Date
}

public struct AppContext: Sendable {
    public let bundleID: String         // e.g. "com.tinyspeck.slackmacgap"
    public let displayName: String      // "Slack"
    public let category: AppCategory    // coarse bucket — see below
}

public enum AppCategory: String, Sendable {
    case chat        // Slack, Discord, iMessage, Teams
    case email       // Mail, Outlook, Spark
    case code        // VS Code, Xcode, JetBrains, Cursor
    case docs        // Pages, Word, Google Docs (Safari/Chrome)
    case terminal    // Terminal, iTerm, Warp
    case browser     // generic
    case other
}

The category is what the prompt actually consumes; the bundle ID is kept for telemetry and future per-app tuning.
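As a rough illustration of how a bundle ID could be reduced to the coarse category the prompt consumes, here is a minimal sketch. The lookup table and the `categorize(bundleID:)` helper are hypothetical and not part of this spec. Only the Slack bundle ID appears above; the other IDs are assumptions that should be verified before use.

```swift
// Copied from the spec above so the sketch compiles on its own.
enum AppCategory: String {
    case chat, email, code, docs, terminal, browser, other
}

// Hypothetical lookup table. Only the Slack ID appears in this spec;
// the remaining bundle IDs are assumptions and need verifying.
let knownCategories: [String: AppCategory] = [
    "com.tinyspeck.slackmacgap": .chat,
    "com.apple.mail": .email,
    "com.microsoft.VSCode": .code,
    "com.apple.dt.Xcode": .code,
    "com.apple.iWork.Pages": .docs,
    "com.apple.Terminal": .terminal,
    "com.apple.Safari": .browser,
]

// Hypothetical helper: any bundle ID missing from the table maps to `.other`.
func categorize(bundleID: String) -> AppCategory {
    return knownCategories[bundleID] ?? .other
}
```

Falling back to `.other` rather than failing keeps polish working for apps the table has never seen. Whether `.other` should carry any nudge at all is left open here.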

Prompt augmentation

Prepend a single context line to the user message:

[Context: writing in {category} ({displayName})]
{raw_transcript}
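The template above could be filled in roughly as follows. This is a sketch under assumptions: the `userMessage(rawTranscript:foregroundApp:)` function is hypothetical, the types are trimmed copies of the ones in Public surface, and sending the raw transcript unchanged when `foregroundApp` is nil is an assumed behaviour that this spec does not state.

```swift
// Trimmed copies of the spec's types so the sketch is self-contained.
enum AppCategory: String {
    case chat, email, code, docs, terminal, browser, other
}

struct AppContext {
    let bundleID: String
    let displayName: String
    let category: AppCategory
}

// Hypothetical helper that builds the user message sent to the polish model.
// Assumption: with no app context (for example, disabled in Settings),
// the raw transcript is sent without a context line.
func userMessage(rawTranscript: String, foregroundApp: AppContext?) -> String {
    guard let app = foregroundApp else { return rawTranscript }
    let contextLine = "[Context: writing in \(app.category.rawValue) (\(app.displayName))]"
    return contextLine + "\n" + rawTranscript
}

let slack = AppContext(bundleID: "com.tinyspeck.slackmacgap",
                       displayName: "Slack",
                       category: .chat)
print(userMessage(rawTranscript: "ok so I'm thinking we drop the model",
                  foregroundApp: slack))
```

Using the category's raw value keeps the context line in step with the `app_context` strings used in the bench cases.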

Per-category nudges baked into the system prompt:

These nudges are appended to the SPEC-007 system prompt; they do not replace it.

Bench (paired references)

In bench/polish_corpus/cases.jsonl, in-context cases use the in_context category and are grouped as N raw transcripts × M contexts: each raw transcript appears once per context, with its own references.

{"id": "ctx_001_chat", "category": "in_context", "language": "en",
 "raw": "ok so I'm thinking we drop the model and pick one of the smaller ones",
 "app_context": "chat",
 "references": ["thinking we drop the current model and pick a smaller one"],
 "must_contain": [], "must_not_contain": []}

{"id": "ctx_001_email", "category": "in_context", "language": "en",
 "raw": "ok so I'm thinking we drop the model and pick one of the smaller ones",
 "app_context": "email",
 "references": ["I'm thinking we should drop the current model and pick a smaller one."],
 "must_contain": [], "must_not_contain": ["ok so"]}

{"id": "ctx_001_code", "category": "in_context", "language": "en",
 "raw": "ok so I'm thinking we drop the model and pick one of the smaller ones",
 "app_context": "code",
 "references": ["// drop the current model and pick a smaller one",
                "Drop the current model and pick a smaller one."],
 "must_contain": [], "must_not_contain": ["ok so"]}

Initial set: 10 raws × 3 contexts = 30 paired cases. The judge prompt receives each case's app_context value and scores whether the output fits that context.
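To make the case format concrete, the sketch below decodes one case and applies the must_contain / must_not_contain lists as a hard check before any judge scoring. The `BenchCase` type and both functions are hypothetical, and the matching rule (case-insensitive substring) is an assumption; this spec does not define how those lists are evaluated. It also assumes each case occupies a single line in the .jsonl file.

```swift
import Foundation

// Hypothetical mirror of one line in cases.jsonl.
struct BenchCase: Decodable {
    let id: String
    let category: String
    let language: String
    let raw: String
    let appContext: String
    let references: [String]
    let mustContain: [String]
    let mustNotContain: [String]
}

// Decode a single JSONL line; snake_case keys map onto camelCase properties.
func decodeCase(_ line: String) throws -> BenchCase {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    return try decoder.decode(BenchCase.self, from: Data(line.utf8))
}

// Assumed rule: case-insensitive substring matching on the polished output.
func passesHardChecks(_ output: String, for benchCase: BenchCase) -> Bool {
    let haystack = output.lowercased()
    let hasAllRequired = benchCase.mustContain.allSatisfy {
        haystack.contains($0.lowercased())
    }
    let hasNoForbidden = benchCase.mustNotContain.allSatisfy {
        !haystack.contains($0.lowercased())
    }
    return hasAllRequired && hasNoForbidden
}
```

A check like this only catches hard failures such as a leftover "ok so" in an email. Whether the tone fits the context is still up to the judge prompt.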

Settings (M3 implementation, not M2.5)

Open questions

References