openquack

SPEC-002 — Transcription

Status: ratified (M1 complete; iterations expected) Owner: OpenQuackKit/Transcription/ Last updated: 2026-04-26

Goal

Transcribe a discrete audio buffer (file or PCM samples) into text behind a stable abstraction so engines can be swapped, compared, and benchmarked.

Non-goals

Public surface

public protocol TranscriptionEngine: AnyObject {
    static var engineName: String { get }
    static var suggestedModels: [String] { get }
    var modelID: String { get }
    func transcribe(audioFile url: URL, language: String?) async throws -> EngineTranscription
}

public struct EngineTranscription: Sendable {
    var text: String
    var detectedLanguage: String?
    var audioSeconds: TimeInterval
    var wallSeconds: TimeInterval
    var timeToFirstToken: TimeInterval?
}

public enum EngineKind: String, CaseIterable, Sendable { case whisperkit, lightning }

Engines

Engine Status Use case
WhisperKit (argmaxinc/argmax-oss-swift) shipped Primary runtime engine for the app
Lightning (lightning-whisper-mlx via subprocess) shipped, bench-only Comparison baseline; not for app runtime
WhisperCpp future Non-MLX reference for cross-platform later
MLXAudioEngine (mlx-audio-swift) future Voxtral / Qwen3-ASR / Parakeet variants

Quality gates (M2 default model selection)

A model that fails any of these is not the default. BENCHMARKS.md is the source of truth.

Open questions

References