Going Native: A 67MB Binary, On-Device Models, and Cleaner Transcripts

One promise has never moved for lognote: your meeting audio stays on your Mac. The recording is captured locally, transcribed locally, and written into your own notes. There is no upload and no bot in the call.

What changed recently is underneath that promise. We rebuilt lognote’s engine to run natively on Apple Silicon instead of going through a Python stack. The promise is identical. Everything else got better: the install is dramatically smaller, the work runs on Apple’s own frameworks instead of a pile of third-party libraries, and the transcripts came out cleaner.

One small binary instead of a Python stack

The old engine shipped a Python runtime plus a virtual environment of machine-learning libraries, about 1.2GB on disk before you recorded a single thing. The native engine is a single signed binary: 67MB. (It was 3MB before we linked in the on-device summarizer; even at 67MB, it is roughly one-eighteenth the size of what it replaced.)

The size is the least of it. There’s no Python to install, no external dependency stack to drift or break, and a much shorter path to shipping lognote as one standalone app you drag into Applications. Fewer moving parts means fewer things that can go wrong between you and a transcript.

The models run on-device

Both halves of the pipeline now run locally through Apple’s frameworks.

Transcription uses WhisperKit — Whisper (large-v3 turbo) running on Apple’s own ML stack instead of the old Python path. Summaries run on a local LLM that auto-scales with the length of the transcript: a short call uses Llama-3.2-3B, a longer one steps up to Llama-3.1-8B, and a very long one to Qwen2.5-14B. By default that all happens on your machine. You can still point summaries at a cloud provider if you want a bigger model — and if you do, only the text of the transcript is sent, never the audio.
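To make the auto-scaling concrete, here is a minimal sketch of picking a summary model from transcript length. It is written in Python only for brevity and is not lognote's actual code. The three model names come from this post; the function name and the word-count cutoffs are hypothetical placeholders, since the real thresholds are not stated here.

```python
# Model names are from the post. The cutoffs below are made-up
# placeholders: the real thresholds are not published.
SHORT_CALL_MAX_WORDS = 3_000
LONG_CALL_MAX_WORDS = 12_000


def pick_summary_model(transcript: str) -> str:
    """Return the local model tier for a transcript, based on word count."""
    words = len(transcript.split())
    if words <= SHORT_CALL_MAX_WORDS:
        return "Llama-3.2-3B"   # short call
    if words <= LONG_CALL_MAX_WORDS:
        return "Llama-3.1-8B"   # longer call
    return "Qwen2.5-14B"        # very long call
```

Word count is just one plausible proxy for length; a token count or the recording duration would work the same way.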

The transcripts got cleaner

A smaller, simpler engine would be a hollow win if the output got worse. It got better.

Here is a real before-and-after from a test recording. To compare the two engines fairly without using anyone’s actual meeting, the input was a YouTube golf clip playing on the speakers while I talked over it. The exact same recording went through both the old engine and the new one. Only the transcription engine differs; everything after it is the same.

The old engine produced this:

[others]  They just step up there. You can see they're not thinking about too much stuff
[others]  And kind of like unbelievably effortlessly and really relaxed. They rip it dow
[me]      And when?
[me]      Oh, the northern lights are happening right now?
[me]      No, I think it's like on its way.
[others]  That was a special impression inrimrimrimrimrimrimrimrim

The native engine produced this:

[others]  You stand on the first tee and you stood watching your playing partner casually
[others]  They just step up there.
[others]  You can see they're not thinking about too much stuff and then they do this.
[me]      Wow.
[others]  And kind of like unbelievably effortlessly and really relaxed.
[others]  They rip it down the middle almost 300 yards.
[me]      Where'd you see that? And when?

Three things are worth pointing out.

Speaker labels work. Audio from the speakers is tagged [others], audio from the microphone is tagged [me], and the two tracks are interleaved in the order they were spoken. That is lognote’s two-track approach to telling who said what, and the native output gets it right.
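The two-track idea can be sketched in a few lines. This is an illustration in Python, not lognote's implementation: the Segment type, the function names, and the timestamps in the usage note are all assumptions made for the example. It merges two tracks that are each already sorted by start time and prints them in the same column layout as the transcripts above.

```python
import heapq
from typing import NamedTuple


class Segment(NamedTuple):
    start: float   # seconds from the start of the recording
    speaker: str   # "me" (microphone) or "others" (system audio)
    text: str


def interleave(mic, system):
    """Merge two time-sorted tracks into one list in spoken order."""
    return list(heapq.merge(mic, system, key=lambda s: s.start))


def render(segments):
    """Format segments as '[speaker]  text' with the label padded to 10 columns."""
    return "\n".join(f"[{s.speaker}]".ljust(10) + s.text for s in segments)
```

For example, a microphone segment starting at 4.0 seconds lands between system-audio segments starting at 1.0 and 6.0 seconds, so the rendered output reads [others], [me], [others].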

The cleanup actually fires. When you talk over a video, your microphone also picks up the video coming out of your speakers, so the same words can land on both tracks. The native engine recognizes that overlap and drops the duplicates, and it drops the stretches where the microphone wasn’t really hearing you speak. The result is a transcript with one clean copy of each thing that was said, not two noisy ones.
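One way to picture the duplicate removal: treat a microphone segment as speaker bleed when it overlaps a system-audio segment in time and says nearly the same words. The sketch below is an assumption about how such a filter could work, not a description of the native engine. The Utterance type, the function names, and the 0.8 similarity threshold are hypothetical, and it does not cover the separate step of dropping stretches where you were not speaking.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Utterance(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def _normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace."""
    kept = "".join(c.lower() for c in text if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def is_bleed(mic, system, min_similarity=0.8):
    """True if a mic utterance overlaps a system one in time and text."""
    overlaps = mic.start < system.end and system.start < mic.end
    if not overlaps:
        return False
    ratio = SequenceMatcher(None, _normalize(mic.text), _normalize(system.text)).ratio()
    return ratio >= min_similarity


def drop_bleed(mic_track, system_track):
    """Keep only mic utterances that do not duplicate system audio."""
    return [m for m in mic_track if not any(is_bleed(m, s) for s in system_track)]
```

Requiring both a time overlap and a text match keeps the filter from deleting something you said that merely resembles a line from the video a minute earlier.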

The garbage is gone. Look at the last line of the old output: inrimrimrimrimrim. That is the kind of degenerate loop a transcription model can fall into, and the old engine left it in the transcript. The native engine does not produce it. The note you get is one you can actually read, capitalized and punctuated, broken into sensible lines.
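A loop like that is easy to flag mechanically, which is handy in a regression test over transcripts. The check below is a generic heuristic and an assumption on my part; this post does not claim the native engine runs any such filter, only that it does not produce the loop. The function name and the default of four repeats are hypothetical.

```python
import re


def has_repetition_loop(text, min_repeats=4):
    """True if a 2-6 character chunk repeats back to back min_repeats times.

    Only catches character-level loops inside a single word, such as
    'rimrimrimrim'; repeated whole phrases would need a word-level check.
    """
    pattern = re.compile(r"(\w{2,6}?)\1{%d,}" % (min_repeats - 1))
    return pattern.search(text) is not None
```

Run against the two transcripts above, it flags the last line of the old output and none of the lines in the native output.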

That last point matters more than it looks. A meeting note is only useful if you trust it enough to skim it later instead of replaying the recording. Cleaner output is not a vanity metric. It is the difference between a note you read and a note you ignore.

And it runs on a lot more Macs

One more win came along for free. The old engine leaned on libraries that needed a very recent version of macOS: macOS 26. Apple’s frameworks support far older releases, so the native engine runs on macOS 14.4 and up. The version number drops by twelve, but because Apple jumped straight from macOS 15 to macOS 26, that is two major releases back rather than twelve. lognote needs Apple Silicon either way (M1 or later), but if you’ve got one, it likely runs on the Mac you already have instead of asking you to upgrade for it.
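If you want to check a machine against those requirements, the comparison is a tuple test on the version number plus an architecture check. This is a small sketch, not an official lognote tool; the function names are hypothetical, and the only facts taken from this post are the macOS 14.4 minimum and the Apple Silicon requirement.

```python
import platform

MIN_MACOS = (14, 4)  # minimum stated in the post


def parse_version(version):
    """Turn '14.4.1' into (14, 4, 1); an empty string becomes ()."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def is_supported(macos_version, machine):
    """True for Apple Silicon (arm64) on macOS 14.4 or newer."""
    return machine == "arm64" and parse_version(macos_version) >= MIN_MACOS


# On a Mac, the live values come from the standard library:
#   is_supported(platform.mac_ver()[0], platform.machine())
```

Comparing tuples rather than strings matters here: as text, "26.0" and "14.4" compare character by character, while (26, 0) and (14, 4) compare as numbers.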

None of this changed the part that matters most. The audio is still captured on your Mac, still transcribed on your Mac, and still written as plain Markdown into your own vault. Going native shrank the install, moved the models on-device, cleaned up the output — and, as a bonus, made that same local pipeline run on a lot more machines. The promise held still while the engine underneath it got better.