LokalOptima

Neural networks don't need to run on someone else's computer to be useful. LokalOptima builds custom CUDA inference engines: no frameworks, no dependencies, just raw GPU compute.

Projects
Optimized for: RTX 5070 Ti

speech → text

Parakeet

1370x RTFx
measured on a single utterance
  • Faster than any other framework
  • No Python, no PyTorch, no cuDNN
  • V3 multilingual: 25 EU languages
1.2 GB VRAM
FP8 · 604 MB weights
cuda · c++ · cutlass · asr
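
For scale, RTFx is just seconds of audio divided by seconds of wall-clock decode time. A quick sketch of the arithmetic, using the 1370x figure from the card above:

    #include <cstdio>

    // RTFx = audio duration / wall-clock decode time.
    double rtfx(double audio_s, double decode_s) { return audio_s / decode_s; }

    int main() {
        double audio  = 3600.0;          // one hour of speech
        double decode = audio / 1370.0;  // implied decode time at 1370x RTFx
        std::printf("1 h of audio in %.2f s (RTFx = %.0f)\n",
                    decode, rtfx(audio, decode));  // ~2.63 s
    }
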
text → speech

Kokoro

290x RTFx
3x faster than the best public benchmark
  • Custom neural grapheme-to-phoneme
  • Kokoro TTS fused into a single binary
  • Custom teacher for adding new spellings
Web UI included
WAV output
cuda · c++ · tts · g2p
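
Output is plain 16-bit PCM WAV. For reference, a minimal generic RIFF/WAVE writer in the same spirit (a sketch, not Kokoro's code; 24 kHz is assumed as the sample rate):

    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Minimal mono 16-bit PCM RIFF/WAVE writer.
    void write_wav(const char* path, const std::vector<int16_t>& pcm, uint32_t rate) {
        uint32_t data_bytes = (uint32_t)(pcm.size() * 2);
        uint32_t riff_size  = 36 + data_bytes;   // bytes after "RIFF<size>"
        uint32_t fmt_size = 16, byte_rate = rate * 2;
        uint16_t fmt_pcm = 1, channels = 1, block_align = 2, bits = 16;
        FILE* f = std::fopen(path, "wb");
        if (!f) return;
        std::fwrite("RIFF", 1, 4, f); std::fwrite(&riff_size, 4, 1, f);
        std::fwrite("WAVE", 1, 4, f);
        std::fwrite("fmt ", 1, 4, f); std::fwrite(&fmt_size, 4, 1, f);
        std::fwrite(&fmt_pcm, 2, 1, f); std::fwrite(&channels, 2, 1, f);
        std::fwrite(&rate, 4, 1, f);    std::fwrite(&byte_rate, 4, 1, f);
        std::fwrite(&block_align, 2, 1, f); std::fwrite(&bits, 2, 1, f);
        std::fwrite("data", 1, 4, f); std::fwrite(&data_bytes, 4, 1, f);
        std::fwrite(pcm.data(), 2, pcm.size(), f);
        std::fclose(f);
    }

    int main() {
        const double pi = 3.14159265358979;
        std::vector<int16_t> tone(24000);  // 1 s of 440 Hz at 24 kHz
        for (size_t i = 0; i < tone.size(); ++i)
            tone[i] = (int16_t)(8000 * std::sin(2 * pi * 440.0 * i / 24000.0));
        write_wav("tone.wav", tone, 24000);
    }
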
speech → wake

Jarvis

~10ms on CPU
per wake-word check
  • Whisper Tiny feature extractor
  • Silero VAD ported to Whisper
  • CoreML support for Whisper Tiny encoder
  • Subsequence DTW template matching
~150 MB memory
CPU only
c++ · whisper · vad · coreml
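
Subsequence DTW lets a short template match anywhere inside a longer buffer, which is what makes it usable on an open audio stream. A minimal distance-only version over generic feature frames (cosine distance is an assumption here; in Jarvis the frames come from the Whisper Tiny encoder):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    using Frame = std::vector<float>;

    // Cosine distance between two feature frames.
    float dist(const Frame& a, const Frame& b) {
        float dot = 0, na = 0, nb = 0;
        for (size_t i = 0; i < a.size(); ++i) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return 1.0f - dot / (std::sqrt(na * nb) + 1e-9f);
    }

    // Subsequence DTW: the match may start at any stream column for free
    // (row 0 carries no accumulated prefix cost) and end at any column
    // (take the min over the last row). Fire the wake word when the
    // returned score drops below a tuned threshold.
    float subsequence_dtw(const std::vector<Frame>& tmpl,
                          const std::vector<Frame>& stream) {
        std::vector<float> prev(stream.size()), cur(stream.size());
        for (size_t j = 0; j < stream.size(); ++j)
            prev[j] = dist(tmpl[0], stream[j]);            // free start
        for (size_t i = 1; i < tmpl.size(); ++i) {
            cur[0] = prev[0] + dist(tmpl[i], stream[0]);
            for (size_t j = 1; j < stream.size(); ++j)
                cur[j] = std::min({prev[j], prev[j - 1], cur[j - 1]})
                         + dist(tmpl[i], stream[j]);
            std::swap(prev, cur);
        }
        return *std::min_element(prev.begin(), prev.end()); // free end
    }

    int main() {
        std::vector<Frame> tmpl(8, Frame{1, 0});
        std::vector<Frame> stream(40, Frame{0, 1});
        for (int j = 20; j < 28; ++j) stream[j] = {1, 0};   // embedded occurrence
        std::printf("score: %.3f\n", subsequence_dtw(tmpl, stream));  // near 0 = hit
    }
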
home assistant

Orkestrator

100% Local
no cloud, only your tailnet
  • Combines Parakeet, Kokoro, and Jarvis
  • Wake word → transcribe → respond → speak
  • Fully private, runs entirely on-device
orchestration · C++ · CUDA · CoreML
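
The flow is a straight chain of the three engines above. A hypothetical control pass, with stubs so it compiles; none of these names are the real Orkestrator API:

    #include <cstdio>
    #include <string>

    // Illustrative stand-ins for the real engines.
    std::string record_until_silence() { return "raw-audio"; }               // mic + VAD
    bool wake_word_hit(const std::string&) { return true; }                  // Jarvis DTW
    std::string transcribe(const std::string&) { return "what time is it"; } // Parakeet
    std::string respond(const std::string&) { return "It is noon."; }        // local LLM
    void speak(const std::string& s) { std::printf("tts> %s\n", s.c_str()); } // Kokoro

    int main() {
        // Wake word → transcribe → respond → speak, all on-device.
        std::string audio = record_until_silence();
        if (!wake_word_hit(audio)) return 0;   // stay idle until the wake word
        speak(respond(transcribe(audio)));
    }
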
text → text

Qwen 3.5 0.8B

600 tok/s
with temperature sampling
  • Rebuilt on llama.cpp
  • DeltaNet + full attention hybrid
  • MTP speculative decoding
  • GPU-native text corrector
497 MB weights
Q6_K quantization
llama.cpp · c++ · cuda · llm
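
MTP speculative decoding drafts a few tokens cheaply, then verifies them with the full model in one batched pass, accepting the longest agreeing prefix. A greedy-verification sketch with toy stand-in models (with temperature sampling, the equality test becomes a rejection test):

    #include <cstdio>
    #include <vector>

    using Token = int;

    // Toy stand-ins for the draft (MTP) heads and the full model; the real
    // versions are CUDA kernels. The "full model" counts upward and the
    // draft guesses right except every 4th token, forcing some rejections.
    std::vector<Token> draft_k(const std::vector<Token>& ctx, int k) {
        std::vector<Token> d;
        for (int i = 0; i < k; ++i) {
            Token t = (Token)(ctx.size() + d.size());
            d.push_back(t % 4 == 3 ? -1 : t);  // inject a periodic wrong guess
        }
        return d;
    }
    std::vector<Token> verify(const std::vector<Token>& ctx,
                              const std::vector<Token>& draft) {
        std::vector<Token> full;
        for (size_t i = 0; i <= draft.size(); ++i)
            full.push_back((Token)(ctx.size() + i));  // model's own choice
        return full;
    }

    // Accept the longest draft prefix the full model agrees with, then emit
    // the full model's token at the first mismatch: every pass yields at
    // least one verified token, and up to k+1.
    int speculative_step(std::vector<Token>& ctx, int k) {
        std::vector<Token> draft = draft_k(ctx, k);
        std::vector<Token> full  = verify(ctx, draft);
        int accepted = 0;
        while (accepted < k && full[accepted] == draft[accepted]) ++accepted;
        for (int i = 0; i < accepted; ++i) ctx.push_back(draft[i]);
        ctx.push_back(full[accepted]);   // correction / bonus token
        return accepted + 1;
    }

    int main() {
        std::vector<Token> ctx{0, 1, 2};
        for (int step = 0; step < 3; ++step)
            std::printf("step %d emitted %d tokens\n",
                        step, speculative_step(ctx, 4));
    }
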