LokalOptima

Neural networks don't need to run on someone else's computer to be useful. LokalOptima builds custom CUDA inference engines: no frameworks, no dependencies, just raw GPU compute.

Projects
Optimized for: RTX 5070 Ti

speech → text

Parakeet

1370x RTFx
measured on a single utterance
  • Faster than any other framework
  • No Python, no PyTorch, no cuDNN
  • V3 multilingual: 25 EU languages
1.2 GB VRAM
FP8 · 604 MB weights
cuda · c++ · cutlass · asr
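
For scale, RTFx is just seconds of audio divided by seconds of wall-clock decode time. A quick sketch of the arithmetic, using the 1370x figure from the card above:

    #include <cstdio>

    // RTFx = audio duration / wall-clock decode time.
    double rtfx(double audio_s, double decode_s) { return audio_s / decode_s; }

    int main() {
        double audio  = 3600.0;          // one hour of speech
        double decode = audio / 1370.0;  // implied decode time at 1370x RTFx
        std::printf("1 h of audio in %.2f s (RTFx = %.0f)\n",
                    decode, rtfx(audio, decode));  // ~2.63 s
    }
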
text → speech

Kokoro

290x RTFx
3x faster than the best public benchmark
  • Custom neural grapheme-to-phoneme
  • Kokoro TTS fused into a single binary
  • Custom teacher for adding new spellings
Web UI included
WAV output
cuda · c++ · tts · g2p
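
Output is plain 16-bit PCM WAV. For reference, a minimal generic RIFF/WAVE writer in the same spirit (a sketch, not Kokoro's code; 24 kHz is assumed as the sample rate):

    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Minimal mono 16-bit PCM RIFF/WAVE writer.
    void write_wav(const char* path, const std::vector<int16_t>& pcm, uint32_t rate) {
        uint32_t data_bytes = (uint32_t)(pcm.size() * 2);
        uint32_t riff_size  = 36 + data_bytes;   // bytes after "RIFF<size>"
        uint32_t fmt_size = 16, byte_rate = rate * 2;
        uint16_t fmt_pcm = 1, channels = 1, block_align = 2, bits = 16;
        FILE* f = std::fopen(path, "wb");
        if (!f) return;
        std::fwrite("RIFF", 1, 4, f); std::fwrite(&riff_size, 4, 1, f);
        std::fwrite("WAVE", 1, 4, f);
        std::fwrite("fmt ", 1, 4, f); std::fwrite(&fmt_size, 4, 1, f);
        std::fwrite(&fmt_pcm, 2, 1, f); std::fwrite(&channels, 2, 1, f);
        std::fwrite(&rate, 4, 1, f);    std::fwrite(&byte_rate, 4, 1, f);
        std::fwrite(&block_align, 2, 1, f); std::fwrite(&bits, 2, 1, f);
        std::fwrite("data", 1, 4, f); std::fwrite(&data_bytes, 4, 1, f);
        std::fwrite(pcm.data(), 2, pcm.size(), f);
        std::fclose(f);
    }

    int main() {
        const double pi = 3.14159265358979;
        std::vector<int16_t> tone(24000);  // 1 s of 440 Hz at 24 kHz
        for (size_t i = 0; i < tone.size(); ++i)
            tone[i] = (int16_t)(8000 * std::sin(2 * pi * 440.0 * i / 24000.0));
        write_wav("tone.wav", tone, 24000);
    }
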
speech → wake

Jarvis

~10ms on CPU
per wake-word check
  • Whisper Tiny feature extractor
  • Silero VAD ported to Whisper
  • CoreML support for Whisper Tiny encoder
  • Subsequence DTW template matching
~150 MB memory
CPU only
c++ · whisper · vad · coreml
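
Subsequence DTW lets a short template match anywhere inside a longer buffer, which is what makes it usable on an open audio stream. A minimal distance-only version over generic feature frames (cosine distance is an assumption here; in Jarvis the frames come from the Whisper Tiny encoder):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    using Frame = std::vector<float>;

    // Cosine distance between two feature frames.
    float dist(const Frame& a, const Frame& b) {
        float dot = 0, na = 0, nb = 0;
        for (size_t i = 0; i < a.size(); ++i) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return 1.0f - dot / (std::sqrt(na * nb) + 1e-9f);
    }

    // Subsequence DTW: the match may start at any stream column for free
    // (row 0 carries no accumulated prefix cost) and end at any column
    // (take the min over the last row). Fire the wake word when the
    // returned score drops below a tuned threshold.
    float subsequence_dtw(const std::vector<Frame>& tmpl,
                          const std::vector<Frame>& stream) {
        std::vector<float> prev(stream.size()), cur(stream.size());
        for (size_t j = 0; j < stream.size(); ++j)
            prev[j] = dist(tmpl[0], stream[j]);            // free start
        for (size_t i = 1; i < tmpl.size(); ++i) {
            cur[0] = prev[0] + dist(tmpl[i], stream[0]);
            for (size_t j = 1; j < stream.size(); ++j)
                cur[j] = std::min({prev[j], prev[j - 1], cur[j - 1]})
                         + dist(tmpl[i], stream[j]);
            std::swap(prev, cur);
        }
        return *std::min_element(prev.begin(), prev.end()); // free end
    }

    int main() {
        std::vector<Frame> tmpl(8, Frame{1, 0});
        std::vector<Frame> stream(40, Frame{0, 1});
        for (int j = 20; j < 28; ++j) stream[j] = {1, 0};   // embedded occurrence
        std::printf("score: %.3f\n", subsequence_dtw(tmpl, stream));  // near 0 = hit
    }
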
home assistant

Orkestrator

100% Local
no cloud, only your tailnet
  • Combines Parakeet, Kokoro, and Jarvis
  • Wake word → transcribe → respond → speak
  • Fully private, runs entirely on-device
orchestration · C++ · CUDA · CoreML
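
The flow is a straight chain of the three engines above. A hypothetical control pass, with stubs so it compiles; none of these names are the real Orkestrator API:

    #include <cstdio>
    #include <string>

    // Illustrative stand-ins for the real engines.
    std::string record_until_silence() { return "raw-audio"; }               // mic + VAD
    bool wake_word_hit(const std::string&) { return true; }                  // Jarvis DTW
    std::string transcribe(const std::string&) { return "what time is it"; } // Parakeet
    std::string respond(const std::string&) { return "It is noon."; }        // local LLM
    void speak(const std::string& s) { std::printf("tts> %s\n", s.c_str()); } // Kokoro

    int main() {
        // Wake word → transcribe → respond → speak, all on-device.
        std::string audio = record_until_silence();
        if (!wake_word_hit(audio)) return 0;   // stay idle until the wake word
        speak(respond(transcribe(audio)));
    }
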
text → text

Qwen 3.5 0.8B

600 tok/s
with temperature sampling
  • Rebuilt on llama.cpp
  • DeltaNet + full attention hybrid
  • MTP speculative decoding
  • GPU-native text corrector
497 MB weights
Q6_K quantization
llama.cpp · c++ · cuda · llm
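
MTP speculative decoding drafts a few tokens cheaply, then verifies them with the full model in one batched pass, accepting the longest agreeing prefix. A greedy-verification sketch with toy stand-in models (with temperature sampling, the equality test becomes a rejection test):

    #include <cstdio>
    #include <vector>

    using Token = int;

    // Toy stand-ins for the draft (MTP) heads and the full model; the real
    // versions are CUDA kernels. The "full model" counts upward and the
    // draft guesses right except every 4th token, forcing some rejections.
    std::vector<Token> draft_k(const std::vector<Token>& ctx, int k) {
        std::vector<Token> d;
        for (int i = 0; i < k; ++i) {
            Token t = (Token)(ctx.size() + d.size());
            d.push_back(t % 4 == 3 ? -1 : t);  // inject a periodic wrong guess
        }
        return d;
    }
    std::vector<Token> verify(const std::vector<Token>& ctx,
                              const std::vector<Token>& draft) {
        std::vector<Token> full;
        for (size_t i = 0; i <= draft.size(); ++i)
            full.push_back((Token)(ctx.size() + i));  // model's own choice
        return full;
    }

    // Accept the longest draft prefix the full model agrees with, then emit
    // the full model's token at the first mismatch: every pass yields at
    // least one verified token, and up to k+1.
    int speculative_step(std::vector<Token>& ctx, int k) {
        std::vector<Token> draft = draft_k(ctx, k);
        std::vector<Token> full  = verify(ctx, draft);
        int accepted = 0;
        while (accepted < k && full[accepted] == draft[accepted]) ++accepted;
        for (int i = 0; i < accepted; ++i) ctx.push_back(draft[i]);
        ctx.push_back(full[accepted]);   // correction / bonus token
        return accepted + 1;
    }

    int main() {
        std::vector<Token> ctx{0, 1, 2};
        for (int step = 0; step < 3; ++step)
            std::printf("step %d emitted %d tokens\n",
                        step, speculative_step(ctx, 4));
    }
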