Neural networks don't need to run on someone else's computer to be useful. These are custom CUDA inference engines: no frameworks, no dependencies, just raw GPU compute.
Projects
Optimized for: RTX 5070 Ti
speech → text
1370 RTFx
real-time on a single utterance
- Faster than any framework we've benchmarked
- No Python, no PyTorch, no cuDNN
- V3 multilingual: 25 EU languages
1.2 GB VRAM
604 MB FP8 weights
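RTFx here is the throughput form of the real-time factor: seconds of audio processed per second of wall-clock compute. A minimal sketch of what the figure means in practice (illustrative only; the 1370 figure above is the measured value, not derived here):

```python
def rtfx(audio_seconds: float, wall_seconds: float) -> float:
    """Real-time factor (throughput form): audio duration / compute time."""
    return audio_seconds / wall_seconds

# At 1370 RTFx, one hour of audio transcribes in roughly 2.6 seconds:
print(round(3600 / 1370, 1))  # -> 2.6
```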
text → speech
290 RTFx
3x faster than the best public benchmark
- Custom neural grapheme-to-phoneme
- Kokoro TTS fused into a single binary
- Custom teacher for adding new spellings
Web UI included
WAV output
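The WAV output above is just raw PCM samples behind a RIFF header. A minimal sketch of wrapping a model's float samples in a WAV container, assuming 16-bit mono at 24 kHz (Kokoro's output rate); the function name is a hypothetical stand-in, not the engine's API:

```python
import struct
import wave

def write_wav(path: str, samples: list[float], sample_rate: int = 24000) -> None:
    """Wrap raw float samples in [-1.0, 1.0] as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 16-bit PCM
        f.setframerate(sample_rate)
        # Clamp each sample, then scale to signed 16-bit little-endian.
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(pcm)
```

The real engine writes this header from C++/CUDA host code, but the container layout is identical.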
home assistant
Orkestrator
100% Local
no cloud, only your tailnet
- Combines Parakeet, Kokoro, and Jarvis
- Wake word → transcribe → respond → speak
- Fully private, runs entirely on-device
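The wake → transcribe → respond → speak loop above can be sketched as follows. All component names are hypothetical stand-ins wired as callbacks; in the real system they would be backed by Parakeet, Jarvis, and Kokoro respectively:

```python
from typing import Callable

def assistant_loop(wake: Callable[[], bool],
                   transcribe: Callable[[], str],    # Parakeet's role
                   respond: Callable[[str], str],    # Jarvis's role
                   speak: Callable[[str], None],     # Kokoro's role
                   turns: int = 1) -> None:
    """One wake-word-gated turn per iteration; everything stays on-device."""
    for _ in range(turns):
        if not wake():
            continue                 # no wake word: keep listening
        text = transcribe()          # speech -> text
        reply = respond(text)        # text -> text
        speak(reply)                 # text -> speech

# Stub usage: wires the stages together without any real models.
spoken: list[str] = []
assistant_loop(
    wake=lambda: True,
    transcribe=lambda: "what time is it",
    respond=lambda q: f"you asked: {q}",
    speak=spoken.append,
)
print(spoken)  # -> ['you asked: what time is it']
```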