Real-Time Inference Systems Engineer
PolarGrid
The Role
We are seeking a Real-Time Inference Systems Engineer to push the limits of end-to-end conversational latency.
This is a deeply technical role focused on collapsing voice-to-voice latency across GPU execution, model inference, and real-time audio pipelines. You will turn what is normally a serial, jitter-dominated stack into a fully streaming system that responds at conversational speed.
If you enjoy operating close to the metal and making systems feel instantaneous, this role is for you.
What You Will Work On
- Deep optimization of GPU inference pipelines for real-time workloads
- Streaming transformer inference for low-latency speech-to-text → LLM → text-to-speech (STT → LLM → TTS) systems
- GPU kernel scheduling, execution overlap, and CUDA stream concurrency
- Kernel fusion, quantization, and speculative decoding techniques
- KV-cache management, paging strategies, and memory locality optimization
- Pinned memory, zero-copy transfers, and host/device overlap (see the sketch after this list)
- Real-time audio pipelines, jitter buffer control, and streaming I/O
- Converting serial inference stacks into fully overlapped, streaming systems
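
To make the overlap work concrete, here is a minimal CUDA sketch of the pinned-memory and stream-concurrency pattern referenced in the list above. It is illustrative only, not PolarGrid code; the buffer sizes and the toy `scale` kernel are invented for this example.

```cuda
// Toy example: overlap H2D copies, kernel work, and D2H copies by
// chunking one buffer across two CUDA streams. Pinned (page-locked)
// host memory is required for cudaMemcpyAsync to be truly asynchronous.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *buf, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= k;
}

int main() {
    const int n = 1 << 20, chunks = 4, chunk = n / chunks;
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float)); // pinned host memory
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    // Work within one stream serializes; work across streams can overlap,
    // so the copies for chunk c+1 run while chunk c is still computing.
    for (int c = 0; c < chunks; ++c) {
        cudaStream_t st = s[c % 2];
        size_t off = (size_t)c * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, st);
        scale<<<(chunk + 255) / 256, 256, 0, st>>>(d + off, chunk, 2.0f);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, st);
    }
    cudaDeviceSynchronize();
    printf("h[0] = %.1f\n", h[0]); // expect 2.0

    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

On GPUs with separate copy engines, this pattern hides most transfer time behind compute; generalized across audio frames, tokens, and KV-cache pages, it is the same idea that turns a serial inference stack into an overlapped, streaming one.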
What We Are Looking For
Experience with:
- CUDA, GPU kernels, and performance tuning in production systems
- Low-latency or real-time systems (audio, video, networking, or inference)
- Transformer inference internals and serving optimization
- Streaming systems where milliseconds matter
- Profiling and debugging complex, multi-stage pipelines
Bonus points for experience with:
- STT or TTS systems or voice agents
- Real-time audio or media systems
- Distributed inference or edge compute
- Compiler, runtime, or systems-level optimization
Who You Are
- You think in timelines, not just throughput
- You care deeply about where every millisecond goes
- You enjoy ambiguity and building systems without existing playbooks
- You are comfortable owning hard, open-ended problems end to end
Why Join PolarGrid
- Work on a first-of-its-kind distributed inference platform
- Solve problems that directly shape the future of real-time AI
- Small, elite team with meaningful ownership and autonomy
- Direct influence on product architecture and technical direction
- Competitive compensation and equity