
Venture into tech

Want to get in early on the next big thing or join a tech rising star? Search our curated, vetted list of job opportunities at high-growth Ottawa-led and Ottawa-founded technology companies.

Careers at Invest Ottawa

Real-Time Inference Systems Engineer

PolarGrid

Software Engineering
Ottawa, ON, Canada · Remote
Posted on Feb 19, 2026

The Role

We are seeking a Real-Time Inference Systems Engineer to push the limits of end-to-end conversational latency.

This is a deeply technical role focused on collapsing voice-to-voice latency across GPU execution, model inference, and real-time audio pipelines. You will turn what is normally a serial, jitter-dominated stack into a fully streaming system that responds at conversational speed.

If you enjoy operating close to the metal and making systems feel instantaneous, this role is for you.

What You Will Work On

  • Deep optimization of GPU inference pipelines for real-time workloads
  • Streaming transformer inference for low-latency STT → LLM → TTS systems
  • GPU kernel scheduling, execution overlap, and CUDA stream concurrency
  • Kernel fusion, quantization, and speculative decoding techniques
  • KV-cache management, paging strategies, and memory locality optimization
  • Pinned memory, zero-copy transfers, and host/device overlap
  • Real-time audio pipelines, jitter buffer control, and streaming I/O
  • Converting serial inference stacks into fully overlapped, streaming systems

What We Are Looking For

Experience with:

  • CUDA, GPU kernels, and performance tuning in production systems
  • Low-latency or real-time systems (audio, video, networking, or inference)
  • Transformer inference internals and serving optimization
  • Streaming systems where milliseconds matter
  • Profiling and debugging complex, multi-stage pipelines

Bonus points for experience with:

  • STT or TTS systems or voice agents
  • Real-time audio or media systems
  • Distributed inference or edge compute
  • Compiler, runtime, or systems-level optimization

Who You Are

  • You think in timelines, not just throughput
  • You care deeply about where every millisecond goes
  • You enjoy ambiguity and building systems without existing playbooks
  • You are comfortable owning hard, open-ended problems end to end

Why Join PolarGrid

  • Work on a first-of-its-kind distributed inference platform
  • Solve problems that directly shape the future of real-time AI
  • Small, elite team with meaningful ownership and autonomy
  • Direct influence on product architecture and technical direction
  • Competitive compensation and equity