Backend Engineer
PolarGrid
Role Overview
We're seeking a Backend Engineer to build and scale our edge inference infrastructure. You'll architect distributed compute systems that run GPU-accelerated AI workloads across edge nodes while meeting sub-10ms latency requirements.
Core Responsibilities
Infrastructure Engineering
- Design and implement Kubernetes-native distributed compute platforms
- Build GPU resource management and allocation systems (see the sketch after this list)
- Develop edge deployment pipelines with automated testing
- Create high-performance inference serving infrastructure
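
To make the GPU allocation work concrete: it typically builds on Kubernetes device plugins that expose GPUs as schedulable resources. The snippet below is a rough sketch, not a prescribed design, using the official Kubernetes Python client to request one NVIDIA GPU for an inference pod; the container name, image, and resource values are placeholders, and the `nvidia.com/gpu` key assumes the standard NVIDIA device plugin is installed.

```python
from kubernetes import client, config

def launch_gpu_inference_pod(namespace: str = "inference") -> None:
    """Schedule a single-GPU inference pod (illustrative values throughout)."""
    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster

    container = client.V1Container(
        name="inference-worker",                         # hypothetical container name
        image="registry.example.com/inference:latest",   # placeholder image
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1", "memory": "16Gi"},  # GPU exposed by the NVIDIA device plugin
            requests={"cpu": "4", "memory": "16Gi"},
        ),
    )
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="inference-worker", labels={"app": "edge-inference"}),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )
    client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=pod)

if __name__ == "__main__":
    launch_gpu_inference_pod()
```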
Backend Systems
- Architect microservices for distributed model serving
- Implement API gateways exposing OpenAI- and Hugging Face-compatible endpoints (see the sketch after this list)
- Build dynamic resource allocation and load balancing
- Design multi-backend serving systems that enforce mutually exclusive runtime selection (e.g., Python vs. TensorRT-LLM)
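
As a loose illustration of the gateway responsibilities above, the sketch below shows a minimal FastAPI route with a per-client token-bucket rate limiter. The route path follows the public OpenAI `/v1/chat/completions` convention; the limiter values, in-memory bucket store, and the backend call are assumptions for illustration (a production gateway would share limiter state across replicas, e.g., in Redis).

```python
import time
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

# Tiny in-memory token bucket per client IP (illustrative only).
RATE = 10    # tokens refilled per second (assumed limit)
BURST = 20   # bucket capacity
_buckets: dict[str, tuple[float, float]] = {}  # ip -> (tokens, last_refill_timestamp)

def allow(ip: str) -> bool:
    tokens, last = _buckets.get(ip, (BURST, time.monotonic()))
    now = time.monotonic()
    tokens = min(BURST, tokens + (now - last) * RATE)  # refill proportionally to elapsed time
    if tokens < 1:
        _buckets[ip] = (tokens, now)
        return False
    _buckets[ip] = (tokens - 1, now)
    return True

@app.post("/v1/chat/completions")  # OpenAI-style route
async def chat_completions(request: Request):
    if not allow(request.client.host):
        raise HTTPException(status_code=429, detail="rate limit exceeded")
    payload = await request.json()
    # Placeholder: forward `payload` to the selected inference backend here.
    return {"model": payload.get("model"), "choices": [], "usage": {}}
```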
Performance & Optimization
- Optimize GPU memory utilization and inference latency
- Implement streaming inference with TensorRT acceleration
- Build comprehensive monitoring and observability systems
- Design automatic scaling based on workload patterns
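
Workload-driven scaling is commonly modeled on the Kubernetes HPA rule, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). Below is a minimal sketch of that rule with GPU utilization as an assumed scaling metric; the target and bounds are illustrative defaults, not fixed policy.

```python
import math

def desired_replicas(current_replicas: int,
                     current_gpu_util: float,
                     target_gpu_util: float = 0.7,
                     min_replicas: int = 1,
                     max_replicas: int = 32) -> int:
    """HPA-style rule: scale proportionally to observed vs. target utilization,
    clamped to configured bounds. Thresholds here are assumptions."""
    if current_gpu_util <= 0:
        return min_replicas
    desired = math.ceil(current_replicas * current_gpu_util / target_gpu_util)
    return max(min_replicas, min(max_replicas, desired))

# Example: 4 replicas at 90% GPU utilization against a 70% target -> 6 replicas.
assert desired_replicas(4, 0.9) == 6
```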
Required Technical Skills
Core Infrastructure
Kubernetes: Production experience with cluster management, resource allocation, networking
Containerization: Docker, container security, multi-stage builds, optimization
Distributed Systems: Service mesh, load balancing, distributed consensus, fault tolerance
Cloud: AWS, infrastructure as code (e.g., AWS CDK), GitOps workflows
Backend Development
Languages: TypeScript, Go, Python, or Rust
APIs: RESTful services, gRPC, WebSocket streaming, rate limiting
Databases: Distributed databases, caching systems, data consistency
Message Queues: Kafka, Redis, SQS, distributed event systems
AI Inference Infrastructure
GPU Computing: NVIDIA CUDA, TensorRT, GPU memory management
AI/ML Serving: Triton Inference Server, model optimization, batch processing
Performance: Latency optimization, throughput tuning, resource profiling
Preferred Experience
Infrastructure Platforms
- Edge computing deployments
- Multi-region distributed systems
- Hardware acceleration (GPUs)
- Container isolation and security (Kata Containers, gVisor)
Monitoring & Operations
- Prometheus, Grafana, distributed tracing
- SRE practices, incident response
- Capacity planning, cost optimization
- Automated testing and deployment
What You'll Build
Edge Inference Platform
- Multi-tenant GPU inference clusters serving 10,000+ concurrent requests
- Sub-10ms latency targets met through geographically distributed edge nodes
- Automatic model loading and resource optimization
- Comprehensive health monitoring and alerting
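
For the health monitoring and alerting piece, inference services are commonly instrumented with Prometheus metrics. The sketch below records per-request latency and counts with prometheus_client; the metric names, histogram buckets, and the simulated inference call are illustrative assumptions rather than a fixed scheme.

```python
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; real dashboards and alerts would standardize these.
REQUESTS = Counter("inference_requests_total", "Inference requests", ["model", "status"])
LATENCY = Histogram(
    "inference_latency_seconds", "End-to-end inference latency", ["model"],
    buckets=(0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1),  # keeps the sub-10ms budget visible
)

def handle_request(model: str) -> None:
    start = time.perf_counter()
    try:
        time.sleep(random.uniform(0.001, 0.008))  # placeholder for real inference work
        REQUESTS.labels(model=model, status="ok").inc()
    except Exception:
        REQUESTS.labels(model=model, status="error").inc()
        raise
    finally:
        LATENCY.labels(model=model).observe(time.perf_counter() - start)

if __name__ == "__main__":
    start_http_server(9090)  # exposes /metrics for Prometheus scraping
    while True:
        handle_request("resnet50")
```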
Backend Architecture
- Microservices handling model lifecycle management
- API gateway with authentication and rate limiting
- Dynamic backend switching (Python/TensorRT-LLM)
- Streaming inference with WebSocket support
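
Streaming inference over WebSockets generally means pushing tokens to the client as the backend produces them. The minimal FastAPI sketch below assumes a hypothetical `/v1/stream` route, with the token generator standing in for the actual backend (Python or TensorRT-LLM).

```python
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def generate_tokens(prompt: str):
    """Placeholder token stream; a real service would call the active backend here."""
    for token in f"echo: {prompt}".split():
        await asyncio.sleep(0.005)  # simulated per-token latency
        yield token

@app.websocket("/v1/stream")  # hypothetical route
async def stream_inference(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            prompt = await ws.receive_text()
            async for token in generate_tokens(prompt):
                await ws.send_json({"token": token})
            await ws.send_json({"done": True})
    except WebSocketDisconnect:
        pass  # client closed the stream
```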
DevOps Infrastructure
- Kubernetes operators for inference workload management (a watch/reconcile sketch follows this list)
- Automated testing covering performance and reliability
- GitOps deployment with rollback capabilities
- Cloud and edge resource monitoring and cost optimization
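
An operator for inference workloads typically reduces to a watch-and-reconcile loop over a custom resource. The sketch below watches a hypothetical InferenceService custom resource with the Kubernetes Python client; the group/version/plural names describe an assumed CRD, not an existing one, and the reconcile body is a placeholder.

```python
from kubernetes import client, config, watch

# Hypothetical CRD coordinates for illustration only.
GROUP, VERSION, PLURAL = "inference.example.com", "v1alpha1", "inferenceservices"

def reconcile(spec: dict) -> None:
    """Placeholder: create/update Deployments, Services, and GPU quotas to match the spec."""
    print(f"reconciling model={spec.get('model')} replicas={spec.get('replicas', 1)}")

def run_operator(namespace: str = "inference") -> None:
    config.load_kube_config()  # load_incluster_config() when deployed as a controller
    api = client.CustomObjectsApi()
    for event in watch.Watch().stream(
        api.list_namespaced_custom_object, GROUP, VERSION, namespace, PLURAL
    ):
        obj = event["object"]
        if event["type"] in ("ADDED", "MODIFIED"):
            reconcile(obj.get("spec", {}))

if __name__ == "__main__":
    run_operator()
```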