Open Demo · 30 min

Autoscaling Inference on AWS with Run:ai

EMEA: March 8th, 1:00 PM CET
Americas: March 9th, 2:00 PM EST
On Demand: Register to get the recording

Reserve Your Spot

This session is intended for:

AI Infrastructure / DevOps Managers

  • Automatically scale pods and nodes up and down on AWS (see the sketch after this list)
  • Run multiple inference workloads on a single GPU
  • Monitor latency, throughput, and compute utilization in one dashboard
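For a rough point of reference, pod autoscaling on a Kubernetes cluster such as Amazon EKS can be expressed with a HorizontalPodAutoscaler. The minimal sketch below uses the official kubernetes Python client; the deployment name, replica bounds, and CPU target are hypothetical placeholders, and this is plain Kubernetes HPA rather than the Run:ai mechanism demonstrated in the session.

    from kubernetes import client, config

    # Load credentials from the local kubeconfig (e.g., for an EKS cluster).
    config.load_kube_config()

    autoscaling = client.AutoscalingV1Api()

    # Scale a hypothetical "triton-inference" Deployment between 1 and 8
    # replicas, targeting 70% average CPU utilization across pods.
    hpa = client.V1HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name="triton-inference-hpa"),
        spec=client.V1HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V1CrossVersionObjectReference(
                api_version="apps/v1",
                kind="Deployment",
                name="triton-inference",
            ),
            min_replicas=1,
            max_replicas=8,
            target_cpu_utilization_percentage=70,
        ),
    )

    autoscaling.create_namespaced_horizontal_pod_autoscaler(
        namespace="default", body=hpa
    )

Node-level scaling on AWS is a separate concern, typically handled by a cluster autoscaler that adds or removes EC2 instances as pod demand changes.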

MLOps Managers

  • Streamline model deployment
  • Meet SLA and uptime targets for model serving

Run:ai features you will see:

  • GPU Fractioning
  • Compute Utilization Monitoring
  • Using NVIDIA Triton with Run:ai (see the sketch after this list)
  • Native AWS Integration
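For context on the Triton integration, the minimal sketch below shows a client querying a Triton inference endpoint with NVIDIA's tritonclient package. The endpoint address, model name, and tensor names are hypothetical placeholders; the session demonstrates Triton running under Run:ai, while this sketch only illustrates the client side of a request.

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to the Triton HTTP endpoint (placeholder address).
    client = httpclient.InferenceServerClient(url="triton.example.com:8000")

    # Build a request for a hypothetical image-classification model.
    infer_input = httpclient.InferInput("input__0", [1, 3, 224, 224], "FP32")
    infer_input.set_data_from_numpy(
        np.random.rand(1, 3, 224, 224).astype(np.float32)
    )

    # Send the inference request and read back the output tensor.
    response = client.infer(model_name="resnet50", inputs=[infer_input])
    scores = response.as_numpy("output__0")
    print(scores.shape)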

Presented By:

Guy Salton

Director of Solution Engineering, Run:ai

Robert Magno

Solution Engineer, Run:ai