This session is intended for:
AI Infrastructure / DevOps Managers who want to:
- Automatically scale Pods and Nodes up and down on AWS (see the autoscaling sketch below)
- Run multiple Inference Workloads on a single GPU (see the GPU Fractioning sketch below)
- Monitor Latency, Throughput, and Compute Utilization in one Dashboard (see the latency measurement sketch below)
MLOps Managers who want to:
- Streamline Model Deployment
- Meet SLA and uptime targets for Model Serving
Run:ai features you will see:
- GPU Fractioning
- Compute Utilization Monitoring
- Using NVIDIA Triton with Run:ai
- Native AWS Integration
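
To give a flavor of Pod autoscaling ahead of the session: on AWS (typically EKS), a standard HorizontalPodAutoscaler scales inference replicas against a utilization target, while Node scaling is usually handled by Cluster Autoscaler or Karpenter. The sketch below is a minimal example using the Kubernetes Python client; the `triton-inference` Deployment name and the 70% CPU target are hypothetical placeholders, not values from the session.

```python
# Minimal sketch: create a HorizontalPodAutoscaler (autoscaling/v2) for an
# inference Deployment. Deployment name and targets are illustrative only.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="triton-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="triton-inference"
        ),
        min_replicas=1,
        max_replicas=8,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=70
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```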
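
For GPU Fractioning, the idea is that a pod requests a fraction of a GPU rather than a whole device, so several inference workloads can share one card. The sketch below creates such a pod with the Kubernetes Python client; the `gpu-fraction` annotation key, the `runai-scheduler` scheduler name, and the pod and image names are assumptions for illustration, so check the Run:ai documentation for the exact keys your cluster version uses.

```python
# Minimal sketch: request a fraction of a GPU for an inference pod.
# The annotation key and scheduler name are assumed, not confirmed here.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="triton-inference",                # hypothetical pod name
        annotations={"gpu-fraction": "0.5"},    # assumed Run:ai fraction annotation
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",       # assumed Run:ai scheduler name
        containers=[
            client.V1Container(
                name="triton",
                image="nvcr.io/nvidia/tritonserver:23.10-py3",  # example tag
                args=["tritonserver", "--model-repository=/models"],
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```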
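
And for the Latency and Throughput side of the dashboard, a simple client-side measurement against a Triton HTTP endpoint can look like the sketch below. The model name, tensor names, and shapes are placeholders for whatever model sits in your repository; this is a rough measurement loop, not the monitoring stack shown in the session.

```python
# Minimal sketch: measure per-request latency and rough throughput against a
# Triton server over HTTP (requires `pip install tritonclient[http]`).
import time
import numpy as np
import tritonclient.http as httpclient

triton = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input matching a hypothetical "resnet50" model in the repository.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)

latencies = []
for _ in range(100):
    start = time.perf_counter()
    triton.infer(model_name="resnet50", inputs=[inp])
    latencies.append(time.perf_counter() - start)

print(f"p50 latency: {np.percentile(latencies, 50) * 1000:.1f} ms")
print(f"throughput:  {len(latencies) / sum(latencies):.1f} req/s")
```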