📋 Main Topics

Introduction & Motivation
- Operational challenges with LLMs
- Differences between traditional ML and LLM pipelines

Model Optimization & Inference Acceleration
- Techniques: Quantization, Speculative Sampling, MoE (quantization sketch after this list)
- Cold start mitigation strategies
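
As a taste of the quantization technique listed above, here is a minimal sketch using PyTorch's post-training dynamic quantization; the two-layer model is an illustrative stand-in, not the lab's actual code.

```python
import torch
import torch.nn as nn

# Toy stand-in for one transformer MLP block (illustrative, not lab code).
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# Post-training dynamic quantization: Linear weights are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.inference_mode():
    out = quantized(torch.randn(1, 512))
print(out.shape)  # torch.Size([1, 512])
```

Dynamic quantization shrinks the memory footprint of the Linear layers to int8 and often speeds up CPU inference, usually at a small accuracy cost.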

Infrastructure, Observability, Continuous Deployment & Automation
- Hardware selection, scaling, and CI/CD with Kubeflow (pipeline sketch after this list)
- Load balancing and redundancy
- Performance monitoring and canary deployments
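
To make the CI/CD-with-Kubeflow bullet concrete, the following is a minimal sketch of a Kubeflow Pipelines (kfp v2) definition; the component names and bodies are hypothetical placeholders for real preprocessing and model-call steps.

```python
from kfp import compiler, dsl

@dsl.component(base_image="python:3.11")
def preprocess(text: str) -> str:
    # Hypothetical step: normalize the prompt before inference.
    return text.strip()

@dsl.component(base_image="python:3.11")
def generate(prompt: str) -> str:
    # Hypothetical step: a real pipeline would call the model server here.
    return f"completion for: {prompt}"

@dsl.pipeline(name="llm-inference-pipeline")
def inference_pipeline(text: str = "Hello, LLM!"):
    cleaned = preprocess(text=text)
    generate(prompt=cleaned.output)

if __name__ == "__main__":
    # Compile to a YAML spec that can be uploaded to a Kubeflow cluster.
    compiler.Compiler().compile(inference_pipeline, "pipeline.yaml")
```

The compiled pipeline.yaml can be uploaded to a Kubeflow cluster and wired into a CI/CD job so that each commit recompiles and redeploys the pipeline.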

🧠 Class Activity - Labs

- Lab 1: Optimized Inference with Quantization, Speculative Sampling, and MoE
- Lab 2: Build an LLM inference pipeline using Kubeflow and serve it with FastAPI (sketch below)
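
As a preview of Lab 2's serving step, here is a minimal FastAPI sketch, assuming a Hugging Face text-generation pipeline as a stand-in model; the route name and request schema are illustrative, not the lab's prescribed API.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Small stand-in model; the lab would load its own checkpoint.
generator = pipeline("text-generation", model="gpt2")

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 50

@app.post("/generate")
def generate(req: Prompt):
    # Run inference and return the generated text.
    out = generator(req.text, max_new_tokens=req.max_new_tokens)
    return {"completion": out[0]["generated_text"]}

# Run locally with: uvicorn main:app --reload
```

Once running, a POST to /generate with a JSON body such as {"text": "Hello"} returns the model's completion.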