📋 Main Topics
Introduction & Motivation
- Operational challenges with LLMs
- Differences between traditional ML and LLM pipelines
Model Optimization & Inference Acceleration
- Techniques: quantization, speculative sampling, and Mixture of Experts (MoE)
- Cold start mitigation strategies
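The quantization technique listed above can be illustrated with a minimal symmetric per-tensor int8 scheme: store weights as 8-bit integers plus one float scale, and dequantize on the fly. This is a sketch of the idea only, not any specific library's API:

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # toy weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())  # bounded by 0.5 * scale
```

The memory saving is 4x over float32; production schemes typically add per-channel scales and calibration, which are omitted here.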
Infrastructure, Observability, Continuous Deployment & Automation
- Hardware selection, scaling, and CI/CD with Kubeflow
- Load balancing and redundancy
- Performance monitoring and canary deployments
🧠 Class Activity - Labs
- Lab 1: Optimized inference with quantization, speculative sampling, and MoE
- Lab 2: Build an LLM inference pipeline with Kubeflow and serve it with FastAPI
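Lab 1's speculative sampling rests on a simple accept/reject rule: a cheap draft model proposes several tokens, and the target model accepts each with probability min(1, p_target/q_draft). A minimal sketch of that acceptance loop (function name and toy probabilities are illustrative; a full implementation would also resample a replacement token from the adjusted target distribution on rejection, which is omitted here):

```python
import random

def speculative_accept(p_target, q_draft, tokens, rng):
    """Accept each drafted token with prob min(1, p/q);
    stop at the first rejection. p_target[i] and q_draft[i]
    are the two models' probabilities for tokens[i]."""
    accepted = []
    for t, p, q in zip(tokens, p_target, q_draft):
        if rng.random() < min(1.0, p / q):
            accepted.append(t)
        else:
            break
    return accepted

# When target and draft agree, every token is accepted:
all_kept = speculative_accept([0.5, 0.5, 0.5], [0.5, 0.5, 0.5],
                              [1, 2, 3], random.Random(0))
# When the target assigns zero probability, the token is rejected:
none_kept = speculative_accept([0.0, 0.5], [0.5, 0.5],
                               [1, 2], random.Random(0))
```

The speedup comes from verifying several drafted tokens in one target-model forward pass instead of generating them one at a time.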
📚 Recommended Readings
- Understanding LLMOps: Large Language Model Operations, by Weights & Biases
- Speculative Decoding for 2x Faster Whisper Inference
- FastAPI Tutorial