Managing Kubernetes Cluster Upgrades with Zero Application Outages
Designed for Site Reliability Engineers (SREs) and DevOps leads responsible for production Kubernetes clusters at mid-to-large tech organizations, with direct accountability for uptime and SLA compliance. to spark real collaboration and high-energy learning.
A 90-minute hybrid workshop, with half the participants onsite in a high-tech conference room and half remote via Zoom. Audience is experienced with Kubernetes basics but anxious about real-world upgrade incidents: they’ve seen outages during upgrades, felt pressured by SLAs, and wish for hands-on, battle-tested approaches instead of theory. The session is interactive, tool-driven, and focused on immediate practical takeaways.
Upgrade Horror Headlines
Kick off with a rapid-fire display of actual headlines and tweets about infamous Kubernetes upgrade outages (e.g., 'Slack goes dark: Cluster upgrade gone wrong'). Ask the group to guess the root cause behind each headline before revealing it.
Tap to view the full activity.
Why this works
Real-world stories spark curiosity and make abstract risks tangible. Guessing builds engagement without pressure.
Upgrade Myths Busted
Display four common myths about Kubernetes upgrades ('Rolling restarts guarantee zero downtime', 'Minor version jumps are always safe', etc.). Invite a quick poll or hand-raise—myth or fact?—and then reveal the truth with concrete counterexamples.
Tap to view the full activity.
Why this works
Surface and correct misconceptions upfront, so participants don’t build on shaky mental models.
No-Wrong-Answer Upgrade Poll
Pose a simple, anonymous poll: 'What’s your biggest fear during a Kubernetes upgrade?' (Options: Data loss, Downtime, Rollback failure, Other). Display results instantly. Emphasize there’s no wrong answer and briefly validate all responses.
Tap to view the full activity.
Why this works
Makes participation safe and inclusive, surfaces concerns to tailor the session, and builds psychological safety.
Blue-Green Battle Drill
Divide participants into breakout groups (in-room/remote), each given a mock cluster YAML and upgrade scenario. Challenge teams: ‘Plan and pitch your blue-green deployment strategy in 5 minutes.’ Top 2 groups share their plans to the full room with rapid feedback.
Tap to view the full activity.
Why this works
High energy, collaborative, and competitive—drives active learning of upgrade pathways and teamwork.
Production Dilemma: To Roll or Not?
Present a short, detailed narrative: 'You discover API deprecations just before a scheduled upgrade window—do you proceed or delay?' Invite pairs to debate their decision and list their next three steps. Groups then share out.
Tap to view the full activity.
Why this works
Connects theory to urgent real-life decision-making. Forces prioritization and trade-off thinking.
My Outage, My Lesson
Invite each participant to privately recall their worst or closest-call production upgrade incident. Prompt them to jot down (on paper or shared doc): 'What would I do differently next time?' End with optional sharing for those who wish.
Tap to view the full activity.
Why this works
Personal reflection cements takeaways and builds emotional ownership of new practices.
Sign up to unlock 3 more activities
Get the full pack, facilitation flow, and more ready-to-run ideas.