Managing Kubernetes Cluster Upgrades with Zero Application Outages

Designed for Site Reliability Engineers (SREs) and DevOps leads responsible for production Kubernetes clusters at mid-to-large tech organizations, with direct accountability for uptime and SLA compliance. to spark real collaboration and high-energy learning.

A 90-minute hybrid workshop, with half the participants onsite in a high-tech conference room and half remote via Zoom. Audience is experienced with Kubernetes basics but anxious about real-world upgrade incidents: they’ve seen outages during upgrades, felt pressured by SLAs, and wish for hands-on, battle-tested approaches instead of theory. The session is interactive, tool-driven, and focused on immediate practical takeaways.

Icebreaker

Activity 1

Upgrade Horror Headlines

Kick off with a rapid-fire display of actual headlines and tweets about infamous Kubernetes upgrade outages (e.g., 'Slack goes dark: Cluster upgrade gone wrong'). Ask the group to guess the root cause behind each headline before revealing it.

Tap to view the full activity.

Why this works

Real-world stories spark curiosity and make abstract risks tangible. Guessing builds engagement without pressure.

Icebreaker

Activity 2

Upgrade Myths Busted

Display four common myths about Kubernetes upgrades ('Rolling restarts guarantee zero downtime', 'Minor version jumps are always safe', etc.). Invite a quick poll or hand-raise—myth or fact?—and then reveal the truth with concrete counterexamples.

Tap to view the full activity.

Why this works

Surface and correct misconceptions upfront, so participants don’t build on shaky mental models.

Icebreaker

Activity 3

No-Wrong-Answer Upgrade Poll

Pose a simple, anonymous poll: 'What’s your biggest fear during a Kubernetes upgrade?' (Options: Data loss, Downtime, Rollback failure, Other). Display results instantly. Emphasize there’s no wrong answer and briefly validate all responses.

Tap to view the full activity.

Why this works

Makes participation safe and inclusive, surfaces concerns to tailor the session, and builds psychological safety.

Icebreaker

Activity 4

Blue-Green Battle Drill

Divide participants into breakout groups (in-room/remote), each given a mock cluster YAML and upgrade scenario. Challenge teams: ‘Plan and pitch your blue-green deployment strategy in 5 minutes.’ Top 2 groups share their plans to the full room with rapid feedback.

Tap to view the full activity.

Why this works

High energy, collaborative, and competitive—drives active learning of upgrade pathways and teamwork.

Icebreaker

Activity 5

Production Dilemma: To Roll or Not?

Present a short, detailed narrative: 'You discover API deprecations just before a scheduled upgrade window—do you proceed or delay?' Invite pairs to debate their decision and list their next three steps. Groups then share out.

Tap to view the full activity.

Why this works

Connects theory to urgent real-life decision-making. Forces prioritization and trade-off thinking.

Icebreaker

Activity 6

My Outage, My Lesson

Invite each participant to privately recall their worst or closest-call production upgrade incident. Prompt them to jot down (on paper or shared doc): 'What would I do differently next time?' End with optional sharing for those who wish.

Tap to view the full activity.

Why this works

Personal reflection cements takeaways and builds emotional ownership of new practices.

Sign up to unlock 3 more activities

Get the full pack, facilitation flow, and more ready-to-run ideas.