BoreNO

Running Chaos Engineering Game Days to Test System Resilience

Designed to spark real collaboration and high-energy learning among senior Site Reliability Engineers (SREs) and DevOps leads who are responsible for system reliability and uptime in organizations adopting microservices architectures.

This is a 90-minute hybrid workshop for SRE and DevOps leads at a fintech company that is rapidly scaling its cloud infrastructure. Participants are technically fluent but have not run structured chaos experiments before. Their main pain points are fear of introducing instability and a lack of psychological safety for surfacing hidden weaknesses. Sessions combine live demos, small-group breakouts, and whole-room debriefs.

Icebreaker
Activity 1

Mystery Outage Story Opener

Open with a dramatic reenactment of a real, high-profile outage (such as Netflix’s Christmas Eve 2012 outage, triggered by an AWS Elastic Load Balancing disruption in the US-East region), but pause at the climax and ask teams to predict the actual root cause and how it was found. Let participants brainstorm wild theories in chat or on sticky notes before you reveal the outcome.

Why this works

Unexpected stories stimulate curiosity and prime participants for learning; committing to a guess before the reveal encourages deeper engagement and better recall.

Icebreaker
Activity 2

Chaos Engineering Mythbusting

Show three common claims about chaos engineering (e.g., 'Chaos engineering is just about breaking things randomly,' 'Only hyperscalers need it,' 'It always creates downtime'). Have participants vote true or false in a poll, then discuss which claims are myths and which contain a kernel of truth, backing each verdict with evidence.

Why this works

Surfacing misconceptions early helps the group build a shared understanding, and the cognitive dissonance of a wrong vote helps correct persistent myths.

Icebreaker
Activity 3

Safe Bet Mini Poll

Pose a low-stakes, relatable prompt: 'If you could safely inject a single failure into your system today, which would you choose?' Offer quick-select options (e.g., dropped database connections, a network latency spike, a failing service dependency). Collect responses via sticky notes or polls, and affirm that all choices are valid.

Why this works

Low-pressure participation builds psychological safety, letting quieter voices surface risk concerns without judgment.

Icebreaker
Activity 4

Resilience Rapid-Fire Teams

Split participants into small teams for a competitive round: who can brainstorm the most realistic failure scenarios for a given microservice in 90 seconds? Use a countdown clock and high-energy music, then have each group call out its wildest ideas and tally the scores for fun.

Why this works

Short, intense bursts of collaboration drive energy, break down barriers, and encourage lateral thinking, which is crucial for uncovering blind spots.

Icebreaker
Activity 5

Stakeholder Dilemma Hotseat

Present a scenario in which two stakeholders, a Product Manager and an SRE Lead, disagree on whether to run a Game Day before a major release. Invite two volunteers to role-play the two sides using prompt cards, then open the floor: 'If you were in the CTO’s seat, what would you decide?'

Why this works

Real-world dilemmas build empathy for competing priorities and highlight the importance of cross-team communication.

Icebreaker
Activity 6

Resilience Wins & Fails Gallery Walk

Prompt everyone to recall a moment when a system they worked on either withstood a surprising failure (win) or crumbled unexpectedly (fail). Collect one- or two-sentence vignettes on sticky notes or an online board. Then run a gallery walk: participants read others’ stories and comment with one takeaway or action they would try as a result.

Why this works

Active reflection cements learning and personalizes risk; sharing stories fosters connection and vulnerability.
