An SRE Guide to Managing Database Connection Pool Starvation
Designed for Site Reliability Engineers (SREs) and Platform Engineers responsible for high-availability, production-grade database infrastructure at fast-scaling SaaS companies. The activities aim to spark real collaboration and high-energy learning.
A 90-minute, hands-on virtual workshop for SREs who have recently experienced incidents involving database bottlenecks. Attendees are technically advanced but may lack specialized knowledge of connection pooling. They often feel pressure during live incidents when databases become unresponsive, which makes it hard to tell a genuine outage from connection exhaustion.
Visual Mystery: Connection Pool Tales
Start with a quick, interactive poll showing anonymized connection pool monitoring graphs from real incidents. Ask participants to guess what happened in each scenario—was it true database downtime, a network hiccup, or pool starvation? No right answers yet—just hypotheses.
Why this works
This sparks curiosity, primes attention, and taps into prior experience without pressure. Visual guessing ignites both recognition and inquiry.
Busting Pool Starvation Myths
Present three common statements about connection pool behavior (e.g., 'Raising the max pool size always helps'). Have the group vote 'True' or 'False', then reveal the surprising realities with supporting data and short anecdotes.
Why this works
Confronts persistent misconceptions head-on, reducing overconfidence and preventing habitual mistakes.
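For the example myth ('Raising the max pool size always helps'), a facilitator could back the reveal with a quick calculation. The sketch below is illustrative only: the function names and every number are hypothetical, not taken from a real incident or from any specific pooling library. It shows two things: the worst-case connection count a fleet can open grows with instance count times pool size, and the connections actually needed follow Little's law (request rate times how long each request holds a connection).

```python
import math

def connections_needed(requests_per_second: float, hold_ms: float) -> int:
    """Little's law: average connections in use = arrival rate * hold time."""
    return math.ceil(requests_per_second * hold_ms / 1000)

def fleet_demand(instances: int, pool_max: int) -> int:
    """Worst-case connections the whole fleet can open against one database."""
    return instances * pool_max

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
DB_MAX_CONNECTIONS = 500
INSTANCES = 40

for pool_max in (10, 20):
    demand = fleet_demand(INSTANCES, pool_max)
    verdict = "fits" if demand <= DB_MAX_CONNECTIONS else "exceeds the database limit"
    print(f"pool_max={pool_max}: fleet can open {demand} connections ({verdict})")

# Same traffic, slower queries: demand rises without any change in request rate.
print(connections_needed(200, 20))   # fast queries held for 20 ms
print(connections_needed(200, 500))  # slow queries held for 500 ms
```

With these assumed numbers, doubling the per-instance pool pushes the fleet past the database's connection limit, while the jump in needed connections comes entirely from longer hold times. That points the discussion toward slow queries or leaked connections rather than the pool ceiling.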
Blameless 'What Would You Try?'
Show a sanitized incident Slack thread where an SRE team troubleshoots a live pool starvation event. Invite each participant to suggest (in chat or sticky notes) the next command or dashboard they'd check—no wrong answers—then discuss as a group.
Why this works
Lowers the stakes for sharing ideas. Encourages broad participation and surfaces diverse strategies, normalizing incomplete knowledge.
Race to the Root Cause
Break into small teams for a 5-minute 'diagnosis dash.' Each group gets a mini-case (e.g., service timeouts, 100% pool usage, app errors). They must write: 1) their diagnosis, 2) top two checks, and 3) first mitigation step. Teams share rapid-fire, with playful leaderboard points for creativity or speed.
Why this works
Injects energy, friendly competition, and team bonding—while reinforcing diagnostic thinking under time pressure.
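To debrief the diagnosis dash, teams could compare their answers against a simple first-pass rule. The sketch below is one possible heuristic, not an established procedure: the `Snapshot` fields and the returned labels are hypothetical names, and real poolers expose these signals under different metrics. It encodes the distinction the workshop is built around: a saturated pool with callers waiting while the database still answers a direct probe suggests starvation rather than an outage.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    pool_in_use: int       # connections currently checked out
    pool_max: int          # configured pool ceiling
    waiting_requests: int  # callers blocked waiting for a connection
    db_probe_ok: bool      # did a direct probe that bypasses the pool succeed?

def first_hypothesis(s: Snapshot) -> str:
    """Return a starting hypothesis; a prompt for further checks, not a verdict."""
    if not s.db_probe_ok:
        return "database or network unreachable"
    if s.pool_in_use >= s.pool_max and s.waiting_requests > 0:
        return "pool starvation"
    return "look elsewhere"

# Hypothetical mini-case: service timeouts with 100% pool usage.
case = Snapshot(pool_in_use=20, pool_max=20, waiting_requests=35, db_probe_ok=True)
print(first_hypothesis(case))
```

The probe is checked first on purpose: if the database itself is unreachable, pool metrics will often look saturated too, so pool usage alone cannot separate the two cases.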
The Pager Duty Dilemma
Share a true or anonymized case where an SRE woke up for a pool starvation alert, only to discover it was a misconfigured client, not a DB issue. Have the group discuss, 'Should you escalate, patch, or silence?' and debate the tradeoffs in small groups.
Why this works
Hooks participants with a relatable stressor, prompting deep thinking about real-world stakes, escalation paths, and the cost of false alarms.
Your Starvation Story Snapshot
Invite each participant to jot (privately or via chat) a one-line story: a moment when they discovered or solved a connection pool issue, or a fear they have about it. Volunteers (or the facilitator) share a few examples. End with a reflection: 'What’s one thing you want to remember or try differently after today?'
Why this works
Facilitates metacognition, anchors learning in personal narrative, and boosts psychological safety by normalizing vulnerability.