AWS Monthly (May '25): The Death of the War Room
May brought CloudWatch Investigations, which have fundamentally changed someone's on-call rotation. Instead of manual log correlation, ...
May brought CloudWatch Investigations, which have fundamentally changed someone's on-call rotation. Instead of manual log correlation, this service uses AI to perform Automated Root Cause Analysis (RCA).
When an alarm triggers, Investigations automatically traces the error. It correlates metric spikes with concurrent events, like a specific Git commit, a Terraform apply, or an RDS parameter change. Instead of a dashboard showing "500 Errors," you get a report saying: "The latency spike in Service A was caused by a configuration change in Service B that triggered a connection leak in RDS."
Combined with CloudWatch RUM Session Replay, you can now visually reproduce the user's journey leading up to a crash. It’s effectively a "Time Machine" for your infrastructure. If you're still doing manual log-diving during incidents, May gave you back your weekends.
More from Ercan
Two more sites, same author, different ground.
AI, LLMs, agents, applied ML.
Field notes on AI workloads. Bedrock cost analysis, agent patterns, vector storage trade-offs, production failure modes.
Visit ercan.ai →The hub. About, consulting, contact.
Personal hub for both writing tracks. Who I am, how the consulting works, how to reach me.
Visit ercanermis.com →