Episode 67 - Dealing With Failure Special

AWS TechChat - A podcast by Shane Baldacchino

In this themed episode of AWS TechChat we explored how one deals with failure because, as we say, everything fails all the time.

We started by level setting with some acronyms to ensure we were all on the same page. RTO and RPO will mean something to everyone by the end of this episode. DR is often thought of in many organisations as an insurance policy, and we spoke about impact vs risk and how you can put some structure around your decision making.

We then spoke about various approaches you can use for DR:

* Pilot Light - ensuring you replicate your systems of record and are able to instantiate your stacks via infrastructure as code.
* Warm Standby - running a scaled-down version of your stack that you can scale up with Auto Scaling Groups and by increasing the number of running tasks in your containers.
* Backup and Restore - the traditional approach, which is still very valid. LTO may be dead, but in 2020 you can use most backup applications with S3 and Glacier as a target, and if that's not an option there is also a VTL option in Storage Gateway.

Pete told us about a nifty trick for auto recovery with Min 1 | Max 1 Auto Scaling Groups, as well as EC2 recovery options.

We then pivoted to what it would take to architect a multi-region application, allowing you to run your solution across multiple AWS regions in an active/active topology, speaking through the challenges you may face and what tools are available.

We closed out with Multi-AZ architectures, which are a key differentiator of AWS from other providers. We gave a refresher on what AZs are and explained that all AWS services are either multi-AZ by default or a tick-box offering, allowing you to build robust architectures that withstand AZ failure.
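For anyone wanting to try the Min 1 | Max 1 trick mentioned above, here is a minimal sketch in Python of what such a group could look like. It is not taken from the episode. The group name, launch template name, subnet IDs and the 300-second grace period are all hypothetical placeholders, and the sketch assumes boto3 is installed and a launch template already exists. The idea is that with minimum, maximum and desired capacity all pinned to 1, the Auto Scaling Group launches a replacement whenever its single instance is found unhealthy.

```python
def self_healing_group_params(name, launch_template_name, subnet_ids):
    """Build arguments for a single-instance, self-replacing Auto Scaling Group.

    All names passed in are illustrative placeholders, not values from the episode.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        # Min 1 | Max 1: the group always tries to keep exactly one instance running.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Subnets in more than one AZ let the replacement start in another AZ.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": "EC2",
        "HealthCheckGracePeriod": 300,  # assumed value, tune for your boot time
    }


def create_self_healing_group(params):
    """Create the group; needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # third-party dependency, imported lazily

    boto3.client("autoscaling").create_auto_scaling_group(**params)


if __name__ == "__main__":
    example = self_healing_group_params(
        "techchat-demo", "techchat-demo-template", ["subnet-aaaa", "subnet-bbbb"]
    )
    print(example["MinSize"], example["MaxSize"], example["VPCZoneIdentifier"])
```

Keeping the parameter building separate from the API call means you can inspect or review the settings before anything is created in your account.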
