14 events
when what by license comment
Jan 26, 2024 at 18:01 comment added JimmyJames @simonalexander2005 To elaborate, if we assume 24/7 operation and no scheduled downtime, that SLA means the system can be down around 20 minutes during an entire year. I would expect that your SLA would be evaluated over a shorter period e.g., a week. That means less than a minute of downtime. The question is: how fast can Fargate detect a failure and start a new instance of this system?
Jan 26, 2024 at 15:57 comment added JimmyJames @simonalexander2005 That would seem to be the challenge with this approach i.e., it presupposes some downtime in order to work, unless I am missing something. I think you will be hard-pressed to guarantee that SLA with this approach.
Jan 26, 2024 at 8:48 comment added simonalexander2005 @JimmyJames for the PoC at the moment, not much - but eventually 99.996%
Jan 26, 2024 at 3:08 answer added RibaldEddie timeline score: 2
Jan 25, 2024 at 19:29 comment added JimmyJames What are your uptime requirements?
Jan 25, 2024 at 18:41 answer added Jon Raynor timeline score: 3
Jan 25, 2024 at 17:07 comment added simonalexander2005 @JonRaynor leadership election means that only one instance is active at a time - the other instances are running as a standby, so that if the leader instance stops for some reason another instance can pick it up. Therefore effectively only one instance is doing the work
Jan 25, 2024 at 16:22 comment added Jon Raynor Can you explain why you mention not running multiple instances in the first paragraph, but you are currently running multiple instances with the leadership election in the 3rd paragraph. This seems to be a contradiction.
Jan 25, 2024 at 14:09 comment added Philip Kendall @simonalexander2005 because distributed computing is hard, and particularly hard over an unreliable network. And all networks are unreliable. AWS can hide a lot of the complexity, but underneath it all you can't magically solve the Two Generals Problem.
Jan 25, 2024 at 14:07 comment added PMah During deployments, depending on configuration, Fargate might start up the new task and wait for it to be healthy before stopping the old one. You can configure it to stop the old one first though, if that's what's needed. However, you'd also need to make sure your container health checks are rock solid; I've seen (admittedly rare) situations where a task lost network connectivity due to some issue with the underlying host, but because the health check didn't check for that, the task was still considered healthy, and it wasn't replaced.
Jan 25, 2024 at 13:47 review Close votes Jan 30, 2024 at 3:08
Jan 25, 2024 at 13:45 comment added simonalexander2005 Well, yeah, if they give you the option to choose the max number of running containers to be 1, why wouldn't you assume that?
Jan 25, 2024 at 13:26 comment added Philip Kendall Do you regard "blindly trusting that Fargate will never actually run 2 copies of a container when the instance count is set to 1" as "missing something"? For avoidance of doubt, I have no particular knowledge of what Fargate does here, but it's possible it's been engineered to behave differently under some circumstances than you seem to be assuming.
Jan 25, 2024 at 13:05 history asked simonalexander2005 CC BY-SA 4.0
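The downtime figures in JimmyJames's Jan 26 comment (around 20 minutes per year, under a minute per week for a 99.996% SLA) can be checked with a few lines of arithmetic. This is only a rough sketch, assuming 24/7 operation with no scheduled downtime, as the comment does; the function name `downtime_budget_seconds` and the 365-day year are assumptions made for illustration, not part of the thread.

```python
def downtime_budget_seconds(availability_percent: float, period_seconds: float) -> float:
    """Allowed downtime, in seconds, for an availability target over a period."""
    return (1 - availability_percent / 100) * period_seconds


# Assumption: 24/7 operation, 365-day year, no scheduled maintenance windows.
YEAR_SECONDS = 365 * 24 * 3600
WEEK_SECONDS = 7 * 24 * 3600

yearly = downtime_budget_seconds(99.996, YEAR_SECONDS)
weekly = downtime_budget_seconds(99.996, WEEK_SECONDS)

print(f"per year: {yearly / 60:.1f} minutes")  # per year: 21.0 minutes
print(f"per week: {weekly:.1f} seconds")       # per week: 24.2 seconds
```

If the SLA is evaluated weekly, the whole cycle of failure detection plus replacement task startup would have to fit in roughly 24 seconds, which is the point behind the question of how quickly Fargate can detect a failure and start a new instance.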