TL;DR: My client had too many Fargate tasks running all the time because the CloudWatch alarms that scale in idle tasks had disappeared. Re-adding these alarms solved the issue.
One fine day, my manager told my team that there was a sudden surge in AWS costs for my client and that none of the senior developers could handle it. So I took on the challenge and started investigating.
![]() |
---|
Out of control AWS cost |
Because we had no dedicated DevOps engineers, we relied on AWS Fargate to simplify the deployment process and accepted its higher cost. The problem arose when the cost went well beyond what we had anticipated.
![]() |
---|
AWS Architecture |
After a quick investigation, I found that Fargate tasks and RDS instances were the two biggest cost contributors.
![]() |
---|
Fargate tasks |
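
For readers who want to reproduce this kind of breakdown outside the console, here is a minimal Python sketch that ranks services by cost using the Cost Explorer `get_cost_and_usage` API from boto3. It is not what I actually ran (I used the billing console); the function names and the choice of `UnblendedCost` are my own assumptions, and it ignores pagination for brevity.

```python
from collections import defaultdict


def rank_services_by_cost(response):
    """Sum UnblendedCost per service across all periods, highest first."""
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]  # one key because we group by SERVICE only
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def fetch_cost_by_service(ce_client, start, end):
    """Query Cost Explorer for monthly cost grouped by service.

    start and end are 'YYYY-MM-DD' strings; end is exclusive.
    ce_client is expected to be boto3.client("ce").
    """
    return ce_client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   response = fetch_cost_by_service(boto3.client("ce"), "<start>", "<end>")
#   for service, cost in rank_services_by_cost(response)[:5]:
#       print(service, round(cost, 2))
```

Keeping the ranking separate from the API call makes it easy to check the logic against a saved response before touching a real account.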
That was the start of my journey. The rabbit hole was deep, and it took me weeks to find the root cause. I received a lot of help from my team along the way.
![]() |
---|
My team’s suggestions |
I took some of their advice, such as switching from x86 to ARM instances, using YJIT for more efficient code execution, and removing redundant context calculation in GraphQL.
![]() |
---|
Use YJIT |
![]() |
---|
Remove redundant context calculation in GraphQL |
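
To make the first two suggestions concrete, here is a rough Python sketch of how an ECS task definition could be adjusted to target ARM64 and turn YJIT on. The helper name is hypothetical, and it rests on assumptions not covered in this post: the container image is built for arm64, and the Ruby version in it supports the `RUBY_YJIT_ENABLE` environment variable.

```python
import copy


def with_arm64_and_yjit(task_definition):
    """Return a copy of a task definition dict targeting ARM64 with YJIT on."""
    updated = copy.deepcopy(task_definition)  # leave the caller's dict untouched
    # Fargate picks the CPU architecture from runtimePlatform.
    updated["runtimePlatform"] = {
        "cpuArchitecture": "ARM64",
        "operatingSystemFamily": "LINUX",
    }
    for container in updated.get("containerDefinitions", []):
        # Drop any existing setting so the variable appears exactly once.
        env = [
            entry
            for entry in container.get("environment", [])
            if entry["name"] != "RUBY_YJIT_ENABLE"
        ]
        env.append({"name": "RUBY_YJIT_ENABLE", "value": "1"})
        container["environment"] = env
    return updated
```

The resulting dict uses the same field names that `register_task_definition` in boto3 accepts, so it could be passed on after removing any read-only fields returned by `describe_task_definition`.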
But the cost was still high, with no significant improvement. We still had 32 Fargate tasks running most of the time.
![]() |
---|
Fargate tasks |
After a lot of investigation, I found the root cause: the CloudWatch alarms that scale in these tasks when they are idle had disappeared. Without them, nothing ever told the service to reduce its task count, so re-adding these alarms would solve the issue.
![]() |
---|
The missing alarms are in the red box |
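
For illustration, here is a minimal Python sketch of what restoring a scale-in alarm can look like with boto3: a step scaling policy that removes one task, plus a CloudWatch alarm that fires it when the service is idle. Everything specific here is an assumption rather than my client's real setup: the metric (average CPU), the 20% threshold, the 10-minute window, the cooldown, and the function names. It also assumes the service is already registered as a scalable target with Application Auto Scaling.

```python
def scale_in_policy_params(cluster, service):
    """Step scaling policy that removes one task each time the alarm fires."""
    return {
        "PolicyName": f"{service}-scale-in",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "Cooldown": 300,  # assumed: wait 5 minutes between scale-ins
            "MetricAggregationType": "Average",
            # Applies whenever the metric is at or below the alarm threshold.
            "StepAdjustments": [
                {"MetricIntervalUpperBound": 0.0, "ScalingAdjustment": -1}
            ],
        },
    }


def scale_in_alarm_params(cluster, service, policy_arn, cpu_threshold=20.0):
    """Alarm that triggers the policy when average CPU stays low."""
    return {
        "AlarmName": f"{service}-cpu-low",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 10,  # assumed: 10 consecutive low minutes
        "Threshold": cpu_threshold,  # assumed idle threshold, in percent
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "AlarmActions": [policy_arn],
    }


def restore_scale_in(autoscaling_client, cloudwatch_client, cluster, service):
    """Create the policy, then the alarm that points at it."""
    policy = autoscaling_client.put_scaling_policy(
        **scale_in_policy_params(cluster, service)
    )
    alarm = scale_in_alarm_params(cluster, service, policy["PolicyARN"])
    cloudwatch_client.put_metric_alarm(**alarm)
    return alarm["AlarmName"]
```

With real clients this would be called as `restore_scale_in(boto3.client("application-autoscaling"), boto3.client("cloudwatch"), cluster, service)`. The important part is the `AlarmActions` link: a scaling policy with no alarm pointing at it never runs, which is exactly the state we were stuck in.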
After I added these alarms, the cost dropped by almost 40%.
![]() |
---|
Cost after optimization |