Published on 01/08/2025
Written by Robert Koch
6 min read
I was scrolling on LinkedIn the other day when I came across a sponsored post with a clickbait headline. While it seems like useful advice for setting appropriate retry limits, it doesn’t address the main issue: recursion is bad, especially in the cloud.
This got me thinking about Lambda best practices and how functions in the cloud should be used. I think that, due to a general lack of education and a “vibe coding” mentality, people have built excessively complicated Lambda workflows to solve a myriad of problems. A common example I’ve seen is Lambda functions that invoke themselves in a recursive mess.
The issue has only gotten worse over the years, to the point where AWS has even created a custom tool to detect recursive invocation. That still hasn’t deterred people from making this common mistake, so I guess it’s my turn.
The problem isn’t just theoretical. A quick scan of LinkedIn and X reveals engineers actively promoting recursive patterns in cloud workflows.
I’m not sure why recursion has emerged as a valid cloud computing paradigm. I think it has a lot to do with recursion not being properly taught, or with engineers never having had its dangers explained to them.
You absolutely should not write a lambda function that calls itself.
Cloud costs make recursive Lambdas a terrible idea. They’re slow, inefficient, and expensive. Lambda is a tool for short compute processes or asynchronous jobs (such as uploading files or writing to a database) where the frequency of the job does not warrant a full-time compute resource like a container or instance.
Recursion is taught fairly early in Computer Science and Software Engineering as a paradigm for solving complicated tasks by reducing the problem with each layer. But as you can see from this example, computing an answer recursively can be incredibly expensive from a resource perspective. In this Fibonacci code, a Lambda function essentially does one addition per invocation. [1]
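To make the cost concrete, here is a minimal local sketch of the pattern, not production code. It assumes a naive Fibonacci handler in Python where every recursive step would be a separate synchronous Lambda invocation. The `invoke` callable is a hypothetical stand-in for a real SDK call, so the snippet runs without AWS and simply counts how many invocations one request fans out into.

```python
from typing import Callable, Tuple


def make_recursive_handler(invoke: Callable[[dict], dict]) -> Callable[..., dict]:
    """Build a naive Fibonacci handler whose recursive steps go through `invoke`.

    `invoke` is a hypothetical stand-in for a synchronous Lambda invocation.
    """
    def handler(event: dict, context: object = None) -> dict:
        n = event["n"]
        if n < 2:
            return {"result": n}
        # Each branch is another billed invocation, and this handler sits
        # idle (still billed) until both of them have returned.
        a = invoke({"n": n - 1})["result"]
        b = invoke({"n": n - 2})["result"]
        return {"result": a + b}  # the only real work: one addition

    return handler


def count_invocations(n: int) -> Tuple[int, int]:
    """Run the handler locally and return (fib(n), total invocations)."""
    invocations = 0

    def invoke(event: dict) -> dict:
        nonlocal invocations
        invocations += 1
        return handler(event)

    handler = make_recursive_handler(invoke)
    result = invoke({"n": n})["result"]
    return result, invocations
```

Under these assumptions `count_invocations(10)` returns `(55, 177)` and `count_invocations(20)` needs 21,891 invocations, each of which would be billed separately.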
So why shouldn’t Lambda functions call other Lambda functions? There are two main reasons in my opinion.
- Since Lambdas are charged per invocation and by duration, the time it takes to invoke and run a function can quickly eat into your budget. You aren’t charged for spinning up the Lambda, but if your code is written so that the first Lambda has to wait for the second one, you pay for that wait time. This is a waste of both time and money.
- Lambdas should not be responsible for their own execution. If you have to keep the state of the Lambda workflow inside a Lambda function, this will lead to inconsistencies that will break your workflow.
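The wait-time point can be put into rough numbers. The sketch below is a simplified model rather than a billing calculator: it assumes a straight chain of functions that each do the same amount of work and wait synchronously on the next one, and it ignores invocation overhead and per-request charges. The function names and the rate parameter are illustrative; check current AWS pricing for real figures.

```python
def billed_ms_chained(depth: int, work_ms: int) -> int:
    """Total billed duration when `depth` functions call each other in a chain.

    The deepest function is billed for its own work only; each caller is
    billed for its own work plus the time it waits on everything below it.
    """
    return sum(level * work_ms for level in range(1, depth + 1))


def billed_ms_single(depth: int, work_ms: int) -> int:
    """Total billed duration when one function performs all `depth` steps."""
    return depth * work_ms


def duration_cost_usd(billed_ms: int, memory_gb: float, usd_per_gb_second: float) -> float:
    """Duration charge only, for a given memory size and an assumed rate."""
    return (billed_ms / 1000) * memory_gb * usd_per_gb_second
```

For ten steps of 100 ms each, the chained version is billed for 5,500 ms in total while a single function doing the same work in a loop is billed for 1,000 ms. The gap grows quadratically with depth.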
Recursive Lambdas are also, in my opinion, an indication of poorly written code and badly defined requirements.
Nearly all recursive functions can be transformed into a non-recursive form, so it’s unlikely that what you’re trying to do is a fundamentally recursive problem. That being said, there are truly recursive operations. However, if your code cannot escape using recursion, you should be running it in one Lambda. This Computerphile video explains one example of a non-primitive recursive function.
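As an illustration of that transformation, a Fibonacci handler can be written as a plain loop inside a single invocation. This is a minimal sketch; the event shape `{"n": ...}` and the response shape are assumptions made for the example.

```python
def handler(event: dict, context: object = None) -> dict:
    """Compute Fibonacci(n) in one invocation using a loop instead of recursion."""
    n = int(event["n"])
    if n < 0:
        raise ValueError("n must be non-negative")
    previous, current = 0, 1
    for _ in range(n):
        # Slide the window forward one step: (F(k), F(k+1)) -> (F(k+1), F(k+2))
        previous, current = current, previous + current
    return {"result": previous}
```

Calling `handler({"n": 10})` returns `{"result": 55}` from one invocation, with no nested calls to wait on.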
If you need retry logic or some type of loop, you should use a Step Functions state machine or an SQS queue to handle the state, as these systems are designed to handle edge cases much better than your code.
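As a rough sketch of what handing retries to Step Functions can look like, the snippet below builds a one-state Amazon States Language definition as a Python dictionary. The state name, the retry numbers, and the placeholder ARN are assumptions for illustration; the point is that the retry count and backoff live in the workflow definition rather than in the function.

```python
import json


def build_state_machine(function_arn: str, max_attempts: int = 3) -> dict:
    """Return a state machine definition that retries a Lambda task with backoff."""
    return {
        "Comment": "Retries are owned by the workflow, not by the function",
        "StartAt": "ProcessJob",  # hypothetical state name
        "States": {
            "ProcessJob": {
                "Type": "Task",
                "Resource": function_arn,
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 2,   # wait before the first retry
                        "MaxAttempts": max_attempts,
                        "BackoffRate": 2.0,     # double the wait on each retry
                    }
                ],
                "End": True,
            }
        },
    }


# Placeholder ARN: substitute a real region, account ID, and function name.
definition = build_state_machine("arn:aws:lambda:REGION:ACCOUNT_ID:function:process-job")
definition_json = json.dumps(definition, indent=2)
```

The function itself stays simple: it does its work once and either succeeds or raises, and the state machine decides whether to try again.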
I’ve talked about the power of Step Functions in the past; in my opinion they are the optimal choice when creating complicated workflows using Lambda functions.
Hyperscaling - Recursion Goes Wrong
Years ago when I worked at AWS, one of the new grads had to create a project for their onboarding. As part of the project they created a recursive Lambda function that quickly spiralled out of control. If I remember correctly, this was before recursion detection but still when Lambda had a limit of 1,000 invocations at any given time. This limit was the only thing that stopped the Lambdas from using the entire region’s compute resources.

Billing dashboard after a runaway Lambda event
The good news for this Cloud Architect was that, since this was running in an internal account, the actual costs were zero. But if this had been an external customer account, there was nothing in place at the time to prevent this runaway cost scenario.
What people might find really interesting here is how auxiliary services such as KMS, CloudTrail, and CloudWatch account for significant costs alongside Lambda itself. This is because these services are enabled by default in a somewhat noisy configuration: when a Lambda function runs it logs activity to CloudWatch, its API calls are logged in CloudTrail, and KMS is used if any encryption keys are required. CloudWatch is notorious for cost overruns because most of the time it’s free or almost free, but after you pass the free tier limit the costs quickly skyrocket.
This little case study is why I will never recommend using a recursive Lambda in any context. The dangers are too great and there are better alternatives that can be included in your design. So next time Copilot generates a Lambda for you, make sure that it doesn’t call itself.
Footnotes
[1] This is an especially heinous example because the Lambda needs to wait for all the nested Lambdas to complete before it can return a result. This means that the time the first Lambda runs for is the sum of all the nested functions, the time of the second is the sum of all the functions below it, and so on.