In our previous guide, we treated every tool as a typed morphism with a contract — essentially, each tool was like a function with a clear input and output type, and a “contract” (or specification) about what it does. This structured view helps a GPT-4 based AI agent know exactly how to use a tool correctly. Now it’s time to take the next step: how do we combine these tool-functions to solve more complex tasks? In this tutorial, we’ll explore how to compose tools both sequentially and in parallel, using the GPT-4 (0613) function-calling API. We’ll introduce the idea of a monoidal structure for tool usage — which is a fancy way of saying we can combine functions in sequence (one after another, denoted by ∘) or in parallel (side by side, denoted by ⊗). Don’t worry if these symbols seem abstract; we’ll break down their meaning with examples, code snippets, and intuitive diagrams. By the end, you’ll see how GPT-4 can orchestrate multiple functions step-by-step, just like a well-coordinated workflow, and how this relates to concepts like applicative and monadic patterns for handling effects (think: managing I/O, state, or even retries).
Recap: Tools as Typed Functions (Morphisms with Contracts)
Before jumping into composing tools, let’s quickly recap the key idea from last time. We established that each tool can be viewed as a function — in programming terms, a function has a type signature like f: A -> B (meaning it takes an input of type A and produces an output of type B). We also give each tool a contract: a description of what it does and any assumptions or requirements for using it. For example, you might have a tool that retrieves weather data:
def get_current_weather(city: str) -> dict:
    """Fetches the current weather for the given city and returns a dictionary of weather info."""
    # ... implementation ...
Here get_current_weather is a tool (function) that expects a city name (a string) and returns a dictionary (say, containing temperature, humidity, etc.). The contract would include what the function does (in the docstring or description) and maybe conditions like “city name must be valid”. In the context of GPT’s function calling, we define a JSON schema for this function’s parameters and let the model know this tool is available.
Why think of it this way? Because if the AI (our GPT-4 model) knows the exact input and output of each tool, it can plan how to use them to answer complex queries. Each tool is like a building block (a typed morphism), and the contract ensures the block is used correctly (no guessing what it does or what format it expects). This sets the stage for combining these blocks: just like in programming we combine functions to build bigger logic, our AI agent can combine tool calls to tackle multi-step tasks.
In summary: We have tools = functions with types. Now let’s see how two or more tools can be combined to achieve something that one tool alone cannot.
Sequential Composition (∘): Chaining Tools in Sequence
Often, solving a user’s query requires multiple steps in a particular order. For instance, first we might need to fetch some data with one tool, then process or format that data with another tool. This is where sequential composition comes in. We denote sequential composition as g ∘ f (pronounced “g after f”). If we have:
- A function f that maps A -> B (takes type A, returns type B), and
- Another function g that maps B -> C,
then the composition g ∘ f is a new operation that effectively maps A -> C. In other words, g ∘ f means “first apply f, then apply g to f’s result.”
Textually, we can depict this flow as:
A --f--> B --g--> C
Here, the output of f becomes the input to g. The types must align: f’s output type (B) should be the same type that g expects as input. This is just like plugging the output of one tool into the next tool.
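To make the ∘ notation concrete, here's a minimal Python sketch of sequential composition; the compose helper and the toy functions are illustrative only, not part of any API:

from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Return g ∘ f: first apply f, then apply g to f's result."""
    return lambda a: g(f(a))

# Toy "tools" (hypothetical names, for illustration)
def fetch_population(country: str) -> int:        # f: A -> B
    return 67_500_000

def format_report(population: int) -> str:        # g: B -> C
    return f"Population: {population:,}"

report_for = compose(format_report, fetch_population)   # g ∘ f : A -> C
print(report_for("France"))   # "Population: 67,500,000"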
Sequential Tool Use in GPT-4
How does this look with GPT-4’s function calling? Let’s say a user asks a question that naturally breaks into two steps. For example:
User: "What is the population of France, and can you give me a summary of France's Wikipedia page?"
This query actually asks for two things: (1) population of France, and (2) a summary of the France Wikipedia page. We might have two separate tools available to handle these: one tool to get population data, and another tool to fetch or summarize a Wikipedia page. A sensible sequence would be: first get the population, then get the summary, then combine the results.
Let’s define two example tools for this purpose:
def get_population(country: str) -> int:
    """Return the population of the given country (as an integer)."""
    # ... imagine this calls an API or database ...
    return 67500000  # (for example, France ~67.5 million)
def get_wikipedia_summary(topic: str) -> str:
    """Return a brief summary of the Wikipedia page for the given topic."""
    # ... imagine this calls the Wikipedia API ...
    return "France, officially the French Republic, is a country primarily located in Western Europe..."  # etc.
In a GPT-4 function calling setup, we would register these functions in the API with their names, descriptions, and JSON parameter schemas. For instance:
functions = [
    {
        "name": "get_population",
        "description": "Retrieve the population of a specified country.",
        "parameters": {
            "type": "object",
            "properties": {
                "country": {"type": "string", "description": "Name of the country"}
            },
            "required": ["country"]
        }
    },
    {
        "name": "get_wikipedia_summary",
        "description": "Fetch a short Wikipedia summary for a given topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {"type": "string", "description": "Topic to summarize (e.g., country name)"}
            },
            "required": ["topic"]
        }
    }
]
Now, when the user asks their question, we pass the user message and the function definitions to GPT-4 (specifically using the gpt-4–0613 model, which supports function calling):
import json
import openai

messages = [
    {"role": "system", "content": "You are an assistant that can use tools."},
    {"role": "user", "content": "What is the population of France, and can you give me a summary of France's Wikipedia page?"}
]

response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=messages,
    functions=functions,
    function_call="auto"  # let the model decide which function (if any) to call
)
When this call returns, GPT-4 may decide it needs to use a function. In this case, it likely recognizes it should use get_population first (since the query mentions population). The response we get will indicate a function call. We can inspect it like so:
first_message = response['choices'][0]['message']
print(first_message)
# {"role": "assistant", "content": None, "function_call": {"name": "get_population", "arguments": "{ \"country\": \"France\" }"}}
GPT-4 has cleverly chosen to call get_population and provided the argument “France”. Notice: it knew this from the user’s request. At this point, the ball is in our court as the developer — the model is asking us to execute the get_population function.
So our code should detect that a function was called and then actually run the get_population function:
if first_message.get("function_call"):
    func_name = first_message["function_call"]["name"]        # e.g. "get_population"
    func_args = first_message["function_call"]["arguments"]   # e.g. "{ \"country\": \"France\" }"
    func_args = json.loads(func_args)
    # Assuming we have a mapping from function name to our Python function:
    result = None
    if func_name == "get_population":
        result = get_population(**func_args)
    elif func_name == "get_wikipedia_summary":
        result = get_wikipedia_summary(**func_args)
    # Format result as a string (if it's not already) to pass back to the model
    result_str = str(result)
Now we package the result and send it back to GPT-4 so it can continue the conversation. In the API, we do this by adding a special message with role “function” and the name of the function, containing the result:
# Keep the assistant's function-call message in the history, then append the function result
messages.append(first_message)
messages.append({
    "role": "function",
    "name": func_name,
    "content": result_str
})

# Call GPT-4 again with the new messages (which now include the function result)
response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=messages,
    functions=functions,
    function_call="auto"
)
On this second call, GPT-4 now has access to the result of get_population. The conversation history the model sees is essentially:
- User: “What is the population of France, and can you give me a summary of France’s Wikipedia page?”
- Assistant (function call): called get_population("France")
- Function (get_population): returned 67500000 (for example)
Given this context, the model will incorporate the number 67,500,000 into its reasoning. But the user also wanted a Wikipedia summary. GPT-4 might now decide to call the second function get_wikipedia_summary, because it still needs that part to fully answer the question. Indeed, if we inspect the new response, we might see something like:
second_message = response['choices'][0]['message']
print(second_message)
# {"role": "assistant", "content": None, "function_call": {"name": "get_wikipedia_summary", "arguments": "{ \"topic\": \"France\" }"}}
It picked up that it should summarize “France”. So we repeat the process: execute get_wikipedia_summary("France"), get the summary text, append it as a function result message, and call the model again:
# Execute the second function call
func_name = second_message["function_call"]["name"]                     # "get_wikipedia_summary"
func_args = json.loads(second_message["function_call"]["arguments"])    # {"topic": "France"}
result = get_wikipedia_summary(**func_args)

# Keep the assistant's call and its result in the history
messages.append(second_message)
messages.append({
    "role": "function",
    "name": func_name,
    "content": result
})

# Final call to get the answer after all needed functions are done
response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=messages,
    functions=functions,
    function_call="auto"
)
final_answer = response['choices'][0]['message']['content']
print(final_answer)
At this point, GPT-4 has both pieces of information (population and summary) in the conversation. It should now produce a final answer for the user, something along the lines of:
Assistant: "France has a population of about 67,500,000 people. Here's a brief summary of France from Wikipedia: France, officially the French Republic, is a country primarily located in Western Europe…"
This demonstrates sequential composition: the model first used one tool, then the next, each depending on the previous step's result. We effectively created a pipeline get_wikipedia_summary ∘ get_population to fulfill the user's request. The type flow was:
User query (asking for two things)
  --> [get_population: Country -> Number]
  --> (intermediate result: population as Number)
  --> [get_wikipedia_summary: Topic -> Text]
  --> (intermediate result: summary Text)
  --> (final answer composed using both Number and Text)
GPT-4 handled the logic of when to call which function; our job as developers was just to execute those calls and feed back the results until the model had everything it needed. In practice, we’d generalize this with a loop to keep calling the model and executing functions until we get a final answer (content message) instead of another function call. But the key takeaway is: Sequential tool use allows an AI agent to handle multi-step queries by chaining functions one after the other.
When to Chain Tools Sequentially?
Whenever a task’s solution naturally breaks down into an ordered list of steps where each step’s output is needed for the next, you’ll use sequential composition. Some common scenarios:
- Fetching data then analyzing it (as we did above).
- Parsing something, then formatting it (first get raw info, then prettify).
- Performing a calculation, then using the result in another calculation.
- Example: User asks to “find the average age of people in a list and then tell if they’re above 18.” You might first use a calculate_average_age tool, then use a check_majority tool on the result.
The GPT model can coordinate these steps as long as it knows the tools and their I/O types. It will ensure the outputs line up with the next function’s inputs. Essentially, the model is reasoning: “First I need to do X, then with that result I can do Y.” This is exactly how we think of g ∘ f in math: do f, then do g.
Keep in mind that order matters in sequential composition: g ∘ f is generally not the same as f ∘ g. For example, “summarize then translate” is different from “translate then summarize.” The AI will pick the right order based on the query (and of course, based on how you describe the functions’ purposes in their descriptions).
Now that we’ve seen sequential composition, let’s look at another way to combine tools: in parallel.
Parallel Composition (⊗): Independent Tools in Tandem
Some tasks can be split into independent parts that don’t rely on each other. In our notation, we use ⊗ (tensor product) to denote parallel composition. If we have:
- f: A -> B
- g: C -> D
then f ⊗ g is an operation on a pair of inputs (A and C) that gives a pair of outputs (B and D). In other words: (f ⊗ g): (A, C) -> (B, D). It means “apply f to its part A and apply g to C, in parallel, and combine the results.” We can visualize this as two arrows side by side:
A --f--> B
C --g--> D
(f ⊗ g) takes (A, C) and returns (B, D).
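As with ∘, we can sketch ⊗ as a tiny Python helper that applies two functions to the two halves of a pair, independently; the names here are purely illustrative:

from typing import Callable, Tuple, TypeVar

A = TypeVar("A"); B = TypeVar("B"); C = TypeVar("C"); D = TypeVar("D")

def tensor(f: Callable[[A], B], g: Callable[[C], D]) -> Callable[[Tuple[A, C]], Tuple[B, D]]:
    """Return f ⊗ g: apply f to the first component and g to the second, independently."""
    return lambda pair: (f(pair[0]), g(pair[1]))

upper_and_double = tensor(str.upper, lambda n: n * 2)
print(upper_and_double(("hello", 21)))   # ('HELLO', 42)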
How does this idea apply to tool use with GPT? Let’s imagine a user request that naturally splits into two independent sub-tasks. For example:
User: "1. Translate the phrase 'Good morning' to Spanish, and 2. Calculate 5 * 12."
Here we have two tasks: a translation and a multiplication. They have nothing to do with each other — you could do them in either order, or even simultaneously. The final answer just needs both results. We could equip our assistant with two tools:
- translate(text, language) — for translations,
- multiply(x, y) — for arithmetic multiplication.
Define them (conceptually) as:
def translate(text: str, language: str) -> str:
    """Translate the given text into the target language."""
    # ... (calls some translation API) ...
    return "Buenos días"  # for example, "Good morning" -> "Buenos días"
def multiply(x: float, y: float) -> float:
    """Return the product of x and y."""
    return x * y
Each tool has its own input and output: translate: (Text, Language) -> Text, and multiply: (Number, Number) -> Number. The user's combined request is effectively asking for something like (translate ⊗ multiply) applied to ("Good morning", (5, 12)) — except the inputs don't arrive as a single tuple; they're two separate parts of the query. The crucial observation is that these sub-tasks don't depend on each other's results. We could translate the phrase and multiply the numbers independently and then combine the answers in the final response.
Currently, the OpenAI GPT model will normally handle one function call at a time, sequentially. If a prompt clearly asks for two unrelated things, the model might choose one function to call first, then after getting the result, call the second. This is sequential execution, but logically it's parallelizable because order doesn't matter. In fact, OpenAI has introduced an (experimental) ability for the model to request multiple function calls at once by returning an array of function calls (a capability referred to as parallel_tool_calls; see the OpenAI cookbook at https://cookbook.openai.com/examples/reasoning_function_calls). If that capability is enabled, the model could essentially say "I need to use translate and multiply" in one go, and the developer could execute both, perhaps even truly in parallel if threading or async is used, then return both results for the model to finish the answer. Without that, we as developers could still detect that these two calls are independent and optimize by running them concurrently in our code. But from the perspective of writing the AI agent's logic, we treat them as conceptually parallel tasks.
Let’s illustrate the straightforward way GPT-4 might handle the above query without special parallel call features, just normal sequential calls:
- User asks: translation and multiplication.
- GPT-4 sees two distinct asks. It might call translate(“Good morning”, “Spanish”) first (just a guess of order).
- We execute translate and return the result (“Buenos días”), feeding it back.
- GPT-4 then still owes the multiplication answer, so it calls multiply(5, 12).
- We execute multiply (5*12=60) and feed it back.
- Now GPT-4 has both pieces and can respond: “‘Good morning’ in Spanish is ‘Buenos días’, and 5 * 12 = 60.”
Even though the model did them one after the other, there was no dependency. We could swap steps 2 and 4 and it would still work. This is what we mean by parallel composition: the tasks are independent and can be done in tandem. If our system or the model were able to issue both at once, it could have. In a theoretical, perfectly parallel agent, steps 2 and 4 would happen simultaneously, and step 6 would combine the results.
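If we do want to exploit that independence on the developer side, here's a minimal sketch (assuming the toy translate and multiply functions defined above, and a hypothetical run_independent_calls helper) that executes the two calls in a thread pool and hands both results back to the model afterwards:

import concurrent.futures

def run_independent_calls(calls):
    """calls: list of (python_function, kwargs) pairs that don't depend on each other."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, **kwargs) for fn, kwargs in calls]
        return [f.result() for f in futures]

results = run_independent_calls([
    (translate, {"text": "Good morning", "language": "Spanish"}),
    (multiply, {"x": 5, "y": 12}),
])
# results -> ["Buenos días", 60]; each result can then be appended
# to the conversation as its own function-result message.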
Parallel Composition with Shared Input
Sometimes parallel tasks might even start from the same input. For example, recall the earlier scenario from a previous guide: given a string “Hello”, get me both its uppercase and lowercase forms. We could think of two functions:
- to_uppercase(text) -> text
- to_lowercase(text) -> text
Both take the same input (“Hello”) and produce different outputs. These can also be done independently: one doesn’t affect the other. In our mathematical notion, we have one input type (Text) and two outputs (Text, Text). This can be seen as a parallel composition if we imagine a function that duplicates the input to feed both f and g. Formally, you might consider it as:
Text --f--> Text_upper
Text --g--> Text_lower
Here f = uppercase, g = lowercase, and we use the same original Text for both. While GPT-4 function calling doesn't natively spawn two functions at once, we could approach it either by having a single tool that returns both forms (like we did with a decompose_request in the prior guide), or by letting the model call one and then the other. The key is the concept: these operations don't interfere, so order is unimportant except for making the conversation flow logically.
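One way to picture this "same input, two independent outputs" pattern in code is a small fan-out helper that duplicates the input before applying both functions; this is purely illustrative and not something the function-calling API provides:

def fanout(f, g):
    """Return a function that feeds the same input to both f and g and pairs the results."""
    return lambda x: (f(x), g(x))

upper_and_lower = fanout(str.upper, str.lower)
print(upper_and_lower("Hello"))   # ('HELLO', 'hello')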
How GPT Handles “Parallel” Intents
If a user question inherently has parallel sub-tasks, GPT-4 will typically still do them one by one, but you as the designer should be aware that you could execute them concurrently for efficiency. The monoidal tensor notation (⊗) is a mental model to identify when tasks are independent. It tells us:
- We could reorder them if needed.
- We might combine their results at the end in a different structure.
- Each tool deals with its own part of the input.
In practice, consider merging results: after doing f and g, the model might need to compose the final answer. Usually, the model itself will handle phrasing the final answer using both results (as it did: combining translation and arithmetic result in one sentence). If you had a very complex scenario, you could also have a dedicated function to merge results, but often it’s not needed — the model can just produce the answer text once it has all needed info.
A note on the OpenAI API: The parallel_tool_calls option (if available in newer models) allows the model to return something like function_call: [ {…call1…}, {…call2…} ] in one response. This explicitly indicates parallel needs. The OpenAI cookbook suggests handling this by executing both and then returning both outputs to the model. This feature is experimental, but it aligns nicely with our understanding of ⊗. Even without it, thinking in terms of parallelizable sub-tasks is useful for designing tools and prompts.
Managing Effects and State in Tool Chains
When we start combining multiple tools, especially in sequence, we have to think about effects — things like making I/O calls, maintaining some state between steps, handling errors, etc. In functional programming terms, there are patterns like Applicatives, Arrows, Monads, and Algebraic Effects that formalize how to sequence or parallelize operations with such side effects. Let’s unpack this in the context of our GPT tool usage (don’t worry, we’ll keep it high-level and practical):
- Applicative (Parallel Independent Effects): Applicative functors in FP allow you to perform multiple independent actions and then combine their results. This is analogous to the parallel (⊗) case we discussed. If two tool calls don’t depend on each other, you can treat them like independent effects and execute both, then collect the outcomes. For example, calling a translation API and a math API independently is applicative-like. The important part is: no step’s result influences the other’s execution. In our GPT scenario, if the model knows two calls are needed but unrelated, it could, in theory, prepare both calls at once (if allowed), or the developer can execute them concurrently under the hood. We didn’t explicitly label anything “Applicative” in code, but conceptually recognizing independent tasks is using the applicative mindset — it’s about parallel composition of effects.
- Arrows (Structured Pipelines with Inputs/Outputs): Arrows are a generalization of functions that can be composed, often used when you want to pre-define a computation flow. An Arrow can be thought of as a pipeline with multiple inputs and outputs. In tool terms, an arrow could represent a pre-defined workflow of tools: for instance, a single arrow that takes an input, then splits it, feeds one part to one tool and another part to a second tool (parallel), then feeds those outputs into a third tool (sequentially). This is more advanced, but you might design an aggregate tool that internally uses multiple other tools — effectively encoding a mini workflow. For instance, an arrow could encapsulate "take user query, use tool A on part of it, tool B on another part, then combine". While GPT's function calling doesn't natively let one function call another (the model has to initiate each), as developers we can create functions that themselves call other functions inside (thus acting like an arrow composition behind the scenes). Arrows are about structuring such multi-step computations where you might want static analysis or to enforce a certain pattern. In our context, think of it as designing a composite tool for common multi-step patterns; a small sketch of such a composite tool follows this list.
- Monads (Sequential Dependent Effects): Monads are the go-to abstraction for chaining operations where each step can depend on the previous one’s result. This is exactly our sequential composition (∘) case. A monadic workflow allows later steps to use earlier outcomes (like we did feeding population into the next function). If you’ve heard of the Promise or async/await pattern in JavaScript or futures in Python, those are monadic patterns for sequencing asynchronous calls. In GPT tool usage, you can think monadically: the conversation context carries state (the outputs of previous function calls) that influences what to do next. Our loop of calling GPT, getting a function call, executing it, and feeding it back is essentially manually handling a monadic bind: the model’s state is updated with the new info and then it continues. Each function call is an effect (like I/O), and the conversation history context is analogous to a monadic context carrying that effect’s result forward. If we were to formalize it, we could say our AI agent is using a tool-monad where bind takes the output of one tool and provides it to the next tool’s invocation.
- Algebraic Effects (Flexible Effect Handling): Algebraic effects are a more recent concept in programming languages, which allow you to declare operations (effects) like “do a web search” or “get current time” without specifying how to handle them, and then have handlers that define what to actually do. This decouples the effect specification from its implementation. If we draw an analogy, GPT’s function calling is kind of like an algebraic effect system: the model can invoke an effect (call a tool) and it’s up to our code (the handler) to execute it and return a result. We could swap out implementations or handle errors in the middle, and the model wouldn’t know — it just knows that the effect was carried out. For example, we might implement a search_web function tool now, but later replace it with a different search engine — the AI doesn’t care as long as the contract is honored. We can also inject custom logic like retries or fallback inside the handler that the AI didn’t explicitly ask for, which is similar to handling an algebraic effect. This gives a lot of flexibility. We could model things like retries, state, or rate limiting as effects too: e.g., an effect for “Log this step” or “Wait 5 seconds” could be tools that do those tasks, and the AI could invoke them when needed (or the handler could enforce them automatically).
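To make the Arrow idea concrete, here's a sketch of a composite tool that hides a small fixed workflow behind a single function. It reuses the get_population and get_wikipedia_summary examples from earlier; the name country_briefing is hypothetical:

def country_briefing(country: str) -> str:
    """Composite tool: a pre-wired pipeline (fan-out, then merge) hidden behind one function."""
    population = get_population(country)        # first branch
    summary = get_wikipedia_summary(country)    # second branch (independent of the first)
    return f"{country}: population ~{population:,}. {summary}"

# Registered as a single function, GPT-4 makes one call instead of two,
# and the intermediate results never enter the conversation history.

The trade-off is flexibility: the model can no longer reorder or skip the internal steps, so this pattern suits workflows you expect to run the same way every time.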
Modeling practical concerns: Let’s connect these concepts to real issues:
- IO and State: Tools often involve I/O (network calls, file access) and maybe state (e.g., a tool that keeps memory of previous calls). Using a monadic approach, you might chain calls and carry some state (like an authentication token or accumulated data). For example, a state monad could carry a running summary that the AI updates with each step. With GPT, we actually carry state via the conversation history (which includes prior results).
- Nondeterminism: Some tools might return one of many possibilities (like a tool that fetches a random trivia, or if you query multiple sources). You can think of this as a nondeterministic effect — in a formal sense, a list of possible outcomes. Monads like the list monad handle branching computations. In our agent scenario, GPT might have to handle uncertainty, e.g., it might call a search tool and get back multiple results. We as designers could either pick one or let the model decide which result to use, or even let it call another tool to disambiguate. Designing for nondeterministic outputs can be tricky; it might involve calling the same tool multiple times or ranking results. This is a more advanced use case, but it’s worth noting.
- Retries and Errors: Suppose a tool fails (maybe an API is down or returns an error). We can build a retry mechanism as part of the tool's contract or the function handler. For instance, we might catch exceptions in the Python code that calls the tool and decide to call it again or return an error message. In a monadic effect model, this could be akin to an Error monad (where the computation can short-circuit with an error) or having a special effect for retries. With GPT function calling, one approach is: if a function fails, we return an error message as the function result and let GPT decide what to do (it might apologize or ask the user for something else). Alternatively, we might not even tell GPT and automatically retry internally, which is more like handling the effect behind the scenes. This kind of design is part of making the system robust; a sketch of such a retry wrapper follows this list.
- Rate Limiting: If some tool should not be overused (maybe it’s expensive or limited to N calls per minute), we must enforce that. This could be done outside the model entirely (the handler delays or refuses calls when over limit). Or you could inform the model of cost via the tool description (contract) so it avoids calling it too frivolously. Rate limiting can be viewed as a side-effect that you want to control in the composition of calls. An algebraic effect style solution might be to have a special effect “TokenBucket” or so, but practically, you’ll likely implement this in code. Still, conceptually, it’s part of the contract: e.g., “this tool should only be called once per conversation unless necessary.” GPT might then strategize accordingly.
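As a sketch of handling these effects in the handler rather than in the model, here's a hypothetical with_retries wrapper: it retries a flaky tool a few times and returns an error string on the final failure, so the orchestration loop below can pass that string back to GPT-4 as an ordinary function result:

import time

def with_retries(tool_fn, max_attempts: int = 3, delay_seconds: float = 1.0):
    """Wrap a tool so transient failures are retried; the model only sees the final outcome."""
    def wrapped(**kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return tool_fn(**kwargs)
            except Exception as e:
                if attempt == max_attempts:
                    return f"ERROR: {tool_fn.__name__} failed after {max_attempts} attempts: {e}"
                time.sleep(delay_seconds)  # simple fixed backoff before retrying
    return wrapped

# e.g. tools = {"get_population": with_retries(get_population), ...}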
Putting It All Together
We’ve covered a lot of ground, so let’s summarize how these abstract concepts map to our GPT-4 tool-using AI:
- Sequential (Monadic) composition: The agent calls one tool, gets result, calls the next — ideal for dependent steps. We ensure the output of one matches input of the next (type alignment). GPT-4 handles the reasoning chain, we handle passing the data along. This allows modeling of dependent effects (later steps see earlier results).
- Parallel (Applicative) composition: The agent can handle independent requests, potentially calling tools one after another in any order. As designers, we know these could be parallelized. This is useful for independent effects (tasks that don’t interfere). We might manually parallelize execution to save time, but logically the model treats them as separate calls and then composes the answer.
- Structured workflows (Arrow style): We can predefine combinations of tools if needed. While GPT can figure it out dynamically, sometimes we might want to guide it or provide a single function that wraps a known multi-step routine. That function internally does the steps (possibly not exposing intermediate results to GPT). This can simplify the model’s job at the cost of flexibility. It’s like giving a shortcut tool for a common pattern.
- Handling side effects (Algebraic effect thinking): We treat tool calls as effectful operations that are resolved by an external handler (our code). We can enrich this handler with logic for retries, error handling, logging, or enforcement of limits without burdening the model with those details. The model just knows the idealized effect (e.g., "get_weather" gives weather data). The real-world complexities (like maybe the first API call failed and we tried again) are managed by the handler code, invisible to the model unless we choose to inform it. This separation of concerns is powerful.
A Tiny Example with Pseudocode
To cement understanding, here’s a miniature pseudocode combining some of these ideas in a loop, akin to how you’d implement a full tool-using agent:
tools = { "toolA": toolA_func, "toolB": toolB_func, ... } # mapping names to implementationsmessages = [ {"role": "system", "content": "You can use functions toolA, toolB to help."}, {"role": "user", "content": "User's request ..."} ]while True: response = openai.ChatCompletion.create(model="gpt-4-0613", messages=messages, functions=function_schemas, function_call="auto") msg = response['choices'][0]['message'] if msg.get("function_call"): # There's a function the model wants to use name = msg["function_call"]["name"] args = json.loads(msg["function_call"]["arguments"]) try: result = tools[name](**args) # Execute the tool (could have internal retries, etc.) except Exception as e: result = f"ERROR: {e}" # Append the function result for the model messages.append({"role": "assistant", "content": None, "function_call": msg["function_call"]}) messages.append({"role": "function", "name": name, "content": str(result)}) # Loop continues, model will see the new info and decide next step continue else: # It's a normal message (final answer or clarification) answer = msg.get("content") print("Assistant:", answer) break
In this pseudocode:
- We let the model go step by step.
- If it calls a function, we execute it. The try/except is where we could implement error handling or retries (for instance, if a call fails, maybe try again or return an error string).
- We append both the assistant’s function call (for completeness) and the function’s result to the history. The model, on the next iteration, sees that and can use it.
- This continues until the model stops requesting functions and gives a final answer.
This loop is effectively orchestrating a monadic sequence of tool calls: each iteration depends on the history (state) accumulated so far, including all previous effects. If there were opportunities to parallelize, we could identify them. For example, if msg[“function_call”] could contain multiple calls (hypothetically), we would execute all before appending results. Or we might notice two recent calls were independent and handle them specially. Those are optimizations; the core logic is sequential for correctness.
Conclusion
In this guide, we expanded our toolbox (pun intended) by looking at how multiple tools can be used together to handle complex queries. We learned about sequential composition (∘) — chaining tools when one’s output is another’s input — and parallel composition (⊗) — handling independent tool calls that could be done in tandem. We kept the focus on GPT-4’s function-calling system, showing how an AI agent can autonomously decide to use tools step-by-step, while we ensure the plumbing (executing functions and feeding results back) works correctly.
Along the way, we touched on important concepts of effects and abstract patterns:
- Thinking in terms of Applicatives helped us recognize independent tasks (parallelizable work).
- Monadic thinking helped with sequential dependent tasks (one step at a time, carrying forward state).
- Arrows hinted that we can design higher-level tool flows or composite functions.
- Algebraic Effects drew a parallel to how GPT’s tool use is implemented: the model invokes an abstract action and our code fulfills it, which is a clean separation of concerns. This means we can handle things like retries, errors, or rate limits behind the scenes, making the AI’s life easier while ensuring robustness.
By treating tools as typed morphisms with contracts, and now understanding how to compose them, you have a framework to build powerful AI workflows. You can safely let a GPT-4 agent navigate multi-step problems, knowing it will respect tool interfaces and combine them logically. In future guides, we can dive even deeper into designing complex tool ecosystems and handling advanced effect patterns. But even with what we covered here, you should be able to craft AI solutions that go beyond single-step question answering — into the realm of multi-step reasoning, planning, and action.
Happy composing! Just like good software is built by composing simple functions, good AI behavior can be achieved by composing simple tools. With GPT-4 as the conductor and your tools as the orchestra, even complex symphonies of tasks can be performed with grace.