Configuring LLM Tool Use - Understanding tool_choice Options


Large Language Models can be enhanced by connecting them to external tools or functions (for example, web search, databases). In an LLM application, the tool_choice setting (sometimes called the function calling mode) determines when and how the model uses those tools. Choosing the right tool_choice option is important for building reliable chatbots, AI agents, or any LLM-powered system that uses external functions. This post breaks down the four tool_choice configurations—"auto", "required", a specific forced function, and "none"—and explains the behavior of each. We’ll look at how each mode works, when to use it, and tips for prompt design to make the most of LLMs with tools.

What is tool_choice?

When you enable tools (functions) for an LLM, tool_choice is a parameter that controls the model’s tool-using policy. In other words, it tells the model whether it can, must, or must not call a tool in its response, and if so, how it should select the tool. The available options for tool_choice are:

  • none: Completely disable tool use. The model will not call any function.
  • auto: Let the model decide if a tool is needed. The model may choose to call one of the provided tools, or answer directly with no tool call.
  • required: Force the model to use at least one tool. The model must call a function (it will choose the most relevant one if multiple are available) before giving a final answer.
  • Specific tool (by name): Force the model to use a particular tool. This is done by specifying the exact function name in the tool_choice parameter, ensuring the model calls that function (and not any other) as part of its answer.

These settings give us fine-grained control over the LLM’s behavior. In practice, they map to how the model decides on function calling in the underlying API (for instance, OpenAI’s function calling API uses similar concepts, where auto is the default, none disallows functions, and you can force a specific function by name). The newer required option (sometimes called any in certain frameworks) ensures the model always invokes at least one of the provided tools.
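
To make these options concrete, here is a minimal sketch of how they are passed in practice, using the OpenAI Python SDK’s Chat Completions interface (other providers expose the same idea under slightly different names). The get_current_weather function is a hypothetical example tool:

```python
from openai import OpenAI

client = OpenAI()

# One illustrative tool definition; the name and schema are hypothetical.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Gets the current weather for a city. Use when the user asks about weather.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. 'New York'"}
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in New York?"}],
    tools=tools,
    # "none" | "auto" | "required" | {"type": "function", "function": {"name": "..."}}
    tool_choice="auto",
)
```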

Next, we’ll dive into each option, exploring what actually happens when the LLM processes a user query under each setting. First, Table 1 below provides a quick high-level comparison of the four modes and their behaviors.

tool_choice Options at a Glance

To summarize the differences, the table below compares the four tool_choice modes in terms of tool usage policy, how the tool is selected, and the LLM’s response flow:

Mode | Tool Usage Policy | LLM Behavior & Call Flow
---- | ----------------- | ------------------------
none | Tools are disabled. The model will not call any tool. | Direct answer only: the model responds from its own knowledge in one step (no function calls).
auto | Tools are optional. The model decides whether to use a tool and which tool to use. | Adaptive flow: if no tool is needed, the model answers directly in one step; if a tool is needed, it makes one function call and then returns a final answer using the tool’s result (two-step interaction).
required | Tool use is mandatory. The model must call at least one of the provided tools. | Tool-first flow: the model always responds with a function call first (choosing the most relevant tool). After getting the tool’s output, it provides the final answer (always a two-step process).
specific tool | A particular tool is forced. The model must call the specified function (the exact format depends on the LLM provider). | Fixed tool flow: the model calls the designated tool (one function call) and then delivers the final answer. No other tools will be used (two-step process, assuming the tool is called as instructed).

Table 1: Comparison of tool_choice settings and their effect on model behavior.

Each mode influences whether the LLM will use an external tool, and if so, how it selects the tool and structures its response.

With this overview in mind, let’s explore each mode in detail with examples to illustrate how the LLM’s response flows differ.

"none": No Tool Usage

Figure 1: tool_choice="none" – The model never calls external tools. The user prompt goes to the LLM, and it directly returns an answer without any function call.

When tool_choice is set to "none", you are essentially telling the LLM not to use any tools at all. Even if functions are defined in the API call, this setting disables them. The model will treat the conversation like a standard prompt-completion task using only its internal knowledge and reasoning.

  • Behavior: The LLM will always produce an answer directly, and never output a function call. The entire interaction is a single step: the user asks a question or gives a prompt, and the model responds in natural language.
  • LLM call structure: Only one call to the model is needed. Since no tool is invoked, there’s no follow-up call. The flow is simple: User prompt → LLM → Answer. Figure 1 above illustrates this straightforward flow. (A minimal request sketch follows this list.)
  • Use case: Use tool_choice="none" when you want to ensure the model doesn’t rely on external data or functions. For example, if you’re generating creative content (stories, poems) or handling queries that the model should answer from its training data only, disabling tools can be appropriate. It’s also useful if your environment cannot make external calls (for instance, an offline setting) or if you want to prevent certain tools from being used due to trust or safety reasons.
  • Pitfall to watch: With no tools allowed, the model might hallucinate answers for queries that would normally require an external lookup or calculation. For instance, if asked for the current weather or a live fact, a tool-disabled model might guess or produce outdated information. As a developer or product manager, you should ensure that if tool_choice="none", either the queries truly don’t need external info or the model is instructed to politely refuse such requests (rather than invent an answer).
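
As a quick illustration, a tools-disabled call might look like the following sketch (reusing the hypothetical client and tools from the earlier example): even though tool definitions are passed, the model can only answer in plain text.

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about rain."}],
    tools=tools,          # definitions may still be present...
    tool_choice="none",   # ...but the model is not allowed to call them
)
print(response.choices[0].message.content)  # always a plain-text answer
```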

In summary, "none" mode gives the most straightforward behavior—the LLM acts on its own. However, it forfeits the accuracy and capability benefits that tools can provide for certain tasks.

"auto": Model Decides on Tool Use

The default and most flexible setting is tool_choice="auto", which allows the model to decide for itself whether to use a tool for a given query. Think of this as giving the model the freedom to call a function if it believes one is needed. If no function seems relevant, it will just answer normally; if a function can help, the model will call it.

Figure 2: tool_choice="auto" – The LLM can either answer directly or invoke a tool. In this scenario, the model determined a tool was needed, so it issues a function call, receives the result, and then produces the final answer. (If no tool was needed, the flow would look like Figure 1, going straight from prompt to answer.)

Under auto mode, the model evaluates the user’s prompt and the available tool definitions to decide on the best course:

  • Behavior: The LLM dynamically chooses whether to use a tool. It will call a function only if it concludes that using that tool is necessary to fulfill the request. Otherwise, it will answer directly with no tool. In effect, "auto" lets the model use its judgement. By default, many LLMs are trained to use provided functions when relevant – for example, OpenAI’s GPT-4 models will consider function calling whenever the prompt and instructions seem to require it.

  • Tool selection: If the model decides to use a tool, it also chooses which tool to call. It will pick the one whose purpose (from the function description) best matches the user’s request. For instance, if the user asks, “What’s 15 times 37?” and you have a calculator function available, the model will likely choose that function and pass the appropriate arguments. If multiple tools are available, the model’s internal reasoning (guided by the tools’ descriptions and the system instructions) will determine the most relevant function.

  • LLM call structure: This mode can result in either a one-step or two-step interaction:

    • If no tool is called, the flow is just like normal: User prompt → LLM → Answer (similar to Figure 1).
    • If a tool is called, the interaction becomes two-step: User prompt → LLM (model outputs a function call) → [Tool executes externally] → LLM (model receives tool result and returns final answer). In the first step, the model’s answer will contain a structured JSON payload indicating the chosen function and its arguments instead of a direct answer. The developer (or system) then executes the tool and sends back the tool’s output for the model to incorporate into a final answer (see the code sketch after this list).
  • Use case: auto is ideal for general-purpose assistants and chatbots. For example, a customer support chatbot might mostly answer from a knowledge base, but occasionally use a tool like a database lookup for an account status if asked. Or a conversational assistant like ChatGPT in browsing mode uses "auto" logic – it will answer straightforward questions from memory, but if asked to fetch live information (like today’s news or weather), it will call the appropriate function. This mode provides a good balance of efficiency and capability: the model uses tools when needed but doesn’t incur the overhead if it can answer on its own.

  • Pitfalls: While flexible, "auto" can sometimes lead to the model not using a tool when it should, or vice versa, if the prompt isn’t clear. The model might attempt to answer from memory (potentially incorrectly) if it’s not absolutely sure a tool is necessary. To mitigate this, prompt/system instruction design is key – you should clearly instruct the model that tools are available and when to use them (more on this later). Another edge case is when multiple tools are available and the model picks the wrong one or an imperfect one; careful function descriptions can help guide the model’s choice. Overall, monitoring the model’s behavior in auto mode and refining instructions or tool definitions is important to get the desired outcomes.
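
In code, the adaptive flow reduces to one branch: if the model returned tool calls, execute them and ask for the final answer; otherwise its first reply is the answer. A minimal sketch, reusing the earlier hypothetical setup (run_tool is a placeholder dispatcher you would implement):

```python
import json

messages = [{"role": "user", "content": "What's 15 times 37?"}]
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools, tool_choice="auto"
)
msg = response.choices[0].message

if msg.tool_calls:  # the model decided a tool is needed
    messages.append(msg)  # keep the assistant's tool-call turn in the history
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = run_tool(call.function.name, args)  # hypothetical dispatcher
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    # second step: the model turns the tool output into a final answer
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)  # one-step flow: the model answered directly
```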

In essence, "auto" mode empowers the LLM to be smart about tool use. It’s the recommended default when you want the model to use tools only on an as-needed basis. It keeps interactions efficient (no unnecessary calls) while still leveraging tools to improve accuracy and capability when required.

"required": Tool Use is Mandatory

Sometimes, you may want to ensure the model always uses a tool for certain queries or in a certain application. This is where tool_choice="required" comes in. In this mode, the LLM is forced to call a function as part of its answer every time, even if the query might seemingly be answerable without one. Essentially, “required” means at least one tool call is guaranteed in the model’s response.

Figure 3: tool_choice="required" – The model is forced to invoke a tool in its answer. The flow always begins with a function call to one of the available tools (chosen by the model), then the tool result is used to produce the final answer.

Here’s what happens under the hood with required:

  • Behavior: The LLM will always invoke a tool first, no matter what the user asks. It does not have the option to answer directly. Even if the question is something simple like “Who wrote 1984?”, if tool_choice is required and you’ve provided, say, a wiki_search function, the model will call that function to look up the answer (rather than relying on its internal knowledge). This setting was introduced to force a function call in cases where you want external verification or data retrieval for every query. Under required the model will determine which function is relevant and call it, even if none is a perfect fit.
  • Tool selection: The model still has to pick which tool to call (unless you only gave it one tool). With required, it will choose the function that best matches the user’s request. If multiple tools are available, the model’s first response will contain a call to one of them. Importantly, because it must call something, the model might sometimes call a tool with somewhat tenuous relevance if the prompt doesn’t obviously match any provided function – this is a potential pitfall we’ll discuss shortly.
  • LLM call structure: Two steps are always needed. The first LLM response will be a function call (never a direct answer), and a second call to the LLM (after executing the tool) will produce the final answer. In other words, every query triggers a tool-followed-by-answer flow. The model will not output a final answer until at least one tool has been called and its result has been fed back (a compressed sketch follows this list).
  • Use case: Use required when you always want to ground the model’s answer in an external process. A real-world example is a chatbot connected to a company database: you might require it to use a search_knowledge_base function for every user question, to minimize any chance of hallucinating an answer. Even if the question seems simple, the bot will fetch a corroborating answer from the database via the tool. Another example is a math or calculation assistant: you might force any arithmetic to go through a calculator API, ensuring the model doesn’t do math in its head (which can sometimes be wrong) but instead always computes via the trusted tool. Essentially, required is common in high-accuracy or compliance-focused applications, where you prefer the model to not rely on its own knowledge at all, but to always fetch or compute using a tool.
  • Pitfalls: One downside is that if a user’s request truly doesn’t need a tool, the model might still awkwardly use one. For instance, if asked “What is the capital of France?” and you have a wiki_search tool, a required-setting model will dutifully call wiki_search("capital of France") rather than just answering "Paris" from memory. This can introduce slight latency (the tool call is an extra step) and might seem inefficient. Worse, if none of the tools matches the query well, the model could make a random or irrelevant function call just to satisfy the requirement. (For example, I have observed that with required mode the model might call an unrelated function with empty or guessed parameters if it has no better option – simply because it isn’t allowed to respond without using a tool.) To mitigate this, you should ensure the set of tools you provide covers the expected queries well, and instruct the model properly. In practice, required is best used when you have a specific function that really should be used for every query, or when you trust the model’s judgment that one of the available tools will improve the answer.
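
Operationally, required looks like the auto sketch above without the else branch, because the first response is guaranteed to carry a tool call. A compressed sketch under the same assumptions:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Who wrote 1984?"}],
    tools=tools,
    tool_choice="required",  # first reply will be a function call, never text
)
calls = response.choices[0].message.tool_calls
assert calls  # always present under "required"; execute and feed back as before
```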

In summary, required guarantees an extra level of external grounding at the cost of flexibility. It’s a powerful setting for certain applications, but use it with care: make sure it fits your use case and that your tools (and prompts) are prepared for the model to call a function every time.

Forced Specific Function: Targeting One Tool

The fourth configuration is when you want the model to use a particular tool and no other. This isn’t a single keyword like the others; instead, you specify the exact function name in the tool_choice parameter (often via a JSON object). We’ll call this forced specific function mode. Essentially, it’s saying: “No matter what the user asks, have the model call this one specific function as part of its answer.”

In practice, using this mode might look like setting tool_choice to a value such as:

{ "type": "function", "name": "my_function" }

This would force the model to call my_function (assuming my_function is one of the tools you defined). Note that the exact shape of this object varies by provider – OpenAI’s Chat Completions API, for example, nests the name one level deeper, as { "type": "function", "function": { "name": "my_function" } } – so check your provider’s documentation. The model will not consider any other tools, nor will it consider skipping the tool – it must use that one.
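
Continuing the hypothetical weather tool from earlier, forcing it in the Chat Completions style might look like this sketch:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in New York tomorrow?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)
call = response.choices[0].message.tool_calls[0]
print(call.function.name)       # always "get_current_weather"
print(call.function.arguments)  # e.g. '{"location": "New York"}'
```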

Figure 4: Forced Specific Tool – The LLM is constrained to call a particular function (Tool X in this diagram). The user prompt triggers a call to the specified tool, then the tool result is used by the LLM to generate the final answer.

Key points about forcing a specific function:

  • Behavior: The LLM will always call the specified tool in response to the prompt. It behaves similarly to the "required" mode in that a function call is guaranteed, but here it’s not just “any tool” – it’s a particular one that you’ve dictated. If the user’s question is not actually relevant to that tool, the model might still attempt to use it in some way (since it has no choice).
  • Tool selection: There is no real selection by the model – the selection has been made for it. The model’s job is to figure out what arguments to pass to the forced function so that it can help answer the query. For example, if you forced the get_current_weather function, and the user asks, “What’s the weather in New York tomorrow?”, the model will call get_current_weather with the location set to "New York" (and perhaps date = tomorrow). If the user instead asked, “Who won the 2010 World Cup?”, the model still must use get_current_weather – which is obviously not ideal. (It might then produce a useless call like get_current_weather("2010 World Cup") if it’s forced; so, forcing a specific tool is only sensible in the right context.)
  • LLM call structure: As with any scenario where a tool is used, this will be a two-step process. The first model output will be the function call (to the specific tool), and after executing it and getting results, the second model output will be the final answer that hopefully addresses the user’s query. There is no direct answer in the first step.
  • Use case: Forcing a specific function is useful in very controlled workflows. One use case is when your application has already determined the exact tool needed based on external logic or the conversation context, and you want the LLM to just execute that tool. For instance, imagine an AI that assists with travel booking: if the user clicks a "Find Hotels" button, your system might set tool_choice to the search_hotels function so that the next model response definitely calls that function (perhaps with details filled from the conversation). Another scenario is testing or debugging – if you’re developing a new function and want to ensure the model can call it correctly, you might force the model to use it during trials. It’s also used in multi-step agent systems where an outside planner decides “Now the LLM should invoke Tool X” and thus you force that action. Product managers might use this mode to keep a conversation on a specific track (e.g., always log user info via a logging function at a certain turn).
  • Pitfalls: The major risk is mismatch between the forced tool and the user’s actual need. If the user’s request doesn’t logically require that specific function, the model will still try to use it, leading to irrelevant or failed calls. For example, forcing a calculator function for every query means the model will call the calculator even when asked “Tell me a joke” – obviously not useful. Thus, you should only use the specific tool mode when you’re confident the query should go through that tool. Often this is managed by your application logic (like the UI example where the user explicitly triggered a certain action). Also, as with required, there’s a performance aspect: you always incur the overhead of that function call even if it wasn’t necessary. Finally, ensure you format the tool_choice parameter correctly when forcing a function (the JSON format must match the API’s specification); a mistake in the name or format could result in an error or the model not doing what you expect.

In short, forcing a specific tool gives you tight control – the LLM becomes a driver for one particular function. It’s powerful for integrating the LLM as a controller in a larger system where you already know what action is needed. Just use this mode sparingly and in the right contexts, because it removes the model’s ability to adapt to unforeseen queries.

The Importance of Instructions and Prompt Design

No matter which tool_choice mode you use, one thing remains crucial: how you prompt the model and instruct it about the tools. The system message (or initial instruction) and the tool descriptions you provide have a huge impact on whether the model uses tools effectively. Even the smartest LLM won’t use a tool correctly if it doesn’t understand when to use it or how to use it.

Here are some best practices for prompt design and instructions in a tools-enabled setup:

  • Clearly explain the tool(s) in the system message: Don’t assume the model will figure it out from the function signature alone. A good system prompt might say: “You are a helpful assistant. You have access to the following tools: Weather API (use it to get current weather), Calculator (use it for math), and WikiBrowser (use it to look up facts). Use these tools when necessary to answer the user’s questions. Only call a tool if it’s needed, otherwise answer directly.” This kind of guidance frames the model’s decision-making. In fact, official guidance suggests including a sentence like “Only use the functions you have been provided with” in the system message to prevent the model from ignoring the tools. Also, if a tool has a specific usage trigger (e.g., search_hotels when the user asks about hotels), mention that: for example, “When the user asks to find a hotel, call the search_hotels function.” Such explicit cues greatly increase the chance the model will invoke the tool appropriately (an example system message appears after these tips).
  • Provide descriptive function metadata: The tool definitions (function name, description, parameters) are part of the prompt the model sees. Write the function descriptions in natural, clear language that tells the model what the tool does and when to use it. For example, a function for weather might have a description: “Gets the current weather. Use this when the user asks about weather or temperature.” The better the description aligns with user intents, the more likely the model will pick the right tool at the right time. Remember that these descriptions consume tokens in the prompt, so be concise but informative.
  • Handle ambiguous inputs with instructions: Sometimes the user’s query might be vague about what they want. A good system prompt can instruct the model to clarify or choose a tool wisely. For example: “If the user’s request is ambiguous, ask a clarifying question before choosing a tool or answering. Don’t make assumptions about missing information.” This prevents the model from either calling a tool with incomplete data or from guessing an answer.
  • Reinforce the mode if needed: If you’re using tool_choice="required", you might add a line in the instructions explicitly saying “Always use at least one of the available tools to find the answer before responding.” Conversely, if tool_choice="none", you might state “Do not call any functions in your response; just answer directly.” Often the API will enforce it anyway, but the model’s behavior is better when the instructions and the tool_choice setting align. In some cases, if a model is not following the desired pattern (e.g., not calling a tool in auto mode when it should), adding a firmer instruction in the system message can help correct that.
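
Putting the first tip into practice, a tools-aware conversation might open like this (a sketch; the tool names are the hypothetical ones from earlier and the wording is illustrative, not canonical):

```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant with access to these tools: "
            "get_current_weather (current weather for a city) and calculator "
            "(arithmetic). Only use the functions you have been provided with. "
            "Only call a tool if it is needed; otherwise answer directly. "
            "If the user's request is ambiguous, ask a clarifying question first."
        ),
    },
    {"role": "user", "content": "Should I bring an umbrella in London today?"},
]
```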

In summary, system instructions and prompt design are what enable the model to use tools effectively. The tool_choice setting gives the high-level rule, but the fine-grained decision of when exactly to invoke a tool and how to use the result comes from how the model has been prompted and the information it has about the tools. Taking the time to craft a good system message and clear tool descriptions is essential—this can make the difference between a model that ignores a perfectly good tool and one that uses it to produce a far more accurate answer.

Parallel Tool Calls and tool_choice

So far, we’ve considered scenarios where the model either doesn’t call a tool, or calls one tool at a time. However, some advanced models and frameworks support parallel tool calls – meaning the LLM can request multiple tools in one go. This feature is controlled by a separate flag (often called parallel_tool_calls) in systems that support it. It’s worth understanding how parallel_tool_calls interacts with the tool_choice modes, as it can affect the execution flow and performance of your application.

First, what does parallel tool calling look like? Imagine a user asks a complex question that could benefit from using two tools. For example: “Compare the current weather in New York and London, and tell me who was the president of the US when the colder city’s temperature was last freezing.” This contrived query might require both a weather API and a historical knowledge lookup. A model with parallel call capability might do something like: call the weather API for New York and London simultaneously, and also call a history tool if needed – all in one response – then wait for all results to return before formulating the final answer.

Figure 5: Parallel tool calls – An example conceptual flow where the LLM requests multiple tools at once. In this illustration, the LLM sends out two function calls in a single step (to Tool A and Tool B concurrently), receives both results, and then produces one final answer using both pieces of information. Parallel calls can save time when multiple independent queries are needed from different tools.

Now, how does this relate to tool_choice?

  • Auto mode with parallel: In auto mode, if parallel calls are allowed and the model is capable, it may decide to call more than one tool at the same time to answer a query. The default behavior for many models (like Claude or some open-source models) is actually to allow multiple tool calls if needed, unless constrained otherwise. For example, given a multi-part question, an LLM could output a response that includes two function call requests in a single turn. The developer would then execute both and feed the results back. With parallel_tool_calls=True, these can be done concurrently, reducing total latency. The key point is: tool_choice="auto" doesn’t limit the number of tools – it just lets the model decide freely. If the model thinks two tools are both needed (and it’s been trained or instructed on parallel usage), it can use two. Without parallel support, the model might instead call one tool first, get the result, then in a second turn call the second tool (sequentially).
  • Required mode with parallel: required simply mandates at least one tool, but doesn’t forbid more. So with parallel calls enabled, a required model might also call multiple tools in one go if that makes sense. The model will at least call one, but it could decide to call two or three concurrently if the question warrants it. For example, if asked to retrieve two different pieces of information that require different tools, a parallel-capable LLM in required mode might just fire off both tool calls together (since it knows it must use tools anyway). This would still satisfy the “at least one” requirement – in fact it would be “at least one and possibly several.” If parallel calls are disabled, a required model would have to do it in steps (one tool first, then the next) if multiple functions are needed.
  • Forced specific with parallel: If you force a specific tool, parallel calls are basically moot. The model is only allowed to call that one tool, so it won’t be calling multiple different functions in one response. It could theoretically call the same function multiple times in parallel, but in practical terms most implementations wouldn’t do that in one step (and most use cases wouldn’t require simultaneous duplicate calls). So, tool_choice forcing a single tool essentially means only that tool is used – parallel execution doesn’t come into play unless, say, the model were to call that function for two different inputs at once (an unlikely scenario without specific prompting).
  • None mode with parallel: If tools are disabled (none), then parallel setting has no effect at all – no tools will be called, period.

Why use parallel calls? The benefit is mainly efficiency. If a single query requires fetching from multiple sources, doing it in parallel cuts down waiting time, and you still only need two LLM calls in total (one to get all tool requests, one to get the answer after providing results). In contrast, with strictly sequential calls, you might end up doing more LLM turns. For instance, without parallel: the model calls Tool A (1st LLM output), then you call the model again with A’s result, then it calls Tool B (2nd LLM output), then you call model again with B’s result for final answer – that’s three LLM outputs needed. With parallel, the model could have called A and B together in the 1st output, and you’d only need one more call for the final answer, saving a step.
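
Handling a parallel-capable response mostly means looping over every tool call in the single assistant turn and returning one result message per call; the execution itself can happen concurrently on your side. A sketch under the same assumptions as before (parallel_tool_calls is the OpenAI Chat Completions flag; run_tool remains a placeholder dispatcher):

```python
import json
from concurrent.futures import ThreadPoolExecutor

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    parallel_tool_calls=True,  # allow several tool calls in one assistant turn
)
msg = response.choices[0].message

if msg.tool_calls:
    messages.append(msg)
    # independent calls can run concurrently to cut latency
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda c: run_tool(c.function.name, json.loads(c.function.arguments)),
            msg.tool_calls,
        ))
    for call, result in zip(msg.tool_calls, results):
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": json.dumps(result)})
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
```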

When combining with tool_choice: The interplay can be summarized as: tool_choice governs if/which tools at a high level, while parallel_tool_calls governs how many at once. A sensible combination might be:

  • Use auto + parallel_tool_calls=True for a smart agent that can fetch multiple pieces of information simultaneously when needed.
  • Use required + parallel_tool_calls=True in a scenario where the model must always consult tools, and if multiple checks are needed it can do all in one go.
  • Use none or forced single-tool with parallel_tool_calls=False (or True, doesn’t matter) when you either don’t need tools or only want one tool ever.

Finally, keep in mind that parallel calls add complexity in handling responses. You, as the developer, will receive multiple tool outputs that you need to manage and then feed back to the model in a coherent way. Ensure your application aggregates or formats these results properly before the model’s final answer step. And test thoroughly: parallel execution might expose race conditions or combined output formatting issues that wouldn’t appear in one-at-a-time calls.

Common Pitfalls and Tips

To wrap up, let’s highlight some common pitfalls developers face with tool-using LLMs, and how to avoid them:

  • Model doesn’t call the tool when it should: This often happens in auto mode if the model isn’t sufficiently guided. If you see the model answering from memory when you expected it to use a tool, revisit your prompt and tool descriptions. Make sure the system message explicitly authorizes tool use and that the tool’s purpose is clear. Models might also avoid calling tools if the user query is borderline – consider giving gentle reminders like “If the question involves math or current events, use the appropriate tool.” You can also experiment with adding a few-shot example where the assistant does use the tool.
  • Model calls a tool when it shouldn’t: The opposite can occur too. Perhaps the model is overzealous and calls a calculator for trivial math it could do itself, or calls a search API for a question it already knows the answer to. This might be due to an overly aggressive instruction or just the model playing it safe. You can tweak instructions to clarify when not to use tools (e.g., “Only use the tool if it’s truly needed – if you already know the answer or it’s simple, answer directly.”). Additionally, ensure you’re using the latest model versions; newer fine-tunings often balance tool use better.
  • Irrelevant or malformed tool calls under required: As discussed, a required setting can lead to the model making a best-guess function call even if it’s not a great fit. If you notice nonsense calls, you might need to adjust your set of tools or your prompt. One strategy is to include a general-purpose search or knowledge tool as a “catch-all” so the model always has at least one tool that could apply to any question. That way, when forced to choose, it chooses the search tool rather than something nonsensical. Also, improve the tool descriptions to widen their apparent applicability if needed.
  • Forcing the wrong tool: If you accidentally use the wrong function name in a forced specific call, the API might throw an error or the model will simply not comply (since it can’t find that name). Double-check spelling and formatting for forced tool calls. Additionally, ensure that the forced tool can actually handle the user’s query; otherwise, you’ll get poor results. It may be better in some cases to use required (any tool) rather than forcing a specific one, if you’re not 100% sure which tool is needed.
  • Token cost and latency: Enabling tools adds overhead. All those function definitions live in the prompt (costing tokens), and calling a tool means an extra round-trip. If you use required indiscriminately, you’ll incur that extra cost even when it wasn’t necessary. Be mindful of the user experience: if every question causes a tool call, responses will generally be slower. Using auto where appropriate can help maintain speed by skipping tools when they aren’t needed. Conversely, if you care mostly about correctness (say, an enterprise setting where accuracy trumps speed), then the extra latency of required/tool calls is acceptable.
  • Parallel call complexity: If you venture into parallel calls, watch out for the complexity it adds in combining results. Make sure the model knows how to handle receiving multiple tool outputs at once. Often, the model will get a list of results in a single input. You may need to format that input clearly, like: “Result from Tool A: ... Result from Tool B: ...” so the model can distinguish them. Some developers choose to have the model call a single aggregator function that then calls others, to keep the logic simpler. Parallel tools are powerful but add debugging overhead – implement them only if you truly need the performance boost for multi-tool queries. (A tiny formatting sketch follows.)
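
As a small sketch of that labelled format, assuming your stack feeds tool outputs back as plain text rather than structured tool messages, and that you have collected outputs as a list of (tool_name, output) pairs:

```python
# Label each result so the model can tell them apart in a single input.
tool_report = "\n".join(
    f"Result from {name}: {output}" for name, output in outputs
)
messages.append({"role": "user", "content": tool_report})
```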