How to Get Stable Outputs from LLM APIs
Building a structured and predictable product using an LLM API is challenging because, unlike traditional APIs, LLM outputs are non-deterministic. In this post, I’ll share what I learned while developing a chatbot that manages financial assets and how I made it stable.
As a running example, consider a chatbot that manages two assets, USD and KRW. The user can check balances, transfer funds, or exchange one currency for the other.
Building a Prototype with LangGraph
Using LangGraph, it’s easy to build a working chatbot prototype. With create_react_agent, you can create an agent by simply adding a prompt and a few tools.
For example, the prompt could be:
“You are a friendly wallet chatbot that manages KRW and USD.”
Then add tools like getBalance, transferAsset, and exchange. LangGraph automatically handles message history, which makes prototyping fast and simple.
Limitations of the Prototype
However, several issues soon appeared. The chatbot occasionally failed to respond correctly to requests it had handled well before. This happened because the entire behavior depended on a single, huge prompt. For example, improving how asset balances were displayed sometimes broke the transfer feature.
Another issue was incorrect use of chat history. Suppose a user checked their balance and had no assets. After transferring some assets and asking again, the bot would still say there were no assets — because it referred to its previous response instead of calling the balance tool again.
Solutions
To fix the first issue, I split the single huge prompt into several focused LLM calls with a plain database query in between. For a balance inquiry, the flow became:
- One to detect the user’s intent.
- If the intent was “check balance,” query the database.
- Then call another LLM to format the balance output nicely.
This modular approach prevented one feature from breaking another; a rough sketch of the flow follows below.
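This is not the exact production code, just a minimal outline assuming the OpenAI Python SDK, a hypothetical query_balances database helper, and gpt-4o-mini as an example model.

```python
from openai import OpenAI

client = OpenAI()

def detect_intent(history: list[dict]) -> str:
    """One focused LLM call whose only job is to classify the user's intent."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Classify the user's latest request as one of: "
                        "check_balance, transfer, exchange, other. "
                        "Reply with the label only."},
            *history,  # intent detection benefits from the full conversation
        ],
    )
    return resp.choices[0].message.content.strip()

def handle_message(history: list[dict]) -> str:
    intent = detect_intent(history)
    if intent == "check_balance":
        balances = query_balances()      # hypothetical database query, not an LLM call
        return format_balance(balances)  # separate LLM call, sketched in the next snippet
    ...  # transfer, exchange, and fallback branches would follow the same pattern
```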
For the second issue, I added control over history usage. When detecting user intent or extracting parameters, history helps — users often give instructions across multiple messages. But when generating final responses, history should be excluded. Otherwise, the LLM might reuse outdated information. For example, when formatting a balance result, I only provided the tool output as input — not any past messages.
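Continuing the sketch above, the formatting step deliberately receives only the fresh tool output, never the chat history, so stale answers cannot leak back into the final response:

```python
def format_balance(balances: dict) -> str:
    """Turn a raw balance dict into a friendly reply, without any chat history."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are a friendly wallet chatbot. Present the user's "
                        "balances clearly, one line per asset."},
            # Only the tool result goes in -- no prior messages.
            {"role": "user", "content": f"Current balances: {balances}"},
        ],
    )
    return resp.choices[0].message.content
```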
With these changes, the chatbot became much more reliable. Along the way, I also switched from LangGraph to OpenAI’s official library for finer control. LangGraph could achieve the same result, but after weighing the trade-offs I chose to call the API directly; I’ll cover that comparison in the next post.