At ThoughtSpot, we built the most accurate Natural Language Search solution for real-world enterprise use cases in the industry. Our application did a great job of returning accurate answers to fully-formed questions. Next, we wanted to improve the experience for business users who are not well-versed in the underlying dataset. Their queries looked simple but were often vague, complex and incomplete. They also had non-data queries like "save the answer to a liveboard". Supporting such questions requires extensibility, a deep technical understanding and the ability to ask clarifying questions. We hit a wall when trying to push user experience and accuracy further and realized the limits of what we could do with a single-LLM-query paradigm. Hence, we decided to transform our architecture and adopt an agentic approach.
LangGraph is a powerful framework for building agentic applications with large language models (LLMs). We use the LangGraph library to power our conversational agents. It makes the user experience more natural, intelligent and accurate, while keeping the infrastructure easy to extend and maintain. The library is invaluable, but we encountered several challenges along the way.
In this post, we’ll share some best practices, tips, and key considerations for developing production-ready conversational agentic systems with LangGraph. This article assumes that you are familiar with LangGraph concepts and have explored the library. We hope these insights will help you refine your agentic AI systems.
What is LangGraph?
LangGraph is a Python library specifically designed for building agentic systems. It has gained popularity for its granular controls, flexible workflow design, support for streaming, and its built-in persistence layer.
What is an agentic AI system?
Agentic systems are AI systems that let LLMs control application workflow. The LLM autonomously decides the steps required to accomplish a task based on the input and context.
Here are some scenarios where an LLM can control the application workflow:
decide the route between two potential paths
decide which tools to call
decide whether the generated answer is sufficient
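For instance, the first scenario (routing between two paths) maps naturally onto a conditional edge whose branch is chosen from the LLM's decision stored in state. A minimal sketch, with the LLM call stubbed out so the example stays self-contained (state fields and node names are illustrative, not our production graph):

```python
from typing import Literal
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    route: str
    answer: str

def classify(state: State) -> dict:
    # In a real system an LLM call decides the route; stubbed here for brevity.
    route = "sql" if "revenue" in state["question"].lower() else "chitchat"
    return {"route": route}

def generate_sql(state: State) -> dict:
    return {"answer": "SELECT ..."}  # placeholder for the SQL-generation path

def small_talk(state: State) -> dict:
    return {"answer": "Happy to help with your data questions!"}

def pick_path(state: State) -> Literal["generate_sql", "small_talk"]:
    # The decision stored in state controls which node runs next.
    return "generate_sql" if state["route"] == "sql" else "small_talk"

builder = StateGraph(State)
builder.add_node("classify", classify)
builder.add_node("generate_sql", generate_sql)
builder.add_node("small_talk", small_talk)
builder.add_edge(START, "classify")
builder.add_conditional_edges("classify", pick_path)
builder.add_edge("generate_sql", END)
builder.add_edge("small_talk", END)
graph = builder.compile()

print(graph.invoke({"question": "What was revenue last quarter?"}))
```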
Why Agentic AI?
Traditional LLM applications are often designed with a rigid structure where the application code dictates the flow, thereby limiting the LLM to specific tasks like summarizing or classifying text. The LLM's vast potential for reasoning and handling diverse queries is underutilized with this approach.
LLMs have powerful reasoning capabilities and can deal with a wide range of queries. However, on their own, they can only generate text. They cannot do anything that the application isn’t prepared to do. If the system expects the LLM to generate SQL queries but the user types in ‘good morning’, the system breaks.
Agentic systems flip this dynamic. They place the LLM at the forefront, transforming it into the "brain" of the operation. The application code becomes the "limbs", carrying out the LLM's instructions. This shift unlocks greater flexibility and power, enabling more human-like interactions. The LLM can dynamically adapt to different situations, reason through complex problems, and generate creative solutions, while the application code provides the necessary tools and executes its commands.
Top tips and practices
Here are some tips, tricks, and best practices for building production-ready agentic AI systems with LangGraph.
1. Streaming first
LLM calls can introduce latency, and this compounds in multi-agent workflows where several calls are chained. It's essential for users to feel that the system is making progress and hasn't stalled. The Server-Sent Events (SSE) protocol is an excellent way to stream content to users, communicating each step the agent takes. This provides users with real-time feedback and helps create a sense of progress.
A streaming-first approach also calls for a fundamental change in mindset when designing systems. Instead of a clean request and response flow, you may stream events from multiple workflows that are running concurrently in the system. It is important to define a standardized interface for events. The events must have sufficient identifiers to enable the client to consolidate the streams correctly.
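As a rough sketch of what such an interface could look like, the snippet below wraps LangGraph's streaming API in an SSE generator; the envelope fields (conversation_id, run_id and so on) are illustrative, not a prescribed schema:

```python
import json

async def stream_agent_events(graph, question: str, conversation_id: str, run_id: str):
    """Yield SSE-formatted strings for every event the graph emits."""
    inputs = {"question": question}
    async for event in graph.astream_events(inputs, version="v2"):
        envelope = {
            "conversation_id": conversation_id,  # lets the client consolidate concurrent streams
            "run_id": run_id,
            "type": event["event"],              # e.g. on_chat_model_stream, on_tool_start
            "name": event.get("name"),
            "data": str(event.get("data", "")),  # stringified for the sketch; serialize properly in practice
        }
        # Server-Sent Events: a `data:` line followed by a blank line per message.
        yield f"data: {json.dumps(envelope)}\n\n"
```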
2. Use custom stream events
LangGraph’s astream_events API returns predefined events, but you can also dispatch custom events using the dispatch_custom_event API. Custom events give you better control over content and help determine how events are handled before they are returned to the client.
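A minimal sketch of dispatching a custom event from inside a node and picking it up on the consumer side (the node, event name, and forward_to_client helper are hypothetical):

```python
from langchain_core.callbacks import dispatch_custom_event

def search_node(state: dict) -> dict:
    # Emit a progress update while the node is still running.
    # (On Python < 3.11, pass the node's config explicitly: config=config.)
    dispatch_custom_event("search_progress", {"message": "Looking up matching columns..."})
    return {"results": ["revenue", "region"]}

# Custom events surface in the event stream with the type "on_custom_event":
#
# async for event in graph.astream_events(inputs, version="v2"):
#     if event["event"] == "on_custom_event" and event["name"] == "search_progress":
#         forward_to_client(event["data"])  # hypothetical SSE helper
```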
3. Design for High-Availability
Keep in mind that a single conversation may be processed by different service instances. Do not assume that one server instance will handle the entire conversation. To allow your system to accommodate this, ensure that all relevant data is stored in the conversation state.
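A minimal sketch of this, assuming a Postgres database that every service instance can reach (the connection string and ids are placeholders, and builder is a StateGraph like the routing sketch above): the graph is compiled with a shared checkpointer, and every call is keyed by the conversation's thread id so whichever instance receives the next turn loads the same state.

```python
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:pass@shared-db:5432/agent"  # placeholder connection string

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    graph = builder.compile(checkpointer=checkpointer)

    # Every instance handling this conversation uses the same thread_id, so the
    # state is loaded from the shared store rather than from local memory.
    config = {"configurable": {"thread_id": "conversation-123"}}
    result = graph.invoke({"question": "Show revenue by region"}, config)
```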
4. Benchmarking
Establish benchmarks to track improvements and prevent regressions. Follow the testing pyramid pattern:
Create global end-to-end benchmarks for overall system performance.
Develop component-level benchmarks, such as testing the accuracy of tool selection within the agent. This will help identify performance bottlenecks and optimize specific system areas.
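A component-level benchmark can be as simple as a parameterized test that checks which tool the agent picks for a given query. A sketch using pytest; the queries, tool names, and select_tool helper are hypothetical:

```python
import pytest

# Hypothetical helper that runs only the agent's tool-selection step and
# returns the name of the tool the LLM chose.
from my_agent.testing import select_tool

TOOL_SELECTION_CASES = [
    ("show revenue by region for 2024", "generate_answer"),
    ("save this answer to my liveboard", "save_to_liveboard"),
    ("what does ARR mean?", "explain_term"),
]

@pytest.mark.parametrize("query,expected_tool", TOOL_SELECTION_CASES)
def test_tool_selection(query, expected_tool):
    assert select_tool(query) == expected_tool
```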
5. Separation of checkpoint and state
LangGraph allows you to set checkpoints for the state of the conversation at different stages. While it may seem tempting to use these checkpoints to revert to an earlier point for edit or delete operations, this can introduce complexity.
A cleaner approach is to handle changes directly within the state object rather than relying on checkpoints. This keeps your application logic simple and manageable.
6. User actions on interactive elements
If your agent generates interactive elements on the UI, ensure that key user interactions, such as clicks and selections, are captured in the conversation state. This allows the agent to incorporate user actions when answering follow-up questions and ensures a coherent user experience.
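One way to do this, assuming a checkpointer-backed graph, is to keep a dedicated, append-only field in the state and write to it from the API layer whenever the client reports a click or selection (the field and function names are illustrative):

```python
import operator
from typing import Annotated
from typing_extensions import TypedDict

class ConversationState(TypedDict):
    question: str
    ui_interactions: Annotated[list, operator.add]  # appended to, never overwritten

def record_ui_interaction(graph, thread_id: str, element_id: str, action: str):
    """Called by the API layer when the client reports an interaction."""
    config = {"configurable": {"thread_id": thread_id}}
    graph.update_state(config, {"ui_interactions": [{"element": element_id, "action": action}]})
```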
7. Error scenarios
Agentic systems can automatically correct errors. Look for scenarios where this ability can be leveraged. For example, if a tool errors out and complains that the date format is incorrect and that it expects the input in MM-DD-YYYY format, the agent can correct the date format and recover on its own.
Even if the error is not recoverable, the agent can communicate the error to the user in an easy-to-understand manner.
Creating granular error codes for the different components is crucial for achieving this.
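A sketch of what this can look like at the tool boundary: catch the failure, attach a granular error code, and return an actionable message so the agent can either retry with a corrected input or explain the problem to the user (the error code and tool are illustrative):

```python
from datetime import datetime
from langchain_core.tools import tool

@tool
def filter_by_date(date: str) -> str:
    """Filter the report to a single day. Expects the date in MM-DD-YYYY format."""
    try:
        parsed = datetime.strptime(date, "%m-%d-%Y")
    except ValueError:
        # A granular code plus an actionable message lets the agent self-correct
        # (reformat the date and retry) instead of failing the whole run.
        return f"ERROR[DATE_FORMAT_INVALID]: expected MM-DD-YYYY, got {date!r}"
    return f"Filtered to {parsed.date().isoformat()}"
```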
8. Build conversation review mechanisms
Invest in tools that allow your team to review past conversations/interactions.
Track user feedback, such as upvotes or downvotes, to help identify what works well and the areas that require improvement. This feedback loop is crucial for iterating and refining your agentic systems.
LangSmith integrates easily with LangGraph and can get you off the ground quickly when analyzing LangGraph workflows.
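Tracing to LangSmith is typically enabled through environment variables before the graph runs; a minimal sketch (the project name is illustrative):

```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "conversation-review"  # illustrative project name

# With these set, LangGraph runs are traced automatically and can be
# reviewed, filtered, and annotated with feedback in the LangSmith UI.
```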
9. Build custom checkpointers
While LangGraph provides a few ready-to-use checkpointer implementations, it is best to build your own so that you can control the schema of the underlying database and leverage it for other purposes like powering a chat history feature or a conversation review system.
If possible, use the Postgres checkpointer implementation as the base class for your own, as it has a number of optimizations built in.
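A rough sketch of that subclassing approach; the put signature below follows recent LangGraph versions, and the review-table helper is hypothetical:

```python
from langgraph.checkpoint.postgres import PostgresSaver

class ReviewablePostgresSaver(PostgresSaver):
    """Reuses the optimized Postgres checkpointer while also maintaining a
    side table that powers chat history and conversation review."""

    def put(self, config, checkpoint, metadata, new_versions):
        next_config = super().put(config, checkpoint, metadata, new_versions)
        self._record_for_review(config, metadata)
        return next_config

    def _record_for_review(self, config, metadata):
        thread_id = config["configurable"]["thread_id"]
        # Hypothetical: upsert a row in our own chat_history table keyed by
        # thread_id, so the same database powers history and review features.
        ...
```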
10. Single vs Multi-Agent
Depending on the functionality, you may have to decide whether it should be implemented as a tool for the main agent or as a sub-agent that works with it.
In general, if the functionality is simple and doesn’t involve a lot of ambiguity, implement it as a tool within a single agent. If the functionality requires multiple tool calls and a complex decision-making process, consider building a sub-agent.
A separate agent means you can have a purpose-specific prompt, state, and the tools crucial for completing complex tasks.
The downside of a separate agent can be a loss of expressiveness, as the coordinator agent may not be able to convey instructions that are specific to the sub-agent’s state or tools. It also increases system complexity and overall latency, and adds more points of failure. Visit this page for a detailed analysis of multi-agent systems.
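When a sub-agent is the right call, a common pattern is to compile it as its own graph and mount it as a node in the coordinator's graph, with a thin adapter translating between the two states. A sketch, where build_sql_subagent is a hypothetical factory returning a compiled graph with its own prompt, state, and tools:

```python
from langgraph.graph import StateGraph, MessagesState

sql_subagent = build_sql_subagent()  # hypothetical factory for the compiled sub-agent

def call_sql_subagent(state: MessagesState) -> dict:
    # Translate the coordinator's state into the sub-agent's input, then fold
    # the sub-agent's final answer back into the coordinator's messages.
    result = sql_subagent.invoke({"messages": state["messages"]})
    return {"messages": [result["messages"][-1]]}

coordinator = StateGraph(MessagesState)
coordinator.add_node("sql_subagent", call_sql_subagent)
# ...remaining coordinator nodes and edges omitted for brevity
```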
11. Dealing with hallucinations
In applications where users are directly exposed to LLM-generated text, hallucinations are always a possibility. While the risk can never be completely eliminated, we can reduce it by using prompting techniques like chain-of-thought prompting and few-shot examples.
If hallucinations persist, adding a verifier node in your agentic workflow to conduct self-reflection can improve the results significantly. In the verifier node, you can make another LLM call to process the generated output and assess its veracity. If the output is inaccurate, the verifier node can inform the agent about the problems. The agent can take corrective actions as needed.
However, this does come at a cost. A verifier node requires us to sacrifice streaming and incur higher latencies. Try to limit invocation of the verifier node to the query patterns that are most susceptible to hallucinations.
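A sketch of a verifier node: a second LLM call judges the generated answer, and a conditional edge routes flagged answers back to the agent for a corrective pass (the state keys, prompt, and model choice are illustrative):

```python
from typing import Literal
from langchain_openai import ChatOpenAI  # any chat model works; OpenAI is used for illustration

verifier_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def verifier_node(state: dict) -> dict:
    prompt = (
        "You are verifying an answer generated for a user's data question.\n"
        f"Question: {state['question']}\n"
        f"Answer: {state['answer']}\n"
        "Reply with VALID if the answer is grounded in the question and available context; "
        "otherwise list the problems you found."
    )
    verdict = verifier_llm.invoke(prompt).content
    return {"verdict": verdict}

def after_verification(state: dict) -> Literal["agent", "__end__"]:
    # Send problems back to the agent for another attempt; otherwise finish.
    return "__end__" if state["verdict"].strip().startswith("VALID") else "agent"
```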
12. Training the agent
Crafting the right prompt for the agent node is critical to the success of the application. Use chain-of-thought prompting and few-shot examples to teach the agent how to respond to different query patterns and to set the tone for the language it should use.
To improve tool selection and coordination, add few-shot examples that involve calling multiple tools to accomplish a task.
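A sketch of what a few-shot block in the agent's system prompt could look like, including a multi-tool example (the tool names and wording are illustrative):

```python
AGENT_SYSTEM_PROMPT = """You are a data assistant. Think step by step before acting.

Examples:

User: show revenue by region
Thought: This is a complete data question; I can answer it directly.
Action: generate_answer(query="revenue by region")

User: save that to my sales liveboard
Thought: I need the id of the previous answer first, then I can save it.
Action: get_last_answer()
Action: save_to_liveboard(answer_id=<from previous step>, liveboard="sales")

Keep a concise, friendly tone, and ask a clarifying question when the request is ambiguous.
"""
```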
Conclusion
LangGraph offers a robust framework for building powerful, production-ready agentic systems. By applying the insights in this article, developers can create efficient LLM-based applications with a seamless user experience.