
Building an AI Agent That Lives Inside WhatsApp – For Shopkeepers Who Never Needed a CRM

A lot of small shopkeepers in Tier 2 cities in India are not particularly tech-savvy. Modern CRMs and inventory tools are built around an assumption that does not always hold for them – that the user is comfortable navigating dashboards, menus, multi-step forms, and English-first interfaces. Most of the business owners I have spoken to are not.

But they all use one piece of software fluently: WhatsApp.

So I asked myself a simple question. What if a shopkeeper’s entire business – orders, billing, customer ledgers, payments – could live inside WhatsApp? No new app to learn. No web portal. No training session. Just message the way they already message every day.

That was the brief. This is the story of how I built it.

What I Built

At a high level, the system is simple.

I built an AI agent backed by a Postgres database. The database holds the things a small shop actually cares about – customers, orders, payments, ledgers, and inventory. On top of the database, I wrote a set of functions you would find in any REST API: create_order, get_order, get_ledger, record_payment, search_customer, and so on. Standard CRUD plus a few business operations.
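To make that concrete, here is a minimal sketch of what three of those functions might look like, with an in-memory array standing in for Postgres. The types, field names, and the balance rule (orders add to what a customer owes, payments subtract) are assumptions for illustration, not the production schema.

```typescript
// In-memory stand-in for the Postgres tables (illustrative only).
type LedgerEntry = {
  customer: string;
  kind: 'order' | 'payment';
  amount: number; // always positive; `kind` decides the sign
  note: string;
};

const ledger: LedgerEntry[] = [];

// create_order: records a sale and returns the order total.
function createOrder(customer: string, item: string, quantityKg: number, ratePerKg: number): number {
  const total = quantityKg * ratePerKg;
  ledger.push({ customer, kind: 'order', amount: total, note: `${quantityKg}kg ${item}` });
  return total;
}

// record_payment: records money received from the customer.
function recordPayment(customer: string, amount: number): void {
  ledger.push({ customer, kind: 'payment', amount, note: 'payment received' });
}

// get_ledger: returns the customer's entries plus the outstanding balance.
function getLedger(customer: string): { entries: LedgerEntry[]; balance: number } {
  const entries = ledger.filter((e) => e.customer === customer);
  const balance = entries.reduce(
    (sum, e) => sum + (e.kind === 'order' ? e.amount : -e.amount),
    0,
  );
  return { entries, balance };
}

createOrder('John', 'Apples', 10, 3); // John now owes 30
recordPayment('John', 20); // John pays 20
console.log(getLedger('John').balance); // 10
```

In the real system each of these would be a SQL query, but the agent only ever sees the function boundary, which is why the storage behind it can change without touching the agents.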

The AI agent is the layer between a WhatsApp message and those functions. A message comes in, a reply goes out, and the agent handles everything in between.

The agent is connected to the WhatsApp Business API, so users talk to it the same way they message anyone else on WhatsApp. They can type in Hindi. They can type in English. They can mix the two. They can send a voice note instead of typing. Whatever feels natural – the agent handles it.

For someone who already runs their shop through WhatsApp, nothing changes about how they use their phone. The phone just got smarter.

How the Agent Is Structured – A Quick Tour

I built this on OpenAI’s Agents SDK, and the part of the SDK that made the design click was its first-class support for multi-agent handoffs.

Here is the shape of the system:

  • Customer-Facing Agent – the front door. Every WhatsApp message lands here first. This agent does not answer business questions itself. Its only job is to understand the type of query and pass it on.
  • Order Agent – handles anything related to orders, bills, payments, and ledgers. It has access to the operational tools.
  • Customer Agent – handles general queries, product questions, and customer information lookups. It can also pull up the transaction history for a specific customer – useful when a shopkeeper wants to check what a particular buyer has ordered or owes.

When the Customer-Facing Agent reads a message, it decides one thing – is this an operational request or a general one? If the user is asking to create a bill, check a payment, or pull a ledger, the message is handed off to the Order Agent. If the user is asking about a product or a customer’s history, it goes to the Customer Agent.

The receiving agent then does its own round of reasoning. It works out specifically what the user wants – create_order, get_order, get_ledger, record_payment – and calls the right function as a tool with the right parameters. The function hits Postgres, returns clean data, and the agent formats a reply in the same language the user wrote in.
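Conceptually, a tool call is just a function name plus JSON arguments. The SDK performs this dispatch for you; the sketch below is a hand-rolled illustration of the idea only, with stubbed functions and a hypothetical ToolCall shape rather than the SDK's real types.

```typescript
// Hypothetical shape of what the model emits when it picks a tool.
type ToolCall = { name: string; arguments: string }; // arguments is a JSON string

// Stubbed business functions; the real ones would query Postgres.
const functions: Record<string, (args: any) => string> = {
  get_order: ({ order_id }) => `Order ${order_id}: 10kg Apples, total 30`,
  record_payment: ({ customer_name, amount }) =>
    `Recorded payment of ${amount} from ${customer_name}`,
};

// Look up the function by name, parse the arguments, and run it.
function dispatch(call: ToolCall): string {
  const fn = functions[call.name];
  if (!fn) return `Unknown tool: ${call.name}`;
  return fn(JSON.parse(call.arguments));
}

console.log(
  dispatch({ name: 'record_payment', arguments: '{"customer_name":"John","amount":20}' }),
);
// Recorded payment of 20 from John
```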

The user, sitting in WhatsApp, just sees a message come back. Three agents and a database disappeared into a single conversational reply.

Building It With the OpenAI Agents SDK

The SDK I used is OpenAI’s TypeScript Agents SDK – @openai/agents. Installation is one line.

npm install @openai/agents zod

The SDK uses Zod for defining tool parameter schemas, so you install both together. Set your API key as an environment variable and you are ready.

export OPENAI_API_KEY=sk-...

That is the entire setup. The SDK is deliberately thin on ceremony.

Defining an Agent

An agent in this SDK is an object with a name, instructions, and a set of tools. The instructions define its role – what it should do and what it should not. The tools are the functions it can call during a run.

Here are the two specialist agents in this system:

import { Agent } from '@openai/agents';

const customerAgent = new Agent({
  name: 'Customer Agent',
  instructions: `You are a helpful assistant for a grocery store.
  Handle product inquiries, availability questions, and customer profile lookups.
  When a customer asks about their purchase history or outstanding balance,
  retrieve and present it clearly.
  Always be concise, friendly, and professional.`,
  tools: [searchCustomer, getLedger],
});

const orderAgent = new Agent({
  name: 'Order Agent',
  instructions: `You are an order management assistant for a grocery store.
  You handle order creation, billing, payment recording, and ledger lookups.
  Before creating an order, confirm the item, quantity, and price with the customer.
  Keep responses clear and structured. Always include the order total
  and order ID in confirmations.`,
  tools: [createOrder, getOrder, recordPayment, getLedger, searchCustomer],
});

What is an Agent?

  • A model paired with a set of instructions that act as a system prompt
  • Instructions define the agent’s role – what it should do and what it should not
  • Each agent has one focused responsibility – the Order Agent handles transactions, the Customer Agent handles lookups
  • Tools are attached to an agent at definition time – tools: […] – and the agent can call any of them during a run. More on how tools are built in the next section.

Building the Function Tools

A tool is the bridge between the agent and your database – it is what gets called when the model decides an action needs to happen.

import { tool } from '@openai/agents';
import { z } from 'zod';

const createOrder = tool({
  name: 'create_order',
  description: 'Creates a new sales order and generates an invoice for the customer. Call this tool when the user wants to record a product sale. Requires the customer name, item, quantity in kilograms, and the price per kilogram.',
  parameters: z.object({
    customer_name: z.string().describe('Full name of the customer placing the order'),
    item: z.string().describe('The product being purchased, e.g. Apples, Mangoes, Bananas'),
    quantity_kg: z.number().describe('Quantity of the item in kilograms'),
    rate_per_kg: z.number().describe('Price per kilogram in the local currency'),
  }),
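  execute: async (params) => {
    // Call the database-backed function this tool wraps. Each tool wraps its own:
    // get_order, get_ledger, record_payment, and search_customer follow the same pattern.
    // create_order is assumed here to return a confirmation string.
    return create_order(params);
  },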
});

Breaking down each part:

  • name – the identifier the model uses to refer to the tool internally. Keep it snake_case and descriptive – create_order not tool1.
  • description – the most important field. The model reads this to decide when to call the tool. Be specific about the use case. Vague descriptions lead to wrong calls or missed calls.
  • parameters – a structured Zod schema defining exactly what inputs the tool expects. Each field gets a type (z.string(), z.number()) and a .describe() that tells the model what value to extract or ask for. If a required parameter is missing from the user’s message, the model will usually ask for it rather than guess, and the agent’s instructions can reinforce that.
  • execute – the function that runs once the model has supplied all parameters. It receives the validated inputs and, in this system, returns a plain string. The model takes that string and formats it into a natural reply.
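For a sense of what might sit inside execute, here is a minimal sketch for create_order. The insertOrder helper, the starting order ID, and the extra range check are hypothetical stand-ins; the real function would write to Postgres and might validate differently.

```typescript
type CreateOrderParams = {
  customer_name: string;
  item: string;
  quantity_kg: number;
  rate_per_kg: number;
};

// Pure helper: builds the confirmation string the model will rephrase.
function formatConfirmation(p: CreateOrderParams, orderId: number): string {
  const total = p.quantity_kg * p.rate_per_kg;
  return `Invoice created. ${p.customer_name}: ${p.quantity_kg}kg ${p.item} @ $${p.rate_per_kg}/kg = $${total}. Order ID: ${orderId}`;
}

// Hypothetical stand-in for the database insert; returns the new order ID.
let nextOrderId = 1042;
async function insertOrder(_params: CreateOrderParams): Promise<number> {
  return nextOrderId++;
}

// What the body of `execute` could look like for create_order.
async function executeCreateOrder(params: CreateOrderParams): Promise<string> {
  // A schema of plain numbers still accepts zero and negatives, so check the range here.
  if (params.quantity_kg <= 0 || params.rate_per_kg <= 0) {
    return 'Error: quantity and rate must both be greater than zero.';
  }
  const orderId = await insertOrder(params);
  return formatConfirmation(params, orderId);
}

executeCreateOrder({ customer_name: 'John', item: 'Apples', quantity_kg: 10, rate_per_kg: 3 })
  .then(console.log);
// Invoice created. John: 10kg Apples @ $3/kg = $30. Order ID: 1042
```

Returning an error as a string instead of throwing is a deliberate choice in this sketch: the model can read the message and ask the user to correct the input.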

Wiring Up Handoffs

A handoff is how one agent passes a conversation to another – it is what lets you split responsibilities across agents without writing any routing logic yourself.

import { Agent } from '@openai/agents';

const customerFacingAgent = Agent.create({
  name: 'Customer Facing Agent',
  instructions: `You are the entry point for a shopkeeper's WhatsApp assistant.
  Your only job is to understand what the user is asking and hand off to the right agent.
  If the message is about an order, bill, payment, or ledger - hand off to the Order Agent.
  If it is a general query - hand off to the Customer Agent.
  Do not answer anything yourself.`,
  handoffs: [orderAgent, customerAgent],
});

Breaking down how it works:

  • handoffs: […] – listing an agent here makes it a handoff target for the parent agent, which the SDK presents to the model as a tool it can call. No router, no switch statement, no orchestration logic needed – the model decides which agent to hand off to based on the message and the instructions.
  • Context is carried over – when a handoff triggers, the full conversation history passes to the receiving agent. It picks up exactly where the previous agent left off.
  • Agent.create() vs new Agent() – use Agent.create() when wiring handoffs. It lets TypeScript infer the run’s final output type across every agent in the handoff chain, which matters once your agents return structured output instead of plain text.

Running the Whole Thing

import { run } from '@openai/agents';
import type { AgentInputItem } from '@openai/agents';

let history: AgentInputItem[] = [];

// Turn 1 - shopkeeper sends first message
const result1 = await run(customerFacingAgent, [
  ...history,
  { role: 'user', content: 'Create a bill for John' },
]);
history = result1.history;
// history now holds the user message plus everything the run produced,
// ending with a reply such as "Sure! Which item and how much?"

// Turn 2 - follow-up, agent already knows it's about John's bill
const result2 = await run(customerFacingAgent, [
  ...history,
  { role: 'user', content: '5kg apples at $3 per kg' },
]);
history = result2.history;
// history: [...prev, the new user message, the tool call, and a reply such as
// "Invoice created. John: 5kg Apples @ $3/kg = $15."]

  • run(agent, input) – this is how the agent is called. It takes the entry agent and the input, which is either a plain string or an array of conversation items. To continue a conversation, pass the previous history with the new user message appended. The runner handles everything from here – picking the right agent, triggering handoffs, calling tools, and returning the final output.
  • history – an array of all prior items in the conversation. It starts empty on the first call and grows with every turn. Besides the user’s messages and the agent’s replies, it also contains the tool calls and handoffs generated along the way.
  • result.history – the input you passed in plus everything generated during the run. You store this and pass it back in on the next call so the agent knows what was already said. This is what lets a follow-up like “make it 5kg instead of 3” work – the agent can look back at the array and see what order was being discussed.
  • result.finalOutput – the plain text reply from whichever agent handled the request last.
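On WhatsApp, every sender needs their own history, otherwise one shopkeeper's follow-up would land in another's conversation. Here is a minimal sketch of that bookkeeping, keyed by phone number. The Message shape, the Runner type, and the stub runner are simplified assumptions that stand in for the SDK's item types and the real agent call, and a Map in memory would be a database table in practice.

```typescript
// Minimal message shape for this sketch; the SDK has its own richer item types.
type Message = { role: 'user' | 'assistant'; content: string };
type RunResult = { history: Message[]; finalOutput: string };
type Runner = (input: Message[]) => Promise<RunResult>;

// One conversation thread per WhatsApp sender, keyed by phone number.
const threads = new Map<string, Message[]>();

async function handleIncoming(runner: Runner, phone: string, text: string): Promise<string> {
  const previous = threads.get(phone) ?? [];
  // Append the new user message to this sender's thread and run on the whole thread.
  const result = await runner([...previous, { role: 'user', content: text }]);
  threads.set(phone, result.history); // store prior turns plus the new ones
  return result.finalOutput;
}

// Stub runner standing in for the agent: reports how many messages it received.
const echoRunner: Runner = async (input) => {
  const reply = `Received ${input.length} message(s)`;
  return { history: [...input, { role: 'assistant', content: reply }], finalOutput: reply };
};

async function demo(): Promise<void> {
  console.log(await handleIncoming(echoRunner, 'sender-A', 'Create a bill for John'));
  // Received 1 message(s)
  console.log(await handleIncoming(echoRunner, 'sender-A', '5kg apples at $3 per kg'));
  // Received 3 message(s)
  console.log(await handleIncoming(echoRunner, 'sender-B', 'Show my ledger'));
  // Received 1 message(s)
}
demo();
```

The second message from sender-A arrives with three items because the first user message and its reply are already in that sender's thread, while sender-B starts from an empty one.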

What a Real Trace Looks Like

The SDK generates traces automatically for every run. You can track each agent call – which agent ran, how long it took, which tool was called, and what the handoff chain looked like – in the Traces section of the OpenAI platform.

A typical order creation run looks like this:

  1. Customer-Facing Agent receives: “Create a bill for John for 10 kg of apples”
  2. Agent classifies it as an order request → triggers handoff to Order Agent
  3. Order Agent receives the message. It needs a price. It does not guess – it asks: “What’s the price per kg for apples?”
  4. User replies: “$3 per kg”
  5. Order Agent now has all parameters → calls create_order with customer_name: “John”, item: “Apples”, quantity_kg: 10, rate_per_kg: 3
  6. Tool executes, writes to Postgres, returns confirmation string
  7. Agent sends back: “Invoice created. John: 10kg Apples @ $3/kg = $30. Order ID: 1042”

Two of the three agents took part. One database write. One round of clarification. The user typed two messages and got a bill.
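The clarification step in that trace comes down to one question: which required inputs are still missing? The model makes this call itself from the tool schema, so the sketch below is only an illustration of the logic, with a hypothetical OrderDraft type and missingFields helper that are not part of the SDK.

```typescript
// The four inputs create_order needs, all optional until the user supplies them.
type OrderDraft = {
  customer_name?: string;
  item?: string;
  quantity_kg?: number;
  rate_per_kg?: number;
};

const REQUIRED_FIELDS = ['customer_name', 'item', 'quantity_kg', 'rate_per_kg'] as const;

// Returns the names of the fields that have not been supplied yet.
function missingFields(draft: OrderDraft): string[] {
  return REQUIRED_FIELDS.filter((field) => draft[field] === undefined);
}

// After message 1: "Create a bill for John for 10 kg of apples"
const afterFirstMessage: OrderDraft = { customer_name: 'John', item: 'Apples', quantity_kg: 10 };
console.log(missingFields(afterFirstMessage)); // [ 'rate_per_kg' ]

// After message 2: "$3 per kg"
const afterSecondMessage: OrderDraft = { ...afterFirstMessage, rate_per_kg: 3 };
console.log(missingFields(afterSecondMessage)); // []
```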

Build What Actually Matters

The same building blocks – agents, tools, handoffs, history – can power any workflow. A sales rep logging calls by voice. A clinic handling appointment queries. A logistics team tracking deliveries through chat.

You define what the agent knows and what it can do. The model handles the rest.

If there is a workflow in your world that deserves to be smarter – build it.

