Building an Open-Source Analytics Chatbot on Vercel's Chat SDK
This blog post was written with the assistance of Claude Code. Fitting, given the whole point: AI tooling turns tasks that used to be incredibly tedious into something you can ship in a fraction of the time.
What's the purpose of an analytics chatbot?
An analytics chatbot is not a novelty layer on top of your warehouse. It changes the operating model in three concrete ways:
- Faster turnaround for business users. Executives, sales leaders, and ops folks get answers in seconds instead of waiting on a queue. No ticket, no Slack ping, no "I will get back to you Thursday." The question gets answered while the meeting is still happening.
- Less randomization for the data team. Most ad hoc requests are not analysis. They are lookups: "what was revenue last quarter," "how many active accounts in Texas," "show me churn by segment." A chatbot absorbs that long tail of one-off questions, freeing the data team to work on the things that actually need a human (modeling, instrumentation, deeper analysis).
- A query audit log everyone can learn from. Every question and the SQL behind it is logged. New users can browse what others have asked, see how the chatbot answered, and either trust the result or build on it. Knowledge stops dying inside one analyst's saved-queries folder.
That is the value proposition. Now the build.
The build-vs-buy framing
Enterprise "AI analytics" tools cost $50K and up. Most of what they actually do is: take a question, feed it to an LLM with schema context, run the SQL, render the result. Once you accept that, the build-vs-buy math gets a lot more interesting.
This post walks through forking Vercel's d0-chatbot (the Chat SDK template), swapping the LLM to Claude, and wiring it into Snowflake. The result is a streaming natural language analytics chatbot you can stand up in a few hours for the cost of API calls.
Repo: github.com/joshbricel/analytics-chatbot
The Architecture
Five moving pieces, glued together with open-source code:
```
User question
      |
      v
Next.js Chat UI (Vercel Chat SDK)
      |
      v
Claude Sonnet 4.5 (via @ai-sdk/anthropic)
  |-- system prompt with full schema context
  |-- queryDatabase tool registered
      |
      v
Snowflake (SELECT-only, 500-row cap, 30s timeout)
      |
      v
Results rendered as:
  - inline table in chat
  - sheet artifact for larger result sets
```

The Chat SDK gives you the streaming UI, message persistence, the artifact system (code, text, sheets), and the model switcher for free. The only things you build are the LLM swap, the database connection, the schema context, and the tool. That is roughly 400 lines of code total.
The Build
1. Fork the Chat SDK
Clone vercel/d0-chatbot, install dependencies, and confirm it runs against the default Grok configuration. The template ships with everything you need: streaming chat, sidebar history, vote/feedback hooks, sheet artifacts, code execution. Most of your work is subtraction, not addition.
2. Swap the LLM Provider
The default uses Grok via Vercel AI Gateway, but you can swap in whatever LLM you want. I used Anthropic Claude:
```ts
// lib/ai/providers.ts
import { anthropic } from "@ai-sdk/anthropic";
import { customProvider } from "ai";

export const myProvider = customProvider({
  languageModels: {
    "chat-model": anthropic("claude-sonnet-4-5-20250929"),
    "chat-model-reasoning": anthropic("claude-sonnet-4-5-20250929"),
    "title-model": anthropic("claude-haiku-4-5-20251001"),
    "artifact-model": anthropic("claude-sonnet-4-5-20250929"),
  },
});
```
That is the entire change. Set ANTHROPIC_API_KEY in .env.local and the AI SDK handles the rest.
3. Connect to Snowflake
Use the official snowflake-sdk Node package with RSA key-pair auth. Three guardrails up front: SELECT-only validation, a 500-row cap, and a 30-second timeout. The listing below includes the connection setup for completeness; the option names follow the snowflake-sdk connection docs.
```ts
// lib/snowflake/connection.ts
import snowflake from "snowflake-sdk";

type QueryResult = { columns: string[]; rows: Record<string, unknown>[]; rowCount: number };

// RSA key-pair auth; env var names match the .env.local keys listed later.
function createConnection() {
  return snowflake.createConnection({
    account: process.env.SNOWFLAKE_ACCOUNT!,
    username: process.env.SNOWFLAKE_USER!,
    authenticator: "SNOWFLAKE_JWT",
    privateKeyPath: process.env.SNOWFLAKE_PRIVATE_KEY_PATH,
    warehouse: process.env.SNOWFLAKE_WAREHOUSE,
    database: process.env.SNOWFLAKE_DATABASE,
    schema: process.env.SNOWFLAKE_SCHEMA,
  });
}

export async function executeQuery(sql: string): Promise<QueryResult> {
  const trimmed = sql.trim();
  // Guardrail 1: read-only -- reject anything that is not SELECT or a CTE.
  if (!/^SELECT\b/i.test(trimmed) && !/^WITH\b/i.test(trimmed)) {
    throw new Error("Only SELECT and WITH (CTE) queries are allowed");
  }
  // Guardrail 2: row cap -- append LIMIT 500 unless the query already ends in one,
  // so we never double up and produce invalid SQL.
  const stripped = trimmed.replace(/;?\s*$/, "");
  const limited = /\bLIMIT\s+\d+\s*$/i.test(stripped) ? stripped : `${stripped} LIMIT 500`;

  const conn = createConnection();
  return new Promise((resolve, reject) => {
    conn.connect((err) => {
      if (err) return reject(new Error(`Snowflake: ${err.message}`));
      conn.execute({
        sqlText: limited,
        timeout: 30000, // Guardrail 3: 30-second statement timeout
        complete: (err, stmt, rows) => {
          conn.destroy(() => {});
          if (err) return reject(new Error(`Query: ${err.message}`));
          resolve({
            columns: stmt?.getColumns().map((c) => c.getName()) ?? [],
            rows: (rows as Record<string, unknown>[]) ?? [],
            rowCount: rows?.length ?? 0,
          });
        },
      });
    });
  });
}
```
Critical: add snowflake-sdk to serverExternalPackages in next.config.ts or Turbopack will try to bundle the native crypto modules and fail with cryptic "Module not found" errors:
```ts
// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  serverExternalPackages: ["snowflake-sdk"],
};

export default nextConfig;
```

4. Build the Schema Context
This is the secret sauce. The LLM cannot write correct SQL against tables it has never seen. lib/snowflake/schema.ts is a 120-line string injected into the system prompt: table names, columns, types, primary and foreign key relationships, and a handful of canonical example queries.
For the demo I used a commercial leasing dataset, a 179K-row star schema modeled on public GSA leasing data:
| Table | Purpose |
|---|---|
| FACT_LEASES | ~179K rows. Lease details: rent, square footage, parking, space-use percentages. |
| DIM_PROPERTIES | Address, city, state, lat/lon, property type, market FK. |
| DIM_TENANTS | Name, industry, credit rating, size band. |
| DIM_MARKETS | State, county, GSA region, region name. |
| DIM_DATES | Date key (YYYYMMDD), year, month, quarter, fiscal year/quarter. |
Trade-off: the schema context costs roughly 2K tokens per request. With Sonnet 4.5 prompt caching, that is fractions of a cent. Worth every token.
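To make that concrete, here is a trimmed sketch of the shape lib/snowflake/schema.ts takes. The table names match the demo schema above, but the specific column names and the example query are illustrative, not copied from the repo:

```ts
// lib/snowflake/schema.ts (trimmed sketch; column names are illustrative)
export const SCHEMA_CONTEXT = `
You are querying a Snowflake star schema of commercial leasing data.

TABLE FACT_LEASES (~179K rows)
  LEASE_ID PK, PROPERTY_ID FK -> DIM_PROPERTIES, TENANT_ID FK -> DIM_TENANTS,
  ANNUAL_RENT NUMBER, RENTABLE_SQFT NUMBER, PARKING_SPACES NUMBER, ...

TABLE DIM_PROPERTIES
  PROPERTY_ID PK, ADDRESS, CITY, STATE, LATITUDE, LONGITUDE, PROPERTY_TYPE,
  MARKET_ID FK -> DIM_MARKETS

EXAMPLE -- total annual rent by state:
  SELECT p.STATE, SUM(f.ANNUAL_RENT) AS TOTAL_RENT
  FROM FACT_LEASES f
  JOIN DIM_PROPERTIES p ON f.PROPERTY_ID = p.PROPERTY_ID
  GROUP BY p.STATE
  ORDER BY TOTAL_RENT DESC;
`;
```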
5. Register the queryDatabase Tool
AI SDK tools let the LLM decide when to call out to your code. Define one with a Zod schema for the input and an async handler that runs the SQL:
```ts
// lib/ai/tools/query-database.ts
import { tool } from "ai";
import { z } from "zod";
import { executeQuery } from "@/lib/snowflake/connection";

export const queryDatabase = tool({
  description:
    "Execute a read-only SQL query against the Snowflake analytics database. " +
    "Use this to answer any questions about leasing data, properties, " +
    "tenants, markets, and dates. Always use this tool when the user " +
    "asks a data question rather than guessing.",
  inputSchema: z.object({
    sql: z.string().describe("A SELECT or WITH (CTE) query"),
    title: z.string().describe("Short title describing what the query answers"),
  }),
  execute: async ({ sql, title }) => {
    const result = await executeQuery(sql);
    return {
      title,
      sql,
      columns: result.columns,
      rows: result.rows,
      rowCount: result.rowCount,
      truncated: result.rowCount >= 500,
    };
  },
});
```
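Wiring the tool into the chat route is then a one-line change to the streamText call. A compressed sketch (the import paths and systemPrompt export are illustrative; the template's real route also handles persistence and resumable streams):

```ts
// app/(chat)/api/chat/route.ts (compressed sketch)
import { streamText } from "ai";
import { myProvider } from "@/lib/ai/providers";
import { systemPrompt } from "@/lib/ai/prompts";
import { queryDatabase } from "@/lib/ai/tools/query-database";

const result = streamText({
  model: myProvider.languageModel("chat-model"),
  system: systemPrompt,       // role + schema context (see step 6)
  messages,                   // conversation history from the request body
  tools: { queryDatabase },   // the model decides when to call it
});
```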
The Chat SDK's message.tsx already knows how to render tool output. A small QueryResult component renders the structured response as an inline table with the executed SQL collapsed underneath.
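A minimal version of that component might look like this; the prop shape mirrors the tool's return value, and the markup is illustrative:

```tsx
// components/query-result.tsx (minimal sketch)
type QueryResultProps = {
  title: string;
  sql: string;
  columns: string[];
  rows: Record<string, unknown>[];
  truncated: boolean;
};

export function QueryResult({ title, sql, columns, rows, truncated }: QueryResultProps) {
  return (
    <div>
      <p>{title}</p>
      <table>
        <thead>
          <tr>{columns.map((c) => <th key={c}>{c}</th>)}</tr>
        </thead>
        <tbody>
          {rows.map((row, i) => (
            <tr key={i}>
              {columns.map((c) => <td key={c}>{String(row[c] ?? "")}</td>)}
            </tr>
          ))}
        </tbody>
      </table>
      {truncated && <p>Showing the first 500 rows.</p>}
      <details>
        <summary>Show SQL</summary>
        <pre>{sql}</pre>
      </details>
    </div>
  );
}
```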
6. Update the System Prompt
Three jobs for the system prompt: define the role (analytics assistant, not general chatbot), inject the schema context, and tell the model when to call queryDatabase versus when to answer from general knowledge. Keep the existing artifact instructions so spreadsheet creation still works.
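In sketch form (the wording here is illustrative; keep whatever artifact instructions your fork already has):

```ts
// lib/ai/prompts.ts (sketch)
import { SCHEMA_CONTEXT } from "@/lib/snowflake/schema";

export const systemPrompt = `
You are an analytics assistant for a commercial leasing data warehouse,
not a general-purpose chatbot.

${SCHEMA_CONTEXT}

Rules:
- For any question about the data, call the queryDatabase tool with valid Snowflake SQL.
- Answer from general knowledge only when the question needs no data.
- Never invent table or column names; use only the schema above.
`; // ...plus the template's existing artifact instructions, kept verbatim
```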
7. Strip Auth for Demos
The template ships with full NextAuth (login, register, guest flows). For internal demos and client pilots, that is overhead. Replace auth() with a stub that returns a hardcoded anonymous session, gut the SessionProvider from layout.tsx, and turn the NextAuth route file into a self-contained stub. Chat history still works (Postgres does not care that the user ID is a constant). You can re-enable auth per-client later.
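A minimal stub, assuming the template keys chat history off user.id (the exact Session shape depends on your fork):

```ts
// app/(auth)/auth.ts -- demo-only stub replacing NextAuth
// Every request gets the same anonymous session; chat history in Postgres
// simply keys off this constant user ID.
export async function auth() {
  return {
    user: { id: "00000000-0000-0000-0000-000000000000", type: "guest" as const },
  };
}
```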
8. Run Postgres Locally
Chat history needs Postgres. Homebrew is fine: brew services start postgresql@16, createdb chatbot, run pnpm db:migrate, done. Total local stack: Next.js + Postgres + Snowflake credentials. No cloud services needed for development.
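Spelled out, the same commands:

```bash
brew services start postgresql@16   # start local Postgres 16
createdb chatbot                    # database for chat history
pnpm db:migrate                     # create the template's tables
```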
Cost Breakdown
| Component | Cost |
|---|---|
| Anthropic API (Sonnet 4.5) | ~$0.003 per query (schema context + response, with prompt caching) |
| Snowflake compute | Fractional warehouse seconds per query. Use an XS warehouse with auto-suspend. |
| Postgres (chat history) | $0 local, ~$5/mo on Neon free tier |
| Hosting | $0 on Vercel hobby tier for demos |
| Total demo environment | Effectively $0 plus API usage |
Compare to ThoughtSpot ($50K+ annually), Tableau Ask Data (bundled but limited), Mode AI (per-seat enterprise pricing).
Making It Production-Ready
The demo build skips production concerns on purpose. To take this to a real client engagement:
- Re-enable auth (NextAuth or SSO via the marketplace integrations)
- Deploy to Vercel with managed Postgres (Neon, Supabase)
- Add query logging and an audit trail in Postgres
- Inject row-level security context into the system prompt per user role (e.g. WHERE region = 'NORTHEAST'); see the sketch after this list
- Pull table and column descriptions from your data catalog into the schema context
- Cache common queries at the tool layer
- Wire up the existing vote/feedback hooks for query quality tracking
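The row-level security item deserves a concrete shape. A minimal sketch, assuming a session user object with role and region fields (both names are illustrative):

```ts
// Sketch: per-role row filter appended to the system prompt (field names illustrative)
function rlsInstructions(user: { role: string; region?: string }): string {
  if (user.role === "regional_manager" && user.region) {
    return (
      "Every query MUST be scoped to the user's region: add a predicate " +
      `equivalent to WHERE m.REGION_NAME = '${user.region}'.`
    );
  }
  return ""; // admins and analysts see everything
}

const system = `${systemPrompt}\n${rlsInstructions(session.user)}`;
```

For hard enforcement, mirror the same filter inside executeQuery rather than trusting the model to follow the prompt; that is the "guardrails go in the tool" rule from the takeaways below.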
Try It Yourself
Repo: github.com/joshbricel/analytics-chatbot
```bash
git clone https://github.com/joshbricel/analytics-chatbot
cd analytics-chatbot
pnpm install

# Configure .env.local with:
# - ANTHROPIC_API_KEY
# - POSTGRES_URL (local or hosted)
# - SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER
# - SNOWFLAKE_PRIVATE_KEY_PATH (RSA pkcs8)
# - SNOWFLAKE_WAREHOUSE, SNOWFLAKE_DATABASE, SNOWFLAKE_SCHEMA

pnpm db:migrate
pnpm dev
```
Replace the schema context in lib/snowflake/schema.ts with your own tables, point it at your warehouse, and it will answer questions against your data within minutes.
Key Takeaways
- The Chat SDK does the hard parts for free. Streaming UI, message persistence, artifacts, model switching. You write the LLM swap, the SQL tool, and the schema context. That is it.
- Schema context is the difference between right and wrong SQL. Structured table descriptions and example queries get you accurate results. Skip them and you get hallucinated table names.
- Guardrails go in the tool, not the prompt. SELECT-only validation, row caps, and timeouts belong in executeQuery. Do not trust the LLM to respect rules in the system prompt.
- The build-vs-buy math has changed. Enterprise BI chatbots charge $50K+ for a problem that two AI SDK packages and a schema string now solve at roughly 1% of the cost.
- That said, there is real value in an established BI platform. A vendor team that ships new features, maintains the service, hardens security, handles upgrades, and answers support tickets is doing work you would otherwise own. Rolling your own is cheap on day one and expensive on day 365. Factor total cost of ownership (TCO) into the build-vs-buy decision, not just the sticker price.
Thanks for reading,
Josh