
Ask Your Database Anything: The Metabase Version

An AI-powered system that converts plain English questions into complete six-card Metabase dashboards in under 30 seconds, achieves a 100% SQL success rate on a 90.5-million-row production database, and embeds in any React application via the Metabase SDK.

Under 30 seconds from question to six-card dashboard
90.5M rows across 48 tables, 561 columns
100% SQL success rate across 10+ query categories
25 saved dashboards reloadable in under 5 seconds
Interactive BI platform with drill-through and scheduled reports
React 18 + TypeScript + Vite · Metabase 0.56 + Embedded SDK · FastAPI + Python 3.11 · Claude Sonnet 4.6 · Qdrant + Redis + MS SQL Server


One sentence in. Six charts out. On top of a BI platform analysts already know.


Figure 1 - The Complete Dashboard: A user typed “Top 10 customers with the highest profit margin in the last year.” Six coordinated Metabase visualizations appeared in under 30 seconds: a sortable data table ranking customers by profit margin, a horizontal bar chart of margin percentages, a revenue-vs-profit line chart, a monthly profit trend, a donut breaking down customer category mix, and a pin map of customer locations. The Metabase SDK badge in the bottom-left confirms the cards are rendered by a real BI platform. The user wrote no SQL. The analyst wrote no queries. The dashboard was built in the time it takes to read this caption.

Overview#

A natural language business intelligence system that converts plain English questions into complete six-card Metabase dashboards in under 30 seconds. The system retrieves relevant schema from a Qdrant vector database, plans six distinct visualizations with Claude Sonnet 4.6, and generates validated SQL that runs with a 100% success rate across 10+ query categories against a production-scale database of 90,544,836 rows across 48 tables and 561 columns. Dashboards are rendered by Metabase, embedded into a standalone React application through the Metabase SDK, and saved into a reusable library that reloads any dashboard in under 5 seconds with fresh data. Users still get drill-through, filters, scheduled reports, and everything else an interactive BI platform provides.

This is Part 1 of a two-part series. This article covers the version we ship to clients who want a full BI platform with natural language on top. Part 2 covers the version we ship to clients who want to drop AI dashboards into an existing product with no BI platform underneath. Same AI pipeline. Different front-of-house, matched to different client needs.

The Gap Between Questions and Answers#

Every organization with a SQL database has unasked questions. Not because the data is missing, but because the path from question to answer requires someone who knows both the schema and the query language. A sales manager wants to know which customers are most profitable. Today, that means filing a ticket, waiting for an analyst, and hoping the query returns the right answer on the first try.

The challenge is not just SQL complexity; it is schema complexity. The WideWorldImporters database contains 48 tables, 561 columns, 98 foreign keys, and 223 indexes across four schemas. Even an experienced developer needs 10 to 20 minutes to write a correct multi-table join against a database of this shape. A business user who does not know SQL has no path at all.

Before: Analyst Bottleneck. After: Self-Service in Seconds.#

| Before | After |
| --- | --- |
| 2-3 day turnaround for ad-hoc data requests | Six-card dashboard in under 30 seconds |
| Requires SQL expertise and schema knowledge | Type a question in plain English |
| 48 tables, 561 columns, 98 foreign keys to navigate | Views-first schema; semantic retrieval handles the rest |
| AI-generated SQL fails silently on T-SQL quirks | 100% SQL success rate across 10+ query categories |
| One-off queries, no reuse | 25 saved dashboards, reloadable in under 5 seconds |

We eliminated the analyst bottleneck. The user types a question. The system finds the relevant tables, generates six SQL queries, validates each one against the live database, and renders a coordinated Metabase dashboard. No SQL knowledge required. No report builder opened.

System Architecture#

+---------------------------------------------------------------------------+
                    NATURAL LANGUAGE DASHBOARD SYSTEM (V1)
+---------------------------------------------------------------------------+

  INPUT              PROCESSING                          OUTPUT

  User Question ---> Schema Retrieval (Qdrant)      ---> Six-Card Metabase
                ---> Card Planner (Claude 4.6)           Dashboard
                ---> SQL Generation + Validation    ---> Interactive BI
                ---> pyodbc Direct Execution             (drill, filter,
                ---> Self-Correction Safety Net          schedule)
                ---> Metabase SDK Embed

+---------------------------------------------------------------------------+


Figure 2 - End-to-End Pipeline: A natural language question flows through five stages. Schema retrieval identifies relevant views from 561 columns using semantic search. Claude Sonnet 4.6 plans six cards, each with a different visualization type. SQL generation produces six distinct queries. Every query is validated and tested against the live database before rendering. A three-attempt retry loop remains as a safety net for edge cases. The Metabase SDK renders the cards in a standard React application.

The pipeline starts with schema retrieval: a five-stage process that maps business terms to database views. When a user asks about “revenue,” the system knows to query _v_sales_summary. When they ask about “customers,” it pulls from _v_customers. Static keyword matching handles the common cases. Semantic search with precomputed embeddings (all-mpnet-base-v2, cosine similarity threshold of 0.65) catches the rest.
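The keyword-first, semantic-fallback routing can be sketched in a few lines of pure Python. This is an illustrative reduction, not the production code: `KEYWORD_VIEWS` is a two-entry stand-in for the real mapping, and `embed` is any callable returning a vector (all-mpnet-base-v2 in production, stubbed here).

```python
import math

# Hypothetical two-entry mapping for illustration; the production table is larger.
KEYWORD_VIEWS = {"revenue": "_v_sales_summary", "customer": "_v_customers"}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def route_to_views(question, embed, view_embeddings, threshold=0.65):
    """Static keyword matching first; semantic search over precomputed
    view embeddings catches everything the keywords miss."""
    q = question.lower()
    hits = [view for term, view in KEYWORD_VIEWS.items() if term in q]
    if hits:
        return hits
    qv = embed(question)  # all-mpnet-base-v2 in production
    scored = sorted(
        ((cosine(qv, ev), view) for view, ev in view_embeddings.items()),
        reverse=True,
    )
    return [view for score, view in scored if score >= threshold]
```

The 0.65 threshold is the cutoff mentioned above: views scoring below it are simply excluded from the schema context handed to the planner.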

With the schema context in hand, Claude Sonnet 4.6 plans the dashboard. It selects six visualization types from a palette of twelve: table, bar, row (horizontal bar), line, area, pie, scalar, combo (dual-axis), scatter, funnel, waterfall, and geographic map. Card one is always a data table with the direct answer. The remaining five are all different types, chosen based on the question’s analytical shape.

Then, for each card, the system generates a SQL query, validates it, and executes it. In current testing, every query succeeds on the first attempt. A three-attempt retry loop is wired in for edge cases, but it no longer fires: prompt rules and post-processing fixes have closed the gaps that used to trigger it. If a hypothetical worst case exhausted all three attempts, that card would be skipped and the dashboard would render with five cards instead of six. The user never sees a stack trace.
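The per-card retry contract can be sketched as follows. `generate_sql` and `execute` are stand-ins for the Claude call and the pyodbc execution step; the names and signatures here are illustrative, not the production interfaces.

```python
def generate_with_retry(question, generate_sql, execute, max_attempts=3):
    """Safety-net loop: the database error message feeds back into the
    next generation attempt. In current testing the loop never gets
    past attempt one."""
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, previous_error=error)
        try:
            return sql, execute(sql)
        except Exception as exc:
            error = str(exc)  # the next prompt sees exactly what broke
    return None, None  # card skipped; dashboard renders with the remaining cards
```

Returning `(None, None)` rather than raising is the point: a failed card degrades the dashboard by one panel instead of crashing the whole render.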

Getting to 100% SQL Accuracy#


Figure 3 - Three Layers of SQL Reliability: Prompt rules catch the obvious patterns before generation. Post-processing fixes catch the model’s repeat mistakes before execution. A three-attempt retry loop remains as a safety net for anything the first two layers miss. In current testing, the retry layer no longer fires. The user never sees a broken card.

Here is the part most demos skip: what happens when AI-generated SQL is wrong.

Our first prototype generated SQL that looked syntactically correct but threw runtime errors. A column name spelled slightly wrong. A JOIN that produced duplicate rows. A date function that works in PostgreSQL but not in T-SQL. The SQL was valid enough to pass a linter but broken enough to crash against the actual database. A single-prompt “generate and hope” pipeline failed on roughly one query in three.

The fix was a three-layer architecture. Never trust generated SQL. Test every query against the live database before rendering. When edge cases slipped past the first layers, we captured them and taught the prompt and the post-processor to prevent them next time.

Layer 1: 14 T-SQL prompt rules. Claude receives explicit rules in its system prompt for the specifics that trip up most language models: no ORDER BY in subqueries without TOP, never mix TOP with OFFSET/FETCH, window functions must have ORDER BY in OVER(), always add WITH (NOLOCK) to every FROM and JOIN, and ten more.

Layer 2: 7 automatic post-processing fixes. The pipeline applies deterministic string transformations to handle patterns the model gets wrong repeatedly: stripping markdown fences, fixing TOP/OFFSET conflicts, injecting missing OVER() clauses, removing illegal ORDER BY from subqueries, and similar repairs. Cheap, fast, and invisible.
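Three of the seven fixes can be sketched as plain string transforms. The function names and the exact regexes are illustrative, assuming T-SQL's rule that TOP cannot be combined with OFFSET/FETCH; the production pipeline applies seven such repairs.

```python
import re

def strip_markdown_fences(sql: str) -> str:
    # The model sometimes wraps output in ```sql ... ``` fences; remove them.
    sql = re.sub(r"^```(?:sql)?\s*", "", sql.strip())
    return re.sub(r"\s*```$", "", sql).strip()

def drop_trailing_semicolon(sql: str) -> str:
    # A bare trailing semicolon is harmless but trips the chaining check.
    return sql.rstrip().rstrip(";")

def fix_top_offset_conflict(sql: str) -> str:
    # T-SQL rejects TOP combined with OFFSET/FETCH; keep the OFFSET form.
    if re.search(r"\bOFFSET\b", sql, re.I) and re.search(r"\bTOP\s+\d+\b", sql, re.I):
        sql = re.sub(r"\bTOP\s+\d+\s*", "", sql, count=1, flags=re.I)
    return sql

def post_process(sql: str) -> str:
    for fix in (strip_markdown_fences, drop_trailing_semicolon, fix_top_offset_conflict):
        sql = fix(sql)
    return sql
```

Each transform is deterministic and idempotent, which is what makes this layer cheap: no model call, no database round-trip, just string surgery before execution.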

Layer 3: three-attempt retry safety net. If a query still fails against the live database, the error message feeds back into the next generation attempt. In current testing, this layer no longer fires. We keep it wired in anyway. A safety net you never need is cheaper than one you wish you had.

Result: 100% success across 10+ query categories, with retry no longer observed. A seven-point security validation layer runs before every execution regardless: queries must start with SELECT or WITH, forbidden keywords (INSERT, UPDATE, DELETE, DROP) are rejected, system table access is blocked, UNION injection is caught, statement chaining is prevented. All of this runs in pure Python with zero external calls.
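A reduced sketch of that pre-execution gate, in pure Python with no external calls. The production check has seven points (including UNION-injection detection, omitted here); this version covers the statement-prefix, forbidden-keyword, chaining, and system-table rules.

```python
import re

# Illustrative subset of the rejected keywords.
FORBIDDEN = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "EXEC", "MERGE")

def is_safe_select(sql: str):
    """Return (ok, reason). Reduced sketch of the 7-point security gate."""
    s = sql.strip()
    upper = s.upper()
    if not (upper.startswith("SELECT") or upper.startswith("WITH")):
        return False, "query must start with SELECT or WITH"
    for kw in FORBIDDEN:
        if re.search(rf"\b{kw}\b", upper):
            return False, f"forbidden keyword: {kw}"
    if ";" in s.rstrip("; \n"):
        return False, "statement chaining is not allowed"
    if re.search(r"\bSYS\.", upper):
        return False, "system table access is blocked"
    return True, "ok"
```

Because the gate is pure string inspection, it adds effectively zero latency before every execution, which is why it can run unconditionally.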

KEY INSIGHT: Reliable AI-generated SQL is not one clever technique. It is three layers stacked so the expensive layer (the retry loop) rarely has to run. When prompts and post-processing get good enough, the retry layer becomes the safety net you hope never fires. Currently, it doesn’t.

Why Metabase, and What It Gives the User#

The dashboards in Figure 1 are rendered by Metabase, embedded into our React application through the Metabase Embedded SDK. This is a deliberate choice, not a default. For clients who want a full BI platform alongside the natural language layer, Metabase does things we would not want to rebuild.

Interactive drill-through. Click a bar in the bar chart and Metabase filters the rest of the dashboard to that customer. Click a pin on the map, same behavior. The natural language layer generates the dashboard. The BI platform makes it explorable.

Scheduled reports. A saved dashboard can be emailed to a distribution list every Monday morning. The AI generates the dashboard once. Metabase keeps it flowing to stakeholders without further intervention.

A query builder analysts already know. Non-developers can modify a generated dashboard by hand: add a filter, change a chart type, rearrange the layout. The natural language entry point gets them started. The visual query builder lets them refine without touching the backend.

Permissions and row-level security. Enterprise data access controls map onto the BI platform, not the AI layer. This matters when the same dashboard is served to users who should see different slices of the data.

The cost is footprint. Metabase requires Java 21 and ships as a 493 MB JAR with 17 database driver plugins. The SDK pins the host application to specific React and Vite versions. Four services run in production: the React app, the FastAPI backend, the Metabase server, and Redis. For a client who wants a BI platform, this is reasonable infrastructure. For a client who wants to embed AI dashboards inside an existing product, it is a lot of weight. That trade-off is what drove us to build V2, covered in the follow-up article.

Twelve Chart Types, Six Per Question#

Every dashboard tells a complete analytical story. The AI does not pick chart types at random. It analyzes the question’s intent and data shape, then selects a coordinated mix.

The system enforces that all six types are different. If the AI planner returns two bar charts, the validator automatically replaces the duplicate with an unused type. Map cards only appear when the query targets a filtered subset of customers (Top N). No more useless all-customer maps on salesperson queries.
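The duplicate-replacement rule is simple enough to sketch directly. `PALETTE` mirrors the twelve types listed above; the replacement policy shown (first unused type in palette order) is an illustrative assumption about how the validator picks a substitute.

```python
PALETTE = ["table", "bar", "row", "line", "area", "pie", "scalar",
           "combo", "scatter", "funnel", "waterfall", "map"]

def enforce_distinct(planned: list[str]) -> list[str]:
    """If the planner repeats a type, swap the duplicate for an unused one."""
    unused = [t for t in PALETTE if t not in planned]
    seen, result = set(), []
    for t in planned:
        if t in seen:
            t = unused.pop(0)  # substitute the first type not yet on the dashboard
        seen.add(t)
        result.append(t)
    return result
```

Validating the plan rather than re-prompting the model keeps the guarantee deterministic: six cards in always means six distinct types out.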

Card one is always a data table with the direct answer. Card two is usually a ranked comparison (bar or horizontal bar). Cards three through six vary based on the question: a line chart for a trend question, a donut for a distribution question, a scatter for a correlation question, a map for a geographic question. The AI planner composes the mix. The SQL generator executes it.

Bolt-On, Not Rip-and-Replace#


Figure 4 - Bolt-On Architecture: The system connects to any SQL database with a read-only connection. It discovers the schema through database views, requires no modifications to existing tables, and runs as an independent service. Swap the connection string and the system adapts.

The system connects to your existing database with a read-only connection. It does not create tables, run migrations, or modify your schema. Your database does not know it exists.

All queries run through database views prefixed with _v_ (e.g., _v_customers, _v_sales_summary). This abstraction layer hides base table complexity and makes the system portable. Point it at a different database, create views that expose the data you want queryable, and the semantic search pipeline handles the rest.

The React frontend embeds the natural language query bar, save/load drawer, and help assistant alongside the Metabase dashboard container. The host application controls authentication and navigation. The embedded components handle the question-to-dashboard pipeline.

A Library of Saved Dashboards, Not One-Off Queries#


Figure 5 - Dashboard Library: The save/load drawer shows 25 saved dashboards grouped by category: customers, sales & revenue, salesperson performance, products, and geographic. Each entry shows the dashboard name and a one-click Load button. Loading a saved dashboard re-executes all six SQL queries against the current database state.

A dashboard that takes 25 seconds to generate the first time should not take 25 seconds the second time. Saved dashboards store card specifications (SQL, display type, visualization settings) without row data. Loading a saved dashboard re-executes the queries and returns fresh results. The dashboard a sales manager saved last quarter shows this quarter’s numbers when reloaded.
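The spec-not-data storage model can be sketched like this. The `CardSpec` fields and helper names are illustrative, not the production schema; the point is what is absent: no row data is ever serialized, so every load re-executes the SQL against live data.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class CardSpec:
    title: str
    display: str          # one of the twelve visualization types
    sql: str              # re-executed on every load, so data stays fresh
    viz_settings: dict = field(default_factory=dict)

def save_dashboard(name: str, cards: list) -> str:
    # Persist card specifications only; results are never stored.
    return json.dumps({"name": name, "cards": [asdict(c) for c in cards]})

def load_dashboard(payload: str, execute):
    # Re-run each card's SQL against the current database state.
    spec = json.loads(payload)
    return [(c["display"], execute(c["sql"])) for c in spec["cards"]]
```

Storing specs instead of snapshots is what makes last quarter's saved dashboard show this quarter's numbers on reload.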

A three-tier Redis cache prevents redundant work. The first tier caches complete dashboards (24-hour TTL): same question, same cards, skip Claude entirely. The second tier caches individual SQL templates (24-hour TTL). The third tier caches raw query results (1-hour TTL). Cached repeat queries complete in under 5 seconds.
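The tier structure can be sketched with an in-memory stand-in (Redis with per-key TTLs in production; this class and its method names are illustrative, not the production client):

```python
import time

class TieredCache:
    """In-memory sketch of the three-tier cache. Production uses Redis,
    which also gives the graceful-degradation property: if the cache is
    down, every get() is simply a miss."""
    TTLS = {"dashboard": 24 * 3600, "sql": 24 * 3600, "result": 3600}

    def __init__(self):
        self._store = {}

    def set(self, tier, key, value):
        self._store[(tier, key)] = (value, time.monotonic() + self.TTLS[tier])

    def get(self, tier, key):
        entry = self._store.get((tier, key))
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[(tier, key)]  # lazy expiry on read
            return None
        return value
```

The tier order matters for cost: a dashboard-tier hit skips Claude entirely, a SQL-tier hit skips generation but re-executes, and a result-tier hit skips only the database round-trip.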

A regenerate button on any saved dashboard clears the cache and re-runs the entire AI pipeline. If the prompts or SQL generation logic improved since the dashboard was first created, regeneration picks up those improvements automatically. Redis is optional. The system works without it. Queries take longer, but nothing breaks. Graceful degradation, not hard failure.

Built-In Help That Knows the System#

Dashboards answer data questions. Users also have product and domain questions: “How do I filter by date range?” or “What does profit margin include?” or “How do orders flow through the system?” The application has a built-in help assistant for those.


Figure 6 - Help Chat with Source Attribution: A user asked “How do orders flow through the system?” The help assistant returned a structured answer covering the core journey, key stages, dashboard tracking, and the bigger picture, grounded in four source articles shown as clickable chips beneath the response. The right-hand panel shows all 19 help articles organized by category: Company Overview, Customers & Sales, Products & Inventory, Database & Queries, and Using the Dashboard.

The help assistant uses the same semantic search architecture as the dashboard generator. It embeds the documentation corpus into a vector database, retrieves the most relevant articles for any question, and generates answers grounded exclusively in those sources. Response time: 2 to 3 seconds.


Figure 7 - Traceable to Source: Clicking any source chip under a help answer opens the full article in the viewer. Breadcrumb navigation shows the article’s place in the category hierarchy. The original formatting is preserved, so users can verify the AI’s answer against the actual documentation instead of trusting an assistant that cannot show its work.

The document ingestion pipeline supports HTML, Markdown, PDF, and DOCX formats. This makes the same help architecture applicable beyond dashboards: point it at a folder of legal contracts, clinical guidelines, manufacturing SOPs, or compliance documentation, and users can ask questions with sourced answers in minutes. The viewer preserves original document formatting, so PDFs render as PDFs and structured documents keep their layout.

KEY INSIGHT: A tool that answers questions but cannot explain itself is a black box users will not trust. Built-in help with source attribution turns “the AI said so” into “here is the specific documentation that supports this answer.”

Key Achievements#

| Metric | Value |
| --- | --- |
| Dashboard generation time (cache miss) | Under 30 seconds end-to-end, six cards rendered by Metabase |
| Cached repeat generation | Under 5 seconds, Claude skipped entirely |
| SQL success rate | 100% across 10+ query categories |
| Database scale | 90,544,836 rows across 48 tables, 561 columns, 13 years of history |
| Visualization types | 12 available, 6 selected per dashboard, all guaranteed distinct |
| Saved dashboards in the library | 25 at time of writing, organized across 5 categories |
| Reliability layers per query | 7-point security check plus 14 T-SQL generation rules plus 7 post-processing fixes plus retry safety net |
| Help corpus | 19 articles, retrieved via semantic search, answers grounded in source |

The Full Stack#

Frontend:

  • React 18.3 + TypeScript 5.9 + Vite 5.4
  • Tailwind CSS v4 + shadcn/ui (Radix-based primitives)
  • Metabase Embedded SDK (dashboard rendering, drill-through, filters)
  • TanStack React Query 5.9 (server state management)

Backend:

  • FastAPI + Python 3.11
  • pyodbc 5.0 (direct MS SQL Server execution, zero JDBC round-trip for the AI pipeline)
  • Redis 5.0 (3-tier caching: dashboards 24h, SQL templates 24h, results 1h)

AI:

  • Claude Sonnet 4.6 via Anthropic API (query planning, SQL generation, help answers)
  • Qdrant (semantic schema matching against precomputed view embeddings)
  • Sentence Transformers: all-mpnet-base-v2, 768-dimension embeddings

BI Platform:

  • Metabase 0.56 (dashboard host, 493 MB JAR, Java 21 runtime)
  • 17 database driver plugins bundled, SQL Server used in production

Database:

  • MS SQL Server (WideWorldImporters, 8.7 GB, 90.5M rows, 48 tables, 561 columns)

What This Changes#

The traditional path from question to dashboard takes days. File a ticket. Wait for an analyst. Review the SQL. Request changes. Wait again. The natural language path takes under 30 seconds. Rephrase and iterate in minutes.

The architecture in this article is reproducible. The judgment that shaped it is not. Every SQL post-processing fix exists because an AI model made the same mistake twice on real data. Every concept-to-view mapping was calibrated against a specific client's business vocabulary. Every one of the 14 T-SQL rules was added after watching a generation fail in a specific way. The code is the easy part. Knowing where the walls are, before you hit them, is the part that comes from shipping real projects.

KEY INSIGHT: The ROI of natural language dashboards is not measured in analyst hours saved. It is measured in questions that were never asked because the friction was too high. Every organization has hundreds of unasked questions sitting in the gap between curiosity and capability.

Closing#

Thirty seconds, not three days. Twelve visualization types, six picked intelligently per question, all rendered by a BI platform that already ships with drill-through, scheduled reports, permissions, and a visual query builder. Three layers of SQL reliability that have driven the retry safety net’s firing rate to zero. Twenty-five saved dashboards in the library, each reloadable with fresh data in under five seconds. A help assistant that answers product questions with source citations instead of hallucinations.

For clients who want a BI platform with AI on top, this is the shape. The Metabase footprint is not a bug here, it is the feature. Saved dashboards, drill-through, scheduled reports, row-level security, a query builder analysts can learn: that is what clients are paying for when they ask for a BI platform.

Some clients want the opposite. No BI platform. No Java runtime. An AI dashboard capability that drops into a product they already ship. Same pipeline underneath, different front-of-house. That is the subject of the next article.


The Series#

This is Part 1 of a 2-part series on the two versions of our natural language dashboard system:

  1. Ask Your Database Anything: The Metabase Version (this article): The full-BI-platform version, built on Metabase, for clients who want saved dashboards, drill-through, and scheduled reports alongside the natural language layer.
  2. Ask Your Database Anything: Native React Dashboards: The embedded-in-a-product version, native React rendering with ECharts, AG Grid, and Leaflet, no BI platform underneath.