In this post, I explore different NL2SQL pipeline approaches through the lens of system design, weighing tradeoffs in latency, cost, security, model quality, performance, and total cost of ownership.
The research stemmed from my need to identify the right platform to prototype a simple text-to-SQL app for personal eBird data. I’ll walk through the available options and explain how my search converged on a Cloudflare stack (Worker + D1 + Workers AI) that offered the best architecture for my scenario.
Background
When I was prototyping a simple text-to-SQL app for personal eBird data, I scoped it to:
- be single‑tenant
- contain no PII
- have simple schema and queries
- be fast for interactivity
- generate high quality results
- be inexpensive to operate and host
My research for a platform to fulfill these needs came up with several distinct architectural families:
- Edge‑native stacks (Cloudflare Worker + D1 + Workers AI)
- Browser‑native stacks (DuckDB‑WASM or SQLite‑WASM + External LLM Inference Endpoints)
- Warehouse‑native stacks (Snowflake Cortex and Copilot, Google BigQuery + Gemini, Redshift + AWS Bedrock, Oracle 26ai with Select AI, Databricks SQL Copilots with its lakehouse)
Each family embodies a different philosophy about where compute should live, how data should flow, and what tradeoffs matter most.
The warehouse-native platforms promise “ask your data anything,” but they’re built for enterprise‑scale warehouses, multi‑tenant governance, and multi‑TB analytics.
What if your use case is different?
What if you’re building a personal analytics app with a few tables, simple queries, and a desire for instant interactivity?
I briefly toyed with the idea of using a unified stack for everything (e.g., Cloudflare or Oracle 26ai) versus a browser-native query engine (e.g., DuckDB-WASM or SQLite-WASM) making external LLM calls. After pitting the choices against my criteria, I decided that a Cloudflare‑native architecture fits my needs very well.
Architecture Overview
Each architecture family has a clear “sweet spot.”
A. Cloudflare Edge‑Native Stack
Everything runs in the same Cloudflare colocation, so the entire NL→SQL→DB→NL loop happens within a single edge environment. This is the closest thing to a single‑process NL2SQL system available in the cloud.
Browser
│
▼
Cloudflare Worker (NL→SQL orchestration)
│ ├──► Workers AI (inference)
│ └──► D1 (SQLite-like DB)
▼
Response → Browser
Philosophy:
- Colocation: Worker, data, and inference often run in the same Cloudflare colo.
- Intra‑edge execution: Minimal hops within a single edge environment, with no user‑visible clusters, VPCs, multi‑service orchestration layers, connection pools, or control planes in the path.
- Edge‑native: Low latency globally, not region‑bound.
- Minimal surface area: One execution environment, one DB, one inference endpoint.
Best for applications that need:
- predictable performance
- minimal ops
- low cost
- persistent data
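To make the loop concrete, here is a minimal sketch of the prompt a Worker might assemble before calling Workers AI. The schema, table, and function names are hypothetical, loosely modeled on an eBird export, not this app’s actual code:

```javascript
// Hypothetical eBird-like schema used to ground the NL→SQL prompt.
const SCHEMA = `CREATE TABLE sightings (
  species TEXT,      -- common name, e.g. 'Northern Cardinal'
  location TEXT,     -- hotspot name
  observed_on TEXT,  -- ISO date
  count INTEGER      -- number of individuals seen
);`;

// Build a constrained prompt: one read-only SQLite statement, no prose.
function buildPrompt(question) {
  return [
    "You translate questions into a single SQLite SELECT statement.",
    "Use only this schema:",
    SCHEMA,
    "Return SQL only, with no explanation or markdown.",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt("How many cardinals did I see in 2024?");
console.log(prompt);
```

In a Worker, this string would go to the model via the Workers AI binding (e.g. `env.AI.run(model, { prompt })`), and the returned SQL would be handed to D1.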
B. DuckDB/SQLite‑WASM Browser‑Native Stack
Everything runs in the browser except the LLM call:
Browser (Static App)
│
├──► DuckDB-WASM / SQLite-WASM (local SQL engine)
│
└──► External LLM Inference API (LLM for NL→SQL)
└──► GPT‑5.4, Claude 4.6, Gemini 3.1, etc.
Philosophy:
Push compute to the client. Avoid servers entirely. Treat the browser as the runtime.
Best for:
- zero‑backend prototypes
- offline analytics
- small datasets
- maximum model choice
- demos and experiments
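Because these stacks call a general chat endpoint, the response often wraps the SQL in markdown fences or surrounding prose, so a browser-native app typically needs a small post-processing step. A sketch of that step (the helper is mine, not from any library):

```javascript
// Hypothetical helper: pull a bare SQL statement out of a chat-style
// LLM response that may wrap it in ```sql fences or surrounding prose.
function extractSql(response) {
  // Prefer a fenced code block if present.
  const fenced = response.match(/```(?:sql)?\s*([\s\S]*?)```/i);
  if (fenced) return fenced[1].trim();
  // Otherwise fall back to the first "SELECT ... ;" span.
  const bare = response.match(/SELECT[\s\S]*?;/i);
  return bare ? bare[0].trim() : response.trim();
}

const reply = "Here you go:\n```sql\nSELECT species, COUNT(*) FROM sightings GROUP BY species;\n```";
console.log(extractSql(reply)); // the SELECT statement, without the fences
```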
C. Industrial Warehouse‑Native NL2SQL Stacks
Most NL2SQL platforms look like this:
Browser → Control Plane → LLM Service → Query Planner →
Compute Cluster → Browser
For example, Snowflake Cortex:
Browser → Copilot → Cortex LLM → Query Planner → Compute Cluster → Browser
For example, BigQuery + Gemini:
Browser → NL2SQL Agent → Gemini → Query Planner → Execution Engine → Browser
Every hop adds:
- authentication
- routing
- serialization
- planner overhead
- cluster scheduling
Philosophy:
Centralize compute. Optimize for enterprise security, multi‑TB analytics, compliance and auditability, and multi‑team workloads.
📊 Comparison Table
Let’s briefly look at how they stack up:
| Rubric | Cloudflare Edge Stack (Worker + D1 + Workers AI) | DuckDB/SQLite‑WASM (Browser‑native + OpenRouter) | Industrial NL2SQL Platforms (Snowflake, BigQuery, Redshift, Oracle, Databricks) |
|---|---|---|---|
| Latency | ⭐ Very Low — edge‑colocated compute, DB, and LLM; minimal hops | Medium — local SQL is fast, but NL→SQL requires browser→OpenRouter→LLM round‑trip | Medium–High — heavy planners, cluster orchestration, multi‑service hops |
| Cost | ⭐ Lowest — no warehouse compute, cheap inference, no egress | Low–Medium — DB is free; LLM usage costs | High–Very High — compute, storage, LLM tokens, egress, cluster uptime |
| Security | Right‑Sized — single vendor trust boundary; no public DB endpoint | Weak–Medium — data in browser; prompts sent to OpenRouter and possibly multiple LLM vendors | ⭐ Enterprise‑Grade — RBAC, masking, audit logs, VPC isolation, governance |
| Model Quality | High — Qwen/Llama/Mistral strong for simple schemas | ⭐ Very High — access to GPT‑5.4, Claude 4.6, Gemini 3.1 via External API | ⭐ Very High — same top‑tier models, tightly integrated |
| Performance | ⭐ Interactive — predictable, persistent, edge‑scaled | High (analytics) — DuckDB is extremely fast; limited by browser memory and WASM | High (analytics) — optimized for multi‑TB scans, not interactive chat |
| Persistence | ⭐ Built‑in — D1 is durable, consistent, globally replicated | Weak — browser storage is fragile; no multi‑user persistence | ⭐ Strong — enterprise‑grade durability and replication |
| Scalability | ⭐ High — edge scaling, predictable concurrency | Low — browser memory limits; no shared state | ⭐ Very High — warehouse/lakehouse scale |
| Ops Overhead | ⭐ Near Zero — no servers, no clusters, no IAM | Low–Medium — client‑side migrations, prompt updates, browser storage quirks | High — warehouse sizing, governance, quotas, model serving |
| Best For | Personal apps, edge analytics, low‑latency NL2SQL | Offline demos, tiny datasets, zero‑backend prototypes | Enterprise BI, multi‑team analytics, governed data environments |
Exploring the Criteria
Latency: Where the Compute Lives Determines the Speed
Cloudflare’s edge‑native design is fundamentally different from both browser‑native and warehouse‑native systems. By colocating the Worker, D1, and Workers AI in the same Cloudflare colo, the entire NL→SQL→DB→summary loop happens with almost no network hops. Coupled with edge distribution, this stack produces sub‑100ms response times that warehouse systems simply cannot match, given their heavy query planners, multi‑tenant control planes, and cluster orchestration layers.
Latency: ⭐ Very Low
DuckDB‑WASM is deceptively fast for SQL execution because it runs locally in the browser, but NL→SQL still requires a round‑trip to an external LLM inference endpoint. That network hop becomes the dominant latency factor.
Industrial platforms are optimized for multi‑TB analytics, not interactive chat. Their multi‑tenant control planes, planners and compute cluster orchestration are powerful but heavy, making them slower for small, selective NL2SQL queries.
Latency: Medium (browser‑native) → Medium–High (warehouse‑native)
Cost: Minimal Infrastructure, Minimal Spend
Cloudflare’s pricing model aligns perfectly with small‑scale analytics: Workers, D1, and Workers AI are all inexpensive (with a generous free-tier for light workloads). There is no warehouse compute, cluster uptime, or egress. This makes Cloudflare the lowest‑cost option by a wide margin.
DuckDB‑WASM is free on the compute side, but NL→SQL requires LLM calls. For small workloads, you can use OpenRouter’s free tier (capped at 200 requests/day). For sustained usage, it becomes medium‑cost.
Industrial platforms are the most expensive because they charge for compute, storage, LLM tokens, cluster uptime and egress. They are designed for enterprise budgets, not personal apps. Even Oracle 26ai’s free‑tier DB still requires paid LLM calls for any real workload.
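A back-of-envelope model makes this comparison concrete. The sketch below only parameterizes the cost drivers; every rate plugged in is an illustrative placeholder, not actual vendor pricing:

```javascript
// Back-of-envelope monthly cost model. All rates are illustrative
// placeholders -- check each vendor's current pricing before relying on this.
function monthlyCost({ queriesPerDay, tokensPerQuery, perMTokenUSD, fixedUSD }) {
  const tokensPerMonth = queriesPerDay * 30 * tokensPerQuery;
  const inference = (tokensPerMonth / 1e6) * perMTokenUSD;
  return fixedUSD + inference; // fixed covers cluster uptime/storage, if any
}

// Edge stack: near-zero fixed cost, cheap per-token inference (placeholder rates).
const edge = monthlyCost({ queriesPerDay: 1000, tokensPerQuery: 500, perMTokenUSD: 0.2, fixedUSD: 0 });
// Warehouse stack: the fixed compute/uptime line dominates at this scale.
const warehouse = monthlyCost({ queriesPerDay: 1000, tokensPerQuery: 500, perMTokenUSD: 2, fixedUSD: 300 });
console.log(edge < warehouse); // true
```

The takeaway is structural, not the specific numbers: at personal-app scale, a fixed cluster-uptime line dominates total cost, while per-token inference stays negligible.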
Security: Trust Boundaries Matter More Than Location
Cloudflare provides a single‑vendor trust boundary: your data stays inside Cloudflare’s network, and the Worker acts as the gatekeeper to D1 (with no public DB endpoint). This is appropriate for personal analytics without sensitive data.
DuckDB‑WASM stores data in browser memory, which is fine for non‑sensitive workloads. But NL→SQL prompts are sent to an external LLM inference endpoint, creating a multi‑vendor trust boundary that is inappropriate for sensitive or regulated data.
Industrial platforms offer the strongest security posture—RBAC, row‑level security, column masking, audit logs, VPC isolation, and governance catalogs. These are essential for multi‑team environments, but unnecessary for small personal datasets.
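Whatever the stack, one common mitigation is to treat model-generated SQL as untrusted input and gate it behind a read-only check before execution. A minimal sketch (the checks and names are mine; a real deployment would also rely on DB-level permissions, e.g. a SELECT-only role):

```javascript
// Hypothetical guardrail: reject anything that is not a single,
// read-only SELECT before the generated SQL ever reaches the database.
function isSafeSelect(sql) {
  const s = sql.trim().replace(/;\s*$/, ""); // allow one trailing semicolon
  if (s.includes(";")) return false;         // no statement stacking
  if (!/^SELECT\b/i.test(s)) return false;   // must start with SELECT
  // Block write/DDL keywords smuggled into subqueries.
  return !/\b(INSERT|UPDATE|DELETE|DROP|ALTER|ATTACH|PRAGMA|CREATE)\b/i.test(s);
}

console.log(isSafeSelect("SELECT * FROM sightings;"));         // true
console.log(isSafeSelect("DROP TABLE sightings;"));            // false
console.log(isSafeSelect("SELECT 1; DELETE FROM sightings;")); // false
```

The keyword blocklist is deliberately crude (it would also reject a legitimate string literal containing "CREATE"), which is why it should complement database grants, not replace them.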
Model Quality: More Choices vs. Fit
Using an external LLM inference endpoint gives you access to the best models in the world: GPT‑5.4, Claude 4.6, Gemini 3.1, Cohere, and more. This is a major advantage if your schema is ambiguous, your queries involve complex multi‑table joins, or you’re working with enterprise data models and complex analytics.
Cloudflare’s Workers AI catalog is smaller, but Qwen/Llama/Mistral are more than sufficient for simple schemas like our birding dataset. For our workload (a few tables, simple columns, and predictable queries), the difference in model quality is negligible.
Industrial platforms also offer top‑tier models, but they wrap them in governance and orchestration layers that add cost and latency.
Performance: Raw Speed vs. Interactivity vs. Batch Analytics
DuckDB‑WASM is the fastest SQL engine of the three for analytical scans and aggregations because it runs locally and uses vectorized execution. But it lacks persistence, concurrency, and durability. It also relies on browser memory and WASM sandbox, making it less suitable as an application database.
Cloudflare’s SQLite-compatible D1 is slower for large scans but more predictable and persistent. For NL2SQL workloads, which typically issue small, selective queries, D1 is the right fit.
Industrial platforms excel at large‑scale workloads: multi‑user concurrency, multi‑TB scans, BI dashboards, and ETL pipelines. But they are slower for sub‑100ms interactive NL2SQL due to planner overhead.
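For illustration, the small, selective query shape these workloads generate looks something like the sketch below. The table and helper are hypothetical (eBird-like); the key habit is binding user-derived values as parameters rather than interpolating them into the SQL string:

```javascript
// Sketch of the small, selective query shape typical of NL2SQL chat.
// Table and column names are hypothetical (eBird-like export).
function speciesCountQuery(species, year) {
  return {
    sql: `SELECT COUNT(*) AS n
          FROM sightings
          WHERE species = ?1
            AND observed_on LIKE ?2`,
    params: [species, `${year}-%`], // bind values; never interpolate strings
  };
}

const q = speciesCountQuery("Northern Cardinal", 2024);
console.log(q.params);
```

In a Worker, the result would run against D1 as `env.DB.prepare(q.sql).bind(...q.params).first()`.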
Persistence & Scalability: Where State Lives Matters
Cloudflare provides durable, globally replicated state via D1. This makes it suitable for real applications with multiple users and long‑term data retention.
DuckDB‑WASM stores data in browser memory or IndexedDB, which is fragile, user‑specific, and easily wiped. It is not intended to be used beyond a single user or device.
Industrial platforms offer the strongest persistence and scalability, but at a high cost and operational burden.
Also Noteworthy: Oracle 26ai
I’d like to call out that Oracle 26ai is also a respectable choice. Not only does Oracle offer two instances of its Autonomous Database (26ai) for free, but you can also use 26ai’s Select AI feature as the NL2SQL engine to bridge to an LLM endpoint (OpenAI-compatible providers, OCI Generative AI, Cohere, etc.). To keep everything in the same zone, you can use its APEX low-code development system to whip together a frontend.
In fact, I actually completed the first two steps: setting up the necessary security permissions in the database and configuring an AI profile for the LLM provider. Everything worked beautifully in the SQL worksheet, as I could generate SQL statements against my Oracle database using plain English. It was the APEX step that proved consistently frustrating. Despite its billing as a low-code IDE, I found it very non-intuitive and clumsy. In my opinion, it is far easier to code a quick frontend with plain HTML/CSS/JavaScript or an existing framework. In the end I gave up.
This is not to be negative towards Oracle, as I’ve used its free-tier Compute, Storage, Database and Monitoring offerings with great success. It’s just that I’ve used APEX twice to develop a frontend quickly and in both situations, the time I spent wrangling with it made me return to other real coding platforms. Since I’m looking to colocate at least the app and database, sadly Oracle 26ai didn’t fit the bill this time.
Conclusion
For engineers, the challenge is choosing the right architecture for the actual workload, not the most glamorous one.
I am pretty happy with my Cloudflare setup. It has very good DX (wrangler CLI and dashboard are snappy and informative) and performance, and is competitively priced.
More importantly, given my current workload:
- single‑tenant
- no PII
- simple schema
- interactive latency requirements
- low cost sensitivity
- minimal operational overhead
The Cloudflare stack is faster, cheaper, and dramatically simpler to operate, making it the right system for the right job:
- Fastest latency (edge colocation, lightweight planner)
- Lowest cost among the options I evaluated, especially at 1k–100k queries/day (no warehouse compute, cheap inference)
- Sufficient security for my risk profile (single‑tenant, no PII)
- Adequate model quality (Workers AI models offer sufficient accuracy for my schema)
- High performance for real‑time NL2SQL chat (no cluster overhead)
- Minimal operational burden (zero‑ops)
What’s Next?
I’d love to explore the DuckDB-WASM route more, as I also aim to build an eBird personal data tool for ephemeral scenarios: for example, letting fellow birders upload their own datasets in the browser and run natural language queries or interactive charting. This kind of interaction doesn’t need persistence, and most birders don’t start with a million-row dataset. I think the in-browser analytics approach should fit the bill perfectly.
Stay tuned!