Most enterprise knowledge lives in databases, not documents. Sales figures, inventory counts, customer records, and financial data are stored in relational databases and spreadsheets. Text-to-SQL enables users to query these structured data sources using natural language, bridging the gap between business questions and database queries. Combined with traditional document-based RAG, this creates hybrid retrieval systems that can answer questions requiring both factual context and precise numerical data.
1. LLMs and Tabular Data
Tabular data presents unique challenges for LLMs. Unlike free-form text, tables have strict row-column structure, typed values, and relational constraints (foreign keys, uniqueness). LLMs can process small tables directly in their context window, but larger datasets require the model to generate executable queries (SQL, pandas) that retrieve the relevant information.
1.1 Direct Table Reasoning
For small tables (fewer than 50 rows), LLMs can reason over tabular data presented directly in the prompt. The key is formatting: presenting tables as Markdown or structured text with clear column headers helps the model understand the schema. However, this approach fails for large datasets because of context window limits and because LLM arithmetic on many rows is unreliable.
```python
import pandas as pd
from openai import OpenAI

client = OpenAI()

def ask_about_table(df: pd.DataFrame, question: str):
    """Ask a question about a small DataFrame directly."""
    # Convert to markdown for clear formatting
    table_md = df.to_markdown(index=False)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "You are a data analyst. Answer questions about "
                       "the provided table. Show your reasoning."
        }, {
            "role": "user",
            "content": f"Table:\n{table_md}\n\nQuestion: {question}"
        }]
    )
    return response.choices[0].message.content
```
2. Text-to-SQL: Architecture
Text-to-SQL systems convert natural language questions into SQL queries that can be executed against a database. The core challenge is mapping the user's informal language to the precise syntax of SQL while correctly identifying which tables, columns, and joins are needed. Modern LLM-based text-to-SQL pipelines achieve over 85% accuracy on standard benchmarks by combining schema linking, few-shot examples, and error correction.
2.1 Schema Linking
Schema linking is the process of identifying which database tables and columns are relevant to the user's question. This is often the hardest step because users rarely use exact column names. A user asking about "revenue" might mean the `total_amount` column in the `orders` table, or the `annual_revenue` column in the `companies` table.
```python
def get_schema_context(db_connection, relevant_tables=None):
    """Extract schema information for the LLM prompt."""
    # get_all_tables, get_columns, and get_sample_values are assumed
    # database helper functions defined elsewhere.
    schema_parts = []
    # Get all tables or filter to relevant ones
    tables = relevant_tables or get_all_tables(db_connection)
    for table in tables:
        columns = get_columns(db_connection, table)
        col_info = []
        for col in columns:
            col_str = f"    {col['name']} {col['type']}"
            if col.get("primary_key"):
                col_str += " PRIMARY KEY"
            if col.get("foreign_key"):
                col_str += f" REFERENCES {col['foreign_key']}"
            col_info.append(col_str)
        # Include sample values for disambiguation
        samples = get_sample_values(db_connection, table, limit=3)
        schema_parts.append(
            f"CREATE TABLE {table} (\n" + ",\n".join(col_info)
            + f"\n);\n-- Sample rows: {samples}"
        )
    return "\n\n".join(schema_parts)
```
2.2 SQL Generation with Few-Shot Examples
```python
def text_to_sql(question, schema_context, examples=None):
    """Generate SQL from natural language question."""
    few_shot = ""
    if examples:
        few_shot = "\n\nExamples:\n" + "\n".join([
            f"Q: {ex['question']}\nSQL: {ex['sql']}"
            for ex in examples
        ])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"""You are a SQL expert. Generate a SQL query to answer
the user's question based on this schema:

{schema_context}
{few_shot}

Rules:
- Use only tables and columns from the schema
- Return ONLY the SQL query, no explanation
- Use standard SQL compatible with PostgreSQL
- Add LIMIT 100 to prevent unbounded results
- Handle NULLs appropriately"""
        }, {
            "role": "user",
            "content": question
        }],
        temperature=0.0
    )
    sql = response.choices[0].message.content.strip()
    # Remove markdown code fences if present
    sql = sql.replace("```sql", "").replace("```", "").strip()
    return sql
```
The choice of few-shot examples dramatically affects text-to-SQL accuracy. The most effective strategy is to embed the user's question and retrieve the most similar examples from a curated example bank using vector similarity. This "dynamic few-shot" approach ensures that the examples shown to the LLM are maximally relevant to the current question, improving accuracy by 10 to 20% compared to fixed examples on complex queries.
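The retrieval step behind dynamic few-shot selection can be sketched with plain cosine similarity over precomputed question embeddings. In this sketch the `embedding` vectors stand in for output from whatever embedding model the system uses, and `select_few_shot` and the example-bank layout are illustrative, not a fixed API:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_few_shot(question_vec, example_bank, k=3):
    """Rank curated (question, sql) examples by embedding similarity.

    example_bank: list of dicts, each with "question", "sql", and a
    precomputed "embedding" vector from the same model that produced
    question_vec.
    """
    ranked = sorted(
        example_bank,
        key=lambda ex: cosine_sim(question_vec, ex["embedding"]),
        reverse=True,
    )
    return ranked[:k]
```

The top-k examples would then be passed as the `examples` argument when generating SQL, so each question sees the most relevant demonstrations.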
3. Error Correction and Self-Healing Queries
Even the best LLMs generate incorrect SQL queries. Common errors include referencing nonexistent columns, incorrect join conditions, missing GROUP BY clauses, and type mismatches. A robust text-to-SQL pipeline includes an error correction loop that catches execution failures, sends the error message back to the LLM along with the original query, and asks for a corrected version.
```python
def execute_with_retry(question, schema_context, conn, max_retries=3):
    """Execute SQL with automatic error correction.

    conn is an open DB-API connection (e.g. sqlite3), so this works
    for file-based and in-memory databases alike.
    """
    error_history = []
    for attempt in range(max_retries):
        # Generate SQL (include errors from prior attempts)
        if error_history:
            error_context = "\n".join([
                f"Attempt {e['attempt']}: {e['sql']}\n"
                f"Error: {e['error']}"
                for e in error_history
            ])
            question_with_context = (
                f"{question}\n\nPrevious failed attempts:\n"
                f"{error_context}\n\nFix the errors."
            )
        else:
            question_with_context = question

        sql = text_to_sql(question_with_context, schema_context)
        try:
            cursor = conn.execute(sql)
            columns = [desc[0] for desc in cursor.description]
            rows = cursor.fetchall()
            return {
                "sql": sql,
                "columns": columns,
                "rows": rows,
                "attempts": attempt + 1
            }
        except Exception as e:
            error_history.append({
                "attempt": attempt + 1,
                "sql": sql,
                "error": str(e)
            })
    return {"error": "Failed after max retries", "history": error_history}
```
Allowing an LLM to generate and execute SQL queries introduces serious security risks. Always enforce these safeguards: (1) use a read-only database connection (no INSERT, UPDATE, DELETE, or DROP), (2) set query timeouts to prevent long-running queries, (3) limit result set sizes, (4) validate generated SQL against an allowlist of permitted operations before execution, and (5) run queries against a replica or snapshot, never the production database directly.
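Safeguard (4) can be approximated with a coarse keyword check before execution. This is a hypothetical sketch (`is_safe_select` is an illustrative name, not a library function); a production system should parse the statement with a real SQL parser rather than pattern-match, since keyword filters can reject legitimate queries that happen to mention a filtered word:

```python
import re

# Write/DDL keywords that should never appear in a read-only query.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|"
    r"attach|pragma|vacuum)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Coarse allowlist check: accept only a single SELECT/WITH statement
    that mentions no write or DDL keywords."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not re.match(r"(?is)^(select|with)\b", stripped):
        return False
    if FORBIDDEN.search(stripped):
        return False
    return True
```

In the retry loop above, a query failing this check would be rejected before it ever reaches the database, with the rejection fed back to the LLM like any other error.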
4. Text-to-SQL Benchmarks
Standard benchmarks measure text-to-SQL system performance across databases of varying complexity. The two most widely used benchmarks are Spider and BIRD, which test different aspects of the text-to-SQL challenge.
| Benchmark | Databases | Questions | Key Challenge | SOTA Accuracy |
|---|---|---|---|---|
| Spider | 200 databases | 10,181 | Cross-database generalization | ~87% (execution) |
| BIRD | 95 databases | 12,751 | Real-world complexity, external knowledge | ~72% (execution) |
| WikiSQL | 26,521 tables | 80,654 | Single-table, simpler queries | ~93% (execution) |
| SParC | 200 databases | 4,298 | Multi-turn conversational SQL | ~70% (execution) |
5. CSV and Spreadsheet Processing
Not all structured data lives in databases. CSV files, Excel spreadsheets, and Google Sheets are ubiquitous in business environments. LLMs can process these by either loading them into an in-memory database (SQLite) for SQL queries or generating pandas code for direct manipulation.
```python
import pandas as pd
import sqlite3

def csv_to_queryable(csv_path, table_name="data"):
    """Load a CSV into SQLite for text-to-SQL querying."""
    df = pd.read_csv(csv_path)
    # Create in-memory database
    conn = sqlite3.connect(":memory:")
    df.to_sql(table_name, conn, index=False)
    # Generate schema description
    schema = f"Table: {table_name}\nColumns:\n"
    for col in df.columns:
        dtype = str(df[col].dtype)
        sample = df[col].dropna().head(3).tolist()
        schema += f"  {col} ({dtype}) -- e.g., {sample}\n"
    return conn, schema

# Usage
conn, schema = csv_to_queryable("sales_data.csv", "sales")
result = execute_with_retry(
    "What are the top 5 products by total revenue this quarter?",
    schema, conn
)
```
6. Hybrid Structured/Unstructured Retrieval
Many real-world questions require combining structured data (from databases) with unstructured context (from documents). For example, "Why did Q3 revenue decline for our enterprise segment?" requires both the numerical revenue data from a database and contextual explanations from quarterly reports, meeting notes, or market analyses.
The most common pattern for hybrid retrieval is to route the question to both a text-to-SQL pipeline and a document RAG pipeline in parallel, then combine both result sets in the LLM prompt. The structured data provides the "what" (exact numbers, trends, comparisons), while the unstructured data provides the "why" (explanations, context, causal factors). This combination produces answers that are both precise and insightful.
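A minimal sketch of this parallel fan-out, assuming `sql_pipeline` and `doc_pipeline` are callables wrapping the two retrieval paths (the function name and prompt layout are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def hybrid_retrieve(question, sql_pipeline, doc_pipeline):
    """Run structured and unstructured retrieval in parallel, then merge
    both result sets into one context block for the answering LLM.

    sql_pipeline(question) -> formatted SQL query results (the "what")
    doc_pipeline(question) -> list of retrieved document chunks (the "why")
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        sql_future = pool.submit(sql_pipeline, question)
        doc_future = pool.submit(doc_pipeline, question)
        sql_result = sql_future.result()
        doc_chunks = doc_future.result()
    return (
        f"Question: {question}\n\n"
        f"Structured data (SQL result):\n{sql_result}\n\n"
        f"Document context:\n" + "\n---\n".join(doc_chunks)
    )
```

The combined string would then be placed in the final answering prompt, letting the LLM synthesize the exact figures with the explanatory context.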
7. Multi-Table Joins and Complex Queries
The hardest text-to-SQL queries involve multiple tables with complex join conditions, nested subqueries, window functions, and conditional aggregations. These queries require the LLM to understand the database's relational structure: which tables connect to which, through what foreign keys, and with what cardinality.
Three techniques significantly improve accuracy on complex multi-table queries: (1) Schema descriptions: annotating each column with a human-readable description of what it contains and how it relates to business concepts. (2) Join hints: explicitly listing common join patterns (e.g., "orders.customer_id joins to customers.id"). (3) Chain-of-thought SQL: asking the LLM to first write out its reasoning about which tables and joins are needed before generating the SQL, which reduces errors on queries requiring 3+ table joins.
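Techniques (2) and (3) can be combined in the prompt itself. The following sketch uses hypothetical helpers `build_cot_prompt` and `extract_sql`; the instruction wording and the fenced-block marker for the final query are assumptions, not a standard:

```python
import re

COT_INSTRUCTIONS = (
    "Before writing SQL, reason step by step:\n"
    "1. List the tables needed and why.\n"
    "2. List the join conditions (use the join hints).\n"
    "3. Note any aggregations, filters, or window functions.\n"
    "Then output the final query in a fenced ```sql block."
)

def build_cot_prompt(question, schema_context, join_hints):
    """Assemble a chain-of-thought prompt; join_hints is a list of
    strings like 'orders.customer_id joins to customers.id'."""
    hints = "\n".join(f"- {h}" for h in join_hints)
    return (
        f"Schema:\n{schema_context}\n\n"
        f"Join hints:\n{hints}\n\n"
        f"{COT_INSTRUCTIONS}\n\n"
        f"Question: {question}"
    )

def extract_sql(response_text):
    """Pull the final SQL out of the model's reasoning + fenced block."""
    match = re.search(r"```sql\s*(.*?)```", response_text, re.DOTALL)
    return match.group(1).strip() if match else response_text.strip()
```

Because the model writes its table-and-join reasoning before committing to SQL, the extraction step is needed to strip that reasoning from the executable query.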
Section 19.5 Quiz
Key Takeaways
- Text-to-SQL unlocks database-backed RAG: By converting natural language to SQL, LLMs can query relational databases for precise numerical answers that vector search cannot provide.
- Schema linking is the critical bottleneck: Correctly mapping user terms to database columns determines success; enhance it with column descriptions, sample values, and dynamic few-shot examples.
- Error correction loops are essential: Automatic retry with error feedback catches most SQL generation mistakes, improving reliability from roughly 70% to over 85% on complex queries.
- Security is non-negotiable: Read-only connections, query timeouts, result limits, SQL validation, and database replicas are mandatory safeguards for any production text-to-SQL system.
- Hybrid retrieval combines precision and context: The most powerful answers come from combining structured data (exact numbers) with unstructured context (explanations), retrieved in parallel and synthesized by the LLM.