# DeepEye-SQL

*SIGMOD 2026 · A Software-Engineering-Inspired Text-to-SQL Framework*
Highlights · Results · Repository Tour · Installation · Quick Start · Reproducibility · Evaluation · Citation
DeepEye-SQL treats Text-to-SQL as a software-engineering process rather than a single-shot generation task. Instead of asking one LLM call to solve everything, it decomposes the problem into grounding, reasoning, implementation, debugging, and final selection. The repository contains the full research pipeline used in our SIGMOD 2026 submission, with support for Spider, BIRD, and Spider2, including cloud-backed databases in Spider2.

## Highlights
- 🧠 Software-engineering pipeline instead of monolithic prompt-only generation.
- 🧪 Checker-based SQL revision for syntax, execution, and result-level repair.
- 🎯 Execution-aware selection over multiple SQL candidates.
- 🌐 Cross-benchmark support for Spider, BIRD, and Spider2.
- 💾 Structured snapshot workflow for checkpointing, resume, conversion, and evaluation.
- 📏 Built-in execution benchmark tooling for profiling SQL hotspots.
## Results

DeepEye-SQL achieves strong performance with off-the-shelf LLMs and no task-specific fine-tuning.
| Benchmark | Metric | Score | Model | Public Outputs |
|---|---|---|---|---|
| BIRD-Dev | EX | 73.5 | Qwen3-Coder-30B-A3B | results/bird-dev/qwen3-coder-30b-a3b.json |
| BIRD-Test | EX | 75.1 | Qwen3-Coder-30B-A3B | not released |
| Spider-Test | EX | 89.8 | Qwen3-Coder-30B-A3B | results/spider-test/qwen3-coder-30b-a3b.json |
| Spider2-Lite | official score | 38.2 | DeepSeek-R1 | results/spider2-lite/deepseek-r1 |
| Spider2-Snow | official score | 50.5 | DeepSeek-R1 | results/spider2-snow/deepseek-r1 |
Additional released artifacts:
- BIRD few-shot seeds: results/bird_dev_few_shots.json
- Spider few-shot seeds: results/spider_test_few_shots.json
- Alternative public predictions: results/bird-dev/gemma3-27b.json, results/bird-dev/qwen2.5-coder-32b.json, results/spider-test/gemma3-27b.json, results/spider-test/qwen2.5-coder-32b.json
```
Natural Language Question
            |
            v
1. Value Retrieval
   Ground relevant values from the database / vector index
            |
            v
2. Schema Linking
   Merge direct linking, reversed linking, and value linking
            |
            v
3. SQL Generation
   Produce diverse SQL candidates via multiple generators
            |
            v
4. SQL Revision
   Repair candidates using checker-style debugging passes
            |
            v
5. SQL Selection
   Execute, compare, and select the final SQL
```
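The staged flow above can be sketched as function composition over a shared context. A minimal illustration (stage bodies are stubs and all names are illustrative; the real implementation lives in app/pipeline/):

```python
# Illustrative only: stage names mirror the diagram above, bodies are stubs.
def value_retrieval(ctx):
    ctx["values"] = ["France"]                          # grounded DB values
    return ctx

def schema_linking(ctx):
    ctx["schema"] = ["singer.name", "singer.country"]   # linked columns
    return ctx

def sql_generation(ctx):
    ctx["candidates"] = [                               # diverse candidates
        "SELECT COUNT(*) FROM singer WHERE country = 'France'",
        "SELECT COUNT(name) FROM singer WHERE country = 'France'",
    ]
    return ctx

def sql_revision(ctx):
    ctx["candidates"] = [c.strip() for c in ctx["candidates"]]  # checker-style repair
    return ctx

def sql_selection(ctx):
    ctx["final_sql"] = ctx["candidates"][0]             # execution-aware pick (stubbed)
    return ctx

ctx = {"question": "How many French singers are there?"}
for stage in (value_retrieval, schema_linking, sql_generation,
              sql_revision, sql_selection):
    ctx = stage(ctx)
print(ctx["final_sql"])
```

Each stage only reads and extends the shared context, which is what makes the stages independently checkpointable.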
Why treat Text-to-SQL as a software-engineering process?

- A single generation pass is brittle on complex enterprise schemas.
- Different failure modes need different tools: grounding, linking, repair, and selection are not the same problem.
- Execution feedback is too valuable to reserve only for final evaluation.
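For instance, execution feedback can drive candidate selection: run every candidate and keep the answer that the most candidates agree on. This is a generic self-consistency sketch, not the repository's exact selection logic:

```python
import sqlite3
from collections import Counter

def select_by_execution(candidates: list[str], conn: sqlite3.Connection) -> str:
    """Pick the candidate whose execution result is most common (majority vote)."""
    results = {}
    for sql in candidates:
        try:
            results[sql] = tuple(conn.execute(sql).fetchall())
        except sqlite3.Error:
            continue  # failing candidates are discarded outright
    if not results:
        raise ValueError("no candidate executed successfully")
    winner, _ = Counter(results.values()).most_common(1)[0]
    return next(sql for sql, res in results.items() if res == winner)

# Toy database and candidates for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
candidates = [
    "SELECT COUNT(*) FROM singer",
    "SELECT COUNT(id) FROM singer",
    "SELECT COUNT(*) FROM singerz",   # broken: table does not exist
]
print(select_by_execution(candidates, conn))
```

The broken candidate is filtered out by execution, and the two agreeing candidates outvote any outlier.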
## Repository Tour

```
DeepEye-SQL
├── app/
│   ├── config/      # lazy config loading and typed settings
│   ├── dataset/     # Spider / BIRD / Spider2 datasets + structured snapshots
│   ├── db_utils/    # SQL execution, schema loading, cloud adapters
│   ├── llm/         # OpenAI-compatible LLM wrapper
│   ├── pipeline/    # five-stage Text-to-SQL pipeline
│   ├── services/    # schema service, execution service, artifact store
│   ├── prompt/      # prompt templates
│   └── vector_db/   # vector index creation for value retrieval
├── config/          # example experiment configs
├── runner/          # reproducible entry scripts
├── results/         # released predictions and few-shot seeds
├── script/          # helper shell scripts
└── workspace/       # generated snapshots and intermediate outputs
```
Key entry points:

- script/run_pipeline.sh: full pipeline automation
- runner/preprocess_dataset.py: build initial dataset snapshot
- runner/create_vector_db_parallel.py: create value-retrieval vector indices
- runner/run_value_retrieval.py: run the value-retrieval stage
- runner/run_schema_linking.py: run the schema-linking stage
- runner/run_sql_generation.py: generate diverse SQL candidates
- runner/run_sql_revision.py: repair candidates with checker-based debugging
- runner/run_sql_selection.py: select the final SQL via execution
- runner/convert_snapshot_to_sql.py: convert structured snapshots to evaluation outputs
- runner/evaluation.py: unified evaluation entry
- runner/benchmark_execution.py: execution-layer benchmark runner
## Installation

Requirements:

- Python >= 3.12
- Linux/macOS environment recommended
- OpenAI-compatible LLM endpoint for each stage
- Embedding endpoint or local embedding model for value retrieval
```shell
git clone https://github.com/HKUSTDial/DeepEye-SQL.git
cd DeepEye-SQL
```

We recommend uv:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
```

Spider2 cloud evaluation may require valid:
- BigQuery credentials
- Snowflake credentials
The corresponding paths are configured in config/config-spider2-example.toml.
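As a rough sketch of what that configuration covers (the key names below are illustrative; consult config/config-spider2-example.toml for the actual fields):

```toml
# Illustrative only -- the real key names live in config/config-spider2-example.toml
[spider2.credentials]
bigquery_credential_path  = "credentials/bigquery.json"
snowflake_credential_path = "credentials/snowflake.json"
```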
Use the provided helper script:
```shell
bash script/download_dataset.sh
```

This downloads:
- Spider test split
- BIRD dev split
Please obtain Spider2 data from the official Spider2 repository. Follow the official setup instructions there, then place the prepared data under paths consistent with your config, for example:

- data/spider2-lite
- data/spider2-snow
You also need valid cloud credentials if your Spider2 split references BigQuery or Snowflake databases.
## Quick Start

Three example configuration files are included:
- config/config-spider-example.toml
- config/config-bird-example.toml
- config/config-spider2-example.toml
```toml
[dataset]
type = "bird"            # spider | bird | spider2
split = "dev"
root_path = "data/bird"
save_path = "workspace/dataset/bird/dev.snapshot"

[vector_database]
api_type = "openai"      # or local
embedding_model_name_or_path = "your-embedding-model"
store_root_path = "workspace/vector_database/bird/dev"
db_parallel = 2
column_parallel = 8

[sql_generation.llm]
model = "your-model-name"
base_url = "https://your-openai-compatible-endpoint/v1"
api_key = "your-api-key"
max_tokens = 4096
temperature = 0.7
api_type = "openai"
max_model_len = 128000
```

Notes:

- Each stage can use a different model.
- All stage outputs are stored as structured .snapshot manifests.
- Only structured .snapshot manifests are supported.
Run the full pipeline:

```shell
export CONFIG_PATH=config/config-bird-example.toml
bash script/run_pipeline.sh
```

Or run it stage by stage:

```shell
export CONFIG_PATH=config/config-bird-example.toml
uv run runner/preprocess_dataset.py
uv run runner/create_vector_db_parallel.py
uv run runner/run_value_retrieval.py
uv run runner/run_schema_linking.py
uv run runner/run_sql_generation.py
uv run runner/run_sql_revision.py
uv run runner/run_sql_selection.py
```

Typical outputs land under workspace/:
- dataset snapshot: workspace/dataset
- stage snapshots: workspace/value_retrieval, workspace/schema_linking, workspace/sql_generation, workspace/sql_revision, workspace/sql_selection
## Reproducibility

DeepEye-SQL uses a structured snapshot format to make long-running experiments resumable and inspectable.
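The general pattern looks like the sketch below. This is a generic illustration of checkpoint-and-resume over JSON files, not the repository's actual snapshot format:

```python
import json
import tempfile
from pathlib import Path

def run_stage(name, in_path: Path, out_path: Path, fn):
    """Run one stage: fail fast on missing input, skip work if output exists."""
    if out_path.exists():
        return json.loads(out_path.read_text())      # resume: reuse finished work
    if not in_path.exists():
        raise FileNotFoundError(f"{name}: predecessor snapshot missing: {in_path}")
    result = fn(json.loads(in_path.read_text()))
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(result))          # checkpoint this stage
    return result

# Tiny demo: one "stage" chained off a seed snapshot in a throwaway directory.
root = Path(tempfile.mkdtemp())
(root / "dataset.snapshot").write_text(json.dumps({"questions": 2}))
out = run_stage("linking", root / "dataset.snapshot", root / "linking.snapshot",
                lambda snap: {"linked": snap["questions"]})
print(out)
```

Re-running a finished stage returns the saved snapshot immediately, which is what makes interrupted experiments cheap to resume.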
```shell
uv run runner/preprocess_dataset.py
```

This creates the initial dataset snapshot referenced by dataset.save_path.
```shell
uv run runner/create_vector_db_parallel.py
```

Notes:
- This step is required for Spider/BIRD.
- Spider2 skips vector DB creation because the current workflow does not use vector retrieval there.
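Conceptually, value retrieval matches the question against database cell values by vector similarity. A toy stdlib-only illustration (the repository's implementation uses a real embedding model and vector index, not trigram counts):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy embedding: character-trigram counts, a stand-in for a real model."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_values(question: str, values: list[str], k: int = 3) -> list[str]:
    """Rank database cell values by similarity to the question, keep top-k."""
    q = embed(question)
    return sorted(values, key=lambda v: cosine(q, embed(v)), reverse=True)[:k]

values = ["California", "New York", "Texas", "Californian cuisine"]
print(retrieve_values("Which schools are in California?", values, k=2))
```

The retrieved values then ground literal comparisons (e.g. WHERE clauses) in strings that actually exist in the database.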
Each stage consumes the previous stage snapshot and writes a new one.
## Evaluation

Convert a final snapshot to evaluation outputs:

```shell
uv run runner/convert_snapshot_to_sql.py \
    --snapshot_path workspace/sql_selection/bird/dev.snapshot
```

Then evaluate:

```shell
uv run runner/evaluation.py \
    --snapshot_path workspace/sql_selection/bird/dev.snapshot
```

The unified evaluator supports Spider, BIRD, and Spider2.
```shell
uv run runner/evaluation.py --help
```

Evaluate a BIRD snapshot with an explicit dataset type:

```shell
uv run runner/evaluation.py \
    --snapshot_path workspace/sql_selection/bird/dev.snapshot \
    --dataset_type bird
```

Evaluate a Spider2 snapshot:

```shell
uv run runner/evaluation.py \
    --snapshot_path workspace/sql_selection/spider2/lite.snapshot \
    --dataset_type spider2 \
    --dataset_split lite
```

The evaluator will:
- auto-detect dataset type when possible
- convert snapshot outputs when needed
- call the official Spider2 evaluation entry for Spider2 workflows
We include an execution-layer benchmark script for profiling SQL hotspots.
```shell
uv run runner/benchmark_execution.py \
    --rows 20000 \
    --iterations 8 \
    --measure-repeat 5
```

Or sample queries from an existing snapshot:

```shell
uv run runner/benchmark_execution.py \
    --snapshot-path workspace/sql_selection/bird/dev.snapshot \
    --snapshot-sample-size 20
```

This is useful when you want to quantify:
- cached vs uncached execution cost
- measure_time() overhead
- SQL selection scan cost
- execution invocation counts per item
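As a rough illustration of the first point, caching repeated SQL executions can be measured with nothing but the standard library. This sketch is unrelated to the repository's internals; it only shows the kind of comparison involved:

```python
import sqlite3
import time
from functools import lru_cache

# In-memory table roughly matching the --rows 20000 default above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(20000)])

def run_uncached(sql: str):
    return conn.execute(sql).fetchall()

@lru_cache(maxsize=None)
def run_cached(sql: str):
    # Cache keyed on SQL text; safe only for read-only queries on static data.
    return tuple(conn.execute(sql).fetchall())

sql = "SELECT COUNT(*), SUM(x) FROM t"
t0 = time.perf_counter()
for _ in range(100):
    run_uncached(sql)
t1 = time.perf_counter()
for _ in range(100):
    run_cached(sql)
t2 = time.perf_counter()
print(f"uncached: {t1 - t0:.4f}s, cached: {t2 - t1:.4f}s")
```

Because SQL selection re-executes the same candidates many times, even a simple result cache can dominate the measured difference.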
**Does Spider2 require building a vector database?**

No. The current pipeline skips vector DB construction for Spider2 and relies on the Spider2-specific database and schema workflow.

**Can I use my own LLM or embedding endpoint?**

Yes, as long as the endpoint is OpenAI-compatible, or the embedding stack is configured to use local models for vector indexing.

**A stage fails because its input snapshot is missing. Is that a bug?**

No, that is intentional fail-fast behavior. Each stage expects its predecessor snapshot to exist. Run the previous stage first.
## Citation

If you find DeepEye-SQL useful in your research, please cite:
```bibtex
@article{li2025deepeye,
  author  = {Boyan Li and Chong Chen and Zhujun Xue and Yinan Mei and Yuyu Luo},
  title   = {{DeepEye-SQL:} A Software-Engineering-Inspired Text-to-SQL Framework},
  journal = {Proc. {ACM} Manag. Data},
  volume  = {4},
  number  = {3},
  year    = {2026},
  doi     = {10.1145/3802035}
}
```

This project is released under the MIT License. See LICENSE.
DeepEye-SQL builds on public benchmark ecosystems and OpenAI-compatible LLM serving stacks. We thank the maintainers of Spider, BIRD, Spider2, ChromaDB, OpenAI-compatible serving frameworks, and the broader Text-to-SQL research community.