Advances in Text-to-SQL and Data Discovery

Recent work in text-to-SQL and data discovery relies increasingly on large language models (LLMs) to improve query generation and rewriting. There is also growing interest in more expressive and flexible query languages that can handle complex queries and multiple database schemas. Three papers stand out. End-to-End Text-to-SQL with Dataset Selection proposes a three-stage framework that identifies the user's intended database before generating SQL queries. E3-Rewrite is an LLM-based SQL rewriting framework that produces executable, equivalent, and efficient queries, achieving up to a 25.6% reduction in query execution time compared to state-of-the-art methods. TQL is a domain-specific language for data discovery whose type-like system lets discovery queries capture the context of downstream transformations, making data discovery more expressive and practical.
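To make "executable" and "equivalent" concrete, here is a minimal sketch that runs an original query and a rewritten query against the same small SQLite database and compares their results. It is not E3-Rewrite's method. The `same_results` helper, the `orders` and `customers` tables, and both queries are hypothetical, chosen only for illustration. Matching results on one database instance is evidence of equivalence, not proof.

```python
import sqlite3
from collections import Counter


def same_results(conn, original_sql, rewritten_sql):
    """Return True if both queries execute and return the same multiset of rows.

    Hypothetical helper: an empirical check on one database instance only.
    """
    try:
        original_rows = conn.execute(original_sql).fetchall()
        rewritten_rows = conn.execute(rewritten_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to run is not executable, so the check fails.
        return False
    # Compare as multisets: row order is ignored, duplicate rows still count.
    return Counter(original_rows) == Counter(rewritten_rows)


# Illustrative schema and data (assumptions, not taken from any paper).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'DE'), (2, 'FR'), (3, 'DE');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 2, 35.5), (12, 3, 12.0), (13, 1, 7.5);
    """
)

# Original query: filter orders with an IN subquery.
original = """
    SELECT o.id, o.total FROM orders o
    WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.country = 'DE')
"""

# Candidate rewrite: the same filter expressed as a join.
# It returns the same rows here because customers.id is a primary key.
rewritten = """
    SELECT o.id, o.total FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.country = 'DE'
"""

print(same_results(conn, original, rewritten))
```

A check like this covers only the first two properties. Efficiency would need a separate measurement, such as timing both queries on realistic data.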

Sources

End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation

SQL-Exchange: Transforming SQL Queries Across Domains

Towards General-Purpose Data Discovery: A Programming Languages Approach

TQL: Towards Type-Driven Data Discovery

E3-Rewrite: Learning to Rewrite SQL for Executability, Equivalence, and Efficiency

Invertible Syntax without the Tuples (Functional Pearl)

Active Automata Learning with Advice
