Advances in High-Performance Data Processing and Query Optimization

Recent work in data processing and query optimization focuses on performance and efficiency when handling large-scale data and complex queries. Researchers are exploring new techniques and frameworks to optimize SQL+ML queries, improve vector search and retrieval, and speed up disk-based vector databases. There is also growing interest in static analysis frameworks that predict reuse profiles and optimize memory access patterns. Real-time data systems and streaming frameworks are becoming increasingly important for applications that require timely data processing and analysis.

Noteworthy papers in this area include:

Optimization techniques for SQL+ML queries: reports significant performance gains through query plan optimization and caching.

Individualized non-uniform quantization for vector search: improves accuracy while reducing computational cost.

CALL: a context-aware query grouping mechanism that reduces search latency and improves cache hit ratios.

Static Estimation of Reuse Profiles for Arrays in Nested Loops: predicts reuse profiles and cache hit rates with high accuracy.

ARCADE: a real-time data system that supports high-throughput ingestion and expressive hybrid and continuous query processing.

FusedANN: a geometric framework that enables efficient approximate search and improves recall-latency tradeoffs.
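To make the link between reuse profiles and cache hit rates concrete, the sketch below measures a reuse profile from an access trace and derives a hit rate from it. It is an illustration only, not the method of the paper above: that work estimates the profile statically, whereas this sketch replays a trace. It assumes a fully associative LRU cache with one array element per cache line, and all function names (reuse_profile, lru_hit_rate, matvec_trace) are hypothetical.

```python
from collections import Counter


def reuse_profile(trace):
    """Map each reuse distance to how often it occurs in the trace.

    The reuse distance of an access is the number of distinct other
    addresses touched since the previous access to the same address.
    First-time (cold) accesses are counted under the key None.
    """
    stack = []  # LRU stack, most recently used address at the end
    profile = Counter()
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            profile[len(stack) - 1 - idx] += 1
            stack.pop(idx)
        else:
            profile[None] += 1
        stack.append(addr)
    return profile


def lru_hit_rate(profile, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` elements.

    Under these assumptions an access hits exactly when its reuse
    distance is smaller than the capacity; cold accesses always miss.
    """
    total = sum(profile.values())
    hits = sum(n for d, n in profile.items() if d is not None and d < capacity)
    return hits / total if total else 0.0


def matvec_trace(n):
    """Accesses made by y[i] += A[i][j] * x[j] in an i-outer, j-inner loop nest."""
    trace = []
    for i in range(n):
        for j in range(n):
            trace.append(("A", i, j))
            trace.append(("x", j))
            trace.append(("y", i))
    return trace


if __name__ == "__main__":
    profile = reuse_profile(matvec_trace(4))
    for capacity in (3, 10):
        print(capacity, lru_hit_rate(profile, capacity))
```

The list-based LRU stack keeps the sketch short but costs time proportional to the number of distinct addresses per access; a tree-based structure would be the usual choice for long traces.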

Sources

Optimization techniques for SQL+ML queries: A performance analysis of real-time feature computation in OpenMLDB

Individualized non-uniform quantization for vector search

CALL: Context-Aware Low-Latency Retrieval in Disk-Based Vector Databases

Static Estimation of Reuse Profiles for Arrays in Nested Loops

In-Transit Data Transport Strategies for Coupled AI-Simulation Workflow Patterns

Automated Insertion of Flushes and Fences for Persistency

To Stream or Not to Stream: Towards A Quantitative Model for Remote HPC Processing Decisions

ARCADE: A Real-Time Data System for Hybrid and Continuous Query Processing across Diverse Data Modalities

FusedANN: Convexified Hybrid ANN via Attribute-Vector Fusion
