Efficient Data Structures and Algorithms for Text Processing and Storage

The field of data structures and algorithms for text processing and storage is witnessing significant advancements. Researchers are focusing on developing innovative solutions that enable efficient and scalable processing of large textual datasets. A key direction is the exploration of compressed data structures, such as suffix trees and arrays, which facilitate fast query performance while minimizing storage requirements. Another area of interest is the design of specialized data structures, like ordered sets and log-structured merge key-value stores, that cater to specific use cases and offer improved performance. Noteworthy papers in this area include:

  • Engineering Fast and Space-Efficient Recompression from SLP-Compressed Text, which presents a groundbreaking approach to recompression that achieves substantial speedups and reductions in memory usage.
  • Compressing Suffix Trees by Path Decompositions, which introduces a novel method for compressing suffix trees that leads to a more efficient and elegant solution.
  • Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy, which proposes a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores. These developments have far-reaching implications for various applications, including text indexing, data compression, and storage systems, and are expected to influence the design of future data structures and algorithms.

Sources

String Matching with a Dynamic Pattern

Engineering Fast and Space-Efficient Recompression from SLP-Compressed Text

glass: ordered set data structure for client-side order books

Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy (Extended Version)

Compressing Suffix Trees by Path Decompositions

PSM: Policy Synchronised Deterministic Memory

Built with on top of