Error Correction in DNA Data Storage

DNA data storage research is advancing rapidly, with a focus on error correction techniques robust enough to ensure reliable information retrieval. Researchers are exploring new methods to overcome the errors that occur during storage and retrieval: deletions, insertions, and substitutions. A key area of investigation is the development of capacity-achieving codes for channels with synchronization errors (insertions and deletions), which is crucial for DNA-based storage systems. Alphabets larger than binary, such as the quaternary alphabet that matches DNA's four nucleotides, are also being investigated to improve the efficiency and durability of these systems. Furthermore, studies of error exponents and variable-number-of-reads protocols are clarifying the trade-off between error probability and the number of reads required. Noteworthy papers include:

A study on Levenshtein's sequence reconstruction problem for larger alphabet sizes, which reveals surprising behavior of certain error types.
A paper on the capacity of insertion channels for small insertion probabilities, which establishes capacity in this regime and demonstrates its close alignment with achievable rates using i.i.d. inputs.
Research on channels with input-correlated synchronization errors, which identifies conditions for achieving capacity and provides explicit capacity-achieving codes for multi-trace channels.
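
To make the error model behind these results concrete, the following is a minimal sketch of a memoryless insertion-deletion-substitution channel over the quaternary alphabet, read several times to produce multiple traces. It is an illustrative assumption, not the model used in any of the papers above: the function names (`ids_channel`, `read_traces`, `edit_distance`), the independent per-symbol error probabilities, and the rule of at most one insertion before each symbol are all hypothetical choices made for this example.

```python
import random

ALPHABET = "ACGT"  # quaternary alphabet: the four DNA nucleotides


def ids_channel(seq, p_ins, p_del, p_sub, rng):
    """Pass seq through a toy memoryless insertion/deletion/substitution channel.

    Assumption: errors are independent per symbol, and at most one random
    symbol is inserted before each input symbol.
    """
    out = []
    for base in seq:
        # Insertion: emit a uniformly random symbol before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            # Deletion: the current symbol is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution: replace with a different symbol.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def read_traces(seq, n_reads, p_ins, p_del, p_sub, seed=0):
    """Produce n_reads independent noisy reads (traces) of the same strand."""
    rng = random.Random(seed)
    return [ids_channel(seq, p_ins, p_del, p_sub, rng) for _ in range(n_reads)]


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    strand = "ACGTTGCAACGT"
    for trace in read_traces(strand, n_reads=5, p_ins=0.05, p_del=0.05, p_sub=0.05):
        print(trace, edit_distance(strand, trace))
```

Raising `n_reads` in such a simulation is one informal way to see the trade-off between the number of reads and the residual error, since more traces give a reconstruction procedure more independent views of the same strand.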

