Advances in Text Representation and Retrieval

Research in natural language processing and information retrieval is moving toward more efficient and effective methods for text representation and retrieval. Recent work uses deep learning techniques such as deep text hashing, which represents text as compact binary codes, to improve the accuracy and speed of retrieval systems. There is also growing interest in applying these techniques to real-world problems such as e-mail spam detection and personalized semantic search for e-commerce.

Noteworthy papers in this area include:

Image Hashing via Cross-View Code Alignment in the Age of Foundation Models, which introduces a simple and unified principle for learning binary codes that remain consistent across semantically aligned views.

CAT-ID$^2$: Category-Tree Integrated Document Identifier Learning for Generative Retrieval In E-commerce, which proposes a novel ID learning method that incorporates prior category information into semantic IDs.

Embedding based Encoding Scheme for Privacy Preserving Record Linkage, which studies how embedding-based encoding techniques can be applied to protect the privacy of the entities being linked.
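To make the idea of retrieval over binary codes concrete, here is a minimal sketch in Python. It is not the method of any paper listed here: it swaps in random hyperplane hashing (sign of random projections) as a stand-in for a learned hashing model, and the function names (`make_hyperplanes`, `binary_code`, `hamming`, `search`) are hypothetical, chosen only for illustration. It assumes documents have already been mapped to dense vectors by some embedding model.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumption: a stand-in for a learned hash model)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_code(vector, hyperplanes):
    """Pack one bit per hyperplane into an int: 1 if the projection is non-negative, else 0."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vector, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def search(query_code, index, k=1):
    """Return the ids of the k codes closest to the query in Hamming distance."""
    ranked = sorted(index.items(), key=lambda item: hamming(query_code, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Usage with toy 3-dimensional "embeddings" (illustrative values only).
planes = make_hyperplanes(dim=3, n_bits=16)
docs = {"a": [1.0, 2.0, 3.0], "b": [-1.0, 0.5, -2.0]}
index = {doc_id: binary_code(vec, planes) for doc_id, vec in docs.items()}
nearest = search(binary_code([1.0, 2.0, 3.0], planes), index, k=1)
```

Comparing codes reduces to an XOR and a bit count, which is why binary representations are attractive for fast retrieval. A learned model would replace the random hyperplanes so that semantically similar texts receive nearby codes.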

Sources

Quantitative Intertextuality from the Digital Humanities Perspective: A Survey

A Survey on Deep Text Hashing: Efficient Semantic Text Retrieval with Binary Representation

Image Hashing via Cross-View Code Alignment in the Age of Foundation Models

Real-time and Zero-footprint Bag of Synthetic Syllables Algorithm for E-mail Spam Detection Using Subject Line and Short Text Fields

Embedding based Encoding Scheme for Privacy Preserving Record Linkage

Taxonomy-based Negative Sampling In Personalized Semantic Search for E-commerce

CAT-ID$^2$: Category-Tree Integrated Document Identifier Learning for Generative Retrieval In E-commerce

Numbering Combinations for Compact Representation of Many-to-Many Relationship Sets

From data to corpus: semiotic and documentary issues in audiovisual archives
