Geometric Representation Learning for NLP

Natural language processing (NLP) research is moving toward geometric representation learning, which aims to learn compact, semantically meaningful representations of text. The shift is driven by emerging use cases that combine machine learning with knowledge graphs, and by the need to make textual representation learning more efficient and scalable.

Recent developments cluster around three directions. One brings tensors into database and Semantic Web infrastructure: representing data tensors as literals in RDF and extending SPARQL so that queries can operate on such literals. A second uses variational autoencoders (VAEs) to compress the knowledge of pre-trained language models into more compact, semantically disentangled representations. A third imposes differential-geometric constraints on the output space of sentence encoders, so that embeddings lie on structured manifolds and become more discriminative and topologically organized.

Noteworthy papers include LangVAE and LangSpace, which provide a flexible and efficient framework for building and probing language-model VAEs; Manifold-Constrained Sentence Embeddings via Triplet Loss, which demonstrates the value of constraining embeddings to non-Euclidean manifolds such as spheres, tori, and Möbius strips; and CSE-SFP, which enables unsupervised sentence representation learning with a single forward pass, producing higher-quality embeddings while reducing training time and memory consumption.
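To make the RDF/SPARQL direction concrete, the sketch below stores a small embedding tensor as a typed RDF literal using rdflib. The datatype IRI, the example namespace, and the JSON serialization are illustrative assumptions, not the encoding proposed in the cited paper; a SPARQL extension would additionally need functions that parse such literals back into tensors.

```python
# Illustrative sketch (not the cited paper's scheme): a small embedding
# tensor stored as a typed RDF literal, with a hypothetical datatype IRI
# and a plain JSON serialization for readability.
import json
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")            # hypothetical namespace
TENSOR = URIRef("http://example.org/dt/tensor")  # hypothetical datatype IRI

def tensor_literal(values, shape):
    """Serialize a flat list of floats plus its shape into one literal."""
    payload = json.dumps({"shape": shape, "data": values})
    return Literal(payload, datatype=TENSOR)

g = Graph()
g.add((
    EX["sentence_42"],
    EX["hasEmbedding"],
    tensor_literal([0.12, -0.53, 0.88, 0.07], shape=[4]),
))

print(g.serialize(format="turtle"))
```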
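The VAE direction can be illustrated with a minimal PyTorch sketch that compresses frozen sentence vectors from a pre-trained encoder through a variational bottleneck. This is a generic sketch of the technique, not the LangVAE or LangSpace API; the dimensions, class name, and use of a vector-reconstruction decoder are assumptions.

```python
# Minimal VAE bottleneck over frozen sentence vectors (illustrative only;
# not the LangVAE/LangSpace API). Sizes and names are assumptions.
import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    def __init__(self, input_dim=768, latent_dim=32):
        super().__init__()
        self.to_mu = nn.Linear(input_dim, latent_dim)      # posterior mean
        self.to_logvar = nn.Linear(input_dim, latent_dim)  # posterior log-variance
        self.decoder = nn.Linear(latent_dim, input_dim)    # reconstruct the vector

    def forward(self, h):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.decoder(z)
        # Reconstruction term plus KL divergence to a standard normal prior.
        rec_loss = nn.functional.mse_loss(recon, h)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, rec_loss + kl

# Usage: h would come from a frozen pre-trained encoder (e.g. a pooled vector).
h = torch.randn(16, 768)          # stand-in batch of sentence vectors
z, loss = SentenceVAE()(h)
loss.backward()
```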
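For the manifold-constrained direction, a simple spherical instance can be sketched as follows: L2-normalizing embeddings projects them onto the unit hypersphere before a standard triplet margin loss is applied. This shows only the sphere case; the torus and Möbius constraints from the cited paper are not reproduced, and the margin value is an assumption.

```python
# Triplet loss with a spherical constraint: L2 normalization places the
# embeddings on the unit hypersphere before distances are computed.
import torch
import torch.nn.functional as F

def sphere_triplet_loss(anchor, positive, negative, margin=0.2):
    # Project each embedding onto the unit sphere, then apply the triplet loss.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negative, dim=-1)
    return F.triplet_margin_loss(a, p, n, margin=margin)

# Toy usage with random stand-in embeddings.
a, p, n = (torch.randn(8, 384) for _ in range(3))
print(sphere_triplet_loss(a, p, n).item())
```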

Sources

Representing and querying data tensors in RDF and SPARQL

LangVAE and LangSpace: Building and Probing for Language Model VAEs

Manifold-Constrained Sentence Embeddings via Triplet Loss: Projecting Semantics onto Spheres, Tori, and Möbius Strips

CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass
