Advances in Structure-Aware Language Models and Privacy-Preserving Techniques

The field of natural language processing is moving toward structure-aware techniques that let large language models (LLMs) handle structured inputs such as graphs. The driving observation is that many practical inputs, from Abstract Meaning Representations (AMRs) to the feature dependencies underlying tabular data, carry relational structure that a plain text serialization discards. Recent work therefore focuses on integrating graph topology into pretrained LLMs without significant architectural changes, which promises stronger AMR-to-text generation and more effective use of LLMs in areas such as recommender systems and privacy-preserving data generation. Noteworthy papers include SAFT, which introduces a structure-aware fine-tuning approach for AMR-to-text generation, and GraDe, which proposes graph-guided dependency learning for tabular data generation with LLMs.

In parallel, researchers are exploring federated learning to improve the performance and privacy of LLMs in decentralized environments. FedWCM revisits momentum-based federated learning under long-tailed data distributions, while FedVLM targets scalable, personalized vision-language models trained across federated clients.

Privacy-preserving techniques are also receiving growing attention, particularly around membership inference attacks and ways to mitigate them. CompLeak evaluates the privacy risks introduced by model compression, Tab-MIA provides a benchmark for membership inference on tabular data in LLMs, and LoRA-Leak introduces a holistic evaluation framework for membership inference attacks against LoRA fine-tuned language models.
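
As a deliberately simple illustration of exposing graph topology to a text-only model, the sketch below linearizes a toy AMR-like graph into a prompt. This is a generic baseline for intuition only, not the fine-tuning method of SAFT or the dependency-learning scheme of GraDe; the node labels, relation names, and prompt wording are illustrative assumptions.

```python
# Minimal sketch: serialize a small graph as relation triples so a text-only
# LLM can see its topology without any architectural changes.
# Illustrative baseline only; not the method of SAFT or GraDe.

def linearize_graph(nodes, edges):
    """Render nodes and (head, relation, tail) edges as plain text."""
    node_part = "Nodes: " + ", ".join(nodes)
    edge_part = "\n".join(f"({h}) -[{rel}]-> ({t})" for h, rel, t in edges)
    return node_part + "\n" + edge_part

# Toy AMR-like fragment for "the boy wants to go" (labels are illustrative).
nodes = ["want-01", "boy", "go-02"]
edges = [
    ("want-01", "ARG0", "boy"),
    ("want-01", "ARG1", "go-02"),
    ("go-02", "ARG0", "boy"),
]

prompt = "Generate a fluent sentence from this graph:\n" + linearize_graph(nodes, edges)
print(prompt)
```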

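On the privacy side, a common starting point for membership inference against language models is loss thresholding: sequences the model fits unusually well are flagged as likely training members. The sketch below shows that generic baseline with Hugging Face transformers; the model name and threshold are placeholder assumptions, and this is not the attack framework of CompLeak, Tab-MIA, or LoRA-Leak.

```python
# Generic loss-thresholding membership inference baseline (illustrative only;
# not the specific attacks evaluated in CompLeak, Tab-MIA, or LoRA-Leak).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def example_loss(text: str) -> float:
    """Per-example cross-entropy loss; unusually low loss is (weak) evidence
    that the text was seen during training or fine-tuning."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

def loss_threshold_mia(candidates, threshold):
    """Flag candidates whose loss falls below a threshold calibrated on
    data known to be outside the training set."""
    return [text for text in candidates if example_loss(text) < threshold]

# Usage: flagged = loss_threshold_mia(["some candidate record ..."], threshold=3.0)
```

Full attack frameworks calibrate the decision per example or against reference models; the point here is only to make the threat model concrete.
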
Sources

SAFT: Structure-Aware Fine-Tuning of LLMs for AMR-to-Text Generation

Off-Policy Evaluation and Learning for Matching Markets

FedWCM: Unleashing the Potential of Momentum-based Federated Learning in Long-Tailed Scenarios

Comprehensive Privacy Risk Assessment in Social Networks Using User Attributes, Social Graphs, and Text Analysis

FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs

You Don't Bring Me Flowers: Mitigating Unwanted Recommendations Through Conformal Risk Control

CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage

Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation

FedVLM: Scalable Personalized Vision-Language Models through Federated Learning

Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs

SIFOTL: A Principled, Statistically-Informed Fidelity-Optimization Method for Tabular Learning

LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models

RecPS: Privacy Risk Scoring for Recommender Systems

Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models
