Advances in Legal Knowledge Retrieval and Modeling

The field of legal knowledge retrieval and modeling is moving toward large language models (LLMs) and retrieval-augmented generation (RAG) systems to improve performance and robustness. Researchers are applying LLMs to a variety of legal tasks, including legal coding, identifying legal holdings, and legal question answering. A key challenge in this area is the lack of realistic benchmarks that capture the complexity of both legal retrieval and downstream legal question answering; to address this, new legal RAG benchmarks such as Bar Exam QA and Housing Statute QA are being introduced. Another important direction is bringing legal knowledge to the public, including the construction of legal question banks and interactive recommenders (a minimal sketch of the question-bank idea follows the list below). Noteworthy papers in this area include:

  • NbBench, which introduces a comprehensive benchmark suite for nanobody representation learning;
  • QBR, which proposes a question-bank-based approach to fine-grained legal knowledge retrieval for the general public; and
  • Identifying Legal Holdings with LLMs, which presents a systematic study of the performance of modern LLMs on a legal benchmark dataset.
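
As a concrete illustration of the question-bank idea behind QBR, the sketch below matches a lay query against a small bank of pre-written questions, each linked to the legal knowledge it covers, and then assembles a RAG-style prompt from the match. The bank entries, the bag-of-words scorer, and all names here are illustrative assumptions, not QBR's actual pipeline.

```python
import math
import re
from collections import Counter

# Toy question bank: each pre-written question is linked to the legal
# knowledge it covers. Entries are invented for illustration.
QUESTION_BANK = [
    ("Can my landlord evict me without giving notice?",
     "Most housing statutes require written notice before an eviction."),
    ("Is quoting a short passage from a book in a review fair use?",
     "Fair use turns on purpose, amount used, and effect on the market."),
]

def bag_of_words(text):
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query):
    """Return the bank entry whose question best matches the query."""
    q = bag_of_words(query)
    return max(QUESTION_BANK, key=lambda entry: cosine(q, bag_of_words(entry[0])))

query = "my landlord is evicting me with no notice"
question, knowledge = retrieve(query)

# RAG-style prompt assembly: the retrieved knowledge becomes the context
# an LLM would answer from.
prompt = (
    "Answer the question using only the context below.\n"
    f"Context: {knowledge}\n"
    f"Question: {query}"
)
print(prompt)
```

In a real system, the overlap scorer would be replaced by dense retrieval over statutes or case passages, and the prompt would be sent to an LLM; evaluating that end-to-end setup is what the legal RAG benchmarks above are designed for.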

Sources

Explainability by design: an experimental analysis of the legal coding process

NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks

Identifying Legal Holdings with LLMs: A Systematic Study of Performance, Scale, and Memorization

Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use

Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models

A Reasoning-Focused Legal Retrieval Benchmark

Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model

QBR: A Question-Bank-Based Approach to Fine-Grained Legal Knowledge Retrieval for the General Public
