The field of natural language processing is moving toward improving how well large language models handle complex logical queries and discourse understanding. Researchers are developing new benchmarks and evaluation metrics for tasks such as logical reasoning, discourse parsing, and temporal relation extraction, alongside new datasets and tasks such as latent reasoning chain extraction and discourse understanding. Notably, 'Do LLMs Really Struggle at NL-FOL Translation?' and 'ComLQ: Benchmarking Complex Logical Queries in Information Retrieval' contribute novel evaluation protocols and benchmarks, while 'ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction' and 'BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models' introduce new tasks for probing the discourse-level knowledge of modern language models.
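To make the NL-FOL (natural language to first-order logic) translation task concrete, a typical instance maps an English sentence to a first-order formula; the example below is our own illustration, not drawn from the benchmark:

```latex
% NL: "Every student passed some exam."
% One standard FOL translation (predicate names are illustrative):
\forall x \,\big( \mathrm{Student}(x) \rightarrow
  \exists y \,( \mathrm{Exam}(y) \land \mathrm{Passed}(x, y) ) \big)
```

Evaluating such translations is subtle because logically equivalent formulas can differ syntactically, which is part of why dedicated evaluation protocols are needed.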