The field of natural language processing is moving toward more rigorous evaluation of Large Language Model (LLM) inference systems, with a focus on recognizing and avoiding common anti-patterns in evaluation methodology; this rigor is essential for meaningful comparisons and reproducible results. A parallel line of work targets the detection of AI-generated text, where there is growing recognition of the need for diffusion-aware detectors and more robust stylometric signatures. In addition, comprehensive benchmarks for text anomaly detection are enabling more systematic evaluation of existing methods and the development of new approaches. Noteworthy papers include "On Evaluating Performance of LLM Inference Serving Systems", which establishes a rigorous foundation for evaluation methodology; "Can You Detect the Difference?", which highlights the need for diffusion-aware detectors; and "Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding", which introduces a comprehensive benchmark for text anomaly detection.
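To make the embedding-based anomaly-detection setting concrete, the sketch below encodes a handful of documents with a pretrained sentence-embedding model and scores outliers in the embedding space. It is a minimal illustration of the general approach rather than the Text-ADBench protocol: the model name, the IsolationForest detector, and the toy documents are all assumptions chosen for brevity.

```python
# Minimal sketch of embedding-based text anomaly detection: encode documents
# with a pretrained embedding model, then score outliers in embedding space.
# The model, detector, and toy data are illustrative assumptions, not the
# setup used by Text-ADBench.
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import IsolationForest

documents = [
    "The quarterly earnings report shows steady revenue growth.",
    "Revenue increased modestly compared to the previous quarter.",
    "Operating costs were reduced through supply-chain optimization.",
    "Purple elephants recite binary poetry at midnight.",  # intended outlier
]

# Encode each document into a dense vector (384 dimensions for this model).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = encoder.encode(documents)

# Fit an unsupervised outlier detector on the embeddings and score each text;
# lower scores indicate more anomalous documents.
detector = IsolationForest(contamination="auto", random_state=0)
detector.fit(embeddings)
scores = detector.score_samples(embeddings)

# Print documents from most to least anomalous.
for text, score in sorted(zip(documents, scores), key=lambda pair: pair[1]):
    print(f"{score:+.3f}  {text}")
```

Any off-the-shelf embedding model and outlier detector could be swapped in; the point is only that anomaly detection over LLM embeddings reduces to standard outlier scoring in a vector space, which is what makes systematic benchmarking across methods feasible.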