All deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”).
New: This year, the workshop focuses on developing model evaluation and human evaluation strategies for multitask, multilingual, and multimodal scenarios, with special consideration for low-resource and highly distant languages. Other key topics include designing evaluation metrics, creating adequate evaluation data, and reporting results correctly.
Fair evaluations and comparisons are of fundamental importance to the NLP community for properly tracking progress, especially during the current deep learning revolution, with new state-of-the-art results reported at ever-shorter intervals. This concerns the creation of benchmark datasets that cover typical use cases and the blind spots of existing systems, the design of metrics for evaluating NLP systems along different dimensions, and the reporting of evaluation results in an unbiased manner.
While some workshops (e.g., the Metrics shared tasks at WMT, NeuralGen, HumEval, EvalNLGEval, GEM, and New Frontiers in Summarization) have tackled certain aspects of NLP evaluation, recent advances have produced general-purpose models that handle multiple tasks (e.g., language understanding, summarization, dialogue, question answering, and reasoning) across multiple languages and modalities. This progress has introduced new challenges, such as the need for robust evaluation methods, diverse evaluation datasets, and reliable result reporting, and there is a growing demand for evaluation strategies that address multitask, multilingual, and multimodal scenarios.

The first workshop in the series, Eval4NLP’20 (collocated with EMNLP’20), took a broad and unifying perspective on the subject. The second (Eval4NLP’21, collocated with EMNLP’21), third (Eval4NLP’22, collocated with AACL’22), and fourth (Eval4NLP’23, collocated with AACL’23) workshops extended this perspective. The fifth Eval4NLP workshop aims to promote model evaluation and human evaluation strategies for these recent, more complex settings.
Further topics of interest of the workshop include, but are not limited to, those described in the call for papers, which also lists reference papers.
Email: eval4nlp@gmail.com