All deadlines are 11:59 pm UTC-12h ("Anywhere on Earth").
New: This year's edition of the Eval4NLP workshop puts a focus on evaluation of and through large language models (LLMs). Notably, the workshop will feature a shared task on LLM evaluation and specifically encourages the submission of papers focused on LLM evaluation. Other submissions that fit the general scope of Eval4NLP are of course also welcome. See below for more details.
The current year has brought astonishing achievements in NLP. Generative large language models (LLMs) like ChatGPT and GPT-4 demonstrate wide-ranging capabilities in understanding and performing tasks from in-context descriptions without fine-tuning, bringing worldwide attention to the risks and opportunities that arise from current and ongoing research. Further, the release of open-source models like LLaMA and Falcon LLM, better quantization techniques for inference and training, and the adoption of efficient fine-tuning techniques such as LoRA accelerate research progress by improving hardware and runtime efficiency. Given the ever-growing pace of research, fair evaluations and comparisons are of fundamental importance to the NLP community in order to properly track progress. This concerns the creation of benchmark datasets that cover typical use cases and blind spots of existing systems, the design of metrics for evaluating the performance of NLP systems along different dimensions, and the reporting of evaluation results in an unbiased manner.
Although certain aspects of NLP evaluation and comparison have been addressed in previous workshops (e.g., the Metrics Tasks at WMT, NeuralGen, NLG-Evaluation, and New Frontiers in Summarization), we believe that new insights and methodology, particularly in the last 2-3 years, have led to renewed interest in the workshop topic. The first workshop in the series, Eval4NLP'20 (co-located with EMNLP'20), was the first to take a broad and unifying perspective on the subject matter. The second (Eval4NLP'21, co-located with EMNLP'21) and third (Eval4NLP'22, co-located with AACL'22) workshops extended this perspective. We believe the fourth workshop will continue this tradition and become an established platform for presenting and discussing the latest advances in NLP evaluation methods and resources. As indicated above, this year we especially encourage the submission of works that consider the evaluation of LLMs and their generated content, as well as works that leverage LLMs in their evaluation strategies.
Further topics of interest for the workshop include (but are not limited to):