WRF: Weighted Rouge-F1 Metric for Entity Recognition
    Lukas Jonathan Weber, Krishnan Jothi Ramalingam, Matthias Beyer and Axel Zimmermann

Assessing Distractors in Multiple-Choice Tests
    Vatsal Raina, Adian Liusie and Mark Gales

Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages
    Yixuan Wang, Qingyan Chen and Duygu Ataman

EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media
    Zahra Kolagar, Sebastian Steindl and Alessandra Zarcone

Zero-shot Probing of Pretrained Language Models for Geography Knowledge
    Nitin Ramrakhiyani, Vasudeva Varma, Girish Keshav Palshikar and Sachin Pawar

Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End
    Yanran Chen and Steffen Eger

Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models’ Interaction with Interaction Log Information
    Jeremy E Block, Yu-Peng Chen, Abhilash Budharapu, Lisa Anthony and Bonnie J Dorr

Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content
    Savita Bhat and Vasudeva Varma

Can a Prediction’s Rank Offer a More Accurate Quantification of Bias? A Case Study Measuring Sexism in Debiased Language Models
    Jad Doughman, Shady Shehata, Leen Al Qadi, Youssef Nafea and Fakhri Karray

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
    Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror and Steffen Eger

HIT-MI&T Lab’s Submission to Eval4NLP 2023 Shared Task
    Rui Zhang, Fuhai Song, Hui Huang, Jinghao Yuan, Muyun Yang and Tiejun Zhao

Understanding Large Language Model Based Metrics for Text Summarization
    Abhishek Pradhan and Ketan Kumar Todi

LTRC_IIITH’s 2023 Submission for Prompting Large Language Models as Explainable Metrics Task
    Pavan Baswani, Ananya Mukherjee and Manish Shrivastava

Which is better? Exploring Prompting Strategy For LLM-based Metrics
    JoongHoon Kim, Sangmin Lee, Seung Hun Han, Saeran Park, Jiyoon Lee, Kiyoon Jeong and Pilsung Kang

Characterised LLMs Affect its Evaluation of Summary and Translation
    Yuan Lu and Yu-Ting Lin

Reference-Free Summarization Evaluation with Large Language Models
    Abbas Akkasi, Kathleen C. Fraser and Majid Komeili

Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task
    Neema Kotonya, Saran Krishnasamy, Joel R. Tetreault and Alejandro Jaimes

Exploring Prompting Large Language Models as Explainable Metrics
    Ghazaleh Mahmoudi

Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation
    Daniil Larionov, Vasiliy Viskov, George Kokush, Alexander Panchenko and Steffen Eger