WRF: Weighted Rouge-F1 Metric for Entity Recognition
    Lukas Jonathan Weber, Krishnan Jothi Ramalingam, Matthias Beyer and Axel Zimmermann

Assessing Distractors in Multiple-Choice Tests
    Vatsal Raina, Adian Liusie and Mark Gales

Delving into Evaluation Metrics for Generation: A Thorough Assessment of How Metrics Generalize to Rephrasing Across Languages
    Yixuan Wang, Qingyan Chen and Duygu Ataman

EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media
    Zahra Kolagar, Sebastian Steindl and Alessandra Zarcone

Zero-shot Probing of Pretrained Language Models for Geography Knowledge
    Nitin Ramrakhiyani, Vasudeva Varma, Girish Keshav Palshikar and Sachin Pawar

Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End
    Yanran Chen and Steffen Eger

Summary Cycles: Exploring the Impact of Prompt Engineering on Large Language Models’ Interaction with Interaction Log Information
    Jeremy E Block, Yu-Peng Chen, Abhilash Budharapu, Lisa Anthony and Bonnie J Dorr

Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content
    Savita Bhat and Vasudeva Varma

Can a Prediction’s Rank Offer a More Accurate Quantification of Bias? A Case Study Measuring Sexism in Debiased Language Models
    Jad Doughman, Shady Shehata, Leen Al Qadi, Youssef Nafea and Fakhri Karray

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
    Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror and Steffen Eger

HIT-MI&T Lab’s Submission to Eval4NLP 2023 Shared Task
    Rui Zhang, Fuhai Song, Hui Huang, Jinghao Yuan, Muyun Yang and Tiejun Zhao

Understanding Large Language Model Based Metrics for Text Summarization
    Abhishek Pradhan and Ketan Kumar Todi

LTRC_IIITH’s 2023 Submission for Prompting Large Language Models as Explainable Metrics Task
    Pavan Baswani, Ananya Mukherjee and Manish Shrivastava

Which is better? Exploring Prompting Strategy For LLM-based Metrics
    JoongHoon Kim, Sangmin Lee, Seung Hun Han, Saeran Park, Jiyoon Lee, Kiyoon Jeong and Pilsung Kang

Characterised LLMs Affect its Evaluation of Summary and Translation
    Yuan Lu and Yu-Ting Lin

Reference-Free Summarization Evaluation with Large Language Models
    Abbas Akkasi, Kathleen C. Fraser and Majid Komeili

Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task
    Neema Kotonya, Saran Krishnasamy, Joel R. Tetreault and Alejandro Jaimes

Exploring Prompting Large Language Models as Explainable Metrics
    Ghazaleh Mahmoudi

Team NLLG submission for Eval4NLP 2023 Shared Task: Retrieval-Augmented In-Context Learning for NLG Evaluation
    Daniil Larionov, Vasiliy Viskov, George Kokush, Alexander Panchenko and Steffen Eger