|Apr 22, 2021||The Call for Papers is out!|
|Apr 17, 2021||The Artificial Intelligence Journal (AIJ) and Salesforce are our generous sponsors this year.|
|Nov 19, 2020||The workshop website is launched!|
Fair evaluations and comparisons are of fundamental importance to the NLP community to properly track progress, especially within the current deep learning revolution, with new state-of-the-art results reported at ever shorter intervals. This concerns the creation of benchmark datasets that cover typical use cases and blind spots of existing systems, the design of metrics for evaluating NLP systems along different dimensions, and the unbiased reporting of evaluation results.
Although certain aspects of NLP evaluation and comparison have been addressed in previous workshops (e.g., the Metrics Tasks at WMT, NeuralGen, NLG-Evaluation, and New Frontiers in Summarization), we believe that new insights and methodology, particularly from the last one to two years, have led to much renewed interest in the workshop topic. The first edition in the series, Eval4NLP'20 (co-located with EMNLP'20), was the first workshop to take a broad and unifying perspective on the subject matter. We believe the second edition will continue this tradition and become an established platform for presenting and discussing the latest advances in NLP evaluation methods and resources.
Particular topics of interest of the workshop include (but are not limited to):
See the reference papers here.
HumEval invites submissions on all aspects of human evaluation of NLP systems.