Statistically Significant Detection of Semantic Shifts using Contextual Word Embeddings
Yang Liu, Alan Medlar and Dorota Glowacka
Developing a Benchmark for Reducing Data Bias in Authorship Attribution
Benjamin Murauer and Günther Specht
ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings
Oleg Vasilyev and John Bohannon
Testing Cross-Database Semantic Parsers With Canonical Utterances
Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev and Xi Victoria Lin
Writing Style Author Embedding Evaluation
Enzo Terreau, Antoine Gourru and Julien Velcin
HinGE: A Dataset for Generation and Evaluation of Code-Mixed Hinglish Text
Vivek Srivastava and Mayank Singh
Error-Sensitive Evaluation for Ordinal Target Variables
David Chen, Maury Courtland, Adam Faulkner and Aysu Ezen-Can
MIPE: A Metric Independent Pipeline for Effective Code-Mixed NLG Evaluation
Ayush Garg, Sammed Kagi, Vivek Srivastava and Mayank Singh
SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation
Chester Palen-Michel, Nolan Holley and Constantine Lignos
How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task
Urja Khurana, Eric Nalisnick and Antske Fokkens
StoryDB: Broad Multi-language Narrative Dataset
Alexey Tikhonov, Igor Samenko and Ivan Yamshchikov
What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP
Oskar Wysocki, Malina Florea, Dónal Landers and André Freitas
Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing
Lucie Gianola, Hicham El Boukkouri, Cyril Grouin, Thomas Lavergne, Patrick Paroubek and Pierre Zweigenbaum
Trainable Ranking Models to Evaluate the Semantic Accuracy of Data-to-Text Neural Generator
Nicolas Garneau and Luc Lamontagne
Referenceless Parsing-Based Evaluation of AMR-to-English Generation
Emma Manning and Nathan Schneider
Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations
Yo Ehara
Validating Label Consistency in NER Data Annotation
Qingkai Zeng, Mengxia Yu, Wenhao Yu, Tianwen Jiang and Meng Jiang
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results
Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao, Steffen Eger and Yang Gao
IST-Unbabel 2021 Submission for the Explainable Quality Estimation Shared Task
Marcos Treviso, Nuno M. Guerreiro, Ricardo Rei and André F. T. Martins
The UMD Submission to the Explainable MT Quality Estimation Shared Task: Combining Explanation Models with Sequence Labeling
Tasnim Kabir and Marine Carpuat
Reference-Free Word- and Sentence-Level Translation Evaluation with Token-Matching Metrics
Christoph Wolfgang Leiter
Explaining Errors in Machine Translation with Absolute Gradient Ensembles
Melda Eksi, Erik Gelbing, Jonathan Stieber and Chi Viet Vu
Explainable Quality Estimation: CUNI Eval4NLP Submission
Peter Polák, Muskaan Singh and Ondřej Bojar
Error Identification for Machine Translation with Metric Embedding and Attention
Raphael Rubino, Atsushi Fujita and Benjamin Marie