The 2nd Workshop on Evaluation and Comparison for NLP systems (Eval4NLP), co-located with the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), invites the submission of long and short papers, of a theoretical or experimental nature, describing recent advances in system evaluation and comparison in NLP.
Important Dates (Tentative)
All deadlines are 11:59 pm UTC-12 (“anywhere on Earth”).
- Submission deadline: July 24, 2021
- Retraction of papers accepted for EMNLP (main conference): August 28, 2021
- Notification of acceptance: September 3, 2021
- Camera-ready papers due: September 24, 2021
- Workshop day: November 10 or 11, 2021
Designing evaluation metrics
Proposing and/or analyzing:
- Metrics with desirable properties, e.g., high correlation with human judgments, strong ability to distinguish high-quality outputs from mediocre and low-quality ones, robustness across input and output sequence lengths, efficiency, etc.;
- Reference-free evaluation metrics, which only require source text(s) and system predictions;
- Cross-domain metrics, which can reliably and robustly measure the quality of system outputs from heterogeneous modalities (e.g., image and speech), different genres (e.g., newspapers, Wikipedia articles and scientific papers) and different languages;
- Cost-effective methods for eliciting high-quality manual annotations; and
- Methods and metrics for evaluating interpretability and explanations of NLP models
Creating adequate evaluation data
Proposing new datasets or analyzing existing ones by studying their:
- Coverage and diversity, e.g., size of the corpus, covered phenomena, representativeness of samples, distribution of sample types, variability among data sources, eras, and genres; and
- Quality of annotations, e.g., consistency of annotations, inter-rater agreement, and bias check
Reporting correct results
Ensuring and reporting:
- Statistics for the trustworthiness of results, e.g., via appropriate significance tests, and reporting of score distributions rather than single-point estimates, to avoid chance findings;
- Reproducibility of experiments, e.g., quantifying the reproducibility of papers and issuing reproducibility guidelines; and
- Comprehensive and unbiased error analyses and case studies, avoiding cherry-picking and sampling bias.
The workshop welcomes two types of submissions: long and short papers. Long papers may consist of up to 8 pages of content plus unlimited pages of references; short papers may consist of up to 4 pages of content plus unlimited pages of references. Please follow the EMNLP 2021 formatting requirements, using the official templates provided by the main conference. Final versions of both submission types will be given one additional page of content to address reviewers’ comments. Accepted papers will appear in the workshop proceedings.
The review process is double-blind. Therefore, no author information should be included in the papers, and self-references that reveal the authors’ identities must be avoided. Papers that do not conform to these requirements will be rejected without review.
The submission site will be available soon.
Multiple Submission Policy
Eval4NLP allows authors to submit a paper that is under review at another venue (journal, conference, or workshop), or to submit it elsewhere during the Eval4NLP review period. However, if the paper is accepted and the authors wish to publish it at Eval4NLP, they must withdraw it from all other venues. (Note that authors of papers submitted to and accepted by the main EMNLP conference must retract them from the workshop by August 28.)
Best Paper Awards
Thanks to our generous sponsors, we will award three prizes (at least $100 each) to the three best paper submissions, as nominated by our program committee. Both long and short submissions are eligible.