Call for Papers

The 2nd Workshop on "Evaluation & Comparison of NLP Systems", Co-located with EMNLP 2021

The 2nd Workshop on Evaluation and Comparison of NLP Systems (Eval4NLP), co-located with the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), invites the submission of long and short papers, of a theoretical or experimental nature, describing recent advances in system evaluation and comparison in NLP.

Latest News

Jul 25, 2021: The submission deadline has been extended to July 31, 2021.
Jul 20, 2021: The Multiple Submission Policy and Presenting Published Papers sections have been updated.
May 22, 2021: The submission system is now open! More details about preprints and supplementary materials have been added to this call.
May 14, 2021: We also welcome submissions from ACL Rolling Review.
Apr 22, 2021: The Call for Papers is out!

Important Dates

All deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”).

  • ARR submission deadline: July 15, 2021 (See more details)
  • Workshop submission deadline: July 31, 2021 (extended from July 24, 2021)
  • Submission of papers and reviews from ARR to Eval4NLP: August 24, 2021
  • Notification of acceptance: September 3, 2021
  • Camera-ready papers due: September 24, 2021
  • Workshop day: November 10 or 11, 2021

Topics


  1. Designing evaluation metrics
    Proposing and/or analyzing:
    • Metrics with desirable properties, e.g., high correlation with human judgments, a strong ability to distinguish high-quality outputs from mediocre and low-quality ones, robustness across lengths of input and output sequences, efficiency at run time, etc.;
    • Reference-free evaluation metrics, which only require source text(s) and system predictions;
    • Cross-domain metrics, which can reliably and robustly measure the quality of system outputs from heterogeneous modalities (e.g., image and speech), different genres (e.g., newspapers, Wikipedia articles and scientific papers) and different languages;
    • Cost-effective methods for eliciting high-quality manual annotations; and
    • Methods and metrics for evaluating interpretability and explanations of NLP models
  2. Creating adequate evaluation data
    Proposing new datasets or analyzing existing ones by studying their:
    • Coverage and diversity, e.g., size of the corpus, covered phenomena, representativeness of samples, distribution of sample types, variability among data sources, eras, and genres; and
    • Quality of annotations, e.g., consistency of annotations, inter-rater agreement, and bias checks
  3. Reporting correct results
    Ensuring and reporting:
    • Statistics for the trustworthiness of results, e.g., via appropriate significance tests, and reporting of score distributions rather than single-point estimates, to avoid chance findings;
    • Reproducibility of experiments, e.g., quantifying the reproducibility of papers and issuing reproducibility guidelines; and
    • Comprehensive and unbiased error analyses and case studies, avoiding cherry-picking and sampling bias.
See reference papers here.

Submission Guidelines

The workshop welcomes two types of submission -- long and short papers. Long papers may consist of up to 8 pages of content, plus unlimited pages of references. Short papers may consist of up to 4 pages of content, plus unlimited pages of references. Please follow the EMNLP 2021 formatting requirements, using the official templates provided by the main conference. Final versions of both submission types will be given one additional page of content for addressing reviewers’ comments. The accepted papers will appear in the workshop proceedings.

The review process is double-blind. Therefore, no author information should be included in the papers. Self-references that reveal the author's identity must be avoided. Papers that do not conform to these requirements will be rejected without review.

The submission site (softconf) is now available at

Optional Supplementary Materials

Authors are allowed to submit (optional) supplementary materials (e.g., appendices, software, and data) to improve the reproducibility of results and/or to provide additional information that does not fit in the paper. All of the supplementary materials must be zipped into one single file (.tgz or .zip) and submitted via softconf together with the paper. However, because supplementary materials are completely optional, reviewers may or may not review or even download them. So, the submitted paper should be fully self-contained.

Preprints


Papers uploaded to preprint servers (e.g., ArXiv) can be submitted to the workshop. There is no deadline concerning when the papers were made publicly available. However, the version submitted to Eval4NLP must be anonymized, and we ask the authors not to update the preprints or advertise them on social media while they are under review at Eval4NLP.

ACL Rolling Review

This year, our workshop also welcomes submissions from ACL Rolling Review (ARR). Authors of papers that are submitted to ARR before July 15, 2021, 11:59 pm (Anywhere on Earth) and have their reviews ready before August 24, 2021, may submit their papers and reviews to our workshop by August 24, 2021. Details about the logistics will be announced soon.

Multiple Submission Policy

Eval4NLP allows authors to submit a paper that is under review at another venue (journal, conference, or workshop) or that will be submitted elsewhere during the Eval4NLP review period. However, if the paper is accepted and the authors want to publish it at Eval4NLP, they must withdraw it from all other venues.

Note that the main EMNLP conference this year does not allow double submission to EMNLP workshops. Papers submitted both to the main conference and to an EMNLP workshop (including ours) therefore violate the multiple submission policy of the main conference. Authors who would like to submit a paper under review at EMNLP to the Eval4NLP workshop need to withdraw it from EMNLP and submit it to our workshop before the workshop submission deadline of July 31, 2021 (extended from July 24, 2021).

Best Paper Awards

Thanks to our generous sponsors, we will award three prizes (at least $100 per award) to the three best paper submissions, as nominated by our program committee. Both long and short submissions will be eligible for prizes.

Presenting Published Papers

If you want to present a paper which has been published recently elsewhere (such as other top-tier AI conferences) at our workshop, you may send the details of your paper (Paper title, authors, publication venue, abstract, and a link to download the paper) directly to We will select a few high-quality and relevant papers to present at Eval4NLP. This allows such papers to gain more visibility from the workshop audience and increases the variety of the workshop program.

The submission deadline for this track is August 24, 2021, and notifications of acceptance will be sent on September 3, 2021. The submitted papers will be judged separately from the main track (submitted via softconf). Note that the chosen papers are considered non-archival and will not be included in the workshop proceedings.

Contact Information