VizSeq: A Visual Analysis Toolkit for Text Generation Tasks

Changhan Wang, Anirudh Jain, Danlu Chen and Jiatao Gu

Conference on Empirical Methods in Natural Language Processing (EMNLP) Demo Track, 2019

Automatic evaluation for text generation tasks, such as machine translation, text summarization, image captioning and video description, usually relies on specially designed metrics, for instance, BLEU (Papineni et al., 2002) or ROUGE (Lin, 2004). They, however, are abstract and not perfectly aligned with human assessment, which requires inspecting examples as a complement to identify detailed error patterns. In this paper, we present VizSeq, a visual analysis toolkit for analyzing instance-level errors and corpus-level statistics on a wide variety of text generation tasks. It supports multimodal sources and multiple text references, providing visualization via both web page and Jupyter notebook interfaces. It can be used locally or deployed onto public servers for centralized data hosting and benchmarking. It covers most common N-gram based metrics accelerated with multiprocessing, and also provides the latest embedding-based metrics such as BERTScore (Zhang et al., 2019).
