Code and Data for Multi-Document Summarization with Determinantal Point Process Attention.
We use the WikiCatSum dataset available here. In particular, for our controlled experiments we use an Oracle (Section 4 in the paper) to rank the input and then truncate it to 500 input tokens.
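As a rough illustration of this preprocessing step (not the exact script used for the paper), the sketch below assumes the Oracle has already ranked the input paragraphs by relevance and simply concatenates them up to the 500-token budget; the helper name is illustrative only.

```python
def truncate_ranked_input(ranked_paragraphs, max_tokens=500):
    """Concatenate paragraphs in oracle rank order and cut off after max_tokens tokens.

    Illustrative sketch only: assumes whitespace tokenisation and that
    ranked_paragraphs is already sorted from most to least relevant.
    """
    tokens = []
    for paragraph in ranked_paragraphs:
        for token in paragraph.split():
            if len(tokens) >= max_tokens:
                return " ".join(tokens)
            tokens.append(token)
    return " ".join(tokens)
```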
We use the MultiNews data as preprocessed by Fabbri et al. (2019) (here).
Our code extends implementations in OpenNMT (Pointer-Generator and Transformer) here and Fairseq (ConvSeq2Seq) here to use DPP attention.
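For orientation, the snippet below sketches the generic quality-diversity DPP construction (Kulesza and Taskar, 2012) that DPP attention builds on: attention scores play the role of item qualities, a similarity kernel over encoder states captures redundancy, and the diagonal of the marginal kernel gives diversity-aware weights. This is a standalone NumPy illustration under those assumptions, not the actual integration into the OpenNMT and Fairseq attention layers in our code.

```python
import numpy as np

def dpp_attention_weights(quality, feats):
    """quality: (n,) nonnegative scores for the n input items (e.g., attention scores).
    feats: (n, d) unit-normalised feature vectors for the same items."""
    S = feats @ feats.T                                   # similarity kernel S_ij
    L = quality[:, None] * S * quality[None, :]           # L-ensemble: L_ij = q_i S_ij q_j
    n = L.shape[0]
    K = L @ np.linalg.inv(L + np.eye(n))                  # marginal kernel K = L (L + I)^{-1}
    probs = np.clip(np.diag(K), 0.0, 1.0)                 # marginal inclusion probabilities
    return probs / probs.sum()                            # renormalise to use as attention weights
```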
We use the wrapper script test_rouge.py as used in MultiNews.
We installed BERTScore (version 0.3.9) with pip install bert-score. Our script to run BERTScore is run_bertscore.sh (the formatting of Fairseq outputs needed beforehand is done by running the Python script format-fairseqout-to-bertscore).
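For reference, bert-score can also be called directly from Python; the snippet below is a minimal usage example with placeholder file names (cands.txt, refs.txt), not a substitute for run_bertscore.sh.

```python
from bert_score import score  # pip install bert-score (we used 0.3.9)

# Placeholder file names: one candidate summary / reference per line, aligned by line.
with open("cands.txt") as f:
    cands = [line.strip() for line in f]
with open("refs.txt") as f:
    refs = [line.strip() for line in f]

# Returns per-example precision, recall and F1 tensors.
P, R, F1 = score(cands, refs, lang="en", verbose=True)
print(f"System-level BERTScore F1: {F1.mean().item():.4f}")
```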
For the sentence mover's similarity metrics we follow the code in the SMS github. Our script to run this metric is run_sms.sh.
We adapt the model proposed in Neural Text Summarization: A Critical Evaluation for multi-document evaluation. Installation instructions and the trained model can be found in the FactCC github. You will need to run format-to-factCC-eval.py to format model outputs as FactCC expects, factcc-eval.sh (with updated directory references from factCC/modeling/scripts/) to run the model evaluation, and factCC-summarise-predictions.py to summarise the results. Note that we provide our modified version of FactCC's run.py.
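Roughly, the formatting step pairs each generated summary sentence with its source input as one JSONL record. The sketch below shows the general idea only; the field names (id, text, claim, label) and the naive sentence split are assumptions, so check the FactCC repo and our format-to-factCC-eval.py for the exact format.

```python
import json

def write_factcc_input(sources, summaries, out_path="data-dev.jsonl"):
    """Illustrative sketch: one record per summary sentence, paired with its source."""
    with open(out_path, "w") as out:
        idx = 0
        for source, summary in zip(sources, summaries):
            for sentence in summary.split(" . "):  # naive sentence split, for illustration only
                # "label" is a dummy value here; field names are assumed, not verified.
                record = {"id": idx, "text": source, "claim": sentence, "label": "CORRECT"}
                out.write(json.dumps(record) + "\n")
                idx += 1
```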
We implement the Fact_acc metric from Assessing the factual accuracy of generated text and use the relation extraction system proposed by Sorokin and Gurevych (2017), available at the Relation Extraction github. For installation, follow the instructions provided there. Our script to run this metric is run_relext.sh.
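Once relation triples have been extracted from the model outputs and from the supporting text (reference or input, depending on the setup), Fact_acc can be computed as the proportion of summary triples that are also found among the supporting triples, in the spirit of Goodrich et al. (2019). The helper below is a schematic illustration of that final step, not our run_relext.sh pipeline.

```python
def fact_acc(summary_triples, support_triples):
    """Fraction of (subject, relation, object) triples extracted from the summary
    that also appear among the triples extracted from the supporting text."""
    if not summary_triples:
        return 0.0
    support_set = set(support_triples)
    return sum(t in support_set for t in summary_triples) / len(summary_triples)

# Toy example:
# fact_acc([("paris", "capital_of", "france")], [("paris", "capital_of", "france")]) -> 1.0
```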