# msu-2016

## Software dependencies

  1. python-numpy
  2. cython

This code has been developed using a virtualenv with numpy installed. To set it up:

  1. Create a Python virtualenv
  2. `source [your-virtual-env]/bin/activate`
  3. `pip install numpy`
  4. `pip install cython`
  5. `python setup.py build_ext --inplace` (a quick import check is shown below)
  6. Run the evaluation commands (see Running the Evaluation below)
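A quick way to confirm that step 5 succeeded is to import the compiled module from the msu-2016 directory (the module name cython_computations comes from setup.py):

```python
# Smoke test: succeeds only if the Cython extension built correctly.
import cython_computations
print(cython_computations.__file__)  # path of the compiled .so/.pyd
```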

Note: Cython is needed for the more complex user and interface models. Complex interface models require looping over all submitted updates, and looping over lists in pure Python is very slow.
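For intuition, the kind of inner loop that cython_computations.pyx accelerates looks roughly like the pure-Python sketch below. The names here are illustrative stand-ins, not the actual API of the .pyx module:

```python
from collections import namedtuple

# Illustrative stand-ins for the real Update and session objects.
Update = namedtuple("Update", ["timestamp", "gain"])
Session = namedtuple("Session", ["start", "end"])

def total_gain_pure_python(sessions, updates):
    """Sum the gain of every update read in every simulated session.
    In pure Python this nested loop dominates runtime; compiling it
    with Cython removes the interpreter overhead."""
    gain = 0.0
    for s in sessions:                       # one per simulated user session
        for u in updates:                    # every submitted update
            if s.start <= u.timestamp <= s.end:
                gain += u.gain
    return gain
```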

## Data Requirements

  1. Temporal Summarization 2013 qrels (present in data/ts-2013/qrels)
  2. Temporal Summarization 2013 submitted runs (download from TREC into data/ts-2013/submitted-runs).
  3. Lengths of all sentences submitted to TS 2013 (download from here into data/ts-2013/update-lengths).
  4. Temporal Summarization 2014 submitted runs (download from TREC into data/ts-2014/submitted-runs).
  5. Lengths of all sentences submitted to TS 2014 (download from here into data/ts-2014/update-lengths).

## Code Layout

```
msu-2016 : main codebase
├── Readme.md : this file
├── modeled_stream_utility.py : main script
├── nugget.py : nugget class for the Temporal Summarization tracks
├── update.py : update class for sentences submitted to the Temporal Summarization tracks
├── get_query_durations.py : extracts start and end timestamps for query durations from the tracks' topics.xml file
├── probability_distributions.py : base classes for probability distributions
├── population_model.py : user population model
├── user_model.py : user behavior model
├── user_interface_model.py : user interface models
├── utils.py : utility functions
```

Cythonic files for complex user interface models:

```
├── cython_computations.pyx : defines a custom heap class and computes MSU for ranked interfaces
├── setup.py : builds the cython_computations library for importing into Python code
```

Files for comparing the msu-2016 and sigir-2015 codebases (see Codebase comparison below):

```
├── modeled_stream_utility_with_time_trails.py : script to evaluate using time-trails made by R (see sigir-2015/Readme.md)
└── gen-pythonic-time-trails.py : script to compute MSU given user time-trails generated by sigir-2015/generate.time.trails.R
```

## Running the Evaluation

### TS 2013

```
cd msu-2016;

python modeled_stream_utility.py ts13 ../data/ts-2013/qrels/matches.tsv ../data/ts-2013/qrels/nuggets.tsv ../data/ts-2013/qrels/pooled_updates.tsv ../data/ts-2013/qrels/topics_masked.xml ../data/ts-2013/update-lengths/ 1000 120 60 10800 5400 0.5 ../data/ts-2013/submitted-runs/input.* > msu2016-code.ts2013.results.all

cat msu2016-code.ts2013.results.all | grep AVG | gawk '{print $1, $3}' | sed 's_^input.__g' > msu2016-code.ts2013.results.avg
```

The msu2016-code.ts2013.results.avg file will contain the average MSU score for each TS 2013 run. MSU is computed for each system by simulating 1000 "reasonable" users drawn from a population with reading sessions of 2 ± 1 minutes, time away from the system of 3 ± 1.5 hours, and a lateness decay parameter of 0.5.
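For reference, these population parameters correspond to the positional numeric arguments in the command above, with times given in seconds. The variable names below are descriptive stand-ins, not the script's actual option names:

```python
num_users      = 1000   # number of simulated users per run
session_mean   = 120    # mean reading-session length: 2 minutes
session_stddev = 60     # session-length stddev: 1 minute
away_mean      = 10800  # mean time spent away from the system: 3 hours
away_stddev    = 5400   # time-away stddev: 1.5 hours
lateness_decay = 0.5    # decay applied to information reported late
```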

### TS 2014

```
cd msu-2016;

python modeled_stream_utility.py ts14 ../data/ts-2014/qrels/matches.tsv ../data/ts-2014/qrels/nuggets.tsv ../data/ts-2014/qrels/updates_sampled.tsv ../data/ts-2014/qrels/trec2014-ts-topics-test.xml ../data/ts-2014/update-lengths/ 1000 120 60 10800 5400 0.5 ../data/ts-2014/submitted-runs/* > msu2016-code.ts2014.results.all
```

## Under development

modeled_stream_utility_ranked_order.py : computes MSU for an interface that presents ranked updates to the user one at a time. Users are assumed to follow the Rank-Biased Precision (RBP) user model, and updates older than one day are removed from further consideration.
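A minimal sketch of this browsing model, assuming a single RBP persistence parameter (0.8 here purely for illustration) and a one-day staleness cutoff; the names below are illustrative and not the actual API of the script:

```python
import random
from collections import namedtuple

# Illustrative stand-in for the real Update class in update.py.
Update = namedtuple("Update", ["uid", "timestamp"])

def rbp_examined_updates(ranked_updates, now, persistence=0.8,
                         max_age=86400.0, rng=None):
    """Return the updates a simulated RBP user examines: at each rank the
    user continues with probability `persistence` and stops otherwise;
    updates older than `max_age` seconds are dropped from consideration."""
    rng = rng or random.Random()
    examined = []
    for u in ranked_updates:                 # best-first ranked order
        if now - u.timestamp > max_age:      # stale: older than one day
            continue
        examined.append(u)
        if rng.random() > persistence:       # user stops with prob. 1 - p
            break
    return examined
```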

## Codebase comparison

Here we compare the results generated by this new all-Pythonic code against the R+Python code developed for the MSU paper.

  1. First, we follow the procedure outlined in sigir-2015/Readme.md to generate user time-trails and results for all runs of the TS 2013 track. This produces the file ../data/ts-2013/simulation-data/0.mean.metrics as output (note that this file also reports multiple MSU-derived metrics).

  2. We then score the runs with the new code, using the R-generated time-trails:

     ``` 
     cd msu-2016;
    
     python modeled_stream_utility_with_time_trails.py ../data/ts-2013/qrels/matches.tsv ../data/ts-2013/qrels/nuggets.tsv ../data/ts-2013/qrels/pooled_updates.tsv ../data/ts-2013/topic_query_durations ../data/ts-2013/update-lengths/ 1000 ../data/ts-2013/simulation-data/0.user.params ../data/ts-2013/simulation-data/0.time-trails/ 0.5 ../data/ts-2013/submitted-runs/input.* | grep AVG | sed 's_^input.__g' | gawk -v OFS="\t" '{print $1, $3}' > new.code.results 
     ```
    
  3. Extract the sigir-2015 code's results:

     ```
     gawk -v OFS="\t" '(NR>1){print $1, $2}' ../data/ts-2013/simulation-data/0.mean.metrics > old.code.results
     ```
    
  4. Comparing the old code's results against the new code's results shows no differences when using the R-generated time-trails. This indicates that the MSU-computation portions of the old and new codebases produce identical results.

     ```
     $ diff old.code.results new.code.results
     $  
     ```
    
  5. HOWEVER, running the new all-Pythonic code end-to-end produces different absolute system scores. We attribute this to differences between the sampling processes of Python/numpy and R: although the underlying probability distributions are parameterized identically, the sampled random deviates differ, causing a slight change in the respective absolute system scores. Note that the system ranking does not change significantly with the new code (Kendall's tau > 0.987 between the msu-2016 and sigir-2015 codebases); the RMSE between the absolute scores produced by the two codebases is 0.027. A sketch for recomputing these agreement statistics from the results files follows.
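The agreement statistics can be recomputed from any two per-run score files of the form produced above (one whitespace-separated run-id and score per line). A minimal sketch, assuming scipy is installed and both files cover the same runs; the file names are examples:

```python
import math
from scipy.stats import kendalltau

def read_scores(path):
    """Read whitespace-separated 'run-id score' lines into a dict."""
    scores = {}
    with open(path) as f:
        for line in f:
            run_id, score = line.split()
            scores[run_id] = float(score)
    return scores

old = read_scores("old.code.results")                  # sigir-2015 scores
new = read_scores("msu2016-code.ts2013.results.avg")   # msu-2016 scores
runs = sorted(old)                                     # same run ids in both

old_s = [old[r] for r in runs]
new_s = [new[r] for r in runs]

tau, _ = kendalltau(old_s, new_s)                      # rank agreement
rmse = math.sqrt(sum((o - n) ** 2 for o, n in zip(old_s, new_s)) / len(runs))
print("Kendall's tau = %.3f, RMSE = %.3f" % (tau, rmse))
```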