# msu-2016

## Software dependencies

  1. python-numpy
  2. cython

This code has been developed using a virtualenv with numpy installed. To set it up:

  1. Create a Python virtualenv
  2. `source [your-virtual-env]/bin/activate`
  3. `pip install numpy`
  4. `pip install cython`
  5. `python setup.py build_ext --inplace` (a quick import check is shown below)
  6. Run the evaluation commands (see Running the Evaluation below)
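A quick way to confirm that step 5 succeeded is to import the compiled module from the msu-2016 directory (the module name cython_computations comes from setup.py):

```python
# Smoke test: succeeds only if the Cython extension built correctly.
import cython_computations
print(cython_computations.__file__)  # path of the compiled .so/.pyd
```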

Note: Cython is needed for the more complex user and interface models. Complex interface models require looping over all submitted updates, and looping over lists in pure Python is very slow.
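For intuition, the kind of inner loop that cython_computations.pyx accelerates looks roughly like the pure-Python sketch below. The names here are illustrative stand-ins, not the actual API of the .pyx module:

```python
from collections import namedtuple

# Illustrative stand-ins for the real Update and session objects.
Update = namedtuple("Update", ["timestamp", "gain"])
Session = namedtuple("Session", ["start", "end"])

def total_gain_pure_python(sessions, updates):
    """Sum the gain of every update read in every simulated session.
    In pure Python this nested loop dominates runtime; compiling it
    with Cython removes the interpreter overhead."""
    gain = 0.0
    for s in sessions:                       # one per simulated user session
        for u in updates:                    # every submitted update
            if s.start <= u.timestamp <= s.end:
                gain += u.gain
    return gain
```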

## Data Requirements

  1. Temporal Summarization 2013 qrels (present in data/ts-2013/qrels)
  2. Temporal Summarization 2013 submitted runs (download from TREC into data/ts-2013/submitted-runs).
  3. Lengths of all sentences submitted to TS 2013 (download from here into data/ts-2013/update-lengths).
  4. Temporal Summarization 2014 submitted runs (download from TREC into data/ts-2014/submitted-runs).
  5. Lengths of all sentences submitted to TS 2014 (download from here into data/ts-2014/update-lengths).

## Code Layout

```
msu-2016 : main codebase
├── Readme.md : this file
├── modeled_stream_utility.py : main script
├── nugget.py : nugget class for the Temporal Summarization tracks
├── update.py : update class for sentences submitted to the Temporal Summarization tracks
├── get_query_durations.py : extracts start and end timestamps for query durations from the tracks' topics.xml file
├── probability_distributions.py : base classes for probability distributions
├── population_model.py : user population model
├── user_model.py : user behavior model
├── user_interface_model.py : user interface models
├── utils.py : utility functions
```

Cythonic files for complex user interface models:

```
├── cython_computations.pyx : defines a custom heap class and computes MSU for ranked interfaces
├── setup.py : builds the cython_computations library for importing into Python code
```

Files for comparing the msu-2016 and sigir-2015 codebases (see Codebase comparison below):

```
├── modeled_stream_utility_with_time_trails.py : script to evaluate using time-trails made by R (see sigir-2015/Readme.md)
└── gen-pythonic-time-trails.py : script to compute MSU given user time-trails generated by sigir-2015/generate.time.trails.R
```

## Running the Evaluation

### TS 2013

```
cd msu-2016;

python modeled_stream_utility.py ts13 ../data/ts-2013/qrels/matches.tsv ../data/ts-2013/qrels/nuggets.tsv ../data/ts-2013/qrels/pooled_updates.tsv ../data/ts-2013/qrels/topics_masked.xml ../data/ts-2013/update-lengths/ 1000 120 60 10800 5400 0.5 ../data/ts-2013/submitted-runs/input.* > msu2016-code.ts2013.results.all

cat msu2016-code.ts2013.results.all | grep AVG | gawk '{print $1, $3}' | sed 's_^input.__g' > msu2016-code.ts2013.results.avg
```

The msu2016-code.ts2013.results.avg file will contain the average MSU score for each TS 2013 run. MSU is computed for each system by simulating 1000 "reasonable" users drawn from a population with reading sessions of 2 ± 1 minutes, time away from the system of 3 ± 1.5 hours, and a lateness decay parameter of 0.5.
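For reference, these population parameters correspond to the positional numeric arguments in the command above, with times given in seconds. The variable names below are descriptive stand-ins, not the script's actual option names:

```python
num_users      = 1000   # number of simulated users per run
session_mean   = 120    # mean reading-session length: 2 minutes
session_stddev = 60     # session-length stddev: 1 minute
away_mean      = 10800  # mean time spent away from the system: 3 hours
away_stddev    = 5400   # time-away stddev: 1.5 hours
lateness_decay = 0.5    # decay applied to information reported late
```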

### TS 2014

```
cd msu-2016;

python modeled_stream_utility.py ts14 ../data/ts-2014/qrels/matches.tsv ../data/ts-2014/qrels/nuggets.tsv ../data/ts-2014/qrels/updates_sampled.tsv ../data/ts-2014/qrels/trec2014-ts-topics-test.xml ../data/ts-2014/update-lengths/ 1000 120 60 10800 5400 0.5 ../data/ts-2014/submitted-runs/* > msu2016-code.ts2014.results.all
```

## Under development

modeled_stream_utility_ranked_order.py : computes MSU for an interface that presents ranked updates to the user one at a time. Users are assumed to follow the Rank-Biased Precision (RBP) user model, and updates older than one day are removed from further consideration.
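A minimal sketch of this browsing model, assuming a single RBP persistence parameter (0.8 here purely for illustration) and a one-day staleness cutoff; the names below are illustrative and not the actual API of the script:

```python
import random
from collections import namedtuple

# Illustrative stand-in for the real Update class in update.py.
Update = namedtuple("Update", ["uid", "timestamp"])

def rbp_examined_updates(ranked_updates, now, persistence=0.8,
                         max_age=86400.0, rng=None):
    """Return the updates a simulated RBP user examines: at each rank the
    user continues with probability `persistence` and stops otherwise;
    updates older than `max_age` seconds are dropped from consideration."""
    rng = rng or random.Random()
    examined = []
    for u in ranked_updates:                 # best-first ranked order
        if now - u.timestamp > max_age:      # stale: older than one day
            continue
        examined.append(u)
        if rng.random() > persistence:       # user stops with prob. 1 - p
            break
    return examined
```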

## Codebase comparison

Here we compare the results generated by this new all-Pythonic code against the R+Python code developed for the MSU paper.

  1. First, we follow the procedure outlined in sigir-2015/Readme.md to generate user time-trails and results for all runs of the TS 2013 track. This produces the file ../data/ts-2013/simulation-data/0.mean.metrics as output (note that this file also reports multiple MSU-derived metrics).

  2. We then score the runs with the new code, using the R-generated time-trails:

     ``` 
     cd msu-2016;
    
     python modeled_stream_utility_with_time_trails.py ../data/ts-2013/qrels/matches.tsv ../data/ts-2013/qrels/nuggets.tsv ../data/ts-2013/qrels/pooled_updates.tsv ../data/ts-2013/topic_query_durations ../data/ts-2013/update-lengths/ 1000 ../data/ts-2013/simulation-data/0.user.params ../data/ts-2013/simulation-data/0.time-trails/ 0.5 ../data/ts-2013/submitted-runs/input.* | grep AVG | sed 's_^input.__g' | gawk -v OFS="\t" '{print $1, $3}' > new.code.results 
     ```
    
  3. Extract the sigir-2015 code's results:

     ```
     gawk -v OFS="\t" '(NR>1){print $1, $2}' ../data/ts-2013/simulation-data/0.mean.metrics > old.code.results
     ```
    
  4. Comparing the old code's results against the new code's results shows no differences when using the R-generated time-trails. This indicates that the MSU-computation portions of the old and new codebases produce identical results.

     ```
     $ diff old.code.results new.code.results
     $  
     ```
    
  5. HOWEVER, running the new all-Pythonic code end-to-end produces different absolute system scores. We attribute this to differences between the sampling processes of Python/numpy and R: although the underlying probability distributions are parameterized identically, the sampled random deviates differ, causing a slight change in the respective absolute system scores. Note that the system ranking does not change significantly with the new code (Kendall's tau > 0.987 between the msu-2016 and sigir-2015 codebases); the RMSE between the absolute scores produced by the two codebases is 0.027. A sketch for recomputing these agreement statistics from the results files follows.
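The agreement statistics can be recomputed from any two per-run score files of the form produced above (one whitespace-separated run-id and score per line). A minimal sketch, assuming scipy is installed and both files cover the same runs; the file names are examples:

```python
import math
from scipy.stats import kendalltau

def read_scores(path):
    """Read whitespace-separated 'run-id score' lines into a dict."""
    scores = {}
    with open(path) as f:
        for line in f:
            run_id, score = line.split()
            scores[run_id] = float(score)
    return scores

old = read_scores("old.code.results")                  # sigir-2015 scores
new = read_scores("msu2016-code.ts2013.results.avg")   # msu-2016 scores
runs = sorted(old)                                     # same run ids in both

old_s = [old[r] for r in runs]
new_s = [new[r] for r in runs]

tau, _ = kendalltau(old_s, new_s)                      # rank agreement
rmse = math.sqrt(sum((o - n) ** 2 for o, n in zip(old_s, new_s)) / len(runs))
print("Kendall's tau = %.3f, RMSE = %.3f" % (tau, rmse))
```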