---
title: "Swiss Tournaments"
subtitle: "CIS700/04: Machine Learning and Econometrics"
author: "Chris Hua"
date: "April 20, 2017"
output:
  beamer_presentation:
    latex_engine: xelatex
header-includes:
  - \usetheme{metropolis}
  - \institute{University of Pennsylvania}
---
```{r setup, include=FALSE}
library(ggplot2)
library(dplyr)
library(viridis)
knitr::opts_chunk$set(echo = FALSE)
```
## Motivation
* Tournament structure and design
* Do they work?
## Swiss Tournaments
* Widely used in chess, policy debate, and Hearthstone
* Random start + power matched rounds
* In debate: preseason tournaments identify the top-$k$ debaters
    - Reaching elimination rounds earns a bid to the postseason tournament
Do Swiss tournaments find the top-$k$ competitors?
## Simulation: Bradley-Terry
* Tournaments are sets of pairwise comparisons
* Assume each team has an underlying strength $\theta$
    - Simulated from a lognormal distribution
* Find the winner via a random draw:
$$\Pr(Y_{i,j} = 1) = \frac{\theta_i}{\theta_i + \theta_j}$$
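
The draw above can be sketched in a few lines of R (an illustrative sketch only; names such as `bt_win_prob` and `simulate_game` are ours, not from the simulation code):

```r
# Illustrative sketch of the Bradley-Terry draw (not the actual simulation code)
set.seed(42)
n_teams <- 32
theta <- rlnorm(n_teams)  # latent strengths, lognormally distributed

# P(i beats j) under the Bradley-Terry model
bt_win_prob <- function(theta_i, theta_j) theta_i / (theta_i + theta_j)

# Simulate one game: TRUE if team i wins
simulate_game <- function(i, j) runif(1) < bt_win_prob(theta[i], theta[j])
```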
## Simulation: Pairings?
* 2 rounds of random pairings
* 4 rounds of power-matched pairings
* Teams cannot be paired with teams they've already faced
* Prefer teams with the same number of wins; otherwise allow a maximum win difference of 1
* Run 500 simulations
## Simulation: Pairings!
Maximum-weight perfect-matching
* Treat pairings as a graph problem
* Teams = nodes ($n$), possible pairings = edges ($m$)
* Complexity of $O(nm \log n) \sim O(n^3)$
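
For intuition, a greedy stand-in that sorts by record and pairs down the list is sketched below; the simulations use a true maximum-weight perfect matching, in part because the greedy version can dead-end when every remaining opponent has already been faced (all names here are hypothetical, not from the simulation code):

```r
# Greedy power-matching sketch (illustrative stand-in for max-weight matching).
# `wins` is a named integer vector; `played` holds keys like "idA|idB".
pair_key <- function(a, b) paste(min(a, b), max(a, b), sep = "|")

greedy_pairings <- function(wins, played = character(0)) {
  ids <- names(sort(wins, decreasing = TRUE))  # best record first
  pairs <- list()
  while (length(ids) >= 2) {
    i <- ids[1]
    # closest-record opponent not yet faced
    opp <- ids[-1][!vapply(ids[-1],
                           function(j) pair_key(i, j) %in% played,
                           logical(1))]
    if (length(opp) == 0) break  # greedy dead end; matching avoids this
    j <- opp[1]
    pairs[[length(pairs) + 1]] <- c(i, j)
    ids <- setdiff(ids, c(i, j))
  }
  pairs
}
```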
## Metrics
* Champion: whether the top team by strength finishes undefeated (the Copeland champion)
* Top-$k$: percent of the top-$k$ teams by strength that meet the selection criteria
* Spearman's $\rho$
* Kendall's $\tau$
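
Both rank-correlation metrics are one-liners in base R; a toy example with hypothetical strengths and standings:

```r
# Toy example: agreement between true strength ranks and final standings
theta <- c(5, 3, 2, 1)       # true strengths (hypothetical)
standings <- c(1, 2, 4, 3)   # observed final ranks

cor(rank(-theta), standings, method = "spearman")  # rho = 0.8
cor(rank(-theta), standings, method = "kendall")   # tau ~= 0.67
```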
## Simulations
500 trials each, recorded mean and standard deviation
| Size | Teams | Rounds | $K$ |
|------------|-------|--------|-------|
| Small | 32 | 5 | 8 |
| Medium | 64 | 6 | 16 |
| Large | 128 | 6 | 32 |
| Very large | 256 | 7 | 64 |
## Real-world data
* Scraped 2009-2010 and 2010-2011 policy debate tournament results
    - 2009-2010: 13,310 debate rounds by 1,424 teams across 67 tournaments
    - Fit actual MLE estimates, though the results are hard to estimate reliably
* Hearthstone @ Dreamhack 2016: 190 players, 9 Swiss rounds to pick the top 8 for playoffs
## Results - synthetic data
```{r}
results <- read.csv("~/code/tournament/data/results.csv", stringsAsFactors = FALSE)

results %>%
  tidyr::gather(Metric, Value, found_champ:rho) %>%
  tidyr::spread(Type, Value) %>%
  dplyr::filter(Metric != "sq_loss", Size != "Hearthstone") %>%
  ggplot(aes(x = Metric, y = Mean, fill = Pairings)) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD, alpha = "0.5"),
                position = position_dodge(.9)) +
  facet_wrap(~Size) +
  theme(legend.position = "bottom") +
  scale_alpha_manual(values = c("0.5" = 0.5, "1" = 1), guide = "none") +
  scale_y_continuous(breaks = c(0:15) / 5)
```
## Results - Analysis
* Surprisingly, Swiss pairings do not perform significantly better than random pairings
* Swiss appears (probably) worse at having the top team go undefeated
* Swiss underperformed in the large specification but overperformed in the very large one
## Results - BT distribution
```{r}
library(BradleyTerryScalable)

data09 <- jsonlite::fromJSON("~/code/tournament/data/2009-2010.json")

# Collapse each round's debater records into one aff/neg row
clean <- function(x) {
  x %>%
    rename(speaks = `speaker points`) %>%
    group_by(position) %>%
    dplyr::summarize(id = stringr::str_c(ID, collapse = ","),
                     speaks = base::sum(as.numeric(speaks))) %>%
    ungroup %>%
    summarize(aff = first(id), aff_speaks = first(speaks),
              neg = last(id), neg_speaks = last(speaks))
}

pairwise09 <- purrr::map_df(data09$debaters, clean)
pairwise09$win <- data09$win
pairwise09 <- pairwise09 %>%
  filter(aff != neg)

pairwise_mat_09 <- pairwise09 %>%
  dplyr::select(aff, neg, win) %>%
  mutate(aff = factor(aff), neg = factor(neg), win = as.numeric(win)) %>%
  BradleyTerryScalable::pairs_to_matrix()

fit_09 <- BradleyTerryScalable::btfit(
  pairwise_mat_09, 1.5, maxit = 1000000,
  components = connected_components(pairwise_mat_09)$components)

# One data frame per connected component, then stack them
fit_dfs_09 <- lapply(1:3, function(comp) {
  data.frame(keyName = names(fit_09$pi[[comp]]),
             value = fit_09$pi[[comp]],
             comp = comp, row.names = NULL)
})
fit_df_09 <- do.call(rbind, fit_dfs_09)

fit_df_09 %>%
  ggplot(aes(x = value)) + geom_histogram(bins = 40) +
  facet_wrap(~comp, scales = "free") +
  ggtitle("Estimated B-T values: 2009-2010", "By connected component")
```
## Results - BT distribution
```{r}
data10 <- jsonlite::fromJSON("~/code/tournament/data/2010-2011.json")

pairwise10 <- purrr::map_df(data10$debaters, clean)
pairwise10$win <- data10$win
# pairwise10 %<>%
#   filter(aff != "28417,28417", neg != "28417,28417")
# write.csv(pairwise10, "~/code/tournament/data/pairs10.csv")

pairwise_mat_10 <- pairwise10 %>%
  dplyr::select(aff, neg, win) %>%
  mutate(aff = factor(aff), neg = factor(neg), win = as.numeric(win)) %>%
  BradleyTerryScalable::pairs_to_matrix()

fit_10 <- BradleyTerryScalable::btfit(
  pairwise_mat_10, 1.5, maxit = 1000000,
  components = connected_components(pairwise_mat_10)$components)

# One data frame per connected component, then stack them
fit_dfs_10 <- lapply(1:2, function(comp) {
  data.frame(keyName = names(fit_10$pi[[comp]]),
             value = fit_10$pi[[comp]],
             comp = comp, row.names = NULL)
})
fit_df_10 <- do.call(rbind, fit_dfs_10)

fit_df_10 %>%
  ggplot(aes(x = value)) + geom_histogram(bins = 40) +
  facet_wrap(~comp, scales = "free") +
  ggtitle("Estimated B-T values: 2010-2011", "By connected component")
```
## Results - Hearthstone
```{r}
results <- read.csv("~/code/tournament/data/results.csv", stringsAsFactors = FALSE)

results %>%
  tidyr::gather(Metric, Value, found_champ:rho) %>%
  tidyr::spread(Type, Value) %>%
  dplyr::filter(Metric != "sq_loss", Size == "Hearthstone") %>%
  ggplot(aes(x = Metric, y = Mean, fill = Pairings)) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD, alpha = "0.5"),
                position = position_dodge(.9)) +
  facet_wrap(~Size) +
  theme(legend.position = "bottom") +
  scale_alpha_manual(values = c("0.5" = 0.5, "1" = 1), guide = "none") +
  scale_y_continuous(breaks = c(0:15) / 5)
```
## Conclusion
* Tested across a variety of real-world settings
* Swiss rarely outperforms random pairings, and usually performs very similarly
* Further work:
- Different pairing strategies
- Further investigation of effect of size