---
title: Analysing the "Divided They Blog" network with R/igraph
author: "Robert Ackland"
date: "23 June 2017"
output: pdf_document
graphics: yes
---
## Introduction
In this exercise we will analyse the "Divided They Blog" (40 A-listers) dataset.
These data come from:
Adamic, L. and Glance, N. (2005). The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD 2005), pages 36–43.
First, load up the igraph package.
```{r eval=TRUE}
library(igraph)
```
Now we will create a data frame containing the edgelist, and then create our igraph graph object.
```{r eval=TRUE}
#load edgelist in csv file format
#edge_dat <- read.csv("DividedTheyBlog_40Alist_Edges.csv",header=TRUE)
edge_dat <- read.csv("http://vosonlab.net/papers/Taiwan_2017/DividedTheyBlog_40Alist_Edges.csv",header=TRUE)
#igraph likes two-column matrix format
el <- as.matrix(edge_dat)
#create igraph graph object from the edgelist
g <- graph.edgelist(el,directed=TRUE)
```
We can get descriptive information about the network:
```{r eval=TRUE}
g
```
This informs us that there are 39 nodes and 363 edges in the network. The summary also tells us that the graph is *D*irected and *N*amed, that the edges are not *W*eighted, and that it is not a *B*ipartite graph.
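These flags can also be queried programmatically; a minimal sketch using igraph's predicate functions (not part of the original exercise):
```{r eval=FALSE}
#check the graph-type flags directly
is.directed(g)                           #TRUE: the graph is directed
is.weighted(g)                           #FALSE: no edge weights
is.bipartite(g)                          #FALSE: not a two-mode network
"name" %in% list.vertex.attributes(g)    #TRUE: the graph is "named"
```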
```{r eval=TRUE}
#Note: there are only 39 vertices, not 40, because one vertex (blog.johnkerry.com)
#is an isolate and hence not yet in the network...
length(V(g)$name)
#so let's add that vertex manually
g <- add.vertices(g, 1, name="blog.johnkerry.com")
```
Next, we can visualise the network by plotting it with igraph; here the plot is written to a PNG file, which is then included in the PDF:
```{r eval=TRUE}
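#write the plot to a PNG file (this assumes the "figures" sub-directory already exists)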
png("figures/divided.png", width=800, height=700)
plot(g,edge.width=1.5,edge.curved=.5,edge.arrow.size=0.5)
dev.off()
```
This results in the following:
\begin{center}
\includegraphics{figures/divided.png}
\end{center}
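Note that igraph's default plot layout is stochastic, so repeated plots will look slightly different. One way to make the layout reproducible is to fix the random seed and pre-compute the layout; a sketch (the Fruchterman-Reingold algorithm is an arbitrary choice here):
```{r eval=FALSE}
#fix the seed and pre-compute a layout so that repeated plots are identical
set.seed(42)
l <- layout.fruchterman.reingold(g)
plot(g, layout=l, edge.width=1.5, edge.curved=.5, edge.arrow.size=0.5)
```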
Next we will do some more descriptive analysis:
```{r eval=TRUE}
#list of nodes
V(g)
#list of edges
E(g)
#accessing particular node
V(g)[2]
#accessing particular edge
E(g)[1]
#list of "name" (node) attributes - use head() to print the first 5
head(V(g)$name)
#number of nodes in network
vcount(g)
#another way
length(V(g))
#number of edges
ecount(g)
#another way
length(E(g))
#list of the node attributes
list.vertex.attributes(g)
#list of the edge attributes (we don't have any)
list.edge.attributes(g)
```
We will now look at some measures of node centrality:
```{r eval=TRUE}
#node indegree
head(degree(g, mode="in"))
#node outdegree
head(degree(g, mode="out"))
#top-5 nodes, based on indegree
V(g)[order(degree(g, mode="in"), decreasing=T)[1:5]]
#closeness centrality
head(closeness(g))
#betweenness centrality
head(betweenness(g))
```
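It can be convenient to gather these scores into a single data frame for sorting and comparison; a minimal sketch (the object and column names are arbitrary):
```{r eval=FALSE}
#collect the centrality scores into one data frame and list the top-5 by indegree
cent <- data.frame(name = V(g)$name,
                   indegree = degree(g, mode="in"),
                   outdegree = degree(g, mode="out"),
                   betweenness = betweenness(g))
head(cent[order(cent$indegree, decreasing=TRUE), ], 5)
```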

## Getting attributes into the network
```{r eval=TRUE}
#load attributes in csv file format
#attr <- read.csv("DividedTheyBlog_40Alist_Vertices.csv",header=TRUE)
attr <- read.csv("http://vosonlab.net/papers/Taiwan_2017/DividedTheyBlog_40Alist_Vertices.csv",header=TRUE)
#We are now going to create a vertex attribute called "Stance" by extracting the value
#of the "Stance" column in the attributes file for the row whose "Vertex" entry matches
#the vertex name.
#First, let's look at the vertex names using head()
head(V(g)$name)   #head() prints the first 6 elements by default
#the vertex names in the attributes data frame
head(attr$Vertex)
length(attr$Vertex) #we have all 40 of the vertices here
#match searches for each of the vertex names (in the igraph object) and returns their
#row position in the attributes data frame
match(V(g)$name,attr$Vertex)
#so this says that "mypetjawa.mu.nu" is row 2 of attr$Vertex, "wizbangblog.com" is in
#row 17 etc. (confirm for yourself that this is the case)
#so match returns an integer vector (indicating the correct rows in the data frame)
#this is used to return a character vector of "Stance" that is in the correct order
#and can be input as a new vertex attribute in the graph object
V(g)$Stance <- as.character(attr$Stance[match(V(g)$name, attr$Vertex)])
head(V(g)$Stance)
```
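As an aside, when both the edge list and the attribute file are available up front, the attributes can be attached at graph-creation time with graph.data.frame(). A sketch, assuming the first column of the attribute file holds the vertex names (this is an alternative to the match() approach above, not the method used in this exercise):
```{r eval=FALSE}
#build the graph with the vertex attributes attached in one step
g2 <- graph.data.frame(edge_dat, directed=TRUE, vertices=attr)
list.vertex.attributes(g2)
```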
Now let's plot the network again, this time using the vertex attribute "Stance" for the node colour:
```{r eval=TRUE}
#the vertex attribute "color" will be used by the plot function for node color
V(g)$color <- ifelse(V(g)$Stance=="conservative","red","blue")
png("figures/divided2.png", width=800, height=700)
plot(g,edge.width=1.5,edge.curved=.5,edge.arrow.size=0.5)
dev.off()
```
This results in the following:
\begin{center}
\includegraphics{figures/divided2.png}
\end{center}
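A legend makes the colour coding explicit; a minimal sketch of a legend() call that could be added to the plotting chunk above, before dev.off():
```{r eval=FALSE}
#add a legend mapping node colour to political stance
legend("topleft", legend=c("conservative", "liberal"),
       pt.bg=c("red", "blue"), pch=21, pt.cex=2, bty="n")
```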

## Calculating the homophily index
In igraph we will calculate the mixing matrix using a function written by Gary Weissman (see https://gist.github.com/gweissman/2402741 and http://www.babelgraph.org/wp/?p=351).
```{r eval=TRUE}
mixmat <- function(mygraph, attrib, use.density=TRUE) {
  require(igraph)
  # get unique list of characteristics of the attribute
  attlist <- sort(unique(get.vertex.attribute(mygraph, attrib)))
  numatts <- length(attlist)
  # build an empty mixing matrix by attribute
  mm <- matrix(nrow=numatts, ncol=numatts,
               dimnames=list(attlist, attlist))
  # calculate edge density for each matrix entry by pairing type
  # lends itself to parallel if available
  el <- get.edgelist(mygraph, names=FALSE)
  for (i in 1:numatts) {
    for (j in 1:numatts) {
      mm[i,j] <- length(which(apply(el, 1, function(x) {
        get.vertex.attribute(mygraph, attrib, x[1]) == attlist[i] &&
        get.vertex.attribute(mygraph, attrib, x[2]) == attlist[j] })))
    }
  }
  # convert to proportional mixing matrix if desired (ie by edge density)
  if (use.density) mm/ecount(mygraph) else mm
}

mixmat(g, "Stance", use.density=FALSE)
```
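As a sanity check, igraph's nominal assortativity coefficient summarises the same mixing pattern in a single number (a sketch; positive values indicate that ties tend to stay within stance):
```{r eval=FALSE}
#assortativity by stance (types must be positive integers, hence the factor conversion)
assortativity.nominal(g, types=as.integer(as.factor(V(g)$Stance)), directed=TRUE)
```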
Now, let's calculate the homophily index for conservatives. The homophily index adjusts the homogeneity index (the share of a group's ties that are directed to its own group) for the group's population share: $H^*_c = (H_c - w_c)/(1 - w_c)$.
```{r eval=TRUE}
#create the mixing matrix
mm <- mixmat(g, "Stance", use.density=FALSE)
#population share of conservative bloggers
w_c <- length(which(V(g)$Stance=="conservative"))/length(V(g))
w_c #OK, this dataset is not too interesting for calculating homophily....
#homogeneity index of conservative bloggers
H_c <- mm[1,1]/(mm[1,1]+mm[1,2])
H_c #76.5% of conservative blogger ties are directed to other conservatives
#Homophily index of conservative bloggers
Hstar_c <- (H_c-w_c)/(1-w_c)
Hstar_c #conservatives display a slight tendency towards homophily
```
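The same calculation applies to liberal bloggers, swapping the mixing-matrix row and the population share; a sketch following the formulas above (assuming "Stance" takes only the values "conservative" and "liberal", so row/column 2 of the mixing matrix refers to liberals):
```{r eval=FALSE}
#population share of liberal bloggers
w_l <- length(which(V(g)$Stance=="liberal"))/length(V(g))
#homogeneity index of liberal bloggers (share of liberal ties directed to other liberals)
H_l <- mm[2,2]/(mm[2,1]+mm[2,2])
#homophily index of liberal bloggers
Hstar_l <- (H_l-w_l)/(1-w_l)
Hstar_l
```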