library(igraph)
library(rio)
15 Social Network Analysis: A primer, a guide and a tutorial in R
1 Introduction
Social Network Analysis (SNA) has emerged as a computational method for studying interactions, relationships, and connections among a vast array of different entities that include humans, animals, and cities, just to name a few. Two related and largely overlapping fields are also concerned with the network as a concept: network science and network analysis. Network science is concerned with the study of the structure of networks, finding patterns and universal laws that may explain or underpin such structure in a large variety of phenomena. Network analysis is a very closely related field that is concerned with the analysis of networks that are not necessarily “social”. In this chapter, we will simply use the terms social network analysis and network analysis interchangeably.
1.1 What are networks?
A quintessential assumption in most analytical methods is that observations are (or should be) independent of each other, whereas, in network analysis, observations can be related and interdependent and may interact with or influence each other [1]. As such, network analysis offers a more realistic view of the interconnected world around us and allows us to paint a more accurate picture of the relationships and interactions that underpin it [2, 3]. For instance, when we study students' engagement at a school, we may use a survey to measure each student's engagement and compute statistics such as the correlation between engagement and grades. In doing so, we ignore that students interact with peers, teachers, and the environment around them [4, 5]. We also ignore that students may be influenced by engaged peers, be supported by their friends, or face a problematic social environment that hinders their engagement [1, 6]. Network analysis offers a rich set of methods for modeling and addressing such issues [7].
A network is simply a group of entities (often called vertices, nodes, or actors) connected through a relationship (often called edges, links, or arcs) [3]. In this chapter, we will use the terms vertices and edges for simplicity. Vertices can be humans (students, teachers, or families), ideas, keywords, behaviors, emotions, feelings, concepts, schools, countries, or any entity that can be hypothesized to have a relationship with other entities. Vertices may be connected through a vast array of relations. For instance, students may be connected to each other by being friends, teammates, classmates, group members, neighbors, sporting club fans, competitors, sharing a desk, or working together on an assignment. There is virtually no limit to how a researcher can hypothesize a network. Nevertheless, the interpretation of network analysis relies heavily on how the network was constructed [3, 8].
In learning contexts, networks most commonly come from digital data, in particular from online communication. Discussion forums are the most common source. The conversation occurs between forum participants (vertices), and each reply from one participant to another forms a relationship between the two, creating an edge in the network. In such a situation, we are dealing with a directed network, where each interaction has a source (the person who replies) and a target (the person replied to) [9]. Other examples of directed networks are citation networks (where documents cite other documents) and social media networks in which users follow other users [10]. In contrast, an undirected network contains non-directional relationships, such as a network of siblings, friends, teammates, spouses, or co-workers [3]. Representing interactions as a network equips researchers with a rich toolset of social network analysis methods. Note that researchers can choose to model a directed network as undirected when direction is deemed inappropriate according to theory, context, or the research question.
3 Network analysis in R
The R language has a large number of libraries for the analysis of networks. The igraph package, which we use in this chapter, appears to be the preferred package of the R community judging by the number of reverse dependencies, i.e., the number of other packages that rely on igraph or work with the igraph format [45–47]. The igraph package is fast, efficient, and well respected within the academic community. It is also well maintained, continuously updated, and supported by a large community, and it has been released for platforms other than R, e.g., Python. Other packages, such as sna and network, have a large user base, especially among those interested in statistical network modeling. Any of these packages (sna, network, or igraph) can perform the analyses described in this chapter. However, we will use igraph because of its relative ease of use and its convenience for the chapter's objectives.
Example 1
Let us start with a simple example in which we analyze the network shown in Figure 15.6. Before doing anything else, we need to load the necessary packages with library(igraph) and library(rio). We will use igraph to construct and represent networks, and rio to download and import the data files that serve as input for igraph.
We can now use the import function from rio to download the data for the example (the edge list shown in Table 15.1) and assign it to a variable named SNA_example1.
SNA_example1 <- import("https://github.com/lamethods/data/raw/main/8_examples/SNA_example1.xlsx")
The function graph_from_data_frame from igraph converts the edge list in Table 15.1 into a network. It expects a dataframe in which the first two columns define the edges: the first column is used as the source and the second as the target. The two columns can have any name, and any extra columns are used as edge attributes. We can print the resulting object to check that it was created correctly. Printing is the usual quick test of whether graph creation succeeded, i.e., whether the network has the expected number of vertices, edges, and attributes.
Net <- graph_from_data_frame(SNA_example1)
print(Net)
IGRAPH 6b3ad19 DN-- 6 9 --
+ attr: name (v/c)
+ edges from 6b3ad19 (vertex names):
[1] B->A C->B D->C E->D F->E B->D C->A A->C D->A
The output of the print function gives a glimpse of the network's properties. First, igraph states that the object is an igraph object (a network). Then, it gives a unique seven-character identifier for the network (not usually needed for analysis). Next, DN-- tells us that the network is directed (D) and named (N), i.e., vertices have a name attribute. The two remaining dashes indicate that the network is neither weighted nor bipartite. The numbers that follow are the counts of vertices (6) and edges (9). Finally, igraph lists the attributes and the edges of the network. We can also visualize the created network (Figure 15.6) using the function plot.
plot(Net)
Here, we have used the most basic functions with no arguments beyond the data. As shown, building and inspecting a network in R takes little effort. In the next section, we take a deeper look at these functions and others using a larger network from a published study.
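The sketch below makes the edge-list format concrete without relying on igraph. It uses plain base R, and its object names are illustrative, not part of the chapter's code. It rebuilds the Example 1 edges from the print(Net) output and turns them into a directed adjacency matrix, in which rows are sources and columns are targets. It only shows what graph_from_data_frame receives and what the counts in the print header mean, not how igraph stores networks internally.

```r
# Example 1 edges, copied from the print(Net) output (source -> target)
edges <- data.frame(
  source = c("B", "C", "D", "E", "F", "B", "C", "A", "D"),
  target = c("A", "B", "C", "D", "E", "D", "A", "C", "A"),
  stringsAsFactors = FALSE
)

# The vertex set is every name that appears as a source or a target
vertices <- sort(unique(c(edges$source, edges$target)))

# Directed adjacency matrix: rows are sources, columns are targets;
# a repeated edge would simply increase the count in its cell
adj <- matrix(0, nrow = length(vertices), ncol = length(vertices),
              dimnames = list(vertices, vertices))
for (i in seq_len(nrow(edges))) {
  from <- edges$source[i]
  to <- edges$target[i]
  adj[from, to] <- adj[from, to] + 1
}

length(vertices)  # 6 vertices, as in the print header
sum(adj)          # 9 edges
adj
```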
Example 2
The next example is a larger network that comes from the interactions of a group of teachers in a Massive Open Online Course (MOOC). The MOOC included 445 participants from different places in the United States. The dataset has an edges file in which the first two columns are the sender (source) and the receiver (target). There is also a file for the vertices that contains demographic and other relevant information about each vertex, such as gender, location, and role. For more information about this dataset, please refer to the data chapter [48]. To get the data into R, we first need to read the data, store it in a dataframe, and then build a network with the appropriate arguments.
The first line of the code reads the edge list, with its attributes, into a dataframe named net_edges. The second line imports the vertex data, with their attributes, into a dataframe named net_nodes.
net_edges <- import("https://github.com/lamethods/data/raw/main/6_snaMOOC/DLT1%20Edgelist.csv")
net_nodes <- import("https://github.com/lamethods/data/raw/main/6_snaMOOC/DLT1%20Nodes.csv")
To create the network, we again use the function graph_from_data_frame. This time, we specify the edges dataframe with the argument d = net_edges. The second, optional argument, directed = TRUE, tells igraph that the network should be directed. If it is not provided, the network is created as directed by default. The third, also optional, argument vertices = net_nodes tells igraph to use the dataframe net_nodes for vertex attributes. If the vertices argument is not provided, igraph extracts vertex names from the edge data, and vertices without any edges (isolates) are missing from the network. Providing the vertex data is therefore useful whenever vertex attributes or isolated participants matter for the analysis. Explicitly setting all arguments, as we did, helps avoid problems caused by default settings. For instance, the network could be created as directed when we intended an undirected one. Note that igraph keeps repeated edges by default, so the result is a multigraph (see Figure 15.7).
DLT1 <- graph_from_data_frame(d = net_edges, directed = TRUE, vertices = net_nodes)
Let us now check whether the network was built correctly using the function print. The output shows that the network is an igraph object that is directed and named (DN), with 445 vertices and 2,529 edges. igraph then lists the vertex and edge attributes along with their types. For instance, name (v/c) means that name is a (v)ertex attribute of type (c)haracter, and Facilitator (v/n) is a vertex attribute of type (n)umeric. Edge attributes are listed in the same way: Timestamp (e/c) is an (e)dge attribute of type (c)haracter.
print(DLT1)
IGRAPH 4f68816 DN-- 445 2529 --
+ attr: name (v/c), Facilitator (v/n), role1 (v/c), experience (v/n),
| experience2 (v/c), grades (v/c), location (v/c), region (v/c),
| country (v/c), group (v/c), gender (v/c), expert (v/c), connect
| (v/c), Timestamp (e/c), Discussion Title (e/c), Discussion Category
| (e/c), Parent Category (e/c), Category Text (e/c), Discussion
| Identifier (e/c), Comment ID (e/n), Discussion ID (e/n)
+ edges from 4f68816 (vertex names):
[1] 360->444 356->444 356->444 344->444 392->444 219->444 318->444 4 ->444
[9] 355->356 355->444 4 ->444 310->444 248->444 150->444 19 ->310 216->19
[17] 19 ->444 19 ->4 217->310 385->444 217->444 393->444 217->19 256->219
+ ... omitted several edges
A network can also be plotted with the function plot() (Figure 15.7). However, plotting in R is a vast field and will not be discussed in detail here.
plot(DLT1, layout = layout.fruchterman.reingold,
vertex.size = 5, vertex.label.cex = 2)
3.1 Graph level analysis
Now that we have seen how to build a network from edge and vertex data, we are ready to look at some of the analyses most commonly performed in learning settings. The first type of analysis looks at the network level, i.e., the whole group of collaborators. Analyzing the network level can tell us how interactive and cohesive the group is, and how evenly the interactions are distributed. We will go through each of these graph-level measures with a brief explanation of what it means, using the network from Example 2 (DLT1).
Let us start by calculating the basic measures of the network. The number of vertices can be obtained with the function vcount, which returns 445. The number of edges can be obtained with the function ecount, which returns 2,529. Dividing the number of edges by the number of vertices gives the average number of interactions per participant: 5.68.
vcount(DLT1) ## 445
ecount(DLT1) ## 2529
ecount(DLT1) / vcount(DLT1) ## 5.683146
The density of a graph is an important parameter of a collaborative network. It is the ratio of existing edges to the maximum number of possible edges among all participants. Density reaches its maximum (1) when every vertex has interacted with every other vertex in the network. Graph density can be computed using the function graph.density.
graph.density(DLT1) ## 0.01279988
However, the graph.density function can give misleading results if the network is a multigraph, because igraph counts repeated edges and loops. We therefore need to simplify the network (remove all repeated edges and loops) and compute the density of the simplified network. The resulting density of 0.0098 is rather low.
graph.density(simplify(DLT1)) ## 0.009798563
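The sketch below is a worked check of the definition above. It computes the density of a directed network without self-loops as the number of edges divided by n(n - 1), the number of possible ordered pairs. Applied to the raw DLT1 counts (2,529 edges, 445 vertices), it reproduces the multigraph value. Applied to 1,936 edges, it reproduces the simplified value. That edge count is inferred, not printed by igraph: it is 2 × 212 + 1,512, the mutual and asymmetric pairs from the dyad census reported in the next paragraphs. The second part uses a small hypothetical edge list to show why simplifying first lowers the value. The toy data and the helper name directed_density are illustrative only.

```r
# Density of a directed network without self-loops:
# existing edges divided by the n * (n - 1) possible ordered pairs
directed_density <- function(n_edges, n_vertices) {
  n_edges / (n_vertices * (n_vertices - 1))
}

directed_density(2529, 445)  # raw DLT1 counts (multigraph): about 0.0128
directed_density(1936, 445)  # distinct non-loop edges: about 0.0098

# Hypothetical toy edge list: A->B is repeated and C->C is a self-loop
toy <- data.frame(source = c("A", "A", "B", "C"),
                  target = c("B", "B", "C", "C"),
                  stringsAsFactors = FALSE)
n_toy <- 3

# Without simplification, the duplicate and the loop inflate the value: 4/6
raw_density <- directed_density(nrow(toy), n_toy)

# Simplify: drop self-loops, then keep one copy of each directed pair: 2/6
simple_toy <- unique(toy[toy$source != toy$target, ])
simple_density <- directed_density(nrow(simple_toy), n_toy)
```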
Reciprocity is another important graph-level measure that also reflects group cohesion. A reciprocated edge is an edge between two vertices that have a reciprocal relationship, i.e., each is both the source and the target of an edge with the other [16]. The higher the ratio of reciprocated edges, the more collaborative and participatory the network is, and the less centralized (dominated by a few). Reciprocity can be computed using the function reciprocity, which by default ignores loops (i.e., it does not consider a person replying to themself). By igraph's default definition, reciprocity is the fraction of reciprocated edges in a directed graph. The value here is 0.1997544, which means that only 20% of all edges were reciprocated.
reciprocity(DLT1) ## 0.1997544
We can also compute the dyad census with dyad.census. It returns the number of mutual pairs (vertices whose interactions are reciprocated), asymmetric pairs (interactions in one direction only), and unconnected pairs. Our network has 212 mutual pairs, which is relatively small compared with the asymmetric (1,512) and unconnected (97,066) pairs.
dyad.census(DLT1) ## $mut [1] 212 $asym [1] 1512 $null [1] 97066
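The following base R sketch applies both definitions to the small Example 1 network, whose nine edges are copied from the print(Net) output. It assumes a network without repeated edges, which holds for Example 1. Under that assumption it mirrors igraph's default reciprocity, the fraction of edges whose reverse edge also exists. It does not try to reproduce how igraph handles the multi-edges in DLT1. Only A and C reply to each other, so 2 of the 9 edges are reciprocated.

```r
# Example 1 edges (no repeated edges, no self-loops)
edges <- data.frame(
  source = c("B", "C", "D", "E", "F", "B", "C", "A", "D"),
  target = c("A", "B", "C", "D", "E", "D", "A", "C", "A"),
  stringsAsFactors = FALSE
)
n <- 6  # number of vertices

# An edge is reciprocated if the reverse edge (target -> source) also exists
forward_keys <- paste(edges$source, edges$target, sep = "->")
reverse_keys <- paste(edges$target, edges$source, sep = "->")
reciprocated <- reverse_keys %in% forward_keys

# Reciprocity: fraction of reciprocated edges (2/9 here)
reciprocity_value <- mean(reciprocated)

# Dyad census over all n * (n - 1) / 2 unordered pairs
mutual <- sum(reciprocated) / 2   # each mutual pair has two reciprocated edges
asymmetric <- sum(!reciprocated)  # each one-way edge is its own asymmetric pair
null_pairs <- choose(n, 2) - mutual - asymmetric
c(mut = mutual, asym = asymmetric, null = null_pairs)
```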
Transitivity (or the global clustering coefficient) measures the tendency of vertices to cluster together or form tightly knit cliques. In igraph, transitivity is the probability that the neighbors of a vertex are also connected to each other. More precisely, it is three times the number of triangles divided by the number of connected triplets (a vertex together with two of its neighbors). A large body of research associates the ratio of cliques with cohesion in collaborative groups, strong ties, and productive knowledge construction [Block_2015]. There are several methods for estimating transitivity. Here, we focus on global (network-level) transitivity as implemented in igraph, which treats directed networks as undirected and ignores multiple edges. It can be calculated with the function transitivity, which returns the global measure by default. The transitivity of our network is 0.08880774.
transitivity(DLT1) ## 0.08880774
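The sketch below computes global transitivity by hand for the Example 1 network in base R, to show how the triangle-based definition works. Following the treatment described above, direction and duplicate edges are dropped first. The number of triangles comes from the trace of the cubed adjacency matrix, which counts each triangle six times. The number of connected triplets is the sum of C(d, 2) over the vertex degrees d. The variable names are illustrative.

```r
edges <- data.frame(
  source = c("B", "C", "D", "E", "F", "B", "C", "A", "D"),
  target = c("A", "B", "C", "D", "E", "D", "A", "C", "A"),
  stringsAsFactors = FALSE
)
vertices <- sort(unique(c(edges$source, edges$target)))

# Undirected, simple adjacency matrix: direction and duplicates are ignored
idx <- cbind(match(edges$source, vertices), match(edges$target, vertices))
A <- matrix(0, length(vertices), length(vertices),
            dimnames = list(vertices, vertices))
A[idx] <- 1
A[idx[, 2:1]] <- 1
diag(A) <- 0

deg <- rowSums(A)

# trace(A^3) counts every triangle six times (3 start vertices x 2 directions)
triangles <- sum(diag(A %*% A %*% A)) / 6
# Connected triplets: pairs of neighbors around each vertex
connected_triplets <- sum(choose(deg, 2))

transitivity_global <- 3 * triangles / connected_triplets
transitivity_global
```

Vertices A, B, C, and D are all connected to each other, which gives four triangles. The chain D-E-F only adds open triplets, so the result is 3 × 4 / 16 = 0.75.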
Another option is the related function triad_census. It counts how many triads (subgraphs of three vertices) of each of the 16 possible types occur in a directed network. Readers may need to consult the package manual to interpret the results in depth.
triad_census(DLT1)
[1] 13901588 486626 124805 4227 35745 11186 15929 3668
[9] 932 81 1857 376 223 334 345 68
Group productivity or intensity of interaction can be explored using degree measures and their variants. The average degree of the network measures how many interactions each group member has, on average, sent and received. To compute it, we first compute the degree of each member and then take the mean.
In directed networks (like the one in this example), we can also compute the average in-degree and out-degree. For the same set of vertices, the average in-degree equals the average out-degree, because each edge has exactly one source and one target. The two combined equal the average total degree. However, if we compute them for a subset of vertices (e.g., only students, excluding teachers), the average in-degree and out-degree may differ. The code below computes the mean and median of the three measures using the function degree, with mode = "total" for the total degree, mode = "in" for the in-degree, and mode = "out" for the out-degree.
Mean_degree <- mean(degree(DLT1, mode = "total")) ## 11.36629
Mean_in_degree <- mean(degree(DLT1, mode = "in")) ## 5.683146
Mean_out_degree <- mean(degree(DLT1, mode = "out")) ## 5.683146
Median_degree <- median(degree(DLT1, mode = "total")) ## 4
Median_in_degree <- median(degree(DLT1, mode = "in")) ## 1
Median_out_degree <- median(degree(DLT1, mode = "out")) ## 2
The mean degree is 11.36629, and the mean in-degree and out-degree are both 5.683146. The median degree is 4, the median in-degree is 1, and the median out-degree is 2. The medians differ considerably from the means and may be more informative in this large network, where participation may not be evenly distributed (see the next section).
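The mean in-degree always equals the mean out-degree, and the mean total degree is twice the edges-per-vertex ratio. The few lines of base R below check both facts on the small Example 1 network; this is a sketch, independent of igraph's degree function. The same identity explains why the mean in-degree of DLT1 (5.683146) equals 2,529 / 445 and its mean total degree (11.36629) is twice that value.

```r
edges <- data.frame(
  source = c("B", "C", "D", "E", "F", "B", "C", "A", "D"),
  target = c("A", "B", "C", "D", "E", "D", "A", "C", "A"),
  stringsAsFactors = FALSE
)
vertices <- sort(unique(c(edges$source, edges$target)))

# Out-degree counts the edges a vertex sends; in-degree counts those it receives
out_deg <- sapply(vertices, function(v) sum(edges$source == v))
in_deg <- sapply(vertices, function(v) sum(edges$target == v))
total_deg <- in_deg + out_deg

mean(in_deg)     # 1.5 = 9 edges / 6 vertices
mean(out_deg)    # 1.5, always equal to the mean in-degree
mean(total_deg)  # 3 = 2 * 9 / 6
median(in_deg)
median(out_deg)
median(total_deg)
```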
Collaboration is participatory by design, but oftentimes some students dominate and contribute disproportionately more than others, while others may prefer to stay isolated and rarely participate. Several measures allow us to quantify how interactions are distributed across the network and how skewed the contribution patterns are. An obvious method that comes straight from statistics is the standard deviation (SD) of degree centrality, which we can compute in the same way as the mean and median in the previous step. In our case, the SD of degree centrality is 34.2, the SD of in-degree centrality is 26.7, and the SD of out-degree centrality is 9.8. The SDs are higher than the corresponding means, which suggests that calculating and inspecting the median was justified. We can also see that the SD of in-degree is much higher than that of out-degree. In other words, participants vary more in how many replies they receive than in how many they contribute. This pattern is rather common, since students are selective about whom they respond to and reply to the contributions they find reply-worthy.
SD_degree <- sd(degree(DLT1, mode = "total")) ## 34.20511
SD_in_degree <- sd(degree(DLT1, mode = "in")) ## 26.73596
SD_out_degree <- sd(degree(DLT1, mode = "out")) ## 9.84249
SNA has dedicated indices for measuring dominance in networks, known as centralization indices. A centralization index is 0 when every vertex contributes equally and reaches its maximum of 1 when a single vertex dominates the network (as in a star). Centralization indices exist for many centralities, e.g., degree, closeness, and betweenness. Nevertheless, most of the literature has reported degree centralization, which we demonstrate here.
The code below computes the degree, in-degree, and out-degree centralization. Note that we use the simplified network to avoid loops and repeated edges. The results confirm the previous findings: the degree centralization is 0.38, the out-degree centralization is 0.16, and the in-degree centralization is 0.61. In-degree centralization is the highest index, which means that replies were concentrated on a few participants.
Centralization_degree <- centralization.degree(simplify(DLT1),
  mode = "all", loops = FALSE)$centralization ## 0.3826871
Warning: `centralization.degree()` was deprecated in igraph 2.0.0.
ℹ Please use `centr_degree()` instead.
Centralization_in_degree <- centralization.degree(simplify(DLT1),
  mode = "in", loops = FALSE)$centralization ## 0.6064291
Centralization_out_degree <- centralization.degree(simplify(DLT1),
  mode = "out", loops = FALSE)$centralization ## 0.1572214
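The sketch below spells out Freeman's degree centralization in base R for the Example 1 network. The numerator is the sum of the differences between the largest degree and every vertex's degree. The denominator is the largest value that sum could take in a network of the same size. The normalizing constants are the star-graph maxima: (n - 1)^2 for in- or out-degree, and 2(n - 1)(n - 2) for total degree in a directed network without loops. We assume these match the normalization igraph applies with loops = FALSE; confirm against the igraph documentation before relying on exact agreement.

```r
edges <- data.frame(
  source = c("B", "C", "D", "E", "F", "B", "C", "A", "D"),
  target = c("A", "B", "C", "D", "E", "D", "A", "C", "A"),
  stringsAsFactors = FALSE
)
vertices <- sort(unique(c(edges$source, edges$target)))
n <- length(vertices)

out_deg <- sapply(vertices, function(v) sum(edges$source == v))
in_deg <- sapply(vertices, function(v) sum(edges$target == v))

# Freeman centralization: total gap to the most central vertex,
# divided by the largest gap possible in a network of the same size
freeman_centralization <- function(deg, theoretical_max) {
  sum(max(deg) - deg) / theoretical_max
}

# Star-graph maxima for a directed network without loops
max_in_out <- (n - 1)^2
max_total <- 2 * (n - 1) * (n - 2)

in_centralization <- freeman_centralization(in_deg, max_in_out)     # 9/25
out_centralization <- freeman_centralization(out_deg, max_in_out)   # 3/25
total_centralization <- freeman_centralization(in_deg + out_deg, max_total)  # 6/40

# Sanity check: an in-star (everyone replies to one vertex) reaches 1
star_in <- c(n - 1, rep(0, n - 1))
freeman_centralization(star_in, max_in_out)
```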
Another way to see how the interactions are distributed is to plot the degree distribution using the hist function, as demonstrated in Figure 15.8.
par(mfrow=c(1,2))
hist(degree(DLT1, mode = "in"), breaks = 100)
hist(degree(DLT1, mode = "out"), breaks = 100)
3.2 Network connectivity
We can also examine how connected the whole group is using the function is.connected, which returns FALSE in our case. This means that the graph consists of several disconnected components, i.e., subgroups of vertices with no edges between them. We can inspect these components with the function components, which tells us that there are four of them:
is.connected(DLT1)
Warning: `is.connected()` was deprecated in igraph 2.0.0.
ℹ Please use `is_connected()` instead.
[1] FALSE
Components <- components(DLT1)
print(Components)
$membership
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
441 442 443 444 445
2 3 4 1 1
$csize
[1] 442 1 1 1
$no
[1] 4
Using the function decompose, we can look at each of the components. The largest component has 442 vertices, and the three others consist of a single vertex each. These isolated vertices are simply participants who did not interact at all, and they do not represent a real subgroup.
Decomposed <- decompose(DLT1)
Decomposed[[1]]
IGRAPH be7756b DN-- 442 2529 --
+ attr: name (v/c), Facilitator (v/n), role1 (v/c), experience (v/n),
| experience2 (v/c), grades (v/c), location (v/c), region (v/c),
| country (v/c), group (v/c), gender (v/c), expert (v/c), connect
| (v/c), Timestamp (e/c), Discussion Title (e/c), Discussion Category
| (e/c), Parent Category (e/c), Category Text (e/c), Discussion
| Identifier (e/c), Comment ID (e/n), Discussion ID (e/n)
+ edges from be7756b (vertex names):
[1] 360->444 356->444 356->444 344->444 392->444 219->444 318->444 4 ->444
[9] 355->356 355->444 4 ->444 310->444 248->444 150->444 19 ->310 216->19
[17] 19 ->444 19 ->4 217->310 385->444 217->444 393->444 217->19 256->219
+ ... omitted several edges
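Components can be found with a breadth-first search that ignores edge direction (weak connectivity). We assume this corresponds to the default that components uses for directed networks. The base R sketch below runs the search on the Example 1 edges plus two hypothetical isolated vertices, G and H. They are added only to mimic the single-vertex components found in DLT1.

```r
edges <- data.frame(
  source = c("B", "C", "D", "E", "F", "B", "C", "A", "D"),
  target = c("A", "B", "C", "D", "E", "D", "A", "C", "A"),
  stringsAsFactors = FALSE
)
# Example 1 vertices plus two hypothetical isolated vertices, G and H
vertices <- c(sort(unique(c(edges$source, edges$target))), "G", "H")

# Weak connectivity ignores direction, so neighbors are collected both ways
neighbors <- lapply(setNames(vertices, vertices), function(v) {
  unique(c(edges$target[edges$source == v], edges$source[edges$target == v]))
})

membership <- setNames(rep(NA_integer_, length(vertices)), vertices)
n_components <- 0L
for (start in vertices) {
  if (!is.na(membership[[start]])) next  # already assigned to a component
  n_components <- n_components + 1L
  membership[[start]] <- n_components
  queue <- start
  while (length(queue) > 0) {            # breadth-first search from 'start'
    current <- queue[1]
    queue <- queue[-1]
    for (nb in neighbors[[current]]) {
      if (is.na(membership[[nb]])) {
        membership[[nb]] <- n_components
        queue <- c(queue, nb)
      }
    }
  }
}

membership
csize <- as.vector(table(membership))  # component sizes: 6, 1, 1
n_components
```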
We can also look at the network diameter with the function diameter. The diameter is the largest number of steps on the shortest path between any two vertices, so it shows how far apart the most distant vertices are. A more representative measure is the average distance between vertices, which we can obtain with the function mean_distance. The network diameter is 8 and the mean distance is about 3; both numbers are relatively high. Global efficiency is another network-level measure that uses the distances between vertices to examine how effective the network structure is as a conduit for information exchange. It is the average of the inverse shortest-path lengths over all pairs of vertices. When all vertices are close to each other and reachable in a few steps, the network is said to be efficient. Efficiency is high in well-connected groups and low otherwise. We can compute it using the function global_efficiency.
diameter(DLT1) ## 8
mean_distance(DLT1) ## 3.030694
global_efficiency(DLT1) ## 0.1961034
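For a small network, these three measures can be computed directly from breadth-first search distances. The base R sketch below does so for the directed Example 1 network, following edge direction. It assumes the conventions described above. The diameter and mean distance consider only pairs connected by a directed path. Global efficiency averages 1/d over all ordered pairs and counts unreachable pairs as 0, since 1/Inf is 0 in R.

```r
edges <- data.frame(
  source = c("B", "C", "D", "E", "F", "B", "C", "A", "D"),
  target = c("A", "B", "C", "D", "E", "D", "A", "C", "A"),
  stringsAsFactors = FALSE
)
vertices <- sort(unique(c(edges$source, edges$target)))
out_neighbors <- lapply(setNames(vertices, vertices),
                        function(v) edges$target[edges$source == v])

# Breadth-first search along edge direction; unreachable vertices stay Inf
bfs_distances <- function(start) {
  dist <- setNames(rep(Inf, length(vertices)), vertices)
  dist[[start]] <- 0
  queue <- start
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    for (nb in out_neighbors[[current]]) {
      if (is.infinite(dist[[nb]])) {
        dist[[nb]] <- dist[[current]] + 1
        queue <- c(queue, nb)
      }
    }
  }
  dist
}

# Distance matrix: row = source vertex, column = target vertex
D <- t(sapply(vertices, bfs_distances))
off_diagonal <- D[row(D) != col(D)]
reachable <- off_diagonal[is.finite(off_diagonal)]

diameter_value <- max(reachable)            # longest shortest path: 4 (F to B)
mean_distance_value <- mean(reachable)      # 39 / 21, about 1.857
efficiency_value <- mean(1 / off_diagonal)  # 169 / 360, about 0.469
```

E and F can reach the other vertices but cannot be reached from them, so several pairs are unreachable. That is why efficiency stays well below 1 even in this tiny network.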
3.3 Network operations
There are many functions and tools to manipulate networks in igraph, and a comprehensive discussion of all of them is beyond the scope of this introductory chapter. Nonetheless, we will discuss the most important ones. Oftentimes, we need to set a vertex attribute (e.g., gender) to be used in the analysis. This can be done with the V function followed by the $ character and the name of the attribute we want to set. Similarly, edge attributes can be set with the function E. In the next example, we define an attribute called weight for the vertices and do the same for the edges. We then use it to create a simplified weighted network, in which all repeated edges are merged into a single edge whose weight is the number of repetitions. To do so, we first assign a weight of 1 to each vertex and edge. Then, we use the function simplify to remove loops and duplicated edges and to sum the weights of merged edges (edge.attr.comb = list(weight = "sum", "ignore")). All other edge attributes are dropped.
V(DLT1)$weight <- 1
E(DLT1)$weight <- 1
simple.DLT1 <- simplify(DLT1, remove.multiple = TRUE, remove.loops = TRUE,
  edge.attr.comb = list(weight = "sum", "ignore"))
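The effect of simplify with a summed weight attribute can be illustrated in base R. The example uses the first 24 edges that print(DLT1) displayed above, a small excerpt rather than the full network. The sketch drops self-loops and merges repeated source-target pairs with aggregate, summing a weight of 1 per interaction. It mirrors the idea of edge.attr.comb = list(weight = "sum", "ignore"), not igraph's implementation.

```r
# First 24 edges shown by print(DLT1): a small excerpt, not the whole network
edges <- data.frame(
  source = c("360", "356", "356", "344", "392", "219", "318", "4",
             "355", "355", "4", "310", "248", "150", "19", "216",
             "19", "19", "217", "385", "217", "393", "217", "256"),
  target = c("444", "444", "444", "444", "444", "444", "444", "444",
             "356", "444", "444", "444", "444", "444", "310", "19",
             "444", "4", "310", "444", "444", "444", "19", "219"),
  stringsAsFactors = FALSE
)
edges$weight <- 1  # each raw interaction counts once

# Drop self-loops, then merge repeated source-target pairs by summing weights
no_loops <- edges[edges$source != edges$target, ]
weighted <- aggregate(weight ~ source + target, data = no_loops, FUN = sum)

nrow(weighted)                   # 22 distinct directed edges
weighted[weighted$weight > 1, ]  # 356->444 and 4->444, each with weight 2
```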
Two functions are important when we want to divide a network or extract a subgraph from it. The function subgraph.edges creates a subgraph based on edge characteristics. In the following example, we create a subgraph of the discussions in the Curriculum & Instruction category using E(DLT1)$'Discussion Category' == 'Curriculum & Instruction'. Similarly, the function induced_subgraph creates a subgraph based on vertex characteristics. In the next example, we create a network of North Carolina teachers using V(DLT1)$location == "NC".
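# Subgraph of the edges in the Curriculum & Instruction discussion category
CI_network <- subgraph.edges(DLT1,
  eids = which(E(DLT1)$'Discussion Category' == 'Curriculum & Instruction'))
# Subgraph induced by the teachers located in North Carolina
NC_network <- induced_subgraph(DLT1, vids = which(V(DLT1)$location == "NC"))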
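Both kinds of subgraph come down to filtering. In the base R sketch below, an edge-based subgraph keeps the selected edges together with the vertices they touch. An induced subgraph keeps the selected vertices and every edge between them. The Example 1 edges are reused, but the discussion categories and locations are hypothetical values invented for illustration only.

```r
edges <- data.frame(
  source = c("B", "C", "D", "E", "F", "B", "C", "A", "D"),
  target = c("A", "B", "C", "D", "E", "D", "A", "C", "A"),
  # Hypothetical discussion categories, invented for this illustration
  category = c("Curriculum & Instruction", "Other", "Curriculum & Instruction",
               "Other", "Other", "Other", "Curriculum & Instruction",
               "Other", "Other"),
  stringsAsFactors = FALSE
)
# Hypothetical vertex locations, invented for this illustration
nodes <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                    location = c("NC", "NC", "VA", "NC", "VA", "NC"),
                    stringsAsFactors = FALSE)

# Edge-based subgraph: keep the selected edges and the vertices they touch
ci_edges <- edges[edges$category == "Curriculum & Instruction", ]
ci_vertices <- sort(unique(c(ci_edges$source, ci_edges$target)))

# Induced subgraph: keep the selected vertices and all edges among them
nc_vertices <- nodes$name[nodes$location == "NC"]
nc_edges <- edges[edges$source %in% nc_vertices & edges$target %in% nc_vertices, ]
```

F stays in the induced subgraph as an isolated vertex because it was selected by location. The edge-based version keeps only vertices that touch a selected edge.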
3.4 Individual vertex measures (centrality measures)
We discussed centrality measures in the introductory section, along with how they can be used in educational contexts. Centrality measures may serve many functions, e.g., as indicators of performance in collaborative environments [49] or of roles in collaboration [21]. The igraph package can calculate several centrality measures, many of which have been used in educational research, while others may be less relevant. Other packages, such as centiserve, offer an even larger number of centrality measures [50]. In this section, we focus on the most common centrality measures according to the recent meta-analysis [49], as well as on recently used measures such as diffusion centralities.
Degree centrality can be computed using the function degree, where the argument mode specifies the type of degree. mode = "in" returns the in-degree, mode = "out" the out-degree, and mode = "total" the total degree centrality. For undirected networks, the mode argument is ignored and the function returns the degree centrality, since there is no direction. We can combine all the centralities we calculate into a single dataframe using the tibble function from the tibble package.
InDegree <- degree(DLT1, mode = "in")
OutDegree <- degree(DLT1, mode = "out")
Degree <- degree(DLT1, mode = "total")
Degree_df <- tibble::tibble(name = V(DLT1)$name, InDegree, OutDegree, Degree)
print(Degree_df)
# A tibble: 445 × 4
name InDegree OutDegree Degree
<chr> <dbl> <dbl> <dbl>
1 1 20 33 53
2 2 2 5 7
3 3 2 4 6
4 4 2 14 16
5 5 16 17 33
6 6 9 24 33
7 7 32 26 58
8 8 13 18 31
9 9 2 12 14
10 10 8 12 20
# ℹ 435 more rows
Note that igraph has another function, graph.strength, that computes degree centrality while taking the edge weight attribute into account. In a multigraph like ours, where every edge has a weight of 1, degree and graph.strength return the same result. However, in networks where edges have different weights, the two functions return different results. Take the simplified weighted network created in the previous section as an example. There, degree returns the number of distinct edges of each vertex, roughly the number of its direct collaborators (the size of its ego network). graph.strength, in contrast, returns the sum of the edge weights, i.e., the number of interactions the vertex has made. Compare the results in the next example. For more information about the different ways of calculating degree centrality in weighted networks, readers may refer to the seminal article by [51].
InStrength <- graph.strength(DLT1, mode = "in")
Warning: `graph.strength()` was deprecated in igraph 2.0.0.
ℹ Please use `strength()` instead.
OutStrength <- graph.strength(DLT1, mode = "out")
Strength <- graph.strength(DLT1, mode = "total")
Strength_df <- tibble::tibble(name = V(DLT1)$name, InStrength, OutStrength, Strength)
print(Strength_df)
# A tibble: 445 × 4
name InStrength OutStrength Strength
<chr> <dbl> <dbl> <dbl>
1 1 20 33 53
2 2 2 5 7
3 3 2 4 6
4 4 2 14 16
5 5 16 17 33
6 6 9 24 33
7 7 32 26 58
8 8 13 18 31
9 9 2 12 14
10 10 8 12 20
# ℹ 435 more rows
According to [49], closeness and betweenness are the most commonly used centrality measures, and both rely on shortest paths. Closeness reflects how close a vertex is to all others, and betweenness how often a vertex lies on the shortest paths between others. Closeness centrality can be calculated using the function closeness, which is directional; this means that we can compute in-closeness, out-closeness, and total closeness centrality. Betweenness centrality can be computed using the function betweenness, which is also directional. The argument directed = TRUE computes the directed version, whereas directed = FALSE treats the network as undirected. Another commonly used centrality measure is eigenvector centrality, which can be computed using the function eigen_centrality.
The default of the eigen_centrality function is directed = FALSE, since eigenvector centrality is less suited to directed networks [14]. PageRank is a closely related centrality measure that uses a similar algorithm and is more suitable for directed networks. It can be calculated using the function page.rank (page_rank in recent igraph versions). Note that both functions return a list, so to obtain the centrality values you need to append $vector, as demonstrated in the code.
An important question is whether to compute these centralities on a simplified network, a weighted network, or a multigraph. The answer depends on the context, the network structure, and the research question. However, evidence suggests that the multigraph configuration may yield the most accurate results when centralities are used as indicators of performance [22]. The code below computes the aforementioned centralities. See the help page of each function for more options and arguments for customization:
# Closeness centrality: in, out and total (all directions)
Closeness_In <- closeness(DLT1, mode = "in")
Closeness_Out <- closeness(DLT1, mode = "out")
Closeness_total <- closeness(DLT1, mode = "total")
# Betweenness and eigenvector centrality on the simplified network
Betweenness <- betweenness(simple.DLT1, directed = FALSE)
Eigen <- eigen_centrality(simple.DLT1, directed = FALSE)$vector
# PageRank: page_rank() replaces page.rank(), deprecated since igraph 2.0.0
Pagerank <- page_rank(DLT1, directed = FALSE)$vector
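The difference between the three configurations is easiest to see on a tiny example. The sketch below uses a hypothetical three-vertex multigraph in which vertices 1 and 2 interacted twice. It builds the simplified and weighted versions with simplify(), choosing how the duplicated edges are combined through the edge.attr.comb argument. How the chapter's own simple.DLT1 was constructed is not assumed here.

```r
library(igraph)

# Hypothetical multigraph: vertices 1 and 2 interacted twice, 2 and 3 once
multi <- make_graph(c(1, 2, 1, 2, 2, 3), directed = FALSE)

# Simplified network: duplicate edges collapsed and their count discarded
simple <- simplify(multi, edge.attr.comb = "ignore")

# Weighted network: duplicate edges collapsed, their count kept in "weight"
E(multi)$weight <- 1
weighted <- simplify(multi, edge.attr.comb = list(weight = "sum"))

degree(multi)       # every repeated interaction counts
degree(simple)      # only distinct partners count
strength(weighted)  # repeated interactions recovered through the weights
```

The multigraph and the weighted network agree on the total volume of interaction, whereas the simplified network only records who interacted with whom. Which of these is appropriate is exactly the contextual choice discussed above.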
Diffusion centralities have been introduced recently in several studies and seem to offer a more robust estimate of a vertex's role in spreading information [19, 52]. Diffusion degree can be computed in the same way as degree centrality. However, the igraph package has no function for this centrality, so we rely on the diffusion.degree function from the centiserve package. The function diffusion.degree accepts the mode argument to compute different variants, i.e., "in", "out" and "all" (total) diffusion degrees.
library(centiserve)
# In-, out- and total ("all") diffusion degree
Diffusion.degree_in <- diffusion.degree(DLT1, mode = "in")
Diffusion.degree_out <- diffusion.degree(DLT1, mode = "out")
Diffusion.degree <- diffusion.degree(DLT1, mode = "all")
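To make the idea behind diffusion degree concrete, the following is a hypothetical re-implementation for illustration only, not the centiserve code. It assumes that a vertex's diffusion degree is lambda times its own degree plus lambda times the degrees of its neighbours, with lambda = 1 by default. The function name diffusion_degree_sketch is ours, and it is only meant for simple graphs (repeated edges would count a neighbour more than once).

```r
library(igraph)

# HYPOTHETICAL sketch, assuming diffusion degree =
# lambda * deg(v) + sum of lambda * deg(u) over the neighbours u of v
diffusion_degree_sketch <- function(g, mode = "all", lambda = 1) {
  deg <- degree(g, mode = mode)
  vapply(seq_len(vcount(g)), function(v) {
    nb <- as.integer(neighbors(g, v, mode = mode))
    unname(lambda * deg[v] + sum(lambda * deg[nb]))
  }, numeric(1))
}

# Undirected path 1 - 2 - 3 (degrees 1, 2, 1)
path <- make_graph(c(1, 2, 2, 3), directed = FALSE)
diffusion_degree_sketch(path)

# Directed path 1 -> 2 -> 3, following outgoing edges only
dpath <- make_graph(c(1, 2, 2, 3), directed = TRUE)
diffusion_degree_sketch(dpath, mode = "out")
```

Under this assumed definition, the middle vertex of the undirected path scores 2 + 1 + 1 = 4: it is rewarded both for its own ties and for being adjacent to vertices that have ties of their own.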
Coreness and cross-clique connectivity are related centralities that estimate how embedded a vertex is in the network; they can be calculated using the functions coreness (from igraph) and crossclique (from centiserve). Both coreness and crossclique centralities have been shown to correlate better with performance, as well as with productive and reply-worthy content [19].
# Coreness (k-core index) and cross-clique connectivity
Coreness <- coreness(DLT1)
Cross_clique_connectivity <- crossclique(DLT1)
Warning in cliques(graph): At vendor/cigraph/src/cliques/cliquer_wrapper.c:42 :
Edge directions are ignored for clique calculations.
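Both measures can be illustrated on a toy graph: a triangle (vertices 1, 2, 3) with a pendant vertex 4 attached to vertex 3. Coreness comes directly from igraph. For cross-clique connectivity we assume the definition "number of cliques that contain the vertex", counting every clique returned by igraph's cliques() (including single vertices and single edges). This is a hypothetical sketch of that definition, not the centiserve implementation.

```r
library(igraph)

# Toy graph: triangle 1-2-3 plus a pendant vertex 4 attached to vertex 3
g <- make_graph(c(1, 2, 2, 3, 3, 1, 3, 4), directed = FALSE)

# k-core index: the largest k such that the vertex belongs to a subgraph
# in which every vertex has degree >= k
core <- coreness(g)

# ASSUMED definition of cross-clique connectivity: the number of cliques
# (of any size, as listed by cliques()) that include the vertex
cl <- cliques(g)
cross_clique_sketch <- vapply(seq_len(vcount(g)), function(v) {
  sum(vapply(cl, function(members) v %in% as.integer(members), logical(1)))
}, integer(1))

print(data.frame(vertex = 1:4, core, cross_clique_sketch))
```

The triangle vertices share the 2-core, while the pendant vertex only reaches the 1-core. Vertex 3 belongs to the most cliques because it joins the triangle to the pendant vertex.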
We can also combine the rest of the centralities together in a single dataframe:
Centdf <- tibble::tibble(name = V(DLT1)$name, Closeness_total, Betweenness,
  Eigen, Pagerank, Diffusion.degree, Coreness, Cross_clique_connectivity)
print(Centdf)
# A tibble: 445 × 8
name Closeness_total Betweenness Eigen Pagerank Diffusion.degree Coreness
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.00110 1258. 0.206 0.00840 1865 18
2 2 0.000808 26.5 0.0107 0.00140 218 6
3 3 0.000799 30.6 0.00862 0.00130 191 6
4 4 0.00102 72.5 0.0803 0.00273 965 13
5 5 0.00106 309. 0.162 0.00525 1508 18
6 6 0.00108 250. 0.155 0.00539 1607 18
7 7 0.00111 1935. 0.230 0.00931 2088 21
8 8 0.00106 164. 0.136 0.00503 1483 18
9 9 0.00104 69.5 0.119 0.00251 1216 13
10 10 0.00106 716. 0.0875 0.00343 1432 17
# ℹ 435 more rows
# ℹ 1 more variable: Cross_clique_connectivity <int>
Calculating graph-level measures and centrality measures is usually one step in an analysis that aims to answer a research question. For instance, density can tell how distributed the interactions between students are and, therefore, how collaborative the discussion is [6]. Centrality may be calculated to identify the most important students in the discussions or to infer their roles, e.g., the leaders who drive the discussion [21, 23]. Several studies have calculated centrality measures to investigate their relationship with performance [18]. All of these analyses can be performed with the functions demonstrated above. Beyond these, there is virtually no limit to what SNA can offer, and researchers can build on this tutorial to explore a wide range of further possibilities.
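As a small illustration of turning centralities into an answer, the sketch below ranks vertices by betweenness to flag potential brokers and computes the graph's density. It uses Zachary's karate club network, which ships with igraph, purely as a stand-in for a discussion network. The choice of betweenness as the ranking criterion and the cut-off of five vertices are our assumptions for illustration.

```r
library(igraph)

# Zachary's karate club (bundled with igraph), used as a stand-in network
karate <- make_graph("Zachary")

# Tabulate a few centralities, one row per vertex
cent <- data.frame(
  vertex      = seq_len(vcount(karate)),
  degree      = degree(karate),
  betweenness = betweenness(karate),
  coreness    = coreness(karate)
)

# Sort by betweenness and keep the five highest-scoring vertices
top <- head(cent[order(cent$betweenness, decreasing = TRUE), ], 5)
print(top)

# Graph-level summary: proportion of all possible ties that are present
edge_density(karate)
```

In this network, vertices 1 and 34 (the club's two leaders) come out on top, which shows how such a ranking can point to the actors who hold the network together.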
4 Discussion
The present chapter offered a primer on social network analysis as well as a tutorial on the most common types of SNA. SNA is a vast field with diverse applications that go far beyond the scope of a chapter or even a whole book. Readers who are interested in expanding their knowledge about SNA are advised to read the literature cited in this chapter. Furthermore, several systematic reviews have synthesized the extant literature and can help readers gauge the state of SNA research in education. Two systematic reviews [27, 53], despite being relatively old, give a useful overview of the uses and applications of SNA in learning settings. The methods used by SNA researchers have been addressed in a dedicated systematic review [8], in which the authors offered a detailed review of the methodological approaches used in SNA research. Centrality measures were the topic of a recent systematic review and meta-analysis that synthesized the literature and offered evidence of the association of centrality measures with academic achievement [49].
A more recent scientometric study [1] offers a comprehensive review of all research on network analysis and network science across the past five decades. The study also reviews authors, countries, research themes and research foundations. Although not a traditional systematic review, the recent paper by [9] reviews the seminal papers of SNA from a methodological perspective. The paper also recommends a reporting scheme for research using SNA. It is also important to mention that our chapter covered only static networks. For readers interested in the more advanced time-varying networks, the temporal network chapter offers a great starting point [54]. Several guides and empirical papers also demonstrate examples of temporal network analysis [20, 55, 56]. For readers who want to go deeper into the analysis of learning communities, the community detection chapter is a good place to start [57]. Finally, readers interested in the novel methods of psychological networks are encouraged to read the psychological network chapter [58].
5 More reading resources
Books related to SNA that readers can consult include:
Kolaczyk, E. D., & Csárdi, G. (2014). Statistical analysis of network data with R (Vol. 65). New York: Springer.
Luke, D. A. (2015). A user’s guide to network analysis in R (Vol. 72). New York: Springer.
Newman, M. (2018). Networks. Oxford: Oxford University Press.
Hanneman, R. A., & Riddle, M. (2005). Introduction to social network methods.
Carolan, B. V. (2013). Social network analysis and education: Theory, methods & applications. Sage Publications.
Barabási, A.-L. (2016). Network science. Cambridge: Cambridge University Press.
6 Acknowledgements
This paper is co-funded by the Academy of Finland through the project Towards precision education: Idiographic learning analytics (TOPEILA), Decision Number 350560.