library(tsna)
library(ndtv)
library(networkDynamic)
library(tidyverse)
library(rio)
17 Temporal network analysis: Introduction, methods and analysis with R
1 Introduction
Learning is social and therefore involves relations, interactions and connections between learners, teachers and the world at large. Such interactions are essentially temporal and unfold in time [1]; that is, they are facilitated, curtailed or influenced at different temporal scales [2, 3]. Therefore, time has become a quintessential aspect in several learning theories, frameworks and methodological approaches to learning [3–5]. Modeling learning as a temporal and relational process is thus natural, timely and more tethered to reality [4, 6]. Traditionally, relations have been modeled with Social Network Analysis (SNA) and temporal events have been modeled with sequence analysis or process mining [3, 7]. Yet, researchers have rarely combined the two aspects (the temporal and the relational) in a single analytics framework [1]. Considering how important the timing and order of the learning process are, it is essential that our analysis lens is not time-blind [8, 9]. Time-blind methods flatten an essentially temporal process, so that important details of progression are lost or distorted [10, 11]. In doing so, we miss the rhythm, the evolution and devolution of the process, we overlook the regularity and we may fail to capture the events that matter [9–11].
Temporal networks
Recent advances in network analysis have resulted in the emergence of the new field of temporal network analysis, which combines the relational and temporal dimensions in a single analytical framework: temporal networks, also referred to as time-varying networks, dynamic networks or evolving networks [10]. Today, temporal networks are increasingly adopted in several fields to model dynamic phenomena, e.g., information exchange, the spread of infections, or the reach of viral videos on social media [12]. Although temporal networks model relationships much as traditional social networks do (i.e., static or aggregate networks), the two are fundamentally different in concept [10, 11, 13]. Temporal networks are not a simple extension of social networks, nor are they time-augmented or time-weighted social networks: they are based on different representations of data, have a different mathematical underpinning, and use distinct visualization methods. In temporal networks, edges emerge (get activated or born) and dissolve (get deactivated or die), whereas edges in static social networks are always present. Also, in temporal networks, an edge represents a temporary interaction, contact, co-presence, or concurrency between two nodes at a specific time. Because static networks represent nodes as being connected all the time, they exaggerate connectivity [14, 15]. For instance, in Figure 17.1, we have five network visualizations, one for each weekday. We see that the Monday, Tuesday, and Wednesday networks are relatively connected, whereas the Thursday and Friday networks are disconnected. The corresponding aggregated or static network on the right is densely connected. The example in Figure 17.1 shows how a static network both overstates connectivity and obfuscates dynamics; you can read more about this example in [15].
Similarly, network measures calculated in static networks are inflated and biased (skewed towards higher values) because they ignore the temporal direction of edges, allowing the edges to run back in time. Another characteristic of temporal networks is that edges have a starting time point and an ending time point; the end of each edge is understandably later than the start, i.e., it follows the forward-moving direction of time. Therefore, the paths in a temporal network are unidirectional or time-restricted [10, 11]. The next section discusses the building blocks of temporal networks in detail.
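To make the point about exaggerated connectivity concrete, the following minimal sketch compares the density of daily snapshots with the density of the aggregated network. The contact list is invented for illustration (it is not the data behind Figure 17.1), and density is computed by hand in base R rather than with a network package.

```r
# Toy contact list: who interacted with whom on which weekday (invented data)
contacts <- data.frame(
  from = c("A", "B", "C", "A", "D"),
  to   = c("B", "C", "D", "C", "E"),
  day  = c("Mon", "Mon", "Tue", "Wed", "Fri")
)

n_nodes <- length(unique(c(contacts$from, contacts$to)))
possible_ties <- n_nodes * (n_nodes - 1)  # directed ties, no loops

# Density of each daily snapshot: distinct ties on that day / possible ties
daily_density <- sapply(split(contacts, contacts$day), function(d) {
  nrow(unique(d[, c("from", "to")])) / possible_ties
})

# Density of the aggregated (static) network: all ties pooled over the week
static_density <- nrow(unique(contacts[, c("from", "to")])) / possible_ties

round(daily_density, 2)
static_density
```

In this toy example every daily snapshot is sparser than the aggregated network, because the static view treats ties from different days as if they existed simultaneously.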
2 The building blocks of a temporal network
2.1 Edges
In temporal networks, edges are commonly referred to as events, links, or dynamic edges. Two types of temporal networks are commonly described based on their edge type [12].
- Contact temporal networks: In contact temporal networks, edge duration is very brief, undefined, or negligible. For example, instant messages have no obvious duration but have a clear source (sender), target (receiver), and timestamp. Figure 17.2 shows a contact temporal network where the edges are represented as sequences of contacts between nodes with no duration.
- Interval temporal networks: In interval temporal networks, each interaction has a duration. An example of such a network would be a conversation where each of the conversants talks for a certain length of time. In the interval temporal network, the duration of interactions matters and the modeling thereof helps understand the process. In Figure 17.3, we see an interval temporal network where each edge has a clear start and clear end. For example, an edge forms between node A and node B at time point 1 and dissolves at time point 3, i.e., lasts for two time points.
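The two edge types can be sketched as small data frames. This is only an illustration with invented values; the column names (tail, head, time, onset, terminus, duration) are assumptions chosen for readability, and the first interval edge mirrors the A-B example above (forms at time 1, dissolves at time 3).

```r
# Contact network: instantaneous events, only a timestamp (invented data)
contact_edges <- data.frame(
  tail = c("A", "B", "A"),
  head = c("B", "C", "C"),
  time = c(1, 2, 4)
)

# Interval network: every edge has an onset and a terminus
interval_edges <- data.frame(
  tail     = c("A", "B"),
  head     = c("B", "C"),
  onset    = c(1, 2),
  terminus = c(3, 5)
)
interval_edges$duration <- interval_edges$terminus - interval_edges$onset

# A contact can be written as a zero-duration interval (onset == terminus)
contact_as_interval <- transform(contact_edges, onset = time, terminus = time)

interval_edges
```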
2.2 Paths, concurrency, and reachability
Paths are the pathways that connect nodes through a sequence of edges; identifying them can help solve essential problems such as finding the shortest path between two places in a route planning application, e.g., Google Maps. In a dynamic process, a path is a time-respecting sequence of edges, i.e., the edges follow one another in time, so that their timestamps are increasing [10, 11]. For instance, let’s assume we have a group of students interacting about a problem, starting by defining the problem, then arguing, debating, and finding a solution. The temporal path that would represent the sequence of interactions among students in this process is defining->arguing->debating->solving. We expect that the timestamp of defining precedes arguing, that arguing precedes debating, and so on. In that way, the path is unidirectional, follows a time-ordered sequence, and requires that the nodes are temporally connected, i.e., two nodes coexist or interact with each other at the same time [11]. Such temporal co-presence is known as concurrency. Concurrency captures the duration for which the nodes were co-present and can therefore be a measure of the magnitude of contact between the two nodes. This is particularly important when we are modeling processes where the path length matters, e.g., social influence. A student is more likely to be influenced by an idea when the student discusses the idea with another for a longer period of time. Similarly, self-regulation could be more meaningful when phases are more concurrent rather than disconnected [3]. Reachability is the proportion of nodes that can be reached from a node using time-respecting paths. A node is more influential or central if it can reach a larger number of nodes [12].
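The idea of time-respecting paths and reachability can be sketched in a few lines of base R. This is a simplified illustration on an invented contact list, assuming instantaneous contacts and strictly increasing timestamps along a path; the function names earliest_arrival and reachability are hypothetical helpers, not part of any package (the tsna package provides dedicated functions such as tPath and tReach for real analyses).

```r
# Invented contacts: A->B happens at time 3, but B->C already happened at time 1
contacts <- data.frame(
  tail = c("A", "B", "C"),
  head = c("B", "C", "D"),
  time = c(3, 1, 4)
)
nodes <- c("A", "B", "C", "D")

# Earliest time at which each node can be reached from `source`
# by following contacts with strictly increasing timestamps
earliest_arrival <- function(contacts, source, nodes) {
  arrival <- setNames(rep(Inf, length(nodes)), nodes)
  arrival[source] <- -Inf                 # the source is available from the start
  contacts <- contacts[order(contacts$time), ]
  for (i in seq_len(nrow(contacts))) {
    from <- as.character(contacts$tail[i])
    to   <- as.character(contacts$head[i])
    t    <- contacts$time[i]
    # A contact is usable only if `from` was reached strictly before time t
    if (arrival[from] < t && t < arrival[to]) {
      arrival[to] <- t
    }
  }
  arrival
}

# Proportion of the other nodes reachable through time-respecting paths
reachability <- function(contacts, source, nodes) {
  arrival <- earliest_arrival(contacts, source, nodes)
  others <- setdiff(nodes, source)
  mean(is.finite(arrival[others]))
}

sapply(nodes, function(v) reachability(contacts, v, nodes))
```

In the aggregated network, A could reach B, C and D through A->B->C->D. Under the time-respecting rule, A reaches only B, because the contact B->C occurred before A ever contacted B.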
2.3 Nodes
Nodes in temporal networks are largely similar to those in static networks. Such nodes can be humans, objects, semantics, historical events or chemical reactions, to mention a few. Perhaps the only possible difference, if one exists at all, is that temporal networks tend to be studied in fields where temporal order is consequential, e.g., epidemics, linguistics and the spread of ideas.
3 Previous work and examples of temporal network analysis
Few studies have addressed temporal network analysis. Yet, some examples exist that may shed light on the novel framework and how it can be harnessed in education. In a study by Saqr and Nouri [15], the authors investigated how students interact in a problem-based learning environment using temporal networks. The study estimated temporal centrality measures, used temporal network visualization, and examined the predictive power of temporal centrality measures. The study reported rhythmic changes in centrality measures and network properties, as well as in the way students mix online. The study also found that temporal centrality measures were predictive of students’ performance from as early as the second day of the course. Models that included temporal centrality measures performed consistently better, from as early as the first week of the course. Another study [9] analyzed students’ interactions in an online collaborative environment where students interacted in Facebook groups. The authors compared centrality measures from traditional social networks to temporal centrality measures and found that temporal centralities were more predictive of performance. Another study from the same group used chat messages to study how students interact online and how temporal networks can shed light on the different dynamics of students interacting on the Discord instant messaging platform compared to students interacting in the Moodle forums. Temporal networks were more informative in capturing the differences in dynamics and how such dynamics affected students’ way of communicating [16].
4 Tutorial: Building a temporal network
Temporal network analysis is a relatively new field with an emerging repertoire of methods that is continuously expanding. At present, a coherent tutorial that combines all possible steps of the analysis does not exist, a gap that this chapter aims to fill. The tutorial will introduce the R packages, the visualization and the mathematical analysis, e.g., graph-level and node-level centrality measures.
The first step is to load the needed packages. Unlike the SNA chapter [17], where we relied on the igraph framework, we will rely on the statnet framework, which has a rich repertoire of temporal network packages. We will use three main packages. The package tsna (Temporal Social Network Analysis) provides most functions for dealing with temporal networks as an extension of the popular sna package. The package networkDynamic offers several complementary functions for network manipulation, whereas the package ndtv (Network Dynamic Temporal Visualization) offers several functions for visualizing temporal networks. To learn more about these packages, please visit their help pages. We load these packages together with the tidyverse packages [20], which we need for manipulating the file and preparing the data.
To create a temporal network, we need a timestamped file with interactions. The essential fields are the source, target and time, and perhaps also some information about the interactions or the nodes (extra information that is good to have but not required). A temporal network is created by combining a base static network (which holds the base information of the network) and a dynamic network with time information. As such, we need to prepare the Massive Open Online Course (MOOC) dataset, described in detail in [21], for creating a static network that will serve as the base network.
The next code chunk loads the dataset files (edges and nodes data) from the MOOCs dataset. Some cleaning of the data is necessary.
net_edges <- import("https://raw.githubusercontent.com/lamethods/data/main/6_snaMOOC/DLT1%20Edgelist.csv")
net_nodes <- import("https://raw.githubusercontent.com/lamethods/data/main/6_snaMOOC/DLT1%20Nodes.csv")
First, we have to clean the column names (e.g., remove extra spaces) using the function clean_names from the janitor package. Next, we remove loops, i.e., instances where the source and target of the interaction are the same, since it makes little sense that a person responds to oneself in a temporal network (this step is not essential). Third, we create a dataframe where duplicate edges are replaced with a weight equal to the frequency of repeated interactions; we will need this dataframe for the creation of the base network (see later). Fourth, we recode the expertise level in the nodes file to meaningful labels (from its original numerical coding as 1, 2, 3) so that we can use them later in the analysis. The fifth step is to convert the timestamp to sequential days starting from the first day of the course, which eases interpretation; time also works better in networkDynamic when it is numeric. The final step is to remove discussions with no replies. This cleaning is necessary since the dataset was not originally prepared for temporal network analysis.
net_edges <- net_edges |> janitor::clean_names() #1 cleaning column names
net_edges_NL <- net_edges |> filter(sender != receiver) #2 removing loops

## Removing duplicates and replacing them with weight
net_edges_NLW <- net_edges_NL |> group_by(sender, receiver) |> tally(name = "weight") #3

## Recoding expertise
net_nodes <- net_nodes |>
  mutate(expert_level = case_match(experience, #4
                                   1 ~ "Expert",
                                   2 ~ "Student",
                                   3 ~ "Teacher"))

## A function to create serial days
dayizer = function(my_date) {
  numeric_date = lubridate::parse_date_time(my_date, "mdy HM")
  Min_time = min(numeric_date)
  my_date = (numeric_date - Min_time) / (24*60*60)
  my_date = round(my_date, 2)
  return(as.numeric(my_date))
}

net_edges_NL$new_date = dayizer(net_edges_NL$timestamp) #5

## Remove discussions with no interactions
net_edges_NL <- net_edges_NL |> group_by(discussion_title) |> filter(n() > 1)
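To see what steps 3 and 5 do without downloading the MOOC data, here is a minimal base R sketch on an invented edge list. The timestamps and their month/day/year hour:minute format are assumptions made for illustration, and aggregate and difftime are used in place of the tidyverse and lubridate calls above.

```r
# Invented interactions with "month/day/year hour:minute" timestamps
toy <- data.frame(
  sender    = c(1, 2, 1, 3),
  receiver  = c(2, 1, 2, 1),
  timestamp = c("4/10/2013 9:00", "4/10/2013 21:00",
                "4/12/2013 9:00", "4/13/2013 15:00")
)

# Step 3: collapse repeated sender-receiver pairs into a weight
toy_weights <- aggregate(weight ~ sender + receiver,
                         data = transform(toy, weight = 1), FUN = sum)

# Step 5: convert timestamps to days elapsed since the first interaction
parsed <- as.POSIXct(toy$timestamp, format = "%m/%d/%Y %H:%M", tz = "UTC")
toy$new_date <- round(as.numeric(difftime(parsed, min(parsed), units = "days")), 2)

toy_weights
toy$new_date
```

Requesting the difference explicitly in days avoids relying on the unit that R would otherwise pick automatically when subtracting two date-times.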
As mentioned before, the first step in creating a temporal network is creating a static base network, which carries all the information about the network, e.g., the nodes and edges as well as their attributes. The base network is typically a static weighted network. Here we pass the weighted edge file we created before as the base network data, use directed = TRUE to create the network as directed, and tell the network function that the vertex attributes are in the net_nodes dataframe.
NetworkD <- network(net_edges_NLW, directed = TRUE, matrix.type = "edgelist",
                    loops = FALSE, multiple = FALSE, vertices = net_nodes)
For creating a temporal network, we need more than the source and the target commonly needed for the static network. In particular, the following variables are required to be defined.
- tail: the source of the interaction
- head: the target of the interaction
- onset: the starting time of the interaction
- terminus: the end time of the interaction
- duration: the duration of the interaction
Our dataset, which comes from MOOC forum interactions (see net_edges below), has an obvious starting time (the timestamp of each interaction) but no clear end time, and there is no straightforward way to define one. Nonetheless, a possible way to define the duration of every post is the time the post was active in the discussion or continued to be discussed, that is, the time from the post in a discussion thread to the last post in the same thread of replies. Such a method, while far from perfect, offers a rough estimate of the time during which this interaction has been “active” in the discussion [15, 22]. For an illustration, see Figure 17.4 which shows the duration for the first and second posts.
net_edges
The next code chunk creates a variable for the starting time of each interaction, computes the ending time where this post was part of an active discussion, and then computes the duration.
# Create the required variables (start, end, and duration) for each discussion thread
net_edges_NL <- net_edges_NL |> group_by(discussion_title) |>
  mutate(start = min(new_date), end = max(new_date), duration = end - start)
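Note that the chunk above assigns every post in a thread the same start, namely the time of the first post in that thread. If one prefers each post to start at its own timestamp and end at the last post of its thread, a variant could look like the sketch below. This is an assumption about an alternative reading, not the authors' code, and it uses invented data and base R only.

```r
# Invented posts in two discussion threads, with time in serial days
toy_posts <- data.frame(
  discussion_title = c("T1", "T1", "T1", "T2", "T2"),
  new_date         = c(0, 1.5, 4, 2, 3)
)

# Onset: the post's own time; terminus: the last post in the same thread
toy_posts$start    <- toy_posts$new_date
toy_posts$end      <- ave(toy_posts$new_date, toy_posts$discussion_title, FUN = max)
toy_posts$duration <- toy_posts$end - toy_posts$start

toy_posts
```

Under this variant the last post of each thread has a duration of zero, so whether it suits a given analysis depends on how zero-length spells should be treated.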
The second duration (D2) is computed in the same way. Next, we create a dataframe where all the needed information for the network is specified, and then we simply call the networkDynamic function with two arguments: the base network and the dataframe with the temporal network information. Nonetheless, dealing with around 450 nodes in a network is hard; it becomes impossible to visualize or get insights from a large number of crowded nodes. So, for the sake of simplicity of demonstration in this tutorial, we will create a smaller subset of the network containing people who had a reasonable number of interactions (degree more than 20) using the get.inducedSubgraph function. The resulting network is called Active_Network, which we will analyze.
## Creating a dataframe with needed variables
edge_spells <- data.frame("onset" = net_edges_NL$start,
                          "terminus" = net_edges_NL$end,
                          "tail" = net_edges_NL$sender,
                          "head" = net_edges_NL$receiver,
                          "onset.censored" = FALSE,
                          "terminus.censored" = FALSE,
                          "duration" = net_edges_NL$duration)

## Creating the dynamic network
Dynamic_network <- networkDynamic(NetworkD, edge.spells = edge_spells)
Edge activity in base.net was ignored
Created net.obs.period to describe network
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 0 until 72.01
Temporal mode: continuous
Time unit: unknown
Suggested time increment: NA
Active_Network <- get.inducedSubgraph(Dynamic_network,
                                      v = which(degree(Dynamic_network) > 20))
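To see how edge spells translate into a dynamic network on a small scale, the following sketch builds a toy network from four invented spells, without a base network, and then extracts two time windows. The object names (toy_spells, toy_dyn, early, late) are hypothetical and the values are made up for illustration.

```r
library(networkDynamic)

# Four invented edge spells among three vertices (numeric vertex ids);
# the first four columns must be onset, terminus, tail, head
toy_spells <- data.frame(
  onset    = c(0, 2, 5, 6),
  terminus = c(3, 4, 8, 9),
  tail     = c(1, 2, 1, 3),
  head     = c(2, 3, 3, 1)
)

toy_dyn <- networkDynamic(edge.spells = toy_spells, verbose = FALSE)

# Spells can be read back as a data frame to check onsets and termini
as.data.frame(toy_dyn)

# Edges active at some point during the windows 0 to 3 and 5 to 9
early <- network.extract(toy_dyn, onset = 0, terminus = 3)
late  <- network.extract(toy_dyn, onset = 5, terminus = 9)
network.edgecount(early)
network.edgecount(late)
```

Reading the spells back with as.data.frame is a quick way to confirm that the onsets and termini were interpreted as intended before moving on to a larger dataset.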
We can then confirm that the network has been created correctly using the print function. As the output shows, we have 495 distinct change times, a time range of about 72 days, 445 vertices and 1936 edges. We can also use the function plot.network to see how the network looks. The argument pad helps us remove the additional whitespace around the network. Plotting a temporal network helps summarize all the interactions in the network. As we can see in Figure 17.5, the network is dense with several edges between interacting students.
print(Dynamic_network)
NetworkDynamic properties:
distinct change times: 495
maximal time range: 0 until 72.01
Includes optional net.obs.period attribute:
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 0 until 72.01
Temporal mode: continuous
Time unit: unknown
Suggested time increment: NA
Network attributes:
vertices = 445
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
net.obs.period: (not shown)
total edges= 1936
missing edges= 0
non-missing edges= 1936
Vertex attribute names:
connect country experience experience2 expert expert_level Facilitator gender grades group location region role1 vertex.names
Edge attribute names not shown
plot.network(Active_Network, pad = -0.5)
4.1 Visualization of temporal networks
To take advantage of the temporal network, we can use the function network.extract to extract the network at certain times and explore the activity. In the next example in Figure 17.6, we chose the first four weeks one by one and plotted them alongside each other. The function filmstrip can create a similar output with a snapshot of the network at several time intervals.
plot.network(network.extract(Active_Network, onset = 1, terminus = 7))
plot.network(network.extract(Active_Network, onset = 8, terminus = 14))
plot.network(network.extract(Active_Network, onset = 15, terminus = 21))
plot.network(network.extract(Active_Network, onset = 22, terminus = 28))
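The logic behind extracting a time window can be sketched in base R: roughly speaking, an edge is kept if its spell overlaps the requested window. The sketch below uses invented spells and the same four weekly windows as above; it ignores the special handling of zero-length spells and is meant as an approximation of the idea, not a reimplementation of network.extract.

```r
# Invented edge spells (in days) and the four weekly windows used above
spells  <- data.frame(onset = c(0, 5, 9, 20), terminus = c(3, 10, 16, 30))
windows <- data.frame(onset = c(1, 8, 15, 22), terminus = c(7, 14, 21, 28))

# A spell overlaps a window if it starts before the window ends
# and ends after the window starts
active_per_window <- sapply(seq_len(nrow(windows)), function(i) {
  sum(spells$onset < windows$terminus[i] & spells$terminus > windows$onset[i])
})

active_per_window
```

Because time is continuous here, windows such as 1 to 7 and 8 to 14 leave the interval between 7 and 8 uncovered; contiguous windows (e.g., 0 to 7, 7 to 14) may be preferable when no activity should be skipped.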
A similar result, yet with three-dimensional placing, can also be obtained with the timePrism function, as shown in Figure 17.7. You may need to consult the package manual to get more information about the arguments and options for the plots.
compute.animation(Active_Network)
slice parameters:
start:0
end:72.01
interval:1
aggregate.dur:1
rule:latest
timePrism(Active_Network, at = c(1, 7, 14, 21),
spline.lwd = 1,
box = TRUE,
angle = 60,
axis = TRUE,
planes = TRUE,
plane.col = "#FFFFFF99",
scale.y = 1,
orientation = c("z", "x", "y"))
However, a better way is to take advantage of the capabilities of the package ndtv and the temporal information in the network by rendering a full animated movie of the network and exploring each and every event as it happens.
render.d3movie(Active_Network)
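When working in a script or on a server, it can be convenient to write the movie to an HTML file instead of opening it in a browser. The following sketch does so for a small invented network; the toy spells, the slice parameters and the output path are assumptions for illustration and should be adapted to the data at hand.

```r
library(networkDynamic)
library(ndtv)

# A small invented dynamic network (three vertices, three edge spells)
toy_spells <- data.frame(onset = c(0, 2, 5), terminus = c(3, 6, 8),
                         tail = c(1, 2, 1), head = c(2, 3, 3))
toy_dyn <- networkDynamic(edge.spells = toy_spells, verbose = FALSE)

# Pre-compute the layouts for one-unit slices over the observed range
compute.animation(toy_dyn,
                  slice.par = list(start = 0, end = 8, interval = 1,
                                   aggregate.dur = 1, rule = "latest"),
                  verbose = FALSE)

# Write the movie to a temporary HTML file without launching a browser
out_file <- tempfile(fileext = ".html")
render.d3movie(toy_dyn, filename = out_file,
               launchBrowser = FALSE, verbose = FALSE)
file.exists(out_file)
```

The resulting file can be opened in any modern browser, where the playback controls allow stepping through the network slice by slice.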