17  Temporal network analysis: Introduction, methods and analysis with R

Author

Mohammed Saqr

Abstract
Learning involves relations, interactions and connections between learners, teachers and the world at large. Such interactions are essentially temporal and unfold in time. Yet, researchers have rarely combined the two aspects (the temporal and relational aspects) in an analytics framework. Temporal networks allow modeling of the temporal learning processes i.e., the emergence and flow of activities, communities, and social processes through fine-grained dynamic analysis. This can provide insights into phenomena like knowledge co-construction, information flow, and relationship building. This chapter introduces the basic concepts of temporal networks, their types and techniques. A detailed guide of temporal network analysis is introduced in this chapter, that starts with building the network, visualization, mathematical analysis on the node and graph level. The analysis is performed with a real-world dataset. The discussion chapter offers some extra resources for interested users who want to expand their knowledge of the technique.

1 Introduction

Learning is social and therefore, involves relations, interactions and connections between learners, teachers and the world at large. Such interactions are essentially temporal and unfold in time [1]; that is, facilitated, curtailed or influenced at different temporal scales [2, 3]. Therefore, time has become a quintessential aspect in several learning theories, frameworks and methodological approaches to learning [35]. Modeling learning as a temporal and relational process is, nevertheless, both natural, timely and more tethered to reality [4, 6]. Traditionally, relations have been modeled with Social Network Analysis (SNA) and temporal events have been modeled with sequence analysis or process mining [3, 7].Yet, researchers have rarely combined the two aspects (the temporal and relational aspects) in an analytics framework [1]. Considering how important the timing and order of the learning process are, it is all-important that our analysis lens is not time-blind [8, 9]. Using time-blind methods flattens an essentially temporal process where the important details of progression are lost or distorted [10, 11]. In doing so, we miss the rhythm, the evolution and devolution of the process, we overlook the regularity and we may fail to capture the events that matter [911].

Temporal networks

Recent advances in network analysis have resulted in the emergence of the new field of temporal network analysis which combines both the relational and temporal dimensions into a single analytical framework: temporal networks, also referred to as time-varying networks, dynamic networks or evolving networks [10]. Today, temporal networks are increasingly adopted in several fields to model dynamic phenomena, e.g., information exchange, the spread of infections, or the reach of viral videos on social media [12]. Whereas temporal networks are concerned with the modeling of relationships similar to traditional social networks (i.e., static or aggregate networks), they are conceptually fundamentally different [10, 11, 13]. Additionally, temporal networks are not a simple extension of social networks, nor are they time-augmented social networks or time-weighted networks. In that, temporal networks are based on different representations of data, have a different mathematical underpinning, and use distinct visualization methods. In temporal networks, edges emerge (get activated or born) and dissolve (get deactivated or die) compared to always present edges in static social networks. Also, in temporal networks, an edge represents temporary interaction, contact, co-presence, or concurrency between two nodes interacting at a specific time. The fact that static networks represent nodes as being connected together all the time exaggerates connectivity [14, 15]. For instance, in Figure 17.1, we have five network visualizations, each network belonging to a weekday. We see that Monday, Tuesday, and Wednesday networks are relatively connected, whereas Thursday and Friday networks are disconnected. The corresponding aggregated or static network on the right is densely connected. The example in Figure 17.1 shows how a static network both conflates connectivity and obfuscates dynamics, you can read more about this example in [15]. Similarly, network measures calculated in static networks are inflated and biased -skewed towards higher values - because they ignore the temporal direction of edges allowing the edges to run back in time. Another characteristic of temporal networks is that edges have a starting time point and ending time point, the end of each edge is understandably later than the start, i.e., follows the forward-moving direction of time. Therefore, the paths in the temporal network are unidirectional or time-restricted [10, 11]. The next section discusses the temporal networks in detail.

Figure 1. Interactions are aggregated per day showing that some days show an active group e.g., Monday, and other days have an inactive group e.g., Friday. Meanwhile, the static network on the right appears dense [15]

2 The building blocks of a temporal network

2.1 Edges

In temporal networks, edges are commonly referred to as events, links, or dynamic edges. Two types of temporal networks are commonly described based on their edge type [12].

  • Contact temporal networks: In contact temporal networks, edge duration is very brief, undefined, or negligible. For example, instant messages have no obvious duration but have a clear source (sender), target (receiver), and timestamp. Figure 17.2 shows a contact temporal network where the edges are represented as sequences of contacts between nodes with no duration.

Figure 2. Example of a contact temporal network. Edges form momentarily and have no obvious duration.

  • Interval temporal networks: In interval temporal networks, each interaction has a duration. An example of such a network would be a conversation where each of the conversants talks for a certain length of time. In the interval temporal network, the duration of interactions matters and the modeling thereof helps understand the process. In Figure 17.3, we see an interval temporal network where each edge has a clear start and clear end. For example, an edge forms between node A and node B at time point 1 and dissolves at time point 3, i.e., lasts for two time points.

Figure 3. An example of an interval network. On the top, we see an example of an edge. In the bottom arrows pointing to two edges B-D and A-D, we can see that the two edges do not overlap and therefore, although they share a connection (A), we can’t assume that B is connected to A through D.

2.2 Paths, concurrency, and reachability

Paths represent the pathways that connect edges, the identification of which can help solve essential problems like the shortest path between two places in a route planning application, e.g., Google maps. In a dynamic process, the paths represent a time-respecting sequence of edges i.e., where the timing of each edge follows one another according to time passage, that is, the timestamps are incrementally increasing [10, 11]. For instance, let’s assume we have a group of students interacting about a problem, starting by defining the problem, argumenting, debating, and finding a solution. The temporal path that would represent the sequence of interactions among students in this process will be a defining->argumenting->debating->solving. We expect that the timestamp of defining precedes argumenting and argumenting precedes debating and so on. In that way, the path is unidirectional, follows a time-ordered sequence, and requires that each node is temporally connected, i.e., the two nodes coexist or interact with each other at the same time [11]. Such temporal co-presence is known as concurrent. Concurrency defines the duration of the nodes where they were co-present together and therefore can be a measure of the magnitude of contact between the two nodes. This is particularly important when we are modeling processes where the path length matters e.g., social influence. A student is more likely to be influenced by an idea when the student discusses the idea with another for a longer period of time. Similarly, self-regulation could be more meaningful when phases are more concurrent rather than disconnected [3]. Reachability is the proportion of nodes that can be reached from a node using time-respecting paths. A node is more influential or central, if it can reach a larger number of nodes [12].

2.3 Nodes

Nodes in temporal networks are similar to static networks at large. Such nodes can be humans, objects, semantics, historical events or chemical reactions to mention a few. Perhaps, the possible difference —if it at all exists— is that temporal network tend to be studied in fields where temporal order is consequential e.g., epidemics, linguistics and spread of ideas.

3 Previous work and examples of temporal network analysis

Few studies have addressed temporal network analysis. Yet, some examples exist that may shed light on the novel framework and how it can be harnessed in education. In a study by Saqr and Nouri [15], the authors investigated how students interact in a problem-based learning environment using temporal networks. The study estimated temporal centrality measures, used temporal network visualization, and examined the predictive power of temporal centrality measures. The study reported rhythmic changes in centrality measures, network properties as well as the way students mix online. The study also found that temporal centrality measures were predictive of students” performance from as early as the second day of the course. Models that included temporal centrality measures have performed consistently better and from as early as the first week of the course. Another study by [9] analyzed students’ interactions in an online collaborative environment where students interacted in Facebook groups. The authors compared centrality measures from traditional social networks to temporal centrality measures and found that temporal centralities are more predictive of performance. Another study from the same group has used chat messages to study how students interact online and how temporal networks can shed light on different dynamics of students interacting using Discord instant messaging platform compared to students interacting using the forums in Moodle. Temporal networks were more informative in capturing the differences in dynamics and how such dynamics affected students” way of communicating [16].

4 Tutorial: Building a temporal network

Temporal network is a relatively new field with an emerging repertoire of methods that are continuously expanding. As we currently stand, a coherent tutorial that combines all possible steps of the analysis does not exist, and that is what this chapter aims to fill. The tutorial will introduce the R packages, visualization and mathematical analysis e.g., graph and node level centrality measures.
The first step is to load the needed packages. Unlike the SNA chapter [17] where we relied on the igraph framework, we will rely on the statnet framework that has a rich repertoire of temporal network packages. We will use three main packages, namely tsna (Temporal Social Network Analysis) which provides most functions for dealing with temporal networks as an extension for the popular sna package. The package networkDynamic offers several complementary functions for the network manipulation, whereas the package ndtv (Network Dynamic Temporal Visualization) offers several functions for visualizing temporal networks. To learn more about these packages, please visit their help pages. The next code chunk loads these packages as well as tidyverse packages to process the network dataframe [20]. We also need tidyverse for manipulating the file and preparing the data.

library(tsna)
library(ndtv)
library(networkDynamic)
library(tidyverse)
library(rio)

To create a temporal network, we need a timestamped file with interactions. The essential fields are the source, target and time, and perhaps also some information about the interactions or the nodes (but these constitute extra information that is good to have). A temporal network is created by combining a base static network (that has the network base information) and a dynamic network with time information. As such, we need to prepare the Massive Open Online Course (MOOC) dataset described in detail here [21] and prepare it for creating a static network that will serve as a base network.

The next code chunk loads the dataset files (edges and nodes data) from the MOOCs dataset. Some cleaning of the data is necessary.

net_edges <- import("https://raw.githubusercontent.com/lamethods/data/main/6_snaMOOC/DLT1%20Edgelist.csv")
net_nodes <- import("https://raw.githubusercontent.com/lamethods/data/main/6_snaMOOC/DLT1%20Nodes.csv")

First, we have to clean the column names from extra spaces using the function clean_names from the janitor package. Next, we have to remove loops, or instances where the source and target of the interaction are the same since it makes little sense that a person responds to oneself in a temporal network (this is not essential). Third, we need to create a dataframe where we replace duplicate edges with a weight equal to the frequency of repeated interactions, we will need this file for the creation of the base network (see later). Fourth, we recode the expertise level in the nodes file to meaningful codes (from its original numerical coding as 1,2,3) so that we can use them later in the analysis. The fifth step is to convert the timestamp to sequential days starting from the first day of the course; this makes sense for easy interpretation. Also, time works better in networkDynamic when it is numeric. The final step is to remove discussions where there are no replies. This cleaning is necessary since we have a dataset that was not essentially prepared for temporal networks.

net_edges <- net_edges |> janitor::clean_names() #1 cleaning column names
net_edges_NL <- net_edges |> filter(sender != receiver) #2 removing loops

## Removing duplicates and replacing them with weight
net_edges_NLW <- net_edges_NL |> group_by(sender, receiver) |> tally(name = "weight") #3

## Recoding expertise
net_nodes <- net_nodes |> 
  mutate(expert_level = case_match(experience, #4
        1 ~"Expert",
        2 ~ "Student",
        3 ~ "Teacher"))

## A function to create serial days
dayizer = function(my_date) {
  numeric_date = lubridate::parse_date_time(my_date, "mdy HM")
  Min_time = min(numeric_date)
  my_date = (numeric_date - Min_time) / (24*60*60)
  my_date = round(my_date,2)
  return(as.numeric(my_date))
}

net_edges_NL$new_date = dayizer(net_edges_NL$timestamp) #5

## Remove dicussions with no interactions
net_edges_NL <- net_edges_NL |> group_by (discussion_title) |> filter(n() > 1)

As mentioned before, the first step in creating a temporal network is creating a static base network (base network) which carries all the information about the network, e.g., the nodes, edges as well as their attributes. The base network is typically a static weighted network. Here we define the base network file (the weighted edge file we created before), we use directed = TRUE to create our network as directed and we tell the network function that the vertices attributes are in the net_nodes file.

NetworkD <- network(net_edges_NLW, directed = TRUE, matrix.type = "edgelist", 
                    loops = FALSE, multiple = FALSE, vertices = net_nodes)

For creating a temporal network, we need more than the source and the target commonly needed for the static network. In particular, the following variables are required to be defined.

  • tail: the source of the interaction

  • head: the target of the interaction

  • onset: The starting time of the interaction

  • terminus: the end time of the interaction

  • duration: the duration of the interaction

Our dataset —which comes from forum MOOC interactions, see net_edges below— has an obvious starting time (which is the timestamp of each interaction) but has no clear end time. There is no straightforward answer to this question. Nonetheless, a possible way to consider the duration of every post is the duration the post was active in the discussion or continued to be discussed. That is the time from the post in a discussion thread to the last post in the same threads of replies. Such a method —while far from perfect— offers a rough method for estimating the time during which this interaction has been “active” in the discussion [15, 22]. For an illustration, see Figure 17.4 which shows the duration for the first and second posts.

net_edges

Figure 4. A sample discussion demonstrating a way to compute the duration of the post. The first post lasts active in the discussion from starting time to the last reply it stimulated in the same thread (D1)

The next code chunk creates a variable for the starting time of each interaction, computes the ending time where this post was part of an active discussion, and then computes the duration.

# Create the required variables (start, end, and duration) defined by

net_edges_NL <- net_edges_NL |> group_by(discussion_title) |>
mutate(start = min(new_date), end = max(new_date), duration = end - start)

In the same way, the second duration is computed in the same way (D2). The next step of the analysis creates a dataframe where all the needed information for the network is specified and in the next step we simply use the networkDynamic with two arguments, the base network and the dataframe with all the temporal network information created in the previous step. Nonetheless, dealing with around 450 nodes in a network is hard, it becomes impossible to visualize or get insights from a large number of crowded nodes. So, for the sake of simplicity of demonstration in this tutorial, we will create a smaller subset of the network of people who had a reasonable number of interactions (degree more than 20) using get.inducedSubgraph argument. The resulting network is then called Active_Network which we will analyze.

## Creating a dataframe with needed variables

edge_spells <- data.frame("onset" = net_edges_NL$start ,
                         "terminus" = net_edges_NL$end, 
                         "tail" = net_edges_NL$sender, 
                         "head" = net_edges_NL$receiver, 
                         "onset.censored" = FALSE,
                         "terminus.censored" = FALSE, 
                         "duration" = net_edges_NL$duration)

## Creating the dynamic network network
Dynamic_network <- networkDynamic(NetworkD, edge.spells = edge_spells)
Edge activity in base.net was ignored
Created net.obs.period to describe network
 Network observation period info:
  Number of observation spells: 1 
  Maximal time range observed: 0 until 72.01 
  Temporal mode: continuous 
  Time unit: unknown 
  Suggested time increment: NA 
Active_Network <- get.inducedSubgraph(Dynamic_network,
                                v = which(degree(Dynamic_network) > 20))

We can then confirm that the network has been created correctly using the print function. As the output shows, we have 521 distinct time changes, 72 days, 445 vertices and 1936 edges. We can also use the function plot to see how the network looks. The argument pad helps us remove the additional whitespace around the network. Plotting a temporal network helps summarize all the interactions in the network. As we can see in Figure 17.5, the network is dense with several edges between interacting students.

print(Dynamic_network)
NetworkDynamic properties:
  distinct change times: 495 
  maximal time range: 0 until  72.01 

Includes optional net.obs.period attribute:
 Network observation period info:
  Number of observation spells: 1 
  Maximal time range observed: 0 until 72.01 
  Temporal mode: continuous 
  Time unit: unknown 
  Suggested time increment: NA 

 Network attributes:
  vertices = 445 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  net.obs.period: (not shown)
  total edges= 1936 
    missing edges= 0 
    non-missing edges= 1936 

 Vertex attribute names: 
    connect country experience experience2 expert expert_level Facilitator gender grades group location region role1 vertex.names 

 Edge attribute names not shown 
plot.network(Active_Network, pad = -0.5)

Figure 5. A plot of the full network as plotted by the plot function.

4.1 Visualization of temporal networks

To take advantage of the temporal network, we can use a function to extract the network at certain times to explore the activity. In the next example in Figure 17.6, we chose the first four weeks one by one and plotted them alongside each other. The function filmstrip can create a similar output with a snapshot of the network at several time intervals.

plot.network(network.extract(Active_Network, onset = 1, terminus = 7))
plot.network(network.extract(Active_Network, onset = 8, terminus = 14))
plot.network(network.extract(Active_Network, onset = 15, terminus = 21))
plot.network(network.extract(Active_Network, onset = 22, terminus = 28))

Figure 6. A plot of the network at weeks 1, 2, 3, and 4

A similar result, yet with three-dimensional placing, can also be obtained with the timePrism function, as shown in Figure 17.7. You may need to consult the package manual to get more information about the arguments and options for the plots.

compute.animation(Active_Network)
slice parameters:
  start:0
  end:72.01
  interval:1
  aggregate.dur:1
  rule:latest
timePrism(Active_Network, at = c(1, 7, 14, 21),
  spline.lwd = 1,
  box = TRUE,
  angle = 60,
  axis = TRUE,
  planes = TRUE,
  plane.col = "#FFFFFF99",
  scale.y = 1,
  orientation = c("z", "x", "y"))

Figure 7. A three-dimensional visualization of the network as a sequence of snapshots in space and time prism

However, a better way is to take advantage of the capabilities of the package ndtv and the network temporal information by rendering a full animated movie of the network and explore each and every event as it happens.

render.d3movie(Active_Network)