# install.packages("rENA", repos = c("https://cran.qe-libs.org", "https://cran.rstudio.org"))
18 Epistemic Network Analysis and Ordered Network Analysis in Learning Analytics
1 Introduction
This chapter provides a tutorial on conducting epistemic network analysis (ENA) and ordered network analysis (ONA) using R. We introduce these two techniques together because they share similar theoretical foundations, but each addresses a different challenge for analyzing large-scale qualitative data on learning processes.
ENA and ONA are methods for quantifying, visualizing, and interpreting network data. Taking coded data as input, ENA and ONA represent associations between codes in undirected or directed weighted network models, respectively. Both techniques measure the strength of association among codes and illustrate the structure of connections in network graphs, and they quantify changes in the composition and strength of those connections over time. Importantly, ENA and ONA enable comparison of networks both visually and via summary statistics, so they can be used to explore a wide range of research questions in contexts where patterns of association in coded data are hypothesized to be meaningful and where comparing those patterns across individuals or groups is important.
In the following sections, we will (1) briefly review literature relevant to the application of ENA and ONA, (2) provide a step-by-step guide to implementing ENA and ONA in R, and (3) suggest additional resources and examples for further exploration. By the end of this chapter, readers will be able to apply these techniques in their own research.
2 Literature review
2.1 Epistemic network analysis (ENA)
ENA is a method for identifying and quantifying connections in coded data and representing them in undirected weighted network models (Shaffer et al., 2016). There are two key features that differentiate ENA from other network analysis tools or multivariate analyses: (1) ENA produces summary statistics that can be used to compare differences in the content of networks rather than just their structure; and (2) ENA network visualizations provide information that is mathematically consistent with those summary statistics, which facilitates meaningful interpretation of statistical differences (Bowman et al., 2021). These features enable researchers to analyze a wide range of phenomena in learning analytics, including complex thinking and knowledge construction (Csanadi et al., 2018; Oshima et al., 2018), collaborative problem solving (Bressler et al., 2019; Swiecki et al., 2020), socio-emotional aspects of learning (Prieto et al., 2021), mentoring (Zhang et al., 2022), and teacher professional development (Bauer et al., 2019; Fernandez-Nieto et al., 2021; Phillips et al., 2023).
One key feature that makes ENA effective for modeling collaborative interaction is that it can model individuals’ unique contributions to collaborative discourse while accounting for group context, so both individuals and groups can be analyzed in the same model. This feature is particularly valuable in collaborative learning environments, where the interactions and contributions of each individual are related and should not be treated as a series of isolated events. For example, Swiecki et al. (2020) analyzed the communications of air defense warfare teams in training exercises and found that ENA was not only able to reveal differences in individual performance identified in a qualitative analysis of the collaborative discourse, but also to test those differences statistically.
2.2 Ordered network analysis (ONA)
Ordered Network Analysis (ONA) extends the theoretical and analytical advantages of ENA to account for the order of events by producing directed weighted networks rather than undirected models (Tan et al., 2022). Like ENA, ONA takes coded data as input, identifies and measures connections among coded items, and visualizes the structure of connections in a metric space that enables both statistical and visual comparison of networks. However, ONA models the order in which codes appear in the data, enabling analysis of phenomena in which the order of events is hypothesized to be important.
For example, Tan et al. (2022) used ONA to model the performance of military teams learning to identify, assess, and respond to potential threats detected by radar. The findings demonstrated that ONA could detect qualitative differences between teams in different training conditions that were not detected with unordered models, and that those differences were statistically significant. Tan et al. (2022) also argued that ONA has an advantage over methods such as sequential pattern mining (SPM), which is widely used to identify frequent sequential patterns. In contrast to SPM, which prioritizes the specific micro-sequential order of events, ONA models processes by accounting for the temporal order of interactions: what units of analysis respond to and what they respond with. Consequently, ONA is a more appropriate methodological choice when modeling processes in ill-structured problem-solving scenarios, where collaborative interactions do not follow a prescribed sequence of steps but where the order of activities is still important.
ONA has also been used to analyze log data from online courses. For example, Fan et al. (2021) analyzed self-regulated learning tactics employed by learners in Massive Open Online Courses (MOOC) using ONA and process mining. The authors found that ONA provided more nuanced interpretations of learning tactics compared to process mining because ONA models learning tactics across four dimensions: frequency, continuity, order, and the role of specific learning actions within broader tactics.
Like ENA, ONA produces summary statistics for network comparison and mathematically consistent network visualizations that enable interpretation of statistical measures. Unlike ENA, ONA models the order in which codes appear in data, enabling researchers to investigate whether and to what extent the order of events is meaningful in a given context.
In the following sections, we provide a step-by-step guide to conducting ENA and ONA analyses in R.
3 Epistemic network analysis in R
In this section, we demonstrate how to conduct an ENA analysis using the rENA
package. If you are not familiar with ENA as an analytic technique, we recommend that you first read Shaffer & Ruis (2017) and Bowman et al. (2022) to familiarize yourself with the theoretical and methodological foundations of ENA.
3.1 Install the rENA package and load the library
Before installing the rENA
package, be sure that you are using R version 4.1 or newer. To check your R version, type R.version
in your console. To update your R version (if needed), download and install R from the official R website: https://cran.r-project.org/
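For example, this base R command prints a one-line summary of the installed version:
R.version.string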
First, install the rENA
package and then load the rENA
library after installation is complete.
# install.packages("rENA", repos = c("https://cran.qe-libs.org", "https://cran.rstudio.org"))
library(rENA)
We also install the other package that is required for accessing the view()
function (section 3.7.3) in rENA
.
# install.packages("tma", repos = c("https://cran.qe-libs.org", "https://cran.rstudio.org"))
library(tma)
3.2 Dataset
The dataset we will use as an example, RS.data
, is included in the rENA
package. Note that the RS.data
file in the package is only a subset of the full dataset, and is thus intended for demonstration purposes only.
To start, pass RS.data
from the rENA
package to a data frame named data.
data = rENA::RS.data
You can preview the input data frame to familiarize yourself with the data structure.
data
RS.data
consists of discourse from RescuShell, an online learning simulation where students work as interns at a fictitious company to solve a realistic engineering design problem in a simulated work environment. Throughout the internship, students communicate with their project teams and mentors via online chat, and these chats are recorded in the “text” column. A set of qualitative codes was applied to the data in the “text” column, where a value of 0 indicates the absence of the code and a value of 1 indicates the presence of the code in a given line.
Further details about the RS.data
dataset can be found in Shaffer & Arastoopour (2014). Analyses of data from RescuShell and other engineering virtual internships can be found in Arastoopour et al. (2016) and Chesler et al. (2015).
3.3 Construct an ENA model
To construct an ENA model, use the ena
function, which enables researchers to set the parameters for their model. This function wraps two other functions—ena.accumulate.data
and ena.make.set
—which can be used together to achieve the same result.
In the following sections, we will demonstrate how to set each parameter and explain how different choices affect the resulting ENA model.
3.3.1 Specify units
In ENA, units can be individuals, ideas, organizations, or any other entity whose structure of connections you want to model. To set the units parameter, specify which column(s) in the data contain the variables that identify unique units.
For this example, choose the “Condition” column and the “UserName” column to define the units. The “Condition” column has two unique values: FirstGame
and SecondGame
, representing novice users and relative expert users, respectively, as some students participated in RescuShell after having already completed a different engineering virtual internship. The “UserName” column includes unique user names for all students (n=48). This way of defining the units means that ENA will construct a network for each student in each condition.
= c("Condition", "UserName") unitCols
To verify that the units are correctly specified, subset and preview the unique values in the units columns. There are 48 units from two conditions, which means that the ENA model will produce 48 individual-level networks, one for each unit, and each unit is uniquely associated with either the novice group (FirstGame
) or the relative expert group (SecondGame
).
unique(data[, unitCols])
3.3.2 Specify codes
Next, specify the columns that contain the codes. Codes are concepts whose pattern of association you want to model for each unit. ENA represents codes as nodes in the networks and co-occurrences of codes as edges. Most researchers use binary coding in ENA analyses, where the values in the code columns are either 0 (indicating that the code is not present in that line) or 1 (indicating that the code is present in that line). RS.data
contains six code columns, all of which will be used here.
To specify the code columns, enter the code column names in a vector.
codeCols = c('Data', 'Technical.Constraints', 'Performance.Parameters', 'Client.and.Consultant.Requests', 'Design.Reasoning', 'Collaboration')
To verify that the codes are correctly specified, preview the code columns selected.
data[,codeCols]
3.3.3 Specify conversations
The conversation parameter determines which lines in the data can be connected. Codes in lines that are not in the same conversation cannot be connected. For example, you may want to model connections within different time segments, such as days, or different steps in a process, such as activities.
In our example, choose the “Condition”, “GroupName”, and “ActivityNumber” columns to define the conversations. These choices indicate that connections can only happen between students who were in the same condition (FirstGame or SecondGame
) and on the same project team (group), and within the same activity. This definition of conversation reflects what actually happened in the simulation: in a given condition, students only interacted with those who were in the same group, and each activity occurred on a different day.
To specify the conversation parameter, enter the column names in a vector.
= c("Condition", "GroupName", "ActivityNumber") conversationCols
To verify that the conversations are correctly specified, subset and preview the unique values in the conversation columns.
unique(data[, conversationCols])
3.3.4 Specify the window
Once the conversation parameter is specified, a window method needs to be specified. Whereas the conversation parameter specifies which lines can be related, the window parameter determines which lines within the same conversation are related. The most common window method used in ENA is called a moving stanza window, which is what will be used here.
Briefly, a moving stanza window is a sliding window of fixed length that moves through a conversation to detect and accumulate code co-occurrences in recent temporal context. The lines within a designated stanza window are considered related to each other. For instance, if the moving stanza window is 7, then each line in the conversation is linked to the six preceding lines. See Siebert-Evenstone et al. (2017) and Ruis et al. (2019) for more detailed explanations of windows in ENA models.
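As a toy illustration in plain R (window_lines is a hypothetical helper written only to make the idea concrete; it is not part of rENA), the line indices that fall within a moving stanza window of size 7 for a given line can be computed as follows:
window_lines <- function(i, window.size.back = 7) {
  # the window contains the referent line i and up to (window.size.back - 1) preceding lines
  seq(max(1, i - (window.size.back - 1)), i)
}
window_lines(10) # lines 4 through 10
window_lines(3)  # near the start of a conversation the window is truncated: lines 1 through 3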
Here, set the window.size.back parameter equal to 7. Users can specify a different moving stanza window size by passing a different numerical value to the window.size.back
parameter.
window.size.back = 7
The rENA package also enables use of an infinite stanza window, which assumes that lines in any part of a conversation are related. The infinite stanza window works the same way as a moving stanza window, but there is no limit on the number of previous lines included in the window other than the boundaries of the conversation itself. The infinite stanza window is less commonly used in ENA, but is specified as follows:
= "INF" window.size.back
3.3.5 Specify groups and rotation method
When specifying the units, we chose a column that indicates two conditions: FirstGame
(novice group) and SecondGame
(relative expert group). To enable comparison of students in these two conditions, three additional parameters need to be specified: groupVar
, groups
, and mean
.
= "Condition" # "Condition" is the column used as our grouping variable
groupVar = c("FirstGame", "SecondGame") # "FirstGame" and "SecondGame" are the two unique values of the "Condition" column
groups = TRUE mean
These three parameters indicate that when building the ENA model, the first dimension will maximize the difference between the two conditions: FirstGame
and SecondGame.
This difference maximization is achieved through mean = TRUE
, which specifies that a means rotation will be performed at the dimensional reduction stage. If the means rotation is set to FALSE or there aren’t two distinct groups in your data, ENA will by default use singular value decomposition (SVD) to perform the dimensional reduction. Bowman et al. (2022) provide a mathematical explanation of the methods used in ENA to perform dimensional reductions.
3.3.6 Specify metadata
The last parameter to be specified is metadata. Metadata columns are not required to construct an ENA model, but they provide information that can be used to subset units in the resulting model.
Specify the metadata columns shown below to include data on student outcomes related to reported self-confidence before and after participating in engineering virtual internships. We will use this data to demonstrate a simple linear regression analysis that can be done using ENA outputs as predictors.
= c("CONFIDENCE.Change","CONFIDENCE.Pre","CONFIDENCE.Post","C.Change") # optional metaCols
3.3.7 Construct a model
Now that all the essential parameters have been specified, the ENA model can be constructed.
To build an ENA model, we need two functions: ena.accumulate.data
and ena.make.set
, and we recommend that you store the outputs in objects (in this case, accum.ena and set.ena).
accum.ena =
  ena.accumulate.data(
    text_data = RS.data[, 'text'],
    units = data[, unitCols],
    conversation = data[, conversationCols],
    metadata = data[, metaCols], # optional
    codes = data[, codeCols],
    window.size.back = 7
  )

set.ena =
  ena.make.set(
    enadata = accum.ena, # the accumulation run above
    rotation.by = ena.rotate.by.mean, # equivalent of mean = TRUE in the ena function
    rotation.params = list(
      accum.ena$meta.data$Condition == "FirstGame", # equivalent of groups in the ena function
      accum.ena$meta.data$Condition == "SecondGame" # equivalent of groups in the ena function
    )
  )
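For reference, the same model can in principle be specified in a single call to the ena wrapper function mentioned in section 3.3. The sketch below assumes the wrapper accepts the parameter names introduced in sections 3.3.1–3.3.6 (column-name vectors for units, conversation, codes, and metadata); check the documentation (?ena) for your installed version of rENA before relying on it, and note that set.ena.wrapper is just an illustrative object name.
# A sketch of the equivalent one-step model specification (verify against ?ena)
set.ena.wrapper = ena(
  data = data,
  units = unitCols,
  conversation = conversationCols,
  codes = codeCols,
  metadata = metaCols, # optional
  window.size.back = 7,
  groupVar = "Condition",
  groups = c("FirstGame", "SecondGame"),
  mean = TRUE
)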
3.4 Summary of key model outputs
Users can explore what is stored in the object set.ena
by typing set.ena$
and selecting items from the drop-down list. Here, we briefly describe the top-level items in set.ena
that are often of interest.
3.4.1 Connection counts
Connection counts are the frequencies of unique connections a unit made. For each unit, ENA creates a cumulative adjacency vector that contains the sums of all unique code co-occurrences for that unit across all stanza windows. Here, there are 48 units in the ENA model, so there are 48 adjacency vectors. Each term in an ENA adjacency vector represents a unique co-occurrence of codes. Thus with six codes, each vector has 15 terms (n choose two). This is because ENA models are undirected and do not model co-occurrences of the same code.
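As a quick arithmetic check of the vector length described above:
choose(length(codeCols), 2) # 6 choose 2 = 15 unique undirected code pairs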
To access ENA adjacency vectors, use set.ena$connection.counts
.
set.ena$connection.counts
3.4.2 Line weights
To compare networks in terms of their relative patterns of association, researchers can spherically normalize the cumulative adjacency vectors by dividing each one by its length. The resulting normalized vectors represent each unit’s relative frequencies of code co-occurrence. In other words, the spherical normalization controls for the fact that different units might have different amounts of interaction or different numbers of activities.
Notice that in set.ena$connection.counts,
the value for each unique code co-occurrence is an integer equal to or greater than 0, because these values represent the raw connection counts between each unique pair of codes. In set.ena$line.weights
, those raw counts are normalized, and therefore the values fall between 0 and 1.
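To make the normalization concrete, here is a minimal illustration in plain R (raw_counts is a made-up vector for this example, not an rENA object): dividing a vector of raw co-occurrence counts by its length yields values that reflect relative rather than absolute frequencies.
raw_counts = c(4, 2, 0, 6) # hypothetical co-occurrence counts for one unit
raw_counts / sqrt(sum(raw_counts^2)) # spherically normalized values between 0 and 1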
To access the normalized adjacency vectors, use set.ena$line.weights
.
set.ena$line.weights
3.4.3 ENA points
As the product of a dimensional reduction, for each unit, ENA produces an ENA point in a two-dimensional space. Since there are 48 units, ENA produces 48 ENA points.
By default, rENA
visualizes ENA points on an x-y coordinate plane defined by the first two dimensions of the dimensional reduction: for a means rotation, MR1 and SVD2, and for an SVD, SVD1 and SVD2.
To access these points, use set.ena$points
.
set.ena$points
ENA points are thus summary statistics that researchers can use to conduct statistical tests, and they can also be used in subsequent analyses. For example, statistical differences between groups in the data can be tested using ENA dimension scores, and those scores can also be used in regression analyses to predict outcome variables, which we will demonstrate later.
3.4.4 Rotation matrix
The rotation matrix used during the dimensional reduction can be accessed through set.ena$rotation
. This is mostly useful when you want to construct an ENA metric space using one dataset and then project ENA points from different data into that space, as in section 5.1.
set.ena$rotation.matrix
3.4.5 Metadata
set.ena$meta.data
returns a data frame that includes all the columns of the ENA set except for the columns representing code co-occurrences.
set.ena$meta.data
3.5 ENA visualization
Once an ENA set is constructed, it can be visualized, which facilitates interpretation of the model. Here, we will look at the two conditions, FirstGame
(novices) and SecondGame
(relative experts), by plotting their mean networks.
3.5.1 Plot a mean network
To plot a network, use the ena.plot.network
function. This function requires the network
parameter (a vector of line weights), and the line weights come from set.ena$line.weights
.
First, subset line weights for each of the two groups.
# Subset lineweights for `FirstGame`
first.game.lineweights = as.matrix(set.ena$line.weights$Condition$FirstGame)

# Subset lineweights for `SecondGame`
second.game.lineweights = as.matrix(set.ena$line.weights$Condition$SecondGame)
Next, calculate the mean networks for the two groups, and store the line weights as vectors.
first.game.mean = as.vector(colMeans(first.game.lineweights))
second.game.mean = as.vector(colMeans(second.game.lineweights))
During plotting, use a pipe |>
to send the output of one function into the first parameter of the subsequent function. To distinguish the two mean networks, set the color of the FirstGame
mean network to red.
ena.plot(set.ena, title = "FirstGame mean plot") |>
ena.plot.network(network = first.game.mean, colors = c("red"))
and the color of the SecondGame
mean network to blue.
ena.plot(set.ena, title = "SecondGame mean plot") |>
ena.plot.network(network = second.game.mean, colors = c("blue"))
As you can see from the two network visualizations above, their node positions are exactly the same. All ENA networks from the same model have the same node positions, which are determined by an optimization routine that attempts to place the nodes such that the centroid of each unit’s network and the location of the ENA point in the reduced space are co-located.
Because of the fixed node positions, ENA can construct a subtracted network, which enables the identification of the most salient differences between two networks. To do this, ENA subtracts the weight of each connection in one network from the corresponding weighted connection in another network, then visualizes the differences in connection strengths. Each edge is color-coded to indicate which of the two networks contains the stronger connection, and the thickness and saturation of the edges corresponds to the magnitude of the difference.
To plot a subtracted network, first calculate the subtracted network line weights by subtracting one group’s line weights from the other. (Because ENA computes the absolute values of the differences in edge weights, the order of the two networks in the subtraction doesn’t matter.)
subtracted.mean = first.game.mean - second.game.mean
Then, use the ena.plot
function to plot the subtracted network. If the differences are relatively small, a multiplier can be applied to rescale the line weights, improving legibility.
ena.plot(set.ena, title = "Subtracted: `FirstGame` (red) - `SecondGame` (blue)") |>
ena.plot.network(network = subtracted.mean * 5, # Optional rescaling of the line weights
colors = c("red", "blue"))
Here, the subtracted network shows that on average, students in the FirstGame
condition (red) made more connections with Technical.Constraints
and Collaboration
than students in the SecondGame
condition (blue), while students in the SecondGame
condition made more connections with Design.Reasoning
and Performance.Parameters
than students in the FirstGame
condition. This is because students with more experience of engineering design practices did not need to spend as much time and effort managing the collaborative process and learning about the basic technical elements of the problem space, and instead spent relatively more time focusing on more complex analysis and design reasoning tasks.
Note that this subtracted network shows no connection between Technical.Constraints
and Design.Reasoning
, simply because the strength of this connection was similar in both conditions. Thus, subtraction networks should always be visualized along with the two networks being subtracted.
3.5.2 Plot a mean network and its points
The ENA point or points associated with a network or mean network can also be visualized.
To visualize the points associated with each of the mean networks plotted above, use set$points
to subset the rows that are in each condition and plot each condition as a different color.
# Subset rotated points for the first condition
first.game.points = as.matrix(set.ena$points$Condition$FirstGame)

# Subset rotated points for the second condition
second.game.points = as.matrix(set.ena$points$Condition$SecondGame)
Then, plot the FirstGame
mean network the same as above using ena.plot.network
, use |>
to pipe in the FirstGame
points that we want to include, and plot them using ena.plot.points
.
Each point in the space is the ENA point for a given unit. The red and blue squares on the x-axis are the means of the ENA points for each condition, along with the 95% confidence interval on each dimension (you might need to zoom in for better readability).
Since we used a means rotation to construct the ENA model, the resulting space highlights the differences between FirstGame
and SecondGame
by constructing a rotation that places the means of each condition as close as possible to the x-axis of the space and maximizes the differences between them.
ena.plot(set.ena, title = " points (dots), mean point (square), and confidence interval (box)") |>
ena.plot.points(points = first.game.points, colors = c("red")) |>
ena.plot.group(point = first.game.points, colors =c("red"),
confidence.interval = "box")
ena.plot(set.ena, title = "FirstGame mean network and its points") |>
ena.plot.network(network = first.game.mean, colors = c("red")) |>
ena.plot.points(points = first.game.points, colors = c("red")) |>
ena.plot.group(point = first.game.points, colors =c("red"),
confidence.interval = "box")
Then, do the same for the SecondGame
condition.
ena.plot(set.ena, title = " points (dots), mean point (square), and confidence interval (box)") |>
ena.plot.points(points = second.game.points, colors = c("blue")) |>
ena.plot.group(point = second.game.points, colors =c("blue"),
confidence.interval = "box")
ena.plot(set.ena, title = "SecondGame mean network and its points") |>
ena.plot.network(network = second.game.mean, colors = c("blue")) |>
ena.plot.points(points = second.game.points, colors = c("blue")) |>
ena.plot.group(point = second.game.points, colors =c("blue"),
confidence.interval = "box")
Lastly, do the same for subtraction as well.
ena.plot(set.ena, title = "Subtracted mean network: `FirstGame` (red) - `SecondGame` (blue)") |>
ena.plot.network(network = subtracted.mean * 5,
colors = c("red", "blue")) |>
ena.plot.points(points = first.game.points, colors = c("red")) |>
ena.plot.group(point = first.game.points, colors =c("red"),
confidence.interval = "box") |>
ena.plot.points(points = second.game.points, colors = c("blue")) |>
ena.plot.group(point = second.game.points, colors =c("blue"),
confidence.interval = "box")
Note that the majority of the red points (FirstGame
) are located on the left side of the space, and the blue points (SecondGame
) are mostly located on the right side of the space. This is consistent with the line weights distribution in the mean network: the FirstGame
units make relatively more connections with nodes on the left side of the space, while the SecondGame
units make relatively more connections with nodes on the right side of the space. The positions of the nodes enable interpretation of the dimensions, and thus interpretation of the locations of the ENA points.
3.5.3 Plot an individual unit network and its point
Plotting the network and ENA point for a single unit uses the same approach. First, subset the line weights and point for a given unit.
unit.A.line.weights = as.matrix(set.ena$line.weights$ENA_UNIT$`FirstGame.steven z`) # subset line weights
unit.A.point = as.matrix(set.ena$points$ENA_UNIT$`FirstGame.steven z`) # subset ENA point
Then, plot the network and point for that unit.
ena.plot(set.ena, title = "Individual network: `FirstGame`.steven z") |>
ena.plot.network(network = unit.A.line.weights, colors = c("red")) |>
ena.plot.points(points = unit.A.point, colors = c("red"))
Following the exact same procedure, we can, for example, choose a unit from the other condition to plot and also construct a subtracted plot for those two units.
unit.B.line.weights = as.matrix(set.ena$line.weights$ENA_UNIT$`SecondGame.samuel o`) # subset line weights
unit.B.point = as.matrix(set.ena$points$ENA_UNIT$`SecondGame.samuel o`) # subset ENA point
ena.plot(set.ena, title = "Individual network: `SecondGame`.samuel o") |>
ena.plot.network(network = unit.B.line.weights, colors = c("blue")) |>
ena.plot.points(points = unit.B.point, colors = c("blue"))
To visually analyze the differences between the two individual networks, plot their subtracted network.
ena.plot(set.ena, title = "Subtracted network: `FirstGame`.steven z (red) - `SecondGame`.samuel o (blue)") |>
ena.plot.network(network = (unit.A.line.weights - unit.B.line.weights) * 5,
colors = c("red", "blue")) |>
ena.plot.points(points = unit.A.point, colors = c("red")) |>
ena.plot.points(points = unit.B.point, colors = c("blue"))
In this unit-level subtracted network, Unit A (red) made relatively more connections with codes such as Technical.Constraints
, Data
, and Collaboration
, while Unit B (blue) made relatively more connections with Design.Reasoning
and Performance.Parameters
.
3.5.4 Plot everything, everywhere, all at once
The helper function ena.plotter
enables users to plot points, means, and networks for each condition at the same time. This gives the same results as above more parsimoniously. However, this approach does not enable customization of edge and point colors.
# with helper function
p <- ena.plotter(set.ena,
                 points = T,
                 mean = T,
                 network = T,
                 print.plots = T,
                 groupVar = "Condition",
                 groups = c("SecondGame","FirstGame"),
                 subtractionMultiplier = 5)
The helper function prints three plots, labeled $SecondGame, $FirstGame, and $SecondGame-FirstGame (the subtracted network).
3.6 Compare groups statistically
In addition to visual comparison of networks, ENA points can be analyzed statistically. For example, here we might test whether the patterns of association in one condition are significantly different from those in the other condition.
To demonstrate both parametric and non-parametric approaches to this question, the examples below use a Student’s t test and a Mann-Whitney U test to test for differences between the FirstGame
and SecondGame
condition. For more on differences between parametric and non-parametric tests, see Kaur & Kumar (2015).
First, install the lsr
package to enable calculation of effect size (Cohen’s d) for the t test.
# install.packages('lsr')
library(lsr)
Then, subset the points to test for differences between the two conditions.
ena_first_points_d1 = as.matrix(set.ena$points$Condition$FirstGame)[,1]
ena_second_points_d1 = as.matrix(set.ena$points$Condition$SecondGame)[,1]

ena_first_points_d2 = as.matrix(set.ena$points$Condition$FirstGame)[,2]
ena_second_points_d2 = as.matrix(set.ena$points$Condition$SecondGame)[,2]
Conduct the t test on the first and second dimensions.
# parametric tests
t_test_d1 = t.test(ena_first_points_d1, ena_second_points_d1)
t_test_d1
Welch Two Sample t-test
data: ena_first_points_d1 and ena_second_points_d1
t = -6.5183, df = 45.309, p-value = 5.144e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2687818 -0.1419056
sample estimates:
mean of x mean of y
-0.09411588 0.11122786
t_test_d2 = t.test(ena_first_points_d2, ena_second_points_d2)
t_test_d2
Welch Two Sample t-test
data: ena_first_points_d2 and ena_second_points_d2
t = 1.9334e-16, df = 43.175, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.07768526 0.07768526
sample estimates:
mean of x mean of y
1.935145e-19 -7.254914e-18
Compute any other statistics that may be of interest. A few examples are given below.
mean(ena_first_points_d1)
[1] -0.09411588
mean(ena_second_points_d1)
[1] 0.1112279
mean(ena_first_points_d2)
[1] 1.935145e-19
mean(ena_second_points_d2)
[1] -7.254914e-18
sd(ena_first_points_d1)
[1] 0.1115173
sd(ena_second_points_d1)
[1] 0.1063515
sd(ena_first_points_d2)
[1] 0.1267104
sd(ena_second_points_d2)
[1] 0.1380851
length(ena_first_points_d1)
[1] 26
length(ena_second_points_d1)
[1] 22
length(ena_first_points_d2)
[1] 26
length(ena_second_points_d2)
[1] 22
cohensD(ena_first_points_d1, ena_second_points_d1)
[1] 1.880622
cohensD(ena_first_points_d2, ena_second_points_d2)
[1] 5.641688e-17
Here, along the x axis (MR1), a two-sample t test assuming unequal variance shows that the FirstGame
condition (mean=-0.09, SD=0.11, N=26) is statistically significantly different at alpha=0.05 from the SecondGame
condition (mean=0.11, SD=0.11, N=22; t(45.31)=-6.52, p<0.001, Cohen’s d=1.88). Along the y axis (SVD2), a two-sample t test assuming unequal variance shows that the FirstGame
condition (mean=0.00, SD=0.13, N=26) is not statistically significantly different at alpha=0.05 from the SecondGame
condition (mean=0.00, SD=0.14, N=22; t(43.17)=0.00, p=1.00).
The Mann-Whitney U test is a non-parametric alternative to the independent two-sample t test.
First, install the rcompanion
package to calculate the effect size (r) for a Mann-Whitney U test.
# install.packages('rcompanion')
library(rcompanion)
Then, conduct a Mann-Whitney U test on the first and second dimensions.
# non-parametric tests
w_test_d1 = wilcox.test(ena_first_points_d1, ena_second_points_d1)
w_test_d2 = wilcox.test(ena_first_points_d2, ena_second_points_d2)

w_test_d1
Wilcoxon rank sum exact test
data: ena_first_points_d1 and ena_second_points_d1
W = 50, p-value = 8.788e-08
alternative hypothesis: true location shift is not equal to 0
w_test_d2
Wilcoxon rank sum exact test
data: ena_first_points_d2 and ena_second_points_d2
W = 287, p-value = 0.9918
alternative hypothesis: true location shift is not equal to 0
Compute any other statistics that may be of interest. A few examples are given below.
median(ena_first_points_d1)
[1] -0.08464154
median(ena_second_points_d1)
[1] 0.1300029
median(ena_first_points_d2)
[1] -0.007252397
median(ena_second_points_d2)
[1] 0.0003031848
length(ena_first_points_d1)
[1] 26
length(ena_second_points_d1)
[1] 22
length(ena_first_points_d2)
[1] 26
length(ena_second_points_d2)
[1] 22
abs(wilcoxonR(ena_first_points_d1, ena_second_points_d1))
r
0.863
abs(wilcoxonR(ena_first_points_d2, ena_second_points_d2))
r
0.863
Here, along the x axis (MR1), a Mann-Whitney U test shows that the FirstGame
condition (Mdn=-0.08, N=26) is statistically significantly different at alpha=0.05 from the SecondGame
condition (Mdn=0.13, N=22; U=50, p<0.001, r=0.86). Along the y axis (SVD2), a Mann-Whitney U test shows that the FirstGame
condition (Mdn=-0.01, N=26) is not statistically significantly different at alpha=0.05 from the SecondGame
condition (Mdn=0.00, N=22; U=287, p=0.99). The absolute value of r
in a Mann-Whitney U test ranges from 0 to close to 1. The interpretation commonly used in the published literature is: 0.10 to < 0.30
(small effect), 0.30 to < 0.50
(moderate effect), and >= 0.50
(large effect).
3.7 Model evaluation
In this section, we introduce three ways users can evaluate the quality of their ENA models.
3.7.1 Variance explained
Briefly, variance explained (also called explained variation) refers to the proportion of the total variance in a dataset that is accounted for by a statistical model or set of predictors.
To represent high-dimensional vectors in a two-dimensional space, ENA uses either singular value decomposition (SVD) or a means rotation combined with SVD. For each of the reduced dimensions, the variance in patterns of association among units explained by that dimension can be computed.
set.ena$model$variance
MR1 SVD2 SVD3 SVD4 SVD5 SVD6
0.320460221 0.244500582 0.152894192 0.093518444 0.060221209 0.034883009
SVD7 SVD8 SVD9 SVD10 SVD11 SVD12
0.027680609 0.017549851 0.013516258 0.009812554 0.008089549 0.005764568
SVD13 SVD14 SVD15
0.004938552 0.003474931 0.002695469
Here, the first dimension is MR1 and the second dimension is SVD2. The MR1 dimension has the highest variance explained at 32%.
As with any statistical model, greater explained variance does not necessarily indicate a better model, as it may be due to overfitting, but it provides one indicator of model quality.
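For example, the proportion of variance captured by the two plotted dimensions together can be read off this output:
sum(set.ena$model$variance[1:2]) # MR1 + SVD2, roughly 0.32 + 0.24 = 0.56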
3.7.2 Goodness of fit
Briefly, a model’s goodness of fit refers to how well a model fits or represents the data. A model with a high goodness of fit indicates that it accurately represents the data and can make reliable predictions.
In ENA, a good fit means that the positions of the nodes in the space—and thus the network visualizations—are consistent with the mathematical properties of the model. In other words, we can confidently rely on the network visualizations to interpret the ENA model. The process that ENA uses to achieve high goodness of fit is called co-registration. The mathematical details of co-registration are beyond the scope of this chapter and can be found in Bowman et al. (2022).
To test a model’s goodness of fit, use ena.correlations
. The closer the value is to 1, the higher the model’s goodness of fit is. Most ENA models have a goodness of fit that is well above 0.90.
ena.correlations(set.ena)
3.7.3 Close the interpretative loop
Another way to evaluate an ENA model is to confirm the alignment between the quantitative model (in our case, the ENA model) and the original qualitative data. In other words, we can return to the original data to confirm that the quantitative findings give a fair representation of the data. This approach is an example of what is called closing the interpretative loop in the field of quantitative ethnography (Shaffer, 2017).
For example, based on our visual analysis of the network of SecondGame.samuel o
in the previous section, we are interested in which lines in the original data contributed to the connection between Design.Reasoning
and Performance.Parameters
.
Let’s first review what the ENA network for SecondGame.samuel o
looks like.
ena.plot(set.ena, title = "Individual network: `SecondGame`.samuel o") |>
ena.plot.network(network = as.matrix(set.ena$line.weights$ENA_UNIT$`SecondGame.samuel o`), colors = c("blue")) |>
ena.plot.points(points = as.matrix(set.ena$points$ENA_UNIT$`SecondGame.samuel o`), colors = c("blue"))
To do so, use the view()
function and specify the required parameters as shown below.
This will open a window in your Viewer
panel. If it is too small to read, you can click the “Show in new window” button to view it in your browser for better readability. (Note: the html page produced by the view() function is displayed separately from the html file knitted from the RMD file.)
view(accum.ena,
     id_col = "ENA_UNIT", # do not need to change this
     wh = c("SecondGame.samuel o"), # the unit we are interested in
     units.by = c("Condition", "UserName"), # consistent with section 3.3.1
     conversation.by = c("Condition", "GroupName", "ActivityNumber"), # consistent with section 3.3.3
     codes = c("Performance.Parameters", "Design.Reasoning"), # codes of choice
     window = 7) # consistent with section 3.3.4
In the Viewer
panel, hover your cursor over any of the lines shown in bold: a rectangle spanning seven lines appears, representing a moving stanza window of size 7 that contains the referent line (the line in bold) and its six preceding lines. The 1s and 0s in the Performance.Parameters
column and the Design.Reasoning column show where the connections happened.
For example, in line 2477 Samuel shared his Design.Reasoning
(“mindful of (the) how one device scores relative to other ones”), referring back to what Casey said in line 2476 about Performance.Parameters
(“not one source/censor can be the best in every area so we had to sacrifice certain attributes”), as well as to what Jackson said in line 2475 about safety as one of the Performance.Parameters
(“when it came to the different attributes, i think that all were important in their own way but i think safety is one of the most important”).
This is a qualitative example of a connection made between Performance.Parameters
and Design.Reasoning.
3.8 Using ENA model outputs in other analyses
It is often useful to use the outputs of ENA models in subsequent analyses. The most commonly used outputs are the ENA points, i.e., set.ena$points
. For example, we can use a linear regression analysis to test whether ENA points on the first two dimensions are predictive of an outcome variable, in this case, change in confidence in engineering skills.
regression_data = set.ena$points
regression_data$CONFIDENCE.Change = as.numeric(regression_data$CONFIDENCE.Change)
Warning: NAs introduced by coercion
condition_regression = lm(CONFIDENCE.Change ~ MR1 + SVD2 + Condition,
                          data = regression_data,
                          na.action = na.omit)
summary(condition_regression)
Call:
lm(formula = CONFIDENCE.Change ~ MR1 + SVD2 + Condition, data = regression_data,
na.action = na.omit)
Residuals:
Min 1Q Median 3Q Max
-1.18092 -0.24324 -0.08171 0.30716 1.88404
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.1111 0.1490 7.457 2.82e-09 ***
MR1 -0.4540 0.8616 -0.527 0.601
SVD2 0.3268 0.7154 0.457 0.650
ConditionSecondGame -0.3484 0.2566 -1.358 0.182
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6374 on 43 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.1228, Adjusted R-squared: 0.0616
F-statistic: 2.007 on 3 and 43 DF, p-value: 0.1273
The results of this regression analysis show that ENA points are not a significant predictor of the students’ pre-post change in confidence (MR1: t=-0.53, p=0.60; SVD2: t=0.46, p=0.65; Condition: t=-1.36, p=0.18). The overall model was also not significant (F(3, 43)=2.01, p=0.13) with an adjusted r-squared value of 0.06.
Recall that the dataset we are using is a small subset of the full RS.data
, and thus results that are significant for the whole dataset may not be for this sample.
4 Ordered Network Analysis with R
This section demonstrates how to conduct an ONA analysis using the ona
R package. If you are new to ONA as an analytic technique, Tan et al. (2022) provides a more detailed explication of its theoretical and methodological foundations.
Because ONA shares some conceptual and procedural similarities with ENA, you may also want to read the recommended papers from the ENA section (Shaffer et al., 2016; Shaffer & Ruis, 2017; Bowman et al., 2022).
4.1 Install the ONA package and load the library
Install the ona package and load the ona library after installing.
# install.packages("ona", repos = c("https://cran.qe-libs.org", "https://cran.rstudio.org"))
library(ona)
Then, install the other package that is required for ONA analysis.
# install.packages("tma", repos = c("https://cran.qe-libs.org", "https://cran.rstudio.org"))
library(tma)
4.2 Dataset
(Refer to section 3.2 for a detailed description of the dataset used here.)
Load the RS.data
dataset.
data = ona::RS.data
4.3 Construct an ONA model
To construct an ONA model, identify which columns in the data to use for the parameters required by the ONA modeling function. The parameters are defined identically in both ENA and ONA; see Section 3.3 for detailed explanations.
4.3.1 Specify units
Select the units as in Section 3.3.1.
<- c("Condition", "UserName") my_units
4.3.2 Specify codes
Select the codes as in Section 3.3.2.
my_codes = c(
  'Data',
  'Technical.Constraints',
  'Performance.Parameters',
  'Client.and.Consultant.Requests',
  'Design.Reasoning',
  'Collaboration')
4.3.3 Specify conversations
The parameter to specify conversations in rENA
is called conversation
; in ONA, the equivalent parameter is called hoo_rules
, where hoo
is an abbreviation of horizon of observation
.
Choose the combination of Condition
column, GroupName
column, and ActivityNumber
column to define the conversation parameter.
The syntax to specify conversations using my_hoo_rules
in ONA is slightly different from the syntax to specify conversation
in ENA, but the conceptual definition is the same.
my_hoo_rules <- conversation_rules(
  (Condition %in% UNIT$Condition &
   GroupName %in% UNIT$GroupName &
   ActivityNumber %in% UNIT$ActivityNumber))
4.3.4 Specify the window
Specify a moving stanza window size by passing a numerical value to the window_size
parameter.
window_size = 7
To specify an infinite stanza window in ONA, set the size of the moving window equal to or larger than the number of lines in the longest conversation. For example, set window_size = 4000
, which is greater than the total number of rows in our dataset (nrows=3,824).
4.3.5 Specify metadata
As in ENA, metadata columns can be included if desired. Metadata columns are not required to construct an ONA model, but they provide information that can be used to subset units in the resulting model.
= c("CONFIDENCE.Change","CONFIDENCE.Pre","CONFIDENCE.Post","C.Change") metaCols
4.3.6 Accumulate connections
Now that all the parameters are specified, connections can be accumulated. For each unit, the ONA algorithm uses a moving stanza window to identify connections formed from a current line of data (e.g., a turn of talk), or response, to the preceding lines within the window (the ground).
Unlike in ENA, where connections among codes are recorded in a symmetric adjacency matrix, ONA accounts for the order in which the connections occur by constructing an asymmetric adjacency matrix for each unit; that is, the number of connections from code A to code B may be different than the number of connections from B to A.
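As a toy illustration of this asymmetry (made-up numbers, not output from the ona package), a directed count matrix for two codes A and B could look like this:
toy_counts = matrix(c(1, 3,   # A -> A, A -> B
                      0, 2),  # B -> A, B -> B
                    nrow = 2, byrow = TRUE,
                    dimnames = list(from = c("A", "B"), to = c("A", "B")))
toy_counts # the count from A to B (3) differs from the count from B to A (0)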
To accumulate connections, pass the parameters specified to the contexts
and accumulate_contexts
functions, and store the output in an object (in this case, accum.ona).
accum.ona <-
  contexts(data,
           units_by = my_units,
           hoo_rules = my_hoo_rules) |>
  accumulate_contexts(codes = my_codes,
                      decay.function = decay(simple_window, window_size = 7),
                      meta.data = metaCols,
                      return.ena.set = FALSE) # keep this as FALSE to get an ONA model; otherwise it will return an undirected model
4.3.7 Construct an ONA model
After accumulation, call the model
function to construct an ONA model. ONA currently implements singular value decomposition (SVD) and means rotation (MR) to perform dimensional reduction.
To create an ONA model using SVD, pass the accum.ona
object to the model
function.
set.ona <- model(accum.ona)
When there are two discrete groups to compare, a means rotation can be used, as described in Section 3.3.5.
A means rotation is specified using rotate.using ="mean"
in the model
function. Additionally, the model
function expects rotation.params
to be a list
with two named elements, each containing a logical vector representing the rows of units to be included in each group.
Here, construct the ONA model as shown below.
set.ona <-
  model(accum.ona, # the previously run accumulation above
        rotate.using = "mean", # means rotation method
        rotation.params = # two groups for means rotation in a list
          list(FirstGame = accum.ona$meta.data$Condition == "FirstGame",
               SecondGame = accum.ona$meta.data$Condition == "SecondGame")
  )
4.4 Summary of key model outputs
Information about an ONA model is now stored in the R object set.ona
.
As in rENA, users can explore the data stored in the object by typing set.ona$
and selecting items from the drop-down list. Here, we briefly explain the top-level items in set.ona$
.
4.4.1 Connection counts
Because ONA accounts for the order in which the connections occur by constructing an asymmetric adjacency matrix for each unit, connection counts from code A to code B and from B to A, as well as self-connections for each code (from A to A) are recorded. Thus, because six codes were included in the model, the cumulative adjacency vector for each unit contains 36 terms (n^2).
set.ona$connection.counts
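As a quick check of that dimensionality:
length(my_codes)^2 # 6 x 6 = 36 directed code pairs, including self-connections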
4.4.2 Line weights
In set.ona$connection.counts
, the value for each unique co-occurrence of codes is an integer equal to or greater than 0, because these values represent the directional connection counts between each pair of codes. In set.ona$line.weights
, the connection counts are sphere normalized, and so the values are between 0 and 1. See section 3.4.2 for more information about line weights.
set.ona$line.weights
4.4.3 ONA points
For each unit, ONA produces an ONA point in a two-dimensional space formed by the first two dimensions of the dimensional reduction.
Here, the MR1 column represents the x-axis coordinate for each unit, and the SVD2 column represents the y-axis coordinate for each unit.
set.ona$points
4.4.4 Rotation matrix
The rotation matrix used during the dimensional reduction can be accessed through set.ona$rotation
. This is mostly useful when you want to construct an ONA metric space using one dataset and then project ONA points from different data into that space, as in section 5.2.
set.ona$rotation.matrix
4.4.5 Metadata
set.ona$meta.data
gives a data frame that includes all the columns except for the code connection columns.
set.ona$meta.data
4.5 ONA visualization
Once an ONA model is constructed, ONA networks can be visualized. The plotting function in ONA is called plot
, and it works similarly to the corresponding plotting functions in rENA.
Before plotting, you can set up several global parameters to ensure consistency across plots. These parameters will be clearer in subsequent sections.
node_size_multiplier = 0.4 # scale up or down node sizes
node_position_multiplier = 1 # zoom in or out node positions
point_position_multiplier = 1.5 # zoom in or out the point positions
edge_arrow_saturation_multiplier = 1.5 # adjust the chevron color lighter or darker
edge_size_multiplier = 1 # scale up or down edge sizes
4.5.1 Plot a mean network
Mean ONA networks can be plotted for each of the conditions along with their subtracted network.
First, plot the mean network for the FirstGame
condition. Use a pipe |>
to connect the edges
function and the nodes
function. Users are only required to specify the weights
parameter, as the remaining parameters have default values unless specified otherwise.
plot.ena.ordered.set(set.ona, title = "FirstGame (red) mean network") |>
edges(
weights =set.ona$line.weights$Condition$FirstGame,
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("red")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("red"))
Since this is the first ONA network visualization in this chapter, we briefly explain how to read an ONA network.
Node size: In ONA, the node size is proportional to the number of occurrences of that code as a response to other codes in the data, with larger nodes indicating more responses. For example, in this plot, students in the FirstGame
condition responded most frequently with discourse about Technical.Constraints.
Self-connections: The color and saturation of the circle within each node is proportional to the number of self-connections for that code: that is, when a code is both what students responded to and what they responded with. Colored circles that are larger and more saturated reflect codes with more frequent self-connections.
Edges: Note that unlike most directed network visualizations, which use arrows or spearheads to indicate direction, ONA uses a “broadcast” model, where the source of a connection (what students responded to) is placed at the apex side of the triangle and the destination of a connection (what students responded with) is placed at its base.
Chevrons on edges: The chevrons point in the direction of the connection. Between any pair of nodes, if there is a bidirectional connection, the chevron only appears on the side with the stronger connection. This helps viewers differentiate heavier edges in cases such as between Technical.Constraints
and Data
, where the connection strengths from both directions are similar. When the connection strengths are identical between two codes, the chevron will appear on both edges.
Now, plot the mean network for SecondGame
.
plot.ena.ordered.set(set.ona, title = "SecondGame (blue) mean network") |>
edges(
weights = set.ona$line.weights$Condition$SecondGame,
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("blue")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("blue"))
Then, plot the subtracted network to show the differences between the mean networks of the FirstGame
and SecondGame
conditions.
plot.ena.ordered.set(set.ona, title = "Subtracted mean network: `FirstGame` (red) vs `SecondGame` (blue)") |>
edges(
weights = (colMeans(set.ona$line.weights$Condition$FirstGame) - colMeans(set.ona$line.weights$Condition$SecondGame))*4, # optional weights multiplier to adjust readability
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("red","blue")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("red","blue"))
4.5.2 Plot a mean network and its points
Besides plotting the mean network for each condition and the subtracted network, we can also plot the individual units within each condition.
Use set.ona$points
to subset the rows that are in each condition and plot the units in each condition as a different color.
The points are specified in the units
function. The edges
and nodes
functions remain the same as above.
plot.ena.ordered.set(set.ona, title = "points (dots), mean point (square), and confidence interval") |>
units(
points=set.ona$points$Condition$FirstGame,
points_color = c("red"),
show_mean = TRUE, show_points = TRUE, with_ci = TRUE)
plot.ena.ordered.set(set.ona, title = "FirstGame (red) mean network") |>
units(
points=set.ona$points$Condition$FirstGame,
points_color = c("red"),
show_mean = TRUE, show_points = TRUE, with_ci = TRUE) |>
edges(
weights =set.ona$line.weights$Condition$FirstGame,
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("red")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("red"))
plot.ena.ordered.set(set.ona, title = "points (dots), mean point (square), and confidence interval") |>
units(
points=set.ona$points$Condition$SecondGame,
points_color = c("blue"),
show_mean = TRUE, show_points = TRUE, with_ci = TRUE)
plot.ena.ordered.set(set.ona, title = "SecondGame (blue) mean network") |>
units(
points=set.ona$points$Condition$SecondGame,
points_color = "blue",
show_mean = TRUE, show_points = TRUE, with_ci = TRUE) |>
edges(
weights = set.ona$line.weights$Condition$SecondGame,
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("blue")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("blue"))
Plot the subtracted network as follows.
# `FirstGame` and `SecondGame` subtracted plot
plot.ena.ordered.set(set.ona, title = "Difference: `FirstGame` (red) vs `SecondGame` (blue)") |>
units(
points = set.ona$points$Condition$FirstGame,
points_color = "red",
show_mean = TRUE, show_points = TRUE, with_ci = TRUE) |>
units(
points = set.ona$points$Condition$SecondGame,
points_color = "blue",
show_mean = TRUE, show_points = TRUE, with_ci = TRUE) |>
edges(
weights = (colMeans(set.ona$line.weights$Condition$FirstGame) - colMeans(set.ona$line.weights$Condition$SecondGame))*4, # optional multiplier to adjust for readability
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("red","blue")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("red","blue"))
4.5.3 Plot an individual network and its points
To plot an individual student’s network and ONA point, use set.ona$points
.
Here, we choose the same two units we compared in the ENA analysis (Section 3.5.3).
# first game
plot.ena.ordered.set(set.ona, title = "FirstGame::steven z") |>
units(
points=set.ona$points$ENA_UNIT$`FirstGame::steven z`,
points_color = "red",
show_mean = FALSE, show_points = TRUE, with_ci = FALSE) |>
edges(
weights = set.ona$line.weights$ENA_UNIT$`FirstGame::steven z`,
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("red")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("red"))
# second game
plot.ena.ordered.set(set.ona, title = "SecondGame::samuel o") |>
units(
points=set.ona$points$ENA_UNIT$`SecondGame::samuel o`,
points_color = "blue",
show_mean = FALSE, show_points = TRUE, with_ci = FALSE) |>
edges(
weights = set.ona$line.weights$ENA_UNIT$`SecondGame::samuel o`,
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("blue")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("blue"))
In this case, both units make relatively strong connections between Design.Reasoning and Data. However, for Unit A (red), the connection from Design.Reasoning to Data is relatively stronger than the connection in the other direction, indicating that this unit more often responded to Design.Reasoning with Data. In contrast, Unit B (blue) more frequently responded to Data with Design.Reasoning.
A subtracted network can make such differences more salient.
# units difference
mean1 = as.vector(as.matrix(set.ona$line.weights$ENA_UNIT$`FirstGame::steven z`))
mean2 = as.vector(as.matrix(set.ona$line.weights$ENA_UNIT$`SecondGame::samuel o`))
subtracted.mean = mean1 - mean2
plot.ena.ordered.set(set.ona, title = "subtracted network of steven z (red) and Samuel (blue)") |>
units(
points = set.ona$points$ENA_UNIT$`FirstGame::steven z`, points_color = "red",
point_position_multiplier = point_position_multiplier,
show_mean = FALSE, show_points = TRUE, with_ci = FALSE) |>
units(
points = set.ona$points$ENA_UNIT$`SecondGame::samuel o`, points_color = "blue",
point_position_multiplier = point_position_multiplier,
show_mean = FALSE, show_points = TRUE, with_ci = FALSE) |>
edges(
weights = subtracted.mean*2,
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("red", "blue")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("red", "blue"))
The connection between Design.Reasoning and Data consists of two triangles: one in blue pointing from Data to Design.Reasoning, and one in red pointing from Design.Reasoning to Data. This indicates that although both units made strong connections between these two codes, the relative directed frequencies differ. Recall that in the ENA subtracted network for the same two units, the connections between Data and Design.Reasoning were essentially the same. By accounting for the order of events, ONA shows that while the undirected relative frequencies were similar, the two students differed in the order in which they made the connection.
4.6 Compare groups statistically
In addition to visual comparison of networks, ENA points can be analyzed statistically. For example, here we might test whether the patterns of association in one condition are significantly different from those in the other condition.
To demonstrate both parametric and non-parametric approaches to this question, the examples below use a Student’s t test and a Mann-Whitney U test to test for differences between the FirstGame and SecondGame conditions. For guidance on when to choose parametric versus non-parametric tests, refer to Kaur & Kumar (2015).
First, install the lsr package to enable calculation of effect size (Cohen’s d) for the t test.
# install.packages('lsr')
library(lsr)
Then, subset the points to test for differences between the two conditions.
ona_first_points_d1 = as.matrix(set.ona$points$Condition$FirstGame)[,1]
ona_second_points_d1 = as.matrix(set.ona$points$Condition$SecondGame)[,1]
ona_first_points_d2 = as.matrix(set.ona$points$Condition$FirstGame)[,2]
ona_second_points_d2 = as.matrix(set.ona$points$Condition$SecondGame)[,2]
Conduct the t test on the first and second dimensions.
# parametric tests
t_test_d1 = t.test(ona_first_points_d1, ona_second_points_d1)
t_test_d1
Welch Two Sample t-test
data: ona_first_points_d1 and ona_second_points_d1
t = -3.7729, df = 41.001, p-value = 0.0005111
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.18227713 -0.05517572
sample estimates:
mean of x mean of y
-0.05441628 0.06431015
t_test_d2 = t.test(ona_first_points_d2, ona_second_points_d2)
t_test_d2
Welch Two Sample t-test
data: ona_first_points_d2 and ona_second_points_d2
t = -6.9301e-16, df = 45.45, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1008208 0.1008208
sample estimates:
mean of x mean of y
-1.727628e-17 1.742362e-17
Compute any other statistics that may be of interest. A few examples are given below.
mean(ona_first_points_d1)
[1] -0.05441628
mean(ona_second_points_d1)
[1] 0.06431015
mean(ona_first_points_d2)
[1] -1.727628e-17
mean(ona_second_points_d2)
[1] 1.742362e-17
sd(ona_first_points_d1)
[1] 0.09754142
sd(ona_second_points_d1)
[1] 0.1171941
sd(ona_first_points_d2)
[1] 0.1784777
sd(ona_second_points_d2)
[1] 0.1679372
length(ona_first_points_d1)
[1] 26
length(ona_second_points_d1)
[1] 22
length(ona_first_points_d2)
[1] 26
length(ona_second_points_d2)
[1] 22
cohensD(ona_first_points_d1, ona_second_points_d1)
[1] 1.109985
cohensD(ona_first_points_d2, ona_second_points_d2)
[1] 1.997173e-16
Here, along the x axis (MR1), a two-sample t test assuming unequal variance shows that the FirstGame condition (mean=-0.05, SD=0.10, N=26) is statistically significantly different at alpha=0.05 from the SecondGame condition (mean=0.06, SD=0.12, N=22; t(41.00)=-3.77, p<0.001, Cohen’s d=1.11). Along the y axis (SVD2), a two-sample t test assuming unequal variance shows that the FirstGame condition (mean=0.00, SD=0.18, N=26) is not statistically significantly different at alpha=0.05 from the SecondGame condition (mean=0.00, SD=0.17, N=22; t(45.45)=0.00, p=1.00, Cohen’s d=0.00).
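Rather than transcribing these values from the console, you can also pull them from the stored test objects. The snippet below is a small illustrative sketch using standard components of R’s htest objects; the rounding choices are ours.
# pull the reported statistics from the stored t-test objects
round(t_test_d1$statistic, 2)   # t value on the first dimension
round(t_test_d1$parameter, 2)   # Welch-adjusted degrees of freedom
signif(t_test_d1$p.value, 2)    # p value
round(cohensD(ona_first_points_d1, ona_second_points_d1), 2) # Cohen's d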
The Mann-Whitney U test is a non-parametric alternative to the independent two-sample t test.
First, install the rcompanion package to calculate the effect size (r) for the Mann-Whitney U test.
# install.packages('rcompanion')
library(rcompanion)
Then, conduct a Mann-Whitney U test on the first and second dimensions.
# non-parametric tests
w_test_d1 = wilcox.test(ona_first_points_d1, ona_second_points_d1)
w_test_d2 = wilcox.test(ona_first_points_d2, ona_second_points_d2)
w_test_d1
Wilcoxon rank sum exact test
data: ona_first_points_d1 and ona_second_points_d1
W = 130, p-value = 0.0009533
alternative hypothesis: true location shift is not equal to 0
w_test_d2
Wilcoxon rank sum exact test
data: ona_first_points_d2 and ona_second_points_d2
W = 264, p-value = 0.6593
alternative hypothesis: true location shift is not equal to 0
Compute any other statistics that may be of interest. A few examples are given below.
median(ona_first_points_d1)
[1] -0.04307778
median(ona_second_points_d1)
[1] 0.09596238
median(ona_first_points_d2)
[1] 0.001753116
median(ona_second_points_d2)
[1] 0.05862436
length(ona_first_points_d1)
[1] 26
length(ona_second_points_d1)
[1] 22
length(ona_first_points_d2)
[1] 26
length(ona_second_points_d2)
[1] 22
abs(wilcoxonR(ona_first_points_d1, ona_second_points_d1))
r
0
abs(wilcoxonR(ona_first_points_d2, ona_second_points_d2))
r
0.707
Here, along the x axis (MR1), a Mann-Whitney U test shows that the FirstGame condition (Mdn=-0.04, N=26) is statistically significantly different at alpha=0.05 from the SecondGame condition (Mdn=0.10, N=22; U=130, p=0.001, r=0.00). Along the y axis (SVD2), a Mann-Whitney U test shows that the FirstGame condition (Mdn=0.00, N=26) is not statistically significantly different at alpha=0.05 from the SecondGame condition (Mdn=0.06, N=22; U=264, p=0.66, r=0.71). The absolute value of r for a Mann-Whitney U test ranges from 0 to close to 1. Common interpretation thresholds in the published literature are: 0.10 to < 0.30 (small effect), 0.30 to < 0.50 (moderate effect), and >= 0.50 (large effect).
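If you prefer to label effect sizes in code, these cut-offs can be wrapped in a small helper. The function below is only an illustrative sketch of the rule of thumb above; it is not part of the rcompanion package.
# illustrative helper: label an absolute effect size r using the thresholds above
interpret_r = function(r) {
  r = abs(r)
  if (r >= 0.5) "large" else if (r >= 0.3) "moderate" else if (r >= 0.1) "small" else "negligible"
}
interpret_r(0.707) # "large"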
4.7 Model evaluation
4.7.1 Variance explained
Briefly, variance explained (also called explained variation) refers to the proportion of the total variance in a dataset that is accounted for by a statistical model or set of predictors.
To represent high-dimensional vectors in a two-dimensional space, ONA uses either singular value decomposition (SVD) or a means rotation combined with SVD. For each of the reduced dimensions, the variance in patterns of association among units explained by that dimension can be computed.
set.ona$model$variance
MR1 SVD2 SVD3 SVD4 SVD5 SVD6
1.367940e-01 2.736079e-01 1.751045e-01 1.037211e-01 5.575839e-02 4.824400e-02
SVD7 SVD8 SVD9 SVD10 SVD11 SVD12
3.788019e-02 2.985421e-02 2.455490e-02 1.986132e-02 1.547312e-02 1.296796e-02
SVD13 SVD14 SVD15 SVD16 SVD17 SVD18
1.190085e-02 8.781167e-03 8.412761e-03 6.174962e-03 4.901254e-03 4.393250e-03
SVD19 SVD20 SVD21 SVD22 SVD23 SVD24
3.834715e-03 3.353035e-03 2.662176e-03 2.031686e-03 1.782493e-03 1.557342e-03
SVD25 SVD26 SVD27 SVD28 SVD29 SVD30
1.220686e-03 1.053932e-03 9.264935e-04 7.927517e-04 7.377140e-04 5.640286e-04
SVD31 SVD32 SVD33 SVD34 SVD35 SVD36
3.789101e-04 1.971669e-04 1.817454e-04 1.602525e-04 1.146862e-04 6.435798e-05
In our example, because we used the means rotation method, the first dimension is labeled MR1 and the second dimension is labeled SVD2. The two dimensions in combination explain more than 40% of the variance.
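As a quick check, this figure can be computed directly from the model object; the indexing below assumes, as in the output above, that the first two entries of set.ona$model$variance are MR1 and SVD2.
# combined variance explained by the first two dimensions (MR1 + SVD2)
sum(set.ona$model$variance[1:2]) # approximately 0.41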
As with any statistical model, greater explained variance does not necessarily indicate a better model, as it may be due to overfitting, but it provides one indicator of model quality.
4.7.2 Goodness of fit
Briefly, a model’s goodness of fit refers to how well a model fits or represents the data. A model with a high goodness of fit indicates that it accurately represents the data and can make reliable predictions.
In ONA, a good fit means that the positions of the nodes in the space, and thus the network visualizations, are consistent with the mathematical properties of the model. In other words, we can confidently rely on the network visualizations to interpret the ONA model. The process ONA uses to achieve a high goodness of fit is called co-registration, the same process used in ENA. The mathematical details of co-registration are beyond the scope of this chapter; see Bowman et al. (2021).
To test a model’s goodness of fit, use ona::correlations(). The closer the values are to 1, the better the model’s goodness of fit. Most ONA models have a goodness of fit well above 0.90.
ona::correlations(set.ona)
4.7.3 Close the interpretative loop
Another approach to evaluating an ONA model is to confirm the alignment between the quantitative model (in our case, the ONA model) and the original qualitative data. In other words, we can return to the original data to confirm that the quantitative findings give a fair representation of the data. This approach is an example of what is called closing the interpretative loop in quantitative ethnography (Shaffer, 2017).
For example, based on our visual analysis of the network of SecondGame::samuel o in the previous section, we are interested in which lines in the original data contributed to the connection from Performance.Parameters to Design.Reasoning.
Let’s first review what the SecondGame::samuel o ONA network looks like. Based on the direction and strength of the connection between Performance.Parameters and Design.Reasoning, we would expect to see more examples of Samuel responding to Performance.Parameters with Design.Reasoning than the other way around.
plot.ena.ordered.set(set.ona, title = "SecondGame::samuel o") |>
units(
points=set.ona$points$ENA_UNIT$`SecondGame::samuel o`,
points_color = "blue",
show_mean = FALSE, show_points = TRUE, with_ci = FALSE) |>
edges(
weights = set.ona$line.weights$ENA_UNIT$`SecondGame::samuel o`,
edge_size_multiplier = edge_size_multiplier,
edge_arrow_saturation_multiplier = edge_arrow_saturation_multiplier,
node_position_multiplier = node_position_multiplier,
edge_color = c("blue")) |>
nodes(
node_size_multiplier = node_size_multiplier,
node_position_multiplier = node_position_multiplier,
self_connection_color = c("blue"))
To do so, we use the view() function and specify the required parameters as below.
This opens a window in your Viewer panel. If it is too small to read, you can click the “Show in new window” button to view it in your browser for better readability. (Note: the HTML page produced by the view() function displays separately from the HTML file knitted from the R Markdown file.)
view(accum.ona, # the object storing our connection accumulation results (see 4.3.6)
wh = c("SecondGame::samuel o"), # the unit we are interested in
units.by = c("Condition", "UserName"), # consistent with 4.3.1
conversation.by = c("Condition", "GroupName", "ActivityNumber"), # consistent with 4.3.3
codes = c("Performance.Parameters", "Design.Reasoning"), # codes of choice
window = 7) # consistent with 4.3.4
In the Viewer panel, hover your cursor over any of the lines shown in bold; a rectangle spanning seven lines appears, representing a moving stanza window of size 7: the referent line (the line in bold) and its six preceding lines. The 1s and 0s in the Performance.Parameters and Design.Reasoning columns show where the connections occurred.
Notice that we are viewing the same qualitative example as in Section 3.7.3 for ENA. In line 2477, Samuel shared his Design.Reasoning about being “mindful of (the) how one device scores relative to other ones” as a response to what Casey said in line 2476 about Performance.Parameters (“not one source/censor can be the best in every area so we had to sacrifice certain attributes”), as well as what Jackson said in line 2475 about safety as one of the Performance.Parameters (“when it came to the different attributes, i think that all were important in their own way but i think safety is one of the most important”).
Here, ONA was not only able to capture the co-occurrence of Design.Reasoning and Performance.Parameters, as ENA did, but also to represent the direction of the connection between them.
4.8 Using ONA outputs in other analysis
As with ENA, the outputs of ONA models can be used as inputs in other statistical models. See Section 3.8 for an example using ENA points.
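As a minimal sketch of what such a downstream analysis could look like, the code below reuses the dimension-1 point vectors created in Section 4.6 to predict condition membership; the data frame and variable names are illustrative assumptions, not part of the ona package.
# illustrative sketch: a logistic regression using the first ONA dimension as a predictor
ona_d1 = data.frame(
  d1 = c(ona_first_points_d1, ona_second_points_d1),
  condition = c(rep("FirstGame", length(ona_first_points_d1)),
                rep("SecondGame", length(ona_second_points_d1))))
summary(glm(factor(condition) ~ d1, data = ona_d1, family = binomial))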
5 Projection
In the sections above, we demonstrated how to do an ENA analysis and an ONA analysis. In this section, we show how to project new data into a space constructed with different data. This can be done as long as the same codes are used in both sets.
5.1 Projections in ENA
To project the ENA points from one model into a space constructed with different data, specify the rotation.set parameter of ena.make.set. In the example below, an “expert” model is developed using the SecondGame units, and the FirstGame (novice) units are projected into that space.
data = rENA::RS.data

# expert data
exp.data = subset(data, Condition == "SecondGame")

# novice data
nov.data = subset(data, Condition == "FirstGame")

# expert model
units_exp = exp.data[,c("Condition","UserName")]
conversation_exp = exp.data[,c("Condition","GroupName","ActivityNumber")]
codes_exp = exp.data[,codeCols]
meta_exp = exp.data[,c("CONFIDENCE.Change",
                       "CONFIDENCE.Pre","CONFIDENCE.Post","C.Change")]

set_exp = ena.accumulate.data(
  text_data = exp.data[, 'text'],
  units = units_exp,
  conversation = conversation_exp,
  codes = codes_exp,
  metadata = meta_exp,
  window.size.back = 7
) |>
  ena.make.set()

set_exp$rotation$rotation.matrix
set_exp$model$points.for.projection
# novice model
units_nov = nov.data[,c("Condition","UserName")]
conversation_nov = nov.data[,c("Condition","GroupName","ActivityNumber")]
codes_nov = nov.data[,codeCols]
meta_nov = nov.data[,c("CONFIDENCE.Change",
                       "CONFIDENCE.Pre","CONFIDENCE.Post","C.Change")]

set_nov = ena.accumulate.data(
  text_data = nov.data[, 'text'],
  units = units_nov,
  conversation = conversation_nov,
  codes = codes_nov,
  metadata = meta_nov,
  window.size.back = 7
) |>
  ena.make.set(rotation.set = set_exp$rotation)
# plot expert model (what we projected into), using the plotting wrapper to save time
plot_exp = ena.plotter(set_exp,
  points = T,
  mean = T,
  network = T,
  print.plots = F)

# plot test model (points from the test model in the training model space)
plot_nov = ena.plotter(set_nov,
  points = T,
  mean = T,
  network = T,
  print.plots = F)

plot_exp$plot
plot_nov$plot
5.2 Projections in ONA
Projection works similarly in ONA.
data = ona::RS.data

# expert data
exp.data = subset(data, Condition == "SecondGame")

# novice data
nov.data = subset(data, Condition == "FirstGame")

# shared unit cols
units = c("UserName","Condition","GroupName")

# shared code cols
codes = c(
  'Data',
  'Technical.Constraints',
  'Performance.Parameters',
  'Client.and.Consultant.Requests',
  'Design.Reasoning',
  'Collaboration')

# shared hoo rules
hoo = conversation_rules(
  (Condition %in% UNIT$Condition & GroupName %in% UNIT$GroupName))

# expert accum
accum.exp = contexts(exp.data, units_by = units, hoo_rules = hoo) |>
  accumulate_contexts(codes = codes,
    decay.function = decay(simple_window, window_size = 7),
    return.ena.set = FALSE, norm.by = NULL)

# expert model
set.exp = model(accum.exp)

# novice accum
accum.nov = contexts(nov.data, units_by = units, hoo_rules = hoo) |>
  accumulate_contexts(codes = codes,
    decay.function = decay(simple_window, window_size = 7),
    return.ena.set = FALSE, norm.by = NULL)

# novice model
set.nov = model(accum.nov)

# projecting novice data into expert space
set = model(accum.nov, rotation.set = set.exp$rotation)
plot.ena.ordered.set(set, title = "novice data into expert space") |>
units(
points = set$points,
show_mean = TRUE, show_points = TRUE, with_ci = TRUE) |>
edges(
weights = set$line.weights) |>
nodes(
self_connection_color = "red",
node_size_multiplier = 0.6)
6 Discussion
In this chapter, we introduced two techniques, ENA and ONA, for quantifying, visualizing, and interpreting networks using coded data. Using a demonstration dataset that documents collaborative discourse among students solving an engineering design problem, we provided step-by-step instructions on how to model complex, collaborative thinking with ENA and ONA in R. The chapter combines theoretical explanations with tutorials and is intended to aid researchers with varying degrees of familiarity with network analysis techniques and R. This chapter mainly showcased the standard and most common uses of these two tools. The ENA and ONA R packages, like other R packages, give researchers the flexibility to tailor analyses to their specific needs. For example, users with advanced R knowledge can supply their own adjacency matrices and use ENA or ONA solely as visualization tools rather than as integrated modeling and visualization tools.
Due to the technical and practical focus of this chapter, we omitted detailed explanations of the theoretical, methodological, and mathematical foundations of ENA and ONA that are crucial for informed, theory-based learning analytics research using these techniques. Consult the Further Reading section for papers that explain these aspects of ENA and ONA in greater detail.
7 Further reading
Arastoopour Irgens, G., & Eagan, B. (2022, October). The Foundations and Fundamentals of Quantitative Ethnography. In International Conference on Quantitative Ethnography (pp. 3-16). Cham: Springer Nature Switzerland.
Bowman, D., Swiecki, Z., Cai, Z., Wang, Y., Eagan, B., Linderoth, J., & Shaffer, D. W. (2021). The mathematical foundations of epistemic network analysis. In Advances in Quantitative Ethnography: Second International Conference, ICQE 2020, Malibu, CA, USA, February 1-3, 2021, Proceedings 2 (pp. 91-105). Springer International Publishing.
Brohinsky J., Marquart C., Wang J., Ruis A.R., & Shaffer D.W. (2021). Trajectories in epistemic network analysis. In Ruis, A.R. & Lee, S.B. (Eds.), Advances in Quantitative Ethnography: Second International Conference, ICQE 2020, Malibu, CA, USA, February 1-3, 2021, Proceedings (pp. 106-121). Springer.
Gasevic, D., Greiff, S., & Shaffer, D. W. (2022). Towards strengthening links between learning analytics and assessment: Challenges and potentials of a promising new bond. Computers in Human Behavior, 1–7.
Ruis, A. R. , Tan, Y., Brohinsky, J., Yang, Y., Cai, Z., and Shaffer, D. W. (2023). Thick Description of Thin Data: Modeling Socio-Environmental Problem-Solving Trajectories in Localized Land-Use Simulations (in press)
Shaffer, D. W. (2017). Quantitative ethnography. Cathcart Press.
Shaffer, D. W. (2018). Epistemic network analysis: Understanding learning by using big data for thick description. Fischer, F., Hmelo-Silver, C. E., Goldman, S. R., & Reimann, P. (Eds.) International Handbook of the Learning Sciences (pp. 520-531). New York: Routledge.
Shaffer, D. W. (2018). Big data for thick description of deep learning. In K. Millis, D. Long, J. Magliano, and K. Weimer (Eds.), Deep learning: Multi-disciplinary approaches (pp. 262-275). NY, NY: Routledge.
Shaffer, D. W., Eagan, B., Knowles, M., Porter, C., & Cai, Z. (2022). Zero re-centered projection: An alternative proposal for modeling empty networks in ENA. In B. Wasson & S. Zörgő (Eds.), Advances in Quantitative Ethnography: Third International Conference, ICQE 2021, Virtual Event, November 6–11, 2021, Proceedings (pp. 66–79). Springer.
Shaffer, D. W., & Ruis, A. R. (2022, October). Is QE Just ENA?. In International Conference on Quantitative Ethnography (pp. 71-86). Cham: Springer Nature Switzerland.
Swiecki, Z., Lian, Z., Ruis, A. R., & Shaffer, D. W. (2019). Does order matter? Investigating sequential and cotemporal models of collaboration. In Lund, K. Niccolai, G., Lavoué, E., Hmelo-Silver, C., Gwon, G. & Baker, M. (Eds.) A Wide Lens: Combining Embodied, Enactive, Extended, and Embedded Learning in Collaborative Settings: 13th International Conference on Computer Supported Collaborative Learning (CSCL), I (pp.112-119).
Tan, Y., Hinojosa, C., Marquart, C., Ruis, A., & Shaffer, D. W. (2022). Epistemic network analysis visualization. In B. Wasson & S. Zörgő (Eds.), Advances in Quantitative Ethnography Third International Conference, ICQE 2021, Virtual Event, November 6–11, 2021, Proceedings (pp. 129–143). Springer.
Tan, Y., Ruis, A.R., Marquart C., Cai, Z., Knowles, M., & Shaffer, D.W. (2022). Ordered network analysis. In 2022 International Conference on Quantitative Ethnography.
Wang Y., Swiecki Z., Ruis A.R., & Shaffer D. W. (2021). Simplification of epistemic networks using parsimonious removal with interpretive alignment. In Ruis, A.R. & Lee, S.B. (Eds.), Advances in Quantitative Ethnography: Second International Conference, ICQE 2020, Malibu, CA, USA, February 1-3, 2021, Proceedings (pp. 137-151). Springer.
Wang, Y., Ruis, A.R., Shaffer, D.W. (2023). Modeling Collaborative Discourse with ENA Using a Probabilistic Function. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_10
8 References
Andrist, S., Ruis, A. R., & Shaffer, D. W. (2018). A network analytic approach to gaze coordination during a collaborative task. Computers in Human Behavior, 89, 339-348.
Arastoopour, G., Chesler, N. C., & Shaffer, D. W. (2014). Epistemic persistence: A simulation-based approach to increasing participation of women in engineering. Journal of Women and Minorities in Science and Engineering, 20(3): 211-234.
Arastoopour, G., & Shaffer, D. W. (2015). Epistemography and professional CSCL environment design. Paper presented at the International Conference on Computer Supported Collaborative Learning. Gothenberg, Sweden.
Arastoopour, G., Shaffer, D. W., Swiecki, Z., Ruis, A. R., & Chesler, N. C. (2016). Teaching and assessing engineering design thinking with virtual internships and epistemic network analysis. International Journal of Engineering Education, 32(3B), 1492–1501.
Baker, R. S., Gašević, D., & Karumbaiah, S. (2021). Four paradigms in learning analytics: Why paradigm convergence matters. Computers and Education: Artificial Intelligence, 2, 100021.
Bauer, E., Sailer, M., Kiesewetter, J., Shaffer, D. W., Schulz, C., Pfeiffer, J., Gurevych, I., Fischer, M. R., & Fischer, F. (2020). Pre-Service teachers’ diagnostic argumentation: What is the role of conceptual knowledge and epistemic activities?. Proceedings of the Fifteenth International Conference of the Learning Sciences, 2399-2400.
Bowman, D., Swiecki, Z., Cai, Z., Wang, Y., Eagan, B., Linderoth, J., & Shaffer, D. W. (2021). The mathematical foundations of epistemic network analysis. In Advances in Quantitative Ethnography: Second International Conference, ICQE 2020, Malibu, CA, USA, February 1-3, 2021, Proceedings 2 (pp. 91-105).
Bressler, D. M., Bodzin, A. M., Eagan, B., & Tabatabai, S. (2019). Using epistemic network analysis to examine discourse and scientific practice during a collaborative game. Journal of Science Education and Technology, 28, 553-566.
Chesler, N., Arastoopour, G., D’Angelo, C., Bagley, E., & Shaffer, D. W. (2013). Design of a professional practice simulator for educating and motivating first-year engineering students. Advances in Engineering Education 3(3): 1-29.
Chesler, N. C., Ruis, A. R., Collier, W., Swiecki, Z., Arastoopour, G., & Shaffer, D. W. (2015). A novel paradigm for engineering education: Virtual internships with individualized mentoring and assessment of engineering thinking. Journal of Biomechanical Engineering, 137(2).
Csanadi, A., Eagan, B., Shaffer, D. W., Kollar, I., & Fischer, F. (2018). When coding-and-counting is not enough: Using epistemic network analysis (ENA) to analyze verbal data in CSCL research. International Journal of Computer-Supported Collaborative Learning, 13(4), 419-438.
Fan, Y., Tan, Y., Raković, M., Wang, Y., Cai, Z., Shaffer, D. W., & Gašević, D. (2022). Dissecting learning tactics in MOOC using ordered network analysis. Journal of Computer Assisted Learning. 1– 13.
Fernandez-Nieto, G. M., Martinez-Maldonado, R., Kitto, K., & Buckingham Shum, S. (2021, April). Modelling spatial behaviours in clinical team simulations using epistemic network analysis: methodology and teacher evaluation. In LAK21: 11th International Learning Analytics and Knowledge Conference (pp. 386-396).
Kaur, A., & Kumar, R. (2015). Comparative analysis of parametric and non-parametric tests. Journal of computer and mathematical sciences, 6(6), 336-342.
Oshima, J., Oshima, R., & Fujita, W. (2018). A Mixed-Methods Approach to Analyze Shared Epistemic Agency in Jigsaw Instruction at Multiple Scales of Temporality. Journal of Learning Analytics, 5(1), 10–24. https://doi.org/10.18608/jla.2018.51.2
Phillips, M., Trevisan, O., Carli, M., Mannix, T., Gargiso, R., Gabelli, L., & Lippiello, S. (2023, March). Uncovering patterns and (dis) similarities of pre-service teachers through Epistemic Network Analysis. In Society for Information Technology & Teacher Education International Conference (pp. 1021-1030). Association for the Advancement of Computing in Education (AACE).
Prieto, L. P., Rodríguez-Triana, M. J., Ley, T., & Eagan, B. (2021). The value of epistemic network analysis in single-case learning analytics: A case study in lifelong learning. In Advances in Quantitative Ethnography: Second International Conference, ICQE 2020, Malibu, CA, USA, February 1-3, 2021, Proceedings 2 (pp. 202-217). Springer International Publishing.
Ruis, A. R., Rosser, A. A., Quandt-Walle, C., Nathwani, J. N., Shaffer, D. W., & Pugh, C. M. (2018). The hands and head of a surgeon: modeling operative competency with multimodal epistemic network analysis. American Journal of Surgery, 216(5), 835-840.
Ruis, A. R., Siebert-Evenstone, A. L., Pozen, R., Eagan, B., & Shaffer, D. W. (2019). Finding common ground: A method for measuring recent temporal context in analyses of complex, collaborative thinking. In Lund, K. Niccolai, G., Lavoué, E., Hmelo-Silver, C., Gwon, G. & Baker, M. (Eds.) A Wide Lens: Combining Embodied, Enactive, Extended, and Embedded Learning in Collaborative Settings: 13th International Conference on Computer Supported Collaborative Learning (CSCL), I (pp.136-143).
Siebert-Evenstone, A. L., Arastoopour Irgens, G., Collier, W., Swiecki, Z., Ruis, A. R., & Shaffer, D. W. (2017). In search of conversational grain size: Modelling semantic structure using moving stanza windows. Journal of Learning Analytics, 4(3), 123–139.
Siebert-Evenstone, A. L., & Shaffer, D. W. (2019). Cause and because: Using epistemic network analysis to model causality in the next generation science standards. In Eagan, B., Misfeldt, M., & Siebert-Evenstone, A. L., (Eds.) Advances in Quantitative Ethnography: ICQE 2019. (pp.245-255)
Tan, Y., Hinojosa, C., Marquart, C., Ruis, A., & Shaffer, D. W. (2022). Epistemic network analysis visualization. In B. Wasson & S. Zörgő (Eds.), Advances in Quantitative Ethnography Third International Conference, ICQE 2021, Virtual Event, November 6–11, 2021, Proceedings (pp. 129–143). Springer.
Tan, Y., Ruis, A.R., Marquart C., Cai, Z., Knowles, M., & Shaffer, D.W. (2022). Ordered network analysis. In 2022 International Conference on Quantitative Ethnography.
Wooldridge, A.R, Carayon, P., Shaffer, D. W., & Eagan, B. (2018). Quantifying the qualitative with epistemic network analysis: A human factors case study of task-allocation communication in a primary care team. IISE Transactions on Healthcare Systems Engineering, 8(1) (pp. 72–82).
Zhang, S., Gao, Q., Sun, M., Cai, Z., Li, H., Tang, Y., & Liu, Q. (2022). Understanding student teachers’ collaborative problem solving: Insights from an epistemic network analysis (ENA). Computers & Education, 183, 104485.