As highlighted in this section’s short video, social networks permeate our daily lives and they play an important role in who and what we know, where we live, our beliefs and preferences, and the opportunities and the constraints with which we are presented in our lives. It is therefore unsurprising that many practitioners are looking for new ways to understand a variety of social networks in a more efficient and more effective manner. Social network analysis (SNA), especially when implemented in R, provides us with many of the tools needed to address this real-world challenge.
Formally, SNA is a set of theories and techniques used to understand social structures. Most practitioners use visualizations and SNA-based statistics to examine their social network data. Everton’s (2012) “Four Metrics Families” provides us with a useful way to conceptualize various aspects of social networks:
Network Topography - describes the overall structure of a social network, which allows us to assess its strengths and vulnerabilities.
Cohesive Subgroups - highlights clusters of actors who interact relatively more frequently with one another than with others.
Centrality - identifies actors who are located in structurally advantageous positions and who can diffuse information and/or whose removal may disrupt a social network.
Brokers and Bridges - a focus on brokerage is similar to centrality in that it helps us identify actors in structurally advantageous positions; however, in this case we focus on control over the flow of information and resources. Bridges are crucial relationships in a network; formally, they are ties whose removal would disconnect part of the network of interest.
The purpose of this document is to provide readers with a basic, practical understanding of SNA. We highly encourage users to check out the “Resources” section for some of our favorite references regarding relevant theories, concepts, functions, packages, and coding.
In this tutorial, we will use some basic SNA techniques in R to explore a data set pertaining to the Noordin Top terrorist network. Specifically, we will look at the network at a single snapshot in time right before the network’s first attack in August 2003 on the JW Marriott Hotel in Jakarta. Using Everton’s (2012) “Four Metric Families” as our guide, we will explore the following questions:
Topography
Cohesive Subgroups
We will put all actor-focused questions, including brokerage, under the umbrella of centrality for this tutorial.
Centrality (and brokerage)
From our “answers” to the questions above, we can develop hypotheses that we can test using more sophisticated techniques and statistical models. For demonstration purposes, however, we will keep things simple and limit ourselves to data exploration and some basic informative techniques. Thus, we will explore our data and then describe our results to a hypothetical audience.
We will leverage four packages in the tutorial. The functions listed in Table 1 are the primary functions we will use for each package, but they do not represent an exhaustive list of the functions and arguments provided below (or that each package offers). We do not list igraph functions in Table 1 because of the large number of them used in this tutorial.
We will not leverage statnet even though it is a commonly used and excellent package for SNA. 1 The goal of Table 1 is simply to provide you with a quick preview; that is, our brief descriptions do not “do justice” to these excellent packages, so we recommend you check out their websites.
Package | Function | Short Description |
---|---|---|
igraph | See each metric family section | A straightforward package to estimate most social network analysis statistics. You can calculate measures related to network topography, centrality, subgroups, and brokers and bridges, among others. 2 |
visNetwork | visIgraph(), visNetwork(), visPhysics() & visOptions() | A package to build interactive network visualizations. 3 The functions in column 2 allow us to send igraph objects directly to visNetwork, visualize our data, control the “physics” of a network, and customize interactive features. |
dplyr | arrange(), select(), & one_of() | A tidyverse package for data manipulation. The functions listed in column 2 allow us to sort observations and to select and keep variables of choice. 4 |
DT | datatable() | The R package DT provides an R interface to the JavaScript library DataTables. 5 R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables provides filtering, pagination, sorting, and many other features in the tables. The function in column 2 allows us to create an HTML widget to display R data objects with DataTables. |
You will need to make sure those packages are installed (install.packages()) before loading them with library().
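A minimal sketch of that setup step, assuming the four packages named in Table 1. It only reports which packages are missing rather than installing anything; `pkgs`, `installed`, and `missing_pkgs` are illustrative names that do not appear elsewhere in this tutorial.

```r
# Packages used in this tutorial (see Table 1).
pkgs <- c("igraph", "visNetwork", "dplyr", "DT")

# Which of them are already installed on this machine?
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Run install.packages() for: ", paste(missing_pkgs, collapse = ", "))
}

# Attach whatever is already installed.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

If the message lists any packages, install them once with install.packages() and rerun the block.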
According to Cunningham, Everton, and Murphy (2016), the Noordin data set can be described as follows:
“The foundation of the Noordin Top Terrorist network data were extracted from two International Crisis Group (ICG) reports (International Crisis Group 2006; International Crisis Group 2009b), which contain rich one- and two-mode data on a variety of relations and affiliations (friendship, kinship, meetings, etc.) along with significant attribute data (education, group membership, physical status, etc.). Because a single source for any network data raises the possibility of bias, the data were supplemented with additional open source literature in order fill gaps in the data and in order to generate monthly time codes from January 2001 through December 2010, which allow us to account for when actors enter and leave the network and examine the network longitudinally. The data were initially structured and analyzed by Defense Analysis students at the Naval Postgraduate School in the course “Tracking and Disrupting Dark Networks” under the direction of Professors Sean Everton and Professor Nancy Roberts. Dan Cunningham reviewed, cleaned, and updated the data, in particular the time code information.”
In this tutorial we will examine a binary, one-mode aggregation/combination of operational, communication, and trust-based ties among 30 individuals involved in the August 2003 attack on the JW Marriott in Jakarta. These relationships together, which we will refer to as our combined network, are undirected and stored in an edge list (Noordin_Edgelist.csv). The original data set contained 139 individuals, from which we extracted only those who were alive and active during August of 2003. Furthermore, we extracted the largest component of the structure.
Finally, we will work with a single attribute, namely each individual’s militant group affiliation (i.e., “Primary.Group.Affiliation”), to demonstrate selected techniques in igraph. The file containing the attributes is Noordin_Attribute.csv.
For more information on the comprehensive Noordin data set, see Cunningham, Everton, and Murphy (2016).
Let’s go ahead and import our network data and convert it to a “graph object” so that we can work with it in igraph. As previously stated, the data set is stored as an edge list, so we’ll bring it in using the following:
noordin_df <- as.data.frame(read.csv(file="data/Noordin_Edgelist.csv", header=TRUE))
As we did in other tutorials, we can use head() to take a look at the first few observations in the newly created data frame.
head(noordin_df)
## from to Relationship
## 1 Abdul Rohim Noordin Mohammed Top Combined
## 2 Abu Dujanah Amrozi Combined
## 3 Abu Dujanah Azhari Husin Combined
## 4 Abu Dujanah Dulmatin Combined
## 5 Abu Dujanah Fathur Rahman Al- Ghozi Combined
## 6 Abu Dujanah Hambali Combined
The noordin_df data frame has three columns: “from”, “to”, and “Relationship.” The first observation (row one) represents the existence of a Combined (i.e., an operational, communication, or trust) relationship between Abdul Rohim and Noordin Top.
We will use the first two columns (hence the [1:2] below) to create a social network graph using igraph’s graph_from_edgelist() function. This function creates an igraph class object that can be used for graphing or statistical analysis (here we call the object noordin_g). Also note the embedded as.matrix() function; we use it here to convert the edge list into a matrix first because the graph_from_edgelist() function requires a matrix.
noordin_g <- graph_from_edgelist(as.matrix(noordin_df[1:2]), directed=F)
We now have our network data stored as a graph object and can start exploring it.
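The conversion step can be tried on a small example first. The sketch below uses a made-up three-row edge list shaped like noordin_df (`toy_df`, `toy_el`, and `toy_g` are illustrative names, not part of the Noordin data) and then checks the result with a few igraph helpers.

```r
library(igraph)

# Hypothetical three-column edge list shaped like noordin_df.
toy_df <- data.frame(
  from = c("A", "A", "B"),
  to = c("B", "C", "C"),
  Relationship = "Combined"
)

# graph_from_edgelist() wants a two-column matrix, so keep only from/to.
toy_el <- as.matrix(toy_df[, c("from", "to")])
toy_g <- graph_from_edgelist(toy_el, directed = FALSE)

vcount(toy_g)       # number of nodes
ecount(toy_g)       # number of ties
is_directed(toy_g)  # FALSE for an undirected graph
```

If you would rather keep extra columns such as “Relationship” as edge attributes, igraph’s graph_from_data_frame() accepts the data frame directly.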
Before diving into our questions, let’s get acquainted with the data set by calculating some rudimentary statistics as well as creating some basic visualizations in igraph. In terms of the former, the summary() function gives us a basic description of the network. The “U” in “UN--” tells us our data are undirected (the “N” indicates that the nodes are named), and the “30 194” tells us we have 30 nodes and 194 relations among them. We will import attributes shortly.
summary(noordin_g)
## IGRAPH e845537 UN-- 30 194 --
## + attr: name (v/c)
We can get additional information using the E() and V() functions, which give us a list of all the relationships (i.e., “E” is for edges) and nodes (i.e., “V” is for vertices) in the data set.
E(noordin_g)
## + 194/194 edges from e845537 (vertex names):
## [1] Abdul Rohim --Noordin Mohammed Top
## [2] Abu Dujanah --Amrozi
## [3] Abu Dujanah --Azhari Husin
## [4] Abu Dujanah --Dulmatin
## [5] Abu Dujanah --Fathur Rahman Al- Ghozi
## [6] Abu Dujanah --Hambali
## [7] Abu Dujanah --Ismail1
## [8] Noordin Mohammed Top--Abu Dujanah
## [9] Noordin Mohammed Top--Ahmad Basyir
## [10] Ali --Aris Munandar
## + ... omitted several edges
V(noordin_g)
## + 30/30 vertices, named, from e845537:
## [1] Abdul Rohim Noordin Mohammed Top
## [3] Abu Dujanah Amrozi
## [5] Azhari Husin Dulmatin
## [7] Fathur Rahman Al- Ghozi Hambali
## [9] Ismail1 Ahmad Basyir
## [11] Ali Aris Munandar
## [13] Dani Chandra Hilman
## [15] Muchtar Salman
## [17] Umar2 Zulkarnaen
## [19] Umar Patek Apuy
## + ... omitted several vertices
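Before computing any statistics, it can be worth confirming that each tie appears only once. Edge lists sometimes record an undirected tie in both directions (A to B and B to A), and graph_from_edgelist() keeps both rows as parallel edges, which inflates edge counts, density, and degree. Whether Noordin_Edgelist.csv is affected cannot be determined from the output shown here, so the sketch below uses a made-up three-row edge list (`toy_el` and `toy_g` are illustrative names) to show one way to check.

```r
library(igraph)

# Toy edge list in which the A-B tie is recorded twice (A-B and B-A).
toy_el <- rbind(c("A", "B"), c("B", "A"), c("B", "C"))
toy_g <- graph_from_edgelist(toy_el, directed = FALSE)

any_multiple(toy_g)   # TRUE if any pair of nodes is joined more than once
ecount(toy_g)         # 3: the duplicate is counted as its own edge

# simplify() collapses parallel edges (and drops self-loops).
toy_simple <- simplify(toy_g, remove.multiple = TRUE, remove.loops = TRUE)
ecount(toy_simple)    # 2
```

Running any_multiple(noordin_g) is a quick way to apply the same check to the tutorial data. One hint worth following up: every degree score printed later in this tutorial is an even number, which is a pattern that doubled ties would produce.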
Let’s plot the network in igraph with a few basic aesthetics. The plot() function allows us to visualize the network, while vertex.color customizes the color of the nodes, vertex.label.color changes the node label color, and edge.curved depicts ties in a curved format. We can add a title with the main parameter.
plot(noordin_g, vertex.color = "lightblue", vertex.label.color = "black",
edge.curved = 0.2, main = "Noordin Top Network (Aug 2003)" )
Figure 1: Simple plot of Noordin’s Network in August 2003
The parameters seen in the previous step can be a bit confusing at first if you’re not yet comfortable with R. Table 2 provides a summary of commonly used plotting parameters. See ?igraph.plotting, Katya Ognyanova’s excellent tutorial on SNA in igraph (https://kateto.net/networks-r-igraph), or igraph’s website (https://igraph.org/r/) for an exhaustive list.
Parameter | Short Description |
---|---|
vertex.color | Adjusts node color. |
vertex.size | Parameter for node size. Default is 15. |
vertex.shape | Parameter for node shape (e.g., “sphere”). |
vertex.label | Parameter for adjusting and setting node labels. |
vertex.label.font | Parameter for node font. Font: 1=plain, 2=bold, 3=italic, 4=bold italic, 5=symbol. |
vertex.label.family | Adjusts font family. |
vertex.label.cex | Parameter for changing font size. |
edge.color | Parameter for setting edge color. |
edge.width | Sets edge width (default = 1). |
edge.arrow.size | Sets edge arrow size (default = 1). |
edge.arrow.mode | Sets arrow aesthetics: 0=no arrow, 1=back, 2=forward, 3=both. |
edge.curved | Edge curvature (ranges from 0-1). |
Let’s do a few more aesthetic changes using some of the parameters shown in Table 2.
plot(noordin_g, vertex.color = "lightblue", vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0, main = "Noordin Network (Aug 2003)")
Figure 2: Simple plot of Noordin’s Network in August 2003
Now, let’s change the layout so we can see the structure a bit differently. Here’s a circular layout, which, like all layouts, can be created in two different ways. The first way is to create a layout object separately and then pass that object to the plot() function.
lay1<-layout_in_circle(noordin_g)
plot(noordin_g, vertex.color = "lightblue", vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0,
main = "Noordin Network (Aug 2003)", layout=lay1)
Figure 3: Circular plot of Noordin’s Network in August 2003
The second option is to set the layout type directly within the plot() function.
plot(noordin_g, vertex.color = "lightblue", vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0,
main = "Noordin Network (Aug 2003)", layout = layout_in_circle)
Figure 4: Circular plot of Noordin’s Network in August 2003
We can compare various layouts using the code below. The par(mfrow=c(2,2)) call tells R to arrange the next four plots in a grid of two rows and two columns.
par(mfrow=c(2,2))
plot(noordin_g, vertex.color = "lightblue", vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0,
main = "Noordin Network (Aug 2003)", layout = layout_in_circle)# A circular layout
plot(noordin_g, vertex.color = "lightblue", vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0,
main = "Noordin Network (Aug 2003)", layout = layout_on_sphere)#A spherical layout
plot(noordin_g, vertex.color = "lightblue", vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0,
main = "Noordin Network (Aug 2003)", layout = layout_with_kk)# A spring embedded layout (kk = kamada kawai)
plot(noordin_g, vertex.color = "lightblue", vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0,
main = "Noordin Network (Aug 2003)", layout = layout_with_fr)# A force directed layout (fr = fruchterman and reingold)
Figure 5: Multiple Plots of Noordin’s Network in August 2003
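The four plot() calls above differ only in their layout, so the same comparison can be written as a loop. This is a sketch rather than the code used to produce Figure 5: it uses igraph’s built-in Zachary karate club graph as a stand-in for noordin_g, and `layouts` and `coords` are illustrative names.

```r
library(igraph)

g <- make_graph("Zachary")  # stand-in graph; swap in noordin_g

layouts <- list(
  "Circle" = layout_in_circle,
  "Sphere" = layout_on_sphere,
  "Kamada-Kawai" = layout_with_kk,
  "Fruchterman-Reingold" = layout_with_fr
)

# Compute coordinates once per layout so each panel can be reproduced.
coords <- lapply(layouts, function(layout_fun) layout_fun(g))

op <- par(mfrow = c(2, 2), mar = c(1, 1, 2, 1))
for (nm in names(coords)) {
  plot(g, vertex.color = "lightblue", vertex.size = 10, vertex.label = NA,
       edge.color = "gray", main = nm, layout = coords[[nm]])
}
par(op)  # restore the previous graphics settings
```

Saving the result of par() and restoring it afterwards keeps later plots from inheriting the 2 x 2 grid.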
Let’s bring in the “Primary Group Affiliation” attribute to give a little context to our data set.
atts<-as.data.frame(read.csv(file="data/Noordin_Attribute.csv", header=TRUE))
Take a quick look at the data if you haven’t already done so.
head(atts)
## id Primary.Group.Affiliation
## 1 Abdul Rohim Unaffiliated
## 2 Noordin Mohammed Top Jemaah Islamiyah
## 3 Abu Dujanah Jemaah Islamiyah
## 4 Amrozi Jemaah Islamiyah
## 5 Azhari Husin Jemaah Islamiyah
## 6 Dulmatin Jemaah Islamiyah
We will create a new node attribute called “Group” to represent our “Primary.Group.Affiliation” attribute. We can do this by extracting each value within the “Primary.Group.Affiliation” column when the name in the “id” column matches a node’s name in our network object (i.e., noordin_g).
V(noordin_g)$Group<-as.character(atts$Primary.Group.Affiliation[match(V(noordin_g)$name,atts$id)])
Print out the new vertex attribute using the following code.
V(noordin_g)$Group
## [1] "Unaffiliated" "Jemaah Islamiyah" "Jemaah Islamiyah"
## [4] "Jemaah Islamiyah" "Jemaah Islamiyah" "Jemaah Islamiyah"
## [7] "Jemaah Islamiyah" "Jemaah Islamiyah" "Unaffiliated"
## [10] "KOMPAK" "Unaffiliated" "KOMPAK"
## [13] "KOMPAK" "Darul Islam" "Jemaah Islamiyah"
## [16] "KOMPAK" "Darul Islam" "Jemaah Islamiyah"
## [19] "Jemaah Islamiyah" "Darul Islam" "Darul Islam"
## [22] "Unaffiliated" "Jemaah Islamiyah" "Jemaah Islamiyah"
## [25] "Jemaah Islamiyah" "Jemaah Islamiyah" "Darul Islam"
## [28] "Unaffiliated" "Unaffiliated" "Unaffiliated"
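Because match() does the alignment work in the step above, it may help to see it on a small example. The sketch below builds a made-up graph and attribute table whose rows are deliberately in a different order than the graph’s nodes (`toy_g`, `toy_atts`, and `idx` are illustrative names), and adds a check that every node was found.

```r
library(igraph)

toy_g <- graph_from_edgelist(rbind(c("B", "C"), c("C", "A")), directed = FALSE)

# Hypothetical attribute table, deliberately in a different order than V(toy_g).
toy_atts <- data.frame(
  id = c("A", "B", "C"),
  Primary.Group.Affiliation = c("KOMPAK", "Unaffiliated", "Darul Islam")
)

# match() returns, for each node name, the row of toy_atts holding that id.
idx <- match(V(toy_g)$name, toy_atts$id)
stopifnot(!anyNA(idx))  # every node should be found in the attribute table

V(toy_g)$Group <- as.character(toy_atts$Primary.Group.Affiliation[idx])
data.frame(name = V(toy_g)$name, Group = V(toy_g)$Group)
```

An NA in `idx` would mean a node name has no matching “id”, which is often caused by a spelling or whitespace difference between the two files.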
We are now ready to build upon our previous visualizations by coloring the nodes by their group affiliation. We will assign colors to each group category and create a “color” attribute to which we can refer back when we want to color nodes by militant group affiliation. The gsub() function helps us do this.
V(noordin_g)$color<-V(noordin_g)$Group #First, assign the "Primary.Group.Affiliation" attribute as the vertex color.
V(noordin_g)$color<-gsub("Unaffiliated","orange",V(noordin_g)$color) #Unaffiliated will be orange.
V(noordin_g)$color<-gsub("KOMPAK","red",V(noordin_g)$color) #KOMPAK nodes will be red.
V(noordin_g)$color<-gsub("Jemaah Islamiyah","blue",V(noordin_g)$color) #JI nodes will be blue.
V(noordin_g)$color<-gsub("Darul Islam","green",V(noordin_g)$color) #DI nodes will be green.
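An alternative to the chain of gsub() calls is a named lookup vector, sketched below with the same palette. The `groups` vector is a made-up stand-in for V(noordin_g)$Group, and `group_colors` and `node_colors` are illustrative names.

```r
# Named vector mapping each affiliation to a color (same palette as above).
group_colors <- c(
  "Unaffiliated" = "orange",
  "KOMPAK" = "red",
  "Jemaah Islamiyah" = "blue",
  "Darul Islam" = "green"
)

# Hypothetical group labels standing in for V(noordin_g)$Group.
groups <- c("KOMPAK", "Jemaah Islamiyah", "Unaffiliated", "Darul Islam", "KOMPAK")

# Index the lookup by label; unname() drops the labels from the result.
node_colors <- unname(group_colors[groups])
node_colors
```

One design difference: a label that is missing from the lookup comes back as NA, whereas gsub() would leave the group name in place as if it were a color.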
Here we will keep things simple and use a single layout, Kamada-Kawai, as we did in Figure 5.
plot(noordin_g,vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0,
main = "Noordin Network (Aug 2003)", layout = layout_with_kk)
Figure 6: Noordin’s Network in August 2003, Node Color by Group
Now, plot the same visualization but with a legend.
plot(noordin_g,vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0,
main = "Noordin Network (Aug 2003)", layout = layout_with_kk)
colrs<-c("orange", "red", "blue", "green")# We will use these colors in the legend; they match the colors in Figure 6.
legend(x=-1.5, y=-1.1, c("Unaffiliated","KOMPAK", "JI", "DI"), pch=21,
pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1) # The pt.bg assigns the colors to the appropriate group while other parameters set the size, text, and location of the legend.
Figure 7: Noordin’s Network in August 2003, Node Color by Group
The first set of questions we hope to explore is, how interconnected was Noordin’s network prior to the attack? Did Noordin’s network appear to be structured around him? Or was the network decentralized?
Based on these questions, we will utilize a handful of metrics that fall under the umbrella of network topography. Table 3 (adapted from Cunningham, Everton, and Murphy (2016)) outlines relevant measures, including a definition, potential ways to interpret the results, and caveats to keep in mind when using them.
Measure | Definition | Interpretation | Caveat |
---|---|---|---|
Density | The total number of ties in a network divided by the total possible number of ties in that network. The output ranges from 0 to 1. | Indicates how interconnected a network is, which sheds light on potential trade-offs a network may have to consider (e.g., efficiency vs. operational security). For example, a dense network, with a focus on strong ties, may have a hard time getting resources from the outside. | Should not be used to compare networks of different sizes. |
Average Degree | The sum of every actor’s ties (i.e., degree) divided by the number of actors in the network. | May indicate how interconnected a network is, which sheds light on potential trade-offs. | Networks that adopt a cell-like structure can be locally dense but globally sparse. |
Centralization | The ratio of the actual sum of differences in actor centrality to the theoretical maximum, yielding a score between 0 and 1. | Centralization indicates how centralized, or decentralized, a network is. A network with high degree centralization could indicate that one or a few actors are relatively active compared to the rest of the actors. | Can be confused with centrality, which is a node-level measure. |
We can calculate all of these with igraph functions. First, let’s calculate graph density.
edge_density(noordin_g)
## [1] 0.445977
A graph density of about 0.45 suggests the network is neither especially sparse nor especially dense. We can see a relatively dense core, but several peripheral actors maintained only a few ties to others during that period of time. Now, let’s take a look at average degree. There is no dedicated function for it, so we calculate the degree of every node and then take the mean.
mean(degree(noordin_g))
## [1] 12.93333
This result tells us that, on average, actors have about 12.93 connections to others in the network. From these two measures of interconnectedness/cohesion (i.e., density and average degree), it appears Noordin kept many people close for operational purposes but also maintained direct and indirect connections to “outsiders” who could provide resources and operational guidance to him and his close associates. Existing research into this network suggests a similar pattern (Everton and Roberts (2011); Everton and Cunningham (2016)).
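Both values can be reproduced by hand from the node and edge counts that summary(noordin_g) printed earlier (30 nodes, 194 edges). The sketch below is only a worked check of the definitions in Table 3; the variable names are illustrative.

```r
n <- 30    # nodes reported by summary(noordin_g)
m <- 194   # edges reported by summary(noordin_g)

# Density of an undirected graph: edges observed / edges possible.
density_by_hand <- m / (n * (n - 1) / 2)

# Average degree: every edge adds one to the degree of each endpoint.
avg_degree_by_hand <- 2 * m / n

round(density_by_hand, 6)     # 0.445977
round(avg_degree_by_hand, 5)  # 12.93333
```

Both measures depend directly on the edge count, so any change to the number of edges in the graph object changes them proportionally.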
To explore our final question regarding centralization, we can use igraph’s centralization.degree() function. Note we included the mode="total" argument because our network is undirected. The results will tell us each node’s number of connections (i.e., degree centrality) under $res.
centralization.degree(noordin_g, mode="total", normalized=TRUE)
## $res
## [1] 2 28 14 18 26 26 20 22 20 2 14 16 14 14 14 22 14 24 14 6 8 2 8
## [24] 8 10 14 2 2 2 2
##
## $centralization
## [1] 0.5195402
##
## $theoretical_max
## [1] 870
A centralization score of roughly 0.52 indicates the network was moderately centralized prior to its first attack. As with all topographic measures, interpreting the result is tricky without comparing it to similar structures, such as the same network at other points in time. Everton and Cunningham (2016) did just that and found that while Noordin’s network was not substantially centralized in August 2003, it became increasingly centralized over time and prior to its major terrorist attacks.
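The centralization score can be reconstructed from the other two pieces of the output, which makes the definition in Table 3 concrete. The sketch below copies the degree scores printed under $res and the value under $theoretical_max; `deg` and `observed` are illustrative names.

```r
# Degree scores copied from the $res output above.
deg <- c(2, 28, 14, 18, 26, 26, 20, 22, 20, 2, 14, 16, 14, 14, 14, 22, 14,
         24, 14, 6, 8, 2, 8, 8, 10, 14, 2, 2, 2, 2)

# Sum of the gaps between the most central actor and everyone else...
observed <- sum(max(deg) - deg)

# ...divided by the theoretical maximum reported under $theoretical_max.
theoretical_max <- 870
observed / theoretical_max   # 0.5195402
```

Recent igraph releases document this function under the name centr_degree(); as far as we know, centralization.degree() is the older name for the same calculation.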
The next question we want to explore is, were there clusters consisting of individuals from various groups? How can we describe them? This question is interesting because we know that Noordin’s network was comprised of individuals from various militant groups across the region.
As with network topography, we have many options to choose from in terms of subgroup analysis. Here we will look at a sample of those available. Table 4 (adapted from Cunningham, Everton, and Murphy (2016)) outlines relevant measures, including a definition, potential ways to interpret the results, and caveats to keep in mind when using them.
Measure | Definition | Interpretation | Caveat |
---|---|---|---|
Walktrap | An agglomerative clustering approach that models a random walker who would tend to remain in dense parts of a network (i.e., communities) since there are fewer paths out than within. Actors are merged into subgroups according to their similarity, estimated through random walks. | Helps analysts identify larger communities, or relatively dense clusters, within dark networks, which highlights potential seams, or vulnerabilities, between those communities. | Tends to exhibit better sensitivity in dense networks than other community detection models, except for Spinglass. |
Girvan-Newman | Similar to faction analysis in that subgroups are defined as having more ties within and fewer ties between groups than would be expected in a random graph of the same size with the same number of ties. Focuses on edge betweenness. | Helps analysts identify larger communities, or relatively dense clusters, within dark networks, which highlights potential seams, or vulnerabilities, between those communities. | Calculated differently than other community detection algorithms because it begins an iterative process by calculating edge betweenness and subsequently removing the tie with the highest score. Although the approach is intuitive, it tends to exhibit poor sensitivity with dense networks. |
In terms of writing scripts, running these algorithms is fairly straightforward. For both types of subgroups, we will use the appropriate function and then estimate a modularity score, which compares the ties within and across subgroups (i.e., clusters) to what one would expect in a random graph of the same size and having the same number of ties.
cw<-cluster_walktrap(noordin_g)#Walktrap
modularity(cw)
## [1] 0.3814964
eb <- cluster_edge_betweenness(noordin_g)#Girvan Newman
modularity(eb)
## [1] 0.3593899
We will add one item here for the Girvan-Newman algorithm, namely membership(), which tells us the “community” to which each actor belongs according to the number of clusters that provides the highest possible modularity score.
membership(eb)
There is some debate as to what constitutes a “good” modularity score. A discussion of this topic is beyond the scope of this tutorial, so we will go with walktrap because its modularity suggests it does a bit better on this network than Girvan-Newman, but not by much.
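When comparing algorithms, it can help to line up the number of subgroups and the modularity score side by side. The sketch below does this on igraph’s built-in Zachary karate club graph as a stand-in for noordin_g, so its numbers are not the Noordin results; `comparison` is an illustrative name.

```r
library(igraph)

g <- make_graph("Zachary")  # stand-in for noordin_g

cw <- cluster_walktrap(g)
eb <- cluster_edge_betweenness(g)

# Collect the number of subgroups and the modularity for each algorithm.
comparison <- data.frame(
  algorithm = c("Walktrap", "Girvan-Newman"),
  n_groups = c(length(cw), length(eb)),
  modularity = round(c(modularity(cw), modularity(eb)), 3)
)
comparison

sizes(cw)  # how many actors fall in each Walktrap subgroup
```

Looking at subgroup sizes alongside modularity guards against picking a solution that scores well but consists mostly of one large cluster and a few singletons.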
Let’s take a look at the network’s subgroups based on walktrap using convex hulls, which can provide us with a nice visual depiction of them. Here we can compare our visual in Figure 6 depicting each individual’s affiliation with the subgroup to which they belong.
op<-par(mfrow = c(1,2))# Multiple plots with 1 row and 2 columns
## Subgroups
plot(cw, noordin_g,vertex.size = 10, vertex.color = "lightgray",vertex.shape = "sphere", vertex.label = NA, vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0, main = "Subgroups", layout = layout_with_kk)
##Figure 6: Group Affiliation
plot(noordin_g,vertex.size = 10, vertex.shape = "sphere", vertex.label = NA,
vertex.label.cex=0.75, edge.color = "gray", edge.arrow.size = 0, edge.curved = 0, main = "Group Affiliation", layout = layout_with_kk)
colrs<-c("orange", "red", "blue", "green")
legend(x=-1.5, y=-1.1, c("Unaffiliated","KOMPAK", "JI", "DI"), pch=21,
pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
Figure 8: Noordin’s Network in August 2003, Subgroups vs. Affiliations
Figure 8 suggests we have clusters containing individuals from different militant groups. For instance, when we compare the two panels, we can see that the nodes within the blue convex hull (i.e., the top left subgroup) include representatives from all four affiliations (i.e., Unaffiliated, KOMPAK, JI, and DI). At the same time, the subgroup represented by the green convex hull (i.e., middle) is made up mostly of JI members who appear to make up the core of the network.
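The visual comparison can be backed up with a cross-tabulation of subgroup membership against affiliation. The sketch below is illustrative only: it uses the Zachary karate club graph as a stand-in for noordin_g and assigns made-up affiliations in rotation, so the counts say nothing about Noordin’s network.

```r
library(igraph)

g <- make_graph("Zachary")  # stand-in for noordin_g
# Made-up affiliations, assigned in rotation purely for illustration.
V(g)$Group <- rep(c("Unaffiliated", "KOMPAK", "JI", "DI"), length.out = vcount(g))

cw <- cluster_walktrap(g)

# Rows are Walktrap subgroups, columns are affiliations.
mix <- table(Subgroup = as.integer(membership(cw)), Group = V(g)$Group)
mix

# Number of distinct affiliations represented in each subgroup.
rowSums(mix > 0)
```

With the tutorial objects, the same table would be built from membership(cw) and V(noordin_g)$Group.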
The final set of questions we want to examine leads us to centrality and brokerage. Specifically, who were key individuals in the operation from a structural perspective besides Noordin? Also, who were the key facilitators of information during the operation?
The centrality metric family is perhaps the most intuitive and the most commonly used. The basic idea is to identify structurally “important” actors. However, the variety of interpretations of what it means to be central or “important” means that no single measure can be used as a “silver bullet” in SNA. Instead, analysts should focus on using these measures to describe the potential importance of each actor in the network.
Some of the most relevant measures are:
Measure | Definition | Interpretation | Caveat |
---|---|---|---|
Degree | Count of an actor’s ties. | Actor activity; Direct power or influence, or ability to be influenced by others | In some cases, well-connected actors are the result of biased collection. |
Betweenness | How often each actor lies on the shortest path between all pairs of actors. | Brokerage potential; Gatekeepers; Boundary Spanners | Betweenness assumes a desire for efficiency. Actors, resources, and information may not always follow shortest paths. |
Closeness | The inverse of the average shortest path (i.e., geodesic) distance from an actor to every other actor in the network. | Actor levels of accessibility to others, and to material and non-material goods. | Not designed for use with disconnected networks. |
Eigenvector | Weights an actor’s degree centrality by the degree centrality of its neighbors. | Indirect influence or power; Potential social capital. | In well-connected networks (or sub-networks, such as cliques), it is often difficult to identify a single, or a few, potentially powerful actors. |
We can calculate each measure individually using the following scripts. Note we did not provide the outputs for each function but we will for the dynamic table.
degree(noordin_g, mode = "total", loops=F)# Active Individuals
betweenness(noordin_g, directed = F, normalized = T)# Potential Brokers
eigen_centrality(noordin_g, directed = F, weights = NULL)# People connected to well-connected others
closeness(noordin_g, mode = "all", weights = NULL, normalized = T)# People with potential access to others, materials, etc.
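To build intuition for what these four functions return, it can help to run them on a network where the answer is obvious. The sketch below uses a five-node star (a toy graph, not the Noordin data), where node 1 should come out on top of every measure.

```r
library(igraph)

# Five-node star: node 1 is tied to everyone, the others only to node 1.
star <- make_star(5, mode = "undirected")

deg <- degree(star)
btw <- betweenness(star, directed = FALSE, normalized = TRUE)
clo <- closeness(star, normalized = TRUE)
# eigen_centrality() returns a list; the scores live in its $vector element.
eig <- eigen_centrality(star)$vector

round(data.frame(deg, btw, clo, eig), 2)
```

Note that eigen_centrality() is the only one of the four that returns a list rather than a plain vector, which matters when the scores are placed in a data frame.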
Another option is to calculate each measure, attach it to a data frame, and render it as variables in a dynamic table using the DT package. First, let’s recalculate centrality and put them into a data frame.
metrics<-data.frame(id = V(noordin_g)$name,
Degree = degree(noordin_g,
mode="total",
loops=FALSE,
normalized = FALSE),
Betweenness = round(betweenness(noordin_g,
directed = F,
weights = NULL,
normalized = T),
digits = 2),
Eigenvector = round(eigen_centrality(noordin_g,
directed=F,
weights = NULL)$vector, # eigen_centrality() returns a list; keep only the scores
digits = 2),
Closeness = round(closeness(noordin_g,
mode="total",
weights = NULL,
normalized=T),
digits = 2))
Now let’s put them in a dynamic table so we can interact with the results.
DT::datatable(metrics %>%
arrange(desc(Degree))%>%
select(one_of(c("id","Degree", "Betweenness", "Eigenvector", "Closeness"))),
class = 'cell-border stripe',
rownames = FALSE,
filter="top",
selection="multiple",
escape=FALSE,
options=list(scrollX=TRUE,
pageLength=10,
sDom='<"top">lrt<"bottom">ip')
)
Table 6: Noordin Network August 2003 - Centrality
The centrality results suggest several individuals were key actors in the network during mid-2003. According to multiple measures, we can see that people such as Azhari Husin, Dulmatin, Zulkarnaen, Ismail1, and Fathur Rahman Al-Ghozi (to name a few) all were structurally important individuals. In fact, much has been written about these individuals and the roles they played in numerous terrorist attacks and plots. Because this is a simple demonstration, we will limit our interpretation to that. However, centrality is by no means the end of an analysis but rather a set of indicators about which actors we should analyze more deeply.
With R you have many options to produce interactive visualizations for your reports (e.g., Markdown, which is what you’re looking at) 6, briefs (e.g., Reveal JS 7 and Xaringan 8), and/or interactive tools/dashboards (e.g., R Shiny 9 and flexdashboard 10). An in-depth tutorial of these options is beyond the scope of this write-up, but we highly recommend that you explore these options as you become more comfortable with R.
One package that is useful for interactive social network visualizations is visNetwork. Unfortunately, this package is limited in terms of available statistics to analyze networks of interest. In fact, it does not offer users the ability to leverage the metric families we’ve discussed here; however, we can run actor-level measures (e.g., centrality and brokerage) in other programs, store the results as actor attributes, and then use visual properties (e.g., size) to interact with our data.
As with the other packages we’ve used so far, we will keep things simple and build only a few of the same visualizations from above. This time, however, we will include some interactivity in our visualizations.
Several ways exist for you to get your data into visNetwork depending on your starting point. In this example, we can take our igraph object and send it directly to visNetwork. We can do this in at least two ways (i.e., toVisNetworkData() or visIgraph()). Note there are tradeoffs for each option; the one used below, visIgraph(), maintains the colors we produced in igraph.
visIgraph(noordin_g)
Figure 9: Noordin’s Network in August 2003, visNetwork to Igraph
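The other route mentioned above, toVisNetworkData(), can be sketched as follows. The example uses a made-up three-node graph (`toy_g` and `vis_data` are illustrative names) and shows the intermediate data frames that you can inspect or edit before rendering.

```r
library(igraph)
library(visNetwork)

toy_g <- graph_from_edgelist(rbind(c("A", "B"), c("B", "C")), directed = FALSE)
V(toy_g)$color <- c("orange", "blue", "green")  # colors stored on the igraph object

# Split the igraph object into node and edge data frames for visNetwork.
vis_data <- toVisNetworkData(toy_g)
names(vis_data)        # the list holds a nodes and an edges data frame
head(vis_data$nodes)   # one row per node, with node attributes as columns

visNetwork(nodes = vis_data$nodes, edges = vis_data$edges)
```

The extra step costs a line of code, but it gives you ordinary data frames to modify (for example, to add a size or title column) before the network is drawn.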
We recognize that you may not start from igraph but rather begin with a node list and an edge list in CSV format. We will focus on this approach here.
Let’s re-import our Noordin network edge list and attribute file one more time as if we were starting from scratch. Note that we will bring in a slightly different version of our attribute file for this portion of the tutorial; we’ve added degree centrality scores to our attribute file so we can work with some additional visual properties.
edges <- as.data.frame(read.csv(file="data/Noordin_Edgelist.csv", header=TRUE))
nodes <- as.data.frame(read.csv(file="data/Noordin_Attribute (visNet).csv", header=TRUE))
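visNetwork expects the nodes data frame to contain an id column and the edges data frame to contain from and to columns, so a quick sanity check after importing can catch mismatches early. The sketch below uses small made-up data frames (nodes_toy and edges_toy are hypothetical, not the Noordin files), and the helper name check_vis_input is our own invention for illustration.

```r
# Hypothetical stand-ins for the imported node and edge lists.
nodes_toy <- data.frame(id = c("A", "B", "C"), stringsAsFactors = FALSE)
edges_toy <- data.frame(from = c("A", "B"), to = c("B", "D"),
                        stringsAsFactors = FALSE)

# Illustrative helper: confirm the required columns exist, then return
# any edge endpoints that do not appear in the node list.
check_vis_input <- function(nodes, edges) {
  stopifnot("id" %in% names(nodes),
            all(c("from", "to") %in% names(edges)))
  setdiff(unique(c(edges$from, edges$to)), nodes$id)
}

missing_ids <- check_vis_input(nodes_toy, edges_toy)
missing_ids
```

Here the check reports "D", an actor that appears in the edge list but not in the node list. Ties to such actors may not render as expected, so it is worth resolving them before plotting.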
Let’s visualize the network using a few basic lines of script in which we tell visNetwork what the nodes and edges are as well as the layout type.
visNetwork::visNetwork(nodes = nodes,
                       edges = edges) %>%
  visNetwork::visPhysics(enabled = TRUE,
                         solver = "forceAtlas2Based")
Figure 10: Noordin’s Network in August 2003
We can adjust each node’s color, shape, size, label, and title. To do so, we add these variables to the nodes data frame.
#First resize all nodes to reflect degree centrality.
nodes$size <- nodes$Degree
#Now reshape the vertices
nodes$shape <- "dot"
#Adjust node color based on group affiliation
nodes$color[nodes$Primary.Group.Affiliation =="Unaffiliated"] <- "orange"
nodes$color[nodes$Primary.Group.Affiliation=="KOMPAK"] <- "red"
nodes$color[nodes$Primary.Group.Affiliation=="Jemaah Islamiyah"] <- "blue"
nodes$color[nodes$Primary.Group.Affiliation=="Darul Islam"] <- "green"
#In order to reduce the number of colors in one visualization, recolor all edges to the same color.
edges$color <- "slategrey"
#Remove labels and add titles
nodes$label <- ""
nodes$title <- nodes$id
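The same styling can be written more compactly with a lookup vector, and degree scores can be rescaled so that low-degree actors remain visible. This is only a sketch: toy_nodes is made-up data that borrows the column names used above, and the 10 to 40 size range and the grey fallback color are arbitrary choices of ours.

```r
# Made-up attribute table using the same column names as the tutorial data.
toy_nodes <- data.frame(
  id = c("a", "b", "c", "d"),
  Primary.Group.Affiliation = c("Unaffiliated", "KOMPAK",
                                "Jemaah Islamiyah", "Other Group"),
  Degree = c(1, 5, 9, 3),
  stringsAsFactors = FALSE
)

# Named vector acting as an affiliation-to-color lookup table.
group_colors <- c("Unaffiliated"     = "orange",
                  "KOMPAK"           = "red",
                  "Jemaah Islamiyah" = "blue",
                  "Darul Islam"      = "green")

# Look up each actor's color; groups missing from the table become grey.
toy_nodes$color <- unname(
  group_colors[as.character(toy_nodes$Primary.Group.Affiliation)]
)
toy_nodes$color[is.na(toy_nodes$color)] <- "grey"

# Rescale degree to the range 10-40 (assumes degrees are not all equal).
deg_range <- range(toy_nodes$Degree)
toy_nodes$size <- 10 + 30 * (toy_nodes$Degree - deg_range[1]) / diff(deg_range)
```

A lookup vector keeps the group-to-color mapping in one place, which makes it easier to reuse the same palette in a legend.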
We can now re-render the network with the edits we just made.
visNetwork::visNetwork(nodes = nodes,
                       edges = edges) %>%
  visNetwork::visPhysics(enabled = TRUE,
                         solver = "forceAtlas2Based")
Figure 11: Noordin’s Network in August 2003
Let’s add the ability to see actor attributes when we hover over the nodes.
nodes$title <- paste("<b>Name: </b>", nodes$id, "<br>",
"<b>Affiliation: </b>", nodes$Primary.Group.Affiliation, "<br>")
Also, let’s use the width = "100%" argument to maximize the visualization window, the main = argument to add a title, and the visLegend() function to add a legend.
visNetwork::visNetwork(nodes = nodes,
                       edges = edges, width = "100%", main = "Noordin Network (Aug 2003)") %>%
  visNetwork::visPhysics(enabled = TRUE,
                         solver = "forceAtlas2Based") %>%
  visNetwork::visLegend(addNodes = list(
    list(label = "Unaffiliated", shape = "square", size = 5, color = "orange"),
    list(label = "KOMPAK", shape = "square", size = 5, color = "red"),
    list(label = "JI", shape = "square", size = 5, color = "blue"),
    list(label = "DI", shape = "square", size = 5, color = "green")),
    useGroups = FALSE,
    position = "left")
Figure 12: Noordin’s Network in August 2003, Affiliations
Finally, let’s add a drop-down menu to select nodes based on id (visOptions(nodesIdSelection = TRUE)).
visNetwork::visNetwork(nodes = nodes,
                       edges = edges, width = "100%", main = "Noordin Network (Aug 2003)") %>%
  visNetwork::visPhysics(enabled = TRUE,
                         solver = "forceAtlas2Based") %>%
  visNetwork::visLegend(addNodes = list(
    list(label = "Unaffiliated", shape = "square", size = 10, color = "orange"),
    list(label = "KOMPAK", shape = "square", size = 10, color = "red"),
    list(label = "JI", shape = "square", size = 10, color = "blue"),
    list(label = "DI", shape = "square", size = 10, color = "green")),
    useGroups = FALSE,
    position = "left") %>%
  visNetwork::visOptions(nodesIdSelection = TRUE, highlightNearest = TRUE)
Figure 13: Noordin’s Network in August 2003, Affiliations
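visOptions() can also build a drop-down from a column of the nodes data frame through its selectedBy argument, which would let readers highlight every actor in a given group. A minimal sketch on made-up data (toy_nodes, toy_edges, and the group column are illustrative only, not part of the Noordin files):

```r
library(visNetwork)

# Hypothetical node and edge lists with a grouping column.
toy_nodes <- data.frame(id = 1:4,
                        label = c("a", "b", "c", "d"),
                        group = c("JI", "JI", "KOMPAK", "DI"))
toy_edges <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))

# selectedBy adds a drop-down built from the values in the "group" column.
toy_widget <- visNetwork(toy_nodes, toy_edges, width = "100%") %>%
  visOptions(selectedBy = "group", highlightNearest = TRUE)
```

With the tutorial data, the same idea would presumably use the affiliation column in place of group.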
Remember, this tutorial is very basic and designed to get you interested in using R for SNA. Many useful resources exist that go into far more depth than this document. Here are a few recent resources pertaining to SNA in R to check out (many other great ones exist):
Below is a list of useful resources for those who want to learn SNA and who are seeking additional information. This list is certainly not exhaustive but it is a great place to start.
This tutorial did not cover an important process (well, several actually) that you may need to follow as you leverage SNA. The process of converting two-mode data (e.g., people connected to organizations, or accounts connected to comment threads) to one-mode data is an important step in many SNA-based investigations.
We will use a different data set to demonstrate this process, namely a two-mode network of hypothetical gang members and their affiliations to fictional gangs. The file is an edge list called “Affiliation.csv”.
We can convert this two-mode network to a one-mode network rather easily using igraph’s bipartite.mapping() and bipartite.projection() functions. First, create an affiliation graph using the graph_from_edgelist() function used previously.
affiliation <- as.data.frame(read.csv(file="data/Affiliation.csv", header=TRUE))
affiliationNet <- graph_from_edgelist(as.matrix(affiliation[1:2]), directed=F)
Now use the bipartite.mapping() function to evaluate whether the vertices of the network can be mapped to two sets of nodes. In essence, this function checks whether or not a graph is bipartite (i.e., two-mode).
bipartite.mapping(affiliationNet)
## $res
## [1] TRUE
##
## $type
## [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [34] FALSE FALSE FALSE FALSE
The bipartite.mapping() function returns two elements:
$res - TRUE indicates that the graph is bipartite, FALSE otherwise.
$type - a logical vector that assigns each vertex to one of the two node sets.
Once we have determined that a graph is two-mode, the “type” attribute can be assigned to each node in the network as follows:
V(affiliationNet)$type <- bipartite.mapping(affiliationNet)$type
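To see what the check does, consider two tiny made-up graphs: a person-to-gang edge list, which is bipartite, and a triangle, which is not. The sketch uses bipartite_mapping(), the underscore spelling of the same function in recent igraph releases. It also shows an alternative that assumes the second column of the edge list holds the organizations: setting type directly from that column, which avoids relying on how the mapping happens to label each node set.

```r
library(igraph)

# Hypothetical person-to-gang edge list (all names are made up).
toy_el <- matrix(c("Ann", "Gang1",
                   "Bob", "Gang1",
                   "Bob", "Gang2",
                   "Cy",  "Gang2"), ncol = 2, byrow = TRUE)
toy_net <- graph_from_edgelist(toy_el, directed = FALSE)

# People only tie to gangs, so the graph can be split into two node sets.
toy_map <- bipartite_mapping(toy_net)
toy_map$res

# A triangle is an odd cycle, so no two-set split exists.
triangle <- make_ring(3)
tri_map <- bipartite_mapping(triangle)
tri_map$res

# Alternative: mark organizations explicitly from the second column.
V(toy_net)$type <- V(toy_net)$name %in% toy_el[, 2]
```

Setting type explicitly also makes it clear which projection will contain people (type FALSE) and which will contain organizations (type TRUE).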
Now that “type” has been assigned as an attribute to each node in the graph, we can begin to manipulate the two-mode network. First, let’s graph it.
plot(affiliationNet,
layout=layout.bipartite,
vertex.size=5,
vertex.label=NA)
Figure 14: Gang Two-Mode/Bipartite Network
Now that we have mapped the network, we can use the bipartite.projection() function to calculate the actual one-mode projections. In other words, the bipartite.projection() function serves as the means to create two one-mode projections: one projection for person-to-person coaffiliation ties, and another for organization-to-organization comembership ties.
coaffiliation <- bipartite.projection(affiliationNet)$proj1
comembership <- bipartite.projection(affiliationNet)$proj2
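The weight attribute on a projection counts how many affiliations two actors share. A small made-up example (all names hypothetical) makes this concrete. It uses the underscore spelling, bipartite_projection(), and sets type explicitly so that we know proj1 holds the people (the vertices whose type is FALSE).

```r
library(igraph)

# Hypothetical memberships: Ann and Bob belong to both gangs, Cy to one.
toy_el <- matrix(c("Ann", "Gang1",
                   "Ann", "Gang2",
                   "Bob", "Gang1",
                   "Bob", "Gang2",
                   "Cy",  "Gang2"), ncol = 2, byrow = TRUE)
toy_net <- graph_from_edgelist(toy_el, directed = FALSE)

# Mark gangs as TRUE and people as FALSE.
V(toy_net)$type <- V(toy_net)$name %in% toy_el[, 2]

# proj1 contains the FALSE-type vertices (people), proj2 the TRUE-type (gangs).
toy_proj <- bipartite_projection(toy_net)
people <- toy_proj$proj1
gangs  <- toy_proj$proj2

# Edge weights count shared affiliations (people) or shared members (gangs).
people_ties <- igraph::as_data_frame(people, what = "edges")
people_ties
```

Ann and Bob share two gangs, so their tie has a weight of 2, while each tie involving Cy has a weight of 1.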
Let’s examine the coaffiliation network:
coaffiliation
## IGRAPH c08b108 UNW- 32 196 --
## + attr: name (v/c), weight (e/n)
## + edges from c08b108 (vertex names):
## [1] All City--Blood Messiah All City--Bat G.
## [3] All City--Big G. All City--Blaze
## [5] All City--Bloodhound All City--Brains
## [7] All City--Clown All City--Droopy
## [9] All City--Fast Trigger All City--Goldie
## [11] All City--O.G. All City--Smiley
## [13] All City--Sharpie All City--Baby Face
## [15] All City--Bananas All City--Book Collector
## + ... omitted several edges
The resulting network contains 32 nodes and 196 edges. These new edges can be extracted from the graph and saved into a new data frame.
coaffiliation_edgelist <- as.data.frame(get.edgelist(coaffiliation))
Again, we need to create a graph from this edge list using the graph_from_edgelist() function.
coaff_net <- graph_from_edgelist(as.matrix(coaffiliation_edgelist[1:2]),
directed = FALSE)
coaff_net
## IGRAPH 453d14b UN-- 32 196 --
## + attr: name (v/c)
## + edges from 453d14b (vertex names):
## [1] All City--Blood Messiah All City--Bat G.
## [3] All City--Big G. All City--Blaze
## [5] All City--Bloodhound All City--Brains
## [7] All City--Clown All City--Droopy
## [9] All City--Fast Trigger All City--Goldie
## [11] All City--O.G. All City--Smiley
## [13] All City--Sharpie All City--Baby Face
## [15] All City--Bananas All City--Book Collector
## + ... omitted several edges
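Note that the summary line changes from UNW- to UN--: rebuilding the graph from a two-column edge list drops the weight attribute. If the number of shared affiliations matters for later analysis, one option is to export the edges together with their attributes and rebuild from that data frame. A minimal sketch on a made-up weighted graph (toy_w is hypothetical; igraph::as_data_frame() is namespaced to avoid clashing with similarly named functions in other packages):

```r
library(igraph)

# Hypothetical weighted graph standing in for the coaffiliation projection.
toy_w <- graph_from_literal(A - B, B - C)
E(toy_w)$weight <- c(2, 1)

# Route 1: a two-column edge list keeps the ties but loses the weights.
lossy <- graph_from_edgelist(as_edgelist(toy_w), directed = FALSE)
is_weighted(lossy)

# Route 2: a data frame of edges carries the weight column along.
toy_df <- igraph::as_data_frame(toy_w, what = "edges")
kept <- graph_from_data_frame(toy_df, directed = FALSE)
is_weighted(kept)
E(kept)$weight
```

Which route to take depends on whether the later analysis treats ties as present or absent, or needs their strength.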
We can plot the one-mode co-affiliation network as we did before.
plot(coaff_net,
layout=layout_with_kk,
vertex.size=5,
vertex.label=NA)
Figure 15: Gang One-Mode Projection
Footnotes:
A suite or “wrapper” of several SNA packages ranging from descriptive measures (e.g., centrality) to advanced modeling (http://www.statnet.org/). Each package provides users with unique functionality. You can get access to all of these packages by installing statnet.↩
See dplyr at tidyverse’s website, https://dplyr.tidyverse.org/.↩
https://cran.r-project.org/web/packages/revealjs/index.html.↩