class: center, middle, inverse, title-slide

# Social Network Analysis in R
## CORE Lab
### Department of Defense Analysis
### 2019-05-09

---
# Background

--

Social network analysis (SNA) is one of the many analytic methods commonly used to understand groups and social formations.

--

**The focus of this methodology is the relationships among individuals, which influence a person's behavior above and beyond the influence of his or her individual attributes (Valente 2010).**

--

As such, SNA enables analysts to understand how social ties help to define, enable, and constrain the knowledge, reach, and capacities of actors within groups (Cunningham, Everton, and Murphy 2016).

--

While social network research is **not** exclusively dependent on software applications, these do increase the efficiency of researchers. Here we will focus on **R**.

---

.center[
<br><br>
<img src="day-2-SNAwithR_files/figure-html/unnamed-chunk-1-1.png" width="667" />
]

---
# Goals

In this session we will explore the key features of the open-source programming language **R** and a variety of packages developed for SNA, primarily **igraph** and **visNetwork**. As such, we will:

--

- Structure data in R to create network graphs in ORA or **igraph**

--

- Create interactive visualizations with the **visNetwork** package

--

- Build processes in **R** with **igraph** to streamline analysis

<center> 🎉🎉🎂 </center>

---
# CSV into R

.center[
<br>
![](images/edgelist.png)
]

---
# Getting Started: Examining Data

Let's bring in data from an edgelist:

```r
df <- read.csv(here::here("data/edgelist.csv"), header = TRUE)
```
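---
# Getting Started: Checking the Edgelist

A minimal sketch of the checks worth running right after an import, before building a graph. The toy data frame stands in for `data/edgelist.csv`; the column names `eventId` and `PID` are borrowed from later slides, and the values are made up for illustration.

```r
# Toy two-mode edgelist: events (eventId) linked to actors (PID).
# These values are invented; swap in the data frame returned by read.csv().
df <- data.frame(
  eventId = c("E1", "E2", "E2", "E3", "E3"),
  PID     = c("2", "87", "225", "225", "225"),
  stringsAsFactors = FALSE
)

str(df)                                   # column names and types

n_missing   <- sum(is.na(df))             # number of missing cells
n_duplicate <- sum(duplicated(df))        # number of repeated event-actor rows
n_events    <- length(unique(df$eventId)) # distinct events
n_actors    <- length(unique(df$PID))     # distinct actors

df_clean <- unique(df)                    # drop repeated rows
```

Whether repeated rows should be dropped depends on the analysis: they may be data-entry duplicates, or they may be meaningful if tie strength matters.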
---
# Excel into R

.center[
<br>
![](images/excel_many_tabs.png)
]

---
# Getting Started: Examining Data

Let's bring in data from an Excel edgelist with multiple tabs:

```r
get_xlsx <- function(.path){
  # Only handle .xlsx files; any other path returns NULL
  if(endsWith(basename(.path), "xlsx")){
    sheets <- readxl::excel_sheets(.path)
    # Read every sheet as text and row-bind the results into one data frame
    listed_dfs <- purrr::map_dfr(sheets, function(X) readxl::read_excel(.path, sheet = X, col_types = "text"))
    return(listed_dfs)
  }
}
df <- get_xlsx(here::here("data/edgelist.xlsx"))
```
---
# CSVs into R

.center[
<br>
![](images/manycsv.png)
]

---
# Getting Started: Examining Data

Let's bring in data from multiple spreadsheets:

```r
# "\\.csv$" matches only file names ending in .csv (the pattern is a regular expression)
files <- list.files(path = "~/data/EdgelistSlices/",
                    pattern = "\\.csv$",
                    full.names = TRUE)
# Read each file and row-bind the results into one data frame
df <- files %>%
  purrr::map_dfr(readr::read_csv)
```
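---
# Getting Started: Tracking the Source File

When slices are stacked, it can help to record which file each row came from. The sketch below is a base R variant of the import above, not the approach used elsewhere in this deck. It writes two made-up slices to a temporary folder so that it runs on its own; `read_slice()` and the `source_file` column are hypothetical names chosen for illustration.

```r
# Write two small, invented slices to a temporary folder.
slice_dir <- file.path(tempdir(), "EdgelistSlices")
dir.create(slice_dir, showWarnings = FALSE)
write.csv(data.frame(eventId = c("E1", "E2"), PID = c("2", "87")),
          file.path(slice_dir, "slice_a.csv"), row.names = FALSE)
write.csv(data.frame(eventId = "E3", PID = "225"),
          file.path(slice_dir, "slice_b.csv"), row.names = FALSE)

files <- list.files(slice_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file as text and tag every row with the file it came from.
read_slice <- function(path) {
  slice <- read.csv(path, colClasses = "character")
  slice$source_file <- basename(path)
  slice
}

# Row-bind all slices into a single data frame.
df <- do.call(rbind, lapply(files, read_slice))
df
```

Reading every column as text mirrors the `col_types = "text"` choice in the Excel import and avoids type clashes between slices.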
---
# Getting Started: Examining Data

Let's bring in data from an RMS-like source:

```r
con <- RMySQL::dbConnect(RMySQL::MySQL(),
                         user = user,
                         password = password,
                         host = host,
                         port = 3306,
                         dbname = dbname)
df <- RMySQL::dbReadTable(conn = con, name = "edgelist")
```
---
# From R to ORA

One strategy for incorporating R into your ORA analysis is to manipulate and restructure your data in R as follows:

1. Import data
2. Select relevant data fields (Source and Target)
3. Determine number of modes
4. Add required fields (Source Class - Source Id - Target Class - Target Id - Relationship)
5. Export data as CSV

---
# From R to ORA: Advanced Import

```r
df %>%
  rename(SourceId=eventId, TargetId=PID) %>%
  mutate(SourceClass="Event", TargetClass="Actor", Relationship="Co-Event") %>%
  select(SourceClass, SourceId, TargetClass, TargetId, Relationship) %>%
  write.csv(file="fromRtoORA_twomode.csv", row.names = FALSE)
```

.center[
![](images/advancedspreadsheet.png)
]

---
# Two-mode Network in ORA

.center[
<br>
![](images/ora1.png)
]

---
# From R to ORA

Additionally, you could manipulate your data in R to save time in ORA:

1. Import data
2. Select relevant data fields (Source and Target)
3. Join actors based on event
4. Add required fields (Source Class - Source Id - Target Class - Target Id - Relationship)
5. Export data as CSV

---
# From R to ORA: Advanced Import

```r
# Every actor at an event becomes a potential target
targets <- df %>%
  select(eventId, PID) %>%
  rename(TargetID=PID)

df %>%
  rename(SourceID=PID, Relationship=Primary.Type) %>%
  # Pair each actor with every actor attending the same event
  left_join(targets, by="eventId") %>%
  mutate(SourceClass="Actor", TargetClass="Actor") %>%
  select(SourceClass, SourceID, TargetClass, TargetID, Relationship) %>%
  # Drop the pairs that link an actor to itself
  filter(SourceID != TargetID) %>%
  write.csv(file="fromRtoORA_onemode.csv", row.names = FALSE)
```

.center[
<br>
![](images/advancedspreadsheet2.png)
]

---
# One-mode Network in ORA

.center[
![](images/ora2.png)
]

---
# One-mode Network in ORA

.center[
<br>
![](images/ora3.png)
]

---
# SNA with **igraph**

You can also do much (or all) of your network data analysis in R using **igraph**.

```r
g <- graph_from_data_frame(df[1:2], directed = FALSE)
g
```

```
## IGRAPH d985866 UN-- 273 200 --
## + attr: name (v/c)
## + edges from d985866 (vertex names):
##  [1] E1 --2    E2 --87   E3 --225  E4 --225  E5 --246
##  [6] E6 --246  E7 --337  E8 --389  E9 --572  E10--574
## [11] E11--574  E12--622  E13--640  E14--704  E15--704
## [16] E16--716  E17--716  E18--728  E19--772  E20--772
## [21] E21--774  E22--774  E23--793  E24--980  E25--1130
## [26] E26--1142 E27--1142 E28--1142 E29--1346 E30--1568
## [31] E31--1815 E32--1856 E33--2025 E34--2030 E35--2256
## [36] E36--2258 E37--2261 E38--2387 E39--2524 E40--2588
## + ... omitted several edges
```

---
# Plotting with **igraph**

.pull-left[
```r
plot(g,
     # === vertex
     vertex.color="lightblue",
     vertex.size=5,
     vertex.shape="circle",
     # === vertex label
     vertex.label=NA,
     # === edge
     edge.color="grey",
     edge.arrow.size=0,
     edge.curved=TRUE,
     edge.lty="solid"
     )
```

.center[📄 <-> 👨 = 😠]
]

.pull-right[
![](day-2-SNAwithR_files/figure-html/unnamed-chunk-14-1.png)
]

---
# Folding with **igraph**

.pull-left[
```r
df %>%
  graph_from_data_frame() %>%
  # type is TRUE for events (names starting with "E")
  set.vertex.attribute(
    name="type",
    value=str_detect(
      V(.)$name, "^E")
    ) %>%
  # keep the actor-to-actor projection
  bipartite.projection(
    which = "false"
    ) %>%
  get.data.frame("edges")
```

.center[👨 <-> 👨 = 😀]
]

.pull-right[
]

---
# Interactive Visuals with **visNetwork**

```r
visNetwork::visIgraph(g)
```
---
# Custom Visuals with **visNetwork**
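
One possible way to customize the widget, offered as a sketch and not as the settings behind the original slide. The small network below is made up; the colors, the degree-based sizing, and the seed are illustrative choices.

```r
library(igraph)
library(visNetwork)

# Small, invented one-mode network standing in for a folded graph.
g <- graph_from_data_frame(
  data.frame(from = c("2", "87", "225", "225"),
             to   = c("87", "225", "246", "337")),
  directed = FALSE
)

# Convert the igraph object into the nodes/edges data frames visNetwork expects.
vis_data <- toVisNetworkData(g)

# visNetwork scales node size by the `value` column; use degree here.
vis_data$nodes$value <- unname(degree(g)[vis_data$nodes$id])
# The `title` column becomes the hover tooltip.
vis_data$nodes$title <- paste("Actor", vis_data$nodes$id)

vis <- visNetwork(vis_data$nodes, vis_data$edges) %>%
  visNodes(color = "lightblue", shape = "dot") %>%
  visEdges(color = "grey") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visLayout(randomSeed = 123)  # fixed seed so the layout is reproducible
vis
```

Working from the nodes and edges data frames, instead of passing the graph straight to `visIgraph()`, makes it easy to add per-node columns such as `value` and `title`.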
---
# Streamlining Analysis in R

Say you wanted to import your data, fold it, and calculate one of the following topographic metrics:

Metric | Explanation | Command
--------|-------------|---------
Density | Number of observed ties divided by possible number of ties | `edge_density()`
Average Degree | Sum of actors' degrees divided by number of actors | `mean(degree())`
Global Clustering | Number of closed triplets divided by number of connected triplets | `transitivity()`

---
# Streamlining Network Analysis

```r
read.csv(here::here("data/edgelist.csv"), header = TRUE) %>%
  graph_from_data_frame() %>%
  set.vertex.attribute(name="type", value=str_detect(V(.)$name, "^E")) %>%
  bipartite.projection(which = "false") %>%
  edge_density() -> density
```

```r
test_density <- function(density_score){
  cat("The density is: ", density_score, ". ", sep="")
  # if/else so that a score of exactly 0.5 prints only one message
  if(density_score <= 0.5){
    cat("This network is sparse.")
  } else {
    cat("This network is highly interconnected.")
  }
}
test_density(density)
```

```
## The density is: 0.00665412. This network is sparse.
```

---
# Streamlining Network Analysis

Say you wanted to import your data, fold it, and calculate one of the following actor metrics:

Metric | Explanation | Command
--------|-------------|---------
Degree | Count of actor's ties | `degree()`
Eigenvector | Weights an actor's centrality by the centrality of its neighbors | `evcent()`
Closeness | Inverse of the average geodesic distance from an actor to all others | `closeness()`
Betweenness | How often each actor lies on the shortest path between all other actors | `betweenness()`

---
# Streamlining Network Analysis

```r
read.csv(here::here("data/edgelist.csv"), header = TRUE) %>%
  graph_from_data_frame() %>%
  set.vertex.attribute(name="type", value=str_detect(V(.)$name, "^E")) %>%
  bipartite.projection(which = "false") %>%
  set.vertex.attribute(name="degree", value = degree(., mode = "total")) %>%
  set.vertex.attribute(name="eigenvector", value = evcent(.)$vector) %>%
  set.vertex.attribute(name="betweenness", value = betweenness(.)) %>%
  get.data.frame("vertices")
```
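---
# Streamlining Network Analysis: Ranking Actors

Once the actor metrics sit in a data frame, a common next step is to rank actors. The sketch below uses a small, made-up one-mode network so that it runs on its own. Sorting by degree and breaking ties with betweenness is an illustrative choice, not a recommendation from the sources cited earlier.

```r
library(igraph)

# Invented one-mode network: actor "225" is tied to everyone else.
el <- data.frame(from = c("225", "225", "225", "87"),
                 to   = c("2", "87", "246", "246"),
                 stringsAsFactors = FALSE)
g <- graph_from_data_frame(el, directed = FALSE)

# Collect actor-level metrics in one data frame.
actors <- data.frame(
  name        = V(g)$name,
  degree      = unname(degree(g)),
  betweenness = unname(betweenness(g)),
  closeness   = unname(closeness(g)),
  stringsAsFactors = FALSE
)

# Sort by degree, breaking ties with betweenness (both descending).
ranked <- actors[order(-actors$degree, -actors$betweenness), ]
head(ranked, 3)
```

Different metrics can produce different rankings, so it is worth comparing several before singling out an actor as central.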
---
# Let Us Help

[GitHub](https://github.com/NPSCORELAB) packages:

```r
devtools::install_github("NPSCORELAB/COREmisc", upgrade="never")
```

```r
html_in <- htmltools::HTML(here::here("data/examplereturn.html"))
table_out <- COREmisc::extract_html_table(html_in)
head(table_out)
```

```
## # A tibble: 6 x 9
##   group_id Author Id    `Item-type` `Thread-id` Time  sent_to_key Text
##      <int> <chr>  <chr> <chr>       <chr>       <chr> <chr>       <chr>
## 1        1 danie… 2.77… link        3.40282366… 2017… Recipients  <NA>
## 2        1 danie… 2.77… link        3.40282366… 2017… Recipients  <NA>
## 3        7 dee(5… 2.78… text        3.40282366… 2017… Recipients  iigh…
## 4        7 dee(5… 2.78… text        3.40282366… 2017… Recipients  iigh…
## 5       14 eric … 2.78… text        3.40282366… 2017… Recipients  I'm …
## 6       14 eric … 2.78… text        3.40282366… 2017… Recipients  I'm …
## # … with 1 more variable: sent_to_val <chr>
```

---
# Shiny

```r
COREmisc::launch_shiny_app()
```

<center> ❤️ </center>

---

<br>

.center[
### Questions?
]

![](images/wanna_see_the_code.png)

Christopher Callaghan - cjcallag@nps.edu