class: center, middle, inverse, title-slide # Why R? ## CORE Lab ### Department of Defense Analysis ### 2019-08-07 --- <style> .xxl { font-size: 300% } .xl { font-size: 200% } .l { font-size: 130% } .m { font-size: 80% } .s { font-size: 60% } </style> # Questions -- * .l[Who has duties that require them to deal with data on a regular basis?] -- * .l[Who thinks they spend __most__ of their time cleaning data (reshaping, transforming, wrangling, etc.)?] -- * .l[Who thinks their data-related duties will __increase__ in the future?] -- * .l[Who feels they have received __adequate__ training/resources to accomplish their data-related duties?] -- * .l[Who feels confident that they can _efficiently_ tackle __all__ of their data-related challenges?] -- --- # Questions <br> <br> <br> <br> <br> <br> <br> <center> .xl[Who said yes to all of these questions?] </center> --- # Questions <br> <br> <br> <br> <br> <br> <center> .xl[Who has ever wondered... <br> <br> ___"isn't there's a better way??"___] </center> --- # What is __R__? -- ## Their words... -- * .l[_The __R__ language is widely used among statisticians and data miners for developing statistical software and data analysis._] <br> -- * .l[___R__ is a language and environment for statistical computing and graphics._] <br> -- * .l[___R__ is the lingua franca of statistics._] --- # What is R? ## Our words... -- * .l[__R__ represents a comprehensive, __open-source__ ecosystem of tools facilitating __data__...] -- + .l[__ingestion__] -- + .l[__carpentry__] -- + .l[__visualization__] -- + .l[__analysis__] -- + .l[__automation__] -- + .l[__reporting__] --- # Data Ingestion <br><br><br><br> <center> .xxl[Spreadsheets] </center> --- # Data Ingestion ### Spreadsheets <img src="spreadsheet-single.PNG" width="512" style="display: block; margin: auto;" /> --- # Data Ingestion ### Spreadsheets -- ```r file_path <- "spreadsheets/nyc_vehicle_thefts.csv" ``` -- * `file_path` is a variable -- * `<-` is R's assignment operator -- * `"spreadsheets/nyc_vehicle_thefts.csv"`is stored in `file_path` -- ```r nyc_vehicle_thefts <- read_csv(file_path) # read file into R and store in variable ``` -- * `nyc_vehicle_thefts` is a variable -- * `read_csv()` is a function -- + `read_csv(file_path)` reads/loads/imports the data located at `file_path` -- + `nyc_vehicle_thefts` now stores a table, which we call a data frame .footnote[ [1] Source {`crimedata`} Package, get it w/ `install.packages("crimedata")` ] --- # Data Ingestion ### Spreadsheets ```r nyc_vehicle_thefts # "print" resulting data frame ``` <PRE class="fansi fansi-output"><CODE><span style='color: #555555;'># A tibble: 9,532 x 6</span><span> uid date_time longitude latitude location_catego… </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;'> 1</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>08:00:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.9 street </span><span style='color: #555555;'> 2</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>08:15:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.8 street </span><span style='color: #555555;'> 3</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>09:00:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.9 street </span><span style='color: #555555;'> 4</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>09:00:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 street </span><span style='color: #555555;'> 5</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>10:00:00</span><span> -</span><span style='color: #BB0000;'>74.1</span><span> 40.6 street </span><span style='color: #555555;'> 6</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>11:03:00</span><span> -</span><span style='color: #BB0000;'>74.0</span><span> 40.6 street </span><span style='color: #555555;'> 7</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>11:31:00</span><span> -</span><span style='color: #BB0000;'>74.0</span><span> 40.8 leisure </span><span style='color: #555555;'> 8</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>12:28:00</span><span> -</span><span style='color: #BB0000;'>74.2</span><span> 40.6 hotel </span><span style='color: #555555;'> 9</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>14:30:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 street </span><span style='color: #555555;'>10</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>14:45:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.7 street </span><span style='color: #555555;'># … with 9,522 more rows, and 1 more variable: location_type </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span></CODE></PRE> --- # Data Ingestion ### Spreadsheets <table class="table table-striped table-hover table-condensed table-responsive" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> uid </th> <th style="text-align:left;"> date_time </th> <th style="text-align:right;"> longitude </th> <th style="text-align:right;"> latitude </th> <th style="text-align:left;"> location_category </th> <th style="text-align:left;"> location_type </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 12853401 </td> <td style="text-align:left;"> 2015-01-01 08:00:00 </td> <td style="text-align:right;"> -73.86153 </td> <td style="text-align:right;"> 40.87951 </td> <td style="text-align:left;"> street </td> <td style="text-align:left;"> street </td> </tr> <tr> <td style="text-align:right;"> 12853462 </td> <td style="text-align:left;"> 2015-01-01 08:15:00 </td> <td style="text-align:right;"> -73.94422 </td> <td style="text-align:right;"> 40.79640 </td> <td style="text-align:left;"> street </td> <td style="text-align:left;"> street </td> </tr> <tr> <td style="text-align:right;"> 12853518 </td> <td style="text-align:left;"> 2015-01-01 09:00:00 </td> <td style="text-align:right;"> -73.88682 </td> <td style="text-align:right;"> 40.85644 </td> <td style="text-align:left;"> street </td> <td style="text-align:left;"> street </td> </tr> <tr> <td style="text-align:right;"> 12853522 </td> <td style="text-align:left;"> 2015-01-01 09:00:00 </td> <td style="text-align:right;"> -73.80170 </td> <td style="text-align:right;"> 40.67312 </td> <td style="text-align:left;"> street </td> <td style="text-align:left;"> street </td> </tr> <tr> <td style="text-align:right;"> 12853605 </td> <td style="text-align:left;"> 2015-01-01 10:00:00 </td> <td style="text-align:right;"> -74.08174 </td> <td style="text-align:right;"> 40.61732 </td> <td style="text-align:left;"> street </td> <td style="text-align:left;"> street </td> </tr> <tr> <td style="text-align:right;"> 12853710 </td> <td style="text-align:left;"> 2015-01-01 11:03:00 </td> <td style="text-align:right;"> -73.95344 </td> <td style="text-align:right;"> 40.58318 </td> <td style="text-align:left;"> street </td> <td style="text-align:left;"> street </td> </tr> </tbody> </table> --- # Data Ingestion ### Spreadsheets <img src="spreadsheet-folder.PNG" width="150%" height="150%" style="display: block; margin: auto;" /> --- # Data Ingestion ### Spreadsheets ```r folder_of_spreadsheets <- "spreadsheets" csv_pattern <- "\\.csv$" combo_data <- folder_of_spreadsheets %>% # take folder dir(pattern = csv_pattern, full.names = TRUE) %>% # gather all .csv files map_dfr(read_csv) # read all and combine rows combo_data ``` <PRE class="fansi fansi-output"><CODE><span style='color: #555555;'># A tibble: 19,064 x 6</span><span> uid date_time longitude latitude location_catego… </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;'> 1</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-21 </span><span style='color: #555555;'>14:23:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 transportation </span><span style='color: #555555;'> 2</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-03-13 </span><span style='color: #555555;'>22:06:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 transportation </span><span style='color: #555555;'> 3</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-03-16 </span><span style='color: #555555;'>01:07:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 transportation </span><span style='color: #555555;'> 4</span><span> 1.30</span><span style='color: #555555;'>e</span><span>7 2015-04-10 </span><span style='color: #555555;'>21:30:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 transportation </span><span style='color: #555555;'> 5</span><span> 1.30</span><span style='color: #555555;'>e</span><span>7 2015-04-11 </span><span style='color: #555555;'>03:14:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.8 transportation </span><span style='color: #555555;'> 6</span><span> 1.30</span><span style='color: #555555;'>e</span><span>7 2015-05-01 </span><span style='color: #555555;'>19:32:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 transportation </span><span style='color: #555555;'> 7</span><span> 1.30</span><span style='color: #555555;'>e</span><span>7 2015-05-10 </span><span style='color: #555555;'>21:00:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 transportation </span><span style='color: #555555;'> 8</span><span> 1.30</span><span style='color: #555555;'>e</span><span>7 2015-05-13 </span><span style='color: #555555;'>14:00:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 transportation </span><span style='color: #555555;'> 9</span><span> 1.30</span><span style='color: #555555;'>e</span><span>7 2015-06-02 </span><span style='color: #555555;'>20:30:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 transportation </span><span style='color: #555555;'>10</span><span> 1.31</span><span style='color: #555555;'>e</span><span>7 2015-06-16 </span><span style='color: #555555;'>17:30:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.8 transportation </span><span style='color: #555555;'># … with 19,054 more rows, and 1 more variable: location_type </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span></CODE></PRE> --- # Data Ingestion <br> <br> <br> <br> <br> <center> .xxl[Databases] </center> --- # Data Ingestion ### Databases * Connect Directly to Database -- ```r library(DBI) library(SQLite) DB_connection <- dbConnect(drv = SQLite(), dbname = "<HIGH-SPEED-DEPARTMENT'S-RMS>") ``` -- * Upload Data ```r dbWriteTable( conn = DB_connection, name = "DB_combo_data", # name to use in RMS value = combo_data # variable to upload ) dbListFields(DB_connection, name = "DB_combo_data") # columns now in RMS ``` ``` [1] "uid" "date_time" "longitude" [4] "latitude" "location_category" "location_type" ``` --- # Data Ingestion ## Databases #### Structured Query Language (SQL) -- ```sql SELECT uid, location_category, location_type -- SELECT these columns... FROM DB_combo_data -- FROM this table... WHERE location_category is 'retail' -- WHERE this condition is met... LIMIT 6 -- LIMIT result to first 6 rows ``` -- <table class="table table-striped table-hover table-condensed table-responsive" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> uid </th> <th style="text-align:left;"> location_category </th> <th style="text-align:left;"> location_type </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 12886009 </td> <td style="text-align:left;"> retail </td> <td style="text-align:left;"> convenience store </td> </tr> <tr> <td style="text-align:right;"> 12984487 </td> <td style="text-align:left;"> retail </td> <td style="text-align:left;"> convenience store </td> </tr> <tr> <td style="text-align:right;"> 13025248 </td> <td style="text-align:left;"> retail </td> <td style="text-align:left;"> convenience store </td> </tr> <tr> <td style="text-align:right;"> 13205971 </td> <td style="text-align:left;"> retail </td> <td style="text-align:left;"> convenience store </td> </tr> <tr> <td style="text-align:right;"> 13258034 </td> <td style="text-align:left;"> retail </td> <td style="text-align:left;"> convenience store </td> </tr> <tr> <td style="text-align:right;"> 12861488 </td> <td style="text-align:left;"> retail </td> <td style="text-align:left;"> gas station </td> </tr> </tbody> </table> --- # Data Ingestion ## Databases #### Let `R` Generate `SQL` for You ```r library(dbplyr) DB_connection %>% tbl(from = "DB_combo_data") %>% # from <this table> filter(location_category == "retail") %>% # filter <for these rows> select(uid, location_category, location_type) %>% # select <these columns> head() # <1st 6 rows of result> ``` <PRE class="fansi fansi-output"><CODE><span style='color: #555555;'># Source: lazy query [?? x 3]</span><span> </span><span style='color: #555555;'># Database: sqlite 3.29.0 [:memory:]</span><span> uid location_category location_type </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;'>1</span><span> 12</span><span style='text-decoration: underline;'>886</span><span>009 retail convenience store </span><span style='color: #555555;'>2</span><span> 12</span><span style='text-decoration: underline;'>984</span><span>487 retail convenience store </span><span style='color: #555555;'>3</span><span> 13</span><span style='text-decoration: underline;'>025</span><span>248 retail convenience store </span><span style='color: #555555;'>4</span><span> 13</span><span style='text-decoration: underline;'>205</span><span>971 retail convenience store </span><span style='color: #555555;'>5</span><span> 13</span><span style='text-decoration: underline;'>258</span><span>034 retail convenience store </span><span style='color: #555555;'>6</span><span> 12</span><span style='text-decoration: underline;'>861</span><span>488 retail gas station </span></CODE></PRE> --- # Data Carpentry -- ### `nyc_vehicle_thefts` ```r nyc_vehicle_thefts ``` <PRE class="fansi fansi-output"><CODE><span style='color: #555555;'># A tibble: 9,532 x 6</span><span> uid date_time longitude latitude location_catego… </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;'> 1</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>08:00:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.9 street </span><span style='color: #555555;'> 2</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>08:15:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.8 street </span><span style='color: #555555;'> 3</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>09:00:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.9 street </span><span style='color: #555555;'> 4</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>09:00:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 street </span><span style='color: #555555;'> 5</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>10:00:00</span><span> -</span><span style='color: #BB0000;'>74.1</span><span> 40.6 street </span><span style='color: #555555;'> 6</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>11:03:00</span><span> -</span><span style='color: #BB0000;'>74.0</span><span> 40.6 street </span><span style='color: #555555;'> 7</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>11:31:00</span><span> -</span><span style='color: #BB0000;'>74.0</span><span> 40.8 leisure </span><span style='color: #555555;'> 8</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>12:28:00</span><span> -</span><span style='color: #BB0000;'>74.2</span><span> 40.6 hotel </span><span style='color: #555555;'> 9</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>14:30:00</span><span> -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 street </span><span style='color: #555555;'>10</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-01 </span><span style='color: #555555;'>14:45:00</span><span> -</span><span style='color: #BB0000;'>73.9</span><span> 40.7 street </span><span style='color: #555555;'># … with 9,522 more rows, and 1 more variable: location_type </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span></CODE></PRE> --- # Data Carpentry ```r library(tidyverse) ``` ### `select()` Columns ```r nyc_vehicle_thefts %>% * select(uid, date_time, location_category, longitude, latitude) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #555555;'># A tibble: 9,532 x 5</span><span> uid date_time location_category longitude latitude </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;'> 1</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>401 2015-01-01 </span><span style='color: #555555;'>08:00:00</span><span> street -</span><span style='color: #BB0000;'>73.9</span><span> 40.9 </span><span style='color: #555555;'> 2</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>462 2015-01-01 </span><span style='color: #555555;'>08:15:00</span><span> street -</span><span style='color: #BB0000;'>73.9</span><span> 40.8 </span><span style='color: #555555;'> 3</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>518 2015-01-01 </span><span style='color: #555555;'>09:00:00</span><span> street -</span><span style='color: #BB0000;'>73.9</span><span> 40.9 </span><span style='color: #555555;'> 4</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>522 2015-01-01 </span><span style='color: #555555;'>09:00:00</span><span> street -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 </span><span style='color: #555555;'> 5</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>605 2015-01-01 </span><span style='color: #555555;'>10:00:00</span><span> street -</span><span style='color: #BB0000;'>74.1</span><span> 40.6 </span><span style='color: #555555;'> 6</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>710 2015-01-01 </span><span style='color: #555555;'>11:03:00</span><span> street -</span><span style='color: #BB0000;'>74.0</span><span> 40.6 </span><span style='color: #555555;'> 7</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>764 2015-01-01 </span><span style='color: #555555;'>11:31:00</span><span> leisure -</span><span style='color: #BB0000;'>74.0</span><span> 40.8 </span><span style='color: #555555;'> 8</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>844 2015-01-01 </span><span style='color: #555555;'>12:28:00</span><span> hotel -</span><span style='color: #BB0000;'>74.2</span><span> 40.6 </span><span style='color: #555555;'> 9</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>980 2015-01-01 </span><span style='color: #555555;'>14:30:00</span><span> street -</span><span style='color: #BB0000;'>73.8</span><span> 40.7 </span><span style='color: #555555;'>10</span><span> 12</span><span style='text-decoration: underline;'>853</span><span>991 2015-01-01 </span><span style='color: #555555;'>14:45:00</span><span> street -</span><span style='color: #BB0000;'>73.9</span><span> 40.7 </span><span style='color: #555555;'># … with 9,522 more rows</span><span> </span></CODE></PRE> --- # Data Carpentry ### `filter()` Rows ```r nyc_vehicle_thefts %>% select(uid, date_time, location_category) %>% * filter(location_category == "retail") ``` <PRE class="fansi fansi-output"><CODE><span style='color: #555555;'># A tibble: 171 x 3</span><span> uid date_time location_category </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;'> 1</span><span> 12</span><span style='text-decoration: underline;'>856</span><span>310 2015-01-04 </span><span style='color: #555555;'>02:37:00</span><span> retail </span><span style='color: #555555;'> 2</span><span> 12</span><span style='text-decoration: underline;'>860</span><span>245 2015-01-08 </span><span style='color: #555555;'>06:26:00</span><span> retail </span><span style='color: #555555;'> 3</span><span> 12</span><span style='text-decoration: underline;'>861</span><span>488 2015-01-09 </span><span style='color: #555555;'>17:17:00</span><span> retail </span><span style='color: #555555;'> 4</span><span> 12</span><span style='text-decoration: underline;'>861</span><span>638 2015-01-09 </span><span style='color: #555555;'>21:00:00</span><span> retail </span><span style='color: #555555;'> 5</span><span> 12</span><span style='text-decoration: underline;'>862</span><span>700 2015-01-10 </span><span style='color: #555555;'>19:30:00</span><span> retail </span><span style='color: #555555;'> 6</span><span> 12</span><span style='text-decoration: underline;'>864</span><span>685 2015-01-12 </span><span style='color: #555555;'>19:10:00</span><span> retail </span><span style='color: #555555;'> 7</span><span> 12</span><span style='text-decoration: underline;'>868</span><span>839 2015-01-16 </span><span style='color: #555555;'>06:37:00</span><span> retail </span><span style='color: #555555;'> 8</span><span> 12</span><span style='text-decoration: underline;'>872</span><span>210 2015-01-19 </span><span style='color: #555555;'>04:00:00</span><span> retail </span><span style='color: #555555;'> 9</span><span> 12</span><span style='text-decoration: underline;'>885</span><span>291 2015-01-31 </span><span style='color: #555555;'>09:07:00</span><span> retail </span><span style='color: #555555;'>10</span><span> 12</span><span style='text-decoration: underline;'>886</span><span>009 2015-02-01 </span><span style='color: #555555;'>03:18:00</span><span> retail </span><span style='color: #555555;'># … with 161 more rows</span><span> </span></CODE></PRE> --- # Data Carpentry ### `arrange()` Rows ```r nyc_vehicle_thefts %>% select(uid, date_time, location_category) %>% filter(location_category == "retail") %>% * arrange(date_time) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #555555;'># A tibble: 171 x 3</span><span> uid date_time location_category </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;'> 1</span><span> 12</span><span style='text-decoration: underline;'>856</span><span>310 2015-01-04 </span><span style='color: #555555;'>02:37:00</span><span> retail </span><span style='color: #555555;'> 2</span><span> 12</span><span style='text-decoration: underline;'>860</span><span>245 2015-01-08 </span><span style='color: #555555;'>06:26:00</span><span> retail </span><span style='color: #555555;'> 3</span><span> 12</span><span style='text-decoration: underline;'>861</span><span>488 2015-01-09 </span><span style='color: #555555;'>17:17:00</span><span> retail </span><span style='color: #555555;'> 4</span><span> 12</span><span style='text-decoration: underline;'>861</span><span>638 2015-01-09 </span><span style='color: #555555;'>21:00:00</span><span> retail </span><span style='color: #555555;'> 5</span><span> 12</span><span style='text-decoration: underline;'>862</span><span>700 2015-01-10 </span><span style='color: #555555;'>19:30:00</span><span> retail </span><span style='color: #555555;'> 6</span><span> 12</span><span style='text-decoration: underline;'>864</span><span>685 2015-01-12 </span><span style='color: #555555;'>19:10:00</span><span> retail </span><span style='color: #555555;'> 7</span><span> 12</span><span style='text-decoration: underline;'>868</span><span>839 2015-01-16 </span><span style='color: #555555;'>06:37:00</span><span> retail </span><span style='color: #555555;'> 8</span><span> 12</span><span style='text-decoration: underline;'>872</span><span>210 2015-01-19 </span><span style='color: #555555;'>04:00:00</span><span> retail </span><span style='color: #555555;'> 9</span><span> 12</span><span style='text-decoration: underline;'>885</span><span>291 2015-01-31 </span><span style='color: #555555;'>09:07:00</span><span> retail </span><span style='color: #555555;'>10</span><span> 12</span><span style='text-decoration: underline;'>886</span><span>009 2015-02-01 </span><span style='color: #555555;'>03:18:00</span><span> retail </span><span style='color: #555555;'># … with 161 more rows</span><span> </span></CODE></PRE> --- # Data Carpentry ### `mutate()` Columns ```r nyc_vehicle_thefts %>% select(uid, date_time, location_category) %>% filter(location_category == "retail") %>% arrange(date_time) %>% * mutate(in_january_before_6am = month(date_time) == 1 & hour(date_time) < 6) ``` <PRE class="fansi fansi-output"><CODE><span style='color: #555555;'># A tibble: 171 x 4</span><span> uid date_time location_category in_january_before_6am </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;font-style: italic;'><lgl></span><span> </span><span style='color: #555555;'> 1</span><span> 12</span><span style='text-decoration: underline;'>856</span><span>310 2015-01-04 </span><span style='color: #555555;'>02:37:00</span><span> retail TRUE </span><span style='color: #555555;'> 2</span><span> 12</span><span style='text-decoration: underline;'>860</span><span>245 2015-01-08 </span><span style='color: #555555;'>06:26:00</span><span> retail FALSE </span><span style='color: #555555;'> 3</span><span> 12</span><span style='text-decoration: underline;'>861</span><span>488 2015-01-09 </span><span style='color: #555555;'>17:17:00</span><span> retail FALSE </span><span style='color: #555555;'> 4</span><span> 12</span><span style='text-decoration: underline;'>861</span><span>638 2015-01-09 </span><span style='color: #555555;'>21:00:00</span><span> retail FALSE </span><span style='color: #555555;'> 5</span><span> 12</span><span style='text-decoration: underline;'>862</span><span>700 2015-01-10 </span><span style='color: #555555;'>19:30:00</span><span> retail FALSE </span><span style='color: #555555;'> 6</span><span> 12</span><span style='text-decoration: underline;'>864</span><span>685 2015-01-12 </span><span style='color: #555555;'>19:10:00</span><span> retail FALSE </span><span style='color: #555555;'> 7</span><span> 12</span><span style='text-decoration: underline;'>868</span><span>839 2015-01-16 </span><span style='color: #555555;'>06:37:00</span><span> retail FALSE </span><span style='color: #555555;'> 8</span><span> 12</span><span style='text-decoration: underline;'>872</span><span>210 2015-01-19 </span><span style='color: #555555;'>04:00:00</span><span> retail TRUE </span><span style='color: #555555;'> 9</span><span> 12</span><span style='text-decoration: underline;'>885</span><span>291 2015-01-31 </span><span style='color: #555555;'>09:07:00</span><span> retail FALSE </span><span style='color: #555555;'>10</span><span> 12</span><span style='text-decoration: underline;'>886</span><span>009 2015-02-01 </span><span style='color: #555555;'>03:18:00</span><span> retail FALSE </span><span style='color: #555555;'># … with 161 more rows</span><span> </span></CODE></PRE> --- # Data Carpentry <img src="https://github.com/gadenbuie/tidyexplain/raw/master/images/static/png/original-dfs.png" height="350px" style="display: block; margin: auto;" /> .footnote[ [1] Source: [tidyexplain](https://www.github.com/gadenbuie/tidyexplain) ] --- # Data Carpentry ### `*_join()` Separate Data Sets <img src="https://raw.githubusercontent.com/gadenbuie/tidyexplain/master/images/left-join.gif" style="display: block; margin: auto;" /> --- # Data Carpentry ### Reshaping Data <img src="https://github.com/gadenbuie/tidyexplain/blob/master/images/static/png/original-dfs-tidy.png?raw=true" height="450px" style="display: block; margin: auto;" /> --- # Data Carpentry ### Reshaping Data <img src="https://github.com/gadenbuie/tidyexplain/blob/master/images/tidyr-spread-gather.gif?raw=true" height="450px" style="display: block; margin: auto;" /> --- # Data Visualization -- ```r nyc_vehicle_thefts %>% filter(location_category %in% c("retail", "commercial", "residence")) %>% mutate(week = floor_date(date_time, unit = "week")) %>% group_by(location_category, week) %>% summarise(total = n()) %>% ungroup() %>% ggplot(aes(x = week, y = total)) + geom_line(aes(color = location_category)) + labs(x = NULL, y = NULL, title = "# of NYC Vehicle Thefts", caption = "binned by week") + theme(legend.title = element_blank()) ``` <img src="Why_R_files/figure-html/unnamed-chunk-31-1.png" width="504" style="display: block; margin: auto;" /> --- <br> <img src="gg_ts.png" width="768" style="display: block; margin: auto;" /> --- <br> <img src="gg_ts2.png" width="768" style="display: block; margin: auto;" /> --- # Spatial Data -- ```r library(tidycensus) manhattan <- get_acs(state = "NY", county = "New York", geography = "tract", variables = "B19013_001", geometry = TRUE) manhattan %>% select(geometry) ``` ``` Simple feature collection with 288 features and 0 fields geometry type: MULTIPOLYGON dimension: XY bbox: xmin: -74.04731 ymin: 40.68419 xmax: -73.907 ymax: 40.88207 epsg (SRID): 4269 proj4string: +proj=longlat +datum=NAD83 +no_defs First 10 features: geometry 1 MULTIPOLYGON (((-74.04692 4... 2 MULTIPOLYGON (((-73.98716 4... 3 MULTIPOLYGON (((-73.98534 4... 4 MULTIPOLYGON (((-74.00651 4... 5 MULTIPOLYGON (((-73.99256 4... 6 MULTIPOLYGON (((-73.9995 40... 7 MULTIPOLYGON (((-73.9975 40... 8 MULTIPOLYGON (((-74.00049 4... 9 MULTIPOLYGON (((-73.97996 4... 10 MULTIPOLYGON (((-73.97999 4... ``` --- # Spatial Data ```r library(sf) nyc_vehicle_thefts_sf <- nyc_vehicle_thefts %>% filter(between(hour(date_time), 2, 5)) %>% mutate(monthly = month(date_time, label = TRUE, abbr = FALSE)) %>% st_as_sf(coords = c("longitude", "latitude"), crs = st_crs(manhattan)) %>% st_join(manhattan, left = FALSE) nyc_vehicle_thefts_sf ``` <PRE class="fansi fansi-output"><CODE>Simple feature collection with 162 features and 10 fields geometry type: POINT dimension: XY bbox: xmin: -74.01441 ymin: 40.70338 xmax: -73.90801 ymax: 40.87723 epsg (SRID): 4269 proj4string: +proj=longlat +datum=NAD83 +no_defs <span style='color: #555555;'># A tibble: 162 x 11</span><span> uid date_time location_catego… location_type monthly GEOID </span><span style='color: #555555;font-style: italic;'><dbl></span><span> </span><span style='color: #555555;font-style: italic;'><dttm></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;font-style: italic;'><ord></span><span> </span><span style='color: #555555;font-style: italic;'><chr></span><span> </span><span style='color: #555555;'> 1</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-02 </span><span style='color: #555555;'>05:40:00</span><span> street street January 3606… </span><span style='color: #555555;'> 2</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-07 </span><span style='color: #555555;'>04:30:00</span><span> government government January 3606… </span><span style='color: #555555;'> 3</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-14 </span><span style='color: #555555;'>05:30:00</span><span> street street January 3606… </span><span style='color: #555555;'> 4</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-17 </span><span style='color: #555555;'>04:24:00</span><span> commercial office January 3606… </span><span style='color: #555555;'> 5</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-18 </span><span style='color: #555555;'>02:32:00</span><span> street street January 3606… </span><span style='color: #555555;'> 6</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-18 </span><span style='color: #555555;'>05:50:00</span><span> street street January 3606… </span><span style='color: #555555;'> 7</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-20 </span><span style='color: #555555;'>04:45:00</span><span> open space vehicle park… January 3606… </span><span style='color: #555555;'> 8</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-01-28 </span><span style='color: #555555;'>03:19:00</span><span> street street January 3606… </span><span style='color: #555555;'> 9</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-02-01 </span><span style='color: #555555;'>02:36:00</span><span> street street Februa… 3606… </span><span style='color: #555555;'>10</span><span> 1.29</span><span style='color: #555555;'>e</span><span>7 2015-02-07 </span><span style='color: #555555;'>05:35:00</span><span> street street Februa… 3606… </span><span style='color: #555555;'># … with 152 more rows, and 5 more variables: NAME </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, variable </span><span style='color: #555555;font-style: italic;'><chr></span><span style='color: #555555;'>, # estimate </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, moe </span><span style='color: #555555;font-style: italic;'><dbl></span><span style='color: #555555;'>, geometry </span><span style='color: #555555;font-style: italic;'><POINT [°]></span><span> </span></CODE></PRE> --- # Spatial Data ```r nyc_vehicle_thefts_sf %>% ggplot() + geom_sf(data = manhattan) + geom_sf(aes(color = location_category), show.legend = "point") + facet_wrap(~ monthly, nrow = 2) + theme(axis.text = element_blank()) + labs(title = "Manhattan Vehicle Thefts, 2015", subtitle = "2am - 5am", color = "Location\nCategory") ``` <img src="Why_R_files/figure-html/geom_sf-1.png" width="504" style="display: block; margin: auto;" /> --- <br> <img src="gg_sf.png" width="853" style="display: block; margin: auto;" /> --- # Get R * [Get R at _cran.r-project.org_](cran.r-project.org) <br><br><br> * [Get RStudio at _www.rstudio.com_](www.rstudio.com) --- # Learning Resources * R for Data Science + Free Online <img src="Why_R_files/figure-html/r4ds-site-1.png" width="504" style="display: block; margin: auto;" /> --- # Learning Resources .pull-left[ <center> <a href="https://www.datacamp.com/">www.datacamp.com/</a> </center> <img src="Why_R_files/figure-html/datacamp-site-1.png" width="504" style="display: block; margin: auto;" /> ] .pull-right[ <center> <a href="https://www.dataquest.io/">www.dataquest.io/</a> </center> <img src="Why_R_files/figure-html/dataquest-site-1.png" width="504" style="display: block; margin: auto;" /> ] # Learning Resources * [Geocomputation with R](https://geocompr.robinlovelace.net/) + Free Online <img src="Why_R_files/figure-html/geocompr-site-1.png" width="504" style="display: block; margin: auto;" /> --- # Questions? * Brendan Knapp - brendan.knapp@nps.edu