Because R is open source, its community has written packages that can handle virtually any type of data.
vroom
Since everyone already knows about readr, readxl, haven, and data.table, I’d like to suggest reading up on the impressive vroom package.
library(magrittr) # Give me %>% or give me death.
library(vroom)
vroom_example("mtcars.csv") %>% vroom()
## Rows: 32
## Columns: 12
## Delimiter: ","
## chr [ 1]: model
## dbl [11]: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
##
## Use `spec()` to retrieve the guessed column specification
## Pass a specification to the `col_types` argument to quiet this message
## # A tibble: 32 x 12
## model mpg cyl disp hp drat wt qsec vs am gear carb
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mazda RX4 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 Mazda RX4 ~ 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 Hornet 4 D~ 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 Hornet Spo~ 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 Duster 360 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 Merc 240D 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 Merc 230 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 Merc 280 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
## # ... with 22 more rows
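The message in the output above suggests passing a specification to `col_types`. A minimal sketch of what that might look like follows, assuming the column layout shown in the printout (one character column, `model`, and eleven numeric columns). The object names `path` and `cars` are chosen here for illustration only.

```r
library(vroom)

# Path to the example file that ships with vroom
path <- vroom_example("mtcars.csv")

# Declare the column types up front instead of letting vroom guess them.
# `model` is the only character column; every other column is read as double.
cars <- vroom(
  path,
  delim = ",",
  col_types = cols(model = col_character(), .default = col_double())
)

cars
```

Supplying the types explicitly quiets the specification message and guards against a column being guessed differently if the file changes.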
etl and dbplyr
The etl package allows R users to populate a local SQL database and analyze data using the familiar dplyr verbs.
I am vastly more skilled in wrangling data within the tidyverse framework than in SQL. Luckily, R can connect to almost any type of SQL database via the amazing DBI package, and the dbplyr package can convert dplyr code to SQL queries.
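To make the dplyr-to-SQL translation concrete, here is a minimal sketch, assuming the DBI, RSQLite, dplyr, and dbplyr packages are installed. The in-memory SQLite database, the table name, and the object names are stand-ins chosen for illustration. A real project would connect to its own database backend.

```r
library(DBI)
library(dplyr)  # dbplyr is loaded automatically when tbl() gets a DBI connection

# Assumption: a throwaway in-memory SQLite database stands in for a real server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy the built-in mtcars data frame into the database as a table
dbWriteTable(con, "mtcars", mtcars)

# A lazy reference to the remote table; no rows are pulled into R yet
cars_db <- tbl(con, "mtcars")

# Ordinary dplyr verbs, which dbplyr translates to SQL behind the scenes
by_cyl <- cars_db %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

# Print the SQL that dbplyr generated for the pipeline
show_query(by_cyl)

# Run the query in the database and bring the result back as a tibble
result <- collect(by_cyl)

dbDisconnect(con)
```

Nothing is computed in R until `collect()` is called, so the grouping and averaging happen inside the database.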
sparklyr and h2o
As one might imagine, social science academics rarely, if ever, encounter truly big data (I know I never did while completing my MA in political science or my PhD in public administration). Cluster computing is simply unnecessary in that world. Consequently, I am familiar only with the packages that are well known among data scientists, such as sparklyr (R’s interface to Apache Spark) and h2o (R’s interface to H2O.ai’s platform).
tidymodels
Coming soon!
ranger
Coming soon!
xgboost
Coming soon!