Because R is open source and its packages are community-built, there is a package for reading virtually any type of data.
vroom
Since everyone knows about readr, readxl, haven, and data.table, I’d like to suggest reading up on the impressive vroom package.
library(magrittr) # Give me %>% or give me death.
library(vroom)
vroom_example("mtcars.csv") %>% vroom()
## Rows: 32
## Columns: 12
## Delimiter: ","
## chr [ 1]: model
## dbl [11]: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
##
## Use `spec()` to retrieve the guessed column specification
## Pass a specification to the `col_types` argument to quiet this message
## # A tibble: 32 x 12
##    model         mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
##  2 Mazda RX4 ~  21       6  160    110  3.9   2.88  17.0     0     1     4     4
##  3 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
##  4 Hornet 4 D~  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
##  5 Hornet Spo~  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
##  6 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
##  7 Duster 360   14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
##  8 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
##  9 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
## 10 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
## # ... with 22 more rows
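The message in the output above says that passing a specification to `col_types` quiets the column-guessing note. A minimal sketch of what that could look like for this file, assuming the column layout shown in the output (one character column, eleven numeric ones); the names `mtcars_spec` and `cars` are just illustrative:

```r
library(vroom)

# Declare the column types up front instead of letting vroom guess them.
# `model` is the only character column; every other column is numeric.
mtcars_spec <- cols(
  model = col_character(),
  .default = col_double()
)

# Same example file as above, now read with an explicit specification.
cars <- vroom(vroom_example("mtcars.csv"), col_types = mtcars_spec)
```

Spelling out the types also guards against a column being guessed differently if the file changes later.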
etl and dbplyr
The etl package allows R users to populate a local SQL database and analyze data using the familiar dplyr verbs.
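As a rough sketch of that workflow, the snippet below uses the small `mtcars` example source that ships with etl. It assumes etl and RSQLite are installed and that, with no database supplied, `etl()` falls back to a temporary SQLite database; the summary at the end and the name `by_cyl` are only illustrative.

```r
library(dplyr)
library(etl)

# Create an etl object for the bundled "mtcars" example source.
# Assumption: no database is passed, so etl sets up a temporary SQLite one.
cars <- etl("mtcars")

# Run the three pipeline stages: extract the raw data, transform it, load it.
cars %>%
  etl_extract() %>%
  etl_transform() %>%
  etl_load()

# Query the populated table with ordinary dplyr verbs, then pull the result into R.
by_cyl <- cars %>%
  tbl("mtcars") %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()
```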
I am vastly more skilled in wrangling data within the tidyverse framework than in SQL. Luckily, R can connect to almost any type of SQL database via the amazing DBI package, and the dbplyr package can convert dplyr code to SQL queries.
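To make the dplyr-to-SQL translation concrete, here is a minimal sketch against an in-memory SQLite database. It assumes the RSQLite package is installed as the DBI backend; the table name and the particular query are only for illustration.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Connect to a throwaway in-memory SQLite database (assumes RSQLite is installed).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy the built-in mtcars data frame into the database as a table.
dbWriteTable(con, "mtcars", mtcars)

# Build a lazy query with dplyr verbs; nothing is executed yet.
query <- tbl(con, "mtcars") %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

# Print the SQL that dbplyr generated from the dplyr code.
show_query(query)

# Run the query in the database and bring the result back as a tibble.
result <- collect(query)

dbDisconnect(con)
```

Because the query stays lazy until `collect()`, the filtering and aggregation happen inside the database rather than in R's memory.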
sparklyr and h2o
As one might imagine, social science academics rarely, if ever, encounter truly big data (I know I never did when completing my MA in political science or PhD in public administration). Cluster computing is simply unnecessary in that world. Consequently, I am only familiar with packages that are well-known amongst data scientists, such as sparklyr (R’s interface for Apache Spark) and h2o (R’s interface for H2O.ai’s platform).
tidymodels
Coming soon!
ranger
Coming soon!
xgboost
Coming soon!