Update R by installing new versions using the process described above or by using the handy installr
package.
library(tidyverse)
since doing loads ggplot2
, tibble
, tidyr
, readr
, purrr
, dplyr
, stringr
, and forcats
.If you only need a single command from a given package, use a double colon to call it; in some cases, this can prevent conflicts in the command namespace.
R can read virtually any tabular data file (and is rapidly improving its database capabilities).
library(readr)
df_csv = read_csv("Comma Separated Values.csv")
library(haven)
df_stata = read_dta("Stata File.dta")
df_sas = read_sas("SAS File.sas7bdat")
df_spss = read_spss("SPSS File.sav")
library(readxl)
df_excel = read_excel("Excel Spreadsheet.xlsx")
The package “dplyr,” included in the tidyverse, includes several verbs that ease data munging:
library(dplyr)
df_example %>% select() # Subset columns.
df_example %>% filter() # Subset rows.
df_example %>% arrange() # Sort/order by columns.
df_example %>% mutate() # Create new variables/columns.
df_example %>% group_by() # Groups data by variable.
df_example %>% summarize() # Reduce multiple values to a single value.
The arduous process of data tidying falls outside the scope of a cheat sheet, but learning how to combine the above verbs with the powerful pipe operator will make your life significantly easier.
## Rows: 30
## Columns: 7
## $ rating <dbl> 43, 63, 71, 61, 81, 43, 58, 71, 72, 67, 64, 67, 69, 68, ...
## $ complaints <dbl> 51, 64, 70, 63, 78, 55, 67, 75, 82, 61, 53, 60, 62, 83, ...
## $ privileges <dbl> 30, 51, 68, 45, 56, 49, 42, 50, 72, 45, 53, 47, 57, 83, ...
## $ learning <dbl> 39, 54, 69, 47, 66, 44, 56, 55, 67, 47, 58, 39, 42, 45, ...
## $ raises <dbl> 61, 63, 76, 54, 71, 54, 66, 70, 71, 62, 58, 59, 55, 59, ...
## $ critical <dbl> 92, 73, 86, 84, 83, 49, 68, 66, 83, 80, 67, 74, 63, 77, ...
## $ advance <dbl> 45, 47, 48, 35, 47, 34, 35, 41, 31, 41, 34, 41, 25, 35, ...
##
## Pearson's product-moment correlation
##
## data: rating and complaints
## t = 7.737, df = 28, p-value = 1.988e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6620128 0.9139139
## sample estimates:
## cor
## 0.8254176
The easystats project’s correlation
package, while not on CRAN, is the best at examining multiple correlations at once.
## # A tibble: 21 x 10
## Parameter1 Parameter2 r CI_low CI_high t df p Method n_Obs
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <chr> <int>
## 1 rating complaints 0.825 0.662 0.914 7.74 28 4.17e-7 Pears~ 30
## 2 rating privileges 0.426 0.0778 0.682 2.49 28 1.89e-1 Pears~ 30
## 3 rating learning 0.624 0.340 0.803 4.22 28 4.16e-3 Pears~ 30
## 4 rating raises 0.590 0.292 0.784 3.87 28 9.57e-3 Pears~ 30
## 5 rating critical 0.156 -0.216 0.489 0.838 28 1.00e+0 Pears~ 30
## 6 rating advance 0.155 -0.217 0.488 0.831 28 1.00e+0 Pears~ 30
## 7 complaints privileges 0.558 0.248 0.765 3.56 28 1.88e-2 Pears~ 30
## 8 complaints learning 0.597 0.301 0.788 3.94 28 8.50e-3 Pears~ 30
## 9 complaints raises 0.669 0.407 0.829 4.77 28 1.05e-3 Pears~ 30
## 10 complaints critical 0.188 -0.185 0.513 1.01 28 1.00e+0 Pears~ 30
## # ... with 11 more rows