These can also be found in the right-side menu under “Packages”. Packages are collections of functions, written by other R users, that make R easier to use. We use a few packages in almost every analysis, so it is good practice to load them at the start. If a package is not yet in your library, click the “Install” button in the Packages menu first.
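The install step also has a code equivalent, which can be handy when sharing a notebook. Below is a minimal sketch, assuming the three packages loaded in this section are the ones you need. The names `needed` and `not_installed` are illustrative only, and the install line is left commented out so nothing is downloaded by accident.

```r
# Packages loaded in this section (an assumption based on the library() calls here)
needed <- c("tidyverse", "psych", "haven")

# Compare against what is already installed in your library
not_installed <- setdiff(needed, rownames(installed.packages()))
print(not_installed)

# Uncomment the next line to install anything that is missing:
# install.packages(not_installed)
```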
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.2     ✓ purrr   0.3.4
✓ tibble  3.0.3     ✓ dplyr   1.0.0
✓ tidyr   1.1.0     ✓ stringr 1.4.0
✓ readr   1.3.1     ✓ forcats 0.5.0
package ‘ggplot2’ was built under R version 3.6.2
package ‘tibble’ was built under R version 3.6.2
package ‘tidyr’ was built under R version 3.6.2
package ‘purrr’ was built under R version 3.6.2
package ‘dplyr’ was built under R version 3.6.2
── Conflicts ───────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(psych)
Attaching package: ‘psych’
The following objects are masked from ‘package:ggplot2’:
%+%, alpha
library(haven)
package ‘haven’ was built under R version 3.6.2
There are lots of ways to load data into R. Here is an easy one: go to “Files” on the right side, click the file you would like to use, and choose “Import Dataset”. Then either copy the code from the code preview into your R notebook or just click the import button. View and glimpse are great ways to get an initial feel for the structure of your data.
glimpse(gss)
Rows: 2,765
Columns: 16
$ id       <dbl>     2331, 2003, 1221, 2051, 2465, 546, 1291, 732, 303, 2700, 855, 6…
$ hrs1     <dbl+lbl> NA, NA, NA, NA, 50, 60, 40, 25, NA, 40, 64, 45, 60, 85, NA,…
$ marital  <dbl+lbl> 1, 3, 1, 1, 1, 1, 5, 2, 1, 1, 1, 1, 1, 1, 2, 5, 1, 1, 5, 5,…
$ childs   <dbl+lbl> 3, 8, 3, 2, 0, 0, 0, 3, 3, 2, 3, 3, 2, 3, 6, 0, 4, 2, 0, 0,…
$ age      <dbl+lbl> 71, 69, 40, 60, 31, 37, 23, 86, 70, 42, 41, 30, 43, 48, 70,…
$ educ     <dbl+lbl> 18, 11, 19, 13, 11, 19, 11, 11, 13, 12, 12, 12, 13, 20, 12,…
$ sex      <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ polviews <dbl+lbl> 4, 4, NA, 5, 4, 2, 1, 4, 4, 3, 6, NA, 5, 5, NA,…
$ wwwhr    <dbl+lbl> NA, NA, 7, 1, 0, 3, NA, NA, NA, 3, 5, 20, 1, 2, NA,…
$ trustpeo <dbl+lbl> 4, 1, NA, 1, 2, 2, NA, 1, 4, 1, 2, NA, 5, 2, NA,…
$ wantbest <dbl+lbl> 2, 4, NA, 2, 2, 4, NA, 4, 1, 2, 2, NA, 1, 3, NA,…
$ advantge <dbl+lbl> 3, 2, NA, 1, 2, 2, NA, 2, 2, 2, 1, NA, 4, 4, NA,…
$ goodlife <dbl+lbl> 4, NA, NA, NA, NA, 1, 2, 2, NA, NA, NA, NA, 3, NA, NA,…
$ deckids  <dbl+lbl> NA, NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA,…
$ strsswrk <dbl+lbl> NA, NA, 5, NA, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, 3,…
$ satjob7  <dbl+lbl> NA, NA, 3, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA,…
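For reference, the point-and-click import can also be written as code. The sketch below assumes the data are stored as a Stata .dta file (haven also provides read_sav() and read_sas() for SPSS and SAS files). To stay self-contained it first writes a tiny made-up dataset to a temporary file; `demo_path` and `gss_demo` are hypothetical names, not the file or object used in this tutorial.

```r
library(haven)   # read_dta() and write_dta()
library(dplyr)   # glimpse()

# Write a tiny made-up dataset to a temporary Stata file (illustration only)
demo_path <- tempfile(fileext = ".dta")
write_dta(data.frame(id = c(1, 2, 3), childs = c(0, 2, 3)), demo_path)

# Read it back; for real data, replace demo_path with the path to your own file
gss_demo <- read_dta(demo_path)

# Quick look at the structure, as with glimpse(gss)
glimpse(gss_demo)
```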
str function can help…
str(gss$marital)
dbl+lbl [1:2765] 1, 3, 1, 1, 1, 1, 5, 2, 1, 1, 1, 1, 1, 1, 2, 5, 1, 1, 5, 5, ...
@ label : chr "marital status"
@ format.stata: chr "%16.0g"
@ labels : Named num [1:6] 1 2 3 4 5 9
..- attr(*, "names")= chr [1:6] "married" "widowed" "divorced" "separated" ...
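The dbl+lbl type means the values are numeric codes with text labels attached. One way to work with the labels directly is haven's as_factor(), sketched below on a small made-up vector. Only the four label names visible in the output above are used; the remaining labels are truncated there and are not guessed at. The object names are illustrative.

```r
library(haven)

# Made-up codes with the four labels shown in the str() output (illustration only)
marital_demo <- labelled(
  c(1, 3, 1, 2),
  labels = c(married = 1, widowed = 2, divorced = 3, separated = 4)
)

# Convert the numeric codes to a factor that uses the text labels
marital_factor <- as_factor(marital_demo)
print(marital_factor)
```

On the real data, the same idea would be as_factor(gss$marital).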
describe Function from the psych Package
table command
For categorical variables, the table command is a workhorse for data exploration. The table command allows you to examine frequencies for different levels of categorical variables.
table(gss$childs)
0 1 2 3 4 5 6 7 8
799 469 657 481 185 73 40 22 34
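One thing to keep in mind is that table leaves out missing values unless you ask for them. The sketch below uses a small made-up vector (not GSS data) to show the difference, and also shows prop.table for proportions instead of counts.

```r
# Made-up values, for illustration only
childs_demo <- c(0, 2, 2, NA, 3, 0, NA)

# By default, NA values are dropped from the counts
counts_default <- table(childs_demo)
print(counts_default)

# useNA = "ifany" adds a separate count for NA when any are present
counts_with_na <- table(childs_demo, useNA = "ifany")
print(counts_with_na)

# Proportions of the non-missing values
print(prop.table(counts_default))
```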
A frequency histogram is a nice “quick and dirty” way to explore a continuous variable. The plotting functions used here come from the ggplot2 package. There are different ways to create such a plot in RStudio. Here is the easiest way, again using the childs variable; because childs takes only a few whole-number values, geom_bar draws one bar per value:
ggplot(data = gss, mapping = aes(x = childs)) + geom_bar()
Here is a fancier version (say, for publication) with a title and labels added with the labs function:
ggplot(data = gss, mapping = aes(x = childs)) + geom_bar() +
  labs(title = "Distribution of Number of Children per Family",
       x = "Number of Children",
       caption = "Data from the General Social Survey (2012). N = 2,765.")
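For a variable with many distinct values, such as age, geom_histogram groups the values into bins rather than drawing one bar per value. Here is a minimal sketch on made-up ages (not the GSS data); the `ages_demo` name and the 10-year bin width are arbitrary choices for illustration.

```r
library(ggplot2)

# Made-up ages, for illustration only
ages_demo <- data.frame(
  age = c(23, 30, 31, 37, 40, 41, 42, 43, 48, 60, 69, 70, 71, 86)
)

# binwidth = 10 groups ages into 10-year bins; adjust it to suit your data
p <- ggplot(data = ages_demo, mapping = aes(x = age)) +
  geom_histogram(binwidth = 10) +
  labs(x = "Age", y = "Count")

print(p)
```

Trying a few different bin widths is usually worthwhile, since the shape of a histogram can change noticeably with the binning.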