Week 1 Objectives

  • Create a new R file
  • Install a library
  • Load a library
  • Import a dataset
  • See the dataset’s structure
  • Summarize a dataset
  • Create a new variable
  • Find the minimum
  • Find the maximum
  • Find the mean
  • Create a simple graph

Create a new R file

First, go to “File”, then “New File”.

Then select “R Script”.

Save the new R file

Save the new R script.

Install a library

Libraries allow you to use functions created by other developers around the world. This is a large part of what makes R so powerful.

Installing your first library:

install.packages("tidyverse")
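If the install line stays in a script, it runs again every time the script runs. A minimal sketch of one way to skip it when the library is already present is shown here. The helper name `install_if_missing` is hypothetical (it is not part of R), and the demonstration call uses "stats", which ships with R, so nothing is downloaded.

```r
# Hypothetical helper (not part of R itself): install a library only if it is missing.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)          # already installed, so nothing is downloaded
  }
  install.packages(pkg)    # runs only when the library is not found
  TRUE
}

# "stats" ships with R, so this call finds it and installs nothing.
install_if_missing("stats")

# For this course the call would be:
# install_if_missing("tidyverse")
```

The function returns FALSE when the library was already there and TRUE when it had to install it.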

Once a library is installed, you do not need to install it again. However…

Load a library

When you want to use the functionality of an installed library, you will need to load it.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
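
The “Conflicts” lines mean that two libraries define a function with the same name; when you type the bare name, you get the version from the library loaded most recently. A small sketch of how the `package::function()` form reaches a specific version is given here. It calls only `stats::filter()`, so it runs without tidyverse, and the input numbers are made up for illustration.

```r
# After library(tidyverse), a bare filter() refers to dplyr::filter().
# Prefixing the package name always reaches the version you mean.
x <- c(1, 2, 3, 4, 5)                        # made-up numbers for illustration

# stats::filter() applies a 3-value moving average here.
smoothed <- stats::filter(x, rep(1 / 3, 3))
print(smoothed)
```

The first and last values come back as NA because a 3-value window does not fit at the ends of the series.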

Import a dataset

Click on “Import”, then select the type of file you will import.

Ask yourself: what file extension does my data have (.csv, .xlsx, or something else)?

Importing a .csv dataset

Select “readr” for .csv files.

Use the “Browse” button to select the file to import.

Copy the code to import the data.

Paste the code in your R script and run it.

affairs <- read_csv("~/Library/Mobile Documents/com~apple~CloudDocs/Fairfield University/Spring 2024/Data/affairs.csv")
## Rows: 601 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): gender, children
## dbl (7): affairs, age, yearsmarried, religiousness, education, occupation, r...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
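
The file path in the import code is specific to one computer, so it will differ on yours. To try the import step without the course file, the sketch here writes a tiny .csv to a temporary location and reads it back. It uses base R's `read.csv()` rather than readr's `read_csv()` so that it runs without any extra library; the file name and the object name `affairs_small` are assumptions, and the four rows are the first values shown for these columns in this dataset.

```r
# Write a tiny .csv file to a temporary location (a stand-in for affairs.csv).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "gender,age,yearsmarried",
  "male,37,10",
  "female,27,4",
  "female,32,15",
  "male,57,15"
), csv_path)

# read.csv() is base R's counterpart to readr's read_csv().
affairs_small <- read.csv(csv_path)

nrow(affairs_small)   # number of rows imported
ncol(affairs_small)   # number of columns imported
```

Checking the row and column counts right after an import is a quick way to confirm that the whole file was read.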

See the dataset’s structure

str(affairs)
## spec_tbl_df [601 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ affairs      : num [1:601] 0 0 0 0 0 0 0 0 0 0 ...
##  $ gender       : chr [1:601] "male" "female" "female" "male" ...
##  $ age          : num [1:601] 37 27 32 57 22 32 22 57 32 22 ...
##  $ yearsmarried : num [1:601] 10 4 15 15 0.75 1.5 0.75 15 15 1.5 ...
##  $ children     : chr [1:601] "no" "no" "yes" "yes" ...
##  $ religiousness: num [1:601] 3 4 1 5 2 2 2 2 4 4 ...
##  $ education    : num [1:601] 18 14 12 18 17 17 12 14 16 14 ...
##  $ occupation   : num [1:601] 7 6 1 6 6 5 1 4 1 4 ...
##  $ rating       : num [1:601] 4 4 4 5 3 5 3 4 2 5 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   affairs = col_double(),
##   ..   gender = col_character(),
##   ..   age = col_double(),
##   ..   yearsmarried = col_double(),
##   ..   children = col_character(),
##   ..   religiousness = col_double(),
##   ..   education = col_double(),
##   ..   occupation = col_double(),
##   ..   rating = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
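
When the full `str()` output is more than you need, a few base R functions each answer one question about the structure. The sketch here uses a small data frame built from the first four values of three columns above; the name `affairs_small` is an assumption made so the example runs on its own.

```r
# First four rows of three columns, copied from the output above.
affairs_small <- data.frame(
  gender       = c("male", "female", "female", "male"),
  age          = c(37, 27, 32, 57),
  yearsmarried = c(10, 4, 15, 15)
)

dim(affairs_small)       # number of rows, then number of columns
names(affairs_small)     # column names
head(affairs_small, 2)   # first two rows
```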

Summarize a dataset

summary(affairs)
##     affairs          gender               age         yearsmarried   
##  Min.   : 0.000   Length:601         Min.   :17.50   Min.   : 0.125  
##  1st Qu.: 0.000   Class :character   1st Qu.:27.00   1st Qu.: 4.000  
##  Median : 0.000   Mode  :character   Median :32.00   Median : 7.000  
##  Mean   : 1.456                      Mean   :32.49   Mean   : 8.178  
##  3rd Qu.: 0.000                      3rd Qu.:37.00   3rd Qu.:15.000  
##  Max.   :12.000                      Max.   :57.00   Max.   :15.000  
##    children         religiousness     education       occupation   
##  Length:601         Min.   :1.000   Min.   : 9.00   Min.   :1.000  
##  Class :character   1st Qu.:2.000   1st Qu.:14.00   1st Qu.:3.000  
##  Mode  :character   Median :3.000   Median :16.00   Median :5.000  
##                     Mean   :3.116   Mean   :16.17   Mean   :4.195  
##                     3rd Qu.:4.000   3rd Qu.:18.00   3rd Qu.:6.000  
##                     Max.   :5.000   Max.   :20.00   Max.   :7.000  
##      rating     
##  Min.   :1.000  
##  1st Qu.:3.000  
##  Median :4.000  
##  Mean   :3.932  
##  3rd Qu.:5.000  
##  Max.   :5.000
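
For character columns such as gender and children, `summary()` reports only the length and class. One way to get a more useful summary of such a column is to count its values with `table()`. The sketch here uses the first four gender values shown earlier, so the counts are for those four rows only, not for the full dataset.

```r
# First four gender values from the dataset.
gender <- c("male", "female", "female", "male")

# table() counts how many times each value appears.
gender_counts <- table(gender)
print(gender_counts)
```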

Create a new variable and verify it exists

affairs$age_married <- affairs$age - affairs$yearsmarried

str(affairs$age_married)
##  num [1:601] 27 23 17 42 21.2 ...
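
The same steps can be followed on a small data frame to see exactly what the subtraction does. This sketch uses the first four rows of age and yearsmarried; the name `affairs_small` is an assumption.

```r
# First four rows of the two columns used in the calculation.
affairs_small <- data.frame(
  age          = c(37, 27, 32, 57),
  yearsmarried = c(10, 4, 15, 15)
)

# The subtraction is done row by row and stored as a new column.
affairs_small$age_married <- affairs_small$age - affairs_small$yearsmarried

# Verify the new column exists, then look at its values.
"age_married" %in% names(affairs_small)
## [1] TRUE
affairs_small$age_married
## [1] 27 23 17 42
```

These four results match the first four values in the `str()` output for the full dataset.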

Find the minimum of age_married

min(affairs$age_married)
## [1] 7.5

Find the maximum of age_married

max(affairs$age_married)
## [1] 45

Find the mean of age_married

mean(affairs$age_married)
## [1] 24.30983
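
Two related details are worth a quick sketch: `range()` returns the minimum and maximum together, and `min()`, `max()`, and `mean()` all return NA if the data contain a missing value unless `na.rm = TRUE` is set. The example uses the first four age_married values; the missing value is added only for illustration and is not a claim about the real dataset.

```r
# First four age_married values.
age_married <- c(27, 23, 17, 42)

range(age_married)             # minimum and maximum together
## [1] 17 42

# Add one missing value to show its effect (for illustration only).
with_gap <- c(age_married, NA)

mean(with_gap)                 # a missing value makes the result NA
## [1] NA
mean(with_gap, na.rm = TRUE)   # na.rm = TRUE drops missing values first
## [1] 27.25
```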

Create a simple graph of age vs. education

plot(affairs$age, affairs$education)
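
By default the axes are labelled with the code that was typed. A short sketch of the same kind of graph with readable labels is shown here, using the first ten age and education values from the dataset; the label and title text are assumptions and can be changed freely.

```r
# First ten values of each column.
age       <- c(37, 27, 32, 57, 22, 32, 22, 57, 32, 22)
education <- c(18, 14, 12, 18, 17, 17, 12, 14, 16, 14)

# xlab, ylab, and main set the axis labels and the title.
plot(age, education,
     xlab = "Age",
     ylab = "Education",
     main = "Age vs. education")
```

The first argument goes on the horizontal axis and the second on the vertical axis.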