2 Introduction to RStudio

  • Tools -> Global Options -> Appearance

  • Session -> Set Working Directory -> Choose Directory

3 RStudio Cheat Sheets and Primer

  • RStudio IDE

  • R Markdown

  • Data Import

  • ggplot2 (not required)

4 Swirl Package - Learn R in R

  1. We install our first package, then load it to use it.
getOption("defaultPackages")  # packages that R attaches automatically at startup
## [1] "datasets"  "utils"     "grDevices" "graphics"  "stats"     "methods"
# install.packages("swirl")   # run once to install; kept commented out afterwards

library("swirl")
## 
## | Hi! Type swirl() when you are ready to begin.
swirl()  # start the interactive course
info()   # inside swirl: show the list of available commands
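The install-then-load step above can be made safe to re-run. The following is a minimal sketch, not part of the original handout: `is_installed` is a hypothetical helper name, and the block only reports what to do rather than installing anything.

```r
# Hypothetical helper: TRUE if a package can be loaded, FALSE otherwise.
# requireNamespace() checks for the package without attaching it.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this check succeeds on any installation.
stats_ok <- is_installed("stats")

# Only suggest installing swirl when it is missing.
if (is_installed("swirl")) {
  message("swirl is installed; load it with library(\"swirl\")")
} else {
  message("swirl is not installed; run install.packages(\"swirl\") once")
}
```

Checking first avoids re-downloading a package every time the script runs.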

5 Getting Help

## GETTING HELP when you know the function name
help("mean")
help('mean')
?mean


## GETTING HELP when you DO NOT know the function name (or function not loaded yet)
help.search("mean")  
??mean
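Beyond the help pages, a few base R functions let you look up names and arguments directly in the console. This is a small sketch using only functions from base R and utils; the variable names are illustrative.

```r
# apropos() lists objects on the search path whose names match a pattern.
mean_like <- apropos("^mean")
print(mean_like)

# args() shows a function's argument list without opening the help page.
args(mean)

# formals() returns the arguments as a list, handy for programmatic checks.
mean_args <- names(formals(mean))
print(mean_args)
```

`apropos()` is useful when you remember only part of a function name, since it searches loaded packages without leaving the console.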
  • Google is your best friend. It will often direct you to public forums such as Stack Overflow (a platform that collects coding questions and answers), which hosts a large archive of R questions.

  • Posting on Stack Overflow or other online forums requires following some best practices: state your OS and R version, and include your data, your code, the exact error message, and what you have already tried that did not work.

  • Other websites such as InterviewBit, HackerRank, and LeetCode usually carry data science questions too, but rarely for R.

  • There are also courses, for example on Udemy, that teach students SQL for data science jobs.
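The details a forum post needs (R version, OS, a sample of the data) can be gathered from R itself. The sketch below assumes the built-in `mtcars` data stands in for your own data; `small` is an illustrative name.

```r
# R version string, e.g. for the first line of a forum post
print(R.version.string)

# Full environment report: OS, locale, and attached packages
print(sessionInfo())

# A small slice of the data, kept minimal so others can read it quickly
small <- head(mtcars[, c("mpg", "cyl")], 3)

# dput() prints R code that recreates the object, so readers can paste it
dput(small)
```

Pasting the `dput()` output into a question lets others rebuild exactly the same object instead of guessing at your data's structure.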

6 Part 1: Basics of R

# Using R as calculator
1+1
## [1] 2
(2+3)*2   # parentheses
## [1] 10
exp(1)    # e raised to the power 1
## [1] 2.718282
2/3       # division
## [1] 0.6666667
options(digits = 3)
2/3       # printed to 3 significant digits; the full value is still stored (default is 7)
## [1] 0.667
2*3       # multiplication
## [1] 6
2+3       # addition
## [1] 5
2-3       # subtraction
## [1] -1
# PEMDAS / BODMAS (order of operations)
48 / (2 * 12)
## [1] 2
48 / 2 * 12
## [1] 288
2*2*2
## [1] 8
2**3      # power (R reads ** as ^; ^ is the usual form)
## [1] 8
sqrt(4)   # square root 
## [1] 2
pi
## [1] 3.14
# CODE SO THAT IT IS EASY TO FOLLOW
(3 + (5 * (2 ^ 2))) # hard to read
## [1] 23
3 + 5 * 2 ^ 2       # clear, if you remember the BODMAS rules
## [1] 23
3 + 5 * (2 ^ 2)     # if you forget some rules, this might help
## [1] 23
# Assignment Operator
x <- 1.25
y = x              # Best practice is not to use "=" in assignment
y <- x
# x <- z + 1       # Fails: z does not exist, so z + 1 cannot be evaluated
# Wrapping the failing assignment in tryCatch() lets the script continue
tryCatch({
  x <- z + 1
}, error = function(e) {
  message("Error: ", e$message)
})
## Error: object 'z' not found
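The same pattern can return a fallback value instead of only printing a message. This is a minimal sketch: `undefined_object_xyz` is a deliberately non-existent name chosen for illustration, and `NA` is one possible fallback among many.

```r
# exists() checks for an object by name without triggering an error
has_object <- exists("undefined_object_xyz")

# tryCatch() returns the handler's value when the expression fails
result <- tryCatch({
  undefined_object_xyz + 1
}, error = function(e) {
  # conditionMessage() extracts the error text from the condition object
  message("Error: ", conditionMessage(e))
  NA  # fallback value assigned to `result`
})

print(has_object)
print(result)
```

Returning a fallback keeps later lines of a script running, but it also hides the failure, so it is usually worth logging the message as done here.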

7 Importing Data

7.1 Internal Data

data()
mtcars
##                      mpg cyl  disp  hp drat   wt qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.62 16.5  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.88 17.0  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.32 18.6  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.21 19.4  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.44 17.0  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.46 20.2  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.57 15.8  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.25 18.0  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.42 17.8  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.34 17.4  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.20 19.5  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.61 18.5  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.83 19.9  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.46 20.0  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.52 16.9  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.44 17.3  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.84 15.4  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.85 17.1  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.94 18.9  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.14 16.7  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.51 16.9  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.17 14.5  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.77 15.5  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.57 14.6  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.78 18.6  1  1    4    2
df <- as.data.frame(mtcars)  # mtcars is already a data frame; this copies it under a shorter name
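Printing a whole dataset gets unwieldy quickly, so a compact overview is often a better first step. The sketch below uses only base R on the built-in `mtcars` data; `cars_dim` and `cars_names` are illustrative names.

```r
# Number of rows and columns
cars_dim <- dim(mtcars)
print(cars_dim)

# Column names
cars_names <- names(mtcars)
print(cars_names)

# Compact structure: one line per column with its type and first values
str(mtcars)

# Confirm the object is already a data frame
print(is.data.frame(mtcars))
```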

7.2 External Data

setwd("/Users/arvindsharma/Dropbox/WCAS/Data Analysis/Data Analysis - Spring II 2024/Data Analysis - Spring II 2024 (shared files)/W1/Week_1-2/titanic/")

train <- read.csv("train.csv")
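Hard-coded absolute paths only work on one machine. As a hedged alternative, the sketch below builds the path from parts and checks that the file exists before reading it. The folder name `titanic` is an assumption for illustration; point `data_dir` at wherever your copy of train.csv lives.

```r
# Assumed folder name, relative to the current working directory
data_dir <- "titanic"

# file.path() joins path components with "/" on every platform
csv_path <- file.path(data_dir, "train.csv")

# Read only if the file is present; otherwise report where R looked
if (file.exists(csv_path)) {
  train <- read.csv(csv_path)
  str(train)
} else {
  message("File not found: ", csv_path,
          " (working directory: ", getwd(), ")")
}
```

Printing `getwd()` in the message makes the most common cause of a failed import, a wrong working directory, visible right away.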

8 Exploring Object Properties

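  • In R, an object’s class (returned by class()) describes what kind of object it is at a high level. Generic functions such as print() and summary() use the class to decide which method to run.

  • In contrast, an object’s type (returned by typeof()) is the low-level storage type R uses internally, such as "double", "character", or "list".

  • An object has exactly one type but can have several classes, and objects of different classes can share the same type. A data frame, for example, has class "data.frame" but type "list".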

class(df)  # high-level class, used to choose methods such as print() and summary()
## [1] "data.frame"
typeof(df) # internal storage type: a data frame is stored as a list of columns
## [1] "list"
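To see how class and type differ across objects, it helps to compare a few side by side. This is an illustrative sketch in base R; the object names in `objs` are made up for the example.

```r
# A few objects of different kinds
objs <- list(
  number = 1.25,                  # a decimal number
  count  = 1L,                    # an integer (note the L suffix)
  group  = factor(c("a", "b")),   # a factor
  cars   = mtcars                 # a data frame
)

# First class and storage type of each object
classes <- sapply(objs, function(o) class(o)[1])
types   <- sapply(objs, typeof)

print(data.frame(class = classes, typeof = types))
```

The factor and the integer share the storage type "integer" while having different classes, and the data frame is stored as a "list", which is why list tools such as `lapply()` work column by column on data frames.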

9 Exploratory Data Analysis (EDA)

head(df)
##                    mpg cyl disp  hp drat   wt qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.62 16.5  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.88 17.0  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.32 18.6  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.21 19.4  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.0  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.46 20.2  1  0    3    1
tail(df)
##                 mpg cyl  disp  hp drat   wt qsec vs am gear carb
## Porsche 914-2  26.0   4 120.3  91 4.43 2.14 16.7  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.51 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.17 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.77 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.57 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.78 18.6  1  1    4    2
# install.packages("visdat")
??visdat

library("visdat")
vis_dat(df)
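If visdat is not available, base R gives a text version of the same first look at the data. The sketch below rebuilds `df` from `mtcars` so it runs on its own; `missing_per_column` is an illustrative name.

```r
# Rebuild the data frame so this block is self-contained
df <- mtcars

# Dimensions and column types
print(dim(df))
str(df)

# Five-number summary plus mean for each column
print(summary(df))

# Count of missing values per column
missing_per_column <- colSums(is.na(df))
print(missing_per_column)
```

`vis_dat()` shows column types and missingness as a plot; `str()` and `colSums(is.na(df))` report the same two things as text.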