Juan Pablo Olazaba STA-6233

Exercise 1

Assign the following values to a vector named Student: “Jack”, “Diana”, “Steve”, “Morgan”.

Student <- c("Jack", "Diana", "Steve", "Morgan")
Student

## [1] "Jack"   "Diana"  "Steve"  "Morgan"

Exercise 2

Add one more student named “Vivian” to the vector Student

Student <- c(Student, "Vivian")
Student

## [1] "Jack"   "Diana"  "Steve"  "Morgan" "Vivian"

Exercise 3

Print the 2nd and 5th names in the vector Student.

Student[c(2, 5)]

## [1] "Diana"  "Vivian"

Exercise 4

In R, if V1 <- c(2, 4, 5), V2 <- seq(1, 10,2), what is the result of V1+V2?

V1 <- c(2, 4, 5)
V2 <- seq(1, 10, 2)
V1 + V2

## Warning in V1 + V2: longer object length is not a multiple of shorter object
## length

## [1]  3  7 10  9 13
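
The warning comes from R's recycling rule: V1 has 3 elements and V2 has 5, so V1 is reused from its start until it covers V2, and 5 is not a multiple of 3. A minimal sketch that makes the recycling explicit (V1_recycled is an illustrative name, not part of the exercise):

```r
V1 <- c(2, 4, 5)
V2 <- seq(1, 10, 2)                              # 1 3 5 7 9

# Repeat V1 until it is as long as V2, which is what R does implicitly
V1_recycled <- rep(V1, length.out = length(V2))  # 2 4 5 2 4

# Same values as V1 + V2, but without the length-mismatch warning
V1_recycled + V2
```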

Exercise 5

If V3 <- c(3, 5, "Apple", NA), what is the class of V3? What is the result of is.na(V3)?

V3 <- c(3, 5, "Apple", NA)
class(V3)

## [1] "character"

is.na(V3)

## [1] FALSE FALSE FALSE  TRUE
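
The class is character because an atomic vector holds a single type, so the numbers are coerced to strings to sit alongside "Apple"; only the NA counts as missing. The sketch below illustrates this and, as an extra step not asked for in the exercise, shows what happens when the vector is converted back to numeric (V3_num is an illustrative name):

```r
V3 <- c(3, 5, "Apple", NA)

# The numbers were coerced to strings when the vector was built
V3[1:3]                                     # "3" "5" "Apple"

# Converting back to numeric turns the non-numeric string into NA as well
V3_num <- suppressWarnings(as.numeric(V3))
is.na(V3_num)                               # FALSE FALSE TRUE TRUE
```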

Exercise 6

For the vector size <- c(1,2,2,1,3,4,4,3), complete the following tasks:

a. Convert it to a factor named size_factor.
b. Print the levels of the data.
c. Label the levels as 1 for small, 2 for medium, 3 for large and 4 for xlarge.
d. Add a level xsmall as the first level of the current factor.

size <- c(1,2,2,1,3,4,4,3)
#a 
size_factor <- factor(size)
size_factor

## [1] 1 2 2 1 3 4 4 3
## Levels: 1 2 3 4

#b
levels(size_factor)

## [1] "1" "2" "3" "4"

#c
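levels(size_factor) <- c("small", "medium", "large", "xlarge")
#d
# Rebuild the factor with the extra level; assigning a longer vector with
# levels<- would rename the existing levels by position instead.
size_factor <- factor(size_factor, levels = c("xsmall", levels(size_factor)))
levels(size_factor)

## [1] "xsmall" "small"  "medium" "large"  "xlarge"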
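
One thing worth checking after part d is that the observations keep their labels. The sketch below contrasts two ways of prepending a level; the names f, renamed and extended are illustrative only. Assigning a longer vector with `levels<-` relabels the existing levels by position, while rebuilding with `factor(..., levels = )` adds an unused level and leaves the data alone.

```r
size <- c(1, 2, 2, 1, 3, 4, 4, 3)
f <- factor(size, labels = c("small", "medium", "large", "xlarge"))

# Approach 1: levels<- renames by position, so every observation shifts down
renamed <- f
levels(renamed) <- c("xsmall", levels(f))
as.character(renamed)[1]    # the first observation is now "xsmall"

# Approach 2: rebuild the factor; values are matched by label and preserved
extended <- factor(f, levels = c("xsmall", levels(f)))
as.character(extended)[1]   # the first observation is still "small"

# The new level exists but has no observations
table(extended)
```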

Exercise 7

For the mtcars data frame, complete the following tasks:

a. Create a new data frame named high_mpg that includes all columns for the rows where mpg is greater than 20.
b. Create a new data frame named high_mpg_4cyl that includes all columns for the rows where mpg > 20 and cyl = 4.
c. Retrieve all elements of the mpg column for which cyl = 4 or 6.

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

data(mtcars)
#a
high_mpg <- mtcars[mtcars$mpg > 20,]
#b
high_mpg_4cyl <- mtcars[mtcars$mpg > 20 & mtcars$cyl == 4,]
high_mpg_4cyl

##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

#c
mtcars[mtcars$cyl == 4 | mtcars$cyl == 6, "mpg"]

##  [1] 21.0 21.0 22.8 21.4 18.1 24.4 22.8 19.2 17.8 32.4 30.4 33.9 21.5 27.3 26.0
## [16] 30.4 19.7 21.4
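
The same filters can also be written with subset() and %in%, which some readers find easier to scan. This is only an alternative sketch; the _alt names and mpg_4_or_6 are illustrative and not part of the exercise.

```r
data(mtcars)

# subset() evaluates the condition inside the data frame, so no mtcars$ prefix
high_mpg_alt      <- subset(mtcars, mpg > 20)
high_mpg_4cyl_alt <- subset(mtcars, mpg > 20 & cyl == 4)

# %in% replaces a chain of == comparisons joined by |
mpg_4_or_6 <- mtcars$mpg[mtcars$cyl %in% c(4, 6)]

nrow(high_mpg_4cyl_alt)   # 11 rows, matching the table printed above
length(mpg_4_or_6)        # 18 values, matching the vector printed above
```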

Exercise 8

Working with the following vectors.

v1 <- c(1,3,4,6)
v2 <- c("A", "B", "C","D")
v3 <- factor(c(1, 2, 3, 4, 5))

#a Generate a list named list_data that contains the above vectors.
list_data <- list(v1, v2, v3)
list_data

## [[1]]
## [1] 1 3 4 6
## 
## [[2]]
## [1] "A" "B" "C" "D"
## 
## [[3]]
## [1] 1 2 3 4 5
## Levels: 1 2 3 4 5

#b Name the list elements vec1, vec2, fac3.
names(list_data) <- c("vec1", "vec2", "fac3")
#c Retrieve v1 as a list.
list_data[1]

## $vec1
## [1] 1 3 4 6

#d Retrieve v2 as its original structure.
list_data[[2]]

## [1] "A" "B" "C" "D"

#e Retrieve the 2nd element of v1 from the list.
list_data$vec1[2]

## [1] 3
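
Parts c to e rely on the difference between single and double brackets: `[` returns a sub-list, while `[[` and `$` return the element itself. A short sketch that checks this with class(), rebuilding the list with the names requested in part b so the block runs on its own:

```r
list_data <- list(
  vec1 = c(1, 3, 4, 6),
  vec2 = c("A", "B", "C", "D"),
  fac3 = factor(c(1, 2, 3, 4, 5))
)

class(list_data[1])      # "list": single brackets keep the list wrapper
class(list_data[[1]])    # "numeric": double brackets return the vector
class(list_data$fac3)    # "factor": $ behaves like [[ ]] with a name

# Indexing can be chained once the element has been extracted
list_data[["vec1"]][2]   # 3
```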

Exercise 9

Use the data diamonds.csv to complete the following tasks.

#a
Data <- read.csv('C:\\Users\\Owner\\Documents\\diamonds.csv', header=TRUE, stringsAsFactors=FALSE)

#b
str(Data)

## 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : chr  "Ideal" "Premium" "Good" "Premium" ...
##  $ color  : chr  "E" "E" "E" "I" ...
##  $ clarity: chr  "SI2" "SI1" "VS1" "VS2" ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

# There are 7 numerical variables (6 num, 1 int) & 3 categorical (chr) variables.
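
The count can also be obtained programmatically rather than by reading the str() output. The sketch below assumes a small stand-in data frame named toy, built from the first four values that str(Data) prints for each column, so that it runs without diamonds.csv; the same two sum() calls would apply to Data.

```r
# Stand-in for Data: first four values of each column as shown by str(Data)
toy <- data.frame(
  carat   = c(0.23, 0.21, 0.23, 0.29),
  cut     = c("Ideal", "Premium", "Good", "Premium"),
  color   = c("E", "E", "E", "I"),
  clarity = c("SI2", "SI1", "VS1", "VS2"),
  depth   = c(61.5, 59.8, 56.9, 62.4),
  table   = c(55, 61, 65, 58),
  price   = c(326L, 326L, 327L, 334L),
  x       = c(3.95, 3.89, 4.05, 4.2),
  y       = c(3.98, 3.84, 4.07, 4.23),
  z       = c(2.43, 2.31, 2.31, 2.63),
  stringsAsFactors = FALSE
)

# is.numeric() is TRUE for both double ("num") and integer ("int") columns
is_num <- sapply(toy, is.numeric)
sum(is_num)    # 7 numerical columns
sum(!is_num)   # 3 character columns
```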

#c
names(Data)

##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
##  [8] "x"       "y"       "z"

#d
Data$volume <- Data$x * Data$y * Data$z

#e
Data$clarity <- factor(Data$clarity, ordered = TRUE, levels = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"))
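
Declaring clarity as an ordered factor is what allows comparisons such as "better than SI1". A minimal sketch on a four-value vector (the values are the first four clarity entries shown by str(Data); clarity_levels and clarity are illustrative names):

```r
clarity_levels <- c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF")
clarity <- factor(c("SI2", "SI1", "VS1", "VS2"),
                  ordered = TRUE, levels = clarity_levels)

# Comparison operators follow the level order, not alphabetical order
clarity > "SI1"              # FALSE FALSE TRUE TRUE

# max() is defined for ordered factors
as.character(max(clarity))   # "VS1"
```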

#f
FourC <- Data[, c(1, 2, 3, 4)]

#g
Premium <- Data[Data$color == "D" & Data$cut == "Ideal",]
write.csv(Premium, "Premium.csv")

#h returns 7 rows
Premium[Premium$carat > 2,]

##       carat   cut color clarity depth table price    x    y    z   volume
## 24448  2.12 Ideal     D     SI2  62.9    55 12707 8.17 8.14 5.13 341.1645
## 24785  2.75 Ideal     D      I1  60.9    57 13156 9.04 8.98 5.49 445.6738
## 26634  2.11 Ideal     D     SI2  62.7    56 16404 8.25 8.18 5.14 346.8729
## 26753  2.21 Ideal     D     SI2  62.0    57 16558 8.36 8.31 5.18 359.8629
## 27563  2.06 Ideal     D     SI2  60.3    56 18371 8.29 8.25 4.99 341.2786
## 27668  2.01 Ideal     D     SI2  62.1    56 18674 8.02 8.11 5.01 325.8614
## 27677  2.19 Ideal     D     SI2  61.8    57 18693 8.23 8.49 5.17 361.2419

#i
Data[is.na(Data$carat),]

##  [1] carat   cut     color   clarity depth   table   price   x       y      
## [10] z       volume 
## <0 rows> (or 0-length row.names)
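
The zero-row result above means carat has no missing values. To check every column at once, colSums(is.na()) and complete.cases() are convenient. The sketch below uses a hypothetical three-row data frame named toy with NAs inserted on purpose, since the real data has none in carat:

```r
# Hypothetical data with deliberately missing entries
toy <- data.frame(
  carat = c(0.23, NA, 0.23),
  price = c(326L, 326L, NA)
)

# Number of missing values per column
colSums(is.na(toy))

# Rows that contain at least one missing value
toy[!complete.cases(toy), ]

# Quick yes/no check for a single column
anyNA(toy$carat)
```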

#j
library(ggplot2)
ggplot(Data, aes(price)) +
    geom_density(fill = "pink")

#k
ggplot(Data, aes(cut, fill = clarity)) + 
  geom_bar(position = "fill", alpha = 0.6)

#l
ggplot(Premium, aes(carat, price, color = clarity)) +
  geom_point(size = 4) + 
  geom_smooth() + 
  labs(title = "Price of diamonds with different carats", subtitle = "and different clarity", y = "Price in dollars") + 
  theme(legend.position = c(0.8, 0.2))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#m
ggplot(Premium, aes(carat, price)) + geom_point() + facet_wrap(~clarity)

Exercise 10

The working directory is the folder where R looks for files to load and where it writes exported files by default; any relative file path is resolved against it.
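
A minimal sketch of how this plays out with getwd() and setwd(). It assumes a temporary directory from tempdir() and a hypothetical file name example.csv, so nothing in the user's own folders is touched:

```r
old_wd <- getwd()       # remember the current working directory
setwd(tempdir())        # switch to a temporary folder

# A relative path is written into the working directory
write.csv(data.frame(a = 1:3), "example.csv", row.names = FALSE)
file.exists(file.path(getwd(), "example.csv"))   # TRUE

# The same relative path is read back from the working directory
example <- read.csv("example.csv")

setwd(old_wd)           # restore the original working directory
```

Restoring the original directory at the end keeps later relative paths, such as the write.csv(Premium, "Premium.csv") call in Exercise 9, pointing where they did before.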