This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
library(tidyverse)
data("diamonds")
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
Question 1
avgdiamonds <- diamonds %>%
  select(depth, price) %>%
  summarise(avgdepth = mean(depth), avgprice = mean(price))
avgdiamonds
## # A tibble: 1 x 2
## avgdepth avgprice
## <dbl> <dbl>
## 1 61.7 3933.
Question 2
diamonds %>%
  mutate(ppc = price / carat)
## # A tibble: 53,940 x 11
## carat cut color clarity depth table price x y z ppc
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 1417.
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 1552.
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 1422.
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63 1152.
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 1081.
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 1400
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 1400
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 1296.
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 1532.
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 1470.
## # … with 53,930 more rows
Question 3
avgpercut <- diamonds %>%
  group_by(cut) %>%
  summarise(avgprice = mean(price))
avgpercut$avgprice
## [1] 4358.758 3928.864 3981.760 4584.258 3457.542
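The bare vector above drops the cut labels, so it is hard to tell which mean belongs to which cut. A minimal sketch of one way to keep the labels and rank the cuts follows; the name `avgpercut_sorted` is introduced here for illustration only.

```r
library(tidyverse)

# Same grouped mean as above, but keep the cut column and
# sort from the highest average price to the lowest.
avgpercut_sorted <- diamonds %>%
  group_by(cut) %>%
  summarise(avgprice = mean(price)) %>%
  arrange(desc(avgprice))

avgpercut_sorted
```

Printing the tibble instead of `avgpercut$avgprice` shows each cut next to its average price.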
Question 5
diamondbycolor <- diamonds %>%
  group_by(color) %>%
  summarise(avgdepth = mean(depth), avgtable = mean(table))
diamondbycolor$avgdepth
## [1] 61.69813 61.66209 61.69458 61.75711 61.83685 61.84639 61.88722
diamondbycolor$avgtable
## [1] 57.40459 57.49120 57.43354 57.28863 57.51781 57.57728 57.81239
Extra credit
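# Option 1: a grouped mutate adds the per-color averages to every row
newcolumns <- diamonds %>%
  group_by(color) %>%
  mutate(avgdepth = mean(depth), avgtable = mean(table)) %>%
  ungroup()
# View(newcolumns)
# Option 2: join the Question 5 summary back onto diamonds by color.
# Joining on color only avoids matching on every shared column,
# which would multiply any duplicated rows in diamonds.
totalcolumns <- left_join(diamonds, diamondbycolor, by = "color")
# View(totalcolumns)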
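Because `left_join()` without a `by` argument matches on every column the two tables share, a quick row-count check can confirm that a join did not duplicate rows. The sketch below assumes a per-color summary joined on `color` alone; the names `colormeans` and `joined` are illustrative and not part of the original answer.

```r
library(tidyverse)

# One row per color with the two averages
colormeans <- diamonds %>%
  group_by(color) %>%
  summarise(avgdepth = mean(depth), avgtable = mean(table))

# Join on color only, so each diamond matches exactly one summary row
joined <- left_join(diamonds, colormeans, by = "color")

# TRUE when the join added columns without adding rows
nrow(joined) == nrow(diamonds)
```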
Question 6
largest <- diamonds %>%
  group_by(color) %>%
  summarise(avgprice = mean(price))
# Color J diamonds have the largest average price, about 5324.
largest
## # A tibble: 7 x 2
## color avgprice
## <ord> <dbl>
## 1 D 3170.
## 2 E 3077.
## 3 F 3725.
## 4 G 3999.
## 5 H 4487.
## 6 I 5092.
## 7 J 5324.
Question 7
idealdiamonds <- diamonds %>%
  filter(cut == "Ideal") %>%
  group_by(color)
# Color G is the most frequent color in Ideal cut with 4884 diamonds in this category.
summary(idealdiamonds$color)
## D E F G H I J
## 2834 3903 3826 4884 3115 2093 896
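As an alternative to reading the largest value off the `summary()` output, `count()` with `sort = TRUE` puts the most frequent color in the first row. This is a sketch of that approach; `colorcounts` is a name introduced here for illustration.

```r
library(tidyverse)

# Count Ideal-cut diamonds per color, most frequent first
colorcounts <- diamonds %>%
  filter(cut == "Ideal") %>%
  count(color, sort = TRUE)

colorcounts
```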
Question 8
tablepercarats <- diamonds %>%
  select(clarity, table, carat) %>%
  mutate(tpc = table / carat) %>%
  group_by(clarity) %>%
  summarise(avgtpc = mean(tpc))
# VVS1 has the largest average table per carat, about 141.
tablepercarats
## # A tibble: 8 x 2
## clarity avgtpc
## <ord> <dbl>
## 1 I1 56.3
## 2 SI2 69.1
## 3 SI1 89.6
## 4 VS2 103.
## 5 VS1 107.
## 6 VVS2 127.
## 7 VVS1 141.
## 8 IF 140.
Question 9
ppc <- diamonds %>%
  filter(price > 10000) %>%
  mutate(ppc = price / carat) %>%
  summarise(avg = mean(ppc))
# The average price per carat of diamonds over $10,000 is $8,044.
ppc
## # A tibble: 1 x 1
## avg
## <dbl>
## 1 8044.
Question 10
commonclarity <- diamonds %>%
  filter(price > 10000) %>%
  select(clarity)
# Diamonds over $10,000 are most commonly SI2 clarity.
summary(commonclarity)
## clarity
## SI2 :1239
## SI1 :1184
## VS2 :1155
## VS1 : 747
## VVS2 : 452
## VVS1 : 247
## (Other): 198
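The `summary()` output lumps the two least common clarity grades into `(Other)`. A sketch that lists every grade explicitly, sorted by frequency, could look like this; `claritycounts` is an illustrative name.

```r
library(tidyverse)

# Count diamonds priced over $10,000 per clarity grade, most common first
claritycounts <- diamonds %>%
  filter(price > 10000) %>%
  count(clarity, sort = TRUE)

claritycounts
```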
# Exploring the dataset
data("ToothGrowth")
#?ToothGrowth
# View(ToothGrowth)
Question 1
Each row represents one observation: the tooth length measured for a single guinea pig.
Question 2
Each column represents one variable: tooth length (len, continuous numeric), supplement type (supp, categorical), and dose (dose, stored as numeric but limited to three levels, so it can be treated as ordinal categorical).
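The variable types can be checked directly with `str()`. The sketch below also shows one way to treat dose as ordinal; the copy named `tg` is introduced only for illustration and leaves `ToothGrowth` itself unchanged.

```r
data("ToothGrowth")

# len and dose are stored as numeric, supp as a two-level factor
str(ToothGrowth)

# Illustrative copy with dose converted to an ordered factor
tg <- ToothGrowth
tg$dose <- factor(tg$dose, levels = c(0.5, 1, 2), ordered = TRUE)
levels(tg$dose)
```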
Question 3
The response variable is tooth length (len), and the explanatory variables are supplement type (supp) and dose.
Question 4
I predict that higher dosage will correlate with more tooth growth, and that the VC supplement will be more effective than OJ.
Question 5
ggplot(ToothGrowth, aes(supp, len, fill = supp)) +
  geom_boxplot()
Question 6
ggplot(ToothGrowth, aes(supp, len, fill = supp)) +
  geom_boxplot() +
  facet_wrap(~dose)
Question 7
Generally, OJ results in greater tooth length than VC, most clearly at the lower doses, and length increases with dose for both supplements.
Question 8
Part of my hypothesis is supported: tooth growth increases with dose. The prediction that VC would be more effective than OJ is not supported.
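The boxplots can be backed up with group means. This is a minimal sketch using base R's `aggregate()`; `groupmeans` is an illustrative name, and comparing means is only a descriptive check, not a formal test.

```r
data("ToothGrowth")

# Mean tooth length for each supplement and dose combination
groupmeans <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
groupmeans
```

Reading down the table by dose shows whether length rises with dose, and comparing the OJ and VC rows within each dose shows which supplement has the higher mean.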