library(tidyverse)
avgPandD = diamonds %>% summarise(mean(price), mean(depth))
avgPandD
## # A tibble: 1 x 2
## `mean(price)` `mean(depth)`
## <dbl> <dbl>
## 1 3933. 61.7
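The backtick-quoted column names in the output above (`mean(price)`, `mean(depth)`) are awkward to refer to in later steps. A minimal sketch of the same summary with explicitly named columns follows; the names `avg_named`, `avg_price`, and `avg_depth` are illustrative and not part of the original answer.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Same summary as above, but each column is given a plain name
avg_named <- diamonds %>%
  summarise(avg_price = mean(price), avg_depth = mean(depth))

avg_named
```

With plain names, later steps can write `avg_named$avg_price` instead of quoting the expression in backticks.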
diamonds_ppc = diamonds %>% mutate(price_per_c = price/carat)
diamonds_ppc
## # A tibble: 53,940 x 11
## carat cut color clarity depth table price x y z price_per_c
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 1417.
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 1552.
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 1422.
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63 1152.
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 1081.
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 1400
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 1400
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 1296.
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 1532.
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 1470.
## # … with 53,930 more rows
cp = diamonds %>% group_by(cut) %>% summarise(mean(price))
## `summarise()` ungrouping output (override with `.groups` argument)
cp
## # A tibble: 5 x 2
## cut `mean(price)`
## <ord> <dbl>
## 1 Fair 4359.
## 2 Good 3929.
## 3 Very Good 3982.
## 4 Premium 4584.
## 5 Ideal 3458.
cd = diamonds %>% group_by(color) %>% summarise(mean(depth), mean(table))
## `summarise()` ungrouping output (override with `.groups` argument)
cd
## # A tibble: 7 x 3
## color `mean(depth)` `mean(table)`
## <ord> <dbl> <dbl>
## 1 D 61.7 57.4
## 2 E 61.7 57.5
## 3 F 61.7 57.4
## 4 G 61.8 57.3
## 5 H 61.8 57.5
## 6 I 61.8 57.6
## 7 J 61.9 57.8
df1 = diamonds %>% group_by(color) %>% summarise(mean(depth))
## `summarise()` ungrouping output (override with `.groups` argument)
df2 = diamonds %>% group_by(color) %>% summarise(mean(table))
## `summarise()` ungrouping output (override with `.groups` argument)
main = left_join(diamonds, df1)
## Joining, by = "color"
main
## # A tibble: 53,940 x 11
## carat cut color clarity depth table price x y z `mean(depth)`
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 61.7
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 61.7
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 61.7
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63 61.8
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 61.9
## 6 0.24 Very G… J VVS2 62.8 57 336 3.94 3.96 2.48 61.9
## 7 0.24 Very G… I VVS1 62.3 57 336 3.95 3.98 2.47 61.8
## 8 0.26 Very G… H SI1 61.9 55 337 4.07 4.11 2.53 61.8
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 61.7
## 10 0.23 Very G… H VS1 59.4 61 338 4 4.05 2.39 61.8
## # … with 53,930 more rows
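The join above attaches only `df1`; `df2` is computed but never used. If the intent was to attach both per-color means, one possible sketch is to compute them in a single summary and join once with an explicit key. This is an assumption about the goal, and the names `color_means`, `main_both`, `avg_depth`, and `avg_table` are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Per-color means of depth and table in one summary;
# .groups = "drop" returns an ungrouped tibble and silences the message
color_means <- diamonds %>%
  group_by(color) %>%
  summarise(avg_depth = mean(depth), avg_table = mean(table), .groups = "drop")

# Stating the key explicitly avoids relying on the automatic "Joining, by" guess
main_both <- left_join(diamonds, color_means, by = "color")

main_both
```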
size_v_col = diamonds %>% group_by(color) %>% summarise(mean(carat))
## `summarise()` ungrouping output (override with `.groups` argument)
size_v_col
## # A tibble: 7 x 2
## color `mean(carat)`
## <ord> <dbl>
## 1 D 0.658
## 2 E 0.658
## 3 F 0.737
## 4 G 0.771
## 5 H 0.912
## 6 I 1.03
## 7 J 1.16
Diamonds of color J are the largest on average, with a mean of about 1.16 carats.
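Rather than reading the largest value off the printed table, the top row can be pulled out directly. A small sketch using `slice_max()` from dplyr; the names `size_by_color`, `avg_carat`, and `largest` are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Mean carat per color, with a plain column name
size_by_color <- diamonds %>%
  group_by(color) %>%
  summarise(avg_carat = mean(carat), .groups = "drop")

# Keep only the color with the highest mean carat
largest <- size_by_color %>% slice_max(avg_carat, n = 1)

largest
```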
cut_color = diamonds %>% filter(cut == "Ideal") %>% count(color)
cut_color
## # A tibble: 7 x 2
## color n
## <ord> <int>
## 1 D 2834
## 2 E 3903
## 3 F 3826
## 4 G 4884
## 5 H 3115
## 6 I 2093
## 7 J 896
## You can also do
cut_color2 = diamonds %>%
  filter(cut == "Ideal") %>%
  group_by(color) %>%
  summarise(n = n())
## `summarise()` ungrouping output (override with `.groups` argument)
cut_color2
## # A tibble: 7 x 2
## color n
## <ord> <int>
## 1 D 2834
## 2 E 3903
## 3 F 3826
## 4 G 4884
## 5 H 3115
## 6 I 2093
## 7 J 896
Among Ideal-cut diamonds, color G is the most common, with 4,884 diamonds.
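The most common color can also be made to appear in the first row by asking `count()` to sort its result. A brief sketch; the name `cut_color_sorted` is illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# sort = TRUE orders the counts from largest to smallest
cut_color_sorted <- diamonds %>%
  filter(cut == "Ideal") %>%
  count(color, sort = TRUE)

cut_color_sorted
```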
tpc = diamonds %>% group_by(clarity) %>% summarise(mean(table/carat))
## `summarise()` ungrouping output (override with `.groups` argument)
tpc
## # A tibble: 8 x 2
## clarity `mean(table/carat)`
## <ord> <dbl>
## 1 I1 56.3
## 2 SI2 69.1
## 3 SI1 89.6
## 4 VS2 103.
## 5 VS1 107.
## 6 VVS2 127.
## 7 VVS1 141.
## 8 IF 140.
VVS1 has the highest average table per carat (about 141), narrowly ahead of IF (about 140).
avg_ppc = diamonds %>% filter(price > 10000) %>% summarise(mean(price/carat))
avg_ppc
## # A tibble: 1 x 1
## `mean(price/carat)`
## <dbl>
## 1 8044.
comclar = diamonds %>% filter(price > 10000) %>% count(clarity)
comclar
## # A tibble: 8 x 2
## clarity n
## <ord> <int>
## 1 I1 30
## 2 SI2 1239
## 3 SI1 1184
## 4 VS2 1155
## 5 VS1 747
## 6 VVS2 452
## 7 VVS1 247
## 8 IF 168
## You can also do
comclar2 = diamonds %>%
  filter(price > 10000) %>%
  group_by(clarity) %>%
  summarise(n = n())
## `summarise()` ungrouping output (override with `.groups` argument)
comclar2
## # A tibble: 8 x 2
## clarity n
## <ord> <int>
## 1 I1 30
## 2 SI2 1239
## 3 SI1 1184
## 4 VS2 1155
## 5 VS1 747
## 6 VVS2 452
## 7 VVS1 247
## 8 IF 168
The most common clarity for diamonds priced over $10,000 is SI2, with 1,239 diamonds.
data("ToothGrowth")
?ToothGrowth
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
The observational units are individual guinea pigs.
The first column represents tooth length. This variable is numerical and continuous. The second represents supplement type. This is categorical and nominal (not ordinal). The final column represents dose. This is numeric and discrete.
The response variable in this study is tooth length. The explanatory variables are supplement type and dose.
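The variable types described above can be checked against the data itself. A short sketch using base R only; the name `dose_counts` is illustrative.

```r
data("ToothGrowth")

# supp is an unordered factor with two levels
levels(ToothGrowth$supp)
is.ordered(ToothGrowth$supp)

# dose takes only a handful of distinct values, which is why it is treated as discrete
dose_counts <- table(ToothGrowth$dose)
dose_counts
```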
Null hypothesis: vitamin C supplementation does not affect the length of odontoblasts in guinea pigs.
ggplot(ToothGrowth, aes(supp, len, fill = supp)) + geom_boxplot()
ggplot(ToothGrowth, aes(supp, len, fill = supp)) + geom_boxplot() + facet_wrap(~dose)
The box plot from question 6 shows immediately that tooth length increased as the supplement dosage increased. Additionally, at dosages of 0.5 and 1 mg/day, the guinea pigs that received OJ had noticeably longer teeth than those that received VC. At 2 mg/day, the medians are about even.
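The medians read off the box plots can be computed directly as a check. A minimal sketch; the names `med_by_group` and `median_len` are illustrative.

```r
library(dplyr)

data("ToothGrowth")

# Median tooth length for every dose and supplement combination
med_by_group <- ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(median_len = median(len), .groups = "drop")

med_by_group
```

Comparing the OJ and VC rows within each dose gives a numeric counterpart to the visual comparison of the facets.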
Initially, it looks like vitamin C supplementation could impact odontoblast lengths in guinea pigs, which would mean rejecting the null hypothesis. Additionally, it appears orange juice could be a more effective way of delivering the vitamin. The delivery method seems to matter more at lower dosages, potentially because ascorbic acid is a less efficient method of delivery.
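The conclusions above come from the plots alone. One possible way to test them formally is a two-way ANOVA with an interaction term. This was not run in the original analysis, and treating dose as a factor is an assumption made here for illustration; the name `fit` is also illustrative.

```r
data("ToothGrowth")

# Two-way ANOVA: supplement type, dose (as a factor), and their interaction
fit <- aov(len ~ supp * factor(dose), data = ToothGrowth)

# F tests for the two main effects and the interaction
summary(fit)
```

The interaction row is the one that speaks to whether the difference between OJ and VC depends on dose.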