10-20-2024

Dataset mtcars

data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Dataset mtcars; mpg vs. cyl

model: \(\text{cyl} = \beta_0+\beta_1\cdot\text{mpg} +\varepsilon; \hspace{1 cm} \varepsilon\sim N(0,\sigma^2)\)

  fitted: \(\widehat{\text{cyl}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{mpg}\)
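For a single predictor, the least-squares estimates in the fitted line have the standard closed form, here with \(x_i\) = mpg and \(y_i\) = cyl for the \(n = 32\) cars:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x}
```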

library(plotly)   # plot_ly(), add_lines(), layout(), config() and the %>% pipe
data(mtcars)

# Simple linear regression of cyl on mpg
mod = lm(cyl ~ mpg, data = mtcars)
x = mtcars$mpg; y = mtcars$cyl

xax <- list(
  title = list(text = "mpg", font = list(family = "Computer Modern Roman"))
)

yax <- list(
  title = list(text = "cyl", font = list(family = "Computer Modern Roman")),
  range = c(0, 10)
)

# Scatter of the data plus the fitted line (x sorted so the line is drawn left to right)
ord <- order(x)
fig <- plot_ly(x = x, y = y, type = "scatter", mode = "markers", name = "data",
               width = 800, height = 430) %>%
  add_lines(x = x[ord], y = fitted(mod)[ord], name = "fitted") %>%
  layout(xaxis = xax, yaxis = yax,
         margin = list(l = 10, r = 20, b = 10, t = 20))
config(fig, displaylogo = TRUE)
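As a sanity check on `mod`, the sketch below computes the closed-form least-squares estimates by hand and compares them with `coef(mod)`. The helper `ols_by_hand` is a hypothetical name used only for illustration; it is not part of any package.

```r
data(mtcars)
x <- mtcars$mpg; y <- mtcars$cyl
mod <- lm(cyl ~ mpg, data = mtcars)

# Hypothetical helper: closed-form OLS estimates for a single predictor
ols_by_hand <- function(x, y) {
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0 <- mean(y) - b1 * mean(x)
  c(b0 = b0, b1 = b1)
}

est <- ols_by_hand(x, y)
print(est)
print(coef(mod))

# The hand-computed and lm() estimates should agree up to floating-point error
all.equal(unname(est), unname(coef(mod)))
```

A negative \(\hat{\beta}_1\) matches the downward-sloping fitted line: cars with higher mpg tend to have fewer cylinders.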

Dataset ChickWeight

data(ChickWeight)
head(ChickWeight)
##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1

Dataset ChickWeight; Time vs. Weight by Diet

model (fitted separately for each diet): \(\text{weight} = \beta_0+\beta_1\cdot\text{Time} +\varepsilon; \hspace{1 cm} \varepsilon\sim N(0,\sigma^2)\)

  fitted: \(\widehat{\text{weight}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Time}\)

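library(ggplot2)
data(ChickWeight)
# Points for all chicks, plus one least-squares line per diet (grouped by colour = Diet)
ggplot(aes(x = Time, y = weight), data = ChickWeight) +
  geom_point() +
  geom_smooth(aes(colour = Diet), method = 'lm', formula = y ~ x, se = FALSE)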

comments: This plot clearly shows that chick weight is positively correlated with time (Time is measured in days since birth). Among the fitted lines, Diet 3 starts with the lowest predicted weight, but it rises the fastest and becomes the diet with the highest predicted weight at around 7.5 days.
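To check the visual reading of the plot numerically, here is a minimal sketch that fits the same per-diet lines `geom_smooth()` draws (one `lm(weight ~ Time)` per diet) and solves for the time at which two lines cross. Comparing Diet 3 with Diet 4 is an illustrative choice; the same formula applies to any pair of diets.

```r
data(ChickWeight)

# Fit a separate simple regression weight ~ Time within each diet
fits <- lapply(split(ChickWeight, ChickWeight$Diet),
               function(d) coef(lm(weight ~ Time, data = d)))
coefs <- do.call(rbind, fits)   # one row per diet; columns (Intercept), Time
print(coefs)

# Crossing point of the Diet 3 and Diet 4 lines:
# b0_3 + b1_3 * t = b0_4 + b1_4 * t  =>  t = (b0_4 - b0_3) / (b1_3 - b1_4)
t_cross <- (coefs["4", 1] - coefs["3", 1]) / (coefs["3", 2] - coefs["4", 2])
print(t_cross)
```

The `Time` column of `coefs` gives each diet's slope (grams gained per day), which is the quantity behind the "rises the fastest" reading.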

Dataset ChickWeight; Diet vs. Weight

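model: \(\text{weight} = \beta_0+\beta_2\cdot D_2+\beta_3\cdot D_3+\beta_4\cdot D_4 +\varepsilon; \hspace{1 cm} \varepsilon\sim N(0,\sigma^2)\), where \(D_k = 1\) if the chick is on Diet \(k\) and 0 otherwise (Diet 1 is the baseline)

  fitted: \(\widehat{\text{weight}} = \hat{\beta}_0 + \hat{\beta}_2 \cdot D_2 + \hat{\beta}_3 \cdot D_3 + \hat{\beta}_4 \cdot D_4\), so the fitted value for each diet is that diet's average weight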

data(ChickWeight)
avg_weights_chick = aggregate(weight ~ Diet, data = ChickWeight, mean)
print(avg_weights_chick)
##   Diet   weight
## 1    1 102.6455
## 2    2 122.6167
## 3    3 142.9500
## 4    4 135.2627
barplot(
  avg_weights_chick$weight,                           # bar heights: mean weight per diet
  names.arg = paste("Diet", avg_weights_chick$Diet),  # bar labels
  col = "pink",
  main = "Average Weights by Diet",
  xlab = "Diet",
  ylab = "Average Weight (g)",
  border = "black"
)
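Because `Diet` is a factor, `lm()` fits the indicator-variable model with default treatment contrasts, and its fitted value for each diet should reproduce the group means printed by `aggregate()`. A minimal sketch of that check:

```r
data(ChickWeight)

# Diet is a factor, so lm() encodes it with indicator variables (Diet 1 = baseline)
mod_diet <- lm(weight ~ Diet, data = ChickWeight)
print(coef(mod_diet))

# Predicted weight for each diet level
new_diets <- data.frame(Diet = factor(levels(ChickWeight$Diet),
                                      levels = levels(ChickWeight$Diet)))
pred <- predict(mod_diet, newdata = new_diets)

# Sample mean weight for each diet
group_means <- tapply(ChickWeight$weight, ChickWeight$Diet, mean)
print(cbind(predicted = pred, mean = group_means))
```

Each coefficient \(\hat{\beta}_k\) is then the difference between the Diet \(k\) mean and the Diet 1 mean.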

comments: The average weight rises from Diet 1 to Diet 3 and dips slightly for Diet 4, but no single diet stands out as an outlier that would clearly explain a difference in chick weight. Diet 3 did reach the highest average weight, about 143 g per chick. These averages pool measurements from every time point, so more analysis would be needed to conclude that the diet caused this higher average weight.
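One possible follow-up, sketched here as an assumption rather than a complete analysis: since the pooled averages mix all time points and not every chick was weighed at every time point, compare the diets at a single fixed day. Day 21, the last measurement day in `ChickWeight`, is an illustrative choice.

```r
data(ChickWeight)

# Keep only the measurements taken on day 21
day21 <- subset(ChickWeight, Time == 21)

# Number of chicks still measured on day 21, per diet
print(table(day21$Diet))

# Mean weight per diet on day 21 only
print(aggregate(weight ~ Diet, data = day21, mean))
```

Even at a fixed day this is still a descriptive comparison; a formal test would also need to account for each chick being measured repeatedly.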

Dataset Iris

data("iris")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Dataset Iris; Sepal.Length vs. Sepal.Width by Species

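library(ggplot2)
data("iris")
# One least-squares line per species, each species in its own panel
ggplot(aes(x = Sepal.Length, y = Sepal.Width, colour = Species), data = iris) +
  geom_point() + geom_smooth(method = 'lm', formula = y ~ x, se = FALSE) +
  facet_wrap(~Species)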

comments: The line of best fit for the Setosa species is steeper than those for Versicolor and Virginica. Additionally, the Virginica points are more spread out, especially along the sepal-length axis, which suggests that Virginica has more variance in sepal length than the other species.
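The two observations above can be checked numerically. This sketch fits the same per-species lines as the faceted plot and computes the per-species variance of each sepal measurement.

```r
data(iris)

# Slope of Sepal.Width on Sepal.Length within each species (same fits as geom_smooth)
slopes <- sapply(split(iris, iris$Species),
                 function(d) coef(lm(Sepal.Width ~ Sepal.Length, data = d))[["Sepal.Length"]])
print(slopes)

# Per-species variance of each sepal measurement
vars <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris, FUN = var)
print(vars)
```

Looking at both columns of `vars`, rather than the scatter alone, separates spread in sepal length from spread in sepal width.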