library('tidyverse')
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

The palmerpenguins package only needs to be installed once (or again when updating). I have commented out the install line on my local machine, but anyone who has not previously used the package will need to run it.

#install.packages('palmerpenguins')
library(palmerpenguins)
?penguins

Part A.)

There does appear to be a simple linear relationship between flipper length and body mass: as flipper length increases, body mass tends to increase as well.

penguins%>%
ggplot(aes(x=flipper_length_mm, y=body_mass_g))+ 
  geom_point(alpha  = 0.5)+
  theme(plot.title = element_text(hjust = 0.5))+
  labs(x="Flipper Length (mm)", y="Body Mass (g)", title="Penguins")
## Warning: Removed 2 rows containing missing values (geom_point).

Part B.)

Broken out by species, Gentoo penguins are the largest in both flipper length and body mass, while Adelie and Chinstrap penguins are both smaller. Within each species, body mass still appears to increase as flipper length increases.

penguins%>%
ggplot(aes(x=flipper_length_mm, y=body_mass_g, color=species))+ 
  geom_point(alpha  = 0.5)+
  theme(plot.title = element_text(hjust = 0.5))+
  labs(x="Flipper Length (mm)", y="Body Mass (g)", title="Penguins")
## Warning: Removed 2 rows containing missing values (geom_point).

Part C.)

Before looking at only the Gentoo penguins, I first plotted each species in its own panel.

penguins%>%
ggplot(aes(x=flipper_length_mm, y=body_mass_g, color=species))+ 
  geom_point(alpha  = 0.5)+
  facet_wrap(~species)+
  theme(plot.title = element_text(hjust = 0.5))+
  labs(x="Flipper Length (mm)", y="Body Mass (g)", title="Penguins")
## Warning: Removed 2 rows containing missing values (geom_point).

After filtering for the Gentoo penguins only, flipper length still appears to have a linear relationship with body mass. However, the relationship looks weaker once the plot is restricted to the narrower range of Gentoo flipper lengths, where the scatter around the trend is more visible.

penguins%>%
  filter(species == 'Gentoo')%>%
ggplot(aes(x=flipper_length_mm, y=body_mass_g))+ 
  geom_point(alpha  = 0.5)+
  theme(plot.title = element_text(hjust = 0.5))+
  labs(x="Flipper Length (mm)", y="Body Mass (g)", title="Penguins")
## Warning: Removed 1 rows containing missing values (geom_point).

Part D.)

Gen2 <- penguins%>%
  filter(!is.na(body_mass_g),!is.na(flipper_length_mm),species == 'Gentoo')

The correlation between flipper length and body mass is 0.70, indicating a moderately strong positive linear association. Correlation does not imply causation, and the coefficient alone does not say how reliable the relationship is, so a regression fit and hypothesis tests follow.

cor(Gen2$flipper_length_mm, Gen2$body_mass_g)
## [1] 0.7026665

Part E,F,G.)

body_mass_g = -6787.281 + 54.623 * flipper_length_mm

Test: -6787.281 + 54.623(230) = 5776.009

The slope makes sense: each additional millimeter of flipper length is associated with an average increase of about 54.6 g in body mass. This matches the upward linear trend seen in the scatterplot.

The intercept is the predicted body mass at a flipper length of 0 mm, which is far outside the observed data: the smallest flipper length of ~200 mm has a weight of ~4600 g, whereas the largest flipper length is ~235 mm with a weight of ~5600 g. Extending a line with a slope of about 54.6 g per mm from that range all the way back to 0 mm gives a negative value. The negative intercept is therefore a result of extrapolation, not evidence that the model over-predicts within the observed range. There do appear to be some penguins with flipper lengths between 205 and 222 mm that are heavier than penguins with longer flippers, which adds scatter around the line.

No penguin can have a negative weight, so the intercept has no practical interpretation here; it only positions the regression line.

Gen2_result <- lm(body_mass_g~flipper_length_mm, data=Gen2)
summary(Gen2_result)
## 
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = Gen2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -911.18 -235.76  -51.93  170.75 1015.71 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -6787.281   1092.552  -6.212 7.65e-09 ***
## flipper_length_mm    54.623      5.028  10.863  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 360.2 on 121 degrees of freedom
## Multiple R-squared:  0.4937, Adjusted R-squared:  0.4896 
## F-statistic:   118 on 1 and 121 DF,  p-value: < 2.2e-16
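As a quick check of the fitted line, here is a minimal sketch that evaluates the equation with the rounded coefficients from the summary table. The helper name `predict_mass` is hypothetical and not part of the original analysis; because the coefficients are rounded, its results can differ slightly from what predict() returns on the fitted model.

```r
# Rounded coefficients copied from the summary(Gen2_result) table
intercept <- -6787.281
slope <- 54.623

# Hypothetical helper: fitted body mass (g) for a given flipper length (mm)
predict_mass <- function(flipper_length_mm) {
  intercept + slope * flipper_length_mm
}

predict_mass(230)  # the 230 mm check used with the equation
predict_mass(0)    # the intercept: an extrapolation far below any observed flipper length
```

Evaluating at 0 mm simply returns the intercept, which shows that the negative value comes from extending the line well outside the observed data.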

Part H.)

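R^2 measures the proportion of the variation in body mass that is explained by the linear model. It does not indicate direction by itself, but together with the positive slope it points to a moderately strong positive relationship. The data points are not tightly clustered around the regression line, but they do show that as flipper length increases we can expect body mass in grams to increase as well.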

About half of the variation in body mass is explained by flipper length with an R^2 of 0.4937.

summary(Gen2_result)$r.squared
## [1] 0.4937402
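In simple linear regression with one predictor, R^2 equals the square of the correlation coefficient. The sketch below checks this using only the two values printed in this document; the variable names are illustrative.

```r
# Values copied from the output in this document
r <- 0.7026665          # cor(Gen2$flipper_length_mm, Gen2$body_mass_g)
r_squared <- 0.4937402  # summary(Gen2_result)$r.squared

# Squared correlation, to compare against the reported R^2
r^2

# TRUE if the two agree up to the rounding of the printed values
isTRUE(all.equal(r^2, r_squared, tolerance = 1e-6))
```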

Part I.)

sigma(Gen2_result)
## [1] 360.1676
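The value returned by sigma() is the residual standard error: the square root of the residual sum of squares divided by the residual degrees of freedom. A minimal sketch using the residual sum of squares (15696203) and degrees of freedom (121) reported in the model output; the variable names are illustrative.

```r
# Values copied from the regression and ANOVA output in this document
sse <- 15696203   # residual sum of squares
df_resid <- 121   # residual degrees of freedom

# Residual standard error: typical size of a residual, in grams
sigma_hat <- sqrt(sse / df_resid)
sigma_hat
```

Read this way, a typical Gentoo body mass differs from the fitted line by roughly 360 g.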

Part J.)

The predicted body mass for a Gentoo penguin with a flipper length of 220mm is 5229.78g.

5229.779 = -6787.281 + 54.623(220)

Part K.)

names(Gen2_result)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"
anova.Gen2 <- anova(Gen2_result)
anova.Gen2
## Analysis of Variance Table
## 
## Response: body_mass_g
##                    Df   Sum Sq  Mean Sq F value    Pr(>F)    
## flipper_length_mm   1 15308045 15308045  118.01 < 2.2e-16 ***
## Residuals         121 15696203   129721                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
SST<-sum(anova.Gen2$"Sum Sq")
SST
## [1] 31004248
anova.Gen2$"Sum Sq"[1]/SST
## [1] 0.4937402

Part L.)

Null Hypothesis (Ho: B1 = 0): Flipper length in mm is not a linear predictor of body mass in grams.

Alternative Hypothesis (Ha: B1 not equal to 0): A linear relationship exists between flipper length in mm and body mass in grams.

Part M.)

The F statistic of 118.01 is the ratio of the regression mean square to the residual mean square in the ANOVA table: F = MSR / MSE = 15308045 / 129721 = 118.01.
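For a simple linear regression, the F statistic can be reproduced from the ANOVA table as the regression mean square over the residual mean square, and it should match the square of the slope's t value. The sketch below uses only numbers printed in this document; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom from the ANOVA table
ssr <- 15308045; df_model <- 1    # flipper_length_mm row
sse <- 15696203; df_resid <- 121  # Residuals row

# F = mean square for regression / mean square for residuals
f_stat <- (ssr / df_model) / (sse / df_resid)
f_stat

# With a single predictor, F equals the squared t value of the slope
t_slope <- 10.863  # t value from summary(Gen2_result), rounded
t_slope^2

# Upper-tail p-value of the F statistic
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)
p_value
```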

Part N.)

The numerator df is the df for the term the null hypothesis relates to (here the single slope, so df1 = 1); the denominator df is the residual degrees of freedom (df2 = 121).

If the calculated F statistic is larger than the F critical value, we can reject the null hypothesis. Equivalently, the smaller the p-value, the stronger the evidence against the null hypothesis.

At the 0.05 level the F critical value is 3.92, whereas the F statistic is 118.01 with a very small p-value (< 2.2e-16). We therefore reject the null hypothesis: there is evidence of a linear relationship between flipper length and body mass for Gentoo penguins.

qf(.05,df1 = 1, df2 = 121,lower.tail=FALSE)
## [1] 3.919465

Part O.)

confint(Gen2_result,level = 0.95)
##                         2.5 %      97.5 %
## (Intercept)       -8950.27535 -4624.28587
## flipper_length_mm    44.66777    64.57724
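The slope interval from confint() can be approximated by hand as estimate ± t critical value × standard error. This is a sketch using the rounded estimate and standard error from the summary table, so the last digits may differ slightly from the confint() output; the variable names are illustrative.

```r
# Rounded values from the summary(Gen2_result) coefficient table
estimate <- 54.623
std_error <- 5.028
df_resid <- 121

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df_resid)

# Confidence interval for the slope
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)
ci
```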

Part P.)

Are your results from parts 4n and 4o consistent? Briefly explain

Yes. Since the 95% confidence interval for the slope (44.67 to 64.58) does not contain 0, we can reject the null hypothesis that the slope is 0. This is the same conclusion I reached by comparing the F critical value with the F statistic and by looking at the p-value.

Part Q.)

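The predicted body mass for a Gentoo penguin with a flipper length of 200 mm is about 4,137 g. Using the rounded coefficients:

4137.319 = -6787.281 + 54.623(200)

predict() uses the unrounded coefficients and returns 4137.22, with a 95% confidence interval for the mean body mass at 200 mm.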

newdata<-data.frame(flipper_length_mm=200)
predict(Gen2_result,newdata,level=0.95,interval="confidence")
##       fit      lwr      upr
## 1 4137.22 3954.446 4319.993

Part R.)

predict(Gen2_result,newdata,level=0.95,interval="prediction")
##       fit      lwr      upr
## 1 4137.22 3401.121 4873.319
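The prediction interval is wider than the confidence interval because it adds the variability of an individual penguin around the fitted line to the uncertainty in the fitted mean. As a rough sketch, the prediction interval can be rebuilt from the confidence interval and the residual standard error printed in this document; the variable names are illustrative and the inputs are rounded, so the result is approximate.

```r
# Values copied from the output in this document
fit <- 4137.22          # fitted value at 200 mm
ci_lower <- 3954.446    # 95% confidence interval for the mean
ci_upper <- 4319.993
sigma_hat <- 360.1676   # residual standard error from sigma(Gen2_result)
df_resid <- 121

t_crit <- qt(0.975, df_resid)

# Standard error of the fitted mean, recovered from the CI half-width
se_fit <- (ci_upper - ci_lower) / 2 / t_crit

# Standard error for a single new observation adds the residual variance
se_pred <- sqrt(sigma_hat^2 + se_fit^2)

# Approximate 95% prediction interval
pred_int <- c(lower = fit - t_crit * se_pred, upper = fit + t_crit * se_pred)
pred_int
```

The result should land close to the interval reported by predict() with interval="prediction".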

Part S.)

A researcher hypothesizes that for Gentoo penguins, the predicted body mass increases by more than 50 g for each additional mm in flipper length. Conduct an appropriate hypothesis test. What is the null and alternative hypotheses, test statistic, and conclusion?

Ho: B1 = 50 Ha: B1 > 50 (one-sided, because the researcher claims the slope is more than 50)

Test statistic: a t-test on the slope, t = (54.623 - 50) / 5.028 = 0.92 with 121 degrees of freedom.

Conclusion: the one-sided critical value at the 0.05 level is about 1.66, so t = 0.92 falls short of it and we fail to reject the null hypothesis. The 95% confidence interval for the slope, from 44.67 g to 64.58 g per mm, also includes 50. There is not enough evidence that body mass increases by more than 50 g for each additional mm of flipper length.

# t-test for the slope against 50; t.test() on the two raw columns would compare their means instead
slope_row <- coef(summary(Gen2_result))["flipper_length_mm", ]
t_stat <- (slope_row["Estimate"] - 50) / slope_row["Std. Error"]
# one-sided (upper tail) p-value
p_value <- pt(t_stat, df = df.residual(Gen2_result), lower.tail = FALSE)
confint(Gen2_result,level = 0.95)
##                         2.5 %      97.5 %
## (Intercept)       -8950.27535 -4624.28587
## flipper_length_mm    44.66777    64.57724
newdata<-data.frame(flipper_length_mm=200)
predict(Gen2_result,newdata,level=0.95,interval="confidence")
##       fit      lwr      upr
## 1 4137.22 3954.446 4319.993
newdata1<-data.frame(flipper_length_mm=201)
predict(Gen2_result,newdata1,level=0.95,interval="confidence")
##        fit      lwr      upr
## 1 4191.842 4018.352 4365.332