Goal:
In this milestone we will start fitting our first simple model. Please report all your findings for this assignment in an R Markdown document.
Step 0:
Identify a numeric response variable in your data set and a numeric explanatory variable. (10 points)
response: Flipper length (flipper_length_mm)
explanatory: Bill length (bill_length_mm)
Step 1:
Perform a simple linear regression analysis:
library(palmerpenguins)  # assumed source of the penguins data set
PenMod1 <- lm(flipper_length_mm ~ bill_length_mm, data = penguins)
summary(PenMod1)
##
## Call:
## lm(formula = flipper_length_mm ~ bill_length_mm, data = penguins)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -43.708  -7.896   0.664   8.650  21.179
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)    126.6844     4.6651   27.16   <2e-16 ***
## bill_length_mm   1.6901     0.1054   16.03   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.63 on 340 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.4306, Adjusted R-squared: 0.4289
## F-statistic: 257.1 on 1 and 340 DF, p-value: < 2.2e-16
Step 2:
Perform a test for the slope.
The summary is above. The estimated slope is 1.6901: on average, flipper length increases by about 1.69 mm for each additional 1 mm of bill length.
State the reference distribution, degrees of freedom, the test statistic, and p-value.
Reference distribution: t distribution with 340 degrees of freedom
Test statistic: 16.03
p-value: < 2e-16
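As a quick check on the numbers above, the test statistic and p-value can be recomputed by hand from the slope estimate and its standard error. This is only a sketch that copies the rounded values from the summary output, so the result agrees with the printed output up to rounding; the variable names are made up for illustration.

```r
# Values copied (rounded) from the summary(PenMod1) output
slope_est <- 1.6901   # estimate for bill_length_mm
slope_se  <- 0.1054   # standard error of the slope
df_resid  <- 340      # residual degrees of freedom

# Test statistic for H0: slope = 0
t_stat <- slope_est / slope_se

# Two-sided p-value from the t distribution with 340 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

round(t_stat, 2)   # compare with the t value in the summary
p_value < 2e-16    # R prints such small p-values as "<2e-16"
```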
Write a five-part conclusion incorporating the hypotheses in the context of the problem. (10 points)
The null hypothesis is that the slope is zero (there is no linear relationship between bill length and flipper length); the alternative hypothesis is that the slope is not zero. With a test statistic of t = 16.03 on 340 degrees of freedom and a p-value < 2e-16, which is below the 0.05 significance level, we reject the null hypothesis. There is convincing evidence of a significant positive linear relationship between bill length and flipper length.
Step 3/4:
Create an ANOVA table and produce the F-statistic and discuss the R-squared value.
Create diagnostic plots to assess model assumptions
R^2 is 0.4306, so bill length explains about 43% of the variation in flipper length and about 57% remains unexplained. A moderate R^2 like this is common with biological data, where many factors we do not measure affect the response, so it does not by itself mean that the model is a poor fit. It is a good idea to look at the residual plots to gather more information about the model. The F-statistic (257.09 on 1 and 340 degrees of freedom) is large, which means the model explains far more variation than would be expected by chance, so bill length is a useful predictor. The standardized residuals show there could be some outliers in the data set that slightly decrease the slope of the line. I would want to investigate this further in my analysis.
## Analysis of Variance Table
##
## Response: flipper_length_mm
##                 Df Sum Sq Mean Sq F value    Pr(>F)
## bill_length_mm   1  29032 29032.1  257.09 < 2.2e-16 ***
## Residuals      340  38394   112.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
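To show where the F-statistic and R-squared come from, the sketch below rebuilds both from the sums of squares in the ANOVA table. It uses the rounded values printed in the table (so results match only up to rounding), and the variable names are illustrative.

```r
# Values copied (rounded) from the ANOVA table
ss_model <- 29032.1   # with Df = 1, Sum Sq equals Mean Sq for bill_length_mm
ss_resid <- 38394     # residual sum of squares
df_model <- 1
df_resid <- 340

# Mean squares and the F-statistic
ms_model <- ss_model / df_model
ms_resid <- ss_resid / df_resid
f_stat   <- ms_model / ms_resid

# R-squared: share of total variation explained by the model
r_squared <- ss_model / (ss_model + ss_resid)

round(f_stat, 2)
round(r_squared, 4)

# In simple linear regression the F-statistic is the square of the slope's t value
sqrt(f_stat)
```

The last line ties the ANOVA table back to the slope test: with one explanatory variable, the F-test and the two-sided t-test for the slope are the same test.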
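Since the t-value for the slope is large (16.03) and the p-value is very small, there is strong evidence that bill length is linearly related to flipper length. The moderate R^2 does not take away from the significance of the explanatory variable; it shows that there is a lot of variability in flipper length that bill length does not account for, which is expected because we are looking at biological rather than mechanical data. The large F-value shows that the variation explained by the regression is much larger than the residual (unexplained) variation per degree of freedom, which supports using a linear regression. Provided the diagnostic plots do not show serious violations of the model assumptions, I can say that this model is a reasonable fit to proceed with the analysis.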
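The ANOVA table and diagnostic plots discussed in Step 3/4 could be produced along these lines. This is a minimal sketch, assuming the penguins data come from the palmerpenguins package; the cutoff of 3 for standardized residuals is a common rule of thumb and not something specified in the assignment.

```r
library(palmerpenguins)  # assumption: penguins is loaded from this package

# Refit the model so this chunk runs on its own
PenMod1 <- lm(flipper_length_mm ~ bill_length_mm, data = penguins)

# ANOVA table for the fitted model
anova(PenMod1)

# Four standard diagnostic plots in a 2 x 2 grid:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
old_par <- par(mfrow = c(2, 2))
plot(PenMod1)
par(old_par)

# Standardized residuals; |value| > 3 is used here as a rule-of-thumb flag
std_res <- rstandard(PenMod1)
possible_outliers <- which(abs(std_res) > 3)
possible_outliers
```

Any observations flagged this way are candidates for a closer look, not automatic removals.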