Dev J. Amin, RUID: 216002438
tapply(filtered$Acc060, filtered$MakeClean, summary)
## $Audi
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.500 5.950 7.350 7.050 8.075 8.300
##
## $BMW
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.7 6.1 6.1 6.2 6.3 6.8
##
## $Chevrolet
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.300 6.900 7.900 7.833 8.100 12.800
##
## $Kia
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.20 8.60 8.80 8.84 9.50 10.10
tapply(filtered$Acc060, filtered$MakeClean, sd)
## Audi BMW Chevrolet Kia
## 1.258173 0.400000 2.486463 1.092245
boxplot(Acc060 ~ MakeClean, data = filtered, main = "Boxplot of Acc060 by Make", ylab = "Acc060", xlab = "Make")
pie(table(filtered$Type), main = "Pie Chart of Car Types")
barplot(table(filtered$Drive), main = "Bar Graph of Drive Types", xlab = "Drive", ylab = "Count")
summary(filtered$HwyMPG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.00 27.00 29.00 29.92 34.00 39.00
sd(filtered$HwyMPG)
## [1] 5.073789
qqnorm(filtered$HwyMPG, main="")
qqline(filtered$HwyMPG, col="red")
t.test(filtered$HwyMPG, conf.level = 0.99)$conf.int
## [1] 27.08178 32.75822
## attr(,"conf.level")
## [1] 0.99
Note: Part D) Interpret the Confidence Interval in terms of the problem: We are 99% confident that the true population mean highway mileage (HwyMPG) for cars of these four makes lies between 27.08 and 32.76 mpg.
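As a cross-check, the interval can be rebuilt by hand from the summary statistics printed above. This is only a sketch: the sample size n = 25 is an assumption, inferred from the 23 residual degrees of freedom in the regression output later in this report (23 + 2 estimated coefficients), and the mean is the rounded value from summary().

```r
# Summary statistics copied from the output above
xbar <- 29.92      # sample mean of HwyMPG (rounded by summary())
s    <- 5.073789   # sample standard deviation of HwyMPG
n    <- 25         # ASSUMPTION: inferred from the regression's residual df (23 + 2)

# 99% two-sided t interval: xbar +/- t* x s / sqrt(n)
t_crit <- qt(1 - 0.01 / 2, df = n - 1)
margin <- t_crit * s / sqrt(n)
ci <- c(lower = xbar - margin, upper = xbar + margin)
print(round(ci, 2))
```

If the assumed n is right, the result should agree with the t.test() interval to about two decimal places.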
filtered$WB_Group <- ifelse(filtered$Wheelbase < 111, "< 111", ">= 111")
par(mfrow=c(1,2))
qqnorm(filtered$UTurn[filtered$WB_Group == "< 111"], main="Q-Q: WB < 111")
qqline(filtered$UTurn[filtered$WB_Group == "< 111"], col="red")
qqnorm(filtered$UTurn[filtered$WB_Group == ">= 111"], main="Q-Q: WB >= 111")
qqline(filtered$UTurn[filtered$WB_Group == ">= 111"], col="red")
par(mfrow=c(1,1))
t.test(UTurn ~ WB_Group, data = filtered, conf.level = 0.99)
##
## Welch Two Sample t-test
##
## data: UTurn by WB_Group
## t = -4.6396, df = 18.087, p-value = 0.0002014
## alternative hypothesis: true difference in means between group < 111 and group >= 111 is not equal to 0
## 99 percent confidence interval:
## -5.638658 -1.322381
## sample estimates:
## mean in group < 111 mean in group >= 111
## 37.42857 40.90909
Note: A) State the Null and Alternate Hypothesis: Ho: mu1 = mu2 (Mean U-turn diameter is equal for Wheelbase < 111 and >= 111) Ha: mu1 != mu2 (Mean U-turn diameter is unequal for the two groups)
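Note: B) Explain why the test is appropriate: We use an independent two-sample t-test because we are comparing the mean of a continuous variable (UTurn) between two independent groups of cars defined by wheelbase (< 111 and >= 111). By default, R's t.test() runs the Welch version, which does not assume equal variances in the two groups.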
Note: C) Are the two populations normally distributed?: Yes. Based on the Q-Q plots above, both groups appear approximately normally distributed.
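Note: D) Write your conclusion: The p-value (0.0002014) is less than 0.01, so we reject Ho. There is sufficient evidence that the mean U-turn diameter differs between the two wheelbase groups. The 99% confidence interval for the difference (-5.64 to -1.32) excludes 0, and the < 111 group has the smaller sample mean (37.43 vs 40.91).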
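The decision can be checked directly from the numbers printed in the Welch test output. The sketch below recomputes the two-sided p-value and the 99% interval from the reported t statistic, degrees of freedom, and group means. The variable names are illustrative, and the standard error is backed out from the rounded t value, so small rounding differences are expected.

```r
# Values copied from the Welch two-sample t-test output above
t_stat   <- -4.6396
df_welch <- 18.087
mean_lt  <- 37.42857   # mean UTurn, Wheelbase < 111
mean_ge  <- 40.90909   # mean UTurn, Wheelbase >= 111
alpha    <- 0.01

# Two-sided p-value from the t distribution
p_value   <- 2 * pt(-abs(t_stat), df = df_welch)
reject_h0 <- p_value < alpha

# Recover the standard error from t = diff / SE, then rebuild the 99% interval
diff_means <- mean_lt - mean_ge
se_diff    <- diff_means / t_stat
ci <- diff_means + c(-1, 1) * qt(1 - alpha / 2, df = df_welch) * se_diff

print(p_value)
print(reject_h0)
print(round(ci, 3))
```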
plot(filtered$Weight, filtered$HwyMPG,
main = "Scatter Plot: Weight vs HwyMPG",
xlab = "Weight", ylab = "HwyMPG",
col = as.factor(filtered$MakeClean), pch = 16)
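# Use the factor levels so the legend labels line up with the colors assigned by as.factor()
legend("topright", legend = levels(as.factor(filtered$MakeClean)), col = 1:4, pch = 16)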
cor(filtered$Weight, filtered$HwyMPG)
## [1] -0.878204
Note: Part A) Choose 2 continuous variables: Weight and HwyMPG
Note: Part C) Calculate and comment on correlation: r = -0.878, which indicates a strong negative linear correlation: heavier cars tend to have lower highway mileage.
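Whether the reported r = -0.878204 is statistically distinguishable from zero can be checked with the usual t statistic for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). This is a sketch under the assumption that n = 25, which is inferred from the regression's residual degrees of freedom rather than stated in the report.

```r
# Correlation copied from the cor() output above
r <- -0.878204
n <- 25   # ASSUMPTION: inferred from the regression's residual df (23 + 2)

# t statistic and two-sided p-value for H0: rho = 0
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(t_stat, 3))
print(p_value)
```

In simple linear regression this statistic coincides with the t value of the slope, so it should agree with the Weight row of the regression summary.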
regression <- lm(HwyMPG ~ Weight, data = filtered)
summary(regression)
##
## Call:
## lm(formula = HwyMPG ~ Weight, data = filtered)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2342 -1.9335 -0.0976 1.7110 4.6161
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.4360500 2.2710364 21.768 < 2e-16 ***
## Weight -0.0051800 0.0005882 -8.806 7.95e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.479 on 23 degrees of freedom
## Multiple R-squared: 0.7712, Adjusted R-squared: 0.7613
## F-statistic: 77.54 on 1 and 23 DF, p-value: 7.954e-09
predict(regression, newdata = data.frame(Weight = 3500))
## 1
## 31.30616
Note: Part A) Choose one independent (X) and dependent (Y) variable: X = Weight, Y = HwyMPG
Note: Part D) Calculate R2 and interpret: Multiple R-squared: 0.7712, Adjusted R-squared: 0.7613. About 77.12% of the variation in HwyMPG is explained by its linear relationship with Weight.
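With a single predictor, R-squared is the square of the sample correlation, and the adjusted value follows from the usual degrees-of-freedom correction. The sketch below reproduces both figures from the cor() output. It assumes n = 25 (inferred from the 23 residual degrees of freedom) and k = 1 predictor.

```r
r <- -0.878204   # correlation between Weight and HwyMPG, from cor() above
n <- 25          # ASSUMPTION: inferred from the regression's residual df
k <- 1           # number of predictors in HwyMPG ~ Weight

r2     <- r^2                                     # multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)    # adjusted R-squared

print(round(c(r2 = r2, adj_r2 = adj_r2), 4))
```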
Note: Part E) Test whether variable x is useful: (i) Hypotheses: Ho: beta1 = 0 (X is not useful) vs Ha: beta1 != 0 (X is useful) (ii) Test statistic: t = -8.806 for the Weight slope (the intercept's t value, 21.768, is not part of this test) (iii) P-value: 7.95e-09 (iv) Summary: The p-value is far below 0.05, so we reject Ho. There is sufficient evidence that Weight is a useful linear predictor of HwyMPG.
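The slope test and the prediction at Weight = 3500 can both be reproduced from the coefficient table alone. This is a sketch using the rounded coefficients as printed, so the results may differ from R's output in the last digits. The variable names are illustrative.

```r
# Coefficients copied from summary(regression) above
b0       <- 49.4360500   # intercept
b1       <- -0.0051800   # slope for Weight
se_b1    <- 0.0005882    # standard error of the slope
df_resid <- 23           # residual degrees of freedom

# t statistic and two-sided p-value for H0: beta1 = 0
t_slope <- b1 / se_b1
p_slope <- 2 * pt(-abs(t_slope), df = df_resid)

# Predicted HwyMPG for a car with Weight = 3500
pred_3500 <- b0 + b1 * 3500

print(round(t_slope, 3))
print(p_slope)
print(round(pred_3500, 2))
```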
anova <- aov(HwyMPG ~ MakeClean, data = filtered)
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## MakeClean 3 77.5 25.83 1.004 0.411
## Residuals 21 540.4 25.73
TukeyHSD(anova, conf.level = 0.90)
## Tukey multiple comparisons of means
## 90% family-wise confidence level
##
## Fit: aov(formula = HwyMPG ~ MakeClean, data = filtered)
##
## $MakeClean
## diff lwr upr p adj
## BMW-Audi 2.1333333 -5.361101 9.627767 0.8980260
## Chevrolet-Audi -0.1111111 -6.634179 6.411956 0.9999733
## Kia-Audi 4.3333333 -3.161101 11.827767 0.5067878
## Chevrolet-BMW -2.2444444 -9.147810 4.658921 0.8566840
## Kia-BMW 2.2000000 -5.627681 10.027681 0.9013683
## Kia-Chevrolet 4.4444444 -2.458921 11.347810 0.4157683
Note: Part 1: Set up Null and Alternate Hypothesis: Ho: mu_Audi = mu_BMW = mu_Chevy = mu_Kia (All means for HwyMPG are equal) Ha: At least one mean HwyMPG is different.
Note: Part 3: Write down the F statistic: F-val: 1.004
Note: Part 4: Summarize your results: The p-value (0.411) is greater than 0.10, so we do not reject Ho. There is insufficient evidence that mean HwyMPG differs among the four makes. Consistent with this, every Tukey 90% confidence interval above contains 0.
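The F statistic and p-value in the ANOVA table can be reproduced from the sums of squares and degrees of freedom it reports. This is a sketch using the rounded values as printed. The variable names are illustrative.

```r
# Values copied from summary(anova) above
ss_between <- 77.5;  df_between <- 3    # MakeClean row
ss_within  <- 540.4; df_within  <- 21   # Residuals row

# Mean squares and F ratio
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat     <- ms_between / ms_within

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

print(round(f_stat, 3))
print(round(p_value, 3))
print(p_value > 0.10)   # TRUE means: do not reject Ho at the 0.10 level
```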