**Dev J. Amin, RUID: 216002438**

Part 1A

A. Calculate descriptive statistics for Acc060 for each of the four makes.

B. Obtain a box plot of Acc060 for each of the four makes and comment.

tapply(filtered$Acc060, filtered$MakeClean, summary)
## $Audi
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.500   5.950   7.350   7.050   8.075   8.300 
## 
## $BMW
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     5.7     6.1     6.1     6.2     6.3     6.8 
## 
## $Chevrolet
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.300   6.900   7.900   7.833   8.100  12.800 
## 
## $Kia
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.20    8.60    8.80    8.84    9.50   10.10
tapply(filtered$Acc060, filtered$MakeClean, sd)  
##      Audi       BMW Chevrolet       Kia 
##  1.258173  0.400000  2.486463  1.092245
boxplot(Acc060 ~ MakeClean, data = filtered, main = "Boxplot of Acc060 by Make", ylab = "Acc060", xlab = "Make")
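A box plot draws each box from the lower to the upper quartile and flags points lying more than 1.5 * IQR beyond the box. As a rough aid for the comment in Part B, the sketch below recomputes those fences from the quartiles printed by `summary()` above. This is an approximation: `boxplot()` actually uses Tukey's hinges from `fivenum()`, which can differ slightly from the `quantile()`-based quartiles that `summary()` reports, so borderline points may be drawn differently. The data frame `q` is introduced here for illustration only.

```r
# Quartiles and extremes of Acc060 copied from the summary() output above
q <- data.frame(
  make = c("Audi", "BMW", "Chevrolet", "Kia"),
  min  = c(5.500, 5.7, 4.300, 7.20),
  q1   = c(5.950, 6.1, 6.900, 8.60),
  q3   = c(8.075, 6.3, 8.100, 9.50),
  max  = c(8.300, 6.8, 12.800, 10.10)
)

q$iqr   <- q$q3 - q$q1            # interquartile range (height of the box)
q$lower <- q$q1 - 1.5 * q$iqr     # lower 1.5 * IQR fence
q$upper <- q$q3 + 1.5 * q$iqr     # upper 1.5 * IQR fence

# TRUE if the minimum or maximum falls outside the fences (candidate outlier)
q$beyond <- q$min < q$lower | q$max > q$upper
print(q)
```

With these inputs Chevrolet has by far the widest box and whiskers, and both of its extremes (4.3 and 12.8) fall outside the fences, while Audi's do not. BMW and Kia are flagged only marginally by this quartile rule, which is exactly where the hinge/quartile difference can change what the plot shows.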

Part 1B

A. One pie chart on Type.

B. One bar graph on Drive.

pie(table(filtered$Type), main = "Pie Chart of Car Types")

barplot(table(filtered$Drive), main = "Bar Graph of Drive Types", xlab = "Drive", ylab = "Count")

Part 2: Calculate a 99% confidence interval for the average highway mileage for all four makes combined.

A. Display full descriptive statistics.

B. Obtain the normal probability plot. Is it normally distributed?

C. Obtain the 99% confidence interval for the average highway mileage.

D. Interpret the confidence interval in terms of the problem.

summary(filtered$HwyMPG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.00   27.00   29.00   29.92   34.00   39.00
sd(filtered$HwyMPG)
## [1] 5.073789
qqnorm(filtered$HwyMPG, main="")
qqline(filtered$HwyMPG, col="red")

t.test(filtered$HwyMPG, conf.level = 0.99)$conf.int
## [1] 27.08178 32.75822
## attr(,"conf.level")
## [1] 0.99
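The interval returned by `t.test()` is the one-sample t interval, xbar +/- t(0.995, n - 1) * s / sqrt(n). The sketch below recomputes it by hand from the printed mean and standard deviation. It assumes n = 25 cars; that sample size is not printed in this part and is inferred from the degrees of freedom in the later regression (23 residual df + 2) and ANOVA (21 + 3 + 1).

```r
# Summary statistics printed above for HwyMPG
xbar <- 29.92
s    <- 5.073789
n    <- 25          # assumption: inferred from the regression/ANOVA degrees of freedom

se     <- s / sqrt(n)                    # standard error of the mean
t_crit <- qt(0.995, df = n - 1)          # two-sided 99% critical value
ci     <- xbar + c(-1, 1) * t_crit * se  # lower and upper limits
round(ci, 5)
```

Under that assumption the hand calculation agrees with the `t.test()` interval of roughly 27.08 to 32.76.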

Note: Part D) Interpret the confidence interval in terms of the problem: We are 99% confident that the true mean highway mileage for all cars of these four makes lies between 27.08 and 32.76 mpg.

Part 3: Split the data by wheelbase into two groups (< 111 and >= 111). Compare the mean diameter needed for a U-turn between the two groups using a hypothesis test for comparing two means.

Test whether the means are unequal. Use alpha = 0.01 and an appropriate test for comparing two means.

A. State the Null and Alternate Hypothesis.

B. Explain why the test you used is appropriate for the problem.

C. Obtain a normal probability plot for each sample separately. Are the two populations normally distributed? State Yes or No clearly.

D. Write your conclusion based on the result you obtain.

filtered$WB_Group <- ifelse(filtered$Wheelbase < 111, "< 111", ">= 111")
 
par(mfrow=c(1,2))
qqnorm(filtered$UTurn[filtered$WB_Group == "< 111"], main="Q-Q: WB < 111")
qqline(filtered$UTurn[filtered$WB_Group == "< 111"], col="red")
qqnorm(filtered$UTurn[filtered$WB_Group == ">= 111"], main="Q-Q: WB >= 111")
qqline(filtered$UTurn[filtered$WB_Group == ">= 111"], col="red")

par(mfrow=c(1,1)) 

t.test(UTurn ~ WB_Group, data = filtered, conf.level = 0.99)
## 
##  Welch Two Sample t-test
## 
## data:  UTurn by WB_Group
## t = -4.6396, df = 18.087, p-value = 0.0002014
## alternative hypothesis: true difference in means between group < 111 and group >= 111 is not equal to 0
## 99 percent confidence interval:
##  -5.638658 -1.322381
## sample estimates:
##  mean in group < 111 mean in group >= 111 
##             37.42857             40.90909

Note: A) State the Null and Alternate Hypothesis: Ho: mu1 = mu2 vs. Ha: mu1 != mu2, where mu1 and mu2 are the mean U-turn diameters for wheelbase < 111 and wheelbase >= 111, respectively.

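Note: B) Explain why the test is appropriate: We use Welch's two-sample t-test because we are comparing the means of a continuous variable (UTurn) between two independent groups, the population standard deviations are unknown, and Welch's test does not require the two variances to be equal (this is the default in R's t.test, var.equal = FALSE).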
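To show what Welch's test computes, the sketch below implements the Welch t statistic and the Welch-Satterthwaite degrees of freedom directly and checks them against `t.test()`. The helper `welch_t` and both data vectors are hypothetical, made up for illustration; they are not the course data.

```r
# Hypothetical helper: Welch two-sample t statistic and
# Welch-Satterthwaite degrees of freedom (no equal-variance assumption)
welch_t <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df_w   <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df_w)
}

# Made-up U-turn diameters for illustration only
short_wb <- c(36, 37, 38, 37, 39, 36, 38)
long_wb  <- c(40, 42, 41, 39, 43, 41)

manual  <- welch_t(short_wb, long_wb)
builtin <- t.test(short_wb, long_wb)   # var.equal = FALSE by default, i.e. Welch
manual
c(builtin$statistic, builtin$parameter)
```

The non-integer degrees of freedom (such as the 18.087 in the output above) come from the Welch-Satterthwaite formula, which is why they differ from n1 + n2 - 2.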

Note: C) Are the two populations normally distributed?: Yes. Based on the two Q-Q plots, both populations appear approximately normally distributed.

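Note: D) Write your conclusion: The p-value (0.0002014) is less than alpha = 0.01, so we reject Ho. There is sufficient evidence that the mean U-turn diameter differs between cars with wheelbase < 111 and cars with wheelbase >= 111. The 99% confidence interval for mu1 - mu2, (-5.64, -1.32), excludes 0, and the sample means (37.43 vs. 40.91) indicate that cars with the longer wheelbase need a larger U-turn diameter on average.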
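For a two-sided test, rejecting Ho at alpha = 0.01 is equivalent to the 99% confidence interval for mu1 - mu2 excluding 0. The sketch below rebuilds that interval from the printed means, t and df, backing out the standard error as (difference in means) / t, so small rounding differences from the printed limits are expected.

```r
# Values printed by the Welch t-test above
mean_short <- 37.42857   # mean U-turn, wheelbase < 111
mean_long  <- 40.90909   # mean U-turn, wheelbase >= 111
t_stat     <- -4.6396
df_welch   <- 18.087
p_value    <- 0.0002014

d_hat  <- mean_short - mean_long      # estimate of mu1 - mu2
se     <- d_hat / t_stat              # standard error implied by t = d_hat / se
t_crit <- qt(0.995, df = df_welch)    # qt() accepts non-integer df
ci     <- d_hat + c(-1, 1) * t_crit * se
ci

reject_by_ci <- ci[1] > 0 | ci[2] < 0 # TRUE if the interval excludes 0
reject_by_p  <- p_value < 0.01
c(reject_by_ci = reject_by_ci, reject_by_p = reject_by_p)
```

Both decision rules agree here, which is the expected result whenever the test and the interval use the same confidence level.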

Part 4: Correlation.

A. Choose two continuous variables in your data.

B. Plot one combined scatter diagram reflecting the four makes using the two continuous variables.

C. Calculate the correlation and comment on it.

make_f <- factor(filtered$MakeClean)   # factor levels fix both point colours and legend order
plot(filtered$Weight, filtered$HwyMPG, 
     main = "Scatter Plot: Weight vs HwyMPG",
     xlab = "Weight", ylab = "HwyMPG", 
     col = as.integer(make_f), pch = 16)
legend("topright", legend = levels(make_f), col = seq_along(levels(make_f)), pch = 16)

cor(filtered$Weight, filtered$HwyMPG)
## [1] -0.878204

Note: Part A) Choose 2 continuous variables: Weight and HwyMPG

Note: Part C) Calculate and comment on correlation: r = -0.878. There is a strong negative linear correlation between Weight and HwyMPG: heavier cars tend to get lower highway mileage.
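Pearson's r measures the strength of the linear association between two variables: r = sum((x - xbar) * (y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2)). The sketch below computes it from that definition and checks it against `cor()`. The helper `pearson_r` and the weight and mileage vectors are hypothetical, for illustration only; they are not the course data.

```r
# Hypothetical weights and highway mileages, for illustration only
weight <- c(3000, 3300, 3600, 3900, 4200, 4500)
hwy    <- c(36, 34, 31, 30, 27, 25)

# Pearson correlation from its definition
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual <- pearson_r(weight, hwy)
r_manual
all.equal(r_manual, cor(weight, hwy))   # TRUE if the two agree
```

`cor()` uses the Pearson method by default, so it computes the same quantity on the course data; a value of -0.878 is close to -1 and therefore indicates a strong negative linear relationship.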

Part 5: Regression Analysis

A. Choose one independent variable and one dependent variable. (You may use the same two variables as in Part 4.)

B. Obtain the regression equation.

C. Assume a value of X within the range of the data and predict Y-hat using the regression equation.

D. Calculate R^2 and interpret it.

E. Test whether variable x is useful in predicting Y.

(i). Write down the Null and Alternate hypothesis.

(ii) t statistic.

(iii) What is the p-value?

(iv) Summary:

regression <- lm(HwyMPG ~ Weight, data = filtered)
summary(regression)
## 
## Call:
## lm(formula = HwyMPG ~ Weight, data = filtered)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2342 -1.9335 -0.0976  1.7110  4.6161 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 49.4360500  2.2710364  21.768  < 2e-16 ***
## Weight      -0.0051800  0.0005882  -8.806 7.95e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.479 on 23 degrees of freedom
## Multiple R-squared:  0.7712, Adjusted R-squared:  0.7613 
## F-statistic: 77.54 on 1 and 23 DF,  p-value: 7.954e-09
predict(regression, newdata = data.frame(Weight = 3500))
##        1 
## 31.30616
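The fitted equation has the form HwyMPG-hat = b0 + b1 * Weight. Plugging the printed coefficients into it by hand reproduces the `predict()` value up to rounding of the printed estimates; the variable names below are illustrative.

```r
# Coefficients printed in summary(regression) above
b0 <- 49.4360500   # intercept
b1 <- -0.0051800   # slope: change in HwyMPG per unit of Weight

weight_new <- 3500
y_hat <- b0 + b1 * weight_new
y_hat   # close to predict()'s 31.30616; the small gap comes from rounding b1
```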

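Note: Part A) Choose one independent (X) and dependent (Y) variable: X = Weight, Y = HwyMPG

Note: Part B) Regression equation: HwyMPG-hat = 49.4361 - 0.00518 * Weight

Note: Part C) Predict Y-hat: For Weight = 3500, the predicted highway mileage is 31.31 mpg.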

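Note: Part D) Calculate R^2 and interpret: Multiple R-squared = 0.7712 (adjusted R-squared = 0.7613). About 77% of the variation in highway mileage is explained by its linear relationship with weight.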
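With a single predictor, R^2 equals the square of Pearson's r between X and Y, so the Part 4 correlation and this R^2 describe the same fit. The sketch below checks that, and also recomputes adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - 2); it assumes n = 25, inferred from the 23 residual degrees of freedom rather than printed directly.

```r
r  <- -0.878204        # cor(filtered$Weight, filtered$HwyMPG), from Part 4
r2 <- r^2              # proportion of variance in HwyMPG explained by Weight
n  <- 25               # assumption: 23 residual df + 2 estimated coefficients

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 2)   # penalises for the one predictor
round(c(r2 = r2, adj_r2 = adj_r2), 4)
```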

Note: Part E) Test whether variable X is useful in predicting Y:

(i) Hypotheses: Ho: beta1 = 0 (Weight is not useful in predicting HwyMPG) vs. Ha: beta1 != 0 (Weight is useful in predicting HwyMPG).

(ii) t statistic for the slope (Weight): t = -8.806. (The intercept's t value, 21.768, tests beta0 = 0 and is not needed here.)

(iii) p-value: 7.95e-09.

(iv) Summary: The p-value is far below 0.05, so we reject Ho. There is sufficient evidence that Weight is useful in predicting HwyMPG.
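The slope's t statistic is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with n - 2 = 23 degrees of freedom. The sketch below recomputes both from the printed coefficient table; because the inputs are rounded, expect agreement only to a few digits.

```r
# Slope estimate and its standard error from summary(regression)
b1     <- -0.0051800
se_b1  <- 0.0005882
df_res <- 23                              # residual degrees of freedom (n - 2)

t_stat  <- b1 / se_b1                     # t statistic for Ho: beta1 = 0
p_value <- 2 * pt(-abs(t_stat), df_res)   # two-sided p-value
c(t = t_stat, p = p_value)
t_stat^2                                  # with one predictor, t^2 equals the F statistic
```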

Part 6: Compare mean highway mileage for the four makes using ANOVA. Conduct multiple comparisons if Ho is rejected.

1. Set up Null and Alternate Hypothesis.

2. Do the ANOVA test, use alpha = 0.10.

3. Write down the F statistic.

4. Summarize your result.

anova <- aov(HwyMPG ~ MakeClean, data = filtered)
summary(anova)
##             Df Sum Sq Mean Sq F value Pr(>F)
## MakeClean    3   77.5   25.83   1.004  0.411
## Residuals   21  540.4   25.73
TukeyHSD(anova, conf.level = 0.90)
##   Tukey multiple comparisons of means
##     90% family-wise confidence level
## 
## Fit: aov(formula = HwyMPG ~ MakeClean, data = filtered)
## 
## $MakeClean
##                      diff       lwr       upr     p adj
## BMW-Audi        2.1333333 -5.361101  9.627767 0.8980260
## Chevrolet-Audi -0.1111111 -6.634179  6.411956 0.9999733
## Kia-Audi        4.3333333 -3.161101 11.827767 0.5067878
## Chevrolet-BMW  -2.2444444 -9.147810  4.658921 0.8566840
## Kia-BMW         2.2000000 -5.627681 10.027681 0.9013683
## Kia-Chevrolet   4.4444444 -2.458921 11.347810 0.4157683

Note: Part 1: Set up Null and Alternate Hypothesis: Ho: mu_Audi = mu_BMW = mu_Chevy = mu_Kia (All means for HwyMPG are equal) Ha: At least one mean HwyMPG is different.

Note: Part 3: Write down the F statistic: F = 1.004 (df = 3 and 21), p-value = 0.411.
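The F statistic is the ratio of the between-make mean square to the within-make (residual) mean square, each mean square being a sum of squares divided by its degrees of freedom. The sketch below recomputes F and its p-value from the printed ANOVA table; the sums of squares are rounded there, so small differences are expected.

```r
# Sums of squares and degrees of freedom from summary(anova) above
ss_make  <- 77.5;  df_make  <- 3
ss_resid <- 540.4; df_resid <- 21

ms_make  <- ss_make / df_make     # mean square between makes
ms_resid <- ss_resid / df_resid   # mean square within makes (error)
f_stat   <- ms_make / ms_resid
p_value  <- pf(f_stat, df_make, df_resid, lower.tail = FALSE)  # upper-tail area
c(F = f_stat, p = p_value)
```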

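Note: Part 4: Summarize your results: The p-value (0.411) is greater than alpha = 0.10, so we fail to reject the null hypothesis. There is not sufficient evidence to conclude that mean highway mileage differs among Audi, BMW, Chevrolet, and Kia, so multiple comparisons are not required.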
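Although multiple comparisons are not required when Ho is not rejected, the Tukey output above can still serve as a consistency check: a pair of makes differs at the 90% family-wise level only if its interval excludes 0. The sketch below copies the printed limits into a data frame (named `tukey` here for illustration) and flags any such pair.

```r
# Lower and upper limits of the 90% Tukey intervals printed above
tukey <- data.frame(
  pair = c("BMW-Audi", "Chevrolet-Audi", "Kia-Audi",
           "Chevrolet-BMW", "Kia-BMW", "Kia-Chevrolet"),
  lwr  = c(-5.361101, -6.634179, -3.161101, -9.147810, -5.627681, -2.458921),
  upr  = c( 9.627767,  6.411956, 11.827767,  4.658921, 10.027681, 11.347810)
)

# A pair is significant only if 0 lies outside its interval
tukey$significant <- tukey$lwr > 0 | tukey$upr < 0
tukey
any(tukey$significant)   # FALSE: consistent with not rejecting Ho
```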