We first look at the Red Wine data set below before answering the questions.

  1. Produce summary statistics of residual.sugar and use its median to divide the data into two groups A and B. We want to test if “density” in Group A and Group B has the same population mean.

Summary Statistics for residual sugar

summary(winequality$residual.sugar) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500

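Now we divide the data at the median (2.2) into two groups: wines with residual.sugar below 2.2 form group A, and wines with residual.sugar of 2.2 or more form group B. The first 100 rows of the grouped data are shown below.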

##     winequality.residual.sugar residual.sugar.group winequality.density
## 1                         1.90                    A              0.9978
## 2                         2.60                    B              0.9968
## 3                         2.30                    B              0.9970
## 4                         1.90                    A              0.9980
## 5                         1.90                    A              0.9978
## 6                         1.80                    A              0.9978
## 7                         1.60                    A              0.9964
## 8                         1.20                    A              0.9946
## 9                         2.00                    A              0.9968
## 10                        6.10                    B              0.9978
## 11                        1.80                    A              0.9959
## 12                        6.10                    B              0.9978
## 13                        1.60                    A              0.9943
## 14                        1.60                    A              0.9974
## 15                        3.80                    B              0.9986
## 16                        3.90                    B              0.9986
## 17                        1.80                    A              0.9969
## 18                        1.70                    A              0.9968
## 19                        4.40                    B              0.9974
## 20                        1.80                    A              0.9969
## 21                        1.80                    A              0.9968
## 22                        2.30                    B              0.9982
## 23                        1.60                    A              0.9966
## 24                        2.30                    B              0.9968
## 25                        2.40                    B              0.9968
## 26                        1.40                    A              0.9955
## 27                        1.80                    A              0.9962
## 28                        1.60                    A              0.9966
## 29                        1.90                    A              0.9972
## 30                        2.00                    A              0.9964
## 31                        2.40                    B              0.9958
## 32                        2.50                    B              0.9966
## 33                        2.30                    B              0.9966
## 34                       10.70                    B              0.9993
## 35                        1.80                    A              0.9957
## 36                        5.50                    B              0.9986
## 37                        2.40                    B              0.9975
## 38                        2.10                    A              0.9968
## 39                        1.50                    A              0.9940
## 40                        5.90                    B              0.9978
## 41                        5.90                    B              0.9978
## 42                        2.80                    B              0.9976
## 43                        2.60                    B              0.9968
## 44                        2.20                    B              0.9968
## 45                        1.80                    A              0.9962
## 46                        2.10                    A              0.9934
## 47                        2.20                    B              0.9970
## 48                        1.60                    A              0.9969
## 49                        1.60                    A              0.9958
## 50                        1.40                    A              0.9954
## 51                        1.70                    A              0.9971
## 52                        2.20                    B              0.9956
## 53                        2.10                    A              0.9955
## 54                        3.00                    B              0.9970
## 55                        2.80                    B              0.9955
## 56                        3.80                    B              0.9978
## 57                        3.40                    B              0.9971
## 58                        5.10                    B              0.9983
## 59                        2.30                    B              0.9975
## 60                        2.40                    B              0.9962
## 61                        2.20                    B              0.9980
## 62                        1.80                    A              0.9968
## 63                        1.90                    A              0.9968
## 64                        2.00                    A              0.9966
## 65                        4.65                    B              0.9962
## 66                        4.65                    B              0.9962
## 67                        1.50                    A              0.9968
## 68                        1.60                    A              0.9962
## 69                        2.00                    A              0.9969
## 70                        1.90                    A              0.9962
## 71                        1.90                    A              0.9967
## 72                        2.10                    A              0.9962
## 73                        1.90                    A              0.9961
## 74                        2.10                    A              0.9976
## 75                        2.50                    B              0.9984
## 76                        2.20                    B              0.9986
## 77                        2.20                    B              0.9986
## 78                        2.40                    B              0.9966
## 79                        2.00                    A              0.9958
## 80                        1.50                    A              0.9972
## 81                        1.60                    A              0.9958
## 82                        1.90                    A              0.9974
## 83                        2.00                    A              0.9970
## 84                        1.80                    A              0.9969
## 85                        1.80                    A              0.9959
## 86                        2.20                    B              0.9961
## 87                        1.90                    A              0.9972
## 88                        1.90                    A              0.9966
## 89                        2.10                    A              0.9978
## 90                        1.80                    A              0.9978
## 91                        1.90                    A              0.9964
## 92                        1.90                    A              0.9972
## 93                        2.00                    A              0.9972
## 94                        1.90                    A              0.9966
## 95                        1.40                    A              0.9938
## 96                        2.30                    B              0.9932
## 97                        3.00                    B              0.9965
## 98                        2.00                    A              0.9963
## 99                        2.50                    B              0.9967
## 100                       1.90                    A              0.9972
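The code that produced the two groups is not shown above. Here is a minimal sketch. It assumes the rule the printed rows suggest: values strictly below the median 2.2 go to A, and values of 2.2 or more go to B, as in rows 44 and 47. `split_at_median` is a hypothetical helper name.

```r
# Hypothetical helper: label values below the median "A" and values at or
# above the median "B" (the rule implied by the printed rows).
split_at_median <- function(x, med = median(x)) {
  ifelse(x < med, "A", "B")
}

# First ten residual.sugar values from the printed table.
sugar_head <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.1)

# Use the full-data median (2.2, from summary()) rather than the median of
# these ten values, so the labels can be checked against the printed column.
groups_head <- split_at_median(sugar_head, med = 2.2)
print(groups_head)
```

On the full data this would be `residual.sugar.group <- split_at_median(winequality$residual.sugar)`, since the default `med` is the median of the vector itself.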

A. State the null hypothesis: the population mean density is the same in groups A and B (H0: μA = μB). The alternative is that the means differ (H1: μA ≠ μB).

boxplot(winequality$density ~ residual.sugar.group)

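The boxplot suggests that density differs between the two groups. A plot alone cannot reject the null hypothesis, however, so we confirm it with a formal test.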

What test are you going to use?

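We use a two-sample t-test. R's t.test() runs Welch's version by default, which does not assume equal variances. It tests whether the true difference in mean density is zero at the 5% significance level, which is equivalent to checking whether the 95% confidence interval for the difference contains 0.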
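For reference, Welch's statistic is t = (mean_A - mean_B) / sqrt(s_A^2/n_A + s_B^2/n_B), with Welch-Satterthwaite degrees of freedom. The sketch below computes it by hand on a small subsample and compares the result with `t.test()`. The subsample is the first 20 printed rows, split by the A/B labels shown. `welch_t` is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: Welch's two-sample t-test computed from its formula.
welch_t <- function(a, b) {
  se2_a <- var(a) / length(a)   # squared standard error of mean(a)
  se2_b <- var(b) / length(b)   # squared standard error of mean(b)
  t_stat <- (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
  # Welch-Satterthwaite approximation to the degrees of freedom
  dof <- (se2_a + se2_b)^2 /
    (se2_a^2 / (length(a) - 1) + se2_b^2 / (length(b) - 1))
  p_value <- 2 * pt(-abs(t_stat), dof)   # two-sided p-value
  list(statistic = t_stat, df = dof, p.value = p_value)
}

# Densities of the first 20 printed rows, split by their A/B labels.
dens_A <- c(0.9978, 0.9980, 0.9978, 0.9978, 0.9964, 0.9946, 0.9968,
            0.9959, 0.9943, 0.9974, 0.9969, 0.9968, 0.9969)
dens_B <- c(0.9968, 0.9970, 0.9978, 0.9978, 0.9986, 0.9986, 0.9974)

manual <- welch_t(dens_A, dens_B)
builtin <- t.test(dens_A, dens_B)   # Welch is the default (var.equal = FALSE)
print(unlist(manual))
print(builtin)
```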

What is the p-value?

t.test(winequality$density ~ residual.sugar.group)
## 
##  Welch Two Sample t-test
## 
## data:  winequality$density by residual.sugar.group
## t = -14.955, df = 1571.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -0.001479826 -0.001136653
## sample estimates:
## mean in group A mean in group B 
##       0.9960537       0.9973619

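The p-value is smaller than 2.2e-16. R prints "< 2.2e-16" when the p-value falls below its reporting limit, so 2.2e-16 is an upper bound, not the exact value.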
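"< 2.2e-16" is a bound rather than a value because R's printed test output passes p-values through `format.pval()`. Its default cut-off `eps` is `.Machine$double.eps`, which is 2^-52 (about 2.22e-16) on IEEE-754 doubles. A small sketch:

```r
# Machine epsilon: the default reporting floor used by format.pval().
eps <- .Machine$double.eps
print(eps)

# A p-value below eps is printed as "< ..." instead of its actual value.
tiny <- format.pval(1e-30)
print(tiny)

# An ordinary p-value is printed as a number.
ordinary <- format.pval(0.1386)
print(ordinary)
```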

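What is your conclusion? The p-value is far below 0.05, so we reject the null hypothesis: the mean density differs between groups A and B. The 95% confidence interval for the difference (A minus B) runs from -0.00148 to -0.00114 and excludes 0. Group B (higher residual sugar) has the higher sample mean density (0.99736 vs. 0.99605).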

Does your conclusion imply that there is an association between “density” and “residual.sugar”?

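Yes. Wines with residual sugar at or above the median (group B) have a higher mean density, which indicates an association between residual.sugar and density. This is an observational comparison, so it shows association, not causation.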

  2. Produce summary statistics of “residual.sugar” and use its 1st, 2nd, and 3rd quartiles to divide the data into four groups A, B, C, and D. We want to test if “density” in the four groups has the same population mean. Please answer the following questions.

We will need to view the summary stats for residual sugar again

summary(winequality$residual.sugar)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500

Now we separate the data into groups A, B, C, and D, using the quartiles (1.9, 2.2, and 2.6) as cut points.

# Assign quartile groups; 1.9, 2.2 and 2.6 are the quartiles from summary() above
residual.sugar.group2 <- character(length(winequality$residual.sugar))
for (i in seq_along(winequality$residual.sugar)) {
  if (winequality$residual.sugar[i] <= 1.9) {
    residual.sugar.group2[i] <- "A"
  } else if (winequality$residual.sugar[i] <= 2.2) {
    residual.sugar.group2[i] <- "B"
  } else if (winequality$residual.sugar[i] <= 2.6) {
    residual.sugar.group2[i] <- "C"
  } else {
    residual.sugar.group2[i] <- "D"
  }
}

table(residual.sugar.group2)
## residual.sugar.group2
##   A   B   C   D 
## 464 419 361 355
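The hard-coded cut points in the loop can also be derived from the data. This sketch assumes that `quantile()`'s default (type 7) reproduces what `summary()` printed, since summary() uses quantile() internally. `quartile_groups` is a hypothetical helper. With its default `right = TRUE`, `cut()` builds intervals of the form (a, b], which matches the `<=` comparisons in the loop.

```r
# Hypothetical helper: label values by quartile using cut().
# Intervals are (-Inf, q1], (q1, q2], (q2, q3], (q3, Inf], like the loop's "<=" tests.
quartile_groups <- function(x, q = quantile(x, c(0.25, 0.5, 0.75), names = FALSE)) {
  cut(x, breaks = c(-Inf, q, Inf), labels = c("A", "B", "C", "D"))
}

# First 20 printed residual.sugar values, with the full-data quartiles
# passed in explicitly so the result can be compared with the loop rule.
sugar20 <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.1,
             1.8, 6.1, 1.6, 1.6, 3.8, 3.9, 1.8, 1.7, 4.4, 1.8)
by_cut <- as.character(quartile_groups(sugar20, q = c(1.9, 2.2, 2.6)))

# The same rule written as nested ifelse() calls, mirroring the loop.
by_loop <- ifelse(sugar20 <= 1.9, "A",
           ifelse(sugar20 <= 2.2, "B",
           ifelse(sugar20 <= 2.6, "C", "D")))
print(rbind(by_cut, by_loop))
```

On the full data, `quartile_groups(winequality$residual.sugar)` returns a factor rather than a character vector. `table()`, `boxplot()` and `aov()` accept either.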

State the null hypothesis: the population mean density is the same in all four groups (H0: μA = μB = μC = μD). The alternative is that at least one group mean differs.

boxplot(winequality$density ~ residual.sugar.group2)

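The boxplot suggests that density differs across the four groups. As before, the plot alone cannot reject the null hypothesis, so we confirm it with a formal test.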

What test are you going to use? We use a one-way analysis of variance (ANOVA). The two-sample t-test compares exactly two means. ANOVA instead uses an F-test to check whether the means of several independent groups (here A, B, C, and D) are all equal.
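The F value in the ANOVA table is the between-group mean square divided by the within-group mean square. The sketch below computes it by hand on a small subsample and checks the result against `aov()`. The subsample is the first 30 printed rows, grouped with the same cut points as the loop. `oneway_f` is a hypothetical helper.

```r
# Hypothetical helper: one-way ANOVA F statistic from sums of squares.
oneway_f <- function(y, g) {
  g <- factor(g)
  k <- nlevels(g)
  n <- length(y)
  group_means <- tapply(y, g, mean)     # one mean per level, in level order
  group_sizes <- tapply(y, g, length)
  ss_between <- sum(group_sizes * (group_means - mean(y))^2)
  # Indexing by a factor uses its integer codes, so each y gets its own group mean.
  ss_within <- sum((y - group_means[g])^2)
  f <- (ss_between / (k - 1)) / (ss_within / (n - k))
  c(F = f, df1 = k - 1, df2 = n - k,
    p = pf(f, k - 1, n - k, lower.tail = FALSE))
}

# First 30 printed rows: residual.sugar and density.
sugar30 <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.1,
             1.8, 6.1, 1.6, 1.6, 3.8, 3.9, 1.8, 1.7, 4.4, 1.8,
             1.8, 2.3, 1.6, 2.3, 2.4, 1.4, 1.8, 1.6, 1.9, 2.0)
dens30 <- c(0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9978, 0.9964, 0.9946, 0.9968, 0.9978,
            0.9959, 0.9978, 0.9943, 0.9974, 0.9986, 0.9986, 0.9969, 0.9968, 0.9974, 0.9969,
            0.9968, 0.9982, 0.9966, 0.9968, 0.9968, 0.9955, 0.9962, 0.9966, 0.9972, 0.9964)
grp30 <- cut(sugar30, breaks = c(-Inf, 1.9, 2.2, 2.6, Inf), labels = c("A", "B", "C", "D"))

manual <- oneway_f(dens30, grp30)
aov_table <- summary(aov(dens30 ~ grp30))[[1]]
print(manual)
print(aov_table)
```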

What is the p-value?

summary(aov(winequality$density ~ residual.sugar.group2))
##                         Df   Sum Sq   Mean Sq F value Pr(>F)    
## residual.sugar.group2    3 0.000996 0.0003321   112.8 <2e-16 ***
## Residuals             1595 0.004696 0.0000029                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is smaller than 2e-16 (R prints "<2e-16" when the value falls below its display limit).

What is your conclusion?

The p-value is far below 0.05, so we reject the null hypothesis: at least one of the four groups has a different mean density. The ANOVA alone does not tell us which groups differ.

Does your conclusion imply that there is an association between “density” and “residual.sugar”? Compare your result here with that in Question 1. Do you think increasing the number of groups helps identify the association? Would you consider dividing the data into 10 groups so as to help the discovery of the association? Why?

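First, yes: the ANOVA confirms the association between density and residual sugar found in Question 1.

Second, both p-values are below R's display limit (< 2.2e-16 for the t-test and < 2e-16 for the ANOVA). The printed values therefore cannot show that one test gives stronger evidence than the other. What the four groups add is detail: they show how mean density varies across the range of residual sugar, rather than only above versus below the median.

Third, dividing the data into 10 groups is unlikely to help discover the association. The association was already detected with two groups, and each of 10 groups would contain only about 160 wines. Residual.sugar also has many tied values; the four-group split above already produced unequal groups of 464, 419, 361 and 355. Because of these ties, decile cut points may coincide and fail to give ten distinct groups. Increasing the number of groups gradually, as done here, is more sensible than jumping straight to 10.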
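One concrete obstacle to ten quantile-based groups is ties. The printed rows show that residual.sugar is mostly recorded to one decimal place, so neighbouring deciles can coincide. The sketch below uses the first 30 printed values; this is only a subsample, so the full data may behave differently.

```r
# First 30 printed residual.sugar values (a small subsample).
sugar30 <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.1,
             1.8, 6.1, 1.6, 1.6, 3.8, 3.9, 1.8, 1.7, 4.4, 1.8,
             1.8, 2.3, 1.6, 2.3, 2.4, 1.4, 1.8, 1.6, 1.9, 2.0)

# Deciles (0%, 10%, ..., 100%) with quantile()'s default type 7.
deciles <- quantile(sugar30, probs = seq(0, 1, by = 0.1), names = FALSE)
print(deciles)

# Repeated cut points mean fewer than ten distinct groups.
has_ties <- anyDuplicated(deciles) > 0
print(has_ties)

# cut() refuses repeated break points; capture the error message instead.
cut_result <- tryCatch(
  cut(sugar30, breaks = deciles, include.lowest = TRUE),
  error = function(e) conditionMessage(e)
)
print(cut_result)
```

Dropping the duplicates with `unique(deciles)` lets `cut()` run, but the result then has fewer than ten groups.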

  3. Create a 2 by 4 contingency table using the categories A, B, C, D of “residual.sugar” and the binary variable “excellent” you created in Part B. Note that you have two factors: the categorical levels of “residual.sugar” (A, B, C and D) and an indicator of excellent wines (yes or no).

Here we create the binary variable excellent (quality of 7 or higher) and build the contingency table.

winequality$excellent <- ifelse(winequality$quality >= 7, "Yes", "No")

contingency_table <- table(data.frame(winequality$excellent, residual.sugar.group2))
print(contingency_table)
##                      residual.sugar.group2
## winequality.excellent   A   B   C   D
##                   No  411 367 308 296
##                   Yes  53  52  53  59

(a) Use the chi-square test to test whether these two factors are correlated (associated) or not.

chisq.test(contingency_table)
## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 5.5, df = 3, p-value = 0.1386

The chi-square test gives a p-value of 0.1386, which is greater than 0.05, so we fail to reject the null hypothesis of independence. There is no significant evidence of an association between wine excellence and residual sugar group.
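The chi-square statistic compares the observed counts with the counts expected under independence, E = (row total × column total) / grand total. The sketch below rebuilds the printed 2 by 4 table from its counts, computes the statistic by hand, and checks it against `chisq.test()`.

```r
# Counts from the printed contingency table (filled column by column).
counts <- matrix(c(411, 53, 367, 52, 308, 53, 296, 59), nrow = 2,
                 dimnames = list(excellent = c("No", "Yes"),
                                 group = c("A", "B", "C", "D")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic and its p-value on (rows - 1) * (cols - 1) degrees of freedom.
x2 <- sum((counts - expected)^2 / expected)
dof <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(x2, dof, lower.tail = FALSE)

builtin <- chisq.test(counts)
print(round(expected, 1))
print(c(X2 = x2, df = dof, p = p_value))
```

All expected counts are well above 5, so the chi-square approximation should be reasonable here.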

(b) Use the permutation test to do the same and compare the result to that in (a).

chisq.test(contingency_table, simulate.p.value = TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  contingency_table
## X-squared = 5.5, df = NA, p-value = 0.1359

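The permutation (simulated) p-value is 0.1359. It is again greater than 0.05, so the permutation test also finds no significant association. This value is very close to the chi-square p-value of 0.1386 from (a). The small difference is expected: the simulated p-value is based on 2000 random replicates and will vary slightly from run to run.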
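`chisq.test(..., simulate.p.value = TRUE)` estimates the p-value by Monte Carlo, sampling random tables with the same margins. The sketch below is an explicit label-shuffling version of the same idea. It rebuilds one row per wine from the printed counts, shuffles the excellent labels, and counts how often the shuffled statistic is at least as large as the observed one. The seed and the number of replicates (2000, as in the output above) are arbitrary choices, so the p-value will differ slightly from 0.1359.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

counts <- matrix(c(411, 53, 367, 52, 308, 53, 296, 59), nrow = 2,
                 dimnames = list(excellent = c("No", "Yes"),
                                 group = c("A", "B", "C", "D")))

# One row per wine: expand.grid varies the first factor fastest, which matches
# the column-major order of as.vector(counts).
cells <- expand.grid(excellent = rownames(counts), group = colnames(counts),
                     stringsAsFactors = FALSE)
excellent <- factor(rep(cells$excellent, times = as.vector(counts)))
group <- factor(rep(cells$group, times = as.vector(counts)))

# Pearson chi-square statistic for a pair of label vectors.
x2_stat <- function(e, g) unname(chisq.test(table(e, g))$statistic)

observed <- x2_stat(excellent, group)
n_perm <- 2000
perm_stats <- replicate(n_perm, x2_stat(sample(excellent), group))

# Add-one estimate, counting permutations at least as extreme as observed
# (a tiny tolerance guards against floating-point ties).
p_perm <- (1 + sum(perm_stats >= observed - 1e-9)) / (n_perm + 1)
print(c(observed = observed, p_perm = p_perm))
```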

(c) Can you conclude that “residual.sugar” is a significant factor contributing to the excellence of wine? Why?

We cannot conclude that residual.sugar is a significant factor contributing to the excellence of wine. Both the chi-square test and the permutation test give p-values (about 0.14) greater than 0.05. We therefore fail to reject the null hypothesis that wine excellence and residual sugar group are independent.