##Problem 2
This problem calculates the correlation between each input (predictor) attribute and the output (target) attribute in the a2-p2.csv dataset. The following correlations are calculated: correl(A1, A4), correl(A2, A4), and correl(A3, A4).
library(readxl)
a <- read_excel("C:/Users/Baha/Downloads/a2-p2.xlsx")
##Correlation between variable A1 and A4
# cor.test() and cor() come from the stats package, which R attaches by default
cor.test(a$A1, a$A4)
##
## Pearson's product-moment correlation
##
## data: a$A1 and a$A4
## t = 1.5756, df = 98, p-value = 0.1183
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.04048968 0.34300702
## sample estimates:
## cor
## 0.1571785
##Conclusion The sample correlation between A1 and A4 is 0.1571785, a weak positive correlation. However, the p-value (0.1183) is greater than 0.05 and the 95% confidence interval (-0.040, 0.343) contains 0, so at the 5% level of significance the correlation is not statistically different from zero.
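The t statistic and p-value in the output above can be recovered from the correlation coefficient alone. The sketch below assumes n = 100 observations, inferred from df = 98 (since df = n - 2); the variable names are illustrative only.

```r
# Reported Pearson correlation between A1 and A4
r <- 0.1571785
# Sample size inferred from the reported degrees of freedom (df = n - 2 = 98)
n <- 100

# Test statistic for H0: true correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

t_stat
p_value
```

Because the p-value exceeds 0.05, the null hypothesis of zero correlation is not rejected for this pair.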
##Correlation between variables A2 and A4
cor.test(a$A2, a$A4)
##
## Pearson's product-moment correlation
##
## data: a$A2 and a$A4
## t = -1.2631, df = 98, p-value = 0.2096
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.31514917 0.07163332
## sample estimates:
## cor
## -0.1265656
##Conclusion The sample correlation between A2 and A4 is -0.1265656, a weak negative correlation: in this sample, A4 tends to decrease slightly as A2 increases. However, the p-value (0.2096) exceeds 0.05 and the 95% confidence interval (-0.315, 0.072) contains 0, so at the 5% level of significance there is not enough evidence of a correlation between A2 and A4.
##Correlation between variable A3 and A4
cor.test(a$A3, a$A4)
##
## Pearson's product-moment correlation
##
## data: a$A3 and a$A4
## t = 3.8463, df = 98, p-value = 0.0002134
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1784321 0.5214805
## sample estimates:
## cor
## 0.3621576
##Conclusion The correlation between A3 and A4 is 0.3621576, a weak-to-moderate positive correlation. The p-value (0.0002134) is below 0.05 and the 95% confidence interval (0.178, 0.521) excludes 0, so the correlation is statistically significant at the 5% level.
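The 95 percent confidence interval reported by cor.test() is based on the Fisher z-transformation. As a rough check, the sketch below rebuilds the interval for A3 and A4 from the reported correlation, again assuming n = 100 (inferred from df = 98).

```r
# Reported Pearson correlation between A3 and A4
r <- 0.3621576
# Sample size inferred from df = n - 2 = 98
n <- 100

# Fisher z-transformation and its approximate standard error
z <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95% interval on the z scale, transformed back to the correlation scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
ci
```

The interval excludes 0, which is consistent with the small p-value for this pair.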
##Correlation matrix for all variables in the a2-p2 dataset
##Calculate the correlation matrix
correlation_matrix <- cor(a)
##print the correlation matrix
print(correlation_matrix)
## A1 A2 A3 A4
## A1 1.0000000 -0.1444047 0.2045155 0.1571785
## A2 -0.1444047 1.0000000 -0.2828461 -0.1265656
## A3 0.2045155 -0.2828461 1.0000000 0.3621576
## A4 0.1571785 -0.1265656 0.3621576 1.0000000
##Conclusion. The correlation matrix shows that A3 has the strongest correlation with A4 (0.3621576), followed by A1 with a weaker positive correlation (0.1571785), and finally A2 with a weak negative correlation (-0.1265656). Based on the cor.test p-values above, only the correlation between A3 and A4 is statistically significant at the 5% level.
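To make the ranking explicit, the predictors can be ordered by the absolute value of their correlation with A4. The sketch below re-enters the printed matrix by hand so that it runs without the data file; with the data loaded, `cor(a)` could be used in its place.

```r
vars <- c("A1", "A2", "A3", "A4")

# Correlation matrix copied from the printed output
correlation_matrix <- matrix(
  c( 1.0000000, -0.1444047,  0.2045155,  0.1571785,
    -0.1444047,  1.0000000, -0.2828461, -0.1265656,
     0.2045155, -0.2828461,  1.0000000,  0.3621576,
     0.1571785, -0.1265656,  0.3621576,  1.0000000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Correlations of each predictor with the target A4
with_target <- correlation_matrix[c("A1", "A2", "A3"), "A4"]

# Rank by strength (absolute value), strongest first
ranked <- with_target[order(abs(with_target), decreasing = TRUE)]
ranked
```

Ranking by absolute value matters here because a negative correlation can be stronger than a positive one even though its value is smaller.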
##Problem 3
This problem determines whether two nominal attributes are correlated, using the chi-square test on the a2-p3.csv dataset. (1) Determine whether there is a correlation between attribute A1 and attribute A4.
##Using the chi-squared test, the correlation between attributes A1 and A4 is determined as follows.
library(readxl)
p <- read_excel("C:/Users/Baha/Downloads/a2-p3.xlsx")
## The contingency table for A1 and A4 is..
contingencytable <- table(p$A1, p$A4)
contingencytable
##
## No Yes
## Middle 80 205
## Old 36 77
## Young 5 39
##Calculating the expected frequencies under independence (row total * column total / grand total)
Expected <- outer(rowSums(contingencytable), colSums(contingencytable)) / sum(contingencytable)
round(Expected, 2)
## No Yes
## Middle 78.02 206.98
## Old 30.93 82.07
## Young 12.05 31.95
##Calculating the chi squared test statistic is as follows
test_statistic <- sum((contingencytable - Expected)^2 / Expected)
round(test_statistic, 3)
## [1] 6.885
# Determining the degrees of freedom
df <- (dim(contingencytable)[1] - 1) * (dim(contingencytable)[2] - 1)
df
## [1] 2
# Obtaining the p-value (upper tail of the chi-squared distribution)
p_value <- pchisq(test_statistic, df, lower.tail = FALSE)
signif(p_value, 2)
## [1] 0.032
# Interpreting the results
if (p_value < 0.05) {
cat("There is a significant correlation between attribute A1 and A4.\n")
} else {
cat("There is not enough evidence to conclude that there is a significant correlation between attribute A1 and A4.\n")
}
## There is a significant correlation between attribute A1 and A4.
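One way to cross-check a manual chi-square calculation is base R's chisq.test(). The sketch below re-enters the A1 by A4 counts from the contingency table by hand so that it runs without the data file; the object names are illustrative.

```r
# Observed counts copied from the printed A1 x A4 contingency table
a1_counts <- matrix(
  c(80, 205,
    36,  77,
     5,  39),
  nrow = 3, byrow = TRUE,
  dimnames = list(A1 = c("Middle", "Old", "Young"), A4 = c("No", "Yes"))
)

# Pearson chi-squared test of independence
result <- chisq.test(a1_counts, correct = FALSE)

result$expected   # expected counts: row total * column total / grand total
result$statistic  # chi-squared test statistic
result$parameter  # degrees of freedom
result$p.value    # upper-tail p-value
```

The expected counts sum to the same row and column totals as the observed table, which is a quick sanity check on any hand calculation.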
## The contingency table for A2 and A4 is..
contingencytable <- table(p$A2, p$A4)
contingencytable
##
## No Yes
## High 8 103
## Low 46 57
## Middle 67 161
##Calculating the expected frequencies under independence (row total * column total / grand total)
Expected <- outer(rowSums(contingencytable), colSums(contingencytable)) / sum(contingencytable)
round(Expected, 2)
## No Yes
## High 30.39 80.61
## Low 28.20 74.80
## Middle 62.42 165.58
##Calculating the chi squared test statistic,
test_statistic <- sum((contingencytable - Expected)^2 / Expected)
round(test_statistic, 3)
## [1] 38.651
# Determining the degrees of freedom
df <- (dim(contingencytable)[1] - 1) * (dim(contingencytable)[2] - 1)
df
## [1] 2
# Obtaining the p-value (upper tail of the chi-squared distribution)
p_value <- pchisq(test_statistic, df, lower.tail = FALSE)
signif(p_value, 2)
## [1] 4e-09
# Interpreting the results
if (p_value < 0.05) {
cat("There is a significant correlation between attribute A2 and A4.\n")
} else {
cat("There is not enough evidence to conclude that there is a significant correlation between attribute A2 and A4.\n")
}
## There is a significant correlation between attribute A2 and A4.
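Since the same manual steps are repeated for each attribute pair, they can be wrapped in a small helper. The function name `manual_chisq` is hypothetical and the A2 by A4 counts are re-entered by hand from the contingency table; this is a minimal sketch, not a replacement for chisq.test().

```r
# Hypothetical helper: Pearson chi-squared test computed step by step
manual_chisq <- function(observed) {
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  # Test statistic: sum over cells of (observed - expected)^2 / expected
  statistic <- sum((observed - expected)^2 / expected)
  # Degrees of freedom: (rows - 1) * (columns - 1)
  dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
  # Upper-tail p-value of the chi-squared distribution
  p_value <- pchisq(statistic, dof, lower.tail = FALSE)
  list(expected = expected, statistic = statistic, df = dof, p_value = p_value)
}

# Observed counts copied from the printed A2 x A4 contingency table
a2_counts <- matrix(
  c( 8, 103,
    46,  57,
    67, 161),
  nrow = 3, byrow = TRUE,
  dimnames = list(A2 = c("High", "Low", "Middle"), A4 = c("No", "Yes"))
)

res <- manual_chisq(a2_counts)
res$statistic
res$df
res$p_value
```

The key detail is `lower.tail = FALSE`: the p-value is the probability of a statistic at least as large as the one observed, so it comes from the upper tail.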