library(pander)
library(ggpubr)
\(~\)
Problem 1: The dean of a business school undertakes a study to relate starting salary after graduation to grade point average (GPA) in major courses. He randomly selects the records of 10 students, shown in the accompanying table. Perform a correlation analysis using Pearson correlation. (5 pts.)
Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
GPA | 78 | 81 | 85 | 87 | 75 | 79 | 83 | 88 | 85 | 77 |
Starting salary | 17 | 18 | 18 | 28 | 17 | 22 | 30 | 34 | 30 | 28 |
\(~\)
Solution:
# Enter data manually.
gpa <- c(78, 81, 85, 87, 75, 79, 83, 88, 85, 77)
ssalary <- c(17, 18, 18, 28, 17, 22, 30, 34, 30, 28)
cor.data1 <- data.frame(gpa, ssalary)
pander(cor.test(cor.data1$gpa, cor.data1$ssalary, method = "pearson"))
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
2.103 | 8 | 0.06858 | two.sided | 0.5967 |
The analysis shows a moderately strong, positive correlation between GPA and starting salary (\(r = 0.5967\)). The hypothesis test on the correlation coefficient, however, finds no significant linear relationship between GPA and starting salary at the 0.05 level (\(p = 0.06858 > 0.05\)).
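As a check on the `cor.test` output above, the coefficient and its test statistic can be reproduced from their definitions, \(r = S_{xy}/\sqrt{S_{xx} S_{yy}}\) and \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) with \(n - 2\) degrees of freedom. The following is only a minimal sketch; the names `sxy`, `sxx`, `syy`, `r`, `t_stat`, and `p_val` are illustrative and not part of the original solution.

```r
# Data from Problem 1, re-entered so the block runs on its own.
gpa <- c(78, 81, 85, 87, 75, 79, 83, 88, 85, 77)
ssalary <- c(17, 18, 18, 28, 17, 22, 30, 34, 30, 28)
n <- length(gpa)

# Sums of squares and cross-products about the sample means.
sxy <- sum((gpa - mean(gpa)) * (ssalary - mean(ssalary)))
sxx <- sum((gpa - mean(gpa))^2)
syy <- sum((ssalary - mean(ssalary))^2)

# Pearson correlation coefficient.
r <- sxy / sqrt(sxx * syy)

# t statistic and two-sided p-value with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

round(c(r = r, t = t_stat, p = p_val), 4)
```

The three values should agree with the `cor`, `Test statistic`, and `P value` columns reported by `cor.test`.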
\(~\)
The following is a scatterplot of the data.
ggscatter(cor.data1, x="gpa", y="ssalary", add = "reg.line", cor.coef = TRUE,
cor.method = "pearson", xlab = "GPA", ylab = "Starting salary")
\(~\)
Problem 2: The following table gives the number of sales contacts made by 9 salespersons during a week and the number of sales made. Perform a correlation analysis using the Pearson correlation coefficient and interpret the results. (5 pts.)
Salesperson | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Sales contacts | 71 | 64 | 100 | 105 | 75 | 79 | 82 | 68 | 110 |
Sales | 25 | 14 | 37 | 40 | 18 | 10 | 22 | 12 | 42 |
\(~\)
Solution:
salescon <- c(71, 64, 100, 105, 75, 79, 82, 68, 110)
sales <- c(25, 14, 37, 40, 18, 10, 22, 12, 42)
cor.data2 <- data.frame(salescon, sales)
pander(cor.test(cor.data2$salescon, cor.data2$sales, method = "pearson"))
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
5.554 | 7 | 0.0008559 * * * | two.sided | 0.9028 |
The results indicate a very strong, positive correlation between the number of sales contacts made during a week and the number of sales made (\(r = 0.9028\)). The hypothesis test further shows that there is a significant linear relationship between the number of sales contacts and the number of sales (\(p = 0.0008559 < 0.05\)).
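The decision quoted above can also be read directly from the object returned by `cor.test`, which is a list of class `htest`. The following is a small sketch, assuming a significance level of 0.05; the names `res2` and `alpha` are illustrative and do not appear in the original solution.

```r
# Data from Problem 2, re-entered so the block runs on its own.
salescon <- c(71, 64, 100, 105, 75, 79, 82, 68, 110)
sales <- c(25, 14, 37, 40, 18, 10, 22, 12, 42)

res2 <- cor.test(salescon, sales, method = "pearson")
alpha <- 0.05  # assumed significance level

res2$estimate          # sample correlation r
res2$p.value           # two-sided p-value
res2$conf.int          # 95% confidence interval for the population correlation
res2$p.value < alpha   # TRUE means the correlation is significant at alpha
```

Keeping the result in a variable makes it possible to report the estimate, p-value, and confidence interval without retyping numbers from the printed table.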
\(~\)
Here is a scatterplot of the given data.
ggscatter(cor.data2, x="salescon", y="sales", add = "reg.line", cor.coef = TRUE,
cor.method = "pearson", xlab = "Sales Contacts", ylab = "Sales")
\(~\)
Problem 3: The owner of a car wants to study the relationship between the age of a car and its selling price. Listed below is a random sample of 12 used cars sold at a dealership during the last year. Perform a correlation analysis using (a) Pearson correlation; (b) Spearman rank correlation. Interpret the results. (10 pts.)
Car | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Age (years) | 9 | 7 | 11 | 12 | 8 | 7 | 8 | 11 | 10 | 12 | 6 | 6 |
Selling Price (in $1000) | 8.1 | 6.0 | 3.6 | 4.0 | 5.0 | 10.0 | 7.6 | 8.0 | 8.0 | 6.0 | 8.6 | 8.0 |
\(~\)
Solution:
age <- c(9, 7, 11, 12, 8, 7, 8, 11, 10, 12, 6, 6)
price <- c(8.1, 6.0, 3.6, 4.0, 5.0, 10.0, 7.6, 8.0, 8.0, 6.0, 8.6, 8.0)
cor.data3 <- data.frame(age, price)
# Using Pearson correlation:
pander(cor.test(cor.data3$age, cor.data3$price, method = "pearson"))
Test statistic | df | P value | Alternative hypothesis | cor |
---|---|---|---|---|
-2.048 | 10 | 0.0677 | two.sided | -0.5436 |
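# Using Spearman correlation (the data contain ties, so request the
# asymptotic p-value explicitly to avoid the exact-p-value warning):
pander(cor.test(cor.data3$age, cor.data3$price, method = "spearman", exact = FALSE))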
Test statistic | P value | Alternative hypothesis | rho |
---|---|---|---|
442.7 | 0.06507 | two.sided | -0.548 |
(a) The Pearson correlation analysis shows a moderately strong, negative correlation between the age of a car and its selling price (\(r = -0.5436\)). The linear relationship between age and selling price is not significant, however (\(p = 0.0677 > 0.05\)).
(b) The Spearman rank correlation analysis also shows a moderately strong, negative relationship between the age and selling price of a car (\(\rho = -0.548\)). The test of significance likewise indicates no significant monotonic relationship between age and selling price (\(p = 0.06507 > 0.05\)).
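Spearman's rho is the Pearson correlation computed on the ranks of the data, which is why it measures monotonic rather than strictly linear association. The sketch below illustrates this for the car data; `rho_by_ranks` and `rho_spearman` are illustrative names, and `exact = FALSE` is passed only because the data contain tied values, for which R uses the asymptotic p-value in any case.

```r
# Data from Problem 3, re-entered so the block runs on its own.
age <- c(9, 7, 11, 12, 8, 7, 8, 11, 10, 12, 6, 6)
price <- c(8.1, 6.0, 3.6, 4.0, 5.0, 10.0, 7.6, 8.0, 8.0, 6.0, 8.6, 8.0)

# rank() assigns average ranks to tied values by default.
rho_by_ranks <- cor(rank(age), rank(price), method = "pearson")

# Spearman's rho as reported by cor.test.
rho_spearman <- cor.test(age, price, method = "spearman", exact = FALSE)$estimate

c(by_ranks = rho_by_ranks, spearman = unname(rho_spearman))
```

The two values should coincide, and both should match the `rho` column in the Spearman table above.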
\(~\)
The following is a scatterplot of the given data.
ggscatter(cor.data3, x="age", y="price", add = "reg.line", cor.coef = TRUE,
cor.method = "pearson", xlab = "Age (in years)",
ylab = "Selling Price (in $1000)")