A histogram of the GPAs of 132 students from this course in Fall 2012 is presented below. Which estimates of the mean and median are most plausible?
#### Answer -> a. mean=3.3, median=3.5
The distribution above is left-skewed, so the mean is smaller than the median. With \(n = 132\), the median sits at position \((132 + 1)/2 = 66.5\), i.e. halfway between the 66th and 67th ordered values. Judging by where most of the histogram's mass lies, that position falls close to a GPA of 3.5, which makes 3.5 the plausible median. A median of 3.8 is implausible because it is too high.
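A small check of the skewness rule, using a made-up left-skewed set of GPAs (illustrative values, not the Fall 2012 class data): a long left tail pulls the mean below the median. The last lines compute the median position for \(n = 132\).

```r
# Hypothetical GPAs with a long left tail (illustrative values, not the class data)
gpa <- c(2.0, 3.0, 3.4, 3.5, 3.6, 3.7)
mean(gpa)    # 3.2, pulled down by the low value 2.0
median(gpa)  # 3.45, the average of the two middle values

# Median position for n = 132 observations: halfway between the 66th and 67th values
n <- 132
(n + 1) / 2  # 66.5
```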
The study above involves two categorical variables, natural hair color and eye color, measured in a large population. The observed counts can be arranged in the following contingency table:
| | brunette | blond | red |
|---|---|---|---|
| blue | n1 | n2 | n3 |
| green | n4 | n5 | n6 |
| brown | n7 | n8 | n9 |
The hypothesis test can be set up as follows:
\(H_0:\) There is no association between natural hair color and eye color.
\(H_A:\) There is an association between natural hair color and eye color.
A large chi-square statistic provides strong evidence against \(H_0\) and in favor of the alternative hypothesis. Therefore the answer is -> a. there is a difference between average eye color and average hair color.
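A sketch of how such a test could be run in R with `chisq.test()`. The counts below are hypothetical, made up only to fill the n1 to n9 cells; the study's real counts are not given here. The code also computes the statistic by hand (expected count = row total × column total / grand total) to show what `chisq.test()` reports.

```r
# Hypothetical counts for illustration only; the study's actual counts are not given
obs <- matrix(c(20, 15,  5,
                10, 25, 10,
                30, 10, 15),
              nrow = 3, byrow = TRUE,
              dimnames = list(eye  = c("blue", "green", "brown"),
                              hair = c("brunette", "blond", "red")))

# Expected counts under H0 (independence): row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
x2 <- sum((obs - expected)^2 / expected)

res <- chisq.test(obs)  # continuity correction only applies to 2x2 tables
res$parameter           # df = (3 - 1) * (3 - 1) = 4
res$p.value             # small p-value means strong evidence against H0
```

A larger statistic relative to its degrees of freedom gives a smaller p-value, which is the sense in which a "large chi-square value" favors \(H_A\).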
\(IQR = Q_3 - Q_1 = 49.8 - 37 = 12.8\)
\(\text{Lower fence} = Q_1 - 1.5 \times IQR = 37 - 1.5 \times 12.8 = 17.8\)
\(\text{Upper fence} = Q_3 + 1.5 \times IQR = 49.8 + 1.5 \times 12.8 = 69\)
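The same fence arithmetic in R, using the quartiles quoted above. The helper `is_outlier()` is a hypothetical name introduced here to show how the fences would flag values.

```r
# Quartiles taken from the text above
q1 <- 37
q3 <- 49.8
iqr <- q3 - q1                 # 12.8
lower_fence <- q1 - 1.5 * iqr  # 17.8
upper_fence <- q3 + 1.5 * iqr  # 69

# Hypothetical helper: flag values outside the 1.5 * IQR fences
is_outlier <- function(v) v < lower_fence | v > upper_fence
is_outlier(c(15, 40, 70))      # TRUE FALSE TRUE
```

With the raw data, `quantile()` and `IQR()` compute the quartiles directly, though R's default quantile method may give slightly different quartiles than a hand calculation.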
\(\mu = 5.05\) , \(\sigma = 3.22\) , \(n = 500\)
\(\mu_{\bar{x}} = 5.04\) , \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.22}{\sqrt{30}} \approx 0.59\) , \(n = 30\)
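The standard error formula can be checked numerically. The simulation part assumes a hypothetical normal population of 500 values with the stated mean and SD; the real population's shape is not given, so this only illustrates that the spread of sample means is close to \(\sigma/\sqrt{n}\).

```r
sigma <- 3.22  # population SD from the text
n <- 30        # sample size
se <- sigma / sqrt(n)
round(se, 2)   # 0.59

# Simulation with a hypothetical (normal) population of 500 values
set.seed(1)
pop <- rnorm(500, mean = 5.05, sd = 3.22)
xbar <- replicate(5000, mean(sample(pop, n)))  # 5000 sample means, n = 30 each
sd(xbar)  # near sd(pop) / sqrt(n); slightly smaller because sampling is without replacement
```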
options(digits=2)
data1 <- data.frame(x=c(10,8,13,9,11,14,6,4,12,7,5),
y=c(8.04,6.95,7.58,8.81,8.33,9.96,7.24,4.26,10.84,4.82,5.68))
data2 <- data.frame(x=c(10,8,13,9,11,14,6,4,12,7,5),
y=c(9.14,8.14,8.74,8.77,9.26,8.1,6.13,3.1,9.13,7.26,4.74))
data3 <- data.frame(x=c(10,8,13,9,11,14,6,4,12,7,5),
y=c(7.46,6.77,12.74,7.11,7.81,8.84,6.08,5.39,8.15,6.42,5.73))
data4 <- data.frame(x=c(8,8,8,8,8,8,8,19,8,8,8),
y=c(6.58,5.76,7.71,8.84,8.47,7.04,5.25,12.5,5.56,7.91,6.89))
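These four data sets are Anscombe's quartet, which R ships as the built-in data frame `anscombe` (columns `x1` to `x4` and `y1` to `y4`). A sketch that rebuilds the four data frames from it instead of typing the values by hand:

```r
# Build data1..data4 from the built-in anscombe data frame
quartet <- lapply(1:4, function(i) {
  data.frame(x = anscombe[[paste0("x", i)]],
             y = anscombe[[paste0("y", i)]])
})
names(quartet) <- paste0("data", 1:4)

# The hand-entered data1 above should match the first pair of columns
all.equal(quartet$data1$y,
          c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68))
```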
d1mean.x <- round(mean(data1$x),2)
d1mean.y <- round(mean(data1$y),2)
d2mean.x <- round(mean(data2$x),2)
d2mean.y <- round(mean(data2$y),2)
d3mean.x<- round(mean(data3$x),2)
d3mean.y <- round(mean(data3$y),2)
d4mean.x <- round(mean(data4$x),2)
d4mean.y <- round(mean(data4$y),2)
Data | Mean x | Mean y |
---|---|---|
data1 | 9 | 7.5 |
data2 | 9 | 7.5 |
data3 | 9 | 7.5 |
data4 | 9 | 7.5 |
d1med.x <- round(median(data1$x),2)
d1med.y <- round(median(data1$y),2)
d2med.x <- round(median(data2$x),2)
d2med.y <- round(median(data2$y),2)
d3med.x<- round(median(data3$x),2)
d3med.y <- round(median(data3$y),2)
d4med.x <- round(median(data4$x),2)
d4med.y <- round(median(data4$y),2)
Data | Median x | Median y |
---|---|---|
data1 | 9 | 7.58 |
data2 | 9 | 8.14 |
data3 | 9 | 7.11 |
data4 | 8 | 7.04 |
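The medians of \(y\) differ across the data sets even though the means agree. The third data set shows why: its single large value (12.74) pulls the mean up but barely moves the median. A short check, dropping that one point:

```r
# y values of data3; 12.74 is the visible outlier
y3 <- c(7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73)
c(mean = mean(y3), median = median(y3))            # 7.5 and 7.11

y3_trim <- y3[y3 != max(y3)]                       # drop the outlier
c(mean = mean(y3_trim), median = median(y3_trim))  # 6.976 and 6.94
```

The mean falls by about 0.52 while the median falls by only 0.17.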
d1sd.x <- round(sd(data1$x),2)
d1sd.y <- round(sd(data1$y),2)
d2sd.x <- round(sd(data2$x),2)
d2sd.y <- round(sd(data2$y),2)
d3sd.x <- round(sd(data3$x),2)
d3sd.y <- round(sd(data3$y),2)
d4sd.x <- round(sd(data4$x),2)
d4sd.y <- round(sd(data4$y),2)
Data | SD x | SD y |
---|---|---|
data1 | 3.32 | 2.03 |
data2 | 3.32 | 2.03 |
data3 | 3.32 | 2.03 |
data4 | 3.32 | 2.03 |
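The 24 separate assignments above can be collapsed into one table. A sketch using the built-in `anscombe` data frame (which holds the same values as data1 to data4); `stats_for()` is a hypothetical helper name:

```r
# Hypothetical helper: mean, median and SD of x and y for data set i
stats_for <- function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    median_x = median(x), median_y = median(y),
    sd_x = sd(x), sd_y = sd(y))
}
tab <- t(sapply(1:4, stats_for))  # one row per data set
rownames(tab) <- paste0("data", 1:4)
round(tab, 2)
```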
summary(data1)
## x y
## Min. : 4.0 Min. : 4.3
## 1st Qu.: 6.5 1st Qu.: 6.3
## Median : 9.0 Median : 7.6
## Mean : 9.0 Mean : 7.5
## 3rd Qu.:11.5 3rd Qu.: 8.6
## Max. :14.0 Max. :10.8
summary(data2)
## x y
## Min. : 4.0 Min. :3.1
## 1st Qu.: 6.5 1st Qu.:6.7
## Median : 9.0 Median :8.1
## Mean : 9.0 Mean :7.5
## 3rd Qu.:11.5 3rd Qu.:8.9
## Max. :14.0 Max. :9.3
summary(data3)
## x y
## Min. : 4.0 Min. : 5.4
## 1st Qu.: 6.5 1st Qu.: 6.2
## Median : 9.0 Median : 7.1
## Mean : 9.0 Mean : 7.5
## 3rd Qu.:11.5 3rd Qu.: 8.0
## Max. :14.0 Max. :12.7
summary(data4)
## x y
## Min. : 8 Min. : 5.2
## 1st Qu.: 8 1st Qu.: 6.2
## Median : 8 Median : 7.0
## Mean : 9 Mean : 7.5
## 3rd Qu.: 8 3rd Qu.: 8.2
## Max. :19 Max. :12.5
par(mfrow=c(2,2))
plot(data1,main = "data1")
plot(data2,main = "data2")
plot(data3,main = "data3")
plot(data4,main = "data4")
#plot(data1)
round(cor(data1),2)
## x y
## x 1.00 0.82
## y 0.82 1.00
#plot(data2)
round(cor(data2),2)
## x y
## x 1.00 0.82
## y 0.82 1.00
#plot(data3)
round(cor(data3),2)
## x y
## x 1.00 0.82
## y 0.82 1.00
#plot(data4)
round(cor(data4),2)
## x y
## x 1.00 0.82
## y 0.82 1.00
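The four correlation matrices can be reduced to one vector, and the Pearson correlation can be confirmed against its definition \(r = \mathrm{cov}(x, y) / (s_x s_y)\). This sketch uses the built-in `anscombe` data frame, which holds the same values as data1 to data4:

```r
# Pearson correlation for each of the four data sets
r <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
round(r, 2)  # about 0.82 for every data set

# Same value from the definition, for the first data set
r1_manual <- cov(anscombe$x1, anscombe$y1) / (sd(anscombe$x1) * sd(anscombe$y1))
```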
lm1<-lm(y~x,data=data1)
summary(lm1)
##
## Call:
## lm(formula = y ~ x, data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9213 -0.4558 -0.0414 0.7094 1.8388
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.000 1.125 2.67 0.0257 *
## x 0.500 0.118 4.24 0.0022 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.2 on 9 degrees of freedom
## Multiple R-squared: 0.667, Adjusted R-squared: 0.629
## F-statistic: 18 on 1 and 9 DF, p-value: 0.00217
par(mfrow=c(2,2))
plot(lm1)
lm2<-lm(y~x,data=data2)
summary(lm2)
##
## Call:
## lm(formula = y ~ x, data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.901 -0.761 0.129 0.949 1.269
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.001 1.125 2.67 0.0258 *
## x 0.500 0.118 4.24 0.0022 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.2 on 9 degrees of freedom
## Multiple R-squared: 0.666, Adjusted R-squared: 0.629
## F-statistic: 18 on 1 and 9 DF, p-value: 0.00218
par(mfrow=c(2,2))
plot(lm2)
lm3<-lm(y~x,data=data3)
summary(lm3)
##
## Call:
## lm(formula = y ~ x, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.159 -0.615 -0.230 0.154 3.241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.002 1.124 2.67 0.0256 *
## x 0.500 0.118 4.24 0.0022 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.2 on 9 degrees of freedom
## Multiple R-squared: 0.666, Adjusted R-squared: 0.629
## F-statistic: 18 on 1 and 9 DF, p-value: 0.00218
par(mfrow=c(2,2))
plot(lm3)
lm4<-lm(y~x,data=data4)
summary(lm4)
##
## Call:
## lm(formula = y ~ x, data = data4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.751 -0.831 0.000 0.809 1.839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.002 1.124 2.67 0.0256 *
## x 0.500 0.118 4.24 0.0022 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.2 on 9 degrees of freedom
## Multiple R-squared: 0.667, Adjusted R-squared: 0.63
## F-statistic: 18 on 1 and 9 DF, p-value: 0.00216
par(mfrow=c(2,2))
plot(lm4)
## Warning: not plotting observations with leverage one:
## 8
## Warning: not plotting observations with leverage one:
## 8
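The warning about observation 8 in data4 comes from its leverage. In simple regression the leverage is \(h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}\). Since \(x = 19\) is the only value that differs from 8, that point alone determines the slope and gets \(h_8 = 1/11 + 100/110 = 1\). A check with `hatvalues()`:

```r
x4 <- c(8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8)
y4 <- c(6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89)
fit4 <- lm(y4 ~ x4)

h <- hatvalues(fit4)
round(h, 3)  # observation 8 has leverage 1; the other ten have 0.1

# Closed-form leverage for simple linear regression
h_manual <- 1 / length(x4) + (x4 - mean(x4))^2 / sum((x4 - mean(x4))^2)
```

Because \(h_8 = 1\), the fitted line passes exactly through observation 8 and its residual is 0, which is why `plot(lm4)` cannot draw it on the leverage panel.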
\(\hat{y}_1 = 3 + 0.5x\)
\(\hat{y}_2 = 3 + 0.5x\)
\(\hat{y}_3 = 3 + 0.5x\)
\(\hat{y}_4 = 3 + 0.5x\)
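All four models give essentially the same fitted line. A sketch that fits them in one pass from the built-in `anscombe` data frame and compares their coefficients and a prediction at \(x = 10\), where the line gives \(3 + 0.5 \times 10 = 8\):

```r
fits <- lapply(1:4, function(i) {
  d <- data.frame(x = anscombe[[paste0("x", i)]], y = anscombe[[paste0("y", i)]])
  lm(y ~ x, data = d)
})
coefs <- t(sapply(fits, coef))  # columns: (Intercept), x
round(coefs, 2)                 # every row is about 3.00 and 0.50

# Prediction at x = 10 from each fitted line, all close to 8
preds <- sapply(fits, predict, newdata = data.frame(x = 10))
```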
par(mfrow=c(2,2))
plot(data1$y ~ data1$x )
abline(lm1)
plot(data2$y ~ data2$x)
abline(lm2)
plot(data3$y ~ data3$x)
abline(lm3)
plot(data4$y ~ data4$x)
abline(lm4)
round(summary(lm1)$r.squared,2)
## [1] 0.67
round(summary(lm2)$r.squared,2)
## [1] 0.67
round(summary(lm3)$r.squared,2)
## [1] 0.67
round(summary(lm4)$r.squared,2)
## [1] 0.67
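The identical \(R^2\) values follow from the identical correlations: for a simple linear regression with an intercept, \(R^2 = r^2\), and \(0.82^2 \approx 0.67\). A quick confirmation with the built-in `anscombe` data frame:

```r
# For simple linear regression, R-squared equals the squared Pearson correlation
r2 <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(r_squared = summary(lm(y ~ x))$r.squared, cor_squared = cor(x, y)^2)
})
round(r2, 2)  # 0.67 in both rows for every data set
```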
par(mfrow=c(2,2))
plot(data1, main = "Data1")
hist(lm1$residuals)
qqnorm(lm1$residuals)
qqline(lm1$residuals)
par(mfrow=c(2,2))
plot(data2, main = "Data2")
hist(lm2$residuals)
qqnorm(lm2$residuals)
qqline(lm2$residuals)
par(mfrow=c(2,2))
plot(data3, main = "Data3")
hist(lm3$residuals)
qqnorm(lm3$residuals)
qqline(lm3$residuals)
par(mfrow=c(2,2))
plot(data4, main = "Data4")
hist(lm4$residuals)
qqnorm(lm4$residuals)
qqline(lm4$residuals)
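As a numeric complement to the Q-Q plots, the residuals can be passed to a Shapiro-Wilk normality test. With only 11 points per data set the test has little power, so treat it as a rough supplement to the plots rather than a replacement; this sketch again uses the built-in `anscombe` data frame.

```r
# Shapiro-Wilk test of the residuals from each of the four fits
pvals <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  shapiro.test(residuals(lm(y ~ x)))$p.value
})
names(pvals) <- paste0("data", 1:4)
round(pvals, 3)  # small values suggest non-normal residuals
```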