Ans: b) daysDrive
daysDrive is the only variable that is both quantitative and discrete. car and color are categorical (not quantitative), and while gasMonth is quantitative, it is continuous.
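As a quick check of the variable types, a hypothetical data frame with these four columns could be inspected with str(); the values below are invented purely for illustration.
# Hypothetical example; the values are made up for illustration only
cars <- data.frame(
  car       = c("Civic", "Corolla", "F-150"),  # categorical
  color     = c("red", "blue", "white"),       # categorical
  daysDrive = c(5L, 7L, 3L),                   # quantitative, discrete (count of days)
  gasMonth  = c(42.5, 61.2, 88.9)              # quantitative, continuous
)
str(cars)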
Ans: a) mean = 3.3, median = 3.5
gpa <- c((1.9*3.3),(2.1*3.3),(2.5*6.6),(2.7*6.6),(2.9*19.8),(3.1*6.6),(3.4*18.48),(3.5*18.48),(3.7*27.72),(3.0*26.4))
sum(gpa)/132
## [1] 3.293
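In equation form, the calculation above multiplies each GPA value by its weight, sums the products, and divides by 132:
\[\bar{x} = \frac{\sum_i \text{GPA}_i \times w_i}{132} \approx 3.29\]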
Ans: d) Both a) and c)
Using random selection for the trial and examining how the treatment affects one group relative to the other will both help determine whether the treatment causes improvement in Ebola patients.
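As a minimal sketch of the random-assignment idea (the patient IDs and group sizes here are assumptions made up for the example):
# Minimal sketch of random assignment; patient IDs and group sizes are invented
set.seed(1)
patients  <- paste0("patient_", 1:20)      # hypothetical patient IDs
treatment <- sample(patients, size = 10)   # randomly assign half to treatment
control   <- setdiff(patients, treatment)  # the remaining patients form the control group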
Ans: a) There’s a difference between average eye color and average hair color
Having a large chi-square statistic means that we will reject the null hypothesis that there is no difference in the averages.
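For illustration, a chi-square test on a hair color by eye color table can be run in R; this uses R's built-in HairEyeColor dataset, not the data from the question.
# Illustration only: R's built-in HairEyeColor data, not the data from the question
tab <- margin.table(HairEyeColor, margin = c(1, 2))  # collapse over Sex: Hair x Eye counts
chisq.test(tab)  # a large X-squared statistic gives a small p-value, so we reject the null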
Ans: b) 17.8 and 69.0
\[IQR = 49.8 - 37 = 12.8\]
\[\text{lower limit} = 37 - (1.5 \times 12.8) = 17.8\]
\[\text{upper limit} = 49.8 + (1.5 \times 12.8) = 69\]
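The same limits can be verified in R, using the Q1 and Q3 values given in the problem:
# Q1 and Q3 come from the five-number summary given in the problem
q1  <- 37
q3  <- 49.8
iqr <- q3 - q1    # 12.8
q1 - 1.5 * iqr    # lower limit: 17.8
q3 + 1.5 * iqr    # upper limit: 69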
Ans: d) The median and IQR are resistant to outliers, whereas the mean and SD are not.
Distribution A is unimodal and skewed to the right. It has a mean around 5 and a small spread.
Distribution B is unimodal with no skew. Its spread is wide and the sample size is 30.
The means of the two distributions are similar because distribution B is a sample from A. The standard deviations differ because distribution B has a wider spread and a smaller sample size than A.
The Central Limit Theorem
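A small simulation illustrates the Central Limit Theorem; the right-skewed population used here (an exponential distribution with mean 5) is an assumption chosen only for the example.
# CLT illustration with an assumed right-skewed population (exponential, mean = 5)
set.seed(42)
pop_mean <- 5
sample_means <- replicate(1000, mean(rexp(30, rate = 1 / pop_mean)))  # 1000 samples of size 30
hist(sample_means, main = "Sampling distribution of the mean (n = 30)")
abline(v = pop_mean, lwd = 2)  # the sample means center on the population mean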
options(digits=2)
data1 <- data.frame(x=c(10,8,13,9,11,14,6,4,12,7,5),
y=c(8.04,6.95,7.58,8.81,8.33,9.96,7.24,4.26,10.84,4.82,5.68))
data2 <- data.frame(x=c(10,8,13,9,11,14,6,4,12,7,5),
y=c(9.14,8.14,8.74,8.77,9.26,8.1,6.13,3.1,9.13,7.26,4.74))
data3 <- data.frame(x=c(10,8,13,9,11,14,6,4,12,7,5),
y=c(7.46,6.77,12.74,7.11,7.81,8.84,6.08,5.39,8.15,6.42,5.73))
data4 <- data.frame(x=c(8,8,8,8,8,8,8,19,8,8,8),
y=c(6.58,5.76,7.71,8.84,8.47,7.04,5.25,12.5,5.56,7.91,6.89))
mean1 <- data.frame(c(meanx= mean(data1$x), meany=mean(data1$y)))
mean1
## c.meanx...mean.data1.x...meany...mean.data1.y..
## meanx 9.0
## meany 7.5
mean2 <- data.frame(c(meanx= mean(data2$x), meany=mean(data2$y)))
mean2
## c.meanx...mean.data2.x...meany...mean.data2.y..
## meanx 9.0
## meany 7.5
mean3 <- data.frame(c(meanx= mean(data3$x), meany=mean(data3$y)))
mean3
## c.meanx...mean.data3.x...meany...mean.data3.y..
## meanx 9.0
## meany 7.5
mean4 <- data.frame(c(meanx= mean(data4$x), meany=mean(data4$y)))
mean4
## c.meanx...mean.data4.x...meany...mean.data4.y..
## meanx 9.0
## meany 7.5
median1 <- data.frame(c(medianx= median(data1$x), mediany=median(data1$y)))
median1
## c.medianx...median.data1.x...mediany...median.data1.y..
## medianx 9.0
## mediany 7.6
median2 <- data.frame(c(medianx= median(data2$x), mediany=median(data2$y)))
median2
## c.medianx...median.data2.x...mediany...median.data2.y..
## medianx 9.0
## mediany 8.1
median3 <- data.frame(c(medianx= median(data3$x), mediany=median(data3$y)))
median3
## c.medianx...median.data3.x...mediany...median.data3.y..
## medianx 9.0
## mediany 7.1
median4 <- data.frame(c(medianx= median(data4$x), mediany=median(data4$y)))
median4
## c.medianx...median.data4.x...mediany...median.data4.y..
## medianx 8
## mediany 7
sd1 <- data.frame(c(sdx= sd(data1$x), sdy=sd(data1$y)))
sd1
## c.sdx...sd.data1.x...sdy...sd.data1.y..
## sdx 3.3
## sdy 2.0
sd2 <- data.frame(c(sdx= sd(data2$x), sdy=sd(data2$y)))
sd2
## c.sdx...sd.data2.x...sdy...sd.data2.y..
## sdx 3.3
## sdy 2.0
sd3 <- data.frame(c(sdx= sd(data3$x), sdy=sd(data3$y)))
sd3
## c.sdx...sd.data3.x...sdy...sd.data3.y..
## sdx 3.3
## sdy 2.0
sd4 <- data.frame(c(sdx= sd(data4$x), sdy=sd(data4$y)))
sd4
## c.sdx...sd.data4.x...sdy...sd.data4.y..
## sdx 3.3
## sdy 2.0
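Equivalently, the same means, medians, and standard deviations can be computed for all four datasets in one pass; this is just a more compact version of the calculations above.
# Compact alternative: the same summaries for all four datasets at once
datasets <- list(data1 = data1, data2 = data2, data3 = data3, data4 = data4)
sapply(datasets, function(d) c(meanx = mean(d$x), meany = mean(d$y),
                               medx = median(d$x), medy = median(d$y),
                               sdx = sd(d$x), sdy = sd(d$y)))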
cor(data1)
## x y
## x 1.00 0.82
## y 0.82 1.00
cor(data2)
## x y
## x 1.00 0.82
## y 0.82 1.00
cor(data3)
## x y
## x 1.00 0.82
## y 0.82 1.00
cor(data4)
## x y
## x 1.00 0.82
## y 0.82 1.00
eq1<- lm(data1$y ~ data1$x)
eq1
##
## Call:
## lm(formula = data1$y ~ data1$x)
##
## Coefficients:
## (Intercept) data1$x
## 3.0 0.5
eq2<- lm(data2$y ~ data2$x)
eq2
##
## Call:
## lm(formula = data2$y ~ data2$x)
##
## Coefficients:
## (Intercept) data2$x
## 3.0 0.5
eq3<- lm(data3$y ~ data3$x)
eq3
##
## Call:
## lm(formula = data3$y ~ data3$x)
##
## Coefficients:
## (Intercept) data3$x
## 3.0 0.5
eq4<- lm(data4$y ~ data4$x)
eq4
##
## Call:
## lm(formula = data4$y ~ data4$x)
##
## Coefficients:
## (Intercept) data4$x
## 3.0 0.5
\[\text{Equation: } y = 0.5x + 3\]
summary(eq1)$r.squared
## [1] 0.67
summary(eq2)$r.squared
## [1] 0.67
summary(eq3)$r.squared
## [1] 0.67
summary(eq4)$r.squared
## [1] 0.67
\[R^2 = 0.67\]
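For reference, R-squared is the proportion of the variation in y explained by the regression line:
\[R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}\]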
#Data 1
par(mfrow=c(2,2))
plot(data1)
plot(eq1$residuals)
hist(eq1$residuals)
qqnorm(eq1$residuals)
qqline(eq1$residuals)
Although the data plot looks linear, Data 1 does not have residuals that follow a normal distribution.
#Data 2
par(mfrow=c(2,2))
plot(data2)
plot(eq2$residuals)
hist(eq2$residuals)
qqnorm(eq2$residuals)
qqline(eq2$residuals)
Data 2's plot does not show linearity, and its residuals do not follow a normal distribution.
#Data 3
par(mfrow=c(2,2))
plot(data3)
plot(eq3$residuals)
hist(eq3$residuals)
qqnorm(eq3$residuals)
qqline(eq3$residuals)
There is an outlier in Data 3, but the plot appears linear and the residuals look approximately normal.
#Data 4
par(mfrow=c(2,2))
plot(data4)
plot(eq4$residuals)
hist(eq4$residuals)
qqnorm(eq4$residuals)
qqline(eq4$residuals)
Data 4's plot shows no linearity, and its residuals do not follow a normal distribution.
Visualizations support the statements we make when analyzing data. They help reveal trends and insights that cannot be seen by looking at the numbers alone. For example, the outlier in Data 3 is hard to spot in the raw numbers, but it stands out clearly in a plot. Visualizations make problems like this easy to see.
plot(data3)
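To compare all four datasets at a glance, the scatterplots and their nearly identical fitted lines can be drawn in a 2-by-2 grid using the objects defined above:
# Plot all four datasets with their fitted regression lines for comparison
par(mfrow = c(2, 2))
plot(data1, main = "Data 1"); abline(eq1)
plot(data2, main = "Data 2"); abline(eq2)
plot(data3, main = "Data 3"); abline(eq3)
plot(data4, main = "Data 4"); abline(eq4)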