Load the necessary libraries:
library(ggplot2)
library('DATA606')
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
A 90% confidence interval for a population mean is (65,77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.
Ans:
Sample Mean:
The confidence interval is symmetric about the sample mean, so the sample mean is the midpoint of the interval: \[\bar{x} = \frac{x_1 + x_2}{2}\] where the confidence interval is (\(x_1\), \(x_2\)).
n <- 25
x1 <- 65
x2 <- 77
SMean <- (x2 + x1) / 2
cat("The sample mean is ",SMean)
## The sample mean is 71
Margin of Error:
The margin of error is half the width of the interval: \[ME = \frac{x_2 - x_1}{2}\] where the confidence interval is (\(x_1\), \(x_2\)).
n <- 25
x1 <- 65
x2 <- 77
ME <- (x2 - x1) / 2
cat("The margin of error is ",ME)
## The margin of error is 6
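Both formulas follow from the symmetric form of the interval, \(\bar{x} \pm ME\). As a short derivation, writing \(x_1\) and \(x_2\) for the lower and upper bounds:

\[ x_1 = \bar{x} - ME, \qquad x_2 = \bar{x} + ME \]

\[ x_1 + x_2 = 2\bar{x} \;\Rightarrow\; \bar{x} = \frac{65 + 77}{2} = 71, \qquad x_2 - x_1 = 2\,ME \;\Rightarrow\; ME = \frac{77 - 65}{2} = 6 \]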
Sample standard deviation:
To calculate the sample standard deviation, we use \(ME = t^{*} \cdot SE\) with \(SE = s/\sqrt{n}\), where \(t^{*}\) comes from the qt() function with \(df = 25 - 1 = 24\).
df <- 25 - 1
p <- 0.9
p_2tails <- p + (1 - p)/2
t_val <- qt(p_2tails, df)
# Since ME = t * SE
SE <- ME / t_val
# Since SE = sd/sqrt(n)
sd <- SE * sqrt(n)
cat("The standard deviation is ",sd)
## The standard deviation is 17.53481
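As a sanity check, the interval can be rebuilt from the three computed quantities. This is only a sketch: the names n_obs, x_bar, s_hat, me, and ci are introduced here for illustration, and the value 17.53481 is copied from the output above.

```r
# Rebuild the 90% interval from the values derived above.
n_obs  <- 25          # sample size given in the problem
x_bar  <- 71          # sample mean (midpoint of the interval)
s_hat  <- 17.53481    # sample standard deviation from the output above

t_star <- qt(0.95, df = n_obs - 1)        # 5% in each tail for a 90% interval
me     <- t_star * s_hat / sqrt(n_obs)    # margin of error = t* times SE
ci     <- c(x_bar - me, x_bar + me)       # lower and upper bounds

print(round(ci, 2))
```

If the pieces are consistent, the printed bounds should match the original interval (65, 77) up to rounding.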
The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.
(a) Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.
Ans: The hypotheses for this ANOVA test follow:
\(H_0\): The mean number of hours worked is the same across all five groups. That is: \(\mu_l = \mu_h = \mu_j = \mu_b = \mu_g\)
\(H_A\): At least one group mean differs from the others.
(b) Check conditions and describe any assumptions you must make to proceed with the test.
Ans:
The observations are independent within and across groups: I will assume independence within and across the groups, based on the nature of the provided data.
The data within each group are nearly normal: The box plots do not fully support this condition. Every group has outliers, although some groups appear to follow a roughly normal distribution.
The variability across the groups is about equal: Judging by the standard deviations alone, the variability appears roughly similar across the groups.
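One rough way to put a number on the equal-variability condition is the ratio of the largest to the smallest group standard deviation. The sketch below uses the five standard deviations from the summary statistics; the names sd_by_group and sd_ratio are introduced here for illustration, and treating a ratio well below 2 as acceptable is a commonly quoted rule of thumb, not a formal test.

```r
# Group standard deviations from the summary statistics,
# in the same order as they are entered in part (c).
sd_by_group <- c(15.81, 14.97, 18.1, 13.62, 15.51)

# Ratio of the largest to the smallest standard deviation.
sd_ratio <- max(sd_by_group) / min(sd_by_group)

cat("Largest-to-smallest SD ratio:", round(sd_ratio, 2), "\n")
```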
(c) Below is part of the output associated with this test. Fill in the empty cells.
Ans:
mu <- c(38.67, 39.6, 41.39, 42.55, 40.85)
sd <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n <- c(121, 546, 97, 253, 155)
data_table <- data.frame (mu, sd, n)
n <- sum(data_table$n)
k <- length(data_table$mu)
# Finding degrees of freedom
df <- k - 1
dfResidual <- n - k
dfTotal <- df + dfResidual
# Using the qf function on the Pr(>F) to get the F-statistic:
Prf <- 0.0682
F_statistic <- qf( 1 - Prf, df , dfResidual)
# F-statistic = MSG/MSE
MSG <- 501.54
MSE <- MSG / F_statistic
# MSG = 1 / df * SSG
SSG <- df * MSG
SSE <- 267382
SST <- SSG + SSE
cat("Degrees of freedom (degree): ", df)
## Degrees of freedom (degree): 4
cat("Degrees of freedom (Residuals): ", dfResidual)
## Degrees of freedom (Residuals): 1167
cat("Degrees of freedom (Total): ", dfTotal)
## Degrees of freedom (Total): 1171
cat("Sum of squares (degree): ", SSG)
## Sum of squares (degree): 2006.16
cat("Sum of squares (Total): ", SST)
## Sum of squares (Total): 269388.2
cat("Mean square (Residuals): ", MSE)
## Mean square (Residuals): 229.1255
cat("F value: ", F_statistic)
## F value: 2.188931
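Because the table gives the residual sum of squares directly, the mean square error can also be computed as SSE divided by the residual degrees of freedom, without backing the F statistic out of the rounded p-value. The sketch below is a cross-check under that assumption; the names df_group, df_resid, and the *_direct variables are introduced here for illustration. Small differences from the values above are expected because Pr(>F) = 0.0682 is rounded.

```r
# Values taken from the partially filled ANOVA table
MSG      <- 501.54    # mean square for the degree (group) row
SSE      <- 267382    # residual sum of squares
df_group <- 4         # k - 1
df_resid <- 1167      # n - k

MSE_direct <- SSE / df_resid      # mean square error
F_direct   <- MSG / MSE_direct    # F = MSG / MSE

# Upper-tail probability of the F distribution, i.e. Pr(>F)
p_direct <- pf(F_direct, df_group, df_resid, lower.tail = FALSE)

cat("MSE:", MSE_direct, " F:", F_direct, " Pr(>F):", p_direct, "\n")
```

Both routes should agree to about three significant figures, which is a useful check that the filled-in cells are internally consistent.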
(d) What is the conclusion of the test?
Ans: Since the p-value (0.0682) is greater than 0.05, we fail to reject the null hypothesis. The data do not provide convincing evidence that the average number of hours worked differs across the five groups.
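The same decision can be framed with a critical value instead of a p-value. This is a sketch, assuming a significance level of 0.05 and using the F value and degrees of freedom from part (c); the names alpha, F_obs, F_crit, and reject_null are introduced here for illustration.

```r
alpha  <- 0.05
F_obs  <- 2.188931                              # F value from part (c)
F_crit <- qf(1 - alpha, df1 = 4, df2 = 1167)    # upper 5% point of F(4, 1167)

# Reject H0 only if the observed F exceeds the critical value.
reject_null <- F_obs > F_crit

cat("Critical F:", F_crit, " Reject H0:", reject_null, "\n")
```

Because the p-value is above 0.05, the observed F falls below the critical value, so reject_null is FALSE.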