library(mosaic)
An educator conducted an experiment to test whether new directed reading activities in the classroom will help elementary school pupils improve some aspects of their reading ability.
She arranged for a third grade class of 21 students to follow these activities for an 8-week period. A control classroom of 23 third graders followed the same curriculum without the activities. At the end of the 8 weeks, all students took a Degree of Reading Power (DRP) test, which measures the aspects of reading ability that the treatment is designed to improve.
reading <- read.table("http://asta.math.aau.dk/dan/static/datasets?file=reading.dat", header=TRUE)
head(reading)
## Treatment Response
## 1 Treated 24
## 2 Treated 43
## 3 Treated 58
## 4 Treated 71
## 5 Treated 43
## 6 Treated 49
Compare DRP for Treated (directed reading activities) and Control visually.
boxplot(Response ~ Treatment, data = reading)
Use favstats to make a numerical summary of the measurements for Treated and Control.
favstats(Response ~ Treatment, data = reading)
## Treatment min Q1 median Q3 max mean sd n missing
## 1 Control 10 30.5 42 53.5 85 41.52174 17.14873 23 0
## 2 Treated 24 44.0 53 58.0 71 51.47619 11.00736 21 0
The point estimate of the mean for treated students is 51.47619, which is the average of the sample.
The point estimate of the standard deviation is 11.00736; it describes how much the data vary around the mean. To compute it, we subtract the mean from each value and square the result, sum the squared deviations and divide by n - 1 (here 20), and then take the square root.
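As a quick check of this recipe, the sketch below computes the standard deviation by hand for the six Treated responses shown in the head(reading) output and compares it with R's built-in sd(). The vector x and the variable names are only for illustration, and the six values are not the full Treated group, so the result will differ from 11.00736. Note that sd() divides by n - 1 rather than n.

```r
# First six Treated responses, copied from the head(reading) output
x <- c(24, 43, 58, 71, 43, 49)
n <- length(x)

# Step 1: deviations from the sample mean
deviations <- x - mean(x)

# Step 2: sum of squared deviations divided by n - 1 (sample variance)
variance_manual <- sum(deviations^2) / (n - 1)

# Step 3: square root gives the sample standard deviation
s_manual <- sqrt(variance_manual)

# Built-in function for comparison
s_builtin <- sd(x)

c(manual = s_manual, builtin = s_builtin)
```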
To find the 95% confidence interval for the mean of the treated group, we first need the t-score.
t_score <- qdist("t", 1-0.025, df=21-1, plot=FALSE)
t_score
## [1] 2.085963
Now that we have the t-score, we can find the confidence interval using the following formula: mean plus/minus t-score times standard deviation divided by the square root of the sample size.
51.47619 + c(-1, 1) * t_score * 11.00736 / sqrt(21)
We get (46.47, 56.49).
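The same interval can be reproduced from the summary statistics alone. The sketch below uses base R's qt() instead of mosaic's qdist(), which should give the same quantile; the variable names xbar, s, n, and ci are our own.

```r
# Summary statistics for the Treated group, taken from the favstats output
xbar <- 51.47619   # sample mean
s    <- 11.00736   # sample standard deviation
n    <- 21         # sample size

# Critical value of the t distribution with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# Margin of error and the resulting 95% confidence interval
margin <- t_crit * s / sqrt(n)
ci <- c(lower = xbar - margin, upper = xbar + margin)
round(ci, 2)
```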
Use t.test to compare the mean DRP of the two groups.
t.test(Response ~ Treatment, data = reading, conf.level = 0.95)
##
## Welch Two Sample t-test
##
## data: Response by Treatment
## t = -2.3109, df = 37.855, p-value = 0.02638
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.67588 -1.23302
## sample estimates:
## mean in group Control mean in group Treated
## 41.52174 51.47619
Go through the details of the output from t.test. Your analysis must include an account of
The null hypothesis states that the true difference in means between the two groups is zero, while the alternative hypothesis, which is two-sided in this case, states that the true difference in means is not zero.
The t-score is -2.3109. This means that the observed difference is a little more than two standard errors away from the value assumed under the null hypothesis. We use a t-score because the test statistic follows a t distribution rather than a normal one. The observed difference is 41.52 - 51.48 = -9.95, so the standard error is about 9.95/2.31 = 4.31.
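To see where the t-score and the degrees of freedom come from, the sketch below recomputes them from the favstats summary using the usual Welch formulas (unequal variances, Satterthwaite approximation for the degrees of freedom). The variable names are our own, and small rounding differences from the t.test output are expected because the inputs are rounded.

```r
# Summary statistics from the favstats output
mean_control <- 41.52174; sd_control <- 17.14873; n_control <- 23
mean_treated <- 51.47619; sd_treated <- 11.00736; n_treated <- 21

# Estimated variance of each sample mean
v_control <- sd_control^2 / n_control
v_treated <- sd_treated^2 / n_treated

# Standard error of the difference (Control minus Treated)
se <- sqrt(v_control + v_treated)

# Welch t statistic
t_stat <- (mean_control - mean_treated) / se

# Welch-Satterthwaite degrees of freedom
df_welch <- (v_control + v_treated)^2 /
  (v_control^2 / (n_control - 1) + v_treated^2 / (n_treated - 1))

c(se = se, t = t_stat, df = df_welch)
```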
The p-value measures how compatible the data are with the null hypothesis: a high p-value means the data are likely if the null hypothesis is true, and a low p-value means the data are unlikely if the null hypothesis is true. Since our p-value of 0.02638 is below 0.05, the sample provides enough evidence to reject the null hypothesis at the 5% significance level.
Our 95% confidence interval for the difference in means (Control minus Treated) is (-18.67588, -1.23302). The interval does not contain zero, which agrees with rejecting the null hypothesis.
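The p-value and the confidence interval can also be recovered by hand from the numbers printed by t.test. This is only a sketch: the standard error is backed out from the rounded t-score, so the results agree with the output up to rounding, and the variable names are our own.

```r
# Values copied from the t.test output
t_stat <- -2.3109
df     <- 37.855
diff   <- 41.52174 - 51.47619   # Control minus Treated

# Two-sided p-value: probability in both tails beyond |t|
p_value <- 2 * pt(-abs(t_stat), df = df)

# Standard error implied by the t-score, then the 95% confidence interval
se <- diff / t_stat
ci <- diff + c(-1, 1) * qt(0.975, df = df) * se

p_value
ci
```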
In this part there is no dataset to load into R and analyze. You should just use R as a calculator when you apply the relevant formulas (which are towards the end of the lecture notes for module 1).
To estimate the proportion of Danish companies with fewer than 10 employees, determine the necessary sample size for the estimate to be accurate to within 0.06 with probability 0.90. Based on results from a previous study in 2013, we expect the proportion to be about 0.70.
We use 0.05 in qnorm because the confidence level should be 0.90, which leaves 0.05 in each tail. We find the z-value:
M = 0.06 # margin of error
z = -qnorm(0.05) # z-value
z
## [1] 1.644854
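p = 0.70 # expected proportion from the 2013 study
n = p*(1-p)*(z/M)^2
n
## [1] 157.8234
This tells us that we need a sample of 158 companies for the estimate to be accurate to within 0.06 with a probability of 0.90. Without the prior estimate, the conservative choice p = 0.5 gives n = (z/(2*M))^2 = 187.885, that is, 188 companies.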
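The calculation can be wrapped in a small helper so that different expected proportions are easy to compare. The function name sample_size_prop and its arguments are hypothetical, chosen for this sketch; it applies the formula n = p(1 - p)(z/M)^2 and rounds up to a whole number of companies.

```r
# Hypothetical helper: sample size needed to estimate a proportion
#   p          expected proportion
#   M          margin of error
#   conf_level confidence level, e.g. 0.90
sample_size_prop <- function(p, M, conf_level) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # two-sided critical value
  ceiling(p * (1 - p) * (z / M)^2)       # round up to a whole unit
}

# Using the expected proportion of 0.70 from the 2013 study
n_prior <- sample_size_prop(p = 0.70, M = 0.06, conf_level = 0.90)

# Conservative worst case, p = 0.50, which maximizes p * (1 - p)
n_conservative <- sample_size_prop(p = 0.50, M = 0.06, conf_level = 0.90)

c(prior = n_prior, conservative = n_conservative)
```

Using p = 0.50 never gives a smaller sample than any other choice of p, so it is the safe option when no prior estimate is available.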