Exam 2

Part I: Directed reading activities

library(mosaic)
## Warning: package 'mosaic' was built under R version 3.2.5
## Warning: package 'dplyr' was built under R version 3.2.5
## Warning: package 'mosaicData' was built under R version 3.2.5

An educator conducted an experiment to test whether new directed reading activities in the classroom will help elementary school pupils improve some aspects of their reading ability.

She arranged for a third grade class of 21 students to follow these activities for an 8-week period. A control classroom of 23 third graders followed the same curriculum without the activities. At the end of the 8 weeks, all students took a Degree of Reading Power (DRP) test, which measures the aspects of reading ability that the treatment is designed to improve.

reading <- read.table("http://asta.math.aau.dk/dan/static/datasets?file=reading.dat", header=TRUE)
head(reading)
##   Treatment Response
## 1   Treated       24
## 2   Treated       43
## 3   Treated       58
## 4   Treated       71
## 5   Treated       43
## 6   Treated       49
  1. Use a boxplot to visually compare the DRP measurements for Treated (directed reading activities) and Control.
boxplot(Response ~ Treatment,  data = reading)

  1. Use favstats to make a numerical summary of the measurements for Treated and Control.
favstats(Response ~ Treatment, data = reading) 
##   Treatment min   Q1 median   Q3 max     mean       sd  n missing
## 1   Control  10 30.5     42 53.5  85 41.52174 17.14873 23       0
## 2   Treated  24 44.0     53 58.0  71 51.47619 11.00736 21       0
    1. Write down a point estimate of the mean of DRP for students following the new directed reading activities and explain how this is calculated.

The point estimate of the mean for treated students is 51.47619. It is the sample average: the sum of the 21 DRP scores in the Treated group divided by 21.

    1. Write down a point estimate of the standard deviation of DRP for this group and explain how this is calculated.

The point estimate of the standard deviation is 11.00736, and it describes how much the data vary around the mean. We subtract the mean from each value and square the result, sum the squared deviations, divide the sum by n - 1 = 20 (not by n), and take the square root.
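The steps above can be sketched in R. This is only an illustration: it uses the six Treated scores printed by head(reading), not the full sample of 21, so the result differs from 11.00736. The variable names are chosen for this example.

```r
# First six Treated scores, as printed by head(reading); NOT the full sample
x <- c(24, 43, 58, 71, 43, 49)
n <- length(x)

x_bar <- sum(x) / n              # sample mean (point estimate of the mean)
ss <- sum((x - x_bar)^2)         # sum of squared deviations from the mean
s <- sqrt(ss / (n - 1))          # divide by n - 1, then take the square root

s
all.equal(s, sd(x))              # compare with R's built-in sd()
```

The comparison with sd() is there to confirm that the hand calculation divides by n - 1, which is what R does.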

    1. Write down a 95% confidence interval for the mean of DRP for this group and explain how this is calculated.

To find the confidence interval, we first need the t-score.

t_score <- qdist("t", 1-0.025, df=21-1, plot=FALSE)
t_score
## [1] 2.085963

With the t-score in hand, we can compute the confidence interval using the formula: mean plus/minus t-score times standard deviation divided by the square root of the sample size.

51.476 +- 2.086 * 11.007 / sqrt(21)

We get (46.47, 56.49).
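As a check, the same interval can be computed in R from the summary statistics in the favstats output. This is a minimal sketch; it uses base R's qt in place of qdist, and the variable names are chosen for this example.

```r
# Summary statistics for the Treated group, copied from the favstats output
x_bar <- 51.47619
s <- 11.00736
n <- 21

t_crit <- qt(0.975, df = n - 1)        # t-score leaving 2.5% in the upper tail
se <- s / sqrt(n)                      # standard error of the mean
ci <- x_bar + c(-1, 1) * t_crit * se   # lower and upper limits

round(ci, 2)
```

Writing the limits as x_bar + c(-1, 1) * margin keeps the lower limit first, which is the usual way to report an interval.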

  1. Use the command t.test to compare the mean DRP of the two groups.
t.test(Response ~ Treatment, data = reading, conf.level = 0.95)
## 
##  Welch Two Sample t-test
## 
## data:  Response by Treatment
## t = -2.3109, df = 37.855, p-value = 0.02638
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.67588  -1.23302
## sample estimates:
## mean in group Control mean in group Treated 
##              41.52174              51.47619

Go through the details of the output from t.test. Your analysis must include an account of

    1. What the relevant null hypothesis and the corresponding alternative hypothesis are.

The null hypothesis is that the true difference in mean DRP between the Control and Treated populations is equal to zero. The alternative hypothesis, which is two-sided in this case, is that the true difference in means is not equal to zero.

    1. Choice and calculation of test statistic.

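The test statistic is the t-score, t = -2.3109. It is the observed difference in sample means (Control minus Treated), 41.52 - 51.48 = -9.95, divided by its standard error. The observed difference is therefore a little more than two standard errors away from the value 0 stated by the null hypothesis. We use a t-score because the population standard deviations are unknown and estimated from the samples, so the statistic follows a t distribution rather than a normal one. The standard error can be recovered as 9.95/2.3109 = 4.31.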
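The test statistic and the degrees of freedom reported by t.test can be reproduced from the favstats output. The sketch below assumes the standard Welch formulas (unequal variances, Welch-Satterthwaite degrees of freedom), which is what the "Welch Two Sample t-test" header indicates; the variable names are chosen for this example.

```r
# Summary statistics copied from the favstats output
m_c <- 41.52174; s_c <- 17.14873; n_c <- 23   # Control
m_t <- 51.47619; s_t <- 11.00736; n_t <- 21   # Treated

v_c <- s_c^2 / n_c                 # squared standard error, Control mean
v_t <- s_t^2 / n_t                 # squared standard error, Treated mean

se <- sqrt(v_c + v_t)              # standard error of the difference in means
t_obs <- (m_c - m_t) / se          # observed t-score (Control minus Treated)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_c + v_t)^2 / (v_c^2 / (n_c - 1) + v_t^2 / (n_t - 1))

c(se = se, t = t_obs, df = df)
```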

    1. Calculation of p-value and its interpretation in connection to a conclusion of the analysis.

The p-value measures how compatible the data are with the null hypothesis: it is the probability, if the null hypothesis is true, of getting a t-score at least as far from 0 as the observed -2.3109. A high p-value means the data are plausible under the null hypothesis; a low p-value means they are unlikely under it. Our p-value of 0.02638 is below the usual 5% significance level, so we reject the null hypothesis and conclude that the mean DRP differs between the two groups.
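The two-sided p-value can be recomputed from the t-score and degrees of freedom in the t.test output. This is a small sketch using base R's pt; the variable names are chosen for this example.

```r
# Values copied from the t.test output
t_obs <- -2.3109
df <- 37.855

# Two-sided p-value: probability in both tails beyond |t_obs|
p_value <- 2 * pt(-abs(t_obs), df = df)
p_value
```

Doubling the lower-tail probability works because the t distribution is symmetric around 0.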

    1. Calculation and interpretation of a relevant confidence interval.

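The 95% confidence interval for the difference in mean DRP (Control minus Treated) is (-18.67588, -1.23302). It is calculated as the observed difference in means plus/minus the t-score for df = 37.855 times the standard error of the difference. Since the interval does not contain 0, it agrees with the test: with 95% confidence, the mean DRP of the treated group is between about 1.2 and 18.7 points higher than that of the control group.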
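The interval reported by t.test can be reproduced from the favstats summary and the degrees of freedom in the t.test output. This is a sketch under the assumption that the Welch standard error is used; the variable names are chosen for this example.

```r
# Summary statistics copied from the favstats output
m_c <- 41.52174; s_c <- 17.14873; n_c <- 23   # Control
m_t <- 51.47619; s_t <- 11.00736; n_t <- 21   # Treated
df <- 37.855                                   # from the t.test output

diff_means <- m_c - m_t                        # Control minus Treated
se <- sqrt(s_c^2 / n_c + s_t^2 / n_t)          # Welch standard error
t_crit <- qt(0.975, df = df)                   # t-score for 95% confidence

ci_diff <- diff_means + c(-1, 1) * t_crit * se
round(ci_diff, 3)
```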

Part II: Determining sample size

In this part there is no dataset to load into R and analyze. You should just use R as a calculator when you apply the relevant formulas (which are towards the end of the lecture notes for module 1).

To estimate the proportion of Danish companies with fewer than 10 employees, determine the necessary sample size for the estimate to be accurate to within 0.06 with probability 0.90. Based on results from a previous study in 2013, we expect the proportion to be about 0.70.

We use 0.05 because a confidence level of 0.90 leaves 0.10 outside the interval, split as 0.05 in each tail. From this we find the z value.

M = 0.06 # margin of error
z = -qnorm(0.05) # z value for 90% confidence (5% in each tail)
z
## [1] 1.644854
n = (z/(2*M))^2 
n
## [1] 187.885

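This tells us that we need a sample of 188 companies for the estimate to be accurate to within 0.06 with probability 0.90. Note that this formula uses the conservative worst-case proportion 0.5 rather than the expected proportion of 0.70.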
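For comparison, the sample size can also be computed with the expected proportion of 0.70 from the 2013 study. The sketch below assumes the usual formula n = p(1 - p)(z/M)^2; the names p, n_expected and n_conservative are chosen for this example.

```r
M <- 0.06                 # margin of error
z <- qnorm(0.95)          # z value for 90% confidence, same as -qnorm(0.05)
p <- 0.70                 # expected proportion from the previous study

n_expected <- p * (1 - p) * (z / M)^2      # uses the expected proportion
n_conservative <- 0.25 * (z / M)^2         # worst case p = 0.5, equals (z/(2*M))^2

ceiling(c(expected = n_expected, conservative = n_conservative))
```

Rounding up gives 158 companies with the expected proportion and 188 with the worst-case proportion. The conservative figure is the safer choice if the 2013 estimate is not trusted.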