Assignment #5: Inference for Numerical Data

Import Libraries

# Good practice: basic housekeeping. Clean up the environment before starting new work
rm(list=ls())

# Libraries 
library(DATA606)
## 
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics 
## This package is designed to support this course. The text book used 
## is OpenIntro Statistics, 3rd Edition. You can read this by typing 
## vignette('os3') or visit www.OpenIntro.org. 
##  
## The getLabs() function will return a list of the labs available. 
##  
## The demo(package='DATA606') will list the demos that are available.
## 
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
## 
##     demo
library(StMoSim)
## Loading required package: RcppParallel
## Loading required package: Rcpp
## 
## Attaching package: 'Rcpp'
## The following object is masked from 'package:RcppParallel':
## 
##     LdFlags
library(tidyverse)
## -- Attaching packages --------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.1     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.2.0
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
  1. Exercise: 5.6 Working Backwards, Part II.

Exercise: 5.6

The Margin of error

\[M = 6 \]

The Mean

\[\mu = 65 + 6 \] \[\mu = 71 \]

The Standard Deviation

The standard deviation can be calculated from the margin of error, using a critical value of 1.64 because this is a 90% confidence interval.

\[\sigma = 18.2927 \]
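
The arithmetic above can be checked with a short R sketch. The upper bound of 77 follows from 65 + 2(6). The sample size n = 25 is an assumption: it is not stated above, but it is the value that reproduces 18.2927 with a critical value of 1.64. If the population standard deviation is unknown, a t critical value with n - 1 degrees of freedom may be the more appropriate choice, so both versions are shown for comparison.

```r
lower <- 65
upper <- 77            # lower bound + 2 * margin of error
n     <- 25            # assumed sample size (not stated above)

me    <- (upper - lower) / 2    # margin of error
x_bar <- (upper + lower) / 2    # midpoint of the interval

# Standard deviation implied by the z critical value used above
sd_z <- me * sqrt(n) / 1.64

# Same calculation with the t critical value for a 90% interval, df = n - 1
sd_t <- me * sqrt(n) / qt(0.95, df = n - 1)

round(c(me = me, mean = x_bar, sd_z = sd_z, sd_t = sd_t), 4)
```

The t-based value is somewhat smaller than the z-based one because the t critical value is larger for small samples.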

  1. Exercise: 5.14 SAT Scores

Exercise: 5.14

a.

\[n = [1.64 (250/25)]^2\] \[n = 268.96, \text{ rounded up to } 269\]

b.

Luke's sample will need to be larger: the formula shows that the required sample size is proportional to the square of the critical value, which increases with the confidence level.

c.

With a 99% confidence interval:

\[n = [2.576 (250/25)]^2\] \[n = 663.58, \text{ rounded up to } 664\]
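
The two sample-size calculations can be reproduced with a small helper. This is a minimal sketch: `sample_size()` is a hypothetical name introduced here, and the critical values are the rounded ones used above.

```r
# Smallest n such that z_star * sigma / sqrt(n) <= margin of error
sample_size <- function(z_star, sigma, me) {
  ceiling((z_star * sigma / me)^2)
}

sigma <- 250   # standard deviation used above
me    <- 25    # target margin of error used above

n_90 <- sample_size(1.64, sigma, me)    # rounded critical value from part a
n_99 <- sample_size(2.576, sigma, me)   # rounded critical value from part c
c(n_90 = n_90, n_99 = n_99)
```

Rounding up with `ceiling()` keeps the margin of error at or below the target. Using the unrounded `qnorm(0.95)` in place of 1.64 gives a slightly larger sample size for the 90% case.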

  1. Exercise: 5.20 High School and Beyond, Part I

a.

No. The histogram of the differences (reading minus writing score) is centered on zero and roughly symmetric. Although the median values of the two scores differ slightly, their boxplots overlap.

b.

Yes. A sample of 200 is small relative to the population of all high school students, which supports treating the observations as independent.

c.

Null hypothesis: there is no difference between the mean reading and writing scores in the population.

Alternative hypothesis: there is a difference between the mean reading and writing scores in the population.

d.

The data are randomly selected and independent. Furthermore the distribution is nearly normal: unimodal, symmetric, and bell-shaped.

e.

\[SE = \sigma/\sqrt{n} = 0.6284058\] \[t = 0.867274\] This is not convincing evidence of a difference.
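
The test statistic can be reproduced with the sketch below. The standard deviation of the differences (8.887) and the magnitude of the mean difference (0.545) are not stated above; they are assumptions backed out from the reported SE and t with n = 200. The sign of the difference is not recoverable from those values, so only its magnitude is used.

```r
n        <- 200     # number of sampled students
sd_diff  <- 8.887   # assumed: implied by SE = 0.6284058 with n = 200
mean_abs <- 0.545   # assumed: |mean difference| implied by t = 0.867274

se     <- sd_diff / sqrt(n)
t_stat <- mean_abs / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

c(se = se, t = t_stat, p = p_value)
```

A p-value well above 0.05 is consistent with the conclusion that the data do not provide convincing evidence of a difference.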

f.

We could have possibly made a type II error, since we did not reject the null hypothesis.

g.

0 lies less than 1 SE from the observed mean difference, so it would fall within any reasonable confidence interval, for example a 90% interval.
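
As a check on this reasoning, a 90% interval can be built from the values reported in part e. This is a sketch only: the magnitude of the mean difference is recovered as t times SE, and its sign is an assumption (a negative difference would mirror the interval around zero, which does not change whether 0 is inside).

```r
n        <- 200
se       <- 0.6284058     # standard error reported in part e
t_stat   <- 0.867274      # t reported in part e
mean_abs <- t_stat * se   # magnitude of the observed mean difference

t_star <- qt(0.95, df = n - 1)              # critical value for a 90% interval
ci     <- mean_abs + c(-1, 1) * t_star * se

ci
ci[1] < 0 && 0 < ci[2]    # TRUE when 0 lies inside the interval
```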

  1. Exercise: 5.32 Fuel efficiency of manual and automatic cars, Part I.

We use a two-tailed test because the question asks only whether the means differ. With 26 cars in each group, the critical value at the 5% significance level is approximately 2.06. The t score is higher than this, so we reject the null hypothesis that the means are equal.

SE <- sqrt(3.58^2/26 + 4.51^2/26)
t <- (19.85 - 16.12)/SE
t
## [1] 3.30302
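
The comparison against the critical value can also be made explicit. This sketch assumes the conservative convention df = min(n1, n2) - 1; other choices of degrees of freedom shift the critical value slightly but not the conclusion here.

```r
n1 <- 26
n2 <- 26

se     <- sqrt(3.58^2 / n1 + 4.51^2 / n2)
t_stat <- (19.85 - 16.12) / se

df     <- min(n1, n2) - 1                 # assumed convention for degrees of freedom
t_crit <- qt(0.975, df = df)              # two-sided critical value at the 5% level
p_val  <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value

c(t = t_stat, t_crit = t_crit, p = p_val)
t_stat > t_crit   # TRUE means the null hypothesis is rejected at the 5% level
```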
  1. Exercise: 5.48 Work hours and education

a.

Null hypothesis: there is no difference in mean hours worked among the groups

Alternative hypothesis: There is a difference in mean hours worked among the groups

b.

The observations are independent within and between the groups. The groups have approximately the same spread, and the data within each group are approximately normally distributed.

c.

Caption for the picture.

d.

A significance level of 0.05 is typical for F tests, although no level was specified for this test. At the 0.05 level, we cannot reject the null hypothesis.