#install.packages("dplyr")
#install.packages("Rtools")
#install.packages("tidyverse")
#install.packages("infer")
#install.packages("EnvStats")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(infer)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.2 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(infer)
library(EnvStats)
##
## Attaching package: 'EnvStats'
##
## The following objects are masked from 'package:stats':
##
## predict, predict.lm
##
## The following object is masked from 'package:base':
##
## print.default
### Computational Problem 1: Find the 90% CI for the variance of the 16 observations. The
### 90% CI is (16.76032, 57.69739), as shown in the output below. The hypothesized variance
### of 75 does not fall within the interval; therefore, we reject the null hypothesis.
comp_problem_1 <- read.csv("comp_problem_1.csv")
varTest(comp_problem_1$Data, alternative = "two.sided", conf.level = 0.9, sigma.squared = 75)
## $statistic
## Chi-Squared
## 5.585833
##
## $parameters
## df
## 15
##
## $p.value
## [1] 0.0282057
##
## $estimate
## variance
## 27.92917
##
## $null.value
## variance
## 75
##
## $alternative
## [1] "two.sided"
##
## $method
## [1] "Chi-Squared Test on Variance"
##
## $data.name
## [1] "comp_problem_1$Data"
##
## $conf.int
## LCL UCL
## 16.76032 57.69739
## attr(,"conf.level")
## [1] 0.9
##
## attr(,"class")
## [1] "htestEnvStats"
### Problem 3. The steps, using the formula at the bottom of page 344:
### t = (s1^2 - s2^2) / (2 * s1 * s2 * sqrt((1 - r^2) / (n - 2))). The difference of the
### variances is 49 - 25 = 24. The two standard deviations are 7 and 5, so twice their
### product is 70. With r = .6 I put .4 under the radical and divide it by the df,
### 32 - 2 = 30; the denominator is then 70 * sqrt(.4/30) = 8.08, and t = 24 / 8.08 = 2.9692.
### (If the radical should hold 1 - r^2 = .64, as the book's answer to Problem 7 suggests,
### the statistic is about 2.35 instead; see the helper after the arithmetic below.) The
### two-tailed critical value from Table A2 with 30 df is 2.042272. Because the test
### statistic exceeds it, we reject the null hypothesis that the variances are equal.
### Computations below.
49 - 25
## [1] 24
sqrt(49)
## [1] 7
sqrt(25)
## [1] 5
2*7*5
## [1] 70
1-.6
## [1] 0.4
sqrt(.4/30)
## [1] 0.1154701
0.1154701*70
## [1] 8.082907
24/8.082907
## [1] 2.969229
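### A small helper for the dependent-variances t statistic as I read the page 344 formula,
### t = (s1^2 - s2^2) / (2 * s1 * s2 * sqrt((1 - r^2) / (n - 2))). This is a sketch only
### (the function name is mine, not the book's), and it puts 1 - r^2 under the radical, per
### the note above.
dep_var_t <- function(var1, var2, r, n) {
  s1 <- sqrt(var1)
  s2 <- sqrt(var2)
  (var1 - var2) / (2 * s1 * s2 * sqrt((1 - r^2) / (n - 2)))
}
dep_var_t(49, 25, r = 0.6, n = 32)   # about 2.35, still above qt(0.975, 30) = 2.042272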
### Problem 5. This is the single-sample test of a variance. n = 21, so df = 20. The sample
### mean is 50 with a sample variance of 10; the hypothesized variance is 25, tested at the
### .05 significance level. We multiply the degrees of freedom by the sample variance and
### divide by the hypothesized variance: chi^2 = (n - 1) * s^2 / sigma0^2. From Table A3 the
### critical values are 9.59 and 34.169. The chi-square statistic, calculated below, is 8,
### which falls below the lower critical value, so we reject the null hypothesis that the
### variance is equal to 25. A qchisq() check follows the computation.
(20*10)/25
## [1] 8
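### The Table A3 cutoffs can be reproduced with qchisq(); the statistic of 8 sits below the
### lower cutoff, hence the rejection. A quick check:
qchisq(c(0.025, 0.975), df = 20)              # roughly 9.591 and 34.170, matching the table
(21 - 1) * 10 / 25 < qchisq(0.025, df = 20)   # TRUE, so reject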
### Problem 7. This is the dependent-samples problem, the same setup as Problem 3, so I will
### not repeat the steps. The book says t = -2.6178; I get -3.512132. I subtract 64 from 36
### to get -28; the two square roots are 6 and 8, so twice their product is 96; and I put
### sqrt(.2/29) in the denominator (df = 31 - 2 = 29), which gives t = -3.512132, as shown
### below. The gap from the book appears to come from the radical: with 1 - r^2 = 1 - .8^2
### = .36 rather than 1 - r = .2, the denominator is 96 * sqrt(.36/29) and t = -2.6178,
### matching the book (recomputed after the arithmetic below). Either way |t| exceeds the
### critical value, so I reject the null hypothesis.
.2/29
## [1] 0.006896552
sqrt(.006896552)
## [1] 0.08304548
.08304548*96
## [1] 7.972366
-28/7.972366
## [1] -3.512132
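### Recomputation with 1 - r^2 under the radical (r = .8, so 1 - .64 = .36), which appears
### to be what the book's answer uses:
-28 / (96 * sqrt(0.36 / 29))         # about -2.6178, matching the book
### or, using the helper sketched under Problem 3:
dep_var_t(36, 64, r = 0.8, n = 31)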
### Problem 9. This is the same setup as Problems 3 and 7: dependent samples. Note the
### problem does not give the sample size, so the df cannot be calculated; it is necessary
### to use the 60 df given in the answer. Also note the r-squared value in the problem is
### .1, but the answer puts 1 - .64 under the radical, and where the answer's numbers 10
### and 13 came from I do not know. I use r^2 = .1 in the calculations. The numerator is
### .503 - .427 = .076; the denominator is twice the product of the two standard deviations
### times sqrt(.9/60). My statistic, as shown below, is 0.6694838. The two-tailed critical
### value from Table A2 for 60 df at the .05 level is 2.000298. Because my statistic is
### smaller than the critical value, we fail to reject the null hypothesis.
.503 - .427
## [1] 0.076
sqrt(.503)
## [1] 0.7092249
sqrt(.427)
## [1] 0.6534524
(0.7092249*0.6534524)
## [1] 0.4634447
0.4634447*2
## [1] 0.9268894
.9/60
## [1] 0.015
sqrt(.015)
## [1] 0.1224745
0.9268894*0.1224745
## [1] 0.1135203
.076/0.1135203
## [1] 0.6694838
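### The same calculation in one line, using the r^2 = .1 and 60 df noted above, plus the
### two-tailed critical value from qt():
(0.503 - 0.427) / (2 * sqrt(0.503) * sqrt(0.427) * sqrt((1 - 0.1) / 60))   # 0.6694838
qt(0.975, df = 60)                                                         # 2.000298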
### Interpretive Problem. The first issue I had was splitting the gender column into two
### columns so I could run the test as it is set up in the book. That took a bit of work,
### and I could not do it in R, but I did manage to split the income column by gender. I was
### unsure what value to put in sigma.squared; the book used 50, I used 100, and a rerun
### with 10,000 gave the same result. The reason is that sigma.squared belongs to
### EnvStats::varTest() (the one-sample test); stats::var.test() has no such argument (nor
### a paired argument) and silently ignores both, so the two calls below run the identical
### two-sample F test on the ratio of the variances.
msd_labs <- read.csv("msd_labs.csv")
msd_lab_income <- read.csv("msd_lab_income.csv")
var.test(msd_lab_income$male_y, msd_lab_income$female_y,
         paired = FALSE,
         alternative = "two.sided",
         sigma.squared = 100)
##
## F test to compare two variances
##
## data: msd_lab_income$male_y and msd_lab_income$female_y
## F = 0.81901, num df = 708, denom df = 1555, p-value = 0.002178
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7233138 0.9301452
## sample estimates:
## ratio of variances
## 0.8190068
var.test(msd_lab_income$male_y, msd_lab_income$female_y,
         paired = FALSE,
         alternative = "two.sided",
         sigma.squared = 10000)
##
## F test to compare two variances
##
## data: msd_lab_income$male_y and msd_lab_income$female_y
## F = 0.81901, num df = 708, denom df = 1555, p-value = 0.002178
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7233138 0.9301452
## sample estimates:
## ratio of variances
## 0.8190068
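### What var.test() actually tests is the ratio of the two variances, so its F statistic
### and p-value can be reproduced by hand. A minimal sketch, assuming the income columns
### read above and that dropping non-finite values is how the 708 and 1555 df arose:
m <- msd_lab_income$male_y[is.finite(msd_lab_income$male_y)]
f <- msd_lab_income$female_y[is.finite(msd_lab_income$female_y)]
f_stat <- var(m) / var(f)                          # should reproduce F = 0.81901
p_val  <- 2 * min(pf(f_stat, length(m) - 1, length(f) - 1),
                  pf(f_stat, length(m) - 1, length(f) - 1, lower.tail = FALSE))
c(F = f_stat, p.value = p_val)                     # should reproduce p = 0.002178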