#install.packages("dplyr")
#install.packages("Rtools")
#install.packages("tidyverse")
#install.packages("infer")
#install.packages("EnvStats")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(infer)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.2 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(infer)
library(EnvStats)
##
## Attaching package: 'EnvStats'
##
## The following objects are masked from 'package:stats':
##
## predict, predict.lm
##
## The following object is masked from 'package:base':
##
## print.default
### Computational Problem 1: Find the 90% CI for the variance of the 16 observations. The
### 90% CI is (16.76032, 57.69739), as shown in the output below. The hypothesized variance
### of 75 does not fall within the interval; therefore, we reject the null hypothesis.
comp_problem_1 <- read.csv("comp_problem_1.csv")
varTest(comp_problem_1$Data, alternative = "two.sided", conf.level = 0.9, sigma.squared = 75)
## $statistic
## Chi-Squared
## 5.585833
##
## $parameters
## df
## 15
##
## $p.value
## [1] 0.0282057
##
## $estimate
## variance
## 27.92917
##
## $null.value
## variance
## 75
##
## $alternative
## [1] "two.sided"
##
## $method
## [1] "Chi-Squared Test on Variance"
##
## $data.name
## [1] "comp_problem_1$Data"
##
## $conf.int
## LCL UCL
## 16.76032 57.69739
## attr(,"conf.level")
## [1] 0.9
##
## attr(,"class")
## [1] "htestEnvStats"
### Problem 3. The steps, using the formula at the bottom of page 344:
### t = (s1^2 - s2^2) / (2 * s1 * s2 * sqrt((1 - r^2) / (n - 2))). The difference of the
### variances is 49 - 25 = 24. The two standard deviations are 7 and 5, so twice their
### product is 70. With r = .6 I put .4 under the radical and divide it by the df,
### 32 - 2 = 30; the denominator is then 70 * sqrt(.4/30) = 8.08, and t = 24 / 8.08 = 2.9692.
### (If the radical should hold 1 - r^2 = .64, as the book's answer to Problem 7 suggests,
### the statistic is about 2.35 instead; see the helper after the arithmetic below.) The
### two-tailed critical value from Table A2 with 30 df is 2.042272. Because the test
### statistic exceeds it, we reject the null hypothesis that the variances are equal.
### Computations below.
49 - 25
## [1] 24
sqrt(49)
## [1] 7
sqrt(25)
## [1] 5
2*7*5
## [1] 70
1-.6
## [1] 0.4
sqrt(.4/30)
## [1] 0.1154701
0.1154701*70
## [1] 8.082907
24/8.082907
## [1] 2.969229
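### A small helper for the dependent-variances t statistic as I read the page 344 formula,
### t = (s1^2 - s2^2) / (2 * s1 * s2 * sqrt((1 - r^2) / (n - 2))). This is a sketch only
### (the function name is mine, not the book's), and it puts 1 - r^2 under the radical, per
### the note above.
dep_var_t <- function(var1, var2, r, n) {
  s1 <- sqrt(var1)
  s2 <- sqrt(var2)
  (var1 - var2) / (2 * s1 * s2 * sqrt((1 - r^2) / (n - 2)))
}
dep_var_t(49, 25, r = 0.6, n = 32)   # about 2.35, still above qt(0.975, 30) = 2.042272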
### Problem 5. This is the single-sample test of a variance. n = 21, so df = 20. The sample
### mean is 50 with a sample variance of 10; the hypothesized variance is 25, tested at the
### .05 significance level. We multiply the degrees of freedom by the sample variance and
### divide by the hypothesized variance: chi^2 = (n - 1) * s^2 / sigma0^2. From Table A3 the
### critical values are 9.59 and 34.169. The chi-square statistic, calculated below, is 8,
### which falls below the lower critical value, so we reject the null hypothesis that the
### variance is equal to 25. A qchisq() check follows the computation.
(20*10)/25
## [1] 8
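### The Table A3 cutoffs can be reproduced with qchisq(); the statistic of 8 sits below the
### lower cutoff, hence the rejection. A quick check:
qchisq(c(0.025, 0.975), df = 20)              # roughly 9.591 and 34.170, matching the table
(21 - 1) * 10 / 25 < qchisq(0.025, df = 20)   # TRUE, so reject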
### Problem 7. This is the dependent-samples problem, the same setup as Problem 3, so I will
### not repeat the steps. The book says t = -2.6178; I get -3.512132. I subtract 64 from 36
### to get -28; the two square roots are 6 and 8, so twice their product is 96; and I put
### sqrt(.2/29) in the denominator (df = 31 - 2 = 29), which gives t = -3.512132, as shown
### below. The gap from the book appears to come from the radical: with 1 - r^2 = 1 - .8^2
### = .36 rather than 1 - r = .2, the denominator is 96 * sqrt(.36/29) and t = -2.6178,
### matching the book (recomputed after the arithmetic below). Either way |t| exceeds the
### critical value, so I reject the null hypothesis.
.2/29
## [1] 0.006896552
sqrt(.006896552)
## [1] 0.08304548
.08304548*96
## [1] 7.972366
-28/7.972366
## [1] -3.512132
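### Recomputation with 1 - r^2 under the radical (r = .8, so 1 - .64 = .36), which appears
### to be what the book's answer uses:
-28 / (96 * sqrt(0.36 / 29))         # about -2.6178, matching the book
### or, using the helper sketched under Problem 3:
dep_var_t(36, 64, r = 0.8, n = 31)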
### Problem 9. This is the same setup as Problems 3 and 7: dependent samples. Note the
### problem does not give the sample size, so the df cannot be calculated; it is necessary
### to use the 60 df given in the answer. Also note the r-squared value in the problem is
### .1, but the answer puts 1 - .64 under the radical, and where the answer's numbers 10
### and 13 came from I do not know. I use r^2 = .1 in the calculations. The numerator is
### .503 - .427 = .076; the denominator is twice the product of the two standard deviations
### times sqrt(.9/60). My statistic, as shown below, is 0.6694838. The two-tailed critical
### value from Table A2 for 60 df at the .05 level is 2.000298. Because my statistic is
### smaller than the critical value, we fail to reject the null hypothesis.
.503 - .427
## [1] 0.076
sqrt(.503)
## [1] 0.7092249
sqrt(.427)
## [1] 0.6534524
(0.7092249*0.6534524)
## [1] 0.4634447
0.4634447*2
## [1] 0.9268894
.9/60
## [1] 0.015
sqrt(.015)
## [1] 0.1224745
0.9268894*0.1224745
## [1] 0.1135203
.076/0.1135203
## [1] 0.6694838
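### The same calculation in one line, using the r^2 = .1 and 60 df noted above, plus the
### two-tailed critical value from qt():
(0.503 - 0.427) / (2 * sqrt(0.503) * sqrt(0.427) * sqrt((1 - 0.1) / 60))   # 0.6694838
qt(0.975, df = 60)                                                         # 2.000298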
### Interpretive Problem. The first issue I had was splitting the gender column into two
### columns so I could run the test as it is set up in the book. That took a bit of work,
### and I could not do it in R, but I did manage to split the income column by gender. I was
### unsure what value to put in sigma.squared; the book used 50, I used 100, and a rerun
### with 10,000 gave the same result. The reason is that sigma.squared belongs to
### EnvStats::varTest() (the one-sample test); stats::var.test() has no such argument (nor
### a paired argument) and silently ignores both, so the two calls below run the identical
### two-sample F test on the ratio of the variances.
msd_labs <- read.csv("msd_labs.csv")
msd_lab_income <- read.csv("msd_lab_income.csv")
var.test(msd_lab_income$male_y, msd_lab_income$female_y,
         paired = FALSE,
         alternative = "two.sided",
         sigma.squared = 100)
##
## F test to compare two variances
##
## data: msd_lab_income$male_y and msd_lab_income$female_y
## F = 0.81901, num df = 708, denom df = 1555, p-value = 0.002178
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7233138 0.9301452
## sample estimates:
## ratio of variances
## 0.8190068
var.test(msd_lab_income$male_y, msd_lab_income$female_y,
         paired = FALSE,
         alternative = "two.sided",
         sigma.squared = 10000)
##
## F test to compare two variances
##
## data: msd_lab_income$male_y and msd_lab_income$female_y
## F = 0.81901, num df = 708, denom df = 1555, p-value = 0.002178
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7233138 0.9301452
## sample estimates:
## ratio of variances
## 0.8190068
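### What var.test() actually tests is the ratio of the two variances, so its F statistic
### and p-value can be reproduced by hand. A minimal sketch, assuming the income columns
### read above and that dropping non-finite values is how the 708 and 1555 df arose:
m <- msd_lab_income$male_y[is.finite(msd_lab_income$male_y)]
f <- msd_lab_income$female_y[is.finite(msd_lab_income$female_y)]
f_stat <- var(m) / var(f)                          # should reproduce F = 0.81901
p_val  <- 2 * min(pf(f_stat, length(m) - 1, length(f) - 1),
                  pf(f_stat, length(m) - 1, length(f) - 1, lower.tail = FALSE))
c(F = f_stat, p.value = p_val)                     # should reproduce p = 0.002178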