Stats

library(readr)
library(fastR2)
library(mosaic)   # formula interface for cor(y ~ x); also attaches lattice (xyplot, histogram)
StatsHelp <- read_csv("C:/Users/jlwol/Downloads/StatsHelp.csv")
View(StatsHelp)

2.) Compare the sample means for experience and arrest productivity with the Gotham City averages of 10 years and 58 arrests.

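First, set the significance level \(\alpha = 0.05\), against which each p-value will be compared.

Experience, where \(\mu\) is the mean years of experience:

\(H_o\): \(\mu = 10\)

\(H_a\): \(\mu \neq 10\)
With \(n = 49\), the sample standard deviation stands in for \(\sigma\) when computing the Z-score: \[Z = \frac{8.44898-10.00}{6.970358/\sqrt{49}} = -1.557616\]
The test is two-sided, so the p-value must account for both tails of the distribution.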
P-value:

2*pnorm(-1.557616)

## [1] 0.1193243

Arrest productivity, where \(\mu\) is the mean number of arrests:

\(H_o\): \(\mu = 58\)

\(H_a\): \(\mu \neq 58\)
The Z-score is computed the same way: \[Z = \frac{55.91837-58}{28.43621/\sqrt{49}} = -0.5124245\]
The test is two-sided, so the p-value must account for both tails of the distribution.
P-value:

2*pnorm(-0.5124245)

## [1] 0.6083539

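Since both p-values are greater than alpha, we fail to reject either null hypothesis: there is no statistically significant evidence that the sample's mean experience or mean arrest productivity differs from the Gotham City averages. Failing to reject does not prove the means are equal; it only shows the data are consistent with them.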
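Both calculations above follow the same recipe, so they can be wrapped in a small helper. This is a sketch that works only from the summary statistics quoted in the text; the function name `z_test_mean` is hypothetical. The extra `p_t` value is an added comparison: because the sample standard deviation is used, a t distribution with \(n - 1\) degrees of freedom is the stricter choice, and with \(n = 49\) it gives a slightly larger p-value.

```r
# Hypothetical helper: one-sample, two-sided test of H_o: mu = mu0
# using only summary statistics (sample mean, sample SD, sample size)
z_test_mean <- function(xbar, s, n, mu0) {
  se <- s / sqrt(n)                 # standard error of the sample mean
  z  <- (xbar - mu0) / se           # standardized distance from mu0
  list(z      = z,
       p_norm = 2 * pnorm(-abs(z)),           # normal approximation, as in the text
       p_t    = 2 * pt(-abs(z), df = n - 1))  # t-based p-value for comparison
}

experience <- z_test_mean(xbar = 8.44898,  s = 6.970358, n = 49, mu0 = 10)
arrests    <- z_test_mean(xbar = 55.91837, s = 28.43621, n = 49, mu0 = 58)
experience$p_norm
arrests$p_norm
```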

3.) Is gender related to cynicism either before or after the program?

Male <- StatsHelp$Cynicism[StatsHelp$Gender == 0]
Female <- StatsHelp$Cynicism[StatsHelp$Gender == 1]

A two-sample test for a difference in means is used to determine whether gender is related to cynicism: the mean cynicism ratings of the two gender groups are compared, first before the program (Cynicism) and then after it (PPC).

First, set the significance level \(\alpha = 0.05\), against which the p-value will be compared.
\(H_o\): \(\mu_{male} = \mu_{female}\)

\(H_a\): \(\mu_{male} \neq \mu_{female}\)
A two-tailed, two-sample t-test with an unpooled standard error is performed for cynicism before the program (group sizes \(n_1 = 31\) and \(n_2 = 18\)):
\(\bar x_{diff} = 0.483871\); \(s_{1} = 1.2348\); \(s_{2} = 1.084652\)
\(S.E = \sqrt{\frac{1.2348^{2}}{31} + \frac{1.084652^{2}}{18}} = 0.3384439\)

Compute the test statistic.
\[t = \frac{0.483871 - 0}{S.E} = 1.429693\]

The test is two-tailed, so the p-value must account for both tails of the distribution. It uses the conservative \(\min(n_1, n_2) - 1 = 17\) degrees of freedom.
P-value:

pt(-1.429693, 17) + (1 - pt(1.429693, 17))

## [1] 0.170926
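The standard error and t statistic above can be reproduced from the summary statistics alone. A minimal sketch, assuming a hypothetical helper named `two_sample_t_summary`; it uses the unpooled standard error and the conservative \(\min(n_1, n_2) - 1\) degrees of freedom that the `pt(..., 17)` call corresponds to.

```r
# Hypothetical helper: two-sample t statistic from summary statistics,
# with an unpooled (Welch-style) standard error and conservative df
two_sample_t_summary <- function(diff, s1, n1, s2, n2) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)   # unpooled standard error of the difference
  t  <- (diff - 0) / se               # test statistic under H_o: difference = 0
  df <- min(n1, n2) - 1               # conservative degrees of freedom
  list(se = se, t = t, df = df,
       p = 2 * pt(-abs(t), df))       # two-tailed p-value
}

before <- two_sample_t_summary(diff = 0.483871, s1 = 1.2348, n1 = 31,
                               s2 = 1.084652, n2 = 18)
before$p
```

`2 * pt(-abs(t), df)` equals the `pt(-t, 17) + (1 - pt(t, 17))` form used in the text, by symmetry of the t distribution.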

Male1 <- StatsHelp$PPC[StatsHelp$Gender == 0]
Female1 <- StatsHelp$PPC[StatsHelp$Gender == 1]

First, set the significance level \(\alpha = 0.05\), against which the p-value will be compared.
\(H_o\): \(\mu_{male} = \mu_{female}\)

\(H_a\): \(\mu_{male} \neq \mu_{female}\)
A two-tailed, two-sample t-test with an unpooled standard error is performed for cynicism after the program (PPC):
\(\bar x_{diff} = -0.01075269\); \(s_{1} = 1.248655\); \(s_{2} = 0.9074852\)
\(S.E = \sqrt{\frac{1.248655^{2}}{31} + \frac{0.9074852^{2}}{18}} = 0.3099136\)

Compute the test statistic.
\[t = \frac{-0.01075269 - 0}{S.E} = -0.03469577\]

The test is two-tailed, so the p-value must account for both tails of the distribution, using the conservative \(\min(n_1, n_2) - 1 = 17\) degrees of freedom.
P-value:

pt(-0.03469577, 17) + (1 - pt(0.03469577, 17))

## [1] 0.9727265

Both p-values are greater than alpha, so we fail to reject the null hypothesis in either case: there is no statistically significant evidence of a relationship between gender and cynicism, either before or after the program.
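When the raw vectors are available (here `Male` and `Female`, or `Male1` and `Female1`), R's `t.test()` performs the same unpooled comparison directly. It reports Welch-Satterthwaite degrees of freedom rather than the conservative 17, so its p-value will be somewhat smaller for the same t. The sketch below uses small made-up vectors (hypothetical data, not from StatsHelp) to check that the statistic from `t.test()` matches the hand formula.

```r
# Hypothetical toy data standing in for two groups' cynicism ratings
group_a <- c(3, 4, 2, 5, 4, 3, 4, 2)
group_b <- c(2, 3, 3, 2, 4, 1)

# Hand computation: unpooled standard error and t statistic
se_hand <- sqrt(var(group_a) / length(group_a) + var(group_b) / length(group_b))
t_hand  <- (mean(group_a) - mean(group_b)) / se_hand

# Built-in Welch two-sample test (var.equal = FALSE is the default)
welch <- t.test(group_a, group_b, var.equal = FALSE)
unname(welch$statistic)   # same value as t_hand
welch$parameter           # Welch-Satterthwaite degrees of freedom
```

With the document's data, the corresponding call would be `t.test(Male, Female)`.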

4.) Is gender or initial cynicism related to arrest productivity?

Male2 <- StatsHelp$Arrest1[StatsHelp$Gender == 0]
Female2 <- StatsHelp$Arrest1[StatsHelp$Gender == 1]

First, set the significance level \(\alpha = 0.05\), against which the p-value will be compared.
\(H_o\): \(\mu_{male} = \mu_{female}\)

\(H_a\): \(\mu_{male} \neq \mu_{female}\)
A two-tailed, two-sample t-test with an unpooled standard error is performed for arrest productivity before the program (Arrest1):
\(\bar x_{diff} = 8.38889\); \(s_{1} = 27.94161\); \(s_{2} = 29.29593\)
\(S.E = \sqrt{\frac{27.94161^{2}}{31} + \frac{29.29593^{2}}{18}} = 8.536134\)

Compute the test statistic.
\[t = \frac{8.38889 - 0}{S.E} = 0.9827505\]

The test is two-tailed, so the p-value must account for both tails of the distribution, using the conservative \(\min(n_1, n_2) - 1 = 17\) degrees of freedom.
P-value:

pt(-0.9827505, 17) + (1 - pt(0.9827505, 17))

## [1] 0.3395109
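The 17 degrees of freedom used above are the conservative choice. For comparison, the Welch-Satterthwaite approximation (what `t.test()` uses by default) can be computed from the same summary statistics. This sketch does so for the arrest-productivity comparison; the helper name `welch_df` is hypothetical. Either way, the p-value stays well above alpha.

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom
welch_df <- function(s1, n1, s2, n2) {
  a <- s1^2 / n1   # estimated variance of group 1's mean
  b <- s2^2 / n2   # estimated variance of group 2's mean
  (a + b)^2 / (a^2 / (n1 - 1) + b^2 / (n2 - 1))
}

t_arrest <- 0.9827505
df_welch <- welch_df(s1 = 27.94161, n1 = 31, s2 = 29.29593, n2 = 18)
df_welch                                   # larger than the conservative 17
2 * pt(-abs(t_arrest), df = df_welch)      # Welch p-value
2 * pt(-abs(t_arrest), df = 17)            # conservative p-value, as in the text
```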

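Relationship between initial cynicism and arrest productivity. The null hypothesis of no correlation is tested by permutation: the arrest values are shuffled 10,000 times, and the observed correlation is compared with the correlations from the shuffled data.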

x_values <- as.integer(StatsHelp$Cynicism)
y_values <- as.integer(StatsHelp$Arrest1)

xyplot(y_values~x_values,type = c("p", "r"), xlab = "Cynicism", ylab = "Arrest productivity")

#computes regression line components
model <- lm(y_values~x_values)
model

## 
## Call:
## lm(formula = y_values ~ x_values)
## 
## Coefficients:
## (Intercept)     x_values  
##      77.756       -9.469

origCorrelation <- cor(y_values~x_values)
origCorrelation

## [1] -0.3975376

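# Permutation test: shuffling y breaks any real association with x,
# so these correlations show what r looks like under H_o
listOfCorrelations <- numeric(10000)

for (i in 1:10000) {
  reordered_y_values <- sample(y_values)   # random reordering, without replacement
  listOfCorrelations[i] <- cor(reordered_y_values ~ x_values)
}

histogram(~listOfCorrelations)

# One-sided (upper-tail) p-value: share of permuted correlations >= observed r
pvalue <- (sum(listOfCorrelations >= origCorrelation))/10000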
pvalue

## [1] 0.9979
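The p-value printed above counts permuted correlations at or above the observed \(r\), which is the upper tail. Because the observed correlation here is negative, that tail points away from the data. A direction-agnostic alternative is a two-sided permutation test that compares absolute correlations. The sketch below uses base R's `cor(x, y)` rather than mosaic's formula interface; the helper name `perm_cor_test` and the toy data are hypothetical.

```r
# Hypothetical helper: two-sided permutation test for a correlation
perm_cor_test <- function(x, y, n_perm = 10000) {
  r_obs  <- cor(x, y)                              # observed correlation
  r_perm <- replicate(n_perm, cor(x, sample(y)))   # shuffle y to break any association
  list(r = r_obs,
       # share of permuted |r| at least as extreme as the observed |r|
       p_two_sided = mean(abs(r_perm) >= abs(r_obs)))
}

set.seed(1)
x_toy <- 1:20
y_toy <- 50 - 2 * x_toy + rnorm(20, sd = 3)   # toy data with a negative trend
res <- perm_cor_test(x_toy, y_toy)
res$r
res$p_two_sided
```

Applied to this question, the call would be `perm_cor_test(x_values, y_values)`.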

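For gender, the p-value (0.3395) is greater than alpha, so there is no statistically significant evidence that gender is related to arrest productivity. For initial cynicism, the observed correlation is negative (\(r = -0.3975\)), but the p-value computed above counts permuted correlations at or above \(r\), so it tests for a positive relationship. The tail in the direction of the observed correlation holds about \(1 - 0.9979 = 0.0021\) of the permutations, giving a two-sided p-value of roughly 0.004, which is less than alpha. There is therefore statistically significant evidence of a negative relationship between initial cynicism and arrest productivity: officers with higher initial cynicism tended to make fewer arrests (the fitted slope is about \(-9.5\) arrests per one-point increase in cynicism).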
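As a parametric cross-check on the permutation result for initial cynicism, the usual test of zero correlation converts \(r\) to a t statistic with \(n - 2\) degrees of freedom, \(t = r\sqrt{n-2}/\sqrt{1-r^2}\); this is the statistic `cor.test()` reports for a Pearson correlation. The sketch plugs in the observed \(r = -0.3975376\) and \(n = 49\). Unlike the permutation test, it assumes roughly bivariate-normal data; the helper name `r_to_t` is hypothetical.

```r
# t statistic for H_o: rho = 0, computed from a sample correlation r and size n
r_to_t <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

t_cyn <- r_to_t(r = -0.3975376, n = 49)
p_cyn <- 2 * pt(-abs(t_cyn), df = 49 - 2)   # two-sided p-value
t_cyn
p_cyn
```

With the raw vectors, `cor.test(x_values, y_values)` performs the same test directly.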

6.) Are PPC (post-program cynicism) and Arrest1 related?

x_values <- as.integer(StatsHelp$PPC)
y_values <- as.integer(StatsHelp$Arrest1)

xyplot(y_values~x_values,type = c("p", "r"), xlab = "PPC", ylab = "Arrest productivity")

#computes regression line components
model <- lm(y_values~x_values)
model

## 
## Call:
## lm(formula = y_values ~ x_values)
## 
## Coefficients:
## (Intercept)     x_values  
##      65.936       -4.306

origCorrelation <- cor(y_values~x_values)
origCorrelation

## [1] -0.1703872

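# Permutation test: shuffling y breaks any real association with x,
# so these correlations show what r looks like under H_o
listOfCorrelations <- numeric(10000)

for (i in 1:10000) {
  reordered_y_values <- sample(y_values)   # random reordering, without replacement
  listOfCorrelations[i] <- cor(reordered_y_values ~ x_values)
}

histogram(~listOfCorrelations)

# One-sided (upper-tail) p-value: share of permuted correlations >= observed r
pvalue <- (sum(listOfCorrelations >= origCorrelation))/10000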
pvalue

## [1] 0.8837

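The observed correlation is weakly negative (\(r = -0.1704\)). The p-value above counts the upper tail; the tail in the direction of the observed correlation holds about \(1 - 0.8837 = 0.1163\) of the permutations, for a two-sided p-value of roughly 0.23. Either way the p-value is greater than alpha, so there is no statistically significant evidence of a relationship between post-program cynicism (PPC) and arrest productivity before the program (Arrest1).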
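The fitted slope (\(-4.306\)) and the correlation (\(-0.1704\)) share a sign because the least-squares slope equals \(r \cdot s_y / s_x\), and both standard deviations are positive. The sketch below checks that identity on small hypothetical data (not from StatsHelp).

```r
# Hypothetical toy data (not from StatsHelp)
x_toy <- c(1, 2, 2, 3, 4, 5, 5, 6)
y_toy <- c(70, 66, 61, 64, 55, 58, 50, 52)

slope_lm  <- unname(coef(lm(y_toy ~ x_toy))[2])         # least-squares slope
slope_cor <- cor(x_toy, y_toy) * sd(y_toy) / sd(x_toy)  # r * s_y / s_x
c(slope_lm, slope_cor)   # the two values agree
```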

7.) Did arrest productivity change after the program?

A paired t-test is performed on each officer's difference, Arrest1 - Arrest2 (arrests before minus arrests after the program).

z <- (StatsHelp$Arrest1 - StatsHelp$Arrest2)   # per-officer difference: before minus after
mean(z)

## [1] 16.87755

sd(z)

## [1] 20.64242

First, set the significance level \(\alpha = 0.05\), against which the p-value will be compared.
\(H_o\): \(\mu_{diff} = 0\)

\(H_a\): \(\mu_{diff} \neq 0\)
A two-tailed paired t-test is performed on the \(n = 49\) differences:
\(\bar x_{diff} = 16.87755\); \(s_{diff} = 20.64242\); \(S.E = \frac{20.64242}{\sqrt{49}} = 2.948917\)

Compute the test statistic.
\[t = \frac{16.87755 - 0}{S.E} = 5.723304\]

The test is two-tailed, so the p-value must account for both tails of the distribution, with \(n - 1 = 48\) degrees of freedom.
P-value:

2*pt(-5.723304, 48)

This p-value is smaller than \(10^{-6}\).

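The p-value is far below alpha, so the null hypothesis is rejected: there is statistically significant evidence that arrest productivity changed after the program. Because the differences are Arrest1 - Arrest2, the positive mean difference (16.88) indicates that arrests fell by about 16.9 per month on average.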
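The paired test can be recomputed from the summary values above with \(n = 49\), using \(S.E = s_{diff}/\sqrt{n}\). R's `t.test(..., paired = TRUE)` runs the same test from the raw columns, e.g. `t.test(StatsHelp$Arrest1, StatsHelp$Arrest2, paired = TRUE)`. The second half of the sketch uses small hypothetical before/after vectors to show that a paired test is the same as a one-sample test on the differences.

```r
# Standard error and t statistic from the summary statistics in the text
n    <- 49
se_d <- 20.64242 / sqrt(n)             # s_diff / sqrt(n)
t_d  <- 16.87755 / se_d
p_d  <- 2 * pt(-abs(t_d), df = n - 1)  # two-tailed p-value

# Hypothetical toy before/after values (not from StatsHelp)
before <- c(60, 45, 72, 38, 55, 81)
after  <- c(50, 40, 55, 35, 47, 60)
paired   <- t.test(before, after, paired = TRUE)
one_samp <- t.test(before - after, mu = 0)
c(unname(paired$statistic), unname(one_samp$statistic))  # identical
```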

8.) Is post-program cynicism (PPC) related to shift?

x <- as.character(StatsHelp$Shift)   # character, so aov() treats Shift as categorical
y<- StatsHelp$PPC
summary(aov(y ~ x, data = StatsHelp))

##             Df Sum Sq Mean Sq F value Pr(>F)
## x            2   1.85  0.9237   0.721  0.492
## Residuals   46  58.93  1.2810

The p-value (0.492) is greater than alpha, so there is no statistically significant evidence of a relationship between shift and post-program cynicism.
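The F statistic in the table is the ratio of the two mean squares, and its p-value is the upper-tail probability of an F distribution with (2, 46) degrees of freedom. This sketch recomputes both from the printed, rounded table values, so it matches only to the displayed precision.

```r
# Values copied from the ANOVA table above (PPC by Shift)
ms_shift <- 0.9237   # Mean Sq for shift (Sum Sq / Df = 1.85 / 2)
ms_resid <- 1.2810   # Mean Sq for residuals (58.93 / 46)
df_shift <- 2        # number of shifts minus 1
df_resid <- 46       # n minus number of shifts = 49 - 3

f_stat  <- ms_shift / ms_resid
p_value <- pf(f_stat, df_shift, df_resid, lower.tail = FALSE)  # upper-tail F probability
f_stat
p_value
```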

9.) Are Shift and arrest productivity related (before and after)?

x <- as.character(StatsHelp$Shift)
y<- StatsHelp$Arrest1
summary(aov(y ~ x, data = StatsHelp))

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## x            2  16707    8353   17.38 2.38e-06 ***
## Residuals   46  22107     481                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

x <- as.character(StatsHelp$Shift)
y<- StatsHelp$Arrest2
summary(aov(y ~ x, data = StatsHelp))

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## x            2   9865    4932   10.15 0.000222 ***
## Residuals   46  22343     486                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both ANOVA tests give p-values far below alpha (2.38e-06 before and 0.000222 after the program), so there is statistically significant evidence that arrest productivity, both before and after the program, is related to the shift worked.
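A p-value says whether shift matters, not how much. One way to gauge the size of the effect is eta-squared, the share of total variation in arrests attributable to shift, \(SS_{shift} / (SS_{shift} + SS_{resid})\). This is an added summary, not part of the original analysis; the sketch computes it from the sums of squares printed in the two tables above.

```r
# Sums of squares copied from the two ANOVA tables above
ss_shift <- c(before = 16707, after = 9865)
ss_resid <- c(before = 22107, after = 22343)

# Eta-squared: share of total variation in arrests attributable to shift
eta_sq <- ss_shift / (ss_shift + ss_resid)
round(eta_sq, 3)
```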

10.) Is there a relationship between Arrest1 and Arrest2?

x_values <- as.integer(StatsHelp$Arrest1)
y_values <- as.integer(StatsHelp$Arrest2)

xyplot(y_values~x_values, type = c("p", "r"), xlab = "Arrests per month before program", ylab = "Arrests per month after program")

#computes regression line components
model <- lm(y_values~x_values)
model

## 
## Call:
## lm(formula = y_values ~ x_values)
## 
## Coefficients:
## (Intercept)     x_values  
##      2.6142       0.6514

origCorrelation <- cor(y_values~x_values)
origCorrelation

## [1] 0.7151133

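# Permutation test: shuffling y breaks any real association with x,
# so these correlations show what r looks like under H_o
listOfCorrelations <- numeric(10000)

for (i in 1:10000) {
  reordered_y_values <- sample(y_values)   # random reordering, without replacement
  listOfCorrelations[i] <- cor(reordered_y_values ~ x_values)
}

histogram(~listOfCorrelations)

# One-sided (upper-tail) p-value: share of permuted correlations >= observed r
pvalue <- (sum(listOfCorrelations >= origCorrelation))/10000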
pvalue

## [1] 0

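None of the 10,000 permuted correlations reached the observed \(r = 0.7151\), so the estimated p-value is below \(1/10000 = 0.0001\), far less than alpha. The null hypothesis is rejected: there is statistically significant evidence of a positive relationship between arrest productivity before and after the program, with officers who made more arrests before the program tending to make more afterward.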
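A permutation p-value of exactly 0 only means that no shuffle matched the observed correlation. A common convention counts the observed arrangement as one of the permutations, reporting \((k + 1)/(B + 1)\) for \(k\) extreme shuffles out of \(B\), which keeps the estimate strictly positive. The sketch applies that convention to the count printed above.

```r
n_perm    <- 10000
n_extreme <- 0   # permuted correlations >= observed r, per the output above

p_raw       <- n_extreme / n_perm              # what the code above reports
p_corrected <- (n_extreme + 1) / (n_perm + 1)  # counts the observed data as one permutation
p_corrected
```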

Jacob Wolfla

April 29, 2018