Part 1: In the following questions your task is to provide the appropriate, simplest, and most efficient analysis that could be performed for the scenario given (worth 2% for all).

  1. You are interested in testing whether a coin is fair or not (i.e., lands on each side equally). What test could we use?

o For this scenario, a Chi-Square Goodness of Fit Test would be appropriate because it compares the observed proportions of the outcomes (heads or tails) from our coin flips against the proportions expected from a fair coin (50% each).
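A minimal sketch of the goodness-of-fit test in R. The flip counts are made up for illustration and are not part of the exam data.

```r
# Hypothetical data: 100 flips of a coin (counts invented for illustration)
flips <- c(heads = 58, tails = 42)

# Goodness-of-fit test against the fair-coin expectation of 50/50
gof <- chisq.test(flips, p = c(0.5, 0.5))
gof$statistic  # chi-square statistic
gof$p.value    # a small p-value would suggest the coin is not fair
```

With only two outcomes, `binom.test(58, 100, p = 0.5)` is an exact alternative.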

  1. You are asked to find out if the proportion of males and females differ by academic majors, classified categorically as: Art, History, Science, or Social Science. What test would you use?

o We could use a Chi-Square Test of Independence, since both variables are categorical (gender with two levels and academic major with four levels) and the question is whether the proportions of males and females differ across majors.
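When both variables are categorical, the proportions can be compared on a contingency table with a chi-square test of independence. A sketch, using invented counts:

```r
# Hypothetical counts of students by gender and major (invented for illustration)
counts <- matrix(
  c(30, 25, 40, 35,   # Female
    20, 30, 55, 25),  # Male
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("Female", "Male"),
                  major  = c("Art", "History", "Science", "Social Science"))
)

# Test whether gender and major are independent
indep <- chisq.test(counts)
indep$expected  # expected counts under independence
indep
```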

  1. You have students’ math scores (a continuous variable) and have been asked to test whether there is a difference between test scores for students in two different sections of the same course. Assuming equal variances, what test would you use?

o Assuming equal variances, an independent-samples t-test (pooled variance) can check whether two groups (sections) differ on a continuous variable (test score).
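A sketch of the equal-variance t-test on simulated scores. The section labels and values are hypothetical.

```r
set.seed(1)
# Simulated scores for two hypothetical sections (not real data)
scores <- data.frame(
  section = rep(c("A", "B"), each = 30),
  score   = c(rnorm(30, mean = 75, sd = 8), rnorm(30, mean = 78, sd = 8))
)

# var.equal = TRUE requests the pooled-variance (Student) t-test
# instead of R's default Welch test
tt <- t.test(score ~ section, data = scores, var.equal = TRUE)
tt
```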

  1. You are testing the improvement of individuals’ health following treatment. You have a continuous score of health before treatment began and after it was concluded for each individual. What test would you use?

o In this case, a Paired t-test would be appropriate because each individual is measured twice. It tests whether the mean difference between the before-treatment and after-treatment health scores (the dependent variable) differs from zero.
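A sketch of the paired comparison on simulated before/after scores (hypothetical values). The paired test is the same as a one-sample t-test on the per-person differences.

```r
set.seed(2)
# Simulated health scores for 25 hypothetical individuals
before <- rnorm(25, mean = 60, sd = 10)
after  <- before + rnorm(25, mean = 4, sd = 5)

# Paired t-test: each 'after' value is matched to the same person's 'before'
pt <- t.test(after, before, paired = TRUE)
pt

# Equivalent formulation: one-sample t-test on the differences
diff_test <- t.test(after - before)
```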

  1. You are interested in which of three industrial milling machines provides the fastest completion time for a standard task. What test(s) would you use?

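o For this case, it’s appropriate to use a One-way ANOVA, since it compares a continuous variable (completion time) across the levels of a categorical variable (machine type). If the ANOVA is significant, a post hoc test such as Tukey’s HSD can identify which machine is fastest.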
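A sketch of a one-way ANOVA with a Tukey follow-up on simulated completion times. The machine names and values are hypothetical.

```r
set.seed(3)
# Simulated completion times for three hypothetical machines
milling <- data.frame(
  machine = factor(rep(c("M1", "M2", "M3"), each = 20)),
  time    = c(rnorm(20, 50, 4), rnorm(20, 47, 4), rnorm(20, 53, 4))
)

# Omnibus test: do the mean completion times differ at all?
one_way <- aov(time ~ machine, data = milling)
summary(one_way)

# Pairwise comparisons to see which machines differ
tukey <- TukeyHSD(one_way)
tukey
```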

  1. You are asked to see how several factors (all categorical) influence quality perception in consumers (measured as a continuous variable). What test(s) would you use?

o A factorial (multi-way) ANOVA could be used to analyze the relationship between several categorical predictors (the factors that influence consumer perception), including their interactions, and a continuous dependent variable (quality perception).
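A sketch of an ANOVA with more than one categorical factor. The two factors (`packaging`, `price`) and all values are invented for illustration.

```r
set.seed(4)
# Balanced hypothetical design: 2 packaging types x 3 price tiers x 15 consumers
survey <- expand.grid(
  packaging = c("plain", "premium"),
  price     = c("low", "mid", "high"),
  rep       = 1:15
)
survey$quality <- rnorm(nrow(survey), mean = 5, sd = 1) +
  ifelse(survey$packaging == "premium", 0.5, 0)

# Main effects of both factors plus their interaction
fact <- aov(quality ~ packaging * price, data = survey)
summary(fact)
```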

  1. You are interested in predicting income (a continuous variable) by age, sex, years of education, and years of industry experience (mixtures of continuous and categorical predictors). What test would you use?

o For this scenario, we could use a multiple linear regression, since it tests the effect that a mixture of continuous and categorical predictors has on a continuous outcome (income).
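A sketch of a multiple regression with mixed predictor types on simulated data. The column names and coefficients are assumptions for illustration only.

```r
set.seed(5)
n <- 200
# Simulated predictors for hypothetical people
people <- data.frame(
  age        = round(runif(n, 22, 65)),
  sex        = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  education  = round(runif(n, 10, 20)),
  experience = round(runif(n, 0, 30))
)
people$income <- 20000 + 1500 * people$education +
  800 * people$experience + rnorm(n, sd = 5000)

# lm() dummy-codes the factor 'sex' automatically
model <- lm(income ~ age + sex + education + experience, data = people)
summary(model)
```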

Part 2: At this stage in the course you have several analysis techniques that can be used somewhat interchangeably (ANOVA ~ Regression). Thus, while the following questions have optimal ways to proceed, there may be several “right” ways to proceed. While you will not lose points for using an analysis that I would not necessarily use (assuming you perform the analysis right and get the correct answer), I would suggest using the simplest analysis you can for the problem presented (read the description of the study and look at the data closely).

  1. A technology firm has conducted a study around human-computer interaction. They have designed software (coded NEW in the data set) that they believe can aid air traffic controllers in making quicker decisions, possibly preventing horrific accidents from occurring. There are already two major competitors who currently provide the software to most airports around the world (C1 & C2). As such they need to show that their software does a better job than these competitors. To accomplish this, they randomly recruited 1000 air traffic controllers to take part in the study from various airports and branches of the United States and United Kingdom’s military. Each air traffic controller used each type of software in a simulator and their mean reaction time to important alerts was calculated. The air traffic controllers’ ages, genders, and years of experience were also collected. Analyze the data (Exam2Q1.xlsx) and provide a formal summary report to the company about how their software fares against the two competitors (worth 4%).
library(readxl)
AirTrafficData= read_excel("C:/Users/jcolu/OneDrive/Documents/Harrisburg/Summer 2018/ANLY 510/Exam2Q1.xlsx")
AirTrafficData

Initial data set analysis using str() to summarize.

str(AirTrafficData)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3000 obs. of  8 variables:
##  $ subject           : num  1 1 1 2 2 2 3 3 3 4 ...
##  $ age               : num  32 32 32 31 31 31 36 36 36 32 ...
##  $ expierence        : num  7 7 7 6 6 6 20 20 20 7 ...
##  $ simulator         : chr  "C1" "C2" "NEW" "C1" ...
##  $ rt                : num  11.1 18.58 6.76 12.18 15.38 ...
##  $ gender            : chr  "Female" "Female" "Female" "Female" ...
##  $ airport           : chr  "PDX" "PDX" "PDX" "PDX" ...
##  $ civilianormilitary: num  0 0 0 0 0 0 0 0 0 0 ...

Assign labels to the civilian or military variable.

AirTrafficData$civilianormilitary=replace(AirTrafficData$civilianormilitary,AirTrafficData$civilianormilitary==0, "Civilian")
AirTrafficData$civilianormilitary=replace(AirTrafficData$civilianormilitary,AirTrafficData$civilianormilitary==1, "Military")
str(AirTrafficData)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3000 obs. of  8 variables:
##  $ subject           : num  1 1 1 2 2 2 3 3 3 4 ...
##  $ age               : num  32 32 32 31 31 31 36 36 36 32 ...
##  $ expierence        : num  7 7 7 6 6 6 20 20 20 7 ...
##  $ simulator         : chr  "C1" "C2" "NEW" "C1" ...
##  $ rt                : num  11.1 18.58 6.76 12.18 15.38 ...
##  $ gender            : chr  "Female" "Female" "Female" "Female" ...
##  $ airport           : chr  "PDX" "PDX" "PDX" "PDX" ...
##  $ civilianormilitary: chr  "Civilian" "Civilian" "Civilian" "Civilian" ...

Analyze the means of the reaction times.

tapply(X=AirTrafficData$rt, INDEX = AirTrafficData$simulator, FUN = mean)
##        C1        C2       NEW 
##  7.268982 16.507990  5.535704

Mean reaction time for the new simulator seems faster in comparison to the two competitors.

tapply(X=AirTrafficData$rt, INDEX = AirTrafficData$civilianormilitary, FUN = mean)
##  Civilian  Military 
## 10.479070  8.792932

Civilians have a slower mean reaction time than military controllers.

tapply(X=AirTrafficData$rt, INDEX = AirTrafficData$gender, FUN = mean)
##   Female     Male 
## 9.817140 9.727163

The means are very similar, but men seem to have a slightly faster reaction time.

Comparison of means across categories

New_Simulator = subset(AirTrafficData, AirTrafficData$simulator=="NEW", select = c("subject","age","expierence","rt","gender","airport","civilianormilitary"))
tapply(New_Simulator$rt,New_Simulator$gender,FUN = mean)
##   Female     Male 
## 5.664256 5.414155
tapply(New_Simulator$rt,New_Simulator$civilianormilitary,FUN = mean)
## Civilian Military 
## 6.176517 4.650772
C1_Simulator = subset(AirTrafficData, AirTrafficData$simulator=="C1", select = c("subject","age","expierence","rt","gender","airport","civilianormilitary"))
tapply(C1_Simulator$rt,C1_Simulator$gender,FUN = mean)
##   Female     Male 
## 7.318357 7.222297
tapply(C1_Simulator$rt,C1_Simulator$civilianormilitary,FUN = mean)
## Civilian Military 
## 8.006233 6.250874
C2_Simulator = subset(AirTrafficData, AirTrafficData$simulator=="C2", select = c("subject","age","expierence","rt","gender","airport","civilianormilitary"))
tapply(C2_Simulator$rt,C2_Simulator$gender,FUN = mean)
##   Female     Male 
## 16.46881 16.54504
tapply(C2_Simulator$rt,C2_Simulator$civilianormilitary,FUN = mean)
## Civilian Military 
## 17.25446 15.47715

For both genders, the new simulator has the lowest mean reaction time. Likewise, for both civilian and military controllers, the new simulator has the lowest mean reaction time.
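The per-simulator subsets above can also be collapsed into one call by passing two grouping variables to tapply(). A sketch on a toy data frame that only borrows the column names `simulator`, `gender`, and `rt`. The values are made up.

```r
# Toy stand-in with the same column names as AirTrafficData (values invented)
toy <- data.frame(
  simulator = rep(c("C1", "C2", "NEW"), times = 4),
  gender    = rep(c("Female", "Male"), each = 6),
  rt        = c(7.1, 16.2, 5.5, 7.5, 16.8, 5.9,
                7.0, 16.4, 5.2, 7.4, 16.6, 5.6)
)

# A list of two factors gives a simulator-by-gender table of cell means
cell_means <- tapply(toy$rt, list(toy$simulator, toy$gender), mean)
cell_means
```

On the real data the equivalent call would be `tapply(AirTrafficData$rt, list(AirTrafficData$simulator, AirTrafficData$gender), mean)`.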

Plot the data for visualization.

interaction.plot(AirTrafficData$expierence,AirTrafficData$simulator,AirTrafficData$rt, main = "Reaction times according to simulator and Experience")

interaction.plot(AirTrafficData$gender,AirTrafficData$simulator,AirTrafficData$rt, main = "Reaction times according to simulator and Gender")

interaction.plot(AirTrafficData$age,AirTrafficData$simulator,AirTrafficData$rt, main = "Reaction times according to simulator and Age")

boxplot(AirTrafficData$rt~AirTrafficData$simulator,main= "Reaction times per Simulator")

Although reaction times are generally faster in the new simulator, the plots do not show any clear difference according to the other variables.

Conduct normality and skewness tests.

library(moments)
agostino.test(AirTrafficData$rt)
## 
##  D'Agostino skewness test
## 
## data:  AirTrafficData$rt
## skew = 0.49073, z = 10.43100, p-value < 2.2e-16
## alternative hypothesis: data have a skewness
shapiro.test(AirTrafficData$rt)
## 
##  Shapiro-Wilk normality test
## 
## data:  AirTrafficData$rt
## W = 0.94254, p-value < 2.2e-16

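The reaction times fail both tests: the D'Agostino test indicates significant skewness (skew = 0.49) and the Shapiro-Wilk test rejects normality (both p < 2.2e-16).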
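Because every controller used all three simulators, the simulator comparison is a within-subject one. One possible next step, sketched here on simulated data rather than on Exam2Q1.xlsx, is a repeated-measures ANOVA, with the rank-based Friedman test as an alternative that does not assume normality. The effect sizes and sample size below are assumptions.

```r
set.seed(6)
n_subj <- 30
# Simulated long-format data: one row per subject per simulator
sim <- data.frame(
  subject   = factor(rep(1:n_subj, each = 3)),
  simulator = factor(rep(c("C1", "C2", "NEW"), times = n_subj))
)
base       <- c(C1 = 7.3, C2 = 16.5, NEW = 5.5)  # assumed simulator means
subj_shift <- rnorm(n_subj, sd = 1)              # per-subject offset
sim$rt <- unname(base[as.character(sim$simulator)]) +
  subj_shift[as.integer(sim$subject)] +
  rnorm(nrow(sim), sd = 1)

# Repeated-measures ANOVA: simulator varies within subject
rm_fit <- aov(rt ~ simulator + Error(subject / simulator), data = sim)
summary(rm_fit)

# Rank-based alternative for non-normal data (blocks = subjects)
fr <- friedman.test(rt ~ simulator | subject, data = sim)
fr
```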

  1. Exam2Q2.xlsx contains data from a study examining the effects of mood on the propensity to accept gambles relative to sure losses or gains. Specifically, participants were given 100$ and then given a choice between (gain frame: keeping 25$ of the 100$ for sure vs. 50% chance to win 100$ more or lose 50$ of the 100$) and (loss frame: lose 75$ of the 100$ for sure vs. 50% chance to win 100$ or lose 50$ of the 100$). Each participant encountered each frame (within-subjects) once in random order. In addition, before making their selections they watched videos (between subjects) selected to induce sadness (video of a child crying), happiness (video of a family reuniting), or neutrality (nature video). Appropriately analyze the data and report your findings in a formal summary (worth 4%).
Mood= read_excel("C:/Users/jcolu/OneDrive/Documents/Harrisburg/Summer 2018/ANLY 510/Exam2Q2.xlsx")
Mood

Initial data set analysis using str() to summarize.

str(Mood)
## Classes 'tbl_df', 'tbl' and 'data.frame':    182 obs. of  4 variables:
##  $ subject       : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ mood          : num  1 2 1 2 1 2 1 2 1 2 ...
##  $ acceptancerate: num  0.2812 0.2812 0.2188 0.0312 0.0625 ...
##  $ lossorgain    : chr  "gain" "gain" "gain" "gain" ...

Relabel the mood codes and convert both predictors to factors for the analysis.

Mood$mood=replace(Mood$mood,Mood$mood==1, "happy")
Mood$mood=replace(Mood$mood,Mood$mood==3, "indiferent")
Mood$mood=replace(Mood$mood,Mood$mood==2, "sad")
Mood$mood=factor(x=Mood$mood, levels = c("happy","indiferent","sad"))
Mood$lossorgain=factor(x=Mood$lossorgain,levels = c("gain","loss"))
str(Mood)
## Classes 'tbl_df', 'tbl' and 'data.frame':    182 obs. of  4 variables:
##  $ subject       : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ mood          : Factor w/ 3 levels "happy","indiferent",..: 1 3 1 3 1 3 1 3 1 3 ...
##  $ acceptancerate: num  0.2812 0.2812 0.2188 0.0312 0.0625 ...
##  $ lossorgain    : Factor w/ 2 levels "gain","loss": 1 1 1 1 1 1 1 1 1 1 ...

Visualize data

interaction.plot(Mood$mood,Mood$lossorgain,Mood$acceptancerate,main="Interaction between Acceptance & Mood")

interaction.plot(Mood$lossorgain, Mood$mood,Mood$acceptancerate, main = "Interaction between Acceptance rate & Gain/Loss")

Conduct normality and skewness tests.

agostino.test(Mood$acceptancerate)
## 
##  D'Agostino skewness test
## 
## data:  Mood$acceptancerate
## skew = 0.89965, z = 4.49720, p-value = 6.886e-06
## alternative hypothesis: data have a skewness
shapiro.test(Mood$acceptancerate)
## 
##  Shapiro-Wilk normality test
## 
## data:  Mood$acceptancerate
## W = 0.8899, p-value = 2.465e-10

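Acceptance rate fails both tests: it is significantly skewed (skew = 0.90) and the Shapiro-Wilk test rejects normality.

Linear Model fit with glm() (no family is specified, so glm() uses the Gaussian family and this is not a logistic regression)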

attach(Mood)
LRM = glm(acceptancerate~mood*lossorgain,data = Mood)
summary(LRM)
## 
## Call:
## glm(formula = acceptancerate ~ mood * lossorgain, data = Mood)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.57261  -0.14580  -0.04136   0.13144   0.67641  
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    0.21415    0.03982   5.378 2.37e-07 ***
## moodindiferent                -0.06512    0.06049  -1.077   0.2832    
## moodsad                       -0.09621    0.05766  -1.669   0.0970 .  
## lossorgainloss                 0.35846    0.05631   6.366 1.63e-09 ***
## moodindiferent:lossorgainloss -0.10124    0.08554  -1.184   0.2382    
## moodsad:lossorgainloss        -0.15281    0.08154  -1.874   0.0626 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.05390424)
## 
##     Null deviance: 14.1948  on 181  degrees of freedom
## Residual deviance:  9.4871  on 176  degrees of freedom
## AIC: -7.1469
## 
## Number of Fisher Scoring iterations: 2
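For comparison, a logistic-type model for a proportion outcome needs a binomial-family link. The sketch below uses simulated data and `family = quasibinomial`. The 32 trials per row is an assumption (the acceptance rates shown by str() look like multiples of 1/32, but the source does not state this), and the variable names are hypothetical.

```r
set.seed(8)
# Simulated design: 3 moods x 2 frames, 120 rows (not the exam data)
toy <- data.frame(
  mood  = factor(rep(c("happy", "neutral", "sad"), each = 40)),
  frame = factor(rep(c("gain", "loss"), times = 60))
)
# Assumed true acceptance probability: higher in the loss frame
p <- plogis(-1.2 + 1.4 * (toy$frame == "loss"))
toy$rate <- rbinom(nrow(toy), size = 32, prob = p) / 32  # assumed 32 trials

# Logit link on proportions; quasibinomial allows non-integer responses
logit_fit <- glm(rate ~ mood * frame, family = quasibinomial, data = toy)
summary(logit_fit)
```

Coefficients from this model are on the log-odds scale, unlike the Gaussian fit, whose coefficients are differences in acceptance rate.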

Analysis of Deviance for the Model Terms

anova(LRM, test = "Chisq")
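Since each participant saw both frames, the two rows per subject are not independent. One way to account for this is a mixed-design ANOVA with mood between subjects and frame within subjects. This is a sketch on simulated data, not the analysis performed above, and the effect sizes are assumptions.

```r
set.seed(7)
n_subj <- 60
# Simulated long-format data: two rows (gain, loss) per subject,
# each subject assigned to exactly one mood
toy <- data.frame(
  subject = factor(rep(1:n_subj, each = 2)),
  mood    = factor(rep(c("happy", "neutral", "sad"),
                       each = 2, length.out = 2 * n_subj)),
  frame   = factor(rep(c("gain", "loss"), times = n_subj))
)
subj_shift <- rnorm(n_subj, sd = 0.05)  # per-subject offset
toy$rate <- 0.25 + 0.30 * (toy$frame == "loss") +
  subj_shift[as.integer(toy$subject)] + rnorm(nrow(toy), sd = 0.08)
toy$rate <- pmin(pmax(toy$rate, 0), 1)  # keep rates inside [0, 1]

# mood is tested in the between-subject stratum,
# frame and mood:frame in the within-subject stratum
mixed <- aov(rate ~ mood * frame + Error(subject / frame), data = toy)
summary(mixed)
```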

Plot Model Residuals

plot(resid(LRM))

qqnorm(resid(LRM))

Plot Effects

library(effects)
## Warning: package 'effects' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(LRM, partial.residuals=T))