Part 1

In the following questions your task is to provide the appropriate, simplest, and most efficient analysis that could be performed for the scenario given (worth 2% for all).

1. You are interested in testing whether a coin is fair or not (i.e., lands on each side equally). What test could we use?

Chi square GOF – tests if the data corresponds to a population with specified set of probabilities. Expected frequencies (a coin should have a 1/2 probability for each side) are compared with actual (unfair coin might not return same probability). The coin is determined to be fair if the difference between expected and actual is about the same.

2. You are asked to find out if the proportion of males and females differ by academic majors, classified categorically as: Art, History, Science, or Social Science. What test would you use?

ANOVA: tests the significance of group differences between two or more groups when the IVs (majors) have two or more categories and DVs (enjoyment) are continuous.

3. You have student’s math scores (a continuous variable) and have been asked to test whether there is a difference between test scores for students in two different sections of the same course. Assuming equal variances what test would you use?

Independent T-tests: tests whether two groups (student sections) differ based on a variable of interest (math scores).

4. You are testing the improvement of individuals health following treatment. You have a continuous score of health before treatment began and after it was concluded for each individual. What test would you use?

Paired sample T-test: tests whether a DV changes over time (before / after treatment).

5. You are interested in which of three industrial milling machines provides the fastest completion time for a standard task. What test(s) would you use?

ANOVA: tests the significance of group differences between two or more groups when the IVs (completion time for each machine) have two or more categories and DVs (time) are continuous.

6. You are asked to see how several factors (all categorical) influence quality perception in consumers (measured as a continuous variable). What test(s) would you use?

ANOVA: tests the significance of group differences between two or more groups when the IVs (several factors) have two or more categories and DVs (perception) are continuous.

7. You are interested in predicting income (a continuous variable) by age, sex, years of education, and years of industry experience (mixtures of continuous and categorical predictors) What test would you use?

Multiple linear regression: tests the effect that multiple variables (IV) have over a DV known as “response variable”. A regression will provide a “fit” that reduces the amount of unexpected variance based on the quantified relationship that each set of IV (predictors) have on the IV. ___

Part 2:

Question 1

  1. A technology firm has conducted a study around human computer interaction. They have designed software (coded NEW in the data set) that they believe can aid air traffic controllers in making quicker decisions, possibly presenting horrific accidents from occurring. There are already two major competitors who currently provide the software to most airports around the world (C1 & C2). As such they need to show that their software does a better job than these competitors. To accomplish this, they randomly recruited 1000 air traffic controllers to take part in the study from various airports and branches of the United States and United Kingdom’s military. Each air traffic controller used each type of software in a simulator and their mean reaction time to important alerts were calculated. The air traffic controller’s ages, genders, and years of experience were also collected. Analyze the data (Exam2Q1.xlsx) and provide a formal summary report to the company about how their software fairs against the two competitors (worth 4%).
library(readxl)
## Warning: package 'readxl' was built under R version 3.4.4
AirTraffic <- read_excel("C:/Users/Enrique/OneDrive/Documents/HU/ANLY510_Principles7Applicaitons02/Data/AirTraffic.xlsx")
AirTraffic$civilianormilitary=replace(AirTraffic$civilianormilitary,
                                      AirTraffic$civilianormilitary==0,"Civilian")
AirTraffic$civilianormilitary=replace(AirTraffic$civilianormilitary,
                                      AirTraffic$civilianormilitary==1,"Military")
str(AirTraffic)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3000 obs. of  8 variables:
##  $ subject           : num  1 1 1 2 2 2 3 3 3 4 ...
##  $ age               : num  32 32 32 31 31 31 36 36 36 32 ...
##  $ expierence        : num  7 7 7 6 6 6 20 20 20 7 ...
##  $ simulator         : chr  "C1" "C2" "NEW" "C1" ...
##  $ Reaction          : num  11.1 18.58 6.76 12.18 15.38 ...
##  $ gender            : chr  "Female" "Female" "Female" "Female" ...
##  $ airport           : chr  "PDX" "PDX" "PDX" "PDX" ...
##  $ civilianormilitary: chr  "Civilian" "Civilian" "Civilian" "Civilian" ...

Lets take a quick look at the reaction means

tapply(X = AirTraffic$Reaction,INDEX = AirTraffic$simulator,FUN = mean)
##        C1        C2       NEW 
##  7.268982 16.507990  5.535704
tapply(X = AirTraffic$Reaction,INDEX = AirTraffic$gender, FUN= mean)
##   Female     Male 
## 9.817140 9.727163
tapply(AirTraffic$Reaction, AirTraffic$civilianormilitary, FUN= mean)
##  Civilian  Military 
## 10.479070  8.792932

The means show to be quite different across each simulator with New having the lowest. Civilians seem to have a slightly faster reaction than military. Means are very similar between males and females.

Lets seee how the means for each category compare for each simulator separately:

Air_New = subset(AirTraffic, AirTraffic$simulator=="NEW",
                 select = c("subject","age","expierence",
                            "Reaction","gender","airport","civilianormilitary"))
Air_c1 = subset(AirTraffic, AirTraffic$simulator=="C1",
                 select = c("subject","age","expierence",
                            "Reaction","gender","airport","civilianormilitary"))
Air_c2 = subset(AirTraffic, AirTraffic$simulator=="C2",
                 select = c("subject","age","expierence",
                            "Reaction","gender","airport","civilianormilitary"))
tapply(Air_New$Reaction,Air_New$gender,FUN = mean)
##   Female     Male 
## 5.664256 5.414155
tapply(Air_c1$Reaction,Air_c1$gender,FUN = mean)
##   Female     Male 
## 7.318357 7.222297
tapply(Air_c2$Reaction,Air_c2$gender,FUN = mean)
##   Female     Male 
## 16.46881 16.54504
  #Civilian and military - New has the lowest reaction mean by civilian or military
tapply(Air_New$Reaction,Air_New$civilianormilitary,FUN = mean)
## Civilian Military 
## 6.176517 4.650772
tapply(Air_c1$Reaction,Air_c1$civilianormilitary,FUN = mean)
## Civilian Military 
## 8.006233 6.250874
tapply(Air_c2$Reaction,Air_c2$civilianormilitary,FUN = mean)
## Civilian Military 
## 17.25446 15.47715

Gender - New has the lowest reaction mean by gender.

Civilian and military - New has the lowest reaction mean by civilian or military

__ Now that we know how the data differs by comparing the means, lets plot the data to display the level of association:__

boxplot(AirTraffic$Reaction~AirTraffic$simulator, main="Reaction per Simulator",
        xlab="Simulator", ylab="Reaction time", col=c(2,3,5))

interaction.plot(AirTraffic$gender,AirTraffic$simulator,AirTraffic$Reaction, 
                 main="Interaction between simulator & gender",
                 ylab="Mean of Reaction", xlab="Gender")

interaction.plot(AirTraffic$expierence,AirTraffic$simulator,AirTraffic$Reaction, 
                 main="Interaction between simulator & Experience",
                 ylab="Mean of Reaction", xlab="Experience")

interaction.plot(AirTraffic$age,AirTraffic$simulator,AirTraffic$Reaction, 
                 main="Interaction between simulator & age",
                 ylab="Mean of Reaction", xlab="Age")

Observations:

Simulation results don’t change based on gender across all three simualtors. Only C2 shows a significant decrease in reaction time for participants whom have more experince. *Reaction time increased significantly for participants above 45 years old. This holds true for all three simulators.

Plot distribution of reaction time and test for skewness, normality, and variance.

library(moments)
plot(density(AirTraffic$Reaction), col=2, lwd=2, lty="dashed")  

agostino.test(AirTraffic$Reaction) 
## 
##  D'Agostino skewness test
## 
## data:  AirTraffic$Reaction
## skew = 0.49073, z = 10.43100, p-value < 2.2e-16
## alternative hypothesis: data have a skewness
shapiro.test(AirTraffic$Reaction) 
## 
##  Shapiro-Wilk normality test
## 
## data:  AirTraffic$Reaction
## W = 0.94254, p-value < 2.2e-16

Failed for both skewness and normality.

Fit ANOVA model to confirm reaction times are in fact different among simulators.

Air_model=anova(lm(AirTraffic$Reaction~AirTraffic$simulator))

Conclusion:

Reject HO of equal variances across groups. Anova confirmed that reaction times highly differ across simulators. The means show to be quite different across each simulator with New having the lowest and C2 the highest. Prescriptive analyses clearly show the new simulator as more effective in terms of response times compared to C1 and c2 across all different demographics.


Question 2

Exam2Q1.xlsx contains data from a study examining the effects of mood on the propensity to accept gambles relative to sure losses or gains. Specifically, participants were given 100$ and then given a choice between (gain frame: keeping 25$ of the 100$ for sure vs. 50% chance to win 100$ more or lose \(50 of the 100\)) and (loss frame: lose 75$ of the 100$ for sure vs. 50% chance to win 100$ or lose \(50 of the 100\)). Each participant encountered each frame (within-subjects) once in random order. In addition, before making their selections they watched videos (between subjects) selected to induce sadness (video of a child crying), happiness (video of a family reuniting), or neutrality (nature video). Appropriately analyze the data and report your findings in a formal summary (worth 4%).

library(readxl)
MoodEffects <- read_excel("C:/Users/Enrique/OneDrive/Documents/HU/ANLY510_Principles7Applicaitons02/Data/MoodEffects.xlsx")

Change the names and add factors for easier analysis.

MoodEffects$mood=replace(MoodEffects$mood,MoodEffects$mood==1,"happy")
MoodEffects$mood=replace(MoodEffects$mood,MoodEffects$mood==3,"normal")
MoodEffects$mood=replace(MoodEffects$mood,MoodEffects$mood==2,"sad")
MoodEffects$mood=factor(x = MoodEffects$mood,levels = c("happy","normal","sad"))
MoodEffects$lossorgain=factor(x = MoodEffects$lossorgain,levels = c("gain","loss"))
str(MoodEffects)
## Classes 'tbl_df', 'tbl' and 'data.frame':    182 obs. of  4 variables:
##  $ subject       : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ mood          : Factor w/ 3 levels "happy","normal",..: 1 3 1 3 1 3 1 3 1 3 ...
##  $ acceptancerate: num  0.2812 0.2812 0.2188 0.0312 0.0625 ...
##  $ lossorgain    : Factor w/ 2 levels "gain","loss": 1 1 1 1 1 1 1 1 1 1 ...

Visualize data to identify any associations prior fitting the model

interaction.plot(MoodEffects$mood,MoodEffects$lossorgain,MoodEffects$acceptancerate,
                 main="Interaction: Acceptance & Mood", xlab="Mood", ylab="Acceptance Rate")

interaction.plot(MoodEffects$lossorgain,MoodEffects$mood,MoodEffects$acceptancerate,
                 main="Interaction: Acceptance & Gain-loss", xlab="Mood", ylab="Acceptance Rate")

Check for skewness and normality.

plot(density(MoodEffects$acceptancerate))

agostino.test(MoodEffects$acceptancerate)
## 
##  D'Agostino skewness test
## 
## data:  MoodEffects$acceptancerate
## skew = 0.89965, z = 4.49720, p-value = 6.886e-06
## alternative hypothesis: data have a skewness
shapiro.test(MoodEffects$acceptancerate)
## 
##  Shapiro-Wilk normality test
## 
## data:  MoodEffects$acceptancerate
## W = 0.8899, p-value = 2.465e-10

Data failed for normality and skewness.

Fit a logistic regression model to display the effects of mood change on acceptance rate, relative to loss or gain.

attach(MoodEffects)
effectmodel=glm(acceptancerate~mood*lossorgain,data = MoodEffects)
anova(effectmodel, test="Chisq")
## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: acceptancerate
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                              181    14.1948              
## mood             2   1.0078       179    13.1870 8.714e-05 ***
## lossorgain       1   3.5031       178     9.6839 7.538e-16 ***
## mood:lossorgain  2   0.1968       176     9.4871    0.1612    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(lme4)
## Warning: package 'lme4' was built under R version 3.4.4
## Loading required package: Matrix
effectmodel=lmer(acceptancerate~mood+(1|lossorgain),data= MoodEffects)
summary(effectmodel)
## Linear mixed model fit by REML ['lmerMod']
## Formula: acceptancerate ~ mood + (1 | lossorgain)
##    Data: MoodEffects
## 
## REML criterion at convergence: 3.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2721 -0.6048 -0.1285  0.5681  2.7552 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  lossorgain (Intercept) 0.0379   0.1947  
##  Residual               0.0544   0.2332  
## Number of obs: 182, groups:  lossorgain, 2
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  0.39338    0.14053   2.799
## moodnormal  -0.11574    0.04297  -2.694
## moodsad     -0.17262    0.04096  -4.214
## 
## Correlation of Fixed Effects:
##            (Intr) mdnrml
## moodnormal -0.132       
## moodsad    -0.139  0.455

Interpretation:

Acceptance rate is 39% more likely to increase if participants are happy taking into account wins/losses.When mood changes from happy to normal and from normal to sad, acceptance rate drops by 11% and 17% respectively.

Verify results of model by making sure residuals are normally distributed,fall close to 0, without patterns.

qqnorm(resid(effectmodel))
qqline(resid(effectmodel))

plot(resid(effectmodel))

Residuals do not show any patterns and are mostly close to 0, thus validating the results of our model.Rsually, non-random patterns in residuals indicate lack of fit from predictors.

Plot effects

library(effects)
## Warning: package 'effects' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(effectmodel, partial.residuals = T))

Conclusion:

Effect plots of the model confirms results of model and aligns with previous visualizations by displaying a gradual drop in acceptance rate as mood changes from happy to sad.

Casinos should focus on keepin customers’ mood at happy or normal levels to maintain acceptance rates high at all times, regardless of wins and losses.