Part 1: In the following questions your task is to provide the appropriate, simplest, and most efficient analysis that could be performed for the scenario given (worth 2% for all).
o For this scenario a Chi-square Goodness of Fit Test would be appropriate, because it compares the observed proportions of the outcomes (heads or tails) from our observations (flipping the coins) with the proportions we expect.
o Because both variables are categorical (student gender, with two levels, and academic major), a Chi-square Test of Independence would be appropriate to examine whether the distribution of academic majors differs between genders.
o Assuming equal variances, an independent-samples t-test can check whether two groups (sections) differ on a continuous variable (test score).
o In this case a Paired t-test would be appropriate, because it compares the dependent variable (health) measured on the same subjects under two conditions (e.g., before and after treatment intake).
o For this case, a One-way ANOVA is appropriate, since it compares a continuous dependent variable (time) across the levels of a categorical variable (machine types).
o An ANOVA could be used to analyze the relationship between categorical predictors with multiple levels (factors that influence consumer perception) and a continuous dependent variable (consumer perception).
o For this scenario, we could use a multiple linear regression, since it tests the effects of a mixture of continuous and categorical predictors on income.
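To make the Part 1 choices concrete, here is a minimal R sketch that runs several of the tests named above (goodness of fit, independent and paired t-tests, one-way ANOVA, multiple regression) on simulated data. All values and variable names (flips, section_a, before, machines, people, and so on) are hypothetical and invented for illustration; none come from the exam scenarios.

```r
# Hypothetical data: every number below is invented for illustration only.
set.seed(510)

# Chi-square goodness of fit: do observed heads/tails match a fair coin?
flips <- c(heads = 58, tails = 42)
gof <- chisq.test(flips, p = c(0.5, 0.5))

# Independent-samples t-test with equal variances: two sections' test scores.
section_a <- rnorm(30, mean = 75, sd = 8)
section_b <- rnorm(30, mean = 80, sd = 8)
ind_t <- t.test(section_a, section_b, var.equal = TRUE)

# Paired t-test: the same subjects measured before and after a treatment.
before <- rnorm(25, mean = 60, sd = 10)
after  <- before + rnorm(25, mean = 3, sd = 5)
pair_t <- t.test(after, before, paired = TRUE)

# One-way ANOVA: completion time across three machine types.
machines <- data.frame(
  machine = factor(rep(c("A", "B", "C"), each = 20)),
  time    = c(rnorm(20, 10), rnorm(20, 11), rnorm(20, 12))
)
one_way <- aov(time ~ machine, data = machines)

# Multiple regression: one continuous and one categorical predictor of income.
people <- data.frame(
  years_edu = rnorm(100, mean = 14, sd = 2),
  sector    = factor(sample(c("public", "private"), 100, replace = TRUE),
                     levels = c("public", "private"))
)
people$income <- 2000 * people$years_edu +
  5000 * (people$sector == "private") + rnorm(100, sd = 8000)
reg <- lm(income ~ years_edu + sector, data = people)

print(gof)
summary(one_way)
summary(reg)
```

Each call returns an object whose p-value (or ANOVA/regression table) answers the corresponding scenario question; with real data only the inputs change.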
Part 2: At this stage in the course you have several analysis techniques that can be used somewhat interchangeably (ANOVA ~ Regression). Thus, while the following questions have optimal ways to proceed, there may be several “right” ways to proceed. While you will not lose points for using an analysis that I would not necessarily use (assuming you perform the analysis correctly and get the correct answer), I would suggest using the simplest analysis you can for the problem presented (read the description of the study and look at the data closely).
library(readxl)
AirTrafficData= read_excel("C:/Users/jcolu/OneDrive/Documents/Harrisburg/Summer 2018/ANLY 510/Exam2Q1.xlsx")
AirTrafficData
Initial data set analysis, using str() to summarize the structure.
str(AirTrafficData)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3000 obs. of 8 variables:
## $ subject : num 1 1 1 2 2 2 3 3 3 4 ...
## $ age : num 32 32 32 31 31 31 36 36 36 32 ...
## $ expierence : num 7 7 7 6 6 6 20 20 20 7 ...
## $ simulator : chr "C1" "C2" "NEW" "C1" ...
## $ rt : num 11.1 18.58 6.76 12.18 15.38 ...
## $ gender : chr "Female" "Female" "Female" "Female" ...
## $ airport : chr "PDX" "PDX" "PDX" "PDX" ...
## $ civilianormilitary: num 0 0 0 0 0 0 0 0 0 0 ...
Assign labels to the civilian/military indicator (0 = Civilian, 1 = Military).
AirTrafficData$civilianormilitary=replace(AirTrafficData$civilianormilitary,AirTrafficData$civilianormilitary==0, "Civilian")
AirTrafficData$civilianormilitary=replace(AirTrafficData$civilianormilitary,AirTrafficData$civilianormilitary==1, "Military")
str(AirTrafficData)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3000 obs. of 8 variables:
## $ subject : num 1 1 1 2 2 2 3 3 3 4 ...
## $ age : num 32 32 32 31 31 31 36 36 36 32 ...
## $ expierence : num 7 7 7 6 6 6 20 20 20 7 ...
## $ simulator : chr "C1" "C2" "NEW" "C1" ...
## $ rt : num 11.1 18.58 6.76 12.18 15.38 ...
## $ gender : chr "Female" "Female" "Female" "Female" ...
## $ airport : chr "PDX" "PDX" "PDX" "PDX" ...
## $ civilianormilitary: chr "Civilian" "Civilian" "Civilian" "Civilian" ...
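The two replace() calls above work because the first one coerces the column to character, so the second comparison is effectively "1" == "1". A single factor() call is a common alternative that maps codes to labels in one step and fixes the level order. The sketch below uses a hypothetical toy vector (status_code) rather than the exam data; note that applying it to AirTrafficData would give a factor column instead of the chr column shown by str().

```r
# Toy 0/1 indicator standing in for civilianormilitary (hypothetical values).
status_code <- c(0, 0, 1, 0, 1, 1)

# factor() maps each code to its label and keeps "Civilian" as the first level.
status <- factor(status_code, levels = c(0, 1), labels = c("Civilian", "Military"))

print(levels(status))
print(table(status))
```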
Analyze the means of the reaction times.
tapply(X=AirTrafficData$rt, INDEX = AirTrafficData$simulator, FUN = mean)
## C1 C2 NEW
## 7.268982 16.507990 5.535704
The mean reaction time with the new simulator (5.54) is lower, i.e. faster, than with C1 (7.27) or C2 (16.51).
tapply(X=AirTrafficData$rt, INDEX = AirTrafficData$civilianormilitary, FUN = mean)
## Civilian Military
## 10.479070 8.792932
Civilians have a slower mean reaction time (10.48) than military participants (8.79).
tapply(X=AirTrafficData$rt, INDEX = AirTrafficData$gender, FUN = mean)
## Female Male
## 9.817140 9.727163
The means are very similar (9.82 vs. 9.73); men are only marginally faster on average.
Comparison of mean reaction times within each simulator
New_Simulator = subset(AirTrafficData, AirTrafficData$simulator=="NEW", select = c("subject","age","expierence","rt","gender","airport","civilianormilitary"))
tapply(New_Simulator$rt,New_Simulator$gender,FUN = mean)
## Female Male
## 5.664256 5.414155
tapply(New_Simulator$rt,New_Simulator$civilianormilitary,FUN = mean)
## Civilian Military
## 6.176517 4.650772
C1_Simulator = subset(AirTrafficData, AirTrafficData$simulator=="C1", select = c("subject","age","expierence","rt","gender","airport","civilianormilitary"))
tapply(C1_Simulator$rt,C1_Simulator$gender,FUN = mean)
## Female Male
## 7.318357 7.222297
tapply(C1_Simulator$rt,C1_Simulator$civilianormilitary,FUN = mean)
## Civilian Military
## 8.006233 6.250874
C2_Simulator = subset(AirTrafficData, AirTrafficData$simulator=="C2", select = c("subject","age","expierence","rt","gender","airport","civilianormilitary"))
tapply(C2_Simulator$rt,C2_Simulator$gender,FUN = mean)
## Female Male
## 16.46881 16.54504
tapply(C2_Simulator$rt,C2_Simulator$civilianormilitary,FUN = mean)
## Civilian Military
## 17.25446 15.47715
Within both genders and within both civilian and military groups, the new simulator has the lowest mean reaction time; military participants are also faster than civilians on every simulator.
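The three subset() and tapply() blocks above can also be collapsed into one call: passing a list of grouping variables to tapply() returns a table of cell means. This is a sketch on a small hypothetical data frame (toy) that reuses the AirTrafficData column names; the numbers are invented.

```r
# Hypothetical miniature data set with the same column names as AirTrafficData.
toy <- data.frame(
  simulator = rep(c("C1", "C2", "NEW"), times = 4),
  gender    = rep(c("Female", "Male"), each = 6),
  rt        = c(7, 16, 5, 8, 18, 6, 6, 17, 4, 9, 15, 5)
)

# One tapply() call with a list of factors gives a simulator x gender table of means.
cell_means <- tapply(toy$rt, list(toy$simulator, toy$gender), mean)
print(cell_means)

# aggregate() returns the same means in long format.
print(aggregate(rt ~ simulator + gender, data = toy, FUN = mean))
```

Adding civilianormilitary as a third element of the list would return a three-way array of means.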
Plot the data for visualization.
interaction.plot(AirTrafficData$expierence,AirTrafficData$simulator,AirTrafficData$rt, main = "Reaction times according to simulator and Experience")
interaction.plot(AirTrafficData$gender,AirTrafficData$simulator,AirTrafficData$rt, main = "Reaction times according to simulator and Gender")
interaction.plot(AirTrafficData$age,AirTrafficData$simulator,AirTrafficData$rt, main = "Reaction times according to simulator and Age")
boxplot(AirTrafficData$rt~AirTrafficData$simulator,main= "Reaction times per Simulator")
Although reaction times are generally faster with the new simulator, the plots show no clear differences across the other variables (experience, gender, age).
Conduct normality and skewness tests.
library(moments)
agostino.test(AirTrafficData$rt)
##
## D'Agostino skewness test
##
## data: AirTrafficData$rt
## skew = 0.49073, z = 10.43100, p-value < 2.2e-16
## alternative hypothesis: data have a skewness
shapiro.test(AirTrafficData$rt)
##
## Shapiro-Wilk normality test
##
## data: AirTrafficData$rt
## W = 0.94254, p-value < 2.2e-16
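The reaction times fail both tests: they are significantly right-skewed (skew = 0.49, p < 2.2e-16) and depart from normality (Shapiro-Wilk W = 0.943, p < 2.2e-16).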
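The skew value in the D'Agostino output is, as far as I understand the moments package, the moment-based sample skewness m3 / m2^(3/2), where mk is the k-th central moment. The hypothetical helper below (sample_skewness) computes that estimator in base R so the sign and size of the reported value can be interpreted; it is a sketch, not the package's own code.

```r
# Moment-based sample skewness: m3 / m2^(3/2), mk = mean of (x - mean(x))^k.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)
  m3 <- mean(centered^3)
  m3 / m2^(3 / 2)
}

# One large value creates a right tail (positive skew); a symmetric vector gives 0.
right_tailed <- c(1, 2, 3, 4, 10)
symmetric <- c(1, 2, 3, 4, 5)
print(sample_skewness(right_tailed))
print(sample_skewness(symmetric))
```

A positive value such as the 0.49 above indicates a longer right tail, which is typical of reaction times bounded below by zero.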
Mood= read_excel("C:/Users/jcolu/OneDrive/Documents/Harrisburg/Summer 2018/ANLY 510/Exam2Q2.xlsx")
Mood
Initial data set analysis, using str() to summarize the structure.
str(Mood)
## Classes 'tbl_df', 'tbl' and 'data.frame': 182 obs. of 4 variables:
## $ subject : num 1 2 3 4 5 6 7 8 9 10 ...
## $ mood : num 1 2 1 2 1 2 1 2 1 2 ...
## $ acceptancerate: num 0.2812 0.2812 0.2188 0.0312 0.0625 ...
## $ lossorgain : chr "gain" "gain" "gain" "gain" ...
Recode mood as a labeled factor (1 = happy, 2 = sad, 3 = indifferent) and convert lossorgain to a factor.
Mood$mood=replace(Mood$mood,Mood$mood==1, "happy")
Mood$mood=replace(Mood$mood,Mood$mood==3, "indiferent")
Mood$mood=replace(Mood$mood,Mood$mood==2, "sad")
Mood$mood=factor(x=Mood$mood, levels = c("happy","indiferent","sad"))
Mood$lossorgain=factor(x=Mood$lossorgain,levels = c("gain","loss"))
str(Mood)
## Classes 'tbl_df', 'tbl' and 'data.frame': 182 obs. of 4 variables:
## $ subject : num 1 2 3 4 5 6 7 8 9 10 ...
## $ mood : Factor w/ 3 levels "happy","indiferent",..: 1 3 1 3 1 3 1 3 1 3 ...
## $ acceptancerate: num 0.2812 0.2812 0.2188 0.0312 0.0625 ...
## $ lossorgain : Factor w/ 2 levels "gain","loss": 1 1 1 1 1 1 1 1 1 1 ...
Visualize data
interaction.plot(Mood$mood,Mood$lossorgain,Mood$acceptancerate,main="Interaction between Acceptance & Mood")
interaction.plot(Mood$lossorgain, Mood$mood,Mood$acceptancerate, main = "Interaction between Acceptance rate & Gain/Loss")
Conduct normality and skewness tests.
agostino.test(Mood$acceptancerate)
##
## D'Agostino skewness test
##
## data: Mood$acceptancerate
## skew = 0.89965, z = 4.49720, p-value = 6.886e-06
## alternative hypothesis: data have a skewness
shapiro.test(Mood$acceptancerate)
##
## Shapiro-Wilk normality test
##
## data: Mood$acceptancerate
## W = 0.8899, p-value = 2.465e-10
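The acceptance rates also fail both tests: they are significantly right-skewed (skew = 0.90, p = 6.9e-06) and non-normal (Shapiro-Wilk W = 0.890, p = 2.5e-10).
Linear model of acceptance rate (glm() without a family argument uses the gaussian family, so this is an ordinary linear regression, not a logistic regression)
LRM = glm(acceptancerate~mood*lossorgain,data = Mood)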
summary(LRM)
##
## Call:
## glm(formula = acceptancerate ~ mood * lossorgain, data = Mood)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.57261 -0.14580 -0.04136 0.13144 0.67641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.21415 0.03982 5.378 2.37e-07 ***
## moodindiferent -0.06512 0.06049 -1.077 0.2832
## moodsad -0.09621 0.05766 -1.669 0.0970 .
## lossorgainloss 0.35846 0.05631 6.366 1.63e-09 ***
## moodindiferent:lossorgainloss -0.10124 0.08554 -1.184 0.2382
## moodsad:lossorgainloss -0.15281 0.08154 -1.874 0.0626 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.05390424)
##
## Null deviance: 14.1948 on 181 degrees of freedom
## Residual deviance: 9.4871 on 176 degrees of freedom
## AIC: -7.1469
##
## Number of Fisher Scoring iterations: 2
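If a true logistic regression is wanted, glm() needs family = binomial and, for a proportion response, the number of trials as weights. The sketch below assumes 32 offers per subject, an ASSUMPTION inferred only from the rates being multiples of 1/32 (e.g. 0.03125, 0.21875); the data frame sim is simulated and hypothetical.

```r
# Hypothetical data. ASSUMPTION: each rate is successes out of 32 trials.
set.seed(2018)
n_trials <- 32
sim <- expand.grid(mood = c("happy", "indiferent", "sad"),
                   lossorgain = c("gain", "loss"),
                   rep = 1:10)
sim$mood <- factor(sim$mood, levels = c("happy", "indiferent", "sad"))
sim$lossorgain <- factor(sim$lossorgain, levels = c("gain", "loss"))
sim$trials <- n_trials
true_p <- ifelse(sim$lossorgain == "loss", 0.55, 0.20)
sim$acceptancerate <- rbinom(nrow(sim), size = n_trials, prob = true_p) / n_trials

# Binomial GLM on a proportion: weights give the number of trials behind each rate.
logit_fit <- glm(acceptancerate ~ mood * lossorgain, family = binomial,
                 weights = trials, data = sim)
summary(logit_fit)

# Coefficients are on the log-odds scale; exp() converts them to odds ratios.
print(exp(coef(logit_fit)))
```

If the residual deviance is much larger than its degrees of freedom, family = quasibinomial is a common way to allow for overdispersion.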
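Analysis of deviance table (sequential tests of each term)
anova(LRM, test = "F") # F tests suit the gaussian family, whose dispersion is estimated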
Plot Model Residuals
plot(resid(LRM))
qqnorm(resid(LRM))
qqline(resid(LRM)) # reference line through the quartiles
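The normality assumption of a linear model concerns the residuals, not the raw response, so the earlier Shapiro-Wilk tests on rt and acceptancerate are only a rough guide. The sketch below fits a hypothetical two-group model (group, y, fit are invented names and simulated data) and tests the residuals directly; the raw outcome mixes two groups with different means, so it need not look normal even when the residuals do.

```r
# Simulated stand-in for a fitted model: two groups with different means.
set.seed(1)
group <- factor(rep(c("a", "b"), each = 50))
y <- ifelse(group == "b", 3, 0) + rnorm(100)
fit <- lm(y ~ group)

# Test the residuals, which is what the model's normality assumption refers to.
res <- resid(fit)
print(shapiro.test(res))

# Residual QQ plot with a reference line.
qqnorm(res)
qqline(res)
```

For LRM the same idea is shapiro.test(resid(LRM)).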
Plot Effects
library(effects)
## Warning: package 'effects' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(LRM, partial.residuals = TRUE))