setwd("D:/stat")
getwd()
## [1] "D:/stat"
library(readxl)
## Warning: package 'readxl' was built under R version 4.2.3
Data<-read_excel("D:/stat/DataFinalExam.xlsx")
Data
## # A tibble: 163 × 33
##      Age Gender Course T…¹   In1   In2   In3   In4   In5   In6   In7   In8   Ex1
##    <dbl> <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    22 Female BS Mathem…     4     3     2     1     4     7     6     7     4
##  2    23 Female BS Biology     6     6     4     4     4     5     4     7     4
##  3    20 Female BSED Engl…     5     5     3     3     2     6     5     7     5
##  4    22 Female BSED Biol…     4     5     4     3     3     6     6     7     5
##  5    23 Male   BSED Engl…     7     6     5     5     4     6     4     7     7
##  6    22 Female BSED Biol…     6     6     6     6     6     7     7     7     7
##  7    20 Male   BS Civil …     4     5     6     2     5     7     4     1     7
##  8    21 Female BS Electr…     5     6     5     6     5     7     6     7     7
##  9    21 Female BS Mathem…     6     7     5     5     5     7     7     7     7
## 10    22 Male   BS Biology     6     7     5     6     7     7     7     7     5
## # … with 153 more rows, 21 more variables: Ex2 <dbl>, Ex3 <dbl>, Ex4 <dbl>,
## #   Ex5 <dbl>, Ex6 <dbl>, Ex7 <dbl>, Ex8 <dbl>, Ex9 <dbl>, Ex10 <dbl>,
## #   Ex11 <dbl>, TP1 <dbl>, TP2 <dbl>, TP3 <dbl>, TP4 <dbl>, TP5 <dbl>,
## #   T6 <dbl>, CP1 <dbl>, CP2 <dbl>, CP3 <dbl>, CP4 <dbl>, CP5 <dbl>, and
## #   abbreviated variable name ¹​`Course Taken`
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(rmarkdown)

1.1 First output:

library(dplyr)
Data %>%
  group_by(`Course Taken`) %>%
  summarise(Frequency = n(), "Mean Age"=mean(Age), "SD of age"= sd(Age))
## # A tibble: 6 × 4
##   `Course Taken`            Frequency `Mean Age` `SD of age`
##   <chr>                         <int>      <dbl>       <dbl>
## 1 BS Biology                       33       21.6       0.751
## 2 BS Civil Engineering             16       21.4       0.727
## 3 BS Electrical Engineering        17       21.6       0.618
## 4 BS Mathematics                   33       21.7       0.924
## 5 BSED Biology                     32       21.5       0.803
## 6 BSED English                     32       21.6       0.878

1.2 Second output:

Data2<-Data%>%
  group_by(`Course Taken`)%>%
  summarise(Frequency = n(),
            'Mean Intrinsic4' = mean(In4),
            'Mean Extrinsic3' = mean(Ex3),
            'Mean TP3' = mean(TP3),
            'Mean CP3' = mean(CP3))
Data2
## # A tibble: 6 × 6
##   `Course Taken`            Frequency `Mean Intrinsic4` Mean E…¹ Mean …² Mean …³
##   <chr>                         <int>             <dbl>    <dbl>   <dbl>   <dbl>
## 1 BS Biology                       33              4.94     5.27    3.88   NA   
## 2 BS Civil Engineering             16              4.06     5.5     3.31    4.38
## 3 BS Electrical Engineering        17              4.35     5       3.47    3.71
## 4 BS Mathematics                   33              4.27     5.39    3.55    3.52
## 5 BSED Biology                     32              4.34     5.22    3.25    3.62
## 6 BSED English                     32              4.19     5.66    3.91    3.03
## # … with abbreviated variable names ¹​`Mean Extrinsic3`, ²​`Mean TP3`,
## #   ³​`Mean CP3`
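
The NA under Mean CP3 for BS Biology suggests that at least one CP3 response is missing in that course, because mean() returns NA as soon as any input is NA. The following is a minimal sketch on a made-up toy tibble (the values and the names toy and toy_summary are assumptions, not the exam data) showing how na.rm = TRUE would average only the available responses, together with a count of the missing ones.

```r
library(dplyr)

# Toy data (hypothetical values, not the exam data): one CP3 response is missing.
toy <- tibble(
  `Course Taken` = c("A", "A", "B", "B"),
  CP3 = c(4, NA, 3, 5)
)

toy_summary <- toy %>%
  group_by(`Course Taken`) %>%
  summarise(
    `Mean CP3 (default)` = mean(CP3),               # NA if any value is missing
    `Mean CP3 (na.rm)`   = mean(CP3, na.rm = TRUE), # missing values dropped first
    `Missing CP3`        = sum(is.na(CP3))          # how many values were dropped
  )
toy_summary
```

Whether dropping missing responses is appropriate depends on the exam's instructions, so reporting the number of missing values next to the mean is the safer choice.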

Recode the responses in variables In3 and In4 as follows: 1 for “Strongly Disagree”, 2 for “Disagree”, 3 for “Moderately Disagree”, 4 for “Neutral”, 5 for “Moderately Agree”, 6 for “Agree”, and 7 for “Strongly Agree”.
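
The code under 2.1 reads from Data3 with columns In3recode and In4recode, but the recoding step itself is not shown. One possible way to do it, sketched here on a toy tibble, is a lookup vector indexed by the numeric response. This is an assumption about how Data3 could be built, not the code that was actually used; likert_labels, toy, and toy_recoded are illustrative names.

```r
library(dplyr)

# Labels in scale order, so likert_labels[k] is the label for response k.
likert_labels <- c("Strongly Disagree", "Disagree", "Moderately Disagree",
                   "Neutral", "Moderately Agree", "Agree", "Strongly Agree")

# Toy stand-in for Data (hypothetical values, not the exam data).
toy <- tibble(In3 = c(2, 7, 4), In4 = c(1, 4, 6))

# Index the label vector with the numeric responses 1 to 7.
toy_recoded <- toy %>%
  mutate(In3recode = likert_labels[In3],
         In4recode = likert_labels[In4])
toy_recoded
```

Applied to the real data, the same mutate() call on Data would yield an object that could play the role of Data3.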

2.1 Answer the following:

Data4<-Data3%>%
  group_by(In3recode, In4recode) %>%
  summarise(count=n())
## `summarise()` has grouped output by 'In3recode'. You can override using the
## `.groups` argument.
Data4
## # A tibble: 33 × 3
## # Groups:   In3recode [7]
##    In3recode In4recode           count
##    <chr>     <chr>               <int>
##  1 Agree     Agree                  12
##  2 Agree     Disagree                2
##  3 Agree     Moderately Agree        9
##  4 Agree     Neutral                 7
##  5 Agree     Strongly Agree          3
##  6 Disagree  Disagree                3
##  7 Disagree  Moderately Agree        3
##  8 Disagree  Moderately Disagree     1
##  9 Disagree  Neutral                 1
## 10 Disagree  Strongly Disagree       2
## # … with 23 more rows

a. How many observations are Strongly Agree in variable In3 and at the same time Moderately Disagree in variable In4?

Answer: No observation is Strongly Agree in In3 and at the same time Moderately Disagree in In4.

b. How many observations are Strongly Agree in variable In3 and at the same time Neutral in variable In4?

Answer: There is only … observation that is Strongly Agree in In3 and at the same time Neutral in In4.
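
Because the printed tibble shows only its first 10 rows, the two combinations asked about are not visible in the output. A hedged sketch of how single cells could be read off directly is given below, using base R's table() on factors that carry all seven levels, so that a combination with no observations shows up as 0 rather than being absent. The toy vectors in3 and in4 are made-up values, not the exam data.

```r
likert_labels <- c("Strongly Disagree", "Disagree", "Moderately Disagree",
                   "Neutral", "Moderately Agree", "Agree", "Strongly Agree")

# Toy recoded responses (hypothetical values, not the exam data).
in3 <- factor(c("Strongly Agree", "Strongly Agree", "Agree", "Neutral"),
              levels = likert_labels)
in4 <- factor(c("Neutral", "Agree", "Neutral", "Neutral"),
              levels = likert_labels)

# Full 7 x 7 cross-tabulation; empty combinations are kept as zeros.
cross <- table(In3recode = in3, In4recode = in4)

# Read off individual cells by row (In3) and column (In4) label.
n_sa_neutral <- cross["Strongly Agree", "Neutral"]
n_sa_moddis  <- cross["Strongly Agree", "Moderately Disagree"]
n_sa_neutral
n_sa_moddis
```

With the real data, the same indexing on a table of the recoded In3 and In4 columns would give the counts for questions a and b.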

3. Consider the following:

Make a new variable named “InAverage”, defined as the average of the responses in the variables In1, In2, In3, In4, and In5.

Data<-Data%>%
  mutate(InAverage = (In1+In2+In3+In4+In5)/5)

Make two groups of the variable “Course Taken”

Grouping:

Group 1 with courses: BS Civil Engineering, BS Electrical Engineering, and BS Mathematics

Group 2: BS Biology, BSED Biology, and BSED English

Data<-Data %>%
  mutate(CTrecode = recode(`Course Taken`,
                "BS Civil Engineering" = "1", 
                "BS Electrical Engineering"= "2", 
                "BS Mathematics" =  "3", 
                "BS Biology" = "4" ,
                "BSED Biology" = "5",
                "BSED English" = "6" ))
# CTrecode is a character column, so convert it to integer before comparing with 4
Data1<-Data%>%
  mutate(CourseTakenGroup = ifelse(as.integer(CTrecode) < 4, "Group1", "Group2"))
Data1
## # A tibble: 163 × 36
##      Age Gender Course T…¹   In1   In2   In3   In4   In5   In6   In7   In8   Ex1
##    <dbl> <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    22 Female BS Mathem…     4     3     2     1     4     7     6     7     4
##  2    23 Female BS Biology     6     6     4     4     4     5     4     7     4
##  3    20 Female BSED Engl…     5     5     3     3     2     6     5     7     5
##  4    22 Female BSED Biol…     4     5     4     3     3     6     6     7     5
##  5    23 Male   BSED Engl…     7     6     5     5     4     6     4     7     7
##  6    22 Female BSED Biol…     6     6     6     6     6     7     7     7     7
##  7    20 Male   BS Civil …     4     5     6     2     5     7     4     1     7
##  8    21 Female BS Electr…     5     6     5     6     5     7     6     7     7
##  9    21 Female BS Mathem…     6     7     5     5     5     7     7     7     7
## 10    22 Male   BS Biology     6     7     5     6     7     7     7     7     5
## # … with 153 more rows, 24 more variables: Ex2 <dbl>, Ex3 <dbl>, Ex4 <dbl>,
## #   Ex5 <dbl>, Ex6 <dbl>, Ex7 <dbl>, Ex8 <dbl>, Ex9 <dbl>, Ex10 <dbl>,
## #   Ex11 <dbl>, TP1 <dbl>, TP2 <dbl>, TP3 <dbl>, TP4 <dbl>, TP5 <dbl>,
## #   T6 <dbl>, CP1 <dbl>, CP2 <dbl>, CP3 <dbl>, CP4 <dbl>, CP5 <dbl>,
## #   InAverage <dbl>, CTrecode <chr>, CourseTakenGroup <chr>, and abbreviated
## #   variable name ¹​`Course Taken`
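
The two-step recoding above goes through the intermediate codes "1" to "6". As an alternative sketch, the grouping could also be derived directly from the course names with %in%, which avoids comparing codes altogether. The toy tibble and the name group1_courses are assumptions for illustration; the course names are those listed under Group 1.

```r
library(dplyr)

# Courses that belong to Group 1; every other course falls into Group 2.
group1_courses <- c("BS Civil Engineering",
                    "BS Electrical Engineering",
                    "BS Mathematics")

# Toy stand-in for Data (hypothetical rows, not the exam data).
toy <- tibble(`Course Taken` = c("BS Mathematics", "BS Biology",
                                 "BSED English", "BS Civil Engineering"))

toy_grouped <- toy %>%
  mutate(CourseTakenGroup = ifelse(`Course Taken` %in% group1_courses,
                                   "Group1", "Group2"))
toy_grouped
```

This version assumes that the only courses outside Group 1 are the three Group 2 courses, which matches the six courses shown in the frequency output.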

3.1 Is there a significant difference between the two groups of course taken in terms of the variable “InAverage”? (Note: 1.1 Check first the equality of variances 1.2 Answer this by including all the steps in hypothesis testing)

table(Data1$CourseTakenGroup)
## 
## Group1 Group2 
##     66     97

Checking the equality of variances

var.test(Data1$InAverage~Data1$CourseTakenGroup)
## 
##  F test to compare two variances
## 
## data:  Data1$InAverage by Data1$CourseTakenGroup
## F = 0.763, num df = 65, denom df = 96, p-value = 0.2458
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4919799 1.2067147
## sample estimates:
## ratio of variances 
##           0.762995

Interpretation: The F test gives p-value = 0.2458, which is greater than alpha = 0.05, so the hypothesis of equal variances is not rejected. The t test is therefore run with var.equal = TRUE.

Hypotheses:

Alternative: There is a significant difference between the two groups of course taken in terms of the variable “InAverage”

Null: There is no significant difference between the two groups of course taken in terms of the variable “InAverage”

t.test(Data1$InAverage~Data1$CourseTakenGroup, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  Data1$InAverage by Data1$CourseTakenGroup
## t = -0.050692, df = 161, p-value = 0.9596
## alternative hypothesis: true difference in means between group Group1 and group Group2 is not equal to 0
## 95 percent confidence interval:
##  -0.3582544  0.3403225
## sample estimates:
## mean in group Group1 mean in group Group2 
##             4.784848             4.793814

Interpretation: The t test gives p-value = 0.9596, which is greater than alpha = 0.05 (95% confidence level). Therefore, we fail to reject the null hypothesis: there is no significant difference between the two groups of course taken in terms of the variable “InAverage”.
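
The steps of this test (check the variances, choose the t test accordingly, compare the p-value with alpha) can be sketched in code as follows. The data are simulated, so the numbers will not match the output above; toy, score, group, and decision are illustrative names. The sketch also uses the data = argument of the formula interface instead of repeating Data1$ in the formula.

```r
set.seed(1)

# Simulated stand-in for Data1 (hypothetical values, not the exam data).
toy <- data.frame(
  score = c(rnorm(30, mean = 4.8), rnorm(40, mean = 4.8)),
  group = rep(c("Group1", "Group2"), times = c(30, 40))
)

alpha <- 0.05

# Step 1: F test for equality of variances.
f_res <- var.test(score ~ group, data = toy)
equal_var <- f_res$p.value > alpha  # TRUE means equal variances are not rejected

# Step 2: two-sample t test, pooled only if the variances look equal.
t_res <- t.test(score ~ group, data = toy, var.equal = equal_var)

# Step 3: decision at the chosen significance level.
decision <- if (t_res$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```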

4. Is there a significant difference among the courses taken in terms of the variable “InAverage” (refer to Q3 for the “InAverage” variable)? (Note: 1.1 Answer this by using F-test 1.2 Answer this by including all the steps in hypothesis testing 1.3 Provide results for pairwise comparison if significant)

1.2 Answer this by including all the steps in hypothesis testing

Hypotheses:

Alternative: There is a significant difference among the courses taken in terms of the variable “InAverage”

Null: There is no significant difference among the courses taken in terms of the variable “InAverage”

table(Data$`Course Taken`)
## 
##                BS Biology      BS Civil Engineering BS Electrical Engineering 
##                        33                        16                        17 
##            BS Mathematics              BSED Biology              BSED English 
##                        33                        32                        32
res.aov <- aov(InAverage ~ `Course Taken`, data = Data)
summary(res.aov)
##                 Df Sum Sq Mean Sq F value Pr(>F)
## `Course Taken`   5   1.96  0.3913   0.314  0.904
## Residuals      157 195.87  1.2476

Interpretation: The F test gives p-value = 0.904, which is greater than alpha = 0.05 (95% confidence level). Therefore, we fail to reject the null hypothesis: there is no significant difference among the courses taken in terms of the variable “InAverage”. Since the result is not significant, no pairwise comparison is needed.
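
For reference, the rule "provide pairwise comparisons only if significant" can be sketched as below. The sketch assumes Tukey's HSD as the pairwise procedure, which the exam does not specify, and it runs on simulated data with illustrative names (toy, score, course), so its output is unrelated to the results above.

```r
set.seed(2)

# Simulated stand-in for Data (hypothetical values, not the exam data).
toy <- data.frame(
  score  = rnorm(60, mean = 4.8),
  course = factor(rep(c("A", "B", "C"), each = 20))
)

alpha <- 0.05

# One-way ANOVA (F test) of score across courses.
fit <- aov(score ~ course, data = toy)

# p-value of the course effect, taken from the ANOVA table.
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

if (p_value < alpha) {
  # Pairwise comparisons only when the overall F test is significant.
  print(TukeyHSD(fit))
} else {
  message("No significant difference; pairwise comparison is not needed.")
}
```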