Economics and Student Affairs Project

Research question: is there statistically significant evidence that responses to the question “Have your expectations for studying Economics at UVic been met?” differ by the respondent’s choice of department? We also ask whether the distribution of responses varies considerably between the different study hours categories.

There are three types of test we are able to do.

  1. Regression

Since we now have a set of new binary variables, we are able to fit a logistic regression and identify which predictors are important for a student’s choice of whether to use a service.

One example is which factors are related to whether a student uses the academic advising service. Please see example one: whether a student uses Academic Advising.

  2. Tests for Likert scale data

Example two: Do Males and Females answer differently?

Example three: Do scoring tendencies differ by country?

Scoring tendencies are calculated using a Likert scale. For example, in this case, we assign 1 to “not helpful”, 2 to “somewhat not helpful”, 3 to “neutral”, 4 to “somewhat helpful”, 5 to “helpful”, and 0 to “no basis to judge”.
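The coding described above can be sketched in R as follows. This is a minimal illustration rather than the code used for the report: the `responses` vector is made up, and the exact spelling of the labels in the survey export is an assumption.

```r
# Likert labels in scoring order: position 1 maps to 0, position 6 maps to 5.
likert_levels <- c("no basis to judge", "not helpful", "somewhat not helpful",
                   "neutral", "somewhat helpful", "helpful")

# Hypothetical responses, for illustration only.
responses <- c("helpful", "neutral", "no basis to judge", "somewhat helpful")

# match() returns the position of each response in likert_levels (1 to 6);
# subtracting 1 gives the scores 0 to 5 described in the text.
scores <- match(responses, likert_levels) - 1L
print(scores)
```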

  3. Tests of correlation between two variables

Example four: Are Expectation and Faculty to Choose correlated?

Resource Usage

Different resource usage

This diagram shows whether students use each type of service. The idea is from Susan, Tricia and David. We dichotomize questions 34 to 54 to generate binary variables indicating whether a student uses each service. We code “no basis to judge” and “NA” as “FALSE” and all other responses as “TRUE”.

Different attitudes towards services
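A minimal sketch of this dichotomization is shown below. The `q34` vector and the helper name `used_service` are hypothetical, and the label text is assumed to match the survey export exactly.

```r
# Hypothetical answers to one service question (labels are assumed).
q34 <- c("helpful", NA, "no basis to judge", "neutral", "not helpful")

# TRUE when the student rated the service,
# FALSE for NA or "no basis to judge".
used_service <- function(x) !is.na(x) & x != "no basis to judge"

AcademicAdvising <- used_service(q34)
print(AcademicAdvising)
```

The same helper could be applied to each of the question columns in turn, for example with `lapply()`, to build the full set of binary variables.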

Regression: response (explained) and predictor (Explanatory)

Example one:

We are able to run logistic regression using those binary variables.

One example is whether a student uses Academic Advising. The explanatory variables are:

“Q63factoCareer”

“Q65OpportunityEmployment”

“Q66expectationsMet”

“Q67studentType”

“Q71gender”

“Q72country”

“Q75live”

We should be able to add more explanatory variables. Please let us know which explanatory variables you think we should include in the regression.

names(ECONSR1)[c(1, 83, 85, 91, 92, 100, 102, 112, 116)]
## [1] "RespondentId"             "Q63factoCareer"          
## [3] "Q65OpportunityEmployment" "Q66expectationsMet"      
## [5] "Q67studentType"           "Q71gender"               
## [7] "Q72country"               "Q75live"                 
## [9] "AcademicAdvising"

glm(formula = AcademicAdvising ~ ., family = "binomial", data = reg1)
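For readers who want to try the same kind of call without the survey data, here is a sketch on simulated data. Everything in it is an assumption made for illustration: the data frame `toy`, its column names, the category labels and the simulated probabilities are not taken from the survey.

```r
set.seed(1)
n <- 200

# Simulated predictors (hypothetical names and categories).
toy <- data.frame(
  student_type = factor(sample(c("Direct entry", "Transfer", "Exchange"),
                               n, replace = TRUE)),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# Simulated response: transfer students use the service less often.
p_use <- ifelse(toy$student_type == "Transfer", 0.3, 0.6)
toy$used_advising <- rbinom(n, 1, p_use) == 1

# Logistic regression of the binary response on all other columns.
fit <- glm(used_advising ~ ., family = binomial, data = toy)

# Coefficient table: estimate, standard error, z value and p-value.
coef_table <- summary(fit)$coefficients
print(round(coef_table, 4))
```

With a formula of the form `response ~ .`, every other column in the data frame enters as a predictor, so an identifier column should be dropped from the data before fitting.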

The result shows that three variables are significant at the 10% level; of these, only student type (Q67) is also significant at the 5% level.

  1. Opportunity for long-term employment in Canada (Q65)

  2. Whether expectations were met (Q66)

  3. Student type (Q67)

Coefficients:
                                                                            Estimate  Std. Error  z value  Pr(>|z|)
Q65 Opportunity Employment: Opportunity for long-term employment in Canada    2.1407      1.1208    1.910    0.0561 .
Q66 expectations Met: Somewhat agree                                          1.9583      1.0835    1.807    0.0707 .
Q66 expectations Met: Somewhat disagree                                      -3.8307      2.0739   -1.847    0.0647 .
Q67 student Type: As a Transfer student                                      -3.4949      1.4127   -2.474    0.0134 *
Q67 student Type: As an Exchange student                                     -3.8738      1.8962   -2.043    0.0411 *

The interpretation of the estimated coefficients is complicated. I still need time to go through it.
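One common starting point is to exponentiate the estimates, which turns log-odds into odds ratios. The sketch below does this for three of the reported coefficients and adds approximate 95% Wald intervals. The vector names are labels chosen here for readability, and each odds ratio is relative to the reference level of its variable, which is not shown in the output.

```r
# Estimates and standard errors copied from the coefficient table above.
estimates <- c(Q65_long_term_employment = 2.1407,
               Q67_transfer_student     = -3.4949,
               Q67_exchange_student     = -3.8738)
std_errors <- c(1.1208, 1.4127, 1.8962)

# exp(estimate) is the multiplicative change in the odds of using
# Academic Advising compared with the reference category.
odds_ratios <- exp(estimates)

# Approximate 95% Wald confidence limits on the odds-ratio scale.
z <- qnorm(0.975)
ci_lower <- exp(estimates - z * std_errors)
ci_upper <- exp(estimates + z * std_errors)

print(round(cbind(odds_ratios, ci_lower, ci_upper), 3))
```

For example, the Q65 estimate of 2.1407 corresponds to an odds ratio of about 8.5, but its interval includes 1, which agrees with the p-value being just above 0.05.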

For now, it is clear that the regression tells us which factors are statistically significant.

We are able to check all explained variables and explanatory variables to find the relationships between the two types of variable, but it is more important to use field knowledge and wisdom to build meaningful models that make sense.

Hypothesis tests for Likert scale data.

  • Mann-Whitney test.

  • Kruskal-Wallis test.

Data may also be combined into, say, two nominal categories, Agree/Accept and Disagree/Reject, which allows us to carry out the:

  • Chi-square test.

  • Fisher’s exact test.
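As a sketch of how the combined categories could be tested, the example below collapses made-up Likert answers into Agree and Disagree and runs Fisher's exact test on the resulting 2 x 2 table. The `answer` and `group` vectors are hypothetical, and how neutral responses should be handled is left open here.

```r
# Hypothetical responses and grouping variable, for illustration only.
answer <- c("Strongly agree", "Somewhat agree", "Somewhat disagree",
            "Strongly disagree", "Somewhat agree", "Strongly agree",
            "Somewhat disagree", "Somewhat agree")
group <- c("A", "A", "A", "A", "B", "B", "B", "B")

# Collapse the agree-type answers into "Agree", everything else into "Disagree".
collapsed <- ifelse(answer %in% c("Strongly agree", "Somewhat agree"),
                    "Agree", "Disagree")

# Cross-tabulate and test for association.
tab <- table(group, collapsed)
print(tab)
res <- fisher.test(tab)
print(res$p.value)
```

`chisq.test(tab)` accepts the same table; with counts this small its approximation is unreliable, which is the usual reason to prefer Fisher's exact test.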

Example two: Do Males and Females answer differently?

Mann-Whitney test.

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Q34AcademicAdvising by Q71gender
## W = 782, p-value = 0.9491
## alternative hypothesis: true location shift is not equal to 0

From the Mann-Whitney test we get a p-value of 0.9491, hence we fail to reject the null hypothesis that Males and Females have the same scoring tendency at the 5% level. So there is no evidence of a difference between the two genders.
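The output above has the form produced by `wilcox.test()` with a formula. A sketch on simulated scores is given below; the data frame `toy` is made up, the column names simply mirror those in the output, and the call is an assumption about how the test could be run rather than the exact code used.

```r
set.seed(42)

# Simulated Likert scores (1 to 5) for two groups of 30 respondents.
toy <- data.frame(
  Q71gender = factor(rep(c("Female", "Male"), each = 30)),
  Q34AcademicAdvising = sample(1:5, 60, replace = TRUE)
)

# Likert data contain many ties, so the normal approximation is requested
# explicitly (exact = FALSE) instead of an exact p-value.
res <- wilcox.test(Q34AcademicAdvising ~ Q71gender, data = toy, exact = FALSE)
print(res)
```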

Example three: Do scoring tendencies differ by country?

Here we are interested in testing statistically whether there is a significant difference between the scoring tendencies of people from different countries.

Informally, we may conclude from the barplot that there is seemingly no difference in the scoring tendencies of people from different countries. Using a Kruskal-Wallis test we can formally test for a difference.

## [1] "China"                  "India"                 
## [3] "Other (please specify)" "United States"

Kruskal-Wallis Test.

To formally test for a difference in the scoring tendencies of people from different countries, we use a Kruskal-Wallis test.

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Q34AcademicAdvising by Q72country
## Kruskal-Wallis chi-squared = 8.6653, df = 3, p-value = 0.03409

The Kruskal-Wallis test gives us a p-value of 0.03, hence at the 5% level we have evidence to reject our null hypothesis that there is no difference.

We are therefore likely to believe that there is a difference in scoring tendency between people from different countries.
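A sketch of the corresponding call on simulated data follows. The data frame `toy` is made up and only the column names and country labels mirror the output above; with four groups the test has 3 degrees of freedom, as in the reported result.

```r
set.seed(7)

# Simulated Likert scores (1 to 5) for four groups of 15 respondents.
toy <- data.frame(
  Q72country = factor(rep(c("China", "India", "Other (please specify)",
                            "United States"), each = 15)),
  Q34AcademicAdvising = sample(1:5, 60, replace = TRUE)
)

# Kruskal-Wallis rank sum test of score by country.
res <- kruskal.test(Q34AcademicAdvising ~ Q72country, data = toy)
print(res)
```

A significant result only says that at least one group differs. Pairwise comparisons, for example with `pairwise.wilcox.test()`, would be one way to look for which countries differ.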

Test Correlation

To test the correlation between categorical variables:

  • Chi-square test.

  • Fisher’s exact test.

Example four: Are Expectation and Faculty to Choose correlated?

Test of statistical significance

## 
##  Fisher's Exact Test for Count Data
## 
## data:  expectation_faculty
## p-value = 0.5485
## alternative hypothesis: two.sided
## 
##  Fisher's Exact Test for Count Data with simulated p-value (based
##  on 1e+05 replicates)
## 
## data:  expectation_faculty
## p-value = 0.549
## alternative hypothesis: two.sided

Our Fisher’s Exact Test for Count Data with simulated p-value gave no evidence that students’ expectation and faculty to choose are associated (p = 0.55), so we cannot reject the hypothesis that they are independent.

Test of statistical significance
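The two results above correspond to the exact and the simulated versions of `fisher.test()`. The sketch below runs both on a made-up 3 x 2 table; the counts and labels in `toy_table` are hypothetical, and `B = 1e5` matches the number of replicates reported in the output.

```r
set.seed(123)

# Hypothetical contingency table of counts, for illustration only.
toy_table <- matrix(c(8, 5, 3,
                      4, 6, 7),
                    nrow = 3,
                    dimnames = list(expectation = c("Agree", "Neutral", "Disagree"),
                                    faculty = c("Yes", "No")))

# Exact test on the full table.
res_exact <- fisher.test(toy_table)

# Monte Carlo version; simulation applies to tables larger than 2 x 2.
res_sim <- fisher.test(toy_table, simulate.p.value = TRUE, B = 1e5)

print(res_exact$p.value)
print(res_sim$p.value)
```

The simulated p-value should be close to the exact one. It is mainly useful when the table is too large for the exact computation.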

## 
##  Fisher's Exact Test for Count Data
## 
## data:  expectation_gender
## p-value = 0.1778
## alternative hypothesis: two.sided
## 
##  Fisher's Exact Test for Count Data with simulated p-value (based
##  on 1e+05 replicates)
## 
## data:  expectation_gender
## p-value = 0.1764
## alternative hypothesis: two.sided

Our Fisher’s Exact Test for Count Data with simulated p-value gave no evidence that students’ expectation and gender are associated (p = 0.18). Students’ expectation did not differ significantly by gender.

Predict students’ expectation

Plot of study hours and ease of making friends against expectation

## [1] "RespondentId"       "Q6hourStudy"        "Q23friendCanadian" 
## [4] "Q66expectationsMet"

The more hours students study and the easier it is for them to make friends, the higher the probability that their expectations are met.

Prediction

                            Strongly disagree  Somewhat disagree  Neither agree nor disagree  Somewhat agree  Strongly agree
Strongly disagree                           0                  0                           0               0               0
Somewhat disagree                           0                  0                           0               0               0
Neither agree nor disagree                  0                  0                           2               0               0
Somewhat agree                              2                  3                          16              43              15
Strongly agree                              0                  0                           0               2               4

Vertical dimension (rows): predicted response

Horizontal dimension (columns): true response

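The prediction is not very good: the model assigns 79 of the 87 students to “Somewhat agree”, and 49 of the 87 predictions (56%) are correct. However, many students do somewhat agree or strongly agree.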
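To put a number on how well the prediction works, the table above can be entered as a matrix and summarised. This is a sketch that uses only the counts shown; the object names are chosen here for illustration.

```r
labels <- c("Strongly disagree", "Somewhat disagree",
            "Neither agree nor disagree", "Somewhat agree", "Strongly agree")

# Rows are predicted responses, columns are true responses.
conf_mat <- matrix(c(0, 0,  0,  0,  0,
                     0, 0,  0,  0,  0,
                     0, 0,  2,  0,  0,
                     2, 3, 16, 43, 15,
                     0, 0,  0,  2,  4),
                   nrow = 5, byrow = TRUE,
                   dimnames = list(predicted = labels, true = labels))

# Overall accuracy: correct predictions lie on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Share of all students assigned to each predicted category.
predicted_share <- rowSums(conf_mat) / sum(conf_mat)

print(round(accuracy, 3))
print(round(predicted_share, 3))
```

The predicted shares show how strongly the predictions concentrate on a single category, which overall accuracy alone does not reveal.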