Lab 02 - Equivalence Tests

CPP 524

Kidist Gondel


Packages

library( dplyr )
library( pander )
library( ggplot2 )

Data

# load lab data
URL <- "https://github.com/DS4PS/cpp-524-sum-2020/blob/master/labs/data/female-np-entrepreneurs.rds?raw=true"
dat <- readRDS(gzcon(url( URL )))
head( dat )
##   gender age income edu.level years.prof.exp experience.np.create
## 1 Female  54  79669  Graduate          11-15                   No
## 2 Female  62  63474  Graduate            15+                   No
## 3 Female  70  27887  Graduate            15+                  Yes
## 4   Male  63  63474  Graduate            15+                  Yes
## 5 Female  60 170832  Graduate            15+                  Yes
## 6 Female  41  69531  Graduate           6-10                  Yes
##   experience.np.form experience.np.other take.on.debt seed.funding
## 1                 No                 Yes           $0           No
## 2                Yes                 Yes           $0           No
## 3                Yes                 Yes           $0           No
## 4                 No                 Yes           $0           No
## 5                Yes                 Yes           $0          Yes
## 6                 No                  No           $0          Yes
##   most.imp.fund.source
## 1            Donations
## 2            Gov Grant
## 3            Donations
## 4            Donations
## 5           Corp Grant
## 6            Gov Grant

QUESTIONS

Question 1

Compare education levels of male and female entrepreneurs.

  • Variable Name: edu.level
  • Variable Type: factor
  • Survey question: What is the highest level of education you have achieved?
levels(dat$edu.level)
## [1] "None"         "High School"  "Some College" "Bachelor"     "Graduate"
t <- table( dat$edu.level, dat$gender  )
t %>% prop.table( margin=1 ) %>% round(2) %>% pander()
               Female   Male
None             0.47   0.53
High School      0.4    0.6
Some College     0.57   0.43
Bachelor         0.6    0.4
Graduate         0.52   0.48
summary(t)
## Number of cases in table: 554 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 4.383, df = 4, p-value = 0.3566
##  Chi-squared approximation may be incorrect
chisq.test(t)
## 
##  Pearson's Chi-squared test
## 
## data:  t
## X-squared = 4.3831, df = 4, p-value = 0.3566
chisq.test(t, simulate.p.value = TRUE, B=10000)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 10000
##  replicates)
## 
## data:  t
## X-squared = 4.3831, df = NA, p-value = 0.3636

ANSWER:

The chi-squared test for group equivalence gives p = 0.3566 (simulated p = 0.3636), well above 0.05. We conclude there is not enough statistical evidence to support a difference in education level between female and male entrepreneurs.
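The proportions table for this question uses `margin=1`, so each row (education level) sums to 1 and shows the gender split within that level. To read the comparison the other way, as the distribution of education within each gender, `margin=2` may be easier to interpret. A minimal sketch on hypothetical toy counts (not the lab data):

```r
# Hypothetical toy counts for illustration only, NOT the lab survey data
toy <- as.table(matrix(c(30, 20,
                         10, 40),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(edu    = c("Bachelor", "Graduate"),
                                       gender = c("Female", "Male"))))

# margin = 1: proportions within each row (gender split per education level)
row_share <- prop.table(toy, margin = 1)

# margin = 2: proportions within each column (education mix per gender)
col_share <- prop.table(toy, margin = 2)

round(row_share, 2)
round(col_share, 2)
```

Either view feeds the same chi-squared test; only the descriptive table changes.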

Question 2

Compare work experience for male and female entrepreneurs.

  • Variable Name: years.prof.exp
  • Variable Type: factor
  • Survey question: How many years of professional experience did you have prior to starting the nonprofit?
levels(dat$years.prof.exp)
## [1] "0"     "1-2"   "3-5"   "6-10"  "11-15" "15+"
t2 <- table( dat$years.prof.exp, dat$gender )
t2 %>% prop.table( margin=1 ) %>% round(2) %>% pander()
        Female   Male
0         0.73   0.27
1-2       0.67   0.33
3-5       0.62   0.38
6-10      0.57   0.43
11-15     0.58   0.42
15+       0.53   0.47
chisq.test(t2)
## 
##  Pearson's Chi-squared test
## 
## data:  t2
## X-squared = 4.0086, df = 5, p-value = 0.5482
chisq.test(t2, simulate.p.value = TRUE, B=10000)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 10000
##  replicates)
## 
## data:  t2
## X-squared = 4.0086, df = NA, p-value = 0.5671
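# Bonferroni divides alpha by the number of contrasts in the lab (7), not by the number of factor levels
bon_alpha2 <- 0.05/7
bon_alpha2
## [1] 0.007142857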

ANSWER:

The chi-squared test for group equivalence gives p = 0.5482 (simulated p = 0.5671), well above 0.05. We conclude there is not enough statistical evidence to support a difference in work experience between male and female entrepreneurs.
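Each chi-squared test in this lab is paired with a simulated p-value. The simulation matters most when some expected cell counts are small (a common rule of thumb is below 5), which is when R reports that the chi-squared approximation may be incorrect. A sketch of how one might check this, using hypothetical toy counts rather than the lab data:

```r
# Hypothetical toy counts with one sparse row, NOT the lab survey data
toy <- as.table(matrix(c( 2,  3,
                         40, 45,
                         50, 60),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(level  = c("None", "Bachelor", "Graduate"),
                                       gender = c("Female", "Male"))))

# Asymptotic test; the warning about the approximation is suppressed here
res <- suppressWarnings(chisq.test(toy))

# Expected counts under independence; any cell below 5 is a warning sign
res$expected
any(res$expected < 5)

# Monte Carlo p-value does not rely on the chi-squared approximation
set.seed(524)
sim <- chisq.test(toy, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

When the asymptotic and simulated p-values agree closely, as they do in the questions above, the sparse cells are unlikely to change the conclusion.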

Question 3

Compare success in accessing seed funding for male and female entrepreneurs.

  • Variable Name: seed.funding
  • Variable Type: factor
  • Survey question: Did you receive any SEED FUNDING to start the organization? Seed funding supports development of the organization without requiring deliverables or program activities. Seed funding could also include funding for pilot programs.
levels(dat$seed.funding)
## [1] "No"  "Yes"
t3 <- table( dat$seed.funding, dat$gender)
t3 %>% prop.table( margin=1 ) %>% round(2) %>% pander()
      Female   Male
No      0.55   0.45
Yes     0.54   0.46
chisq.test(t3)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  t3
## X-squared = 0.0086147, df = 1, p-value = 0.9261
chisq.test(t3, simulate.p.value = TRUE, B=10000)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 10000
##  replicates)
## 
## data:  t3
## X-squared = 0.033448, df = NA, p-value = 0.8562
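# Bonferroni divides alpha by the number of contrasts in the lab (7), not by the number of factor levels
bon_alpha3 <- 0.05/7
bon_alpha3
## [1] 0.007142857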

ANSWER:

The chi-squared test for group equivalence gives p = 0.9261 (simulated p = 0.8562), well above 0.05. We conclude there is not enough statistical evidence to support a difference in success in accessing seed funding between male and female entrepreneurs.
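The two test statistics reported for seed funding differ (0.0086 versus 0.0334). This is consistent with how `chisq.test` treats 2x2 tables: Yates' continuity correction is applied by default, while the simulated version uses the uncorrected statistic. A small sketch on hypothetical toy counts (not the lab data) shows the pattern:

```r
# Hypothetical 2x2 toy counts, NOT the lab survey data
toy <- as.table(matrix(c(60, 40,
                         50, 50),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(seed   = c("No", "Yes"),
                                       gender = c("Female", "Male"))))

with_yates <- chisq.test(toy)                   # correct = TRUE is the default
no_yates   <- chisq.test(toy, correct = FALSE)  # plain Pearson statistic

set.seed(524)
simulated  <- chisq.test(toy, simulate.p.value = TRUE, B = 2000)

# The simulated test reports the uncorrected statistic
c(yates       = unname(with_yates$statistic),
  uncorrected = unname(no_yates$statistic),
  simulated   = unname(simulated$statistic))
```

The correction always shrinks the statistic, so the Yates p-value is the more conservative of the two.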

Question 4

Compare the willingness to take on personal debt for male and female entrepreneurs.

  • Variable Name: take.on.debt
  • Variable Type: factor
  • Survey question: Have any members of the organization taken on debt to finance the organization? Collectively:
levels(dat$take.on.debt)
## [1] "$0"        "$0k-$10k"  "$10k-$25k" "$25k-$50k" "$50k+"
t4 <- table( dat$take.on.debt, dat$gender )
t4 %>% prop.table( margin=1 ) %>% round(2) %>% pander()
            Female   Male
$0            0.57   0.43
$0k-$10k      0.6    0.4
$10k-$25k     0.39   0.61
$25k-$50k     0.47   0.53
$50k+         0.36   0.64
chisq.test(t4)
## 
##  Pearson's Chi-squared test
## 
## data:  t4
## X-squared = 8.6158, df = 4, p-value = 0.07145
chisq.test(t4, simulate.p.value = TRUE, B=10000)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 10000
##  replicates)
## 
## data:  t4
## X-squared = 8.6158, df = NA, p-value = 0.06499
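# Bonferroni divides alpha by the number of contrasts in the lab (7), not by the number of factor levels
bon_alpha4 <- 0.05/7
bon_alpha4
## [1] 0.007142857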

ANSWER:

The chi-squared test for group equivalence gives p = 0.07145 (simulated p = 0.06499), which is above 0.05. We conclude there is not enough statistical evidence to support a difference in willingness to take on personal debt between male and female entrepreneurs.

Question 5

Compare sources of first year funding for male and female entrepreneurs.

  • Variable Name: most.imp.fund.source
  • Variable Type: factor
  • Survey question: From the list of funding sources, which has been the MOST important in your first year of operations? Choose one.
levels(dat$most.imp.fund.source)
## [1] "Donations"        "Founder"          "Earned Revenues"  "Foundation Grant"
## [5] "Gov Grant"        "Member Fees"      "Parent Org"       "Angel"           
## [9] "Corp Grant"
t5 <- table( dat$most.imp.fund.source, dat$gender )
t5 %>% prop.table( margin=1 ) %>% round(2) %>% pander()
                   Female   Male
Donations            0.51   0.49
Founder              0.56   0.44
Earned Revenues      0.66   0.34
Foundation Grant     0.59   0.41
Gov Grant            0.59   0.41
Member Fees          0.46   0.54
Parent Org           0.5    0.5
Angel                0.52   0.48
Corp Grant           0.67   0.33
chisq.test(t5)
## 
##  Pearson's Chi-squared test
## 
## data:  t5
## X-squared = 8.8304, df = 8, p-value = 0.3568
chisq.test(t5, simulate.p.value = TRUE, B=10000)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 10000
##  replicates)
## 
## data:  t5
## X-squared = 8.8304, df = NA, p-value = 0.3681
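# Bonferroni divides alpha by the number of contrasts in the lab (7), not by the number of factor levels
bon_alpha5 <- 0.05/7
bon_alpha5
## [1] 0.007142857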

ANSWER:

The chi-squared test for group equivalence gives p = 0.3568 (simulated p = 0.3681), well above 0.05. We conclude there is not enough statistical evidence to support a difference in sources of first-year funding between male and female entrepreneurs.

Question 6

Compare age at the time of nonprofit formation for male and female entrepreneurs.

  • Variable Name: age
  • Variable Type: numeric
  • Survey question: What was your age when you created the nonprofit?
is.numeric(dat$age)
## [1] TRUE
tapply(dat$age, dat$gender, summary)
## $Female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   22.00   44.00   52.00   51.94   59.50   85.00      15 
## 
## $Male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   29.00   45.25   57.00   55.05   64.75   82.00       5
boxplot(age~gender, data=dat, col=c("pink", "light green"))

t.test(age~gender, data=dat)
## 
##  Welch Two Sample t-test
## 
## data:  age by gender
## t = -3.1749, df = 589.02, p-value = 0.001577
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -5.031881 -1.185709
## sample estimates:
## mean in group Female   mean in group Male 
##             51.93948             55.04828

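ANSWER:

The Welch t-test gives p = 0.001577, below 0.05. We conclude there IS enough statistical evidence to support a difference in age between male and female entrepreneurs: female founders averaged 51.9 years at formation and male founders 55.0 years.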
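For the numeric contrasts, the p-value and confidence interval can be pulled directly from the object that `t.test` returns instead of being read off the printed output. A minimal sketch on simulated toy data (the data frame `toy` and its values are hypothetical, not the lab survey):

```r
# Simulated toy data for illustration only, NOT the lab survey data
set.seed(524)
toy <- data.frame(
  gender = factor(rep(c("Female", "Male"), each = 50)),
  age    = c(rnorm(50, mean = 52, sd = 10),
             rnorm(50, mean = 55, sd = 10))
)

# t.test() defaults to the Welch test (var.equal = FALSE)
res <- t.test(age ~ gender, data = toy)

res$method     # which test was run
res$p.value    # p-value as a number
res$conf.int   # 95% CI for the Female minus Male difference in means

# Compare against a Bonferroni-adjusted threshold for seven contrasts
bon_alpha <- 0.05 / 7
res$p.value < bon_alpha
```

Storing the result this way makes it straightforward to collect the p-values from all contrasts for the omnibus decision.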

Question 7

Compare income levels prior to starting the nonprofit for male and female entrepreneurs.

  • Variable Name: income
  • Variable Type: numeric
  • Survey question: Please specify your income range prior to working to create this nonprofit:
is.numeric(dat$income)
## [1] TRUE
tapply(dat$income, dat$gender, summary)
## $Female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      81   33139   61112   67742   86094  199147 
## 
## $Male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1026   49437   69351   80883  107363  199684
boxplot(income~gender, data=dat, col=c("pink", "light green"))

t.test(income~gender, data=dat)
## 
##  Welch Two Sample t-test
## 
## data:  income by gender
## t = -3.6353, df = 630.22, p-value = 0.0003003
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -20239.710  -6042.518
## sample estimates:
## mean in group Female   mean in group Male 
##             67741.83             80882.95

ANSWER:

The Welch t-test gives p = 0.0003003, below 0.05. We conclude there IS enough statistical evidence to support a difference in prior income levels between male and female entrepreneurs: mean prior income was 67,742 for female founders and 80,883 for male founders.

Question 8

Based upon these seven contrasts, would you conclude that the resources male and female nonprofit entrepreneurs have at the time of founding were equivalent?

Q8-A:

What is the adjusted decision criterion used for the contrasts to maintain an alpha of 0.05 for the omnibus test of group equivalence?

ANSWER

In an omnibus test of group equivalence, the null hypothesis of equivalence is rejected if any single contrast is significant. The Bonferroni correction divides alpha by the number of contrasts, so each of the seven p-values is compared to 0.05 / 7 = 0.00714 instead of 0.05.
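The arithmetic behind the correction can be sketched as follows. The calculation assumes the seven contrasts are independent, which is a simplification used only for illustration:

```r
alpha <- 0.05
k     <- 7   # number of contrasts in this lab

# Chance of at least one false positive across k tests at the unadjusted alpha
# (assumes independent tests; an illustrative simplification)
fwer_uncorrected <- 1 - (1 - alpha)^k

# Bonferroni-adjusted per-contrast threshold
bon_alpha <- alpha / k

# Family-wise error rate when each test uses the adjusted threshold
fwer_bonferroni <- 1 - (1 - bon_alpha)^k

round(c(uncorrected = fwer_uncorrected,
        bon_alpha   = bon_alpha,
        bonferroni  = fwer_bonferroni), 4)
```

Without the adjustment, the chance of at least one spurious difference across seven tests is roughly 30 percent; with it, the family-wise rate stays just under 0.05.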

Q8-B:

What is the lowest p-value you observed across the seven contrasts?

ANSWER: p-value = 0.0003003, from the income contrast in Question 7.

Q8-C:

Can we claim study group equivalency? Why or why not?

ANSWER:

bon_alpha <- 0.05/7
bon_alpha
## [1] 0.007142857

The smallest p-value (0.0003003, income) is much smaller than the Bonferroni-corrected alpha of 0.007143, and the age contrast (p = 0.001577) also falls below it. Because at least one contrast is significant, we CANNOT claim study group equivalence.
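The omnibus decision can be reproduced by collecting the seven p-values reported above (the asymptotic chi-squared p-values for Questions 1 to 5 and the Welch t-test p-values for Questions 6 and 7). The vector name `p_values` is introduced here for illustration:

```r
# p-values copied from the test output for Questions 1-7
p_values <- c(edu.level            = 0.3566,
              years.prof.exp       = 0.5482,
              seed.funding         = 0.9261,
              take.on.debt         = 0.07145,
              most.imp.fund.source = 0.3568,
              age                  = 0.001577,
              income               = 0.0003003)

# Bonferroni threshold: alpha divided by the number of contrasts
bon_alpha   <- 0.05 / length(p_values)
significant <- p_values < bon_alpha

names(p_values)[significant]   # contrasts that fail the equivalence check
any(significant)               # TRUE means equivalence cannot be claimed

# Equivalent check: inflate the p-values and compare to the original alpha
p.adjust(p_values, method = "bonferroni") < 0.05
```

Both approaches flag the same contrasts, so the choice between adjusting alpha and adjusting the p-values is a matter of presentation.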