I chose to examine the differences in median donation amounts and donation probabilities based on NONUTDEGREE and STUDENT.LIFE.
For NONUTDEGREE, the difference in median donation amounts is 1874.5 dollars (2224.5 versus 350), with alumni holding a degree from a university other than UT having the larger median. The difference is statistically significant with a p-value of \(1.832\cdot { 10 }^{ -10 }\), but is of little practical significance.
For STUDENT.LIFE, the difference in median donation amounts is 509.35 dollars (749.35 versus 240), with those who participated in student life having the larger median. The difference is statistically significant with a p-value of \(1.428\cdot { 10 }^{ -111 }\) and is of large practical significance.
For NONUTDEGREE, the difference in donation probabilities is \(0.20\) (0.61 versus 0.41), with alumni holding a degree from a university other than UT being more likely to donate. The difference is statistically significant with a p-value of \(1.589\cdot { 10 }^{ -11 }\) but is of little practical significance.
For STUDENT.LIFE, the difference in donation probabilities is \(0.27\) (0.61 versus 0.34), with those who participated in student life being more likely to donate. The difference is statistically significant with a reported p-value of \(0\) (too small to be displayed) and is of large practical significance.
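As a quick arithmetic check, the differences quoted above can be recomputed from the group summaries printed later in this report. This is only a sketch: the numbers are typed in by hand from the aggregate() and associate() output, and the object names (`median_gift`, `donate_prob`, `gap`) are made up for illustration.

```r
# Group summaries copied by hand from the output shown in this report
median_gift <- list(
  NONUTDEGREE  = c(No = 350.0, Yes = 2224.5),
  STUDENT.LIFE = c(N = 240.00, Y = 749.35)
)
donate_prob <- list(
  NONUTDEGREE  = c(No = 0.4103884, Yes = 0.6139706),
  STUDENT.LIFE = c(N = 0.3357319, Y = 0.6070848)
)

# Difference between the two levels of a variable (second level minus first)
gap <- function(x) unname(x[2] - x[1])

median_diff <- sapply(median_gift, gap)
prob_diff   <- sapply(donate_prob, gap)

print(median_diff)          # differences in median lifetime gift, in dollars
print(round(prob_diff, 2))  # differences in donation probability
```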
setwd("~/Documents/analytics_capstone/homework_2")
library(regclass)
## Loading required package: bestglm
## Loading required package: leaps
## Loading required package: rpart
## Loading required package: rpart.plot
## Loading required package: randomForest
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
## Loading required package: VGAM
## Loading required package: stats4
## Loading required package: splines
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
##
## margin
library(multcompView)
AMOUNTS <- read.csv("alumni-donoramounts.dat")
STATUS <- read.csv("alumni-donorstatus.dat")
# Median lifetime gift by non-UT degree, using aggregate()
aggregate(LTG2UT~NONUTDEGREE, data = AMOUNTS, FUN = median)
## NONUTDEGREE LTG2UT
## 1 No 350.0
## 2 Yes 2224.5
# Running statistical test
associate(LTG2UT~NONUTDEGREE, data = AMOUNTS, plot = FALSE, permutations = 0, prompt = FALSE, classic = TRUE)
## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test
## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test
## Association between NONUTDEGREE (categorical) and LTG2UT (numerical)
## using 18110 complete cases
##
## Sample Sizesx
## No Yes
## 17943 167
## Classic approach (must check assumptions):
## No Yes Approximate p-value
## Averages (ANOVA) 13251 108097 0.002155
## Mean Ranks (Kruskal) 9052 9422 9.532e-16
## Medians 350 2224 1.832e-10
##
## Tests of assumptions:
## -matters for Classic approach
## -can be overly strict, use graphics and judgment if FALSE)
## Test pvalue Pass
## Equal Variance 0.00226573 FALSE
## Normality No 0.00000000 FALSE
## Normality Yes 0.00000000 FALSE
##
## For classic p-values to be reliable, check the sample sizes below:
## x
## No Yes
## 17943 167
## If n < 10, samples must pass test for Normality and Equal Variance
## If n < 25, distribution must be roughly symmetric and pass test of Equal Variance
## If n < 100, distribution can't be extremely skewed
## If n > 100, p-values are reliable
##
##
## Advice: If it makes sense to compare means (i.e., no extreme outliers and the
## distributions aren't too skewed), use the ANOVA. If there there are
## some obvious extreme outliers but the distributions are roughly symmetric, use
## Rank test. Otherwise, use the Median test or rerun the test using, e.g., log10(y)
## instead of y
# aggregating lifetime gift by student life
aggregate(LTG2UT~STUDENT.LIFE, data = AMOUNTS, FUN = median)
## STUDENT.LIFE LTG2UT
## 1 N 240.00
## 2 Y 749.35
# Running statistical test for median
associate(LTG2UT~STUDENT.LIFE, data = AMOUNTS, plot = FALSE, permutations = 0, prompt = FALSE, classic = TRUE)
## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test
## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test
## Association between STUDENT.LIFE (categorical) and LTG2UT (numerical)
## using 18110 complete cases
##
## Sample Sizesx
## N Y
## 10638 7472
## Classic approach (must check assumptions):
## N Y Approximate p-value
## Averages (ANOVA) 8742 21791 0.02971
## Mean Ranks (Kruskal) 8926 9240 2.611e-153
## Medians 240 749.4 1.428e-111
##
## Tests of assumptions:
## -matters for Classic approach
## -can be overly strict, use graphics and judgment if FALSE)
## Test pvalue Pass
## Equal Variance 0.03128629 FALSE
## Normality N 0.00000000 FALSE
## Normality Y 0.00000000 FALSE
##
## For classic p-values to be reliable, check the sample sizes below:
## x
## N Y
## 10638 7472
## If n < 10, samples must pass test for Normality and Equal Variance
## If n < 25, distribution must be roughly symmetric and pass test of Equal Variance
## If n < 100, distribution can't be extremely skewed
## If n > 100, p-values are reliable
##
##
## Advice: If it makes sense to compare means (i.e., no extreme outliers and the
## distributions aren't too skewed), use the ANOVA. If there there are
## some obvious extreme outliers but the distributions are roughly symmetric, use
## Rank test. Otherwise, use the Median test or rerun the test using, e.g., log10(y)
## instead of y
# Probability of donation by non-UT degree
DONATE_NONUTDEGREE <- aggregate(DONATED~NONUTDEGREE, data = STATUS, FUN = function(x)mean(x=="Yes"))
# Creating a ggplot2 object
prob_nonutdegree_plot <- ggplot(aes(x = NONUTDEGREE, y = DONATED), data = DONATE_NONUTDEGREE)
prob_nonutdegree_plot + geom_bar(stat="identity")
# statistical test
associate(DONATED~NONUTDEGREE, data = STATUS, plot = FALSE, permutations = 500, classic = TRUE)
## You have 43994 observations. This may take a while.
## If you are sure you want to continue, type y then enter/return
##
## Association between NONUTDEGREE (categorical) and DONATED (categorical):
##
## using 43994 complete cases
## Contingency table:
## y
## x No Yes Total
## No 25779 17943 43722
## Yes 105 167 272
## Total 25884 18110 43994
##
## Table of Expected Counts:
## No Yes
## No 25724 17998
## Yes 160 112
##
## Conditional distributions of y (DONATED) for each level of x (NONUTDEGREE):
## If there is no association, these should look similar to each other and
## similar to the marginal distribution of y
## No Yes
## No 0.5896116 0.4103884
## Yes 0.3860294 0.6139706
## Marginal 0.5883530 0.4116470
##
## Classic approach (must check assumptions):
## Discrepancy Estimated p-value
## 45.42182 1.588549e-11
##
## Reminder: Classic approach requires most expected counts >= 5. No requirements
## for permutation approach.
# Probability of donation by student.life
DONATE_STUDENT.LIFE <- aggregate(DONATED~STUDENT.LIFE, data = STATUS, FUN = function(x)mean(x=="Yes"))
# ggplot2 object for probability of donation by student life
prob_student.life_plot <- ggplot(aes(x = STUDENT.LIFE, y = DONATED), data = DONATE_STUDENT.LIFE)
prob_student.life_plot + geom_bar(stat="identity")
# statistical test
associate(DONATED~STUDENT.LIFE, data = STATUS, plot = FALSE, permutations = 500, classic = TRUE)
## You have 43994 observations. This may take a while.
## If you are sure you want to continue, type y then enter/return
##
## Association between STUDENT.LIFE (categorical) and DONATED (categorical):
##
## using 43994 complete cases
## Contingency table:
## y
## x No Yes Total
## N 21048 10638 31686
## Y 4836 7472 12308
## Total 25884 18110 43994
##
## Table of Expected Counts:
## No Yes
## N 18642.6 13043.4
## Y 7241.4 5066.6
##
## Conditional distributions of y (DONATED) for each level of x (STUDENT.LIFE):
## If there is no association, these should look similar to each other and
## similar to the marginal distribution of y
## No Yes
## N 0.6642681 0.3357319
## Y 0.3929152 0.6070848
## Marginal 0.5883530 0.4116470
##
## Classic approach (must check assumptions):
## Discrepancy Estimated p-value
## 2693.935 0
##
## Reminder: Classic approach requires most expected counts >= 5. No requirements
## for permutation approach.
Note: Although the difference in median donation amount between alumni who hold a non-UT degree and those who do not appears considerable, the comparison is likely unfair. Alumni without a non-UT degree may hold only one degree or may hold several (in which case all of them are from UT), whereas alumni with a non-UT degree necessarily hold more than one degree. The comparison is therefore confounded with the number of degrees held: one group is certain to have multiple degrees, while the other may or may not. The same reasoning applies to the probability of an alumnus donating.
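The classic test of DONATED against NONUTDEGREE can be cross-checked in base R from the contingency table printed above. The sketch below assumes that the classic approach is a Pearson chi-squared test with Yates' continuity correction, which is the default of `chisq.test()` for 2x2 tables; the report does not state what `associate()` runs internally, so treat this as a plausibility check rather than a description of the package. The object names are illustrative.

```r
# Observed counts copied from the contingency table for NONUTDEGREE vs DONATED
nonut_table <- matrix(
  c(25779, 17943,
      105,   167),
  nrow = 2, byrow = TRUE,
  dimnames = list(NONUTDEGREE = c("No", "Yes"), DONATED = c("No", "Yes"))
)

# Pearson chi-squared test; correct = TRUE (the default) applies
# Yates' continuity correction to 2x2 tables
nonut_test <- chisq.test(nonut_table)

print(round(nonut_test$expected, 1))  # compare with the Table of Expected Counts
print(nonut_test$statistic)           # compare with the reported Discrepancy
print(nonut_test$p.value)             # compare with the reported p-value

# Donation rate within each group and the gap between the groups
donation_rate <- prop.table(nonut_table, margin = 1)[, "Yes"]
rate_gap <- unname(donation_rate["Yes"] - donation_rate["No"])
print(rate_gap)
```

Under that assumption the statistic should land close to the reported discrepancy of 45.42, and the gap in donation rates close to 0.20.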
I chose to examine the differences in donation amounts and donation probabilities based on DEGREE1_MAJOR.
The following tables show the median log-donation amount and the donation probability for each major, sorted from largest to smallest.
# Looking at log donation amount
# Aggregating the data by major using the median
LOGAMOUNT_DEGREE1_MAJOR <- aggregate(LOGAMOUNT~DEGREE1_MAJOR,data=AMOUNTS,FUN=median)
# Creating a ggplot object
degree_plot <- ggplot(data = LOGAMOUNT_DEGREE1_MAJOR, aes(x=DEGREE1_MAJOR, y=LOGAMOUNT))
degree_plot + geom_bar(stat="identity") + coord_flip() # Bar plot
# Running statistical test
associate(LOGAMOUNT~DEGREE1_MAJOR, data = AMOUNTS, plot = FALSE, prompt = FALSE, permutations = 500)
## Association between DEGREE1_MAJOR (categorical) and LOGAMOUNT (numerical)
## using 18110 complete cases
##
## Sample Sizesx
## Accounting Banking
## 3587 103
## Business Administration BUSINESS ANALYTICS
## 1157 32
## Business Education Business Studies
## 101 21
## Chemical Engineering Economics
## 28 461
## Electrical Engineering Enterprise Management
## 39 48
## Finance General Business
## 2127 834
## Human Resource Development Human Resource Management
## 15 53
## Industrial Engineering Industrial Management
## 45 141
## Insurance Journalism
## 60 126
## Law Logistics
## 30 247
## Logistics & Transportation Management
## 536 830
## Management Science Marketing
## 37 2472
## Mathematics Mechanical Engineering
## 35 40
## MISSING Office Administration
## 202 84
## Office Administration, Secretarial Personnel Management
## 46 258
## Political Science Psychology
## 34 47
## Public Administration RARE
## 236 582
## Real Estate & Urban Development Retailing
## 206 97
## Statistics Supply Chain Management
## 195 113
## Transportation/Logistics Unknown
## 1162 1643
##
## Permutation procedure:
## Accounting Banking Business Administration
## Averages (ANOVA) 2.776 2.767 2.577
## Mean Ranks (Kruskal) 9126 9772 8737
## Medians 2.703 2.574 2.477
## BUSINESS ANALYTICS Business Education
## Averages (ANOVA) 1.636 2.708
## Mean Ranks (Kruskal) 8913 7717
## Medians 1.305 2.695
## Business Studies Chemical Engineering Economics
## Averages (ANOVA) 2.005 2.979 2.61
## Mean Ranks (Kruskal) 10965 8971 9124
## Medians 1.875 3.183 2.398
## Electrical Engineering Enterprise Management Finance
## Averages (ANOVA) 3.093 2.089 2.625
## Mean Ranks (Kruskal) 8949 8098 9068
## Medians 3.114 1.923 2.477
## General Business Human Resource Development
## Averages (ANOVA) 2.681 1.99
## Mean Ranks (Kruskal) 9078 6574
## Medians 2.481 2
## Human Resource Management Industrial Engineering
## Averages (ANOVA) 1.581 2.717
## Mean Ranks (Kruskal) 9833 7963
## Medians 1.322 2.748
## Industrial Management Insurance Journalism Law
## Averages (ANOVA) 3.361 2.802 2.848 2.774
## Mean Ranks (Kruskal) 9613 9752 9376 10113
## Medians 3.342 2.887 2.856 2.703
## Logistics Logistics & Transportation Management
## Averages (ANOVA) 1.944 2.401 2.4
## Mean Ranks (Kruskal) 7829 9528 9222
## Medians 2 2.243 2.238
## Management Science Marketing Mathematics
## Averages (ANOVA) 2.484 2.605 2.772
## Mean Ranks (Kruskal) 7879 9024 9168
## Medians 2.061 2.477 2.672
## Mechanical Engineering MISSING Office Administration
## Averages (ANOVA) 2.694 3.36 2.72
## Mean Ranks (Kruskal) 9635 8841 9056
## Medians 2.867 3.408 2.594
## Office Administration, Secretarial
## Averages (ANOVA) 2.941
## Mean Ranks (Kruskal) 8288
## Medians 3.249
## Personnel Management Political Science Psychology
## Averages (ANOVA) 2.841 2.6 2.699
## Mean Ranks (Kruskal) 8866 8804 8372
## Medians 2.744 2.47 2.574
## Public Administration RARE
## Averages (ANOVA) 2.722 2.794
## Mean Ranks (Kruskal) 8248 9346
## Medians 2.674 2.688
## Real Estate & Urban Development Retailing Statistics
## Averages (ANOVA) 2.761 3.1 2.6
## Mean Ranks (Kruskal) 9374 9633 9069
## Medians 2.699 3.1 2.439
## Supply Chain Management Transportation/Logistics
## Averages (ANOVA) 1.335 2.739
## Mean Ranks (Kruskal) 9178 9206
## Medians 1.304 2.663
## Unknown Discrepancy Estimated p-value
## Averages (ANOVA) 2.76 20.97 0
## Mean Ranks (Kruskal) 8965 818.6 0
## Medians 2.732 612.6 0
## With 500 permutations, we are 95% confident that
## the p-value of ANOVA (means) is between 0 and 0.007
## the p-value of Kruskal-Wallis (ranks) is between 0 and 0.007
## the p-value of median test is between 0 and 0.007
## Note: If 0.05 is in a range, change permutations= to a larger number
##
##
##
## Advice: If it makes sense to compare means (i.e., no extreme outliers and the
## distributions aren't too skewed), use the ANOVA. If there there are
## some obvious extreme outliers but the distributions are roughly symmetric, use
## Rank test. Otherwise, use the Median test or rerun the test using, e.g., log10(y)
## instead of y
# Creating a sorted list from largest to smallest and printing it
(LOGAMOUNT_DEGREE1_MAJOR <- LOGAMOUNT_DEGREE1_MAJOR[order(LOGAMOUNT_DEGREE1_MAJOR$LOGAMOUNT, decreasing = TRUE),])
## DEGREE1_MAJOR LOGAMOUNT
## 27 MISSING 3.408237
## 16 Industrial Management 3.342423
## 29 Office Administration, Secretarial 3.249259
## 7 Chemical Engineering 3.182744
## 9 Electrical Engineering 3.113943
## 36 Retailing 3.100371
## 17 Insurance 2.887076
## 26 Mechanical Engineering 2.867200
## 18 Journalism 2.855693
## 15 Industrial Engineering 2.748188
## 30 Personnel Management 2.744275
## 40 Unknown 2.732394
## 1 Accounting 2.703291
## 19 Law 2.703270
## 35 Real Estate & Urban Development 2.698970
## 5 Business Education 2.694605
## 34 RARE 2.687832
## 33 Public Administration 2.673909
## 25 Mathematics 2.672098
## 39 Transportation/Logistics 2.662758
## 28 Office Administration 2.593760
## 2 Banking 2.574031
## 32 Psychology 2.574031
## 12 General Business 2.481433
## 3 Business Administration 2.477121
## 11 Finance 2.477121
## 24 Marketing 2.477121
## 31 Political Science 2.469760
## 37 Statistics 2.439333
## 8 Economics 2.397940
## 21 Logistics & Transportation 2.243038
## 22 Management 2.238017
## 23 Management Science 2.060698
## 13 Human Resource Development 2.000000
## 20 Logistics 2.000000
## 10 Enterprise Management 1.922859
## 6 Business Studies 1.875061
## 14 Human Resource Management 1.322219
## 4 BUSINESS ANALYTICS 1.304813
## 38 Supply Chain Management 1.304275
# Looking at donation probability
DONATE_DEGREE1_MAJOR <- aggregate(DONATED~DEGREE1_MAJOR, data = STATUS, FUN = function(x)mean(x=="Yes"))
# Creating a ggplot2 object
prob_degree1_major_plot <- ggplot(aes(x = DEGREE1_MAJOR, y = DONATED), data = DONATE_DEGREE1_MAJOR)
prob_degree1_major_plot + geom_bar(stat="identity") + coord_flip()
# Running statistical test
associate(DONATED~DEGREE1_MAJOR, data = STATUS, plot = FALSE, prompt = FALSE, permutations = 500)
## Association between DEGREE1_MAJOR (categorical) and DONATED (categorical):
##
## using 43994 complete cases
## Contingency table:
## y
## x No Yes Total
## Accounting 3775 3587 7362
## Banking 51 103 154
## Business Administration 2556 1157 3713
## BUSINESS ANALYTICS 94 32 126
## Business Education 107 101 208
## Business Studies 85 21 106
## Chemical Engineering 33 28 61
## Economics 815 461 1276
## Electrical Engineering 42 39 81
## Enterprise Management 235 48 283
## Finance 3449 2127 5576
## General Business 998 834 1832
## Human Resource Development 46 15 61
## Human Resource Management 180 53 233
## Industrial Engineering 49 45 94
## Industrial Management 59 141 200
## Insurance 34 60 94
## Journalism 73 126 199
## Law 30 30 60
## Logistics 1191 247 1438
## Logistics & Transportation 1331 536 1867
## Management 1618 830 2448
## Management Science 78 37 115
## Marketing 3997 2472 6469
## Mathematics 37 35 72
## Mechanical Engineering 37 40 77
## MISSING 152 202 354
## Office Administration 54 84 138
## Office Administration, Secretarial 32 46 78
## Personnel Management 214 258 472
## Political Science 37 34 71
## Psychology 63 47 110
## Public Administration 223 236 459
## RARE 686 582 1268
## Real Estate & Urban Development 141 206 347
## Retailing 48 97 145
## Statistics 346 195 541
## Supply Chain Management 355 113 468
## Transportation/Logistics 1101 1162 2263
## Unknown 1432 1643 3075
## Total 25884 18110 43994
##
## Table of Expected Counts:
## No Yes
## Accounting 4331.5 3030.5
## Banking 90.6 63.4
## Business Administration 2184.6 1528.4
## BUSINESS ANALYTICS 74.1 51.9
## Business Education 122.4 85.6
## Business Studies 62.4 43.6
## Chemical Engineering 35.9 25.1
## Economics 750.7 525.3
## Electrical Engineering 47.7 33.3
## Enterprise Management 166.5 116.5
## Finance 3280.7 2295.3
## General Business 1077.9 754.1
## Human Resource Development 35.9 25.1
## Human Resource Management 137.1 95.9
## Industrial Engineering 55.3 38.7
## Industrial Management 117.7 82.3
## Insurance 55.3 38.7
## Journalism 117.1 81.9
## Law 35.3 24.7
## Logistics 846.1 591.9
## Logistics & Transportation 1098.5 768.5
## Management 1440.3 1007.7
## Management Science 67.7 47.3
## Marketing 3806.1 2662.9
## Mathematics 42.4 29.6
## Mechanical Engineering 45.3 31.7
## MISSING 208.3 145.7
## Office Administration 81.2 56.8
## Office Administration, Secretarial 45.9 32.1
## Personnel Management 277.7 194.3
## Political Science 41.8 29.2
## Psychology 64.7 45.3
## Public Administration 270.1 188.9
## RARE 746.0 522.0
## Real Estate & Urban Development 204.2 142.8
## Retailing 85.3 59.7
## Statistics 318.3 222.7
## Supply Chain Management 275.3 192.7
## Transportation/Logistics 1331.4 931.6
## Unknown 1809.2 1265.8
##
## Conditional distributions of y (DONATED) for each level of x (DEGREE1_MAJOR):
## If there is no association, these should look similar to each other and
## similar to the marginal distribution of y
## No Yes
## Accounting 0.5127683 0.4872317
## Banking 0.3311688 0.6688312
## Business Administration 0.6883921 0.3116079
## BUSINESS ANALYTICS 0.7460317 0.2539683
## Business Education 0.5144231 0.4855769
## Business Studies 0.8018868 0.1981132
## Chemical Engineering 0.5409836 0.4590164
## Economics 0.6387147 0.3612853
## Electrical Engineering 0.5185185 0.4814815
## Enterprise Management 0.8303887 0.1696113
## Finance 0.6185438 0.3814562
## General Business 0.5447598 0.4552402
## Human Resource Development 0.7540984 0.2459016
## Human Resource Management 0.7725322 0.2274678
## Industrial Engineering 0.5212766 0.4787234
## Industrial Management 0.2950000 0.7050000
## Insurance 0.3617021 0.6382979
## Journalism 0.3668342 0.6331658
## Law 0.5000000 0.5000000
## Logistics 0.8282337 0.1717663
## Logistics & Transportation 0.7129084 0.2870916
## Management 0.6609477 0.3390523
## Management Science 0.6782609 0.3217391
## Marketing 0.6178698 0.3821302
## Mathematics 0.5138889 0.4861111
## Mechanical Engineering 0.4805195 0.5194805
## MISSING 0.4293785 0.5706215
## Office Administration 0.3913043 0.6086957
## Office Administration, Secretarial 0.4102564 0.5897436
## Personnel Management 0.4533898 0.5466102
## Political Science 0.5211268 0.4788732
## Psychology 0.5727273 0.4272727
## Public Administration 0.4858388 0.5141612
## RARE 0.5410095 0.4589905
## Real Estate & Urban Development 0.4063401 0.5936599
## Retailing 0.3310345 0.6689655
## Statistics 0.6395564 0.3604436
## Supply Chain Management 0.7585470 0.2414530
## Transportation/Logistics 0.4865223 0.5134777
## Unknown 0.4656911 0.5343089
## Marginal 0.5883530 0.4116470
##
## Permutation procedure:
## Discrepancy Estimated p-value
## 1822.342 0
## With 500 permutations, we are 95% confident that:
## the p-value is between 0 and 0.007
## If 0.05 is in this range, change permutations= to a larger number
# Creating a sorted list from largest to smallest and printing it
(DONATE_DEGREE1_MAJOR <- DONATE_DEGREE1_MAJOR[order(DONATE_DEGREE1_MAJOR$DONATED, decreasing = TRUE),])
## DEGREE1_MAJOR DONATED
## 16 Industrial Management 0.7050000
## 36 Retailing 0.6689655
## 2 Banking 0.6688312
## 17 Insurance 0.6382979
## 18 Journalism 0.6331658
## 28 Office Administration 0.6086957
## 35 Real Estate & Urban Development 0.5936599
## 29 Office Administration, Secretarial 0.5897436
## 27 MISSING 0.5706215
## 30 Personnel Management 0.5466102
## 40 Unknown 0.5343089
## 26 Mechanical Engineering 0.5194805
## 33 Public Administration 0.5141612
## 39 Transportation/Logistics 0.5134777
## 19 Law 0.5000000
## 1 Accounting 0.4872317
## 25 Mathematics 0.4861111
## 5 Business Education 0.4855769
## 9 Electrical Engineering 0.4814815
## 31 Political Science 0.4788732
## 15 Industrial Engineering 0.4787234
## 7 Chemical Engineering 0.4590164
## 34 RARE 0.4589905
## 12 General Business 0.4552402
## 32 Psychology 0.4272727
## 24 Marketing 0.3821302
## 11 Finance 0.3814562
## 8 Economics 0.3612853
## 37 Statistics 0.3604436
## 22 Management 0.3390523
## 23 Management Science 0.3217391
## 3 Business Administration 0.3116079
## 21 Logistics & Transportation 0.2870916
## 4 BUSINESS ANALYTICS 0.2539683
## 13 Human Resource Development 0.2459016
## 38 Supply Chain Management 0.2414530
## 14 Human Resource Management 0.2274678
## 6 Business Studies 0.1981132
## 20 Logistics 0.1717663
## 10 Enterprise Management 0.1696113
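The differences in average log-donation amounts are statistically significant, with an estimated p-value of \(0\) (between 0 and 0.007 with 500 permutations). The differences in donation probabilities are also statistically significant, with an estimated p-value of \(0\) (again between 0 and 0.007). These differences are large: median log-donation amounts range from about 1.30 to 3.41 across majors, and donation probabilities range from about 0.17 to 0.71. Knowing a particular alumnus’s field of study, we gain valuable information regarding both the probability that they will donate and the expected log-donation amount.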
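The range "between 0 and 0.007" reported for the permutation p-values is consistent with an exact binomial confidence interval when none of the 500 permuted statistics reaches the observed one. The sketch below illustrates that calculation in base R; it is an assumption that `associate()` computes its range this way, and the variable names are illustrative.

```r
n_perm    <- 500  # permutations requested in the associate() calls
n_extreme <- 0    # permuted statistics at least as large as the observed one

# Exact (Clopper-Pearson) 95% confidence interval for the true p-value
p_interval <- binom.test(n_extreme, n_perm)$conf.int
print(round(p_interval, 3))

# With zero exceedances the upper limit has a simple closed form
upper_closed_form <- 1 - 0.025^(1 / n_perm)
print(upper_closed_form)
```

An estimated p-value of 0 therefore means "smaller than roughly 0.007 with 95% confidence", not exactly zero; a larger `permutations=` value would tighten that bound.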
The following is a “connecting letters” report for the average log-donation amounts. Majors that share at least one letter are not significantly different from each other, while majors with no letter in common are. One takeaway is that there is a statistically significant difference in the log amount of donation between business analytics and statistics majors (letters "mno" versus "defghijkl"). However, we must keep in mind that business analytics has been offered as a degree for only a short period of time. It would be interesting to keep an eye on these differences in the future.
#Connecting letters report
COMPS <- aov(LOGAMOUNT~DEGREE1_MAJOR,data=AMOUNTS)
# Tukey test
TUKEY <- TukeyHSD(COMPS)
# connecting letters report
multcompLetters4(COMPS,TUKEY)
## $DEGREE1_MAJOR
## Industrial Management MISSING
## "ab" "a"
## Retailing Electrical Engineering
## "abc" "abcdef"
## Chemical Engineering Office Administration, Secretarial
## "abcdefghijkl" "abcdefghij"
## Journalism Personnel Management
## "cdefghi" "cdefghi"
## Insurance RARE
## "abcdefghijkl" "cdegh"
## Accounting Law
## "cdg" "abcdefghijkl"
## Mathematics Banking
## "abcdefghijkl" "cdefghijkl"
## Real Estate & Urban Development Unknown
## "cdefghi" "cdg"
## Transportation/Logistics Public Administration
## "cdefghi" "cdefghik"
## Office Administration Industrial Engineering
## "cdefghijkl" "abcdefghijkl"
## Business Education Psychology
## "cdefghijkl" "bcdefghijkl"
## Mechanical Engineering General Business
## "abcdefghijkl" "cdefghik"
## Finance Economics
## "efhik" "defghijkl"
## Marketing Political Science
## "fik" "bcdefghijklm"
## Statistics Business Administration
## "defghijkl" "fijkl"
## Management Science Logistics & Transportation
## "cdefghijklm" "jl"
## Management Enterprise Management
## "jl" "klmn"
## Business Studies Human Resource Development
## "ghijklmno" "cdefghijklmno"
## Logistics BUSINESS ANALYTICS
## "mn" "mno"
## Human Resource Management Supply Chain Management
## "no" "o"
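Reading the report by eye is error-prone with 40 majors, so a small helper can check whether two majors share a letter. This is a minimal sketch: the letter codes are copied by hand from the output above for a handful of majors, and `letter_codes` and `shares_letter()` are hypothetical names rather than part of multcompView.

```r
# Letter codes copied from the connecting letters output for a few majors
letter_codes <- c(
  "Statistics"              = "defghijkl",
  "BUSINESS ANALYTICS"      = "mno",
  "Supply Chain Management" = "o",
  "Industrial Management"   = "ab",
  "MISSING"                 = "a"
)

# Two majors are NOT significantly different when their codes share a letter
shares_letter <- function(major_a, major_b, codes = letter_codes) {
  letters_a <- strsplit(codes[[major_a]], "")[[1]]
  letters_b <- strsplit(codes[[major_b]], "")[[1]]
  length(intersect(letters_a, letters_b)) > 0
}

shares_letter("Statistics", "BUSINESS ANALYTICS")               # FALSE: significantly different
shares_letter("BUSINESS ANALYTICS", "Supply Chain Management")  # TRUE: not distinguishable
shares_letter("Industrial Management", "MISSING")               # TRUE: not distinguishable
```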