Assignment 2 - Alumni Data Descriptive Analytics (weight 3-15)

Executive Summary part 1

I chose to examine the differences in median donation amounts and donation probabilities based on NONUTDEGREE and STUDENT.LIFE.

For NONUTDEGREE, the difference in median donation amounts is 1874.5 dollars (2224.5 versus 350), with alumni holding a degree from a university other than UT having the larger amount. The difference is statistically significant with a p-value of \(1.832\cdot { 10 }^{ -10 }\), but is of little practical significance.

For STUDENT.LIFE, the difference in median donation amounts is 509.4 dollars, with those who participated in student life having the larger amount. The difference is statistically significant with a p-value of \(1.428\cdot { 10 }^{ -111 }\) and is of large practical significance.

For NONUTDEGREE, the difference in donation probabilities is \(0.20\), with alumni holding a degree from a university other than UT being more likely to donate. The difference is statistically significant with a p-value of \(1.589\cdot { 10 }^{ -11 }\) but is of little practical significance.

For STUDENT.LIFE, the difference in donation probabilities is \(0.27\) (0.607 versus 0.336), with those who participated in student life being more likely to donate. The difference is statistically significant with a p-value reported as \(0\) (too small to be displayed) and is of large practical significance.
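
As a quick check, each difference quoted in this summary can be recomputed from the group medians and conditional donation proportions printed in the supporting output (proportions rounded to four decimals here):

```latex
\begin{aligned}
\text{NONUTDEGREE, medians:} \quad & 2224.5 - 350.0 = 1874.5 \\
\text{STUDENT.LIFE, medians:} \quad & 749.35 - 240.00 = 509.35 \\
\text{NONUTDEGREE, probabilities:} \quad & 0.6140 - 0.4104 = 0.2036 \\
\text{STUDENT.LIFE, probabilities:} \quad & 0.6071 - 0.3357 = 0.2714
\end{aligned}
```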

setwd("~/Documents/analytics_capstone/homework_2")

library(regclass)

## Loading required package: bestglm

## Loading required package: leaps

## Loading required package: rpart

## Loading required package: rpart.plot

## Loading required package: randomForest

## randomForest 4.6-12

## Type rfNews() to see new features/changes/bug fixes.

## Loading required package: VGAM

## Loading required package: stats4

## Loading required package: splines

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:randomForest':
## 
##     margin

library(multcompView)

Supporting R code for 1st task (variable with 2 levels)

AMOUNTS <- read.csv("alumni-donoramounts.dat")
STATUS <- read.csv("alumni-donorstatus.dat")

# Median lifetime giving (LTG2UT) by non-UT degree status, using aggregate()
aggregate(LTG2UT~NONUTDEGREE, data = AMOUNTS, FUN = median)

##   NONUTDEGREE LTG2UT
## 1          No  350.0
## 2         Yes 2224.5

# Running statistical test
associate(LTG2UT~NONUTDEGREE, data = AMOUNTS, plot = FALSE, permutations = 0, prompt = FALSE, classic = TRUE)

## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test

## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test

## Association between NONUTDEGREE (categorical) and  LTG2UT (numerical)
##  using 18110 complete cases
## 
## Sample Sizesx
##    No   Yes 
## 17943   167 
## Classic approach (must check assumptions):
##                         No    Yes Approximate p-value
## Averages (ANOVA)     13251 108097            0.002155
## Mean Ranks (Kruskal)  9052   9422           9.532e-16
## Medians                350   2224           1.832e-10
## 
## Tests of assumptions:
##  -matters for Classic approach
##  -can be overly strict, use graphics and judgment if FALSE)
##            Test     pvalue  Pass
##  Equal Variance 0.00226573 FALSE
##    Normality No 0.00000000 FALSE
##   Normality Yes 0.00000000 FALSE
## 
## For classic p-values to be reliable, check the sample sizes below:
## x
##    No   Yes 
## 17943   167 
##   If n < 10, samples must pass test for Normality and Equal Variance
##   If n < 25, distribution must be roughly symmetric and pass test of Equal Variance 
##   If n < 100, distribution can't be extremely skewed 
##   If n > 100, p-values are reliable
##  
## 
## Advice: If it makes sense to compare means (i.e., no extreme outliers and the 
## distributions aren't too skewed), use the ANOVA.  If there there are 
## some obvious extreme outliers but the distributions are roughly symmetric, use 
## Rank test.  Otherwise, use the Median test or rerun the test using, e.g., log10(y) 
## instead of y

# aggregating lifetime gift by student life 
aggregate(LTG2UT~STUDENT.LIFE, data = AMOUNTS, FUN = median)

##   STUDENT.LIFE LTG2UT
## 1            N 240.00
## 2            Y 749.35

# Running statistical test for median
associate(LTG2UT~STUDENT.LIFE, data = AMOUNTS, plot = FALSE, permutations = 0, prompt = FALSE, classic = TRUE)

## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test

## Warning in ks.test(x, "pnorm", mean(x), sd(x)): ties should not be present
## for the Kolmogorov-Smirnov test

## Association between STUDENT.LIFE (categorical) and  LTG2UT (numerical)
##  using 18110 complete cases
## 
## Sample Sizesx
##     N     Y 
## 10638  7472 
## Classic approach (must check assumptions):
##                         N     Y Approximate p-value
## Averages (ANOVA)     8742 21791             0.02971
## Mean Ranks (Kruskal) 8926  9240          2.611e-153
## Medians               240 749.4          1.428e-111
## 
## Tests of assumptions:
##  -matters for Classic approach
##  -can be overly strict, use graphics and judgment if FALSE)
##            Test     pvalue  Pass
##  Equal Variance 0.03128629 FALSE
##     Normality N 0.00000000 FALSE
##     Normality Y 0.00000000 FALSE
## 
## For classic p-values to be reliable, check the sample sizes below:
## x
##     N     Y 
## 10638  7472 
##   If n < 10, samples must pass test for Normality and Equal Variance
##   If n < 25, distribution must be roughly symmetric and pass test of Equal Variance 
##   If n < 100, distribution can't be extremely skewed 
##   If n > 100, p-values are reliable
##  
## 
## Advice: If it makes sense to compare means (i.e., no extreme outliers and the 
## distributions aren't too skewed), use the ANOVA.  If there there are 
## some obvious extreme outliers but the distributions are roughly symmetric, use 
## Rank test.  Otherwise, use the Median test or rerun the test using, e.g., log10(y) 
## instead of y

# Probability of donation by non-UT degree
DONATE_NONUTDEGREE <- aggregate(DONATED~NONUTDEGREE, data = STATUS, FUN = function(x)mean(x=="Yes"))
# Creating a ggplot2 object
prob_nonutdegree_plot <- ggplot(aes(x = NONUTDEGREE, y = DONATED), data = DONATE_NONUTDEGREE)
prob_nonutdegree_plot + geom_bar(stat="identity")

# statistical test
associate(DONATED~NONUTDEGREE, data = STATUS, plot = FALSE, permutations = 500, classic = TRUE)

## You have 43994 observations.  This may take a while.
## If you are sure you want to continue, type y then enter/return  
## 
## Association between NONUTDEGREE (categorical) and  DONATED (categorical):
## 
##  using 43994 complete cases
## Contingency table:
##        y
## x          No   Yes Total
##   No    25779 17943 43722
##   Yes     105   167   272
##   Total 25884 18110 43994
## 
##  Table of Expected Counts:
##        No   Yes
## No  25724 17998
## Yes   160   112
## 
## Conditional distributions of y (DONATED) for each level of x (NONUTDEGREE):
## If there is no association, these should look similar to each other and
##  similar to the marginal distribution of y
##                 No       Yes
## No       0.5896116 0.4103884
## Yes      0.3860294 0.6139706
## Marginal 0.5883530 0.4116470
## 
## Classic approach (must check assumptions):
##   Discrepancy Estimated p-value
##      45.42182      1.588549e-11
## 
## Reminder:  Classic approach requires most expected counts >= 5.  No requirements
##  for permutation approach.

# Probability of donation by student.life
DONATE_STUDENT.LIFE <- aggregate(DONATED~STUDENT.LIFE, data = STATUS, FUN = function(x)mean(x=="Yes"))
# ggplot2 object for probability of donation by student life
prob_student.life_plot <- ggplot(aes(x = STUDENT.LIFE, y = DONATED), data = DONATE_STUDENT.LIFE)
prob_student.life_plot + geom_bar(stat="identity")

# statistical test
associate(DONATED~STUDENT.LIFE, data = STATUS, plot = FALSE, permutations = 500, classic = TRUE)

## You have 43994 observations.  This may take a while.
## If you are sure you want to continue, type y then enter/return  
## 
## Association between STUDENT.LIFE (categorical) and  DONATED (categorical):
## 
##  using 43994 complete cases
## Contingency table:
##        y
## x          No   Yes Total
##   N     21048 10638 31686
##   Y      4836  7472 12308
##   Total 25884 18110 43994
## 
##  Table of Expected Counts:
##        No     Yes
## N 18642.6 13043.4
## Y  7241.4  5066.6
## 
## Conditional distributions of y (DONATED) for each level of x (STUDENT.LIFE):
## If there is no association, these should look similar to each other and
##  similar to the marginal distribution of y
##                 No       Yes
## N        0.6642681 0.3357319
## Y        0.3929152 0.6070848
## Marginal 0.5883530 0.4116470
## 
## Classic approach (must check assumptions):
##   Discrepancy Estimated p-value
##      2693.935                 0
## 
## Reminder:  Classic approach requires most expected counts >= 5.  No requirements
##  for permutation approach.
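
The two categorical tests above can be re-derived from the printed contingency tables alone, without the data files. The sketch below rebuilds each 2x2 table from the counts in the output and passes it to base R's `chisq.test()`. It is an assumption, not something the output states, that `associate()` reports the Pearson chi-squared statistic with the default continuity correction as its discrepancy; the helper name `summarize_table` is hypothetical.

```r
# Counts copied from the contingency tables printed above
# (rows: group level, columns: DONATED = No / Yes).
nonut <- matrix(c(25779, 17943,
                    105,   167),
                nrow = 2, byrow = TRUE,
                dimnames = list(NONUTDEGREE = c("No", "Yes"),
                                DONATED = c("No", "Yes")))
student_life <- matrix(c(21048, 10638,
                          4836,  7472),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(STUDENT.LIFE = c("N", "Y"),
                                       DONATED = c("No", "Yes")))

# Hypothetical helper: donation probability per group, the gap between
# the two groups, and the chi-squared test on the 2x2 table.
summarize_table <- function(tab) {
  p_yes <- tab[, "Yes"] / rowSums(tab)
  test <- chisq.test(tab)  # default: Yates continuity correction for 2x2
  list(p_yes = p_yes,
       difference = unname(p_yes[2] - p_yes[1]),
       statistic = unname(test$statistic),
       p_value = test$p.value)
}

nonut_summary <- summarize_table(nonut)
student_life_summary <- summarize_table(student_life)
print(nonut_summary)
print(student_life_summary)
```

Comparing the `difference` and `statistic` entries with the conditional distributions and discrepancies printed by `associate()` is a simple way to confirm that the summary figures were transcribed correctly.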

Note: While the difference in median donation amount between alumni who hold non-UT degrees and those who do not seems considerable, the comparison is likely unfair. Among alumni without a non-UT degree, some will hold only one degree, while others may hold more than one (in which case all of their degrees are from UT). The comparison is therefore confounded: everyone in the first group (those with non-UT degrees) holds more than one degree, whereas those in the second group (those without non-UT degrees) may or may not. The same reasoning applies when considering the probability of an alumnus donating.

Executive Summary part 2

I chose to examine the differences in donation amounts and donation probabilities based on DEGREE1_MAJOR.

The following tables show the median log-donation amount and the donation probability for each major, sorted from largest to smallest.

# Looking at log donation amount
# Aggregating log donation amount by major, using the median
LOGAMOUNT_DEGREE1_MAJOR <- aggregate(LOGAMOUNT~DEGREE1_MAJOR,data=AMOUNTS,FUN=median)  

# Creating a ggplot object 
degree_plot <- ggplot(data = LOGAMOUNT_DEGREE1_MAJOR, aes(x=DEGREE1_MAJOR, y=LOGAMOUNT))
degree_plot + geom_bar(stat="identity") + coord_flip() # Bar plot

# Running statistical test
associate(LOGAMOUNT~DEGREE1_MAJOR, data = AMOUNTS, plot = FALSE, prompt = FALSE, permutations = 500)

## Association between DEGREE1_MAJOR (categorical) and  LOGAMOUNT (numerical)
##  using 18110 complete cases
## 
## Sample Sizesx
##                         Accounting                            Banking 
##                               3587                                103 
##            Business Administration                 BUSINESS ANALYTICS 
##                               1157                                 32 
##                 Business Education                   Business Studies 
##                                101                                 21 
##               Chemical Engineering                          Economics 
##                                 28                                461 
##             Electrical Engineering              Enterprise Management 
##                                 39                                 48 
##                            Finance                   General Business 
##                               2127                                834 
##         Human Resource Development          Human Resource Management 
##                                 15                                 53 
##             Industrial Engineering              Industrial Management 
##                                 45                                141 
##                          Insurance                         Journalism 
##                                 60                                126 
##                                Law                          Logistics 
##                                 30                                247 
##         Logistics & Transportation                         Management 
##                                536                                830 
##                 Management Science                          Marketing 
##                                 37                               2472 
##                        Mathematics             Mechanical Engineering 
##                                 35                                 40 
##                            MISSING              Office Administration 
##                                202                                 84 
## Office Administration, Secretarial               Personnel Management 
##                                 46                                258 
##                  Political Science                         Psychology 
##                                 34                                 47 
##              Public Administration                               RARE 
##                                236                                582 
##    Real Estate & Urban Development                          Retailing 
##                                206                                 97 
##                         Statistics            Supply Chain Management 
##                                195                                113 
##           Transportation/Logistics                            Unknown 
##                               1162                               1643 
## 
## Permutation procedure:
##                      Accounting Banking Business Administration
## Averages (ANOVA)          2.776   2.767                   2.577
## Mean Ranks (Kruskal)       9126    9772                    8737
## Medians                   2.703   2.574                   2.477
##                      BUSINESS ANALYTICS Business Education
## Averages (ANOVA)                  1.636              2.708
## Mean Ranks (Kruskal)               8913               7717
## Medians                           1.305              2.695
##                      Business Studies Chemical Engineering Economics
## Averages (ANOVA)                2.005                2.979      2.61
## Mean Ranks (Kruskal)            10965                 8971      9124
## Medians                         1.875                3.183     2.398
##                      Electrical Engineering Enterprise Management Finance
## Averages (ANOVA)                      3.093                 2.089   2.625
## Mean Ranks (Kruskal)                   8949                  8098    9068
## Medians                               3.114                 1.923   2.477
##                      General Business Human Resource Development
## Averages (ANOVA)                2.681                       1.99
## Mean Ranks (Kruskal)             9078                       6574
## Medians                         2.481                          2
##                      Human Resource Management Industrial Engineering
## Averages (ANOVA)                         1.581                  2.717
## Mean Ranks (Kruskal)                      9833                   7963
## Medians                                  1.322                  2.748
##                      Industrial Management Insurance Journalism   Law
## Averages (ANOVA)                     3.361     2.802      2.848 2.774
## Mean Ranks (Kruskal)                  9613      9752       9376 10113
## Medians                              3.342     2.887      2.856 2.703
##                      Logistics Logistics & Transportation Management
## Averages (ANOVA)         1.944                      2.401        2.4
## Mean Ranks (Kruskal)      7829                       9528       9222
## Medians                      2                      2.243      2.238
##                      Management Science Marketing Mathematics
## Averages (ANOVA)                  2.484     2.605       2.772
## Mean Ranks (Kruskal)               7879      9024        9168
## Medians                           2.061     2.477       2.672
##                      Mechanical Engineering MISSING Office Administration
## Averages (ANOVA)                      2.694    3.36                  2.72
## Mean Ranks (Kruskal)                   9635    8841                  9056
## Medians                               2.867   3.408                 2.594
##                      Office Administration, Secretarial
## Averages (ANOVA)                                  2.941
## Mean Ranks (Kruskal)                               8288
## Medians                                           3.249
##                      Personnel Management Political Science Psychology
## Averages (ANOVA)                    2.841               2.6      2.699
## Mean Ranks (Kruskal)                 8866              8804       8372
## Medians                             2.744              2.47      2.574
##                      Public Administration  RARE
## Averages (ANOVA)                     2.722 2.794
## Mean Ranks (Kruskal)                  8248  9346
## Medians                              2.674 2.688
##                      Real Estate & Urban Development Retailing Statistics
## Averages (ANOVA)                               2.761       3.1        2.6
## Mean Ranks (Kruskal)                            9374      9633       9069
## Medians                                        2.699       3.1      2.439
##                      Supply Chain Management Transportation/Logistics
## Averages (ANOVA)                       1.335                    2.739
## Mean Ranks (Kruskal)                    9178                     9206
## Medians                                1.304                    2.663
##                      Unknown Discrepancy Estimated p-value
## Averages (ANOVA)        2.76       20.97                 0
## Mean Ranks (Kruskal)    8965       818.6                 0
## Medians                2.732       612.6                 0
## With 500 permutations, we are 95% confident that
##  the p-value of ANOVA (means) is between 0 and 0.007 
##  the p-value of Kruskal-Wallis (ranks) is between 0 and 0.007 
##  the p-value of median test is between 0 and 0.007 
## Note:  If 0.05 is in a range, change permutations= to a larger number
## 
## 
## 
## Advice: If it makes sense to compare means (i.e., no extreme outliers and the 
## distributions aren't too skewed), use the ANOVA.  If there there are 
## some obvious extreme outliers but the distributions are roughly symmetric, use 
## Rank test.  Otherwise, use the Median test or rerun the test using, e.g., log10(y) 
## instead of y

# Creating a sorted list from largest to smallest and printing it
(LOGAMOUNT_DEGREE1_MAJOR <- LOGAMOUNT_DEGREE1_MAJOR[order(LOGAMOUNT_DEGREE1_MAJOR$LOGAMOUNT, decreasing = TRUE),])

##                         DEGREE1_MAJOR LOGAMOUNT
## 27                            MISSING  3.408237
## 16              Industrial Management  3.342423
## 29 Office Administration, Secretarial  3.249259
## 7                Chemical Engineering  3.182744
## 9              Electrical Engineering  3.113943
## 36                          Retailing  3.100371
## 17                          Insurance  2.887076
## 26             Mechanical Engineering  2.867200
## 18                         Journalism  2.855693
## 15             Industrial Engineering  2.748188
## 30               Personnel Management  2.744275
## 40                            Unknown  2.732394
## 1                          Accounting  2.703291
## 19                                Law  2.703270
## 35    Real Estate & Urban Development  2.698970
## 5                  Business Education  2.694605
## 34                               RARE  2.687832
## 33              Public Administration  2.673909
## 25                        Mathematics  2.672098
## 39           Transportation/Logistics  2.662758
## 28              Office Administration  2.593760
## 2                             Banking  2.574031
## 32                         Psychology  2.574031
## 12                   General Business  2.481433
## 3             Business Administration  2.477121
## 11                            Finance  2.477121
## 24                          Marketing  2.477121
## 31                  Political Science  2.469760
## 37                         Statistics  2.439333
## 8                           Economics  2.397940
## 21         Logistics & Transportation  2.243038
## 22                         Management  2.238017
## 23                 Management Science  2.060698
## 13         Human Resource Development  2.000000
## 20                          Logistics  2.000000
## 10              Enterprise Management  1.922859
## 6                    Business Studies  1.875061
## 14          Human Resource Management  1.322219
## 4                  BUSINESS ANALYTICS  1.304813
## 38            Supply Chain Management  1.304275
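
Because LOGAMOUNT is on a logarithmic scale, the medians above are easier to interpret after back-transforming. The sketch below assumes a base-10 logarithm, which is consistent with values such as 2.000000 and 2.477121 (the base-10 logs of 100 and 300) but is not stated in the output; the medians are copied from the sorted table.

```r
# Median LOGAMOUNT for a few majors, copied from the sorted table above.
median_log <- c("MISSING" = 3.408237,
                "Industrial Management" = 3.342423,
                "Accounting" = 2.703291,
                "Finance" = 2.477121,
                "BUSINESS ANALYTICS" = 1.304813,
                "Supply Chain Management" = 1.304275)

# Assumption: LOGAMOUNT = log10(amount), so 10^x recovers dollars.
median_dollars <- round(10^median_log, 2)
print(median_dollars)

# Ratio of the largest to the smallest median on the dollar scale.
fold_range <- unname(10^(max(median_log) - min(median_log)))
print(fold_range)
```

Under that assumption, a gap of about two units on the log scale corresponds to roughly a hundredfold gap in median dollars, which is why small-looking differences in LOGAMOUNT can matter in practice.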

# Looking at donation probability
DONATE_DEGREE1_MAJOR <- aggregate(DONATED~DEGREE1_MAJOR, data = STATUS, FUN = function(x)mean(x=="Yes"))

# Creating a ggplot2 object
prob_degree1_major_plot <- ggplot(aes(x = DEGREE1_MAJOR, y = DONATED), data = DONATE_DEGREE1_MAJOR)
prob_degree1_major_plot + geom_bar(stat="identity") + coord_flip()

# Running statistical test
associate(DONATED~DEGREE1_MAJOR, data = STATUS, plot = FALSE, prompt = FALSE, permutations = 500)

## Association between DEGREE1_MAJOR (categorical) and  DONATED (categorical):
## 
##  using 43994 complete cases
## Contingency table:
##                                     y
## x                                       No   Yes Total
##   Accounting                          3775  3587  7362
##   Banking                               51   103   154
##   Business Administration             2556  1157  3713
##   BUSINESS ANALYTICS                    94    32   126
##   Business Education                   107   101   208
##   Business Studies                      85    21   106
##   Chemical Engineering                  33    28    61
##   Economics                            815   461  1276
##   Electrical Engineering                42    39    81
##   Enterprise Management                235    48   283
##   Finance                             3449  2127  5576
##   General Business                     998   834  1832
##   Human Resource Development            46    15    61
##   Human Resource Management            180    53   233
##   Industrial Engineering                49    45    94
##   Industrial Management                 59   141   200
##   Insurance                             34    60    94
##   Journalism                            73   126   199
##   Law                                   30    30    60
##   Logistics                           1191   247  1438
##   Logistics & Transportation          1331   536  1867
##   Management                          1618   830  2448
##   Management Science                    78    37   115
##   Marketing                           3997  2472  6469
##   Mathematics                           37    35    72
##   Mechanical Engineering                37    40    77
##   MISSING                              152   202   354
##   Office Administration                 54    84   138
##   Office Administration, Secretarial    32    46    78
##   Personnel Management                 214   258   472
##   Political Science                     37    34    71
##   Psychology                            63    47   110
##   Public Administration                223   236   459
##   RARE                                 686   582  1268
##   Real Estate & Urban Development      141   206   347
##   Retailing                             48    97   145
##   Statistics                           346   195   541
##   Supply Chain Management              355   113   468
##   Transportation/Logistics            1101  1162  2263
##   Unknown                             1432  1643  3075
##   Total                              25884 18110 43994
## 
##  Table of Expected Counts:
##                                        No    Yes
## Accounting                         4331.5 3030.5
## Banking                              90.6   63.4
## Business Administration            2184.6 1528.4
## BUSINESS ANALYTICS                   74.1   51.9
## Business Education                  122.4   85.6
## Business Studies                     62.4   43.6
## Chemical Engineering                 35.9   25.1
## Economics                           750.7  525.3
## Electrical Engineering               47.7   33.3
## Enterprise Management               166.5  116.5
## Finance                            3280.7 2295.3
## General Business                   1077.9  754.1
## Human Resource Development           35.9   25.1
## Human Resource Management           137.1   95.9
## Industrial Engineering               55.3   38.7
## Industrial Management               117.7   82.3
## Insurance                            55.3   38.7
## Journalism                          117.1   81.9
## Law                                  35.3   24.7
## Logistics                           846.1  591.9
## Logistics & Transportation         1098.5  768.5
## Management                         1440.3 1007.7
## Management Science                   67.7   47.3
## Marketing                          3806.1 2662.9
## Mathematics                          42.4   29.6
## Mechanical Engineering               45.3   31.7
## MISSING                             208.3  145.7
## Office Administration                81.2   56.8
## Office Administration, Secretarial   45.9   32.1
## Personnel Management                277.7  194.3
## Political Science                    41.8   29.2
## Psychology                           64.7   45.3
## Public Administration               270.1  188.9
## RARE                                746.0  522.0
## Real Estate & Urban Development     204.2  142.8
## Retailing                            85.3   59.7
## Statistics                          318.3  222.7
## Supply Chain Management             275.3  192.7
## Transportation/Logistics           1331.4  931.6
## Unknown                            1809.2 1265.8
## 
## Conditional distributions of y (DONATED) for each level of x (DEGREE1_MAJOR):
## If there is no association, these should look similar to each other and
##  similar to the marginal distribution of y
##                                           No       Yes
## Accounting                         0.5127683 0.4872317
## Banking                            0.3311688 0.6688312
## Business Administration            0.6883921 0.3116079
## BUSINESS ANALYTICS                 0.7460317 0.2539683
## Business Education                 0.5144231 0.4855769
## Business Studies                   0.8018868 0.1981132
## Chemical Engineering               0.5409836 0.4590164
## Economics                          0.6387147 0.3612853
## Electrical Engineering             0.5185185 0.4814815
## Enterprise Management              0.8303887 0.1696113
## Finance                            0.6185438 0.3814562
## General Business                   0.5447598 0.4552402
## Human Resource Development         0.7540984 0.2459016
## Human Resource Management          0.7725322 0.2274678
## Industrial Engineering             0.5212766 0.4787234
## Industrial Management              0.2950000 0.7050000
## Insurance                          0.3617021 0.6382979
## Journalism                         0.3668342 0.6331658
## Law                                0.5000000 0.5000000
## Logistics                          0.8282337 0.1717663
## Logistics & Transportation         0.7129084 0.2870916
## Management                         0.6609477 0.3390523
## Management Science                 0.6782609 0.3217391
## Marketing                          0.6178698 0.3821302
## Mathematics                        0.5138889 0.4861111
## Mechanical Engineering             0.4805195 0.5194805
## MISSING                            0.4293785 0.5706215
## Office Administration              0.3913043 0.6086957
## Office Administration, Secretarial 0.4102564 0.5897436
## Personnel Management               0.4533898 0.5466102
## Political Science                  0.5211268 0.4788732
## Psychology                         0.5727273 0.4272727
## Public Administration              0.4858388 0.5141612
## RARE                               0.5410095 0.4589905
## Real Estate & Urban Development    0.4063401 0.5936599
## Retailing                          0.3310345 0.6689655
## Statistics                         0.6395564 0.3604436
## Supply Chain Management            0.7585470 0.2414530
## Transportation/Logistics           0.4865223 0.5134777
## Unknown                            0.4656911 0.5343089
## Marginal                           0.5883530 0.4116470
## 
## Permutation procedure:
##   Discrepancy Estimated p-value
##      1822.342                 0
## With 500 permutations, we are 95% confident that:
##  the p-value is between 0 and 0.007 
## If 0.05 is in this range, change permutations= to a larger number

# Creating a sorted list from largest to smallest and printing it
(DONATE_DEGREE1_MAJOR <- DONATE_DEGREE1_MAJOR[order(DONATE_DEGREE1_MAJOR$DONATED, decreasing = TRUE),])

##                         DEGREE1_MAJOR   DONATED
## 16              Industrial Management 0.7050000
## 36                          Retailing 0.6689655
## 2                             Banking 0.6688312
## 17                          Insurance 0.6382979
## 18                         Journalism 0.6331658
## 28              Office Administration 0.6086957
## 35    Real Estate & Urban Development 0.5936599
## 29 Office Administration, Secretarial 0.5897436
## 27                            MISSING 0.5706215
## 30               Personnel Management 0.5466102
## 40                            Unknown 0.5343089
## 26             Mechanical Engineering 0.5194805
## 33              Public Administration 0.5141612
## 39           Transportation/Logistics 0.5134777
## 19                                Law 0.5000000
## 1                          Accounting 0.4872317
## 25                        Mathematics 0.4861111
## 5                  Business Education 0.4855769
## 9              Electrical Engineering 0.4814815
## 31                  Political Science 0.4788732
## 15             Industrial Engineering 0.4787234
## 7                Chemical Engineering 0.4590164
## 34                               RARE 0.4589905
## 12                   General Business 0.4552402
## 32                         Psychology 0.4272727
## 24                          Marketing 0.3821302
## 11                            Finance 0.3814562
## 8                           Economics 0.3612853
## 37                         Statistics 0.3604436
## 22                         Management 0.3390523
## 23                 Management Science 0.3217391
## 3             Business Administration 0.3116079
## 21         Logistics & Transportation 0.2870916
## 4                  BUSINESS ANALYTICS 0.2539683
## 13         Human Resource Development 0.2459016
## 38            Supply Chain Management 0.2414530
## 14          Human Resource Management 0.2274678
## 6                    Business Studies 0.1981132
## 20                          Logistics 0.1717663
## 10              Enterprise Management 0.1696113

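The differences in average log-donation amounts are statistically significant, with an estimated permutation p-value of \(0\) (between 0 and 0.007 with 95% confidence). The differences in donation probabilities are likewise statistically significant, with an estimated p-value of \(0\). These differences are large: median log-donation amounts range from 1.30 (Supply Chain Management) to 3.41 (MISSING), and donation probabilities range from 0.17 (Enterprise Management) to 0.71 (Industrial Management). Knowing a particular alumnus’s field of study, we gain valuable information about both the probability that they will donate and the expected log-donation amount.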
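
An estimated p-value of 0 from a permutation test means that none of the 500 permuted data sets produced a discrepancy as large as the observed one, not that the true p-value is exactly zero. The interval quoted in the output (0 to 0.007) is consistent with an exact binomial confidence interval for 0 successes in 500 trials; whether `associate()` computes it this way is an assumption. A minimal check with base R's `binom.test()`:

```r
# 0 of 500 permutations were at least as extreme as the observed statistic.
n_permutations <- 500
n_extreme <- 0

# Exact (Clopper-Pearson) 95% confidence interval for the true p-value.
ci <- binom.test(n_extreme, n_permutations)$conf.int
p_interval <- round(as.numeric(ci), 3)
print(p_interval)
```

Raising `permutations=` narrows this interval, which is what the output's own note recommends when 0.05 falls inside it.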

The following is a “connecting letters” report for the average log-donation amounts. One takeaway is that there is a statistically significant difference in the log-donation amount between business analytics and statistics majors (their letter groups, "mno" and "defghijkl", do not overlap). However, we must keep in mind that the business analytics degree has only been offered for a short period of time, so it will be interesting to track this difference in the future.

# Connecting letters report: fit the one-way ANOVA of log-donation amount on major
COMPS <- aov(LOGAMOUNT~DEGREE1_MAJOR,data=AMOUNTS)  
# Tukey test
TUKEY <- TukeyHSD(COMPS)
# connecting letters report
multcompLetters4(COMPS,TUKEY)

## $DEGREE1_MAJOR
##              Industrial Management                            MISSING 
##                               "ab"                                "a" 
##                          Retailing             Electrical Engineering 
##                              "abc"                           "abcdef" 
##               Chemical Engineering Office Administration, Secretarial 
##                     "abcdefghijkl"                       "abcdefghij" 
##                         Journalism               Personnel Management 
##                          "cdefghi"                          "cdefghi" 
##                          Insurance                               RARE 
##                     "abcdefghijkl"                            "cdegh" 
##                         Accounting                                Law 
##                              "cdg"                     "abcdefghijkl" 
##                        Mathematics                            Banking 
##                     "abcdefghijkl"                       "cdefghijkl" 
##    Real Estate & Urban Development                            Unknown 
##                          "cdefghi"                              "cdg" 
##           Transportation/Logistics              Public Administration 
##                          "cdefghi"                         "cdefghik" 
##              Office Administration             Industrial Engineering 
##                       "cdefghijkl"                     "abcdefghijkl" 
##                 Business Education                         Psychology 
##                       "cdefghijkl"                      "bcdefghijkl" 
##             Mechanical Engineering                   General Business 
##                     "abcdefghijkl"                         "cdefghik" 
##                            Finance                          Economics 
##                            "efhik"                        "defghijkl" 
##                          Marketing                  Political Science 
##                              "fik"                     "bcdefghijklm" 
##                         Statistics            Business Administration 
##                        "defghijkl"                            "fijkl" 
##                 Management Science         Logistics & Transportation 
##                      "cdefghijklm"                               "jl" 
##                         Management              Enterprise Management 
##                               "jl"                             "klmn" 
##                   Business Studies         Human Resource Development 
##                        "ghijklmno"                    "cdefghijklmno" 
##                          Logistics                 BUSINESS ANALYTICS 
##                               "mn"                              "mno" 
##          Human Resource Management            Supply Chain Management 
##                               "no"                                "o"
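
In a connecting letters report, two groups whose letter strings share at least one letter are not significantly different under the Tukey comparison, while groups with no letter in common are. The sketch below applies that rule to a few letter strings copied from the output; the helper `shares_letter` is a hypothetical name introduced for illustration.

```r
# Letter strings copied from the connecting letters report above.
letters_by_major <- c("BUSINESS ANALYTICS" = "mno",
                      "Statistics" = "defghijkl",
                      "Finance" = "efhik",
                      "Supply Chain Management" = "o")

# Hypothetical helper: TRUE when two majors share at least one letter,
# i.e. their mean log-donation amounts are not significantly different.
shares_letter <- function(a, b, report = letters_by_major) {
  letters_a <- strsplit(report[[a]], "")[[1]]
  letters_b <- strsplit(report[[b]], "")[[1]]
  length(intersect(letters_a, letters_b)) > 0
}

shares_letter("BUSINESS ANALYTICS", "Statistics")               # no shared letter
shares_letter("Statistics", "Finance")                          # shared letters
shares_letter("BUSINESS ANALYTICS", "Supply Chain Management")  # shared letter "o"
```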

Todd Young

due Thursday 2/4 by 5pm
