To perform this analysis, you will need a CSV file with a unique ID for each alumnus, their major, and whether or not they are a donor, coded as a binary variable (1 or 0). You can also add any other variables you want to include.
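As a rough illustration of that layout, here is a tiny data frame with the minimum columns. It is named to match the columns used later in this post (coreid, Major, donor, plus aff_total as an example extra variable). The values are made up. The checks at the end only confirm that the IDs are unique and that donor is coded as 0/1.

```r
# Hypothetical toy data illustrating the expected layout (values are made up)
toy <- data.frame(
  coreid    = c(101, 102, 103, 104),          # unique alumni IDs
  Major     = c("Accounting", "Nursing", "Nursing", "Psychology"),
  donor     = c(1, 0, 1, 1),                  # 1 = donor, 0 = non-donor
  aff_total = c(4, 7, 12, 2)                  # optional extra variable
)

# Sanity checks: IDs are unique and donor is strictly binary
stopifnot(!anyDuplicated(toy$coreid))
stopifnot(all(toy$donor %in% c(0, 1)))
str(toy)
```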
First, load the ggplot2 library, which we will use for plotting (run install.packages("ggplot2") first if you don't have this package yet). Then, install or load RCurl, which is needed to pull the CSV file off of GitHub if you are using my dataset:
library(ggplot2)
library(RCurl)
## Loading required package: bitops
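If you are not sure which of these packages are already installed, a small helper can check for you. This is a sketch that uses only base R's requireNamespace(). missing_pkgs() is a hypothetical name introduced here and is not part of ggplot2 or RCurl.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() returns FALSE (quietly) when a package can't be loaded.
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Usage (commented out so nothing is installed by accident):
# to_install <- missing_pkgs(c("ggplot2", "RCurl"))
# if (length(to_install) > 0) install.packages(to_install)
```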
Then, read the file into R:
x <- getURL("https://raw.githubusercontent.com/michaelpawlus/giving_by_major_comparison/master/alum_donor_factors_no_names.csv")
ad <- read.csv(text = x)
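RCurl isn't strictly required here. read.csv() also accepts a URL in place of a file path, and in recent versions of R this generally works for https links. The sketch below wraps that in a hypothetical read_alumni() helper. The helper also stops early if any of the columns used in the rest of this post is missing. The default column list is an assumption based on this dataset.

```r
# Hypothetical helper: read an alumni CSV (local path or URL) and check
# that the columns used later in this post are present.
read_alumni <- function(path,
                        required = c("coreid", "aff_total", "Major", "fy14", "donor")) {
  df <- read.csv(path)                       # read.csv() accepts a file path or URL
  missing <- setdiff(required, names(df))    # columns we expected but didn't find
  if (length(missing) > 0) {
    stop("Missing column(s): ", paste(missing, collapse = ", "))
  }
  df
}

# Usage with the dataset from this post (needs an internet connection):
# ad <- read_alumni("https://raw.githubusercontent.com/michaelpawlus/giving_by_major_comparison/master/alum_donor_factors_no_names.csv")
```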
Use the summary function to look for any outliers:
summary(ad) ## summary of the data
##      coreid          aff_total    coresex    coreprefyr       Degree
##  Min.   :  1923   Min.   :  0.00   F:1562   Min.   :1967   BS     :1179
##  1st Qu.:102878   1st Qu.:  4.00   M:1316   1st Qu.:1986   BBA    : 472
##  Median :129104   Median :  7.00            Median :2000   BA     : 364
##  Mean   :310682   Mean   : 10.17            Mean   :1997   MED    : 239
##  3rd Qu.:597856   3rd Qu.: 11.00            3rd Qu.:2009   MBA    : 127
##  Max.   :727273   Max.   :248.00            Max.   :2014   MPA    :  83
##                                                            (Other): 414
##                  Major               fy14           donor
##  Business General     : 148   Min.   :    0.27   Min.   :1
##  Accounting           : 147   1st Qu.:   25.00   1st Qu.:1
##  General Education    : 139   Median :   50.00   Median :1
##  Public Administration: 118   Mean   :  351.06   Mean   :1
##  Nursing              : 108   3rd Qu.:  200.00   3rd Qu.:1
##  Finance              : 105   Max.   :51245.54   Max.   :1
##  (Other)              :2113
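summary() only reports the quartiles and the extremes. To see how quickly the tail stretches out, you could also ask quantile() for higher percentiles. The sketch below uses a short made-up vector in place of ad$aff_total. With the real data, you would call quantile(ad$aff_total, probs = ...) instead.

```r
# Made-up affinity scores standing in for ad$aff_total
aff <- c(0, 2, 4, 5, 7, 7, 8, 11, 13, 248)

# Median plus upper-tail percentiles; a big jump between the median and
# the upper percentiles points to a few extreme values
quantile(aff, probs = c(0.5, 0.9, 0.95, 0.99))
```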
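I want to use the affinity (engagement) score, aff_total, on the x-axis to see how affinity relates to giving. The summary shows a long right tail (a median of 7 but a maximum of 248), so I am going to trim the dataset to alumni with an affinity score below 16. This keeps outliers from heavily skewing the results and also improves readability.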
ad <- ad[ad$aff_total < 16, ]  # keep only alumni with an affinity score below 16
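The cutoff of 16 is a judgment call. One common alternative, swapped in here for comparison and not the method used in this post, is Tukey's 1.5 × IQR rule. Applied to the quartiles from the summary above (4 and 11), it gives an upper fence of 11 + 1.5 × 7 = 21.5, so it would trim less aggressively than the cutoff used here. upper_fence() is a hypothetical helper, shown on made-up scores.

```r
# Hypothetical helper: Tukey-style upper fence, Q3 + k * IQR (k = 1.5 by convention)
upper_fence <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  q[2] + k * (q[2] - q[1])
}

# Illustration on made-up scores; with the real data you might use
# ad <- ad[ad$aff_total <= upper_fence(ad$aff_total), ]
aff <- c(0, 2, 4, 5, 7, 7, 8, 11, 13, 248)
upper_fence(aff)
aff[aff <= upper_fence(aff)]   # drops only the extreme value
```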
Next, let's count how many alumni in the dataset have each major, to see which majors are best represented, and then pull a top-ten list. Because this runs on the trimmed data, the counts are smaller than those in the summary above.
maj_tbl <- aggregate(coreid ~ Major, data = ad, length)  # number of alumni (IDs) per major
head(maj_tbl[order(-maj_tbl$coreid), ], n = 10)          # ten best-represented majors
##                     Major coreid
## 2              Accounting    129
## 47      General Education    116
## 18       Business General    109
## 92                Nursing     94
## 105            Psychology     92
## 84             Management     89
## 45                Finance     88
## 85              Marketing     85
## 106 Public Administration     85
## 100    Physical Education     83
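If you prefer, table() plus sort() gives the same kind of ranking without aggregate(). The made-up vector below stands in for ad$Major. With the real data, the call would be head(sort(table(ad$Major), decreasing = TRUE), n = 10).

```r
# Made-up majors standing in for ad$Major
majors <- c("Nursing", "Accounting", "Nursing", "Psychology",
            "Accounting", "Nursing")

# table() counts rows per major; sort() orders the counts, largest first
counts <- sort(table(majors), decreasing = TRUE)
head(counts, n = 10)
```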
From this table, pick a few majors to compare. I will select three: Accounting, Nursing, and Psychology.
ad <- ad[ad$Major %in% c("Accounting","Nursing","Psychology"),]
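Depending on your R version, read.csv() may have stored Major as a factor (stringsAsFactors = TRUE was the default before R 4.0.0). Subsetting rows does not remove the unused levels, so a count such as table(ad$Major) would still list every other major with a zero. droplevels() tidies that up. Here it is on a small made-up data frame. With the real data, the equivalent would be ad <- droplevels(ad).

```r
# Made-up example: Major stored as a factor with more levels than we keep
toy <- data.frame(Major = factor(c("Accounting", "Nursing", "Finance", "Psychology")),
                  aff_total = c(4, 7, 9, 2))

keep <- c("Accounting", "Nursing", "Psychology")
sub <- toy[toy$Major %in% keep, ]
levels(sub$Major)        # "Finance" is still listed as a level
sub <- droplevels(sub)
levels(sub$Major)        # only the three selected majors remain
```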
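Then, use a faceted plot (one panel per major) to compare giving between majors as well as the relative impact of engagement. Giving is plotted as log10(fy14), and each panel gets a fitted linear trend line: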
ggplot(ad, aes(x = aff_total, y = log10(fy14), color = Major)) +
  geom_point() + geom_smooth(method = "lm") + facet_wrap(~ Major)
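The trend lines in each panel are linear fits of log10(fy14) on aff_total. To get the slopes as numbers rather than reading them off the plot, one option (an addition, not part of the original analysis) is to fit the same model per major with lm(). The sketch uses made-up data with known slopes, so the output is easy to check. With the real data, you would pass ad instead of toy.

```r
# Made-up data: giving grows 10x over 10 affinity points for one major
# and 10x over 20 points for the other, so the log10 slopes are 0.1 and 0.05
toy <- data.frame(
  Major     = rep(c("Accounting", "Nursing"), each = 5),
  aff_total = rep(c(0, 2, 4, 6, 8), times = 2)
)
toy$fy14 <- ifelse(toy$Major == "Accounting",
                   10^(1 + 0.1 * toy$aff_total),
                   10^(1 + 0.05 * toy$aff_total))

# Fit log10(fy14) ~ aff_total separately for each major and keep the slope
slopes <- sapply(split(toy, toy$Major), function(d) {
  coef(lm(log10(fy14) ~ aff_total, data = d))[["aff_total"]]
})
slopes
```

Because the response is on a log10 scale, a slope of b means that each extra affinity point multiplies predicted giving by 10^b.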