Loading the Data

The first step is to load the data. I will be using data from a math placement exam at a liberal arts college.

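# stringsAsFactors = TRUE keeps text columns (e.g. Grade) as factors on R >= 4.0 as well
mathP <- read.csv(url("https://raw.githubusercontent.com/lysanthus/CUNYDSBridge/master/MathPlacement.csv"), header = TRUE, sep = ",", quote = "\"", stringsAsFactors = TRUE)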

The data loads into a data frame. We can look at the structure of each variable in the frame with:

str(mathP)
## 'data.frame':    2696 obs. of  17 variables:
##  $ X            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Student      : int  625 628 629 630 634 636 638 641 643 645 ...
##  $ Gender       : int  0 0 1 0 1 0 0 0 0 0 ...
##  $ PSATM        : int  56 57 NA 53 NA 63 42 52 51 60 ...
##  $ SATM         : int  56 NA 62 NA 64 68 NA NA 58 NA ...
##  $ ACTM         : int  25 23 27 27 31 NA 23 24 NA 26 ...
##  $ Rank         : int  1 1 42 6 72 96 38 72 51 215 ...
##  $ Size         : int  420 85 421 75 462 518 382 480 703 524 ...
##  $ GPAadj       : int  40 40 38 38 35 34 37 37 34 32 ...
##  $ PlcmtScore   : int  23 21 20 20 19 18 18 17 17 16 ...
##  $ Recommends   : Factor w/ 9 levels "R0","R01","R1",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Course       : int  210 117 117 117 114 117 117 117 117 117 ...
##  $ Grade        : Factor w/ 17 levels "","A","A-","A+",..: 2 2 3 5 2 7 3 7 3 5 ...
##  $ RecTaken     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ TooHigh      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ TooLow       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CourseSuccess: int  1 1 1 1 1 1 1 1 1 1 ...

…and then summarize the data by:

summary(mathP)
##        X             Student           Gender           PSATM      
##  Min.   :   1.0   Min.   :   2.0   Min.   :0.0000   Min.   : 0.00  
##  1st Qu.: 674.8   1st Qu.: 925.8   1st Qu.:0.0000   1st Qu.:54.00  
##  Median :1348.5   Median :1953.0   Median :0.0000   Median :59.00  
##  Mean   :1348.5   Mean   :1942.7   Mean   :0.4586   Mean   :58.14  
##  3rd Qu.:2022.2   3rd Qu.:2968.2   3rd Qu.:1.0000   3rd Qu.:65.00  
##  Max.   :2696.0   Max.   :4067.0   Max.   :1.0000   Max.   :80.00  
##                                    NA's   :2116     NA's   :1560   
##       SATM           ACTM            Rank             Size      
##  Min.   :35.0   Min.   :13.00   Min.   :  0.00   Min.   :  0.0  
##  1st Qu.:58.0   1st Qu.:25.00   1st Qu.:  7.00   1st Qu.:177.0  
##  Median :63.0   Median :27.00   Median : 28.00   Median :322.0  
##  Mean   :62.6   Mean   :26.98   Mean   : 51.01   Mean   :323.5  
##  3rd Qu.:68.0   3rd Qu.:30.00   3rd Qu.: 73.00   3rd Qu.:455.0  
##  Max.   :80.0   Max.   :36.00   Max.   :530.00   Max.   :888.0  
##  NA's   :1460   NA's   :322     NA's   :196      NA's   :179    
##      GPAadj        PlcmtScore       Recommends       Course     
##  Min.   : 0.00   Min.   :-18.00   R1     :1132   Min.   :109.0  
##  1st Qu.:33.00   1st Qu.: 26.00   R2     : 487   1st Qu.:120.0  
##  Median :37.00   Median : 33.00   R4     : 308   Median :120.0  
##  Mean   :35.73   Mean   : 32.44   R01    : 240   Mean   :123.4  
##  3rd Qu.:39.00   3rd Qu.: 39.00   R8     : 215   3rd Qu.:122.0  
##  Max.   :40.00   Max.   : 59.00   R0     : 177   Max.   :398.0  
##  NA's   :20      NA's   :35       (Other): 137                  
##      Grade        RecTaken         TooHigh          TooLow       
##         :562   Min.   :0.0000   Min.   :0.000   Min.   :0.00000  
##  A      :439   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.00000  
##  B      :380   Median :1.0000   Median :1.000   Median :0.00000  
##  B+     :301   Mean   :0.6855   Mean   :0.569   Mean   :0.02003  
##  A-     :293   3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:0.00000  
##  B-     :184   Max.   :1.0000   Max.   :1.000   Max.   :1.00000  
##  (Other):537                                                     
##  CourseSuccess   
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :1.0000  
##  Mean   :0.6768  
##  3rd Qu.:1.0000  
##  Max.   :1.0000  
##  NA's   :567
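
The summary shows that several columns contain many missing values. A quick way to see only the missing-value counts is sketched below. It assumes a small made-up data frame `toy` standing in for `mathP`; the values are invented for illustration and are not taken from the data set.

```r
# Toy stand-in for mathP; values are invented for illustration only
toy <- data.frame(
  PSATM = c(56, 57, NA, 53, NA),
  SATM  = c(56, NA, 62, NA, 64),
  ACTM  = c(25, 23, 27, 27, 31)
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts
```

Replacing `toy` with `mathP` should give the same per-column counts that `summary()` reports in its `NA's` rows.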

Let’s compare the mean and median of SAT and ACT math scores:

meanSAT <- round(mean(mathP$SATM, na.rm=TRUE),2)
medianSAT <- round(median(mathP$SATM, na.rm=TRUE),2)

meanACT <- round(mean(mathP$ACTM, na.rm=TRUE),2)
medianACT <- round(median(mathP$ACTM, na.rm=TRUE),2)

We see that the SAT scores have a mean of 62.6 and a median of 63, while the ACT scores have a mean of 26.98 and a median of 27.
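
The four statistics can also be computed in one pass rather than one variable at a time. This is only a sketch, again assuming a made-up data frame `toy` in place of `mathP` (the numbers are invented):

```r
# Toy stand-in for mathP; values are invented for illustration only
toy <- data.frame(
  SATM = c(56, NA, 62, 64, 68),
  ACTM = c(25, 23, 27, NA, 31)
)

# Apply the same function to each score column, ignoring missing values
stats <- sapply(toy[, c("SATM", "ACTM")], function(x) {
  c(mean   = round(mean(x, na.rm = TRUE), 2),
    median = round(median(x, na.rm = TRUE), 2))
})
stats  # a matrix with rows "mean"/"median" and one column per variable
```

The result is a small matrix, so a single value can be pulled out with, for example, `stats["mean", "SATM"]`.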


Now, let’s split the data by graduating class size: students from the smaller classes (at or below the median size) and students from the larger classes (above the median):

mathP.small <- mathP[which(mathP$Size<=quantile(mathP$Size,0.50,na.rm=TRUE)),]
mathP.large <- mathP[which(mathP$Size>quantile(mathP$Size,0.50,na.rm=TRUE)),]

…and see if the means and medians differ much.

meanSAT.small <- round(mean(mathP.small$SATM,na.rm=TRUE),2)
medianSAT.small <- round(median(mathP.small$SATM,na.rm=TRUE),2)
meanSAT.large <- round(mean(mathP.large$SATM,na.rm=TRUE),2)
medianSAT.large <- round(median(mathP.large$SATM,na.rm=TRUE),2)

meanACT.small <- round(mean(mathP.small$ACTM,na.rm=TRUE),2)
medianACT.small <- round(median(mathP.small$ACTM,na.rm=TRUE),2)
meanACT.large <- round(mean(mathP.large$ACTM,na.rm=TRUE),2)
medianACT.large <- round(median(mathP.large$ACTM,na.rm=TRUE),2)

We see that the SAT scores from students in the smaller classes have a mean of 62.66 and a median of 63, while their ACT scores have a mean of 26.69 and a median of 27.

We can compare the smaller classes to the larger ones:

Group          SAT Mean  SAT Median  ACT Mean  ACT Median
Large Classes     62.56          63     27.26          27
Small Classes     62.66          63     26.69          27
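
A comparison like this can also be produced without creating separate subsets, by adding a grouping column and aggregating over it. The sketch below assumes a made-up data frame `toy` (invented values) and a hypothetical column name `SizeGroup`; it computes only the means, and medians would follow the same pattern with `median()`.

```r
# Toy stand-in for mathP; values are invented for illustration only
toy <- data.frame(
  Size = c(420, 85, NA, 75, 462, 518),
  SATM = c(56, 60, 62, NA, 64, 68),
  ACTM = c(25, 23, 27, 27, NA, 31)
)

# The median is the 50% quantile used for the split
cutoff <- median(toy$Size, na.rm = TRUE)

# Rows with a missing Size get NA here and so end up in neither group
toy$SizeGroup <- ifelse(toy$Size <= cutoff, "Small Classes", "Large Classes")

# na.pass keeps rows with missing scores so that na.rm can drop them per column
group_means <- aggregate(cbind(SATM, ACTM) ~ SizeGroup, data = toy,
                         FUN = function(x) round(mean(x, na.rm = TRUE), 2),
                         na.action = na.pass)
group_means
```

Rows whose `Size` is missing are left out of both groups, just as `which()` drops them when the subsets are built.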

Looking at the variable “Grade”,

levels(mathP$Grade)
##  [1] ""   "A"  "A-" "A+" "B"  "B-" "B+" "C"  "C-" "C+" "D"  "D-" "D+" "F" 
## [15] "I"  "S"  "W"

…we see that most are the typical letter grades given to students. Some, such as “W”, “I”, or “S”, though, represent other situations. Perhaps we prefer to make them a bit more descriptive:

# Add the new factor levels
levels(mathP.small$Grade) <- c(levels(mathP.small$Grade),c("Incomplete","Withdrawal","Satisfactory"))

levels(mathP.small$Grade)
##  [1] ""             "A"            "A-"           "A+"          
##  [5] "B"            "B-"           "B+"           "C"           
##  [9] "C-"           "C+"           "D"            "D-"          
## [13] "D+"           "F"            "I"            "S"           
## [17] "W"            "Incomplete"   "Withdrawal"   "Satisfactory"
# Change the values
mathP.small$Grade[which(mathP.small$Grade == "I")] <- "Incomplete"
mathP.small$Grade[which(mathP.small$Grade == "W")] <- "Withdrawal"
mathP.small$Grade[which(mathP.small$Grade == "S")] <- "Satisfactory"

mathP.small[which(mathP.small$Grade == "Incomplete" | mathP.small$Grade == "Withdrawal" | mathP.small$Grade == "Satisfactory"),]
##       X Student Gender PSATM SATM ACTM Rank Size GPAadj PlcmtScore
## 334 334    2282     NA    NA   NA   25    0    0     29         21
## 836 836    1390     NA    58   NA   29   12  123     38         36
## 854 854    1528     NA    55   NA   25   92  320     31         22
## 968 968    1933     NA    64   NA   27   50  262     37         30
##     Recommends Course        Grade RecTaken TooHigh TooLow CourseSuccess
## 334        R01    120   Incomplete        0       1      0             0
## 836         R1    120 Satisfactory        1       1      0            NA
## 854         R1    117 Satisfactory        0       0      0            NA
## 968         R1    120 Satisfactory        1       1      0            NA

We could also do the same for the larger class size subset if we wanted.
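
One thing to note is that adding new levels and then reassigning the values leaves the old "I", "S", and "W" levels in place as unused levels. An alternative is to rename the levels directly, which relabels every matching value at once. The sketch below uses a hypothetical helper, `relabel_grades()`, and a small invented factor rather than the real data:

```r
# Hypothetical helper: rename the short codes in a factor's levels
relabel_grades <- function(g) {
  labels <- c(I = "Incomplete", W = "Withdrawal", S = "Satisfactory")
  lv <- levels(g)
  hit <- lv %in% names(labels)          # which levels have a longer label
  lv[hit] <- unname(labels[lv[hit]])    # swap in the descriptive names
  levels(g) <- lv                       # renaming levels relabels the values too
  g
}

# Invented example factor, for illustration only
grade <- factor(c("A", "I", "B", "W", "S", "A"))
grade <- relabel_grades(grade)
levels(grade)
```

Because the helper takes any factor, the same call could be applied to each subset, for example `mathP.large$Grade <- relabel_grades(mathP.large$Grade)`.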