Description of the data

Let’s pursue Example 3 from above. We have a hypothetical data file, intreg_data.dta, with 30 observations. The GPA score is represented by two values: the lower interval score (lgpa) and the upper interval score (ugpa).
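
To make the interval representation concrete, here is a minimal sketch on a small made-up data frame (the three rows are hypothetical and are not taken from the data file). It checks that every lower bound is no larger than its upper bound and computes each interval's width. The optional last step assumes the survival package is installed and uses its Surv() function with type = "interval2", which is one common way to hand such bounds to a model.

```r
# Hypothetical toy rows: each GPA is only known to lie between lgpa and ugpa
toy <- data.frame(
  lgpa = c(2.0, 2.5, 3.0),
  ugpa = c(2.5, 3.0, 3.4)
)

# Sanity check: no lower bound may exceed its upper bound
stopifnot(all(toy$lgpa <= toy$ugpa))

# Width of each interval
toy$width <- toy$ugpa - toy$lgpa
print(toy)

# Optional: build an interval-censored response (assumes survival is installed)
if (requireNamespace("survival", quietly = TRUE)) {
  y <- survival::Surv(toy$lgpa, toy$ugpa, type = "interval2")
  print(y)
}
```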

The writing test score, the teacher rating and the type of program (a nominal variable with three levels) are write, rating and type, respectively.
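
Because type is nominal, it is usually stored as a factor in R so that modelling functions treat it as a set of categories. A minimal sketch, using a hypothetical vector of program labels (the level names are the ones used in this example, but the values themselves are made up):

```r
# Hypothetical program labels, as they would arrive from read.csv (character)
type_chr <- c("vocational", "general", "academic", "academic", "general")

# Convert to a factor; by default the levels are the sorted unique values
type_fac <- factor(type_chr)
levels(type_fac)

# Count observations per program type
table(type_fac)
```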

Let’s look at the data. It is always a good idea to start with descriptive statistics.

Edudata <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/Education.csv")
# summary of the variables
summary(Edudata)
##        id             lgpa          ugpa           write           rating     
##  Min.   : 1.00   Min.   :0.0   Min.   :2.000   Min.   : 50.0   Min.   :48.00  
##  1st Qu.: 8.25   1st Qu.:2.0   1st Qu.:2.500   1st Qu.: 70.0   1st Qu.:51.62  
##  Median :15.50   Median :2.5   Median :3.000   Median :105.0   Median :54.00  
##  Mean   :15.50   Mean   :2.6   Mean   :3.097   Mean   :113.8   Mean   :57.53  
##  3rd Qu.:22.75   3rd Qu.:3.3   3rd Qu.:3.700   3rd Qu.:153.8   3rd Qu.:66.25  
##  Max.   :30.00   Max.   :3.8   Max.   :4.000   Max.   :205.0   Max.   :72.00  
##      type          
##  Length:30         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
## 
# bivariate plots (ggpairs requires the GGally package)
# ggpairs(Edudata[, -1], lower = list(combo = "box"), upper = list(combo = "blank"))
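# for the regression surface
# NOTE: `m` is not defined in this section; it must be a fitted model with
# write, rating and type as predictors before f() is called.
f <- function(x, y, type = "vocational") {
    # type is read in as character (see summary above), so levels() would be NULL;
    # derive the levels from the unique values instead
    lv <- sort(unique(as.character(Edudata$type)))
    newdat <- data.frame(write = x, rating = y, type = factor(type, levels = lv))
    predict(m, newdata = newdat)
}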

# Create X, Y, and Z grids
X <- with(Edudata, seq(from = min(write), to = max(write), length.out = 10))
Y <- with(Edudata, seq(from = min(rating), to = max(rating), length.out = 10))
Z <- outer(X, Y, f)
# Create 3d scatter plot and add the regression surface
library(rgl)  # provides open3d, plot3d, par3d and surface3d
open3d(windowRect = c(100, 100, 700, 700))
## [1] 1
with(Edudata, plot3d(x = write, y = rating, z = ugpa, xlab = "write", ylab = "rating",
    zlab = "ugpa", xlim = range(write), ylim = range(rating), zlim = range(ugpa)))
par3d(ignoreExtent = TRUE)
# add regression surface for each type of program in a different colour
# with 50 percent transparency (alpha = .5)
surface3d(X, Y, outer(X, Y, f, type = "vocational"), col = "blue", alpha = 0.5)
surface3d(X, Y, outer(X, Y, f, type = "general"), col = "red", alpha = 0.5)
surface3d(X, Y, outer(X, Y, f, type = "academic"), col = "green", alpha = 0.5)

# create an animated movie
# movie3d(spin3d(axis = c(.5, .5, .5), rpm = 5), duration = 6, dir = 'intreg_fig')
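
The grid step above relies on outer() calling the prediction function once with two long vectors, so that function has to be vectorised. A minimal, self-contained sketch of the same pattern, using the built-in mtcars data and an ordinary lm() fit as stand-ins (neither is part of this example, and m_demo, f_demo, X_demo, Y_demo and Z_demo are hypothetical names):

```r
# Stand-in model: ordinary linear regression on a built-in dataset
m_demo <- lm(mpg ~ wt + hp, data = mtcars)

# Vectorised prediction function: x and y arrive as equal-length vectors
f_demo <- function(x, y) {
  predict(m_demo, newdata = data.frame(wt = x, hp = y))
}

# 10-point grids spanning the observed range of each predictor
X_demo <- with(mtcars, seq(from = min(wt), to = max(wt), length.out = 10))
Y_demo <- with(mtcars, seq(from = min(hp), to = max(hp), length.out = 10))

# Z_demo[i, j] is the predicted value at (X_demo[i], Y_demo[j])
Z_demo <- outer(X_demo, Y_demo, f_demo)
dim(Z_demo)
```

A matrix built this way has the shape surface3d() expects: one row per x grid value and one column per y grid value.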