Exam 2

Use R to complete all of the following questions. Provide the instructor with the output from your code as either screenshots pasted in Word, or as output generated in an HTML document. Submit both your code and output in Brightspace. Make sure that all textual explanations match the output that you provide the instructor.

Part 1: Predictive Models on the Iris Data Set

Load the data frame called iris in R. Then type the name of the data frame to view it in R.

data(iris)
iris

##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica

Randomly sample 80% of the rows in the iris data frame to create a training set. Create a testing set containing the rest of the rows in the iris data frame.

index <- sample(nrow(iris), nrow(iris)*0.80)
iris_train <- iris[index,]
iris_test <- iris[-index,]
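Because sample() draws different rows on every run, the numbers reported in the answers can drift away from the output that gets knitted later. A minimal sketch of a reproducible split follows; the seed value 2022 is an arbitrary choice for illustration and is not part of the original exam code.

```r
# Fix the random number generator so the same rows are drawn on every run.
set.seed(2022)  # arbitrary seed, chosen only for illustration

data(iris)
n_train <- floor(0.80 * nrow(iris))      # 120 of the 150 rows
index <- sample(nrow(iris), n_train)     # row numbers for the training set
iris_train <- iris[index, ]
iris_test  <- iris[-index, ]             # the remaining 30 rows

dim(iris_train)
dim(iris_test)
```

Running the whole script again with the same seed should give the same training and testing sets, so the written answers and the knitted output stay in agreement.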

The iris dataset contains sepal and petal measurements for three species of flowers. Using 500 bootstrapped sets, develop a bagging model on your training set to predict a flower’s species based on its sepal and petal measurements.

#library needed 
library(ipred)

#fitting the model
iris_bag <- bagging(formula = Species~., data = iris_train, nbagg = 500) 
#500 bootstrap samples, and we fit a classification tree to each of these 500 bootstrap samples. The final prediction is the majority vote of the 500 trees.
#nbagg is the number of bootstrap samples we want
iris_bag

## 
## Bagging classification trees with 500 bootstrap replications 
## 
## Call: bagging.data.frame(formula = Species ~ ., data = iris_train, 
##     nbagg = 500)

What is the out-of-bag error for your model?

iris_bag_oob <- bagging(formula = Species~.,
                          data = iris_train,
                          coob = T,
                          nbagg = 500) #coob = T means that it will automatically calculate the out-of-bag estimate of the misclassification error
iris_bag_oob

## 
## Bagging classification trees with 500 bootstrap replications 
## 
## Call: bagging.data.frame(formula = Species ~ ., data = iris_train, 
##     coob = T, nbagg = 500)
## 
## Out-of-bag estimate of misclassification error:  0.075

The original output screenshot is included below:

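In the original run shown in the screenshot, the out-of-bag error was 0.0583. Because no seed was set, the printed output above comes from a different random sample and reports an out-of-bag error of 0.075.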
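The out-of-bag estimate works because each bootstrap sample leaves out part of the training set, and those left-out rows can be used to test the tree built on that sample. The sketch below is a base R illustration only (it does not use ipred). It assumes a training set of 120 rows, as in this exam, and an arbitrary seed.

```r
set.seed(1)     # arbitrary seed for illustration
n <- 120        # size of the training set used in this exam

# Probability that a given row is never drawn in n draws with replacement
p_out <- (1 - 1/n)^n

# Simulated check: share of rows left out, for each of 500 bootstrap samples
oob_share <- replicate(500, {
  boot <- sample(n, n, replace = TRUE)
  mean(!(seq_len(n) %in% boot))
})

p_out
mean(oob_share)
```

Both values should be close to exp(-1), so roughly 37% of the training rows are out of bag for any one tree.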

Use your model to make predictions for the observations in your testing set.

iris_bag_pred <- predict(iris_bag, newdata = iris_test)
iris_bag_pred

##  [1] setosa     setosa     setosa     setosa     setosa     setosa    
##  [7] setosa     setosa     setosa     setosa     setosa     versicolor
## [13] versicolor versicolor versicolor versicolor versicolor versicolor
## [19] versicolor versicolor versicolor versicolor versicolor virginica 
## [25] virginica  virginica  virginica  virginica  virginica  virginica 
## Levels: setosa versicolor virginica

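The predictions are shown above. In the original run there were 7 predictions for setosa flowers, 14 predictions for versicolor flowers, and 9 predictions for virginica flowers in the testing set. The printed output above, from a different random sample, shows 11 setosa, 12 versicolor, and 7 virginica predictions.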

Using 500 trees, develop a random forest model on your training set to predict a flower’s species based on its sepal and petal measurements.

#library needed
library(randomForest)

## randomForest 4.7-1.1

## Type rfNews() to see new features/changes/bug fixes.

iris_rf <- randomForest(Species~., data=iris_train, importance = TRUE, ntree = 500)
iris_rf

## 
## Call:
##  randomForest(formula = Species ~ ., data = iris_train, importance = TRUE,      ntree = 500) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 6.67%
## Confusion matrix:
##            setosa versicolor virginica class.error
## setosa         39          0         0  0.00000000
## versicolor      0         35         3  0.07894737
## virginica       0          5        38  0.11627907

The following displays the original output generated:

What is the out-of-bag error for your model?

The out-of-bag error for this model was 5.83% in the original run. The printed output above, from a different random sample, reports 6.67%.

Based on the out-of-bag error, is the random forest better at predicting flower species than the bagging model? Or is the bagging model better than the random forest? Or do both models seem to perform with about the same accuracy?

In the original run, the bagging model and the random forest had the same out-of-bag error (0.0583) on my training set. The out-of-bag error will change as the training and testing sets change (the printed outputs above show 0.075 for bagging and 0.0667 for the random forest), but it should not change drastically. Thus, both models seem to perform with about the same accuracy.

How many flowers in your training set are misclassified by your random forest model?

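In the original run, the random forest model classified all of the setosa flowers correctly, misclassified 3 of the 37 versicolor flowers as virginica (class.error 3/37 = 0.0811), and misclassified 4 of the 40 virginica flowers as versicolor (class.error 4/40 = 0.100). Overall, 7 of the 120 flowers were misclassified (7/120 = 0.0583). In the printed confusion matrix above, all setosa flowers are again correct, 3 of 38 versicolor flowers are misclassified as virginica (0.0789), and 5 of 43 virginica flowers are misclassified as versicolor (0.1163), for 8 of 120 overall (0.0667).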
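The class.error column and the overall out-of-bag rate can be checked by hand from the confusion matrix. The sketch below retypes the printed matrix from the randomForest output above (rows are observed species, columns are predictions) and redoes the arithmetic in base R. The object names conf, class_error, and oob_error are chosen here for illustration.

```r
# Confusion matrix copied from the printed randomForest output above
# (rows = observed species, columns = out-of-bag predictions).
species <- c("setosa", "versicolor", "virginica")
conf <- matrix(c(39,  0,  0,
                  0, 35,  3,
                  0,  5, 38),
               nrow = 3, byrow = TRUE,
               dimnames = list(observed = species, predicted = species))

class_error <- 1 - diag(conf) / rowSums(conf)   # per-species error rate
oob_error   <- 1 - sum(diag(conf)) / sum(conf)  # overall error rate

round(class_error, 4)
round(oob_error, 4)
```

The off-diagonal entries add up to the number of misclassified flowers (3 + 5 = 8), and dividing by the 120 training rows gives the 6.67% shown in the printed output.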

Use your random forest to make predictions for the observations in your testing set.

iris_rf_pred <- predict(iris_rf, iris_test)
iris_rf_pred

##          2          3          6         12         19         28         31 
##     setosa     setosa     setosa     setosa     setosa     setosa     setosa 
##         36         39         45         48         54         55         61 
##     setosa     setosa     setosa     setosa versicolor versicolor versicolor 
##         63         66         72         73         82         88         92 
## versicolor versicolor versicolor versicolor versicolor versicolor versicolor 
##         97        100        102        106        118        141        144 
## versicolor versicolor  virginica  virginica  virginica  virginica  virginica 
##        146        148 
##  virginica  virginica 
## Levels: setosa versicolor virginica

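The predictions are shown above. In the original run there were 7 predictions for setosa flowers, 13 predictions for versicolor flowers, and 10 predictions for virginica flowers in the testing set. The printed output above, from a different random sample, shows 11 setosa, 12 versicolor, and 7 virginica predictions.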

Develop a boosting model on your training set to predict a flower’s species based on its sepal and petal measurements. In developing this model, you can either use R’s default settings for things like the number of trees and shrinkage, or you can use values of your own choosing. (Note that flower species is a non-numeric categorical variable. So you want to make sure that you are developing a boosting model for the purpose of categorical classification.)

#library needed
library(adabag)

## Loading required package: rpart

## Loading required package: caret

## Loading required package: ggplot2

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:randomForest':
## 
##     margin

## Loading required package: lattice

## Loading required package: foreach

## Loading required package: doParallel

## Loading required package: iterators

## Loading required package: parallel

## 
## Attaching package: 'adabag'

## The following object is masked from 'package:ipred':
## 
##     bagging

#creating the boosting model
iris_boost = boosting(Species~., data = iris_train, boos = T)

Which flower measurement is most important in predicting its species (i.e., sepal length, sepal width, petal length, or petal width)?

iris_boost$importance

## Petal.Length  Petal.Width Sepal.Length  Sepal.Width 
##     47.64604     25.07661     13.65901     13.61834

The original output is displayed below:

Petal length appears to be the most important measurement in predicting flower species, with an importance of 44.21972 in the original run (47.64604 in the printed output above).

Use your boosting model to make predictions for the observations in your testing set, and create a confusion matrix displaying your predictions. What is the misclassification error rate for flowers in your testing set?

pred_iris_boost = predict(iris_boost, newdata = iris_test)
pred_iris_boost

## $formula
## Species ~ .
## 
## $votes
##             [,1]      [,2]        [,3]
##  [1,] 86.4818556 11.257602   4.5998653
##  [2,] 86.4818556 12.074194   3.7832734
##  [3,] 86.6805460 14.990601   0.6681754
##  [4,] 87.5896938 10.966356   3.7832734
##  [5,] 86.6805460 14.990601   0.6681754
##  [6,] 87.5896938 14.081454   0.6681754
##  [7,] 86.4818556 12.074194   3.7832734
##  [8,] 86.4818556 15.189292   0.6681754
##  [9,] 86.4818556 11.257602   4.5998653
## [10,] 87.5896938 14.081454   0.6681754
## [11,] 86.4818556 12.074194   3.7832734
## [12,]  0.0000000 87.684284  14.6550388
## [13,]  0.8537486 88.930940  12.5546347
## [14,]  2.6319429 87.040838  12.6665421
## [15,]  0.8431612 82.376745  19.1194169
## [16,]  1.9716098 94.971523   5.3961897
## [17,]  1.6891025 91.634602   9.0156182
## [18,]  0.0000000 49.281715  53.0576082
## [19,]  1.7227951 92.710088   7.9064392
## [20,]  0.0000000 84.999823  17.3395000
## [21,]  1.9716098 87.785552  12.5821613
## [22,]  1.6891025 94.068025   6.5821950
## [23,]  1.6891025 92.940173   7.7100469
## [24,]  0.0000000 18.881068  83.4582551
## [25,]  0.0000000  2.672649  99.6666739
## [26,]  1.1078381  8.982707  92.2487774
## [27,]  0.0000000 10.296339  92.0429838
## [28,]  0.0000000 11.738945  90.6003777
## [29,]  0.0000000 13.243871  89.0954521
## [30,]  0.0000000  2.221698 100.1176247
## 
## $prob
##              [,1]       [,2]       [,3]
##  [1,] 0.845050107 0.11000270 0.04494719
##  [2,] 0.845050107 0.11798196 0.03696793
##  [3,] 0.846991593 0.14647939 0.00652902
##  [4,] 0.855875253 0.10715681 0.03696793
##  [5,] 0.846991593 0.14647939 0.00652902
##  [6,] 0.855875253 0.13759573 0.00652902
##  [7,] 0.845050107 0.11798196 0.03696793
##  [8,] 0.845050107 0.14842087 0.00652902
##  [9,] 0.845050107 0.11000270 0.04494719
## [10,] 0.855875253 0.13759573 0.00652902
## [11,] 0.845050107 0.11798196 0.03696793
## [12,] 0.000000000 0.85679953 0.14320047
## [13,] 0.008342332 0.86898112 0.12267655
## [14,] 0.025717806 0.85051215 0.12377004
## [15,] 0.008238878 0.80493736 0.18682376
## [16,] 0.019265418 0.92800617 0.05272841
## [17,] 0.016504921 0.89539973 0.08809535
## [18,] 0.000000000 0.48155209 0.51844791
## [19,] 0.016834146 0.90590875 0.07725710
## [20,] 0.000000000 0.83056855 0.16943145
## [21,] 0.019265418 0.85778906 0.12294552
## [22,] 0.016504921 0.91917772 0.06431736
## [23,] 0.016504921 0.90815701 0.07533807
## [24,] 0.000000000 0.18449475 0.81550525
## [25,] 0.000000000 0.02611556 0.97388444
## [26,] 0.010825146 0.08777376 0.90140109
## [27,] 0.000000000 0.10060980 0.89939020
## [28,] 0.000000000 0.11470611 0.88529389
## [29,] 0.000000000 0.12941136 0.87058864
## [30,] 0.000000000 0.02170913 0.97829087
## 
## $class
##  [1] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"    
##  [6] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"    
## [11] "setosa"     "versicolor" "versicolor" "versicolor" "versicolor"
## [16] "versicolor" "versicolor" "virginica"  "versicolor" "versicolor"
## [21] "versicolor" "versicolor" "versicolor" "virginica"  "virginica" 
## [26] "virginica"  "virginica"  "virginica"  "virginica"  "virginica" 
## 
## $confusion
##                Observed Class
## Predicted Class setosa versicolor virginica
##      setosa         11          0         0
##      versicolor      0         11         0
##      virginica       0          1         7
## 
## $error
## [1] 0.03333333

#this is the code just for the confusion matrix
pred_iris_boost$confusion

##                Observed Class
## Predicted Class setosa versicolor virginica
##      setosa         11          0         0
##      versicolor      0         11         0
##      virginica       0          1         7

#SUM of the wrong flowers divided by the sum of all of the flowers 
pred_iris_boost$error

## [1] 0.03333333

The output is included in the screenshot below:

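The predictions are shown above. In the original run there were 7 predictions for setosa flowers, 14 predictions for versicolor flowers, and 9 predictions for virginica flowers in the testing set. All 7 setosa flowers were predicted correctly, but 2 versicolor flowers were predicted to be virginica (2/14 = 0.143) and 1 virginica flower was predicted to be versicolor (1/9 = 0.111), for a total misclassification error rate of 3/30 = 0.1. In the printed output above, from a different random sample, all 11 setosa flowers and all 7 virginica flowers are predicted correctly and 1 of the 12 versicolor flowers is predicted to be virginica, for a misclassification error rate of 1/30 = 0.0333.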

Create a plot comparing the number of trees used in the boosting model with the misclassification error on your testing set. Based on your plot, are a large number of trees needed to have a fairly accurate boosting model?

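ntree <- c(1, seq(20, 100, 20))
err <- numeric(length(ntree))
for (i in seq_along(ntree)){
  #fit a boosting model with ntree[i] trees and predict on the testing set
  iris_boost_i = boosting(Species~., data = iris_train, boos = T, mfinal = ntree[i])
  pred_iris_boost_i = predict(iris_boost_i, newdata = iris_test)
  #store the testing error from this iteration's prediction
  err[i] = pred_iris_boost_i$error
  cat(i, " ")
}

## 1  2  3  4  5  6

plot(ntree, err, type = 'l', col = 2, lwd = 2, xlab = "No. of Trees", ylab = "Misclassification Error")

The original output is included below:

The original plot was a straight, flat line. That line came from the original loop saving pred_iris_boost$error (the error of the prediction made before the loop) at every step instead of the error of the model fitted inside the loop, so on its own it does not show that fewer trees give the same results. With the loop storing each iteration's own error, as in the code above, the plot needs to be regenerated. If the error stays about the same from 20 trees onward, then a large number of trees is not needed for a fairly accurate boosting model on this data set.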

Using your training set, create a regression model to predict a flower’s petal length based on its petal width, sepal length, sepal width, and species. Determine if your regression model could be improved with a Box-Cox transformation, and if so, perform the most appropriate Box-Cox transformation.

model1 <- lm(Petal.Length ~., data = iris_train)
summary(model1)

## 
## Call:
## lm(formula = Petal.Length ~ ., data = iris_train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78018 -0.15685  0.00861  0.14957  0.64142 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.95644    0.30664  -3.119   0.0023 ** 
## Sepal.Length       0.61073    0.05505  11.094  < 2e-16 ***
## Sepal.Width       -0.23524    0.09031  -2.605   0.0104 *  
## Petal.Width        0.61181    0.12959   4.721 6.72e-06 ***
## Speciesversicolor  1.45652    0.18758   7.765 3.87e-12 ***
## Speciesvirginica   1.94464    0.26268   7.403 2.46e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2663 on 114 degrees of freedom
## Multiple R-squared:  0.9783, Adjusted R-squared:  0.9774 
## F-statistic:  1030 on 5 and 114 DF,  p-value: < 2.2e-16

library(MASS)
boxcox(model1)

The output of the box-cox graph is included below:

As seen in the plot, the log-likelihood peaks at a lambda of about 0.5, which means a Box-Cox transformation should improve the model. A lambda of 0.5 corresponds to taking the square root of the petal length.

model1b = lm(I(sqrt(Petal.Length)) ~., data = iris_train)
summary(model1b)

## 
## Call:
## lm(formula = I(sqrt(Petal.Length)) ~ ., data = iris_train)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.198873 -0.039252  0.003451  0.040521  0.211371 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.64992    0.07825   8.306 2.32e-13 ***
## Sepal.Length       0.13785    0.01405   9.813  < 2e-16 ***
## Sepal.Width       -0.05132    0.02305  -2.227   0.0279 *  
## Petal.Width        0.14961    0.03307   4.524 1.50e-05 ***
## Speciesversicolor  0.54170    0.04787  11.316  < 2e-16 ***
## Speciesvirginica   0.64576    0.06703   9.633  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06797 on 114 degrees of freedom
## Multiple R-squared:  0.9824, Adjusted R-squared:  0.9816 
## F-statistic:  1270 on 5 and 114 DF,  p-value: < 2.2e-16

The adjusted R-squared rose from 0.9774 to 0.9816 after the Box-Cox transformation, so the transformation improved the fit.
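The value of lambda can also be read off numerically instead of from the graph. The following is a minimal sketch, assuming the full iris data frame instead of iris_train so that it runs on its own. The names fit, bc, lambda_hat, and lambda_ci are chosen here for illustration.

```r
library(MASS)   # provides boxcox()

# Illustration on the full iris data; the exam model uses iris_train instead.
fit <- lm(Petal.Length ~ ., data = iris)

# plotit = FALSE returns the lambda grid (x) and the profile log-likelihood (y)
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

lambda_hat <- bc$x[which.max(bc$y)]                     # best lambda on the grid
in_ci <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2    # approximate 95% interval
lambda_ci <- range(bc$x[in_ci])

lambda_hat
lambda_ci
```

If the interval contains 0.5 and excludes 1, the square root transformation is a reasonable choice; if it contains 1, the untransformed model may be adequate.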

Part 2: Filter Joins

Create a new data frame called Faculty containing the following table of information about four faculty at a university:

Faculty <- data.frame(ID=c(1,2,3,7), Name=c("Grayson","Wayne", "Stark", "Grey"), Code=c("ART", "ART", "COMP", "HIST"))
Faculty

##   ID    Name Code
## 1  1 Grayson  ART
## 2  2   Wayne  ART
## 3  3   Stark COMP
## 4  7    Grey HIST

Create a new data frame called Department containing the following table of information:

Department <- data.frame(Code=c("ART", "COMP", "ENG", "HIST"), Department_Name=c("Art Department", "Computer Science Department", "English Department", "History Department"))
Department

##   Code             Department_Name
## 1  ART              Art Department
## 2 COMP Computer Science Department
## 3  ENG          English Department
## 4 HIST          History Department

Using a filter join, display only those departments that have a faculty member listed in the Faculty data frame.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ✔ purrr   0.3.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ purrr::accumulate() masks foreach::accumulate()
## ✖ dplyr::combine()    masks randomForest::combine()
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ purrr::lift()       masks caret::lift()
## ✖ ggplot2::margin()   masks randomForest::margin()
## ✖ dplyr::select()     masks MASS::select()
## ✖ purrr::when()       masks foreach::when()

Department %>% semi_join(Faculty, by = "Code") -> joined_data
joined_data

##   Code             Department_Name
## 1  ART              Art Department
## 2 COMP Computer Science Department
## 3 HIST          History Department

Using a filter join, display the department that does not have a faculty member listed in the Faculty data frame.

Department %>% anti_join(Faculty, by = "Code") -> joined_data
joined_data

##   Code    Department_Name
## 1  ENG English Department
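A filter join keeps or drops rows of the first table based on whether a match exists in the second table, without adding any columns from it. The base R sketch below reproduces the same logic with %in%. It is only an illustration of what semi_join() and anti_join() do here, and the names with_faculty and without_faculty are chosen for this example.

```r
Faculty <- data.frame(ID = c(1, 2, 3, 7),
                      Name = c("Grayson", "Wayne", "Stark", "Grey"),
                      Code = c("ART", "ART", "COMP", "HIST"))
Department <- data.frame(Code = c("ART", "COMP", "ENG", "HIST"),
                         Department_Name = c("Art Department",
                                             "Computer Science Department",
                                             "English Department",
                                             "History Department"))

# Semi-join logic: keep departments whose Code appears in Faculty
with_faculty <- Department[Department$Code %in% Faculty$Code, ]

# Anti-join logic: keep departments whose Code does not appear in Faculty
without_faculty <- Department[!(Department$Code %in% Faculty$Code), ]

with_faculty
without_faculty
```

Note that ART appears only once in with_faculty even though two faculty members belong to it, which is the difference between a filter join and a mutating join such as inner_join().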

Exam 2

Kamrie Foster

2022-12-15
