Welcome to the Final, this semester has been a great opportunity for everyone to learn the basics of R. This exam will build up off of everything that we have done in the course. It is open book, but you may not ask the TAs for help nor your classmates.

Use the variable names that I gave you, I am testing automating the grading and the variable names will be important. Some of the questions are going to be muliple choice, see the example below for how to answer them:

Example Problem: Let’s start by introducing the topics of R. Assign 4 to x, “a” to y and FALSE to z. You cannot combine these variables, why?

  1. the class types are different
  2. you can combine 2 of them but not three
  3. all of the above
x <- 4
y <- "a"
z <- FALSE

# Submit answer
q_example <- "a"

# For some problems, you will need to assign the value to the question. Do it like so
q_example_2 <- 2
  1. Create a list called ‘farm_animals’ that contains, in this order, cat, dog, sheep, cow, chicken
farm_animals <- list("cat", "dog", "sheep", "cow","chicken")
  1. You want to calculate a students grade in your course.
report_card <- c(92,88,91,97,85)
assignments <- c("HW1", "Exam 1", "Quiz", "Exam 2", "HW2")

weight <- c(.10,.25, .3, .25, .10)
assign <- c(report_card, assignments)
weight_grade <- c(report_card * weight)
sum(weight_grade)
## [1] 91.25
student_grade <- 91.25
  1. Create a matrix called ‘nums’ which includes numbers 1:50 and has 5 rows with the numbers ascending on each row. Your output should look like this:

    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 1 2 3 4 5 6 7 8 9 10 [2,] 11 12 13 14 15 16 17 18 19 20 [3,] 21 22 23 24 25 26 27 28 29 30 [4,] 31 32 33 34 35 36 37 38 39 40 [5,] 41 42 43 44 45 46 47 48 49 50

nums <-  matrix( 
  c(1:50),
   nrow=5,              
  ncol=10,              
  byrow = TRUE) 

4.With the looping variable created, write a for loop or lapply function that will change all the negative numbers to 0 and the positive numbers to 1

looping <- rnorm(100)

for (i in 1:NROW(looping)){
    if (looping[i] > 0) 
      {looping[i] <- 1} 
  else {looping[i] <- 0}
}

Part One

For part one you will be working with the ‘countries.csv’ dataset.

  1. Start by importing the dataset as ‘countries’, then displaying the first 3 rows using the appropriate function.
countries <- read.csv("countries.csv", header = TRUE)

head(countries,3)
##        Country                              Region Population Area..sq..mi..
## 1 Afghanistan        ASIA (EX. NEAR EAST)            31056997         647500
## 2     Albania  EASTERN EUROPE                         3581655          28748
## 3     Algeria  NORTHERN AFRICA                       32930091        2381740
##   Pop..Density..per.sq..mi.. Coastline..coast.area.ratio. Net.migration
## 1                       48,0                         0,00         23,06
## 2                      124,6                         1,26         -4,93
## 3                       13,8                         0,04         -0,39
##   Infant.mortality..per.1000.births. GDP....per.capita. Literacy....
## 1                             163,07                700         36,0
## 2                              21,52               4500         86,5
## 3                                 31               6000         70,0
##   Phones..per.1000. Arable.... Crops.... Other.... Climate Birthrate Deathrate
## 1               3,2      12,13      0,22     87,65       1      46,6     20,34
## 2              71,2      21,09      4,42     74,49       3     15,11      5,22
## 3              78,1       3,22      0,25     96,53       1     17,14      4,61
##   Agriculture Industry Service
## 1        0,38     0,24    0,38
## 2       0,232    0,188   0,579
## 3       0,101      0,6   0,298
  1. Starting with the basics, sum the Area (sq. mi.) column to get the landmass of the land of six of the seven continents, assign this value to sum_area
sum_area <- sum(countries$Area..sq..mi..)
  1. There is some structual issues with this dataset, write a function, named ‘convert_number’ that will accomplish the following:
convert_number <- function(x){
  x1 <- gsub(",", ".", x)
  x2 <- as.double(x1)
return(x2)
}

Now use your function on the following columns: Pop. Density (per sq. mi.), Coastline (coast/area ratio), Net migration, Phones (per 1000), Arable (%), Crops (%), Agriculture, Industry, Service

Don’t forget to check that it worked!

countries$Population <- convert_number(countries$Population)

countries$Pop..Density..per.sq..mi..<- convert_number(countries$Pop..Density..per.sq..mi..)

countries$Coastline..coast.area.ratio. <- convert_number(countries$Coastline..coast.area.ratio.)

countries$Net.migration <- convert_number(countries$Net.migration)

countries$Phones..per.1000. <- convert_number(countries$Phones..per.1000.)

countries$Arable....<- convert_number(countries$Arable....)

countries$Crops.... <- convert_number(countries$Crops....)

countries$Agriculture <- convert_number(countries$Agriculture)

countries$Industry<- convert_number(countries$Industry)

countries$Service <- convert_number(countries$Service)
  1. Check the data to see if there are any NA values in the data. Assign ‘Q8’ with the number of NAs in the climate column.
summary(countries)
##             Country                                    Region  
##  Afghanistan    :  1   SUB-SAHARAN AFRICA                 :51  
##  Albania        :  1   LATIN AMER. & CARIB                :45  
##  Algeria        :  1   ASIA (EX. NEAR EAST)               :28  
##  American Samoa :  1   WESTERN EUROPE                     :28  
##  Andorra        :  1   OCEANIA                            :21  
##  Angola         :  1   NEAR EAST                          :16  
##  (Other)        :221   (Other)                            :38  
##    Population        Area..sq..mi..     Pop..Density..per.sq..mi..
##  Min.   :7.026e+03   Min.   :       2   Min.   :    0.00          
##  1st Qu.:4.376e+05   1st Qu.:    4648   1st Qu.:   29.15          
##  Median :4.787e+06   Median :   86600   Median :   78.80          
##  Mean   :2.874e+07   Mean   :  598227   Mean   :  379.05          
##  3rd Qu.:1.750e+07   3rd Qu.:  441811   3rd Qu.:  190.15          
##  Max.   :1.314e+09   Max.   :17075200   Max.   :16271.50          
##                                                                   
##  Coastline..coast.area.ratio. Net.migration      
##  Min.   :  0.00               Min.   :-20.99000  
##  1st Qu.:  0.10               1st Qu.: -0.92750  
##  Median :  0.73               Median :  0.00000  
##  Mean   : 21.17               Mean   :  0.03812  
##  3rd Qu.: 10.35               3rd Qu.:  0.99750  
##  Max.   :870.66               Max.   : 23.06000  
##                               NA's   :3          
##  Infant.mortality..per.1000.births. GDP....per.capita.  Literacy....
##         :  3                        Min.   :  500             : 18  
##  9,95   :  3                        1st Qu.: 1900      99,0   : 13  
##  12,62  :  2                        Median : 5550      97,0   : 11  
##  4,39   :  2                        Mean   : 9690      98,0   : 10  
##  10,03  :  1                        3rd Qu.:15700      100,0  :  7  
##  10,09  :  1                        Max.   :55100      98,6   :  4  
##  (Other):215                        NA's   :1          (Other):164  
##  Phones..per.1000.   Arable....      Crops....        Other....   Climate  
##  Min.   :   0.2    Min.   : 0.00   Min.   : 0.000   100    :  8      : 22  
##  1st Qu.:  37.8    1st Qu.: 3.22   1st Qu.: 0.190   75     :  3   1  : 29  
##  Median : 176.2    Median :10.42   Median : 1.030          :  2   1,5:  8  
##  Mean   : 236.1    Mean   :13.80   Mean   : 4.564   70     :  2   2  :111  
##  3rd Qu.: 389.6    3rd Qu.:20.00   3rd Qu.: 4.440   70,44  :  2   2,5:  3  
##  Max.   :1035.6    Max.   :62.11   Max.   :50.680   73,33  :  2   3  : 48  
##  NA's   :4         NA's   :2       NA's   :2        (Other):208   4  :  6  
##    Birthrate     Deathrate    Agriculture         Industry     
##         :  3          :  4   Min.   :0.00000   Min.   :0.0200  
##  12,56  :  2   10,31  :  2   1st Qu.:0.03775   1st Qu.:0.1930  
##  18,02  :  2   12,25  :  2   Median :0.09900   Median :0.2720  
##  18,79  :  2   14,02  :  2   Mean   :0.15084   Mean   :0.2827  
##  20,48  :  2   3,92   :  2   3rd Qu.:0.22100   3rd Qu.:0.3410  
##  10     :  1   5,28   :  2   Max.   :0.76900   Max.   :0.9060  
##  (Other):215   (Other):213   NA's   :15        NA's   :16      
##     Service      
##  Min.   :0.0620  
##  1st Qu.:0.4293  
##  Median :0.5710  
##  Mean   :0.5653  
##  3rd Qu.:0.6785  
##  Max.   :0.9540  
##  NA's   :15
sum(countries$Climate== "")
## [1] 22
Q8 <-22
  1. Create a new column on the countries data set called ‘GDP’, fill this with the GDP ($ per capita) multiplied by the population. Assign ‘Q9’ with the country name with the lowest GDP in the world. (Make sure spelling is correct)
GDP <- countries$GDP....per.capita. * countries$Population
print (GDP)
##   [1] 2.173990e+10 1.611745e+10 1.975805e+11 4.623520e+08 1.352819e+09
##   [6] 2.304143e+10 1.159022e+08 7.601880e+08 4.471245e+11 1.041730e+10
##  [11] 2.012948e+09 5.876584e+11 2.457864e+11 2.706950e+10 5.072959e+09
##  [16] 1.180609e+10 2.799942e+11 4.394618e+09 6.278737e+10 3.020308e+11
##  [21] 1.409877e+09 8.649238e+09 2.367828e+09 2.963640e+09 2.157371e+10
##  [26] 2.744375e+10 1.475850e+10 1.429395e+12 3.695680e+08 7.057658e+09
##  [31] 5.612879e+10 1.529327e+10 8.528874e+10 4.854041e+09 2.637471e+10
##  [36] 3.121326e+10 9.863482e+11 5.893706e+08 1.590260e+09 4.733692e+09
##  [41] 1.193304e+10 1.597288e+11 6.569869e+12 2.746361e+11 4.836636e+08
##  [46] 4.386239e+10 2.591620e+09 1.069400e+08 3.708488e+10 2.471678e+10
##  [51] 4.764434e+10 3.301018e+10 1.505858e+10 1.606966e+11 1.695156e+11
##  [56] 6.324890e+08 3.721140e+08 5.510390e+10 5.313885e+08 4.470678e+10
##  [61] 3.155480e+11 3.274741e+10 1.458294e+09 3.350896e+09 1.628930e+10
##  [66] 5.234459e+10 1.039412e+09 5.254504e+09 1.433396e+11 1.680181e+12
##  [71] 1.655925e+09 4.805115e+09 7.836983e+09 2.790659e+09 8.572542e+08
##  [76] 1.165368e+10 2.274855e+12 4.930106e+10 4.887400e+08 2.137612e+11
##  [81] 1.127220e+09 4.485150e+08 3.622208e+09 3.591399e+09 5.040353e+10
##  [86] 1.308180e+09 2.034947e+10 1.153623e+09 3.068980e+09 1.329361e+10
##  [91] 1.904889e+10 1.998844e+11 1.387405e+11 9.251089e+09 3.176521e+12
##  [96] 7.854488e+11 4.808190e+11 4.017507e+10 1.202422e+11 1.584261e+09
## [101] 1.257719e+11 1.552165e+12 1.075668e+10 3.594474e+12 2.258883e+09
## [106] 2.539907e+10 9.596944e+10 3.470782e+10 8.434560e+07 3.004692e+10
## [111] 8.694734e+11 4.594947e+10 8.342237e+09 1.082642e+10 2.320230e+10
## [116] 1.859544e+10 6.066993e+09 3.042004e+09 3.776483e+10 8.496750e+08
## [121] 4.087933e+10 2.614016e+10 8.790625e+09 1.373871e+10 1.487638e+10
## [126] 7.808356e+09 2.194727e+11 1.400131e+09 1.054515e+10 7.083788e+09
## [131] 9.667520e+07 6.280286e+09 5.719298e+09 1.414543e+10 5.232084e+08
## [136] 9.670457e+11 2.160080e+08 8.040071e+09 8.786610e+08 5.098003e+09
## [141] 3.209260e+07 1.329650e+11 2.362381e+10 1.471786e+10 6.643500e+07
## [146] 3.960201e+10 4.716558e+11 2.527790e+09 3.288690e+09 8.804462e+10
## [151] 1.281130e+10 1.002008e+10 1.186738e+11 1.030738e+09 1.742890e+11
## [156] 4.063920e+10 3.481875e+11 1.852110e+08 2.010531e+10 1.247520e+10
## [161] 3.058038e+10 1.443433e+11 4.115559e+11 4.277592e+11 1.909057e+11
## [166] 6.597676e+10 1.903522e+10 4.567987e+09 1.561249e+11 1.271753e+12
## [171] 1.124272e+10 1.875500e+07 3.443352e+08 9.096732e+08 4.847940e+07
## [176] 3.417592e+08 9.906848e+08 1.012085e+09 2.320956e+08 3.188328e+11
## [181] 1.917939e+10 2.067210e+10 6.360198e+08 3.002625e+09 1.064640e+11
## [186] 7.234466e+10 3.819659e+10 9.391446e+08 4.431669e+09 4.728077e+11
## [191] 8.887525e+11 7.482229e+10 7.834912e+10 1.756468e+09 5.568037e+09
## [196] 2.416448e+11 2.460326e+11 6.230849e+10 5.390444e+11 7.320815e+09
## [201] 2.246724e+10 4.782738e+11 8.323053e+09 2.523158e+08 1.012550e+10
## [206] 7.020760e+10 4.717735e+11 2.924894e+10 2.030592e+08 1.299100e+07
## [211] 3.947406e+10 2.522384e+11 6.038294e+10 1.678874e+12 1.128119e+13
## [216] 4.392873e+10 4.642213e+10 6.057201e+08 1.235061e+11 2.110074e+11
## [221] 1.868006e+09 5.929250e+07 1.968394e+09           NA 1.716495e+10
## [226] 9.201608e+09 2.324993e+10
countries$GDP <- cbind(GDP)

min(countries$GDP, na.rm=T)
## [1] 12991000
Q9 <- "12991000"
  1. Since most of the countries with NA values are small countries or territories of other countries, create a new data frame named ‘noNA’ has all the rows that is complete values for all the columns. Assign dim(noNA) to Q10
noNA <- na.omit(countries)
  
Q10 <- dim(noNA)
  1. Create a histogram of the GDP ($ per capita). Which of the following bars are largest? Assign Q11 with your choice
  1. 20000 - 25000
  2. 5000 - 10000
  3. 25000 - 30000
  4. 55000 - 60000
hist(countries$GDP....per.capita.)

Q11 <- "b"
  1. Factor the Region column and create a boxplot of GDP ($ per capita) for each region. (You will want to rotate the Axis Text) Assign Q12 with the Region that has the outlier with the highest value
countries$Region <- as.factor(countries$Region) 
levels(countries$Region) <- levels (countries$Region)

boxplot(GDP....per.capita.~Region,data=countries, main="GDP ($ per capita) for each region. ",
   xlab="Region", ylab="GDP")

Q12 <- "Western Europe"
  1. Next, check for a correlation between Agriculture and Literacy (%). Create a scatter plot with Agriculture on the X axis and Literacy (%) on the Y axis. Is the correlation negative or positive? Assign your answer to Q13
  1. Postive
  2. Negative
ggplot(countries, aes(x = Agriculture, y = Literacy....)) +
  geom_point()
## Warning: Removed 15 rows containing missing values (geom_point).

Q13 <- "b"
  1. Use the glimse function on ‘countries’. What data type is Net migration? Assign your answer to Q14
glimpse(countries)
## Observations: 227
## Variables: 21
## $ Country                            <fct> "Afghanistan ", "Albania ", "Alger…
## $ Region                             <fct> ASIA (EX. NEAR EAST)         , EAS…
## $ Population                         <dbl> 31056997, 3581655, 32930091, 57794…
## $ Area..sq..mi..                     <int> 647500, 28748, 2381740, 199, 468, …
## $ Pop..Density..per.sq..mi..         <dbl> 48.0, 124.6, 13.8, 290.4, 152.1, 9…
## $ Coastline..coast.area.ratio.       <dbl> 0.00, 1.26, 0.04, 58.29, 0.00, 0.1…
## $ Net.migration                      <dbl> 23.06, -4.93, -0.39, -20.71, 6.60,…
## $ Infant.mortality..per.1000.births. <fct> "163,07", "21,52", "31", "9,27", "…
## $ GDP....per.capita.                 <int> 700, 4500, 6000, 8000, 19000, 1900…
## $ Literacy....                       <fct> "36,0", "86,5", "70,0", "97,0", "1…
## $ Phones..per.1000.                  <dbl> 3.2, 71.2, 78.1, 259.5, 497.2, 7.8…
## $ Arable....                         <dbl> 12.13, 21.09, 3.22, 10.00, 2.22, 2…
## $ Crops....                          <dbl> 0.22, 4.42, 0.25, 15.00, 0.00, 0.2…
## $ Other....                          <fct> "87,65", "74,49", "96,53", "75", "…
## $ Climate                            <fct> "1", "3", "1", "2", "3", "", "2", …
## $ Birthrate                          <fct> "46,6", "15,11", "17,14", "22,46",…
## $ Deathrate                          <fct> "20,34", "5,22", "4,61", "3,27", "…
## $ Agriculture                        <dbl> 0.380, 0.232, 0.101, NA, NA, 0.096…
## $ Industry                           <dbl> 0.240, 0.188, 0.600, NA, NA, 0.658…
## $ Service                            <dbl> 0.380, 0.579, 0.298, NA, NA, 0.246…
## $ GDP                                <dbl[,1]> <matrix[26 x 1]>
Q14 <- "Double"
  1. Sort the Coastline (coast/area ratio) column, have the variables decreasing. Assign this to Q15
Q15 <- countries[order(-countries$Coastline..coast.area.ratio.),]
  1. Which Country has the lowest GDP per Capita? Assign this to Q16
countries[which(countries$GDP....per.capita. == min(countries$GDP....per.capita., na.rm=TRUE)),]
##           Country                              Region Population Area..sq..mi..
## 59    East Timor        ASIA (EX. NEAR EAST)             1062777          15007
## 184 Sierra Leone  SUB-SAHARAN AFRICA                     6005250          71740
## 189      Somalia  SUB-SAHARAN AFRICA                     8863338         637657
##     Pop..Density..per.sq..mi.. Coastline..coast.area.ratio. Net.migration
## 59                        70.8                         4.70          0.00
## 184                       83.7                         0.56          0.00
## 189                       13.9                         0.47          5.37
##     Infant.mortality..per.1000.births. GDP....per.capita. Literacy....
## 59                               47,41                500         58,6
## 184                             143,64                500         31,4
## 189                              116,7                500         37,8
##     Phones..per.1000. Arable.... Crops.... Other.... Climate Birthrate
## 59                 NA       4.71      0.67     94,62       2     26,99
## 184               4.0       6.98      0.89     92,13       2     45,76
## 189              11.3       1.67      0.04     98,29       1     45,13
##     Deathrate Agriculture Industry Service        GDP
## 59       6,24       0.085    0.231   0.684  531388500
## 184     23,03       0.490    0.310   0.210 3002625000
## 189     16,63       0.650    0.100   0.250 4431669000
Q16 <- "Somalia"
  1. Use the tidyverse to see how many countries have a GDP less than Texas ($1803000000000). Assign the answer to Q17
sum <- countries %>% group_by (Country) %>% summarise(count=sum(GDP < 1803000000000))

countries$GDP2 <- cbind(sum$count)

sum <- filter(sum, count == "1")




Q17 <- "221"
  1. For the countries that have a smaller economy than the State of Texas, which country has the largest economy? Assign this to Q18
countries2 <- filter(countries, GDP2==1)
countries2[which(countries2$GDP == max(countries2$GDP)), ]
##    Country                              Region Population Area..sq..mi..
## 69 France  WESTERN EUROPE                        60876136         547030
##    Pop..Density..per.sq..mi.. Coastline..coast.area.ratio. Net.migration
## 69                      111.3                         0.63          0.66
##    Infant.mortality..per.1000.births. GDP....per.capita. Literacy....
## 69                               4,26              27600         99,0
##    Phones..per.1000. Arable.... Crops.... Other.... Climate Birthrate Deathrate
## 69             586.4      33.53      2.07      64,4       4     11,99      9,14
##    Agriculture Industry Service          GDP GDP2
## 69       0.022    0.214   0.764 1.680181e+12    1
Q18 <- "France"
  1. Group the countries by their Region, in this order, show me the average population density, average GDP per capita, Net migration and Agriculture. Assign your code to Q19
Q19 <- countries %>% group_by (Region) %>% summarise(Pop_Dense=mean(Pop..Density..per.sq..mi..), GDP_Avg = mean(GDP....per.capita.),Net = mean(Net.migration), Agriculture = mean(Agriculture))
  1. Finishing part 1, run a linear model where you predict Net migration using Coastline, GDP per capita, Literacy, and Crops. Assign the adjusted R-squared value to Q20
lm <- lm( `Net.migration`~ `Coastline..coast.area.ratio.`+ `GDP....per.capita.`+ `Literacy....`+ `Crops....` , data = countries)
summary(lm)
## 
## Call:
## lm(formula = Net.migration ~ Coastline..coast.area.ratio. + GDP....per.capita. + 
##     Literacy.... + Crops...., data = countries)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.2168  -0.0557   0.0000   0.0000  12.2903 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.550e+00  1.515e+00  -1.023 0.309539    
## Coastline..coast.area.ratio.  4.499e-03  6.961e-03   0.646 0.519910    
## GDP....per.capita.            2.067e-04  5.716e-05   3.616 0.000526 ***
## Literacy....100,0            -8.516e-01  2.535e+00  -0.336 0.737822    
## Literacy....17,6              7.163e-01  5.103e+00   0.140 0.888734    
## Literacy....26,6              1.354e+00  5.100e+00   0.265 0.791329    
## Literacy....31,4              1.591e+00  5.100e+00   0.312 0.755934    
## Literacy....35,9             -1.520e+00  5.081e+00  -0.299 0.765686    
## Literacy....36,0              2.450e+01  5.103e+00   4.802 7.33e-06 ***
## Literacy....37,8              6.821e+00  5.105e+00   1.336 0.185350    
## Literacy....40,1              2.848e+00  5.094e+00   0.559 0.577684    
## Literacy....40,2              1.453e+00  5.096e+00   0.285 0.776349    
## Literacy....40,4              1.368e+00  5.094e+00   0.269 0.788895    
## Literacy....40,9              1.718e+00  5.088e+00   0.338 0.736602    
## Literacy....41,7              1.179e+00  5.096e+00   0.231 0.817591    
## Literacy....42,0              1.196e+00  5.094e+00   0.235 0.814942    
## Literacy....42,2              1.352e+00  5.097e+00   0.265 0.791484    
## Literacy....42,4              1.263e+00  5.079e+00   0.249 0.804193    
## Literacy....42,7              1.529e+00  5.100e+00   0.300 0.765112    
## Literacy....43,1              9.513e-01  5.081e+00   0.187 0.851948    
## Literacy....45,2              1.366e+00  5.095e+00   0.268 0.789310    
## Literacy....45,7             -1.511e+00  5.089e+00  -0.297 0.767275    
## Literacy....46,4              1.039e+00  5.102e+00   0.204 0.839179    
## Literacy....47,5              1.195e+00  5.100e+00   0.234 0.815316    
## Literacy....47,8              1.350e+00  5.098e+00   0.265 0.791864    
## Literacy....50,2              1.423e+00  5.102e+00   0.279 0.781098    
## Literacy....50,9              3.470e+00  5.092e+00   0.682 0.497523    
## Literacy....51,0              1.346e+00  5.100e+00   0.264 0.792581    
## Literacy....51,6              3.676e+00  5.096e+00   0.721 0.472840    
## Literacy....51,7              9.899e-02  5.072e+00   0.020 0.984479    
## Literacy....52,9             -2.966e-01  5.078e+00  -0.058 0.953572    
## Literacy....53,0              2.073e+00  5.061e+00   0.410 0.683156    
## Literacy....56,5              5.177e+00  5.165e+00   1.002 0.319263    
## Literacy....57,5              1.717e+00  5.089e+00   0.337 0.736796    
## Literacy....57,7              5.813e-01  5.080e+00   0.114 0.909184    
## Literacy....58,0             -4.041e+00  5.061e+00  -0.799 0.426928    
## Literacy....58,6              1.469e+00  3.758e+00   0.391 0.696967    
## Literacy....59,5              1.331e+00  5.076e+00   0.262 0.793817    
## Literacy....60,9              1.604e+00  5.087e+00   0.315 0.753389    
## Literacy....61,1              1.167e+00  5.094e+00   0.229 0.819428    
## Literacy....62,7              1.671e+00  5.096e+00   0.328 0.743795    
## Literacy....63,2              5.206e-01  5.072e+00   0.103 0.918507    
## Literacy....64,6              1.328e+00  5.085e+00   0.261 0.794729    
## Literacy....65,5              1.491e+00  5.101e+00   0.292 0.770825    
## Literacy....66,4              1.256e+00  5.095e+00   0.247 0.805866    
## Literacy....67,0              1.430e+00  5.149e+00   0.278 0.782001    
## Literacy....67,5              1.711e-01  5.083e+00   0.034 0.973227    
## Literacy....67,9              1.275e+00  5.099e+00   0.250 0.803191    
## Literacy....68,0              2.111e+00  5.087e+00   0.415 0.679274    
## Literacy....68,9              1.551e+00  5.097e+00   0.304 0.761756    
## Literacy....69,4              1.257e+00  5.092e+00   0.247 0.805694    
## Literacy....69,9              3.015e+00  5.079e+00   0.594 0.554458    
## Literacy....70,0             -3.900e-02  5.072e+00  -0.008 0.993884    
## Literacy....70,4              3.285e+00  5.085e+00   0.646 0.520145    
## Literacy....70,6             -1.403e-01  5.064e+00  -0.028 0.977968    
## Literacy....74,2              1.815e+00  5.079e+00   0.357 0.721825    
## Literacy....74,8              2.047e+00  5.073e+00   0.404 0.687627    
## Literacy....75,8             -8.587e-01  5.059e+00  -0.170 0.865652    
## Literacy....76,2             -4.501e-01  5.076e+00  -0.089 0.929559    
## Literacy....76,6             -1.083e+01  5.093e+00  -2.127 0.036523 *  
## Literacy....76,9              1.597e+00  5.069e+00   0.315 0.753495    
## Literacy....77,9             -1.851e+00  5.095e+00  -0.363 0.717352    
## Literacy....78,2             -4.568e-01  5.098e+00  -0.090 0.928838    
## Literacy....78,8             -3.584e+00  5.059e+00  -0.708 0.480722    
## Literacy....79,0              1.603e+00  5.083e+00   0.315 0.753367    
## Literacy....79,3              6.554e+00  5.707e+00   1.149 0.254210    
## Literacy....79,4             -5.083e-01  5.063e+00  -0.100 0.920279    
## Literacy....79,8             -3.083e-01  5.063e+00  -0.061 0.951594    
## Literacy....80,2             -1.200e+00  5.072e+00  -0.237 0.813586    
## Literacy....80,6              1.390e+00  5.103e+00   0.272 0.786102    
## Literacy....81,6              6.526e-01  5.074e+00   0.129 0.897988    
## Literacy....82,5              1.342e+01  5.084e+00   2.640 0.009994 ** 
## Literacy....82,6              2.582e-01  5.070e+00   0.051 0.959518    
## Literacy....83,0              6.111e+00  5.065e+00   1.207 0.231191    
## Literacy....83,5              1.181e+01  5.072e+00   2.328 0.022456 *  
## Literacy....83,8              1.256e+00  5.103e+00   0.246 0.806154    
## Literacy....84,0              6.114e-02  5.068e+00   0.012 0.990405    
## Literacy....84,7             -1.220e+00  5.062e+00  -0.241 0.810183    
## Literacy....84,8              2.114e-01  5.088e+00   0.042 0.966962    
## Literacy....85,1              1.404e+00  5.096e+00   0.276 0.783591    
## Literacy....85,3             -4.635e-01  5.091e+00  -0.091 0.927686    
## Literacy....85,6             -1.257e+00  5.049e+00  -0.249 0.803981    
## Literacy....85,7              1.575e+00  5.074e+00   0.310 0.757014    
## Literacy....86,4             -3.625e-01  3.700e+00  -0.098 0.922213    
## Literacy....86,5             -1.440e+00  3.701e+00  -0.389 0.698196    
## Literacy....87,2             -2.347e-01  5.091e+00  -0.046 0.963342    
## Literacy....87,4              2.852e+00  5.082e+00   0.561 0.576275    
## Literacy....87,9             -2.385e-01  3.706e+00  -0.064 0.948859    
## Literacy....88,7              2.585e+00  5.111e+00   0.506 0.614437    
## Literacy....88,9              5.120e-01  5.066e+00   0.101 0.919750    
## Literacy....89,0             -1.126e+01  4.519e+00  -2.492 0.014810 *  
## Literacy....89,1             -7.397e-02  5.055e+00  -0.015 0.988361    
## Literacy....90,0              2.542e-01  5.050e+00   0.050 0.959976    
## Literacy....90,3              1.559e+00  5.070e+00   0.307 0.759303    
## Literacy....90,7              1.213e+00  5.093e+00   0.238 0.812332    
## Literacy....90,9             -8.352e-02  3.719e+00  -0.022 0.982139    
## Literacy....91,0             -1.549e+00  5.057e+00  -0.306 0.760223    
## Literacy....91,3              7.553e+00  5.072e+00   1.489 0.140460    
## Literacy....92,0              1.764e+00  5.434e+00   0.325 0.746242    
## Literacy....92,2             -4.966e+00  5.058e+00  -0.982 0.329166    
## Literacy....92,3              2.053e+00  5.095e+00   0.403 0.688104    
## Literacy....92,5              4.551e-01  3.104e+00   0.147 0.883795    
## Literacy....92,6              8.466e-01  3.113e+00   0.272 0.786342    
## Literacy....92,8              1.975e-01  5.057e+00   0.039 0.968942    
## Literacy....93,0             -3.893e+00  3.728e+00  -1.044 0.299531    
## Literacy....93,3              2.678e+00  5.075e+00   0.528 0.599163    
## Literacy....93,4              6.682e-01  5.074e+00   0.132 0.895563    
## Literacy....93,5              7.025e-01  5.138e+00   0.137 0.891589    
## Literacy....93,7             -2.389e-01  3.824e+00  -0.062 0.950354    
## Literacy....93,9              1.409e+00  5.069e+00   0.278 0.781811    
## Literacy....94,0             -4.846e+00  3.701e+00  -1.309 0.194202    
## Literacy....94,1             -8.387e-01  3.688e+00  -0.227 0.820673    
## Literacy....94,5              1.742e+00  5.128e+00   0.340 0.735002    
## Literacy....95,0              1.026e+01  5.066e+00   2.026 0.046142 *  
## Literacy....95,4             -1.181e+00  5.075e+00  -0.233 0.816620    
## Literacy....95,6             -4.150e+00  5.059e+00  -0.820 0.414525    
## Literacy....96,0              8.957e-01  3.121e+00   0.287 0.774845    
## Literacy....96,1             -3.141e+00  5.095e+00  -0.616 0.539411    
## Literacy....96,2             -4.306e-01  5.060e+00  -0.085 0.932390    
## Literacy....96,7             -1.387e+00  5.056e+00  -0.274 0.784617    
## Literacy....97,0             -2.387e+00  1.951e+00  -1.224 0.224683    
## Literacy....97,1             -7.638e-02  5.058e+00  -0.015 0.987990    
## Literacy....97,2              2.525e+00  5.164e+00   0.489 0.626309    
## Literacy....97,4             -1.722e+00  5.052e+00  -0.341 0.734116    
## Literacy....97,5              1.167e+00  5.087e+00   0.229 0.819190    
## Literacy....97,6             -1.288e+00  5.070e+00  -0.254 0.800114    
## Literacy....97,7             -6.548e-02  5.053e+00  -0.013 0.989693    
## Literacy....97,8              5.148e+00  3.683e+00   1.398 0.166164    
## Literacy....97,9             -1.102e+00  3.727e+00  -0.296 0.768346    
## Literacy....98,0              5.018e-01  2.071e+00   0.242 0.809231    
## Literacy....98,4             -1.375e+00  3.708e+00  -0.371 0.711764    
## Literacy....98,5              4.602e+00  3.812e+00   1.207 0.230911    
## Literacy....98,6             -4.914e+00  2.765e+00  -1.777 0.079407 .  
## Literacy....98,8             -1.323e+00  5.082e+00  -0.260 0.795287    
## Literacy....99,0             -1.330e+00  1.948e+00  -0.682 0.496988    
## Literacy....99,1              2.696e+00  5.078e+00   0.531 0.597007    
## Literacy....99,3             -3.846e-01  5.092e+00  -0.076 0.939981    
## Literacy....99,4             -7.442e-01  3.703e+00  -0.201 0.841237    
## Literacy....99,6              7.698e-01  3.122e+00   0.247 0.805865    
## Literacy....99,7             -2.686e+00  3.111e+00  -0.863 0.390529    
## Literacy....99,8             -2.627e+00  3.114e+00  -0.844 0.401452    
## Literacy....99,9             -1.350e+00  3.749e+00  -0.360 0.719779    
## Crops....                    -1.648e-01  6.461e-02  -2.550 0.012706 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.88 on 79 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.6431, Adjusted R-squared:  0.00159 
## F-statistic: 1.002 on 142 and 79 DF,  p-value: 0.5026
Q20 <- 0.00159 

Part Two

For part two you will be working with the ‘pokemon.csv’ data set, because not everyone has played the pokemon games. The following is information onf the dataset.

Dataset: name: The English name of the Pokemon japanese_name: The Original Japanese name of the Pokemon pokedex_number: The entry number of the Pokemon in the National Pokedex percentage_male: The percentage of the species that are male. Blank if the Pokemon is genderless. type1: The Primary Type of the Pokemon type2: The Secondary Type of the Pokemon classification: The Classification of the Pokemon as described by the Sun and Moon Pokedex height_m: Height of the Pokemon in metres weight_kg: The Weight of the Pokemon in kilograms capture_rate: Capture Rate of the Pokemon base_egg_steps: The number of steps required to hatch an egg of the Pokemon abilities: A stringified list of abilities that the Pokemon is capable of having experience_growth: The Experience Growth of the Pokemon base_happiness: Base Happiness of the Pokemon against_?: Eighteen features that denote the amount of damage taken against an attack of a particular type hp: The Base HP of the Pokemon attack: The Base Attack of the Pokemon defense: The Base Defense of the Pokemon sp_attack: The Base Special Attack of the Pokemon sp_defense: The Base Special Defense of the Pokemon speed: The Base Speed of the Pokemon generation: The numbered generation which the Pokemon was first introduced is_legendary: Denotes if the Pokemon is legendary.

  1. Start by loading in the data set as pokemon, the csv has extra columns and rows so make sure to use the correct arguement. The dimensions should be 801 rows by 40 columns when ready. Assign the dim(pokemon) to Q21.
pokemon <- (read.csv("pokemon.csv", header = TRUE, skip=3, )[, -c(1:2)])

Q21 <- dim(pokemon)
  1. Seperate the type column into a type1 and type2 column.
pokemon <- pokemon %>% separate(type, c("Type 1", "Type 2"))

For this next section, we will be using the following 2 color palettes. These are provided for you. A few colors will be defined for you so that you can use them in the appropriate place when designing the ggplot theme. “#FF0000”: Pokemon Red. When the question asks for red, use this Hex Code “#3B4CCA”: Pokemon Blue. When the question asks for blue, use this Hex Code “#FFDE00”: Pichachu Yellow. When the question asks for yellow, use this Hex Code

poke_colors <- c("#FF0000", "#3B4CCA", "#FFDE00", "#CC0000", "#B3A125" , "#31AE9D", "#F1B46D" , "#F8EED3" , "#DBD4E0", "#B38BB3")

type_color <- c("#7AC89E", "#332011", "#7C7AB0", "#FFDE00", "#FAD0D5", "#811E09" , "#F62D14",
                "#BD92AF", "#DEE4E6" , "#00ff00" , "#186218" , "#A2D7D5", "#C5915D",
                "#800080", "#EB789A", "#C7AB75" , "#727D90" , "#194787" )
  1. We will create a pokemon theme for the plots in this section. Parts of the theme is already completed. However, you will need to correctly fill in the remainder based on the instructions below. You will need to be exact for the automatic grader to score you correctly so please follow the instuctions exactly. -The plot.background will be filled as red (Use the correct red hex code!) -The panel.border will be blue, size = 3, and fill = NA -The panel.background will be filled white -the panel.grid.major will be “black” -the panel.grid.minor will be size = .1 -the axis.title will be italicized and white -the axis.text will be angled at 90º and yellow -the legend.background will be filled red, with a blue border, with size = 2 -the legend.position = “bottom”

(10 points)

pokemon.theme <- theme(plot.background = element_rect(fill = "#FF0000",
                                colour = "#FF0000"),
           panel.border = element_rect(fill = NA,
                                colour =   "#3B4CCA", size=3) ,
           panel.background = element_rect(fill = "white",
                                colour = "white"),
           panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
           panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
           axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
           axis.text = element_text(angle = 90 , vjust=.5, color =   "#FFDE00"),
           plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
           strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
           strip.text = element_text(face = "italic", size = 10),
           legend.key = element_rect(color = "#3B4CCA", size = 2),
           legend.background =element_rect(fill = "#FF0000",
                                colour =   "#3B4CCA", size=2) ,
           legend.text = element_text(color = "white" , face = "italic"),
           legend.title = element_text(color = "#FFDE00", face = "bold"),
           legend.position = "bottom"
)
  1. Show the last 10 rows of the pokemon data now. Assign this to Q24
Q24 <- tail(pokemon, 10)
  1. Reorder the dataset so that the columns are in the following order: 33, 31, 30, 25, 29, 20, 26, 34:36, 32, 24, 27, 21:23, 28, 39, 37, 38, 40, 41, 1:19 (you’re welcome) Assign this over pokemon, Also assign the new pokemon to Q25
pokemon <- pokemon[c(33, 31, 30, 25, 29, 20, 26, 34:36, 32, 24, 27, 21:23, 28, 39, 37, 38, 40, 41, 1:19)]

Q25 <- pokemon
  1. Factor the classification, type1, type2, generation and is_legendary columns
pokemon$classfication <- as.factor(pokemon$classfication)
pokemon$`Type 1` <- as.factor(pokemon$`Type 1`)
pokemon$`Type 2` <- as.factor(pokemon$`Type 2`)
pokemon$generation <- as.factor(pokemon$generation)
pokemon$is_legendary <- as.factor(pokemon$is_legendary)
  1. Within the tidyverse, group the pokemon together by generation and summarize, in this order, the data with an average HP, Attack, Special Attack, Special Defence, and Speed. Assign this to poke_generation
 poke_generation <- pokemon %>% group_by (generation) %>% summarise(HP = mean(hp),Attack = mean(attack), Special_Attack = mean(sp_attack), Special_Defense=mean(sp_defense), Speed=mean(speed) )
  1. Using the theme we created, make a bar chart of the Average attack of the Pokemon, by their generation. Have the fill be the generation and name this “Average Attack of Pokemon, by Generation”. Finally, set the fill colors of the graph to be poke_colors. Assign this graph to Q28
 Q28 <- ggplot(data=poke_generation, aes(x=Attack, y=generation, fill=generation)) + geom_bar(stat="identity") + scale_fill_manual("generation", values = poke_colors) +
  ggtitle("Average attack of Pokemon") +theme(plot.background = element_rect(fill = "#FF0000",
                                colour = "#FF0000"),
           panel.border = element_rect(fill = NA,
                                colour =   "#3B4CCA", size=3) ,
           panel.background = element_rect(fill = "white",
                                colour = "white"),
           panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
           panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
           axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
           axis.text = element_text(angle = 90 , vjust=.5, color =   "#FFDE00"),
           plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
           strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
           strip.text = element_text(face = "italic", size = 10),
           legend.key = element_rect(color = "#3B4CCA", size = 2),
           legend.background =element_rect(fill = "#FF0000",
                                colour =   "#3B4CCA", size=2) ,
           legend.text = element_text(color = "white" , face = "italic"),
           legend.title = element_text(color = "#FFDE00", face = "bold"),
           legend.position = "bottom"
)
  1. Create the following scatterplot and assign to Q29: -height on the y axis -weight on the x axis -straight trendlines for each generation, with no standard error -x label is “Weight of Pokemon (Kg)” -y label is “Height of Pokemon (M)” -colors of the points are type_color -graph titled “Heigh vs Weight of Pokemon, by generation” -Uses the pokemon.theme

Part B: Which generation has the highest slope? Assign this answer to Q29B

ggplot(pokemon, aes(x=weight_kg, y=height_m, colour=generation)) + ggtitle("Heigh vs Weight of Pokemon, by generation") + geom_point() + stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + scale_color_manual("generation", values= type_color)+xlab("Weight of Pokemon (Kg)") + ylab("Height of Pokemon (M)") +theme(plot.background = element_rect(fill = "#FF0000",
                                colour = "#FF0000"),
           panel.border = element_rect(fill = NA,
                                colour =   "#3B4CCA", size=3) ,
           panel.background = element_rect(fill = "white",
                                colour = "white"),
           panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
           panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
           axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
           axis.text = element_text(angle = 90 , vjust=.5, color =   "#FFDE00"),
           plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
           strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
           strip.text = element_text(face = "italic", size = 10),
           legend.key = element_rect(color = "#3B4CCA", size = 2),
           legend.background =element_rect(fill = "#FF0000",
                                colour =   "#3B4CCA", size=2) ,
           legend.text = element_text(color = "white" , face = "italic"),
           legend.title = element_text(color = "#FFDE00", face = "bold"),
           legend.position = "bottom"
)
## Warning: Removed 20 rows containing non-finite values (stat_smooth).
## Warning: Removed 20 rows containing missing values (geom_point).

Q29B <- "2"
  1. Make the following barchart that shows how many legendary pokemon are in each generation (is_legendary = 1):

Part B Which generation has the least non-legendary pokemon? Assign this to Q30B

ggplot(data=pokemon,aes(x=is_legendary, y=generation, fill=generation)) + geom_bar(stat="identity") + scale_fill_manual("generation", values = poke_colors) +
  ggtitle("Count of Pokemon, by Generation") + theme(plot.background = element_rect(fill = "#FF0000",
                   colour = "#FF0000"),
           panel.border = element_rect(fill = NA,
                                colour =   "#3B4CCA", size=3) ,
           panel.background = element_rect(fill = "white",
                                colour = "white"),
           panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
           panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
           axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
           axis.text = element_text(angle = 90 , vjust=.5, color =   "#FFDE00"),
           plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
           strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
           strip.text = element_text(face = "italic", size = 10),
           legend.key = element_rect(color = "#3B4CCA", size = 2),
           legend.background =element_rect(fill = "#FF0000",
                                colour =   "#3B4CCA", size=2) ,
           legend.text = element_text(color = "white" , face = "italic"),
           legend.title = element_text(color = "#FFDE00", face = "bold"),
           legend.position = "bottom"

)

Q30B <- "1"
  1. Create the following graph:

Part B What is the type of the legendary pokemon with the highest average attack? (Notice the type colors) Assign to Q31B

poke_attack <- pokemon %>% group_by (`Type 1`, is_legendary) %>% summarise(Attack = mean(attack))


ggplot(poke_attack, aes(x=`Type 1`, y=Attack, colour=`Type 1`, group=1)) + ggtitle("Average Attack of Legendary and Regular Pokemon") + geom_point() +geom_smooth(method="lm") + scale_color_manual("Type 1", values=type_color)+xlab("Type !") + ylab("mean attack") + facet_wrap(poke_attack$is_legendary) + theme(plot.background = element_rect(fill = "#FF0000",
                   colour = "#FF0000"),
           panel.border = element_rect(fill = NA,
                                colour =   "#3B4CCA", size=3) ,
           panel.background = element_rect(fill = "white",
                                colour = "white"),
           panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
           panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
           axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
           axis.text = element_text(angle = 90 , vjust=.5, color =   "#FFDE00"),
           plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
           strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
           strip.text = element_text(face = "italic", size = 10),
           legend.key = element_rect(color = "#3B4CCA", size = 2),
           legend.background =element_rect(fill = "#FF0000",
                                colour =   "#3B4CCA", size=2) ,
           legend.text = element_text(color = "white" , face = "italic"),
           legend.title = element_text(color = "#FFDE00", face = "bold"),
           legend.position = "bottom"

)

Q31B <- "ground"
  1. Make a logistic linear model that predicts whether or not a pokemon is legendary by its heath, attack, defense, special attack, special defense and speed. Assign the model to poke_model.
poke_model <- glm(`is_legendary`~ `hp`+`attack`+ `defense`+ `sp_attack`+ `speed`, data = pokemon, family=binomial(link="logit"))
  1. Predict whether or not a pokemon is legendary based on the model, assign your prediction as a new column named prediction
prediction <- plogis(predict(poke_model, pokemon))


pokemon <- cbind(pokemon, prediction)

***34. Make the prediction binary with the values below 0 becoming 0 and the ones above 0 becoming 1

pokemon$prediction <- as.numeric(pokemon$prediction)

pokemon$prediction <- ifelse(pokemon$prediction > 0.0, 1,0)

***35. Create a confusion matrix of the predictions vs is_legendary. Assign this to Q35

levels(pokemon$prediction) <- levels (pokemon$is_legendary)


table(pokemon$is_legendary, pokemon$prediction)
##    
##       1
##   0 731
##   1  70

Part 3: Choice modeling

This last section comes from the last assigned DataCamp course “Choice Modeling for Marketing in R” https://www.datacamp.com/courses/marketing-analytics-in-r-choice-modeling The First three chapters are the only things in these last question.

  1. Import the sportscar_choice_long.csv as sportscar.df
sportscar.df <- read.csv("sportscar_choice_long.csv", header= TRUE)
  1. Inspect the choice data. Count the amount of times each number of seats in a car was chosen. How many people prefered cars with 5 seats? Assign your answer to Q37
xtabs(choice~seat, data=sportscar.df)
## seat
##   2   4   5 
## 608 616 776
Q37 <- 776

***38. Before you can fit a choice model. Turn the sportscar.df into a mlogit.data object so that it can have a choice model fit on it. Assign the mlogit.data object to sportscar

sportscar <- mlogit.data(sportscar.df, choice="choice", alt.var="alt", shape="long")
  1. Now fit a choice model that predicts the choice from the seat number, the transmission type, if it is convertable, and its price. Assign this to model as model3
model3 <- mlogit(choice ~ seat + trans + convert + price, data=sportscar)
  1. Make a share prediction using the model that you have created for the following cars: -Car 1: 4 seats, automatic transmission, convertible, 40 -Car 2: 2 seats, manual transmission, not convertible, 35

(Extra Credit + 5)

sportscar2 <- data.frame("seat" = c(4,2,0), "trans" = c("auto","manual","auto"), "convert" = c("yes","no", "no"), "price"=c(40,35,100))

predict (model3, newdata = sportscar2)
##            1            2            3 
## 6.712677e-01 3.287281e-01 4.139509e-06
##I added a 3rd choice because the model is expecting three choices according to alt. I could not get it to work with just 2 options, so I may have just don it wrong. 

Merry Christmas! Submit URL to Canvas and we will start grading it