Welcome to the Final, this semester has been a great opportunity for everyone to learn the basics of R. This exam will build up off of everything that we have done in the course. It is open book, but you may not ask the TAs for help nor your classmates.
Use the variable names that I gave you, I am testing automating the grading and the variable names will be important. Some of the questions are going to be muliple choice, see the example below for how to answer them:
Example Problem: Let’s start by introducing the topics of R. Assign 4 to x, “a” to y and FALSE to z. You cannot combine these variables, why?
x <- 4
y <- "a"
z <- FALSE
# Submit answer
q_example <- "a"
# For some problems, you will need to assign the value to the question. Do it like so
q_example_2 <- 2
farm_animals <- list("cat", "dog", "sheep", "cow","chicken")
report_card <- c(92,88,91,97,85)
assignments <- c("HW1", "Exam 1", "Quiz", "Exam 2", "HW2")
weight <- c(.10,.25, .3, .25, .10)
assign <- c(report_card, assignments)
weight_grade <- c(report_card * weight)
sum(weight_grade)
## [1] 91.25
student_grade <- 91.25
Create a matrix called ‘nums’ which includes numbers 1:50 and has 5 rows with the numbers ascending on each row. Your output should look like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 1 2 3 4 5 6 7 8 9 10 [2,] 11 12 13 14 15 16 17 18 19 20 [3,] 21 22 23 24 25 26 27 28 29 30 [4,] 31 32 33 34 35 36 37 38 39 40 [5,] 41 42 43 44 45 46 47 48 49 50
nums <- matrix(
c(1:50),
nrow=5,
ncol=10,
byrow = TRUE)
4.With the looping variable created, write a for loop or lapply function that will change all the negative numbers to 0 and the positive numbers to 1
looping <- rnorm(100)
for (i in 1:NROW(looping)){
if (looping[i] > 0)
{looping[i] <- 1}
else {looping[i] <- 0}
}
For part one you will be working with the ‘countries.csv’ dataset.
countries <- read.csv("countries.csv", header = TRUE)
head(countries,3)
## Country Region Population Area..sq..mi..
## 1 Afghanistan ASIA (EX. NEAR EAST) 31056997 647500
## 2 Albania EASTERN EUROPE 3581655 28748
## 3 Algeria NORTHERN AFRICA 32930091 2381740
## Pop..Density..per.sq..mi.. Coastline..coast.area.ratio. Net.migration
## 1 48,0 0,00 23,06
## 2 124,6 1,26 -4,93
## 3 13,8 0,04 -0,39
## Infant.mortality..per.1000.births. GDP....per.capita. Literacy....
## 1 163,07 700 36,0
## 2 21,52 4500 86,5
## 3 31 6000 70,0
## Phones..per.1000. Arable.... Crops.... Other.... Climate Birthrate Deathrate
## 1 3,2 12,13 0,22 87,65 1 46,6 20,34
## 2 71,2 21,09 4,42 74,49 3 15,11 5,22
## 3 78,1 3,22 0,25 96,53 1 17,14 4,61
## Agriculture Industry Service
## 1 0,38 0,24 0,38
## 2 0,232 0,188 0,579
## 3 0,101 0,6 0,298
Area (sq. mi.) column to get the landmass of the land of six of the seven continents, assign this value to sum_areasum_area <- sum(countries$Area..sq..mi..)
convert_number <- function(x){
x1 <- gsub(",", ".", x)
x2 <- as.double(x1)
return(x2)
}
Now use your function on the following columns: Pop. Density (per sq. mi.), Coastline (coast/area ratio), Net migration, Phones (per 1000), Arable (%), Crops (%), Agriculture, Industry, Service
Don’t forget to check that it worked!
countries$Population <- convert_number(countries$Population)
countries$Pop..Density..per.sq..mi..<- convert_number(countries$Pop..Density..per.sq..mi..)
countries$Coastline..coast.area.ratio. <- convert_number(countries$Coastline..coast.area.ratio.)
countries$Net.migration <- convert_number(countries$Net.migration)
countries$Phones..per.1000. <- convert_number(countries$Phones..per.1000.)
countries$Arable....<- convert_number(countries$Arable....)
countries$Crops.... <- convert_number(countries$Crops....)
countries$Agriculture <- convert_number(countries$Agriculture)
countries$Industry<- convert_number(countries$Industry)
countries$Service <- convert_number(countries$Service)
summary(countries)
## Country Region
## Afghanistan : 1 SUB-SAHARAN AFRICA :51
## Albania : 1 LATIN AMER. & CARIB :45
## Algeria : 1 ASIA (EX. NEAR EAST) :28
## American Samoa : 1 WESTERN EUROPE :28
## Andorra : 1 OCEANIA :21
## Angola : 1 NEAR EAST :16
## (Other) :221 (Other) :38
## Population Area..sq..mi.. Pop..Density..per.sq..mi..
## Min. :7.026e+03 Min. : 2 Min. : 0.00
## 1st Qu.:4.376e+05 1st Qu.: 4648 1st Qu.: 29.15
## Median :4.787e+06 Median : 86600 Median : 78.80
## Mean :2.874e+07 Mean : 598227 Mean : 379.05
## 3rd Qu.:1.750e+07 3rd Qu.: 441811 3rd Qu.: 190.15
## Max. :1.314e+09 Max. :17075200 Max. :16271.50
##
## Coastline..coast.area.ratio. Net.migration
## Min. : 0.00 Min. :-20.99000
## 1st Qu.: 0.10 1st Qu.: -0.92750
## Median : 0.73 Median : 0.00000
## Mean : 21.17 Mean : 0.03812
## 3rd Qu.: 10.35 3rd Qu.: 0.99750
## Max. :870.66 Max. : 23.06000
## NA's :3
## Infant.mortality..per.1000.births. GDP....per.capita. Literacy....
## : 3 Min. : 500 : 18
## 9,95 : 3 1st Qu.: 1900 99,0 : 13
## 12,62 : 2 Median : 5550 97,0 : 11
## 4,39 : 2 Mean : 9690 98,0 : 10
## 10,03 : 1 3rd Qu.:15700 100,0 : 7
## 10,09 : 1 Max. :55100 98,6 : 4
## (Other):215 NA's :1 (Other):164
## Phones..per.1000. Arable.... Crops.... Other.... Climate
## Min. : 0.2 Min. : 0.00 Min. : 0.000 100 : 8 : 22
## 1st Qu.: 37.8 1st Qu.: 3.22 1st Qu.: 0.190 75 : 3 1 : 29
## Median : 176.2 Median :10.42 Median : 1.030 : 2 1,5: 8
## Mean : 236.1 Mean :13.80 Mean : 4.564 70 : 2 2 :111
## 3rd Qu.: 389.6 3rd Qu.:20.00 3rd Qu.: 4.440 70,44 : 2 2,5: 3
## Max. :1035.6 Max. :62.11 Max. :50.680 73,33 : 2 3 : 48
## NA's :4 NA's :2 NA's :2 (Other):208 4 : 6
## Birthrate Deathrate Agriculture Industry
## : 3 : 4 Min. :0.00000 Min. :0.0200
## 12,56 : 2 10,31 : 2 1st Qu.:0.03775 1st Qu.:0.1930
## 18,02 : 2 12,25 : 2 Median :0.09900 Median :0.2720
## 18,79 : 2 14,02 : 2 Mean :0.15084 Mean :0.2827
## 20,48 : 2 3,92 : 2 3rd Qu.:0.22100 3rd Qu.:0.3410
## 10 : 1 5,28 : 2 Max. :0.76900 Max. :0.9060
## (Other):215 (Other):213 NA's :15 NA's :16
## Service
## Min. :0.0620
## 1st Qu.:0.4293
## Median :0.5710
## Mean :0.5653
## 3rd Qu.:0.6785
## Max. :0.9540
## NA's :15
sum(countries$Climate== "")
## [1] 22
Q8 <-22
GDP <- countries$GDP....per.capita. * countries$Population
print (GDP)
## [1] 2.173990e+10 1.611745e+10 1.975805e+11 4.623520e+08 1.352819e+09
## [6] 2.304143e+10 1.159022e+08 7.601880e+08 4.471245e+11 1.041730e+10
## [11] 2.012948e+09 5.876584e+11 2.457864e+11 2.706950e+10 5.072959e+09
## [16] 1.180609e+10 2.799942e+11 4.394618e+09 6.278737e+10 3.020308e+11
## [21] 1.409877e+09 8.649238e+09 2.367828e+09 2.963640e+09 2.157371e+10
## [26] 2.744375e+10 1.475850e+10 1.429395e+12 3.695680e+08 7.057658e+09
## [31] 5.612879e+10 1.529327e+10 8.528874e+10 4.854041e+09 2.637471e+10
## [36] 3.121326e+10 9.863482e+11 5.893706e+08 1.590260e+09 4.733692e+09
## [41] 1.193304e+10 1.597288e+11 6.569869e+12 2.746361e+11 4.836636e+08
## [46] 4.386239e+10 2.591620e+09 1.069400e+08 3.708488e+10 2.471678e+10
## [51] 4.764434e+10 3.301018e+10 1.505858e+10 1.606966e+11 1.695156e+11
## [56] 6.324890e+08 3.721140e+08 5.510390e+10 5.313885e+08 4.470678e+10
## [61] 3.155480e+11 3.274741e+10 1.458294e+09 3.350896e+09 1.628930e+10
## [66] 5.234459e+10 1.039412e+09 5.254504e+09 1.433396e+11 1.680181e+12
## [71] 1.655925e+09 4.805115e+09 7.836983e+09 2.790659e+09 8.572542e+08
## [76] 1.165368e+10 2.274855e+12 4.930106e+10 4.887400e+08 2.137612e+11
## [81] 1.127220e+09 4.485150e+08 3.622208e+09 3.591399e+09 5.040353e+10
## [86] 1.308180e+09 2.034947e+10 1.153623e+09 3.068980e+09 1.329361e+10
## [91] 1.904889e+10 1.998844e+11 1.387405e+11 9.251089e+09 3.176521e+12
## [96] 7.854488e+11 4.808190e+11 4.017507e+10 1.202422e+11 1.584261e+09
## [101] 1.257719e+11 1.552165e+12 1.075668e+10 3.594474e+12 2.258883e+09
## [106] 2.539907e+10 9.596944e+10 3.470782e+10 8.434560e+07 3.004692e+10
## [111] 8.694734e+11 4.594947e+10 8.342237e+09 1.082642e+10 2.320230e+10
## [116] 1.859544e+10 6.066993e+09 3.042004e+09 3.776483e+10 8.496750e+08
## [121] 4.087933e+10 2.614016e+10 8.790625e+09 1.373871e+10 1.487638e+10
## [126] 7.808356e+09 2.194727e+11 1.400131e+09 1.054515e+10 7.083788e+09
## [131] 9.667520e+07 6.280286e+09 5.719298e+09 1.414543e+10 5.232084e+08
## [136] 9.670457e+11 2.160080e+08 8.040071e+09 8.786610e+08 5.098003e+09
## [141] 3.209260e+07 1.329650e+11 2.362381e+10 1.471786e+10 6.643500e+07
## [146] 3.960201e+10 4.716558e+11 2.527790e+09 3.288690e+09 8.804462e+10
## [151] 1.281130e+10 1.002008e+10 1.186738e+11 1.030738e+09 1.742890e+11
## [156] 4.063920e+10 3.481875e+11 1.852110e+08 2.010531e+10 1.247520e+10
## [161] 3.058038e+10 1.443433e+11 4.115559e+11 4.277592e+11 1.909057e+11
## [166] 6.597676e+10 1.903522e+10 4.567987e+09 1.561249e+11 1.271753e+12
## [171] 1.124272e+10 1.875500e+07 3.443352e+08 9.096732e+08 4.847940e+07
## [176] 3.417592e+08 9.906848e+08 1.012085e+09 2.320956e+08 3.188328e+11
## [181] 1.917939e+10 2.067210e+10 6.360198e+08 3.002625e+09 1.064640e+11
## [186] 7.234466e+10 3.819659e+10 9.391446e+08 4.431669e+09 4.728077e+11
## [191] 8.887525e+11 7.482229e+10 7.834912e+10 1.756468e+09 5.568037e+09
## [196] 2.416448e+11 2.460326e+11 6.230849e+10 5.390444e+11 7.320815e+09
## [201] 2.246724e+10 4.782738e+11 8.323053e+09 2.523158e+08 1.012550e+10
## [206] 7.020760e+10 4.717735e+11 2.924894e+10 2.030592e+08 1.299100e+07
## [211] 3.947406e+10 2.522384e+11 6.038294e+10 1.678874e+12 1.128119e+13
## [216] 4.392873e+10 4.642213e+10 6.057201e+08 1.235061e+11 2.110074e+11
## [221] 1.868006e+09 5.929250e+07 1.968394e+09 NA 1.716495e+10
## [226] 9.201608e+09 2.324993e+10
countries$GDP <- cbind(GDP)
min(countries$GDP, na.rm=T)
## [1] 12991000
Q9 <- "12991000"
noNA <- na.omit(countries)
Q10 <- dim(noNA)
hist(countries$GDP....per.capita.)
Q11 <- "b"
countries$Region <- as.factor(countries$Region)
levels(countries$Region) <- levels (countries$Region)
boxplot(GDP....per.capita.~Region,data=countries, main="GDP ($ per capita) for each region. ",
xlab="Region", ylab="GDP")
Q12 <- "Western Europe"
ggplot(countries, aes(x = Agriculture, y = Literacy....)) +
geom_point()
## Warning: Removed 15 rows containing missing values (geom_point).
Q13 <- "b"
Net migration? Assign your answer to Q14glimpse(countries)
## Observations: 227
## Variables: 21
## $ Country <fct> "Afghanistan ", "Albania ", "Alger…
## $ Region <fct> ASIA (EX. NEAR EAST) , EAS…
## $ Population <dbl> 31056997, 3581655, 32930091, 57794…
## $ Area..sq..mi.. <int> 647500, 28748, 2381740, 199, 468, …
## $ Pop..Density..per.sq..mi.. <dbl> 48.0, 124.6, 13.8, 290.4, 152.1, 9…
## $ Coastline..coast.area.ratio. <dbl> 0.00, 1.26, 0.04, 58.29, 0.00, 0.1…
## $ Net.migration <dbl> 23.06, -4.93, -0.39, -20.71, 6.60,…
## $ Infant.mortality..per.1000.births. <fct> "163,07", "21,52", "31", "9,27", "…
## $ GDP....per.capita. <int> 700, 4500, 6000, 8000, 19000, 1900…
## $ Literacy.... <fct> "36,0", "86,5", "70,0", "97,0", "1…
## $ Phones..per.1000. <dbl> 3.2, 71.2, 78.1, 259.5, 497.2, 7.8…
## $ Arable.... <dbl> 12.13, 21.09, 3.22, 10.00, 2.22, 2…
## $ Crops.... <dbl> 0.22, 4.42, 0.25, 15.00, 0.00, 0.2…
## $ Other.... <fct> "87,65", "74,49", "96,53", "75", "…
## $ Climate <fct> "1", "3", "1", "2", "3", "", "2", …
## $ Birthrate <fct> "46,6", "15,11", "17,14", "22,46",…
## $ Deathrate <fct> "20,34", "5,22", "4,61", "3,27", "…
## $ Agriculture <dbl> 0.380, 0.232, 0.101, NA, NA, 0.096…
## $ Industry <dbl> 0.240, 0.188, 0.600, NA, NA, 0.658…
## $ Service <dbl> 0.380, 0.579, 0.298, NA, NA, 0.246…
## $ GDP <dbl[,1]> <matrix[26 x 1]>
Q14 <- "Double"
Coastline (coast/area ratio) column, have the variables decreasing. Assign this to Q15Q15 <- countries[order(-countries$Coastline..coast.area.ratio.),]
countries[which(countries$GDP....per.capita. == min(countries$GDP....per.capita., na.rm=TRUE)),]
## Country Region Population Area..sq..mi..
## 59 East Timor ASIA (EX. NEAR EAST) 1062777 15007
## 184 Sierra Leone SUB-SAHARAN AFRICA 6005250 71740
## 189 Somalia SUB-SAHARAN AFRICA 8863338 637657
## Pop..Density..per.sq..mi.. Coastline..coast.area.ratio. Net.migration
## 59 70.8 4.70 0.00
## 184 83.7 0.56 0.00
## 189 13.9 0.47 5.37
## Infant.mortality..per.1000.births. GDP....per.capita. Literacy....
## 59 47,41 500 58,6
## 184 143,64 500 31,4
## 189 116,7 500 37,8
## Phones..per.1000. Arable.... Crops.... Other.... Climate Birthrate
## 59 NA 4.71 0.67 94,62 2 26,99
## 184 4.0 6.98 0.89 92,13 2 45,76
## 189 11.3 1.67 0.04 98,29 1 45,13
## Deathrate Agriculture Industry Service GDP
## 59 6,24 0.085 0.231 0.684 531388500
## 184 23,03 0.490 0.310 0.210 3002625000
## 189 16,63 0.650 0.100 0.250 4431669000
Q16 <- "Somalia"
sum <- countries %>% group_by (Country) %>% summarise(count=sum(GDP < 1803000000000))
countries$GDP2 <- cbind(sum$count)
sum <- filter(sum, count == "1")
Q17 <- "221"
countries2 <- filter(countries, GDP2==1)
countries2[which(countries2$GDP == max(countries2$GDP)), ]
## Country Region Population Area..sq..mi..
## 69 France WESTERN EUROPE 60876136 547030
## Pop..Density..per.sq..mi.. Coastline..coast.area.ratio. Net.migration
## 69 111.3 0.63 0.66
## Infant.mortality..per.1000.births. GDP....per.capita. Literacy....
## 69 4,26 27600 99,0
## Phones..per.1000. Arable.... Crops.... Other.... Climate Birthrate Deathrate
## 69 586.4 33.53 2.07 64,4 4 11,99 9,14
## Agriculture Industry Service GDP GDP2
## 69 0.022 0.214 0.764 1.680181e+12 1
Q18 <- "France"
Q19Q19 <- countries %>% group_by (Region) %>% summarise(Pop_Dense=mean(Pop..Density..per.sq..mi..), GDP_Avg = mean(GDP....per.capita.),Net = mean(Net.migration), Agriculture = mean(Agriculture))
Net migration using Coastline, GDP per capita, Literacy, and Crops. Assign the adjusted R-squared value to Q20lm <- lm( `Net.migration`~ `Coastline..coast.area.ratio.`+ `GDP....per.capita.`+ `Literacy....`+ `Crops....` , data = countries)
summary(lm)
##
## Call:
## lm(formula = Net.migration ~ Coastline..coast.area.ratio. + GDP....per.capita. +
## Literacy.... + Crops...., data = countries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.2168 -0.0557 0.0000 0.0000 12.2903
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.550e+00 1.515e+00 -1.023 0.309539
## Coastline..coast.area.ratio. 4.499e-03 6.961e-03 0.646 0.519910
## GDP....per.capita. 2.067e-04 5.716e-05 3.616 0.000526 ***
## Literacy....100,0 -8.516e-01 2.535e+00 -0.336 0.737822
## Literacy....17,6 7.163e-01 5.103e+00 0.140 0.888734
## Literacy....26,6 1.354e+00 5.100e+00 0.265 0.791329
## Literacy....31,4 1.591e+00 5.100e+00 0.312 0.755934
## Literacy....35,9 -1.520e+00 5.081e+00 -0.299 0.765686
## Literacy....36,0 2.450e+01 5.103e+00 4.802 7.33e-06 ***
## Literacy....37,8 6.821e+00 5.105e+00 1.336 0.185350
## Literacy....40,1 2.848e+00 5.094e+00 0.559 0.577684
## Literacy....40,2 1.453e+00 5.096e+00 0.285 0.776349
## Literacy....40,4 1.368e+00 5.094e+00 0.269 0.788895
## Literacy....40,9 1.718e+00 5.088e+00 0.338 0.736602
## Literacy....41,7 1.179e+00 5.096e+00 0.231 0.817591
## Literacy....42,0 1.196e+00 5.094e+00 0.235 0.814942
## Literacy....42,2 1.352e+00 5.097e+00 0.265 0.791484
## Literacy....42,4 1.263e+00 5.079e+00 0.249 0.804193
## Literacy....42,7 1.529e+00 5.100e+00 0.300 0.765112
## Literacy....43,1 9.513e-01 5.081e+00 0.187 0.851948
## Literacy....45,2 1.366e+00 5.095e+00 0.268 0.789310
## Literacy....45,7 -1.511e+00 5.089e+00 -0.297 0.767275
## Literacy....46,4 1.039e+00 5.102e+00 0.204 0.839179
## Literacy....47,5 1.195e+00 5.100e+00 0.234 0.815316
## Literacy....47,8 1.350e+00 5.098e+00 0.265 0.791864
## Literacy....50,2 1.423e+00 5.102e+00 0.279 0.781098
## Literacy....50,9 3.470e+00 5.092e+00 0.682 0.497523
## Literacy....51,0 1.346e+00 5.100e+00 0.264 0.792581
## Literacy....51,6 3.676e+00 5.096e+00 0.721 0.472840
## Literacy....51,7 9.899e-02 5.072e+00 0.020 0.984479
## Literacy....52,9 -2.966e-01 5.078e+00 -0.058 0.953572
## Literacy....53,0 2.073e+00 5.061e+00 0.410 0.683156
## Literacy....56,5 5.177e+00 5.165e+00 1.002 0.319263
## Literacy....57,5 1.717e+00 5.089e+00 0.337 0.736796
## Literacy....57,7 5.813e-01 5.080e+00 0.114 0.909184
## Literacy....58,0 -4.041e+00 5.061e+00 -0.799 0.426928
## Literacy....58,6 1.469e+00 3.758e+00 0.391 0.696967
## Literacy....59,5 1.331e+00 5.076e+00 0.262 0.793817
## Literacy....60,9 1.604e+00 5.087e+00 0.315 0.753389
## Literacy....61,1 1.167e+00 5.094e+00 0.229 0.819428
## Literacy....62,7 1.671e+00 5.096e+00 0.328 0.743795
## Literacy....63,2 5.206e-01 5.072e+00 0.103 0.918507
## Literacy....64,6 1.328e+00 5.085e+00 0.261 0.794729
## Literacy....65,5 1.491e+00 5.101e+00 0.292 0.770825
## Literacy....66,4 1.256e+00 5.095e+00 0.247 0.805866
## Literacy....67,0 1.430e+00 5.149e+00 0.278 0.782001
## Literacy....67,5 1.711e-01 5.083e+00 0.034 0.973227
## Literacy....67,9 1.275e+00 5.099e+00 0.250 0.803191
## Literacy....68,0 2.111e+00 5.087e+00 0.415 0.679274
## Literacy....68,9 1.551e+00 5.097e+00 0.304 0.761756
## Literacy....69,4 1.257e+00 5.092e+00 0.247 0.805694
## Literacy....69,9 3.015e+00 5.079e+00 0.594 0.554458
## Literacy....70,0 -3.900e-02 5.072e+00 -0.008 0.993884
## Literacy....70,4 3.285e+00 5.085e+00 0.646 0.520145
## Literacy....70,6 -1.403e-01 5.064e+00 -0.028 0.977968
## Literacy....74,2 1.815e+00 5.079e+00 0.357 0.721825
## Literacy....74,8 2.047e+00 5.073e+00 0.404 0.687627
## Literacy....75,8 -8.587e-01 5.059e+00 -0.170 0.865652
## Literacy....76,2 -4.501e-01 5.076e+00 -0.089 0.929559
## Literacy....76,6 -1.083e+01 5.093e+00 -2.127 0.036523 *
## Literacy....76,9 1.597e+00 5.069e+00 0.315 0.753495
## Literacy....77,9 -1.851e+00 5.095e+00 -0.363 0.717352
## Literacy....78,2 -4.568e-01 5.098e+00 -0.090 0.928838
## Literacy....78,8 -3.584e+00 5.059e+00 -0.708 0.480722
## Literacy....79,0 1.603e+00 5.083e+00 0.315 0.753367
## Literacy....79,3 6.554e+00 5.707e+00 1.149 0.254210
## Literacy....79,4 -5.083e-01 5.063e+00 -0.100 0.920279
## Literacy....79,8 -3.083e-01 5.063e+00 -0.061 0.951594
## Literacy....80,2 -1.200e+00 5.072e+00 -0.237 0.813586
## Literacy....80,6 1.390e+00 5.103e+00 0.272 0.786102
## Literacy....81,6 6.526e-01 5.074e+00 0.129 0.897988
## Literacy....82,5 1.342e+01 5.084e+00 2.640 0.009994 **
## Literacy....82,6 2.582e-01 5.070e+00 0.051 0.959518
## Literacy....83,0 6.111e+00 5.065e+00 1.207 0.231191
## Literacy....83,5 1.181e+01 5.072e+00 2.328 0.022456 *
## Literacy....83,8 1.256e+00 5.103e+00 0.246 0.806154
## Literacy....84,0 6.114e-02 5.068e+00 0.012 0.990405
## Literacy....84,7 -1.220e+00 5.062e+00 -0.241 0.810183
## Literacy....84,8 2.114e-01 5.088e+00 0.042 0.966962
## Literacy....85,1 1.404e+00 5.096e+00 0.276 0.783591
## Literacy....85,3 -4.635e-01 5.091e+00 -0.091 0.927686
## Literacy....85,6 -1.257e+00 5.049e+00 -0.249 0.803981
## Literacy....85,7 1.575e+00 5.074e+00 0.310 0.757014
## Literacy....86,4 -3.625e-01 3.700e+00 -0.098 0.922213
## Literacy....86,5 -1.440e+00 3.701e+00 -0.389 0.698196
## Literacy....87,2 -2.347e-01 5.091e+00 -0.046 0.963342
## Literacy....87,4 2.852e+00 5.082e+00 0.561 0.576275
## Literacy....87,9 -2.385e-01 3.706e+00 -0.064 0.948859
## Literacy....88,7 2.585e+00 5.111e+00 0.506 0.614437
## Literacy....88,9 5.120e-01 5.066e+00 0.101 0.919750
## Literacy....89,0 -1.126e+01 4.519e+00 -2.492 0.014810 *
## Literacy....89,1 -7.397e-02 5.055e+00 -0.015 0.988361
## Literacy....90,0 2.542e-01 5.050e+00 0.050 0.959976
## Literacy....90,3 1.559e+00 5.070e+00 0.307 0.759303
## Literacy....90,7 1.213e+00 5.093e+00 0.238 0.812332
## Literacy....90,9 -8.352e-02 3.719e+00 -0.022 0.982139
## Literacy....91,0 -1.549e+00 5.057e+00 -0.306 0.760223
## Literacy....91,3 7.553e+00 5.072e+00 1.489 0.140460
## Literacy....92,0 1.764e+00 5.434e+00 0.325 0.746242
## Literacy....92,2 -4.966e+00 5.058e+00 -0.982 0.329166
## Literacy....92,3 2.053e+00 5.095e+00 0.403 0.688104
## Literacy....92,5 4.551e-01 3.104e+00 0.147 0.883795
## Literacy....92,6 8.466e-01 3.113e+00 0.272 0.786342
## Literacy....92,8 1.975e-01 5.057e+00 0.039 0.968942
## Literacy....93,0 -3.893e+00 3.728e+00 -1.044 0.299531
## Literacy....93,3 2.678e+00 5.075e+00 0.528 0.599163
## Literacy....93,4 6.682e-01 5.074e+00 0.132 0.895563
## Literacy....93,5 7.025e-01 5.138e+00 0.137 0.891589
## Literacy....93,7 -2.389e-01 3.824e+00 -0.062 0.950354
## Literacy....93,9 1.409e+00 5.069e+00 0.278 0.781811
## Literacy....94,0 -4.846e+00 3.701e+00 -1.309 0.194202
## Literacy....94,1 -8.387e-01 3.688e+00 -0.227 0.820673
## Literacy....94,5 1.742e+00 5.128e+00 0.340 0.735002
## Literacy....95,0 1.026e+01 5.066e+00 2.026 0.046142 *
## Literacy....95,4 -1.181e+00 5.075e+00 -0.233 0.816620
## Literacy....95,6 -4.150e+00 5.059e+00 -0.820 0.414525
## Literacy....96,0 8.957e-01 3.121e+00 0.287 0.774845
## Literacy....96,1 -3.141e+00 5.095e+00 -0.616 0.539411
## Literacy....96,2 -4.306e-01 5.060e+00 -0.085 0.932390
## Literacy....96,7 -1.387e+00 5.056e+00 -0.274 0.784617
## Literacy....97,0 -2.387e+00 1.951e+00 -1.224 0.224683
## Literacy....97,1 -7.638e-02 5.058e+00 -0.015 0.987990
## Literacy....97,2 2.525e+00 5.164e+00 0.489 0.626309
## Literacy....97,4 -1.722e+00 5.052e+00 -0.341 0.734116
## Literacy....97,5 1.167e+00 5.087e+00 0.229 0.819190
## Literacy....97,6 -1.288e+00 5.070e+00 -0.254 0.800114
## Literacy....97,7 -6.548e-02 5.053e+00 -0.013 0.989693
## Literacy....97,8 5.148e+00 3.683e+00 1.398 0.166164
## Literacy....97,9 -1.102e+00 3.727e+00 -0.296 0.768346
## Literacy....98,0 5.018e-01 2.071e+00 0.242 0.809231
## Literacy....98,4 -1.375e+00 3.708e+00 -0.371 0.711764
## Literacy....98,5 4.602e+00 3.812e+00 1.207 0.230911
## Literacy....98,6 -4.914e+00 2.765e+00 -1.777 0.079407 .
## Literacy....98,8 -1.323e+00 5.082e+00 -0.260 0.795287
## Literacy....99,0 -1.330e+00 1.948e+00 -0.682 0.496988
## Literacy....99,1 2.696e+00 5.078e+00 0.531 0.597007
## Literacy....99,3 -3.846e-01 5.092e+00 -0.076 0.939981
## Literacy....99,4 -7.442e-01 3.703e+00 -0.201 0.841237
## Literacy....99,6 7.698e-01 3.122e+00 0.247 0.805865
## Literacy....99,7 -2.686e+00 3.111e+00 -0.863 0.390529
## Literacy....99,8 -2.627e+00 3.114e+00 -0.844 0.401452
## Literacy....99,9 -1.350e+00 3.749e+00 -0.360 0.719779
## Crops.... -1.648e-01 6.461e-02 -2.550 0.012706 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.88 on 79 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.6431, Adjusted R-squared: 0.00159
## F-statistic: 1.002 on 142 and 79 DF, p-value: 0.5026
Q20 <- 0.00159
For part two you will be working with the ‘pokemon.csv’ data set, because not everyone has played the pokemon games. The following is information onf the dataset.
Dataset: name: The English name of the Pokemon japanese_name: The Original Japanese name of the Pokemon pokedex_number: The entry number of the Pokemon in the National Pokedex percentage_male: The percentage of the species that are male. Blank if the Pokemon is genderless. type1: The Primary Type of the Pokemon type2: The Secondary Type of the Pokemon classification: The Classification of the Pokemon as described by the Sun and Moon Pokedex height_m: Height of the Pokemon in metres weight_kg: The Weight of the Pokemon in kilograms capture_rate: Capture Rate of the Pokemon base_egg_steps: The number of steps required to hatch an egg of the Pokemon abilities: A stringified list of abilities that the Pokemon is capable of having experience_growth: The Experience Growth of the Pokemon base_happiness: Base Happiness of the Pokemon against_?: Eighteen features that denote the amount of damage taken against an attack of a particular type hp: The Base HP of the Pokemon attack: The Base Attack of the Pokemon defense: The Base Defense of the Pokemon sp_attack: The Base Special Attack of the Pokemon sp_defense: The Base Special Defense of the Pokemon speed: The Base Speed of the Pokemon generation: The numbered generation which the Pokemon was first introduced is_legendary: Denotes if the Pokemon is legendary.
pokemon, the csv has extra columns and rows so make sure to use the correct arguement. The dimensions should be 801 rows by 40 columns when ready. Assign the dim(pokemon) to Q21.pokemon <- (read.csv("pokemon.csv", header = TRUE, skip=3, )[, -c(1:2)])
Q21 <- dim(pokemon)
pokemon <- pokemon %>% separate(type, c("Type 1", "Type 2"))
For this next section, we will be using the following 2 color palettes. These are provided for you. A few colors will be defined for you so that you can use them in the appropriate place when designing the ggplot theme. “#FF0000”: Pokemon Red. When the question asks for red, use this Hex Code “#3B4CCA”: Pokemon Blue. When the question asks for blue, use this Hex Code “#FFDE00”: Pichachu Yellow. When the question asks for yellow, use this Hex Code
poke_colors <- c("#FF0000", "#3B4CCA", "#FFDE00", "#CC0000", "#B3A125" , "#31AE9D", "#F1B46D" , "#F8EED3" , "#DBD4E0", "#B38BB3")
type_color <- c("#7AC89E", "#332011", "#7C7AB0", "#FFDE00", "#FAD0D5", "#811E09" , "#F62D14",
"#BD92AF", "#DEE4E6" , "#00ff00" , "#186218" , "#A2D7D5", "#C5915D",
"#800080", "#EB789A", "#C7AB75" , "#727D90" , "#194787" )
(10 points)
pokemon.theme <- theme(plot.background = element_rect(fill = "#FF0000",
colour = "#FF0000"),
panel.border = element_rect(fill = NA,
colour = "#3B4CCA", size=3) ,
panel.background = element_rect(fill = "white",
colour = "white"),
panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
axis.text = element_text(angle = 90 , vjust=.5, color = "#FFDE00"),
plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
strip.text = element_text(face = "italic", size = 10),
legend.key = element_rect(color = "#3B4CCA", size = 2),
legend.background =element_rect(fill = "#FF0000",
colour = "#3B4CCA", size=2) ,
legend.text = element_text(color = "white" , face = "italic"),
legend.title = element_text(color = "#FFDE00", face = "bold"),
legend.position = "bottom"
)
Q24 <- tail(pokemon, 10)
pokemon to Q25pokemon <- pokemon[c(33, 31, 30, 25, 29, 20, 26, 34:36, 32, 24, 27, 21:23, 28, 39, 37, 38, 40, 41, 1:19)]
Q25 <- pokemon
pokemon$classfication <- as.factor(pokemon$classfication)
pokemon$`Type 1` <- as.factor(pokemon$`Type 1`)
pokemon$`Type 2` <- as.factor(pokemon$`Type 2`)
pokemon$generation <- as.factor(pokemon$generation)
pokemon$is_legendary <- as.factor(pokemon$is_legendary)
poke_generation poke_generation <- pokemon %>% group_by (generation) %>% summarise(HP = mean(hp),Attack = mean(attack), Special_Attack = mean(sp_attack), Special_Defense=mean(sp_defense), Speed=mean(speed) )
poke_colors. Assign this graph to Q28 Q28 <- ggplot(data=poke_generation, aes(x=Attack, y=generation, fill=generation)) + geom_bar(stat="identity") + scale_fill_manual("generation", values = poke_colors) +
ggtitle("Average attack of Pokemon") +theme(plot.background = element_rect(fill = "#FF0000",
colour = "#FF0000"),
panel.border = element_rect(fill = NA,
colour = "#3B4CCA", size=3) ,
panel.background = element_rect(fill = "white",
colour = "white"),
panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
axis.text = element_text(angle = 90 , vjust=.5, color = "#FFDE00"),
plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
strip.text = element_text(face = "italic", size = 10),
legend.key = element_rect(color = "#3B4CCA", size = 2),
legend.background =element_rect(fill = "#FF0000",
colour = "#3B4CCA", size=2) ,
legend.text = element_text(color = "white" , face = "italic"),
legend.title = element_text(color = "#FFDE00", face = "bold"),
legend.position = "bottom"
)
Part B: Which generation has the highest slope? Assign this answer to Q29B
ggplot(pokemon, aes(x=weight_kg, y=height_m, colour=generation)) + ggtitle("Heigh vs Weight of Pokemon, by generation") + geom_point() + stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + scale_color_manual("generation", values= type_color)+xlab("Weight of Pokemon (Kg)") + ylab("Height of Pokemon (M)") +theme(plot.background = element_rect(fill = "#FF0000",
colour = "#FF0000"),
panel.border = element_rect(fill = NA,
colour = "#3B4CCA", size=3) ,
panel.background = element_rect(fill = "white",
colour = "white"),
panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
axis.text = element_text(angle = 90 , vjust=.5, color = "#FFDE00"),
plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
strip.text = element_text(face = "italic", size = 10),
legend.key = element_rect(color = "#3B4CCA", size = 2),
legend.background =element_rect(fill = "#FF0000",
colour = "#3B4CCA", size=2) ,
legend.text = element_text(color = "white" , face = "italic"),
legend.title = element_text(color = "#FFDE00", face = "bold"),
legend.position = "bottom"
)
## Warning: Removed 20 rows containing non-finite values (stat_smooth).
## Warning: Removed 20 rows containing missing values (geom_point).
Q29B <- "2"
Part B Which generation has the least non-legendary pokemon? Assign this to Q30B
ggplot(data=pokemon,aes(x=is_legendary, y=generation, fill=generation)) + geom_bar(stat="identity") + scale_fill_manual("generation", values = poke_colors) +
ggtitle("Count of Pokemon, by Generation") + theme(plot.background = element_rect(fill = "#FF0000",
colour = "#FF0000"),
panel.border = element_rect(fill = NA,
colour = "#3B4CCA", size=3) ,
panel.background = element_rect(fill = "white",
colour = "white"),
panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
axis.text = element_text(angle = 90 , vjust=.5, color = "#FFDE00"),
plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
strip.text = element_text(face = "italic", size = 10),
legend.key = element_rect(color = "#3B4CCA", size = 2),
legend.background =element_rect(fill = "#FF0000",
colour = "#3B4CCA", size=2) ,
legend.text = element_text(color = "white" , face = "italic"),
legend.title = element_text(color = "#FFDE00", face = "bold"),
legend.position = "bottom"
)
Q30B <- "1"
Part B What is the type of the legendary pokemon with the highest average attack? (Notice the type colors) Assign to Q31B
poke_attack <- pokemon %>% group_by (`Type 1`, is_legendary) %>% summarise(Attack = mean(attack))
ggplot(poke_attack, aes(x=`Type 1`, y=Attack, colour=`Type 1`, group=1)) + ggtitle("Average Attack of Legendary and Regular Pokemon") + geom_point() +geom_smooth(method="lm") + scale_color_manual("Type 1", values=type_color)+xlab("Type !") + ylab("mean attack") + facet_wrap(poke_attack$is_legendary) + theme(plot.background = element_rect(fill = "#FF0000",
colour = "#FF0000"),
panel.border = element_rect(fill = NA,
colour = "#3B4CCA", size=3) ,
panel.background = element_rect(fill = "white",
colour = "white"),
panel.grid.major = element_line(color = "black" , size = .1, linetype = "twodash"),
panel.grid.minor = element_line(color = "black", size = .1 , linetype = "dotted"),
axis.title = element_text(hjust=.5, vjust = .5, size = 10, face= "italic", color = "white" ),
axis.text = element_text(angle = 90 , vjust=.5, color = "#FFDE00"),
plot.title = element_text(color = "#3B4CCA", hjust=1, face = "bold"),
strip.background = element_rect(color = "#3B4CCA", size = 2, fill = "#FF0000"),
strip.text = element_text(face = "italic", size = 10),
legend.key = element_rect(color = "#3B4CCA", size = 2),
legend.background =element_rect(fill = "#FF0000",
colour = "#3B4CCA", size=2) ,
legend.text = element_text(color = "white" , face = "italic"),
legend.title = element_text(color = "#FFDE00", face = "bold"),
legend.position = "bottom"
)
Q31B <- "ground"
poke_model.poke_model <- glm(`is_legendary`~ `hp`+`attack`+ `defense`+ `sp_attack`+ `speed`, data = pokemon, family=binomial(link="logit"))
predictionprediction <- plogis(predict(poke_model, pokemon))
pokemon <- cbind(pokemon, prediction)
***34. Make the prediction binary with the values below 0 becoming 0 and the ones above 0 becoming 1
pokemon$prediction <- as.numeric(pokemon$prediction)
pokemon$prediction <- ifelse(pokemon$prediction > 0.0, 1,0)
***35. Create a confusion matrix of the predictions vs is_legendary. Assign this to Q35
levels(pokemon$prediction) <- levels (pokemon$is_legendary)
table(pokemon$is_legendary, pokemon$prediction)
##
## 1
## 0 731
## 1 70
This last section comes from the last assigned DataCamp course “Choice Modeling for Marketing in R” https://www.datacamp.com/courses/marketing-analytics-in-r-choice-modeling The First three chapters are the only things in these last question.
sportscar.df <- read.csv("sportscar_choice_long.csv", header= TRUE)
xtabs(choice~seat, data=sportscar.df)
## seat
## 2 4 5
## 608 616 776
Q37 <- 776
***38. Before you can fit a choice model. Turn the sportscar.df into a mlogit.data object so that it can have a choice model fit on it. Assign the mlogit.data object to sportscar
sportscar <- mlogit.data(sportscar.df, choice="choice", alt.var="alt", shape="long")
model3model3 <- mlogit(choice ~ seat + trans + convert + price, data=sportscar)
(Extra Credit + 5)
sportscar2 <- data.frame("seat" = c(4,2,0), "trans" = c("auto","manual","auto"), "convert" = c("yes","no", "no"), "price"=c(40,35,100))
predict (model3, newdata = sportscar2)
## 1 2 3
## 6.712677e-01 3.287281e-01 4.139509e-06
##I added a 3rd choice because the model is expecting three choices according to alt. I could not get it to work with just 2 options, so I may have just don it wrong.
Merry Christmas! Submit URL to Canvas and we will start grading it