getwd()
[1] "/cloud/project"
# make sure the packages for this chapter
# are installed, install if necessary
# note: maptools is no longer available on CRAN, hence the warning below
pkg <- c("ggplot2", "scales", "maptools",
         "sp", "maps", "grid", "car")
new.pkg <- pkg[!(pkg %in% rownames(installed.packages()))]
if (length(new.pkg)) {
  install.packages(new.pkg)
}
Installing packages into ‘/cloud/lib/x86_64-pc-linux-gnu-library/4.4’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘maptools’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
also installing the dependencies ‘cpp11’, ‘rbibutils’, ‘utf8’, ‘backports’, ‘generics’, ‘purrr’, ‘tidyr’, ‘tidyselect’, ‘cowplot’, ‘Deriv’, ‘modelr’, ‘microbenchmark’, ‘Rdpack’, ‘fansi’, ‘pillar’, ‘pkgconfig’, ‘colorspace’, ‘broom’, ‘dplyr’, ‘numDeriv’, ‘doBy’, ‘SparseM’, ‘MatrixModels’, ‘minqa’, ‘nloptr’, ‘reformulas’, ‘Rcpp’, ‘RcppEigen’, ‘gtable’, ‘isoband’, ‘tibble’, ‘withr’, ‘farver’, ‘labeling’, ‘munsell’, ‘RColorBrewer’, ‘viridisLite’, ‘carData’, ‘abind’, ‘Formula’, ‘pbkrtest’, ‘quantreg’, ‘lme4’
The downloaded source packages are in
‘/tmp/Rtmp8WMpGM/downloaded_packages’
# read the CSV with headers
regression1 <- read.csv("incidents (5).csv", header = TRUE, sep = ",")
#View(regression1)
summary(regression1)
      area               zone            population          incidents
 Length:16          Length:16          Length:16          Min.   : 103.0
 Class :character   Class :character   Class :character   1st Qu.: 277.8
 Mode  :character   Mode  :character   Mode  :character   Median : 654.0
                                                          Mean   : 695.2
                                                          3rd Qu.: 853.0
                                                          Max.   :2072.0
str(regression1)
'data.frame': 16 obs. of 4 variables:
$ area : chr "Boulder" "California-lexington" "Huntsville" "Seattle" ...
$ zone : chr "west" "east" "east" "west" ...
$ population: chr "107,353" "326,534" "444,752" "750,000" ...
$ incidents : int 605 103 161 1703 1003 527 721 704 105 403 ...
regression1$population <- as.numeric(gsub(",","",regression1$population))
regression1$population
[1] 107353 326534 444752 750000 64403 2744878 1600000 2333000 1572816
[10] 712091 6900000 2700000 4900000 4200000 5200000 7100000
str(regression1$population)
num [1:16] 107353 326534 444752 750000 64403 ...
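Why `population` was read as text, and why the `gsub()` step fixes it, can be reproduced on a tiny inline CSV. This is a minimal sketch: the two rows below are hypothetical values written for illustration, not rows from the incidents file.

```r
# Hypothetical two-row CSV with quoted, comma-separated thousands
csv_text <- 'area,population
"A","107,353"
"B","2,744,878"'

df <- read.csv(text = csv_text)
class(df$population)   # "character": the commas block numeric parsing

# Remove every comma (fixed = TRUE: literal match, no regex), then convert
df$population <- as.numeric(gsub(",", "", df$population, fixed = TRUE))
df$population          # now numeric: 107353 2744878
```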
regression2 <- regression1[, -1]  # new data frame without column 1 (area)
head(regression2)
reg.fit1<-lm(regression1$incidents ~ regression1$population)
summary(reg.fit1)
Call:
lm(formula = regression1$incidents ~ regression1$population)
Residuals:
Min 1Q Median 3Q Max
-684.5 -363.5 -156.2 133.9 1164.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.749e+02 2.018e+02 2.353 0.0337 *
regression1$population 8.462e-05 5.804e-05 1.458 0.1669
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 534.9 on 14 degrees of freedom
Multiple R-squared: 0.1318, Adjusted R-squared: 0.0698
F-statistic: 2.126 on 1 and 14 DF, p-value: 0.1669
Based on the output above, please answer the following questions: Is population significant at the 5% significance level? What is the adjusted R-squared of the model?
To determine whether population is significant at the 5% level, we examine its p-value, 0.1669. Since this is greater than 0.05, population is not a statistically significant predictor of incidents. The adjusted R-squared of the model is 0.0698. The multiple R-squared shows that population explains only about 13.18% of the variability in incidents, and adjusting for the number of predictors relative to the 16 observations lowers this to roughly 7%, so the model fits the data poorly. The F-statistic of 2.126 (p-value 0.1669) gives the same conclusion for the overall model; with a single predictor it is the same test as the t-test on population, because F = t² (1.458² ≈ 2.126).
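The adjusted R-squared values quoted in this section follow from the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors excluding the intercept. A minimal sketch that checks this against the printed summaries; `adj_r2` is a helper name made up for illustration, and because the inputs are the rounded values shown in the output, the results only match to about four decimals.

```r
# Adjusted R-squared from multiple R-squared (r2), observations (n) and predictors (p)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Rounded values copied from the summary() outputs in this section (n = 16)
adj_r2(0.1318, n = 16, p = 1)  # reg.fit1: about 0.0698
adj_r2(0.5829, n = 16, p = 3)  # reg.fit3: about 0.4786
adj_r2(0.3804, n = 16, p = 1)  # reg.fit4: about 0.3361
```

The penalty grows with p, which is why reg.fit3 can have a higher multiple R-squared than a smaller model yet a lower adjusted R-squared.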
reg.fit2<-lm(incidents ~ zone+population, data = regression1)
summary(reg.fit2)
Based on the output above, please answer the following questions: Are population and/or zone significant at the 5% significance level? What is the adjusted R-squared of the model?
At the 5% significance level, we judge each predictor by its p-value. Zone is significant: the p-value of its coefficient is 0.00243, less than 0.05. Population is not significant: its p-value is 0.14300, greater than 0.05. The adjusted R-squared of the model is 0.5186, so after adjusting for the two predictors the model accounts for roughly half of the variability in incidents. This is a moderate fit and a clear improvement over the first model (adjusted R-squared 0.0698).
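Because `zone` is still a character column when reg.fit2 is fitted, `lm()` turns it into a factor. Factor levels sort alphabetically, so "east" becomes the reference level and the model gets a single indicator column, labelled `zonewest`; its coefficient is the estimated west-minus-east difference in incidents at equal population. A minimal sketch of that coding, using a small hypothetical data frame rather than the incidents data:

```r
# Hypothetical toy data (not the incidents file); zone stored as character
toy <- data.frame(zone = c("west", "east", "east", "west"),
                  population = c(1e5, 3e5, 4e5, 7e5))

# model.matrix() shows the design matrix lm() would build from this formula
mm <- model.matrix(~ zone + population, data = toy)
colnames(mm)      # "(Intercept)" "zonewest" "population"
mm[, "zonewest"]  # 1 for west rows, 0 for east rows
```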
regression1$zone <- ifelse(regression1$zone == "west", 1, 0)  # recode zone as 0/1: 1 where zone is "west", 0 otherwise
#View(regression1)
str(regression1)
'data.frame': 16 obs. of 4 variables:
$ area : chr "Boulder" "California-lexington" "Huntsville" "Seattle" ...
$ zone : num 1 0 0 1 1 0 1 1 0 0 ...
$ population: num 107353 326534 444752 750000 64403 ...
$ incidents : int 605 103 161 1703 1003 527 721 704 105 403 ...
# regression1$zone <- as.integer(regression1$zone) was not necessary: ifelse() already returns a numeric 0/1 vector
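The recoding line works element by element: `regression1$zone == "west"` produces a logical vector, and `ifelse(test, yes, no)` returns `yes` (1) where the test is TRUE and `no` (0) where it is FALSE. That is why `str()` now reports `zone` as `num`. A minimal sketch on the first four zone values shown above:

```r
zone <- c("west", "east", "east", "west")   # first four zones from the data
zone == "west"                              # TRUE FALSE FALSE TRUE
zone_num <- ifelse(zone == "west", 1, 0)
zone_num                                    # 1 0 0 1
is.numeric(zone_num)                        # TRUE, so no further conversion is needed
```

One caveat of this design: any value that is not exactly "west" (for example a stray "West") is silently coded 0, and an NA stays NA.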
interaction <- regression1$zone * regression1$population  # element-wise product: population for west areas, 0 for east areas
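Outside a formula, `*` multiplies two equal-length vectors element by element, so `interaction` equals population for west areas and 0 for east areas. Inside a model formula, `a * b` instead expands to `a + b + a:b`, so R can build the same interaction column itself. A minimal sketch: the first part uses the first four zone and population values shown above; the second part uses the built-in `mtcars` data (where `am` is a 0/1 indicator) as a stand-in, since the full incidents data is not reproduced here.

```r
# Element-wise product on the first four observations
zone <- c(1, 0, 0, 1)
population <- c(107353, 326534, 444752, 750000)
zone * population   # 107353 0 0 750000

# Manual interaction column versus the formula shorthand: same fitted model
fit_manual  <- lm(mpg ~ I(am * wt) + wt + am, data = mtcars)
fit_formula <- lm(mpg ~ am * wt, data = mtcars)
all.equal(fitted(fit_manual), fitted(fit_formula))  # TRUE
```

Applied to this section, `lm(incidents ~ zone * population, data = regression1)` would fit the same model as reg.fit3, with the interaction row labelled `zone:population`.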
reg.fit3<-lm(regression1$incidents~interaction+regression1$population+regression1$zone)
summary(reg.fit3)
Call:
lm(formula = regression1$incidents ~ interaction + regression1$population +
regression1$zone)
Residuals:
Min 1Q Median 3Q Max
-540.91 -270.93 -59.56 187.99 767.99
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.659e+02 2.313e+02 0.717 0.4869
interaction 2.974e-06 9.469e-05 0.031 0.9755
regression1$population 6.352e-05 7.868e-05 0.807 0.4352
regression1$zone 7.192e+02 3.108e+02 2.314 0.0392 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 400.5 on 12 degrees of freedom
Multiple R-squared: 0.5829, Adjusted R-squared: 0.4786
F-statistic: 5.589 on 3 and 12 DF, p-value: 0.01237
Based on the output above, please answer the following questions: Is population significant at the 5% significance level? Is zone significant at the 5% significance level? Is the interaction term significant at the 5% significance level? What is the adjusted R-squared of the model?
At the 5% significance level:
- Population is not significant: its p-value is 0.4352, greater than 0.05.
- Zone is significant: its p-value is 0.0392, less than 0.05.
- The interaction term is not significant: its p-value is 0.9755, far above 0.05.

The adjusted R-squared of the model is 0.4786, so after adjusting for the three predictors the model accounts for roughly 48% of the variability in incidents. This is a moderate fit, but it is lower than the 0.5186 of the model without the interaction term, so adding the interaction does not improve the model.
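Another way to judge the interaction is a nested F-test comparing the model with and without it, via `anova()`. When exactly one term is added, that F-test's p-value equals the t-test p-value of the added term, so it cannot contradict the 0.9755 above. A minimal sketch on the built-in `mtcars` data (a stand-in, not the incidents data); on this section's data the analogous comparison would be between `lm(incidents ~ zone + population, data = regression1)` and `lm(incidents ~ zone * population, data = regression1)`.

```r
# Main-effects model versus the same model plus the am:wt interaction
fit_main <- lm(mpg ~ am + wt, data = mtcars)
fit_int  <- lm(mpg ~ am * wt, data = mtcars)

cmp <- anova(fit_main, fit_int)   # nested F-test for the added term
cmp

# With one added term, the F-test and t-test p-values coincide
p_F <- cmp[["Pr(>F)"]][2]
p_t <- coef(summary(fit_int))["am:wt", "Pr(>|t|)"]
all.equal(p_F, p_t)   # TRUE
```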
Let us now run a model where the only feature is the interaction term.
reg.fit4<-lm(regression1$incidents~interaction)
summary(reg.fit4)
Call:
lm(formula = regression1$incidents ~ interaction)
Residuals:
Min 1Q Median 3Q Max
-650.28 -301.09 -83.71 123.23 1103.76
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.951e+02 1.320e+02 3.751 0.00215 **
interaction 1.389e-04 4.737e-05 2.932 0.01093 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 451.9 on 14 degrees of freedom
Multiple R-squared: 0.3804, Adjusted R-squared: 0.3361
F-statistic: 8.595 on 1 and 14 DF, p-value: 0.01093
Is the interaction term significant at the 5% significance level? What is the adjusted R-squared of the model?
At the 5% significance level, the interaction term is significant, because its p-value, 0.01093, is less than 0.05. Since the interaction equals population in west areas and 0 in east areas, this model fits a population slope for west areas only and a constant mean for east areas. The adjusted R-squared of the model is 0.3361, so after adjusting for the single predictor the model accounts for roughly a third of the variability in incidents.
Which of the models run above would you choose to make predictions? Why?
Model 2 (reg.fit2) is the best choice for making predictions. It includes both zone and population as predictors, and zone is statistically significant at the 5% level. It also has the highest adjusted R-squared of the four models (0.5186, versus 0.0698, 0.4786 and 0.3361), so it fits the data best once the number of predictors is taken into account. Its overall F-test is significant as well (p-value = 0.003404). Model 4's interaction term is also significant, but that model explains noticeably less of the variability, and in models 1 and 3 the population and interaction terms are not significant.
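Making predictions also depends on how the model was written. reg.fit2 was fitted with `data = regression1`, so `predict()` can evaluate it on new rows; models written as `lm(regression1$incidents ~ regression1$population)` look up the original vectors instead, so `predict()` returns predictions for the original 16 rows with a warning rather than for the new data. A minimal sketch of predicting with a model of reg.fit2's form; the data frame and the two new areas are hypothetical values made up for illustration, not the incidents data.

```r
# Hypothetical toy data (NOT the incidents file) with the same columns
toy <- data.frame(
  zone       = c("west", "east", "west", "east", "west", "east"),
  population = c(1e5, 2e5, 5e5, 8e5, 1e6, 3e6),
  incidents  = c(600, 150, 900, 300, 1200, 500)
)

# Same form as reg.fit2
fit <- lm(incidents ~ zone + population, data = toy)

# Two hypothetical new areas with equal population, one in each zone
new_areas <- data.frame(zone = c("west", "east"), population = c(4e5, 4e5))
pred <- predict(fit, newdata = new_areas, interval = "prediction", level = 0.95)
pred   # columns: fit, lwr, upr

# At equal population, the predicted gap equals the zonewest coefficient
pred[1, "fit"] - pred[2, "fit"]
```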