Handout_Reg

regression1 <- read.csv("incidents.csv", header = TRUE, sep = ",")
str(regression1)

'data.frame':   16 obs. of  4 variables:
 $ area      : chr  "Boulder" "California-lexington" "Huntsville" "Seattle" ...
 $ zone      : chr  "west" "east" "east" "west" ...
 $ population: chr  "107,353" "326,534" "444,752" "750,000" ...
 $ incidents : int  605 103 161 1703 1003 527 721 704 105 403 ...

summary(regression1)

     area               zone            population          incidents     
 Length:16          Length:16          Length:16          Min.   : 103.0  
 Class :character   Class :character   Class :character   1st Qu.: 277.8  
 Mode  :character   Mode  :character   Mode  :character   Median : 654.0  
                                                          Mean   : 695.2  
                                                          3rd Qu.: 853.0  
                                                          Max.   :2072.0

# make sure the packages for this chapter
# are installed, install if necessary
#pkg <- c("ggplot2", "scales", "maptools",
#             "sp", "maps", "grid", "car" )
#new.pkg <- pkg[!(pkg %in% installed.packages())]
#if (length(new.pkg)) {
#  install.packages(new.pkg)  
#}

regression1$population <- as.numeric(gsub(",","",regression1$population))
regression1$population

 [1]  107353  326534  444752  750000   64403 2744878 1600000 2333000 1572816  712091 6900000 2700000
[13] 4900000 4200000 5200000 7100000
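
The `str()` output above shows `population` stored as character, because the values in the CSV contain thousands separators. A minimal sketch of the conversion on the first few of those values (the names `pop_chr` and `pop_num` are illustrative and not part of the handout):

```r
# Population values as they arrive from the CSV: strings with commas
pop_chr <- c("107,353", "326,534", "444,752", "750,000")

# as.numeric() on the raw strings would give NA, so strip the commas first
pop_num <- as.numeric(gsub(",", "", pop_chr, fixed = TRUE))

print(pop_num)
is.numeric(pop_num)
```

`fixed = TRUE` treats the comma as a literal character rather than a regular expression; for a plain comma both forms behave the same.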

regression2<-regression1[,-1]

head(regression2)

reg.fit1<-lm(regression1$incidents ~ regression1$population)

summary(reg.fit1)


Call:
lm(formula = regression1$incidents ~ regression1$population)

Residuals:
   Min     1Q Median     3Q    Max 
-684.5 -363.5 -156.2  133.9 1164.7 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)  
(Intercept)            4.749e+02  2.018e+02   2.353   0.0337 *
regression1$population 8.462e-05  5.804e-05   1.458   0.1669  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 534.9 on 14 degrees of freedom
Multiple R-squared:  0.1318,    Adjusted R-squared:  0.0698 
F-statistic: 2.126 on 1 and 14 DF,  p-value: 0.1669

Based on the output obtained above, please answer the following questions:

Is Population significant at a 5% significance level? What is the adjusted R-squared of the model?

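No. The p-value for population is 0.1669, which is greater than 0.05, so population is not significant at the 5% level. The adjusted R-squared of the model is 0.0698, so the model explains only about 7% of the variation in incidents.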
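
As a rough check on where the p-value comes from, the t statistic and two-sided p-value for population can be recomputed from the estimate and standard error printed in the `summary(reg.fit1)` output. This is a sketch using the rounded values from that output, so the result only matches to about three or four decimals; the variable names are illustrative.

```r
# Values copied from the summary(reg.fit1) output
estimate  <- 8.462e-05
std_error <- 5.804e-05
df_resid  <- 14          # 16 observations minus 2 estimated coefficients

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

round(t_value, 3)
round(p_value, 4)
p_value < 0.05   # is population significant at the 5% level?
```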

reg.fit2<-lm(incidents~zone+population,data=regression2)
summary(reg.fit2)


Call:
lm(formula = incidents ~ zone + population, data = regression2)

Residuals:
    Min      1Q  Median      3Q     Max 
-537.21 -273.14  -57.89  188.17  766.03 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 1.612e+02  1.675e+02   0.962  0.35363   
zonewest    7.266e+02  1.938e+02   3.749  0.00243 **
population  6.557e-05  4.206e-05   1.559  0.14300   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 384.8 on 13 degrees of freedom
Multiple R-squared:  0.5828,    Adjusted R-squared:  0.5186 
F-statistic: 9.081 on 2 and 13 DF,  p-value: 0.003404

Based on the output obtained above, please answer the following questions:

Are Population and/or Zone significant at a 5% significance level? What is the adjusted R-squared of the model?

Population: no, because its p-value is 0.14300, which is greater than 0.05. Zone: yes, because its p-value is 0.00243, which is less than 0.05. The adjusted R-squared of the model is 0.5186.
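
To make the `zonewest` coefficient concrete: `lm()` treats "east" as the baseline level, so `zonewest` is the estimated shift in incidents for a west-zone area at the same population. A minimal sketch using the coefficients printed in the `summary(reg.fit2)` output; the helper `predict_incidents()` and the population of one million are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(reg.fit2) output
b0     <- 1.612e+02   # intercept (baseline zone: east)
b_west <- 7.266e+02   # shift when zone == "west"
b_pop  <- 6.557e-05   # change in incidents per additional resident

# Hypothetical helper (not in the handout): fitted incidents for one area
predict_incidents <- function(zone, population) {
  b0 + b_west * (zone == "west") + b_pop * population
}

pop <- 1e6  # illustrative population of one million
predict_incidents("east", pop)
predict_incidents("west", pop)
```

The two fitted values differ by exactly the `zonewest` estimate (726.6), whatever population is plugged in, because Model 2 has no interaction term.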

regression2$zone<-ifelse(regression2$zone=="west",1,0)

interaction<-regression2$zone*regression2$population
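
The manual 0/1 coding and product above can also be expressed through R's formula interface, where `zone * population` expands to both main effects plus their interaction. The sketch below assumes a small made-up data frame (`toy`, with invented values, since `incidents.csv` is not reproduced here) and shows that the two approaches give the same coefficients.

```r
# Made-up data for illustration only; these are NOT the handout's values
toy <- data.frame(
  zone       = c("east", "west", "east", "west", "east", "west", "east", "west"),
  population = c(1, 2, 3, 4, 5, 6, 7, 8) * 1e5,
  incidents  = c(120, 640, 210, 905, 330, 1010, 380, 1250)
)

# Manual approach: 0/1 dummy for west, then the product with population
toy$west       <- ifelse(toy$zone == "west", 1, 0)
toy$west_x_pop <- toy$west * toy$population
fit_manual  <- lm(incidents ~ west + population + west_x_pop, data = toy)

# Formula approach: lm() builds the dummy and the interaction itself
fit_formula <- lm(incidents ~ zone * population, data = toy)

# Same estimates, only the coefficient names differ
all.equal(unname(coef(fit_manual)), unname(coef(fit_formula)))
```

Keeping every variable in one data frame and passing `data =` also avoids mixing columns from `regression1` and `regression2` in a single call.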

reg.fit3<-lm(regression1$incidents~interaction+regression1$population+regression1$zone)
summary(reg.fit3)


Call:
lm(formula = regression1$incidents ~ interaction + regression1$population + 
    regression1$zone)

Residuals:
    Min      1Q  Median      3Q     Max 
-540.91 -270.93  -59.56  187.99  767.99 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)  
(Intercept)            1.659e+02  2.313e+02   0.717   0.4869  
interaction            2.974e-06  9.469e-05   0.031   0.9755  
regression1$population 6.352e-05  7.868e-05   0.807   0.4352  
regression1$zonewest   7.192e+02  3.108e+02   2.314   0.0392 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 400.5 on 12 degrees of freedom
Multiple R-squared:  0.5829,    Adjusted R-squared:  0.4786 
F-statistic: 5.589 on 3 and 12 DF,  p-value: 0.01237

Based on the output obtained above, please answer the following questions:

Is Population significant at a 5% significance level? Is Zone significant at a 5% significance level? Is the interaction term significant at a 5% significance level? What is the adjusted R-squared of the model?

Population is not significant, as its p-value is 0.4352, which is greater than 0.05.
Zone is significant, as its p-value is 0.0392, which is less than 0.05 (marked with one star in the output). The interaction term is not significant, as its p-value is 0.9755. The adjusted R-squared of the model is 0.4786.
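
The 5% rule can be applied mechanically by comparing each p-value with 0.05. A small sketch using the p-values printed in the `summary(reg.fit3)` output; the vector names are illustrative.

```r
alpha <- 0.05

# p-values copied from the summary(reg.fit3) coefficient table.
# With the fitted model in memory, the same column is available as
# coef(summary(reg.fit3))[, "Pr(>|t|)"]
p_values <- c(interaction = 0.9755, population = 0.4352, zonewest = 0.0392)

# A term is significant at the 5% level when its p-value is below alpha
significant <- p_values < alpha
significant
```

Only `zonewest` falls below 0.05, which agrees with the single star next to it in the output.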

Let us now run a model where the only feature is the interaction term.

Is the interaction term significant at a 5% significance level? What is the adjusted R-squared of the model?

Yes. In the `summary(reg.fit4)` output that follows, the p-value for the interaction term is 0.01093, which is less than 0.05. The adjusted R-squared of the model is 0.3361.

reg.fit4<-lm(regression2$incidents~interaction)
summary(reg.fit4)


Call:
lm(formula = regression2$incidents ~ interaction)

Residuals:
    Min      1Q  Median      3Q     Max 
-650.28 -301.09  -83.71  123.23 1103.76 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 4.951e+02  1.320e+02   3.751  0.00215 **
interaction 1.389e-04  4.737e-05   2.932  0.01093 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 451.9 on 14 degrees of freedom
Multiple R-squared:  0.3804,    Adjusted R-squared:  0.3361 
F-statistic: 8.595 on 1 and 14 DF,  p-value: 0.01093

Which of the models run above would you choose to make predictions? Why?

Model 2 (`reg.fit2`), because it has the highest adjusted R-squared of the four models (0.5186) and its zone term is statistically significant (p-value 0.00243).
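
The comparison can be reproduced from the multiple R-squared values reported above, using the standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below uses the rounded R-squared values from the four summaries; the data frame `models` is an illustrative construct, not part of the handout.

```r
# Multiple R-squared and number of predictors for each model,
# copied from the four summary() outputs
models <- data.frame(
  model = c("reg.fit1", "reg.fit2", "reg.fit3", "reg.fit4"),
  r2    = c(0.1318, 0.5828, 0.5829, 0.3804),
  p     = c(1, 2, 3, 1)
)
n <- 16  # observations in the data set

# Adjusted R-squared penalises R-squared for each extra predictor
models$adj_r2 <- round(1 - (1 - models$r2) * (n - 1) / (n - models$p - 1), 4)
models

# Model with the highest adjusted R-squared
models$model[which.max(models$adj_r2)]
```

Model 3 has a marginally higher multiple R-squared than Model 2 (0.5829 versus 0.5828), but the penalty for its extra interaction term leaves it with a lower adjusted value, which is why the adjusted figure is the one used for the comparison.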

