title: “Quiz on Correlation and Regression”

author: “Nicolas Flammia”

output:

html_document:

toc: true


Q1. Describe the relationship between the two variables - education level (bachelors) and income (per_capita_income).

Answer: According to the scatterplot, it show that the relationship has a positive direction since its upward sloping. That means that theres a relationship between the two variables, the magnitude of the relationship is determined by the absolute value of the correlation coefficient, in this case it is greater than 0.6 meaning that there is a strong relationship between income and educatinal levels.


# Load the package

library(openintro)

library(ggplot2)

str(countyComplete)
## 'data.frame':    3143 obs. of  53 variables:
##  $ state                                    : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ name                                     : Factor w/ 1877 levels "Abbeville County",..: 83 90 101 151 166 227 237 250 298 320 ...
##  $ FIPS                                     : num  1001 1003 1005 1007 1009 ...
##  $ pop2010                                  : num  54571 182265 27457 22915 57322 ...
##  $ pop2000                                  : num  43671 140415 29038 20826 51024 ...
##  $ age_under_5                              : num  6.6 6.1 6.2 6 6.3 6.8 6.5 6.1 5.7 5.3 ...
##  $ age_under_18                             : num  26.8 23 21.9 22.7 24.6 22.3 24.1 22.9 22.5 21.4 ...
##  $ age_over_65                              : num  12 16.8 14.2 12.7 14.7 13.5 16.7 14.3 16.7 17.9 ...
##  $ female                                   : num  51.3 51.1 46.9 46.3 50.5 45.8 53 51.8 52.2 50.4 ...
##  $ white                                    : num  78.5 85.7 48 75.8 92.6 23 54.4 74.9 58.8 92.7 ...
##  $ black                                    : num  17.7 9.4 46.9 22 1.3 70.2 43.4 20.6 38.7 4.6 ...
##  $ native                                   : num  0.4 0.7 0.4 0.3 0.5 0.2 0.3 0.5 0.2 0.5 ...
##  $ asian                                    : num  0.9 0.7 0.4 0.1 0.2 0.2 0.8 0.7 0.5 0.2 ...
##  $ pac_isl                                  : num  NA NA NA NA NA NA 0 0.1 0 0 ...
##  $ two_plus_races                           : num  1.6 1.5 0.9 0.9 1.2 0.8 0.8 1.7 1.1 1.5 ...
##  $ hispanic                                 : num  2.4 4.4 5.1 1.8 8.1 7.1 0.9 3.3 1.6 1.2 ...
##  $ white_not_hispanic                       : num  77.2 83.5 46.8 75 88.9 21.9 54.1 73.6 58.1 92.1 ...
##  $ no_move_in_one_plus_year                 : num  86.3 83 83 90.5 87.2 88.5 92.8 82.9 86.2 88.1 ...
##  $ foreign_born                             : num  2 3.6 2.8 0.7 4.7 1.1 1.1 2.5 0.9 0.5 ...
##  $ foreign_spoken_at_home                   : num  3.7 5.5 4.7 1.5 7.2 3.8 1.6 4.5 1.6 1.4 ...
##  $ hs_grad                                  : num  85.3 87.6 71.9 74.5 74.7 74.7 74.8 78.5 71.8 73.4 ...
##  $ bachelors                                : num  21.7 26.8 13.5 10 12.5 12 11 16.1 10.8 10.5 ...
##  $ veterans                                 : num  5817 20396 2327 1883 4072 ...
##  $ mean_work_travel                         : num  25.1 25.8 23.8 28.3 33.2 28.1 25.1 22.1 23.6 26.2 ...
##  $ housing_units                            : num  22135 104061 11829 8981 23887 ...
##  $ home_ownership                           : num  77.5 76.7 68 82.9 82 76.9 69 70.7 71.4 77.5 ...
##  $ housing_multi_unit                       : num  7.2 22.6 11.1 6.6 3.7 9.9 13.7 14.3 8.7 4.3 ...
##  $ median_val_owner_occupied                : num  133900 177200 88200 81200 113700 ...
##  $ households                               : num  19718 69476 9795 7441 20605 ...
##  $ persons_per_household                    : num  2.7 2.5 2.52 3.02 2.73 2.85 2.58 2.46 2.51 2.22 ...
##  $ per_capita_income                        : num  24568 26469 15875 19918 21070 ...
##  $ median_household_income                  : num  53255 50147 33219 41770 45549 ...
##  $ poverty                                  : num  10.6 12.2 25 12.6 13.4 25.3 25 19.5 20.3 17.6 ...
##  $ private_nonfarm_establishments           : num  877 4812 522 318 749 ...
##  $ private_nonfarm_employment               : num  10628 52233 7990 2927 6968 ...
##  $ percent_change_private_nonfarm_employment: num  16.6 17.4 -27 -14 -11.4 -18.5 2.1 -5.6 -45.8 5.4 ...
##  $ nonemployment_establishments             : num  2971 14175 1527 1192 3501 ...
##  $ firms                                    : num  4067 19035 1667 1385 4458 ...
##  $ black_owned_firms                        : num  15.2 2.7 NA 14.9 NA NA NA 7.2 NA NA ...
##  $ native_owned_firms                       : num  NA 0.4 NA NA NA NA NA NA NA NA ...
##  $ asian_owned_firms                        : num  1.3 1 NA NA NA NA 3.3 1.6 NA NA ...
##  $ pac_isl_owned_firms                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ hispanic_owned_firms                     : num  0.7 1.3 NA NA NA NA NA 0.5 NA NA ...
##  $ women_owned_firms                        : num  31.7 27.3 27 NA 23.2 38.8 NA 24.7 29.3 14.5 ...
##  $ manufacturer_shipments_2007              : num  NA 1410273 NA 0 341544 ...
##  $ mercent_whole_sales_2007                 : num  NA NA NA NA NA ...
##  $ sales                                    : num  598175 2966489 188337 124707 319700 ...
##  $ sales_per_capita                         : num  12003 17166 6334 5804 5622 ...
##  $ accommodation_food_service               : num  88157 436955 NA 10757 20941 ...
##  $ building_permits                         : num  191 696 10 8 18 1 3 107 10 6 ...
##  $ fed_spending                             : num  331142 1119082 240308 163201 294114 ...
##  $ area                                     : num  594 1590 885 623 645 ...
##  $ density                                  : num  91.8 114.6 31 36.8 88.9 ...

ggplot(data = countyComplete, aes(x = bachelors, y = per_capita_income)) + #cars dataset is from openintro rpackage

geom_point()


cor(countyComplete$bachelors, countyComplete$per_capita_income, use = "pairwise.complete.obs")
## [1] 0.7924464

Run a regression model for income (per_capita_income) with one explanatory variable, educational level (bachelors), and answer Q2 through Q5.


summary(countyComplete)
##       state                     name           FIPS      
##  Texas   : 254   Washington County:  30   Min.   : 1001  
##  Georgia : 159   Jefferson County :  25   1st Qu.:18178  
##  Virginia: 134   Franklin County  :  24   Median :29177  
##  Kentucky: 120   Jackson County   :  23   Mean   :30390  
##  Missouri: 115   Lincoln County   :  23   3rd Qu.:45082  
##  Kansas  : 105   Madison County   :  19   Max.   :56045  
##  (Other) :2256   (Other)          :2999                  
##     pop2010           pop2000         age_under_5      age_under_18  
##  Min.   :     82   Min.   :     67   Min.   : 0.000   Min.   : 0.00  
##  1st Qu.:  11104   1st Qu.:  11210   1st Qu.: 5.500   1st Qu.:21.40  
##  Median :  25857   Median :  24608   Median : 6.200   Median :23.30  
##  Mean   :  98233   Mean   :  89623   Mean   : 6.261   Mean   :23.42  
##  3rd Qu.:  66699   3rd Qu.:  61766   3rd Qu.: 6.800   3rd Qu.:25.10  
##  Max.   :9818605   Max.   :9519338   Max.   :12.600   Max.   :41.60  
##                    NA's   :3                                         
##   age_over_65        female          white           black       
##  Min.   : 3.50   Min.   :27.90   Min.   : 2.70   Min.   : 0.000  
##  1st Qu.:13.10   1st Qu.:49.60   1st Qu.:75.25   1st Qu.: 0.500  
##  Median :15.60   Median :50.50   Median :89.10   Median : 2.000  
##  Mean   :15.88   Mean   :50.03   Mean   :82.89   Mean   : 8.931  
##  3rd Qu.:18.20   3rd Qu.:51.10   3rd Qu.:95.50   3rd Qu.:10.200  
##  Max.   :43.40   Max.   :56.80   Max.   :99.20   Max.   :85.700  
##                                                  NA's   :17      
##      native           asian           pac_isl       two_plus_races  
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.100  
##  1st Qu.: 0.200   1st Qu.: 0.300   1st Qu.: 0.000   1st Qu.: 1.100  
##  Median : 0.400   Median : 0.500   Median : 0.000   Median : 1.600  
##  Mean   : 2.026   Mean   : 1.166   Mean   : 0.154   Mean   : 1.976  
##  3rd Qu.: 0.800   3rd Qu.: 1.000   3rd Qu.: 0.100   3rd Qu.: 2.300  
##  Max.   :96.000   Max.   :43.900   Max.   :48.900   Max.   :29.500  
##  NA's   :4        NA's   :24       NA's   :1697                     
##     hispanic      white_not_hispanic no_move_in_one_plus_year
##  Min.   : 0.000   Min.   : 2.70      Min.   : 51.6           
##  1st Qu.: 1.600   1st Qu.:66.95      1st Qu.: 83.2           
##  Median : 3.300   Median :85.80      Median : 86.3           
##  Mean   : 8.284   Mean   :78.29      Mean   : 85.8           
##  3rd Qu.: 8.200   3rd Qu.:94.20      3rd Qu.: 89.0           
##  Max.   :95.700   Max.   :99.20      Max.   :100.0           
##                                                              
##   foreign_born    foreign_spoken_at_home    hs_grad        bachelors    
##  Min.   : 0.000   Min.   : 0.000         Min.   :47.90   Min.   : 3.70  
##  1st Qu.: 1.200   1st Qu.: 2.800         1st Qu.:78.40   1st Qu.:13.10  
##  Median : 2.400   Median : 4.800         Median :84.60   Median :16.90  
##  Mean   : 4.372   Mean   : 9.057         Mean   :83.11   Mean   :19.03  
##  3rd Qu.: 5.300   3rd Qu.:10.000         3rd Qu.:88.60   3rd Qu.:22.60  
##  Max.   :72.200   Max.   :96.000         Max.   :99.30   Max.   :71.00  
##                                                                         
##     veterans      mean_work_travel housing_units     home_ownership 
##  Min.   :     0   Min.   : 4.30    Min.   :     50   Min.   : 0.00  
##  1st Qu.:   958   1st Qu.:19.00    1st Qu.:   5416   1st Qu.:69.50  
##  Median :  2180   Median :22.40    Median :  12162   Median :74.60  
##  Mean   :  7207   Mean   :22.73    Mean   :  41904   Mean   :73.26  
##  3rd Qu.:  5944   3rd Qu.:26.10    3rd Qu.:  30574   3rd Qu.:78.40  
##  Max.   :368128   Max.   :44.20    Max.   :3445076   Max.   :91.30  
##                                                                     
##  housing_multi_unit median_val_owner_occupied   households     
##  Min.   : 0.00      Min.   :      0           Min.   :     22  
##  1st Qu.: 6.10      1st Qu.:  80200           1st Qu.:   4260  
##  Median : 9.70      Median : 105900           Median :   9868  
##  Mean   :12.33      Mean   : 132545           Mean   :  36346  
##  3rd Qu.:15.90      3rd Qu.: 152950           3rd Qu.:  25358  
##  Max.   :98.50      Max.   :1000001           Max.   :3217889  
##                                                                
##  persons_per_household per_capita_income median_household_income
##  Min.   :1.100         Min.   : 7772     Min.   : 19351         
##  1st Qu.:2.370         1st Qu.:19030     1st Qu.: 36952         
##  Median :2.490         Median :21773     Median : 42445         
##  Mean   :2.513         Mean   :22505     Mean   : 44270         
##  3rd Qu.:2.630         3rd Qu.:24814     3rd Qu.: 49142         
##  Max.   :4.470         Max.   :64381     Max.   :115574         
##                                                                 
##     poverty     private_nonfarm_establishments private_nonfarm_employment
##  Min.   : 0.0   Min.   :     0                 Min.   :      0           
##  1st Qu.:11.0   1st Qu.:   229                 1st Qu.:   2109           
##  Median :14.7   Median :   551                 Median :   6351           
##  Mean   :15.5   Mean   :  2362                 Mean   :  35656           
##  3rd Qu.:19.0   3rd Qu.:  1484                 3rd Qu.:  19436           
##  Max.   :53.5   Max.   :245523                 Max.   :3703233           
##                                                                          
##  percent_change_private_nonfarm_employment nonemployment_establishments
##  Min.   :-83.2000                          Min.   :    21              
##  1st Qu.:-12.0000                          1st Qu.:   729              
##  Median : -2.0000                          Median :  1594              
##  Mean   :  0.5338                          Mean   :  6720              
##  3rd Qu.:  9.8000                          3rd Qu.:  4130              
##  Max.   :386.5000                          Max.   :821177              
##  NA's   :67                                NA's   :5                   
##      firms         black_owned_firms native_owned_firms asian_owned_firms
##  Min.   :     27   Min.   : 0.200    Min.   : 0.200     Min.   : 0.300   
##  1st Qu.:   1074   1st Qu.: 2.100    1st Qu.: 0.525     1st Qu.: 1.400   
##  Median :   2350   Median : 5.700    Median : 0.900     Median : 2.200   
##  Mean   :   9301   Mean   : 9.806    Mean   : 3.785     Mean   : 3.422   
##  3rd Qu.:   6034   3rd Qu.:13.350    3rd Qu.: 2.300     3rd Qu.: 3.700   
##  Max.   :1046940   Max.   :66.700    Max.   :71.800     Max.   :56.600   
##  NA's   :176       NA's   :2376      NA's   :2653       NA's   :2408     
##  pac_isl_owned_firms hispanic_owned_firms women_owned_firms
##  Min.   : 0.0000     Min.   : 0.300       Min.   : 6.50    
##  1st Qu.: 0.1000     1st Qu.: 1.400       1st Qu.:22.70    
##  Median : 0.1000     Median : 2.800       Median :26.20    
##  Mean   : 0.7171     Mean   : 6.811       Mean   :25.96    
##  3rd Qu.: 0.3000     3rd Qu.: 6.800       3rd Qu.:29.20    
##  Max.   :10.5000     Max.   :78.000       Max.   :56.20    
##  NA's   :3073        NA's   :2363         NA's   :970      
##  manufacturer_shipments_2007 mercent_whole_sales_2007     sales          
##  Min.   :        0           Min.   :        0        Min.   :        0  
##  1st Qu.:        0           1st Qu.:    42125        1st Qu.:    79988  
##  Median :   238180           Median :   138930        Median :   257667  
##  Mean   :  1680613           Mean   :  1794262        Mean   :  1262270  
##  3rd Qu.:  1161878           3rd Qu.:   562056        3rd Qu.:   791435  
##  Max.   :169275136           Max.   :205478751        Max.   :119111840  
##  NA's   :488                 NA's   :1022             NA's   :42         
##  sales_per_capita accommodation_food_service building_permits
##  Min.   :    0    Min.   :       0           Min.   :    0   
##  1st Qu.: 6993    1st Qu.:    9349           1st Qu.:    5   
##  Median : 9793    Median :   31065           Median :   32   
##  Mean   :10375    Mean   :  211181           Mean   :  192   
##  3rd Qu.:12980    3rd Qu.:  110695           3rd Qu.:  123   
##  Max.   :80800    Max.   :24857836           Max.   :15039   
##  NA's   :42       NA's   :272                                
##   fed_spending           area             density       
##  Min.   :       0   Min.   :     2.0   Min.   :    0.0  
##  1st Qu.:  102922   1st Qu.:   430.7   1st Qu.:   16.9  
##  Median :  214994   Median :   615.6   Median :   45.2  
##  Mean   :  944376   Mean   :  1123.7   Mean   :  259.3  
##  3rd Qu.:  522228   3rd Qu.:   924.0   3rd Qu.:  113.8  
##  Max.   :80457156   Max.   :145504.8   Max.   :69467.5  
##  NA's   :4

library(ggplot2)

str(countyComplete)
## 'data.frame':    3143 obs. of  53 variables:
##  $ state                                    : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ name                                     : Factor w/ 1877 levels "Abbeville County",..: 83 90 101 151 166 227 237 250 298 320 ...
##  $ FIPS                                     : num  1001 1003 1005 1007 1009 ...
##  $ pop2010                                  : num  54571 182265 27457 22915 57322 ...
##  $ pop2000                                  : num  43671 140415 29038 20826 51024 ...
##  $ age_under_5                              : num  6.6 6.1 6.2 6 6.3 6.8 6.5 6.1 5.7 5.3 ...
##  $ age_under_18                             : num  26.8 23 21.9 22.7 24.6 22.3 24.1 22.9 22.5 21.4 ...
##  $ age_over_65                              : num  12 16.8 14.2 12.7 14.7 13.5 16.7 14.3 16.7 17.9 ...
##  $ female                                   : num  51.3 51.1 46.9 46.3 50.5 45.8 53 51.8 52.2 50.4 ...
##  $ white                                    : num  78.5 85.7 48 75.8 92.6 23 54.4 74.9 58.8 92.7 ...
##  $ black                                    : num  17.7 9.4 46.9 22 1.3 70.2 43.4 20.6 38.7 4.6 ...
##  $ native                                   : num  0.4 0.7 0.4 0.3 0.5 0.2 0.3 0.5 0.2 0.5 ...
##  $ asian                                    : num  0.9 0.7 0.4 0.1 0.2 0.2 0.8 0.7 0.5 0.2 ...
##  $ pac_isl                                  : num  NA NA NA NA NA NA 0 0.1 0 0 ...
##  $ two_plus_races                           : num  1.6 1.5 0.9 0.9 1.2 0.8 0.8 1.7 1.1 1.5 ...
##  $ hispanic                                 : num  2.4 4.4 5.1 1.8 8.1 7.1 0.9 3.3 1.6 1.2 ...
##  $ white_not_hispanic                       : num  77.2 83.5 46.8 75 88.9 21.9 54.1 73.6 58.1 92.1 ...
##  $ no_move_in_one_plus_year                 : num  86.3 83 83 90.5 87.2 88.5 92.8 82.9 86.2 88.1 ...
##  $ foreign_born                             : num  2 3.6 2.8 0.7 4.7 1.1 1.1 2.5 0.9 0.5 ...
##  $ foreign_spoken_at_home                   : num  3.7 5.5 4.7 1.5 7.2 3.8 1.6 4.5 1.6 1.4 ...
##  $ hs_grad                                  : num  85.3 87.6 71.9 74.5 74.7 74.7 74.8 78.5 71.8 73.4 ...
##  $ bachelors                                : num  21.7 26.8 13.5 10 12.5 12 11 16.1 10.8 10.5 ...
##  $ veterans                                 : num  5817 20396 2327 1883 4072 ...
##  $ mean_work_travel                         : num  25.1 25.8 23.8 28.3 33.2 28.1 25.1 22.1 23.6 26.2 ...
##  $ housing_units                            : num  22135 104061 11829 8981 23887 ...
##  $ home_ownership                           : num  77.5 76.7 68 82.9 82 76.9 69 70.7 71.4 77.5 ...
##  $ housing_multi_unit                       : num  7.2 22.6 11.1 6.6 3.7 9.9 13.7 14.3 8.7 4.3 ...
##  $ median_val_owner_occupied                : num  133900 177200 88200 81200 113700 ...
##  $ households                               : num  19718 69476 9795 7441 20605 ...
##  $ persons_per_household                    : num  2.7 2.5 2.52 3.02 2.73 2.85 2.58 2.46 2.51 2.22 ...
##  $ per_capita_income                        : num  24568 26469 15875 19918 21070 ...
##  $ median_household_income                  : num  53255 50147 33219 41770 45549 ...
##  $ poverty                                  : num  10.6 12.2 25 12.6 13.4 25.3 25 19.5 20.3 17.6 ...
##  $ private_nonfarm_establishments           : num  877 4812 522 318 749 ...
##  $ private_nonfarm_employment               : num  10628 52233 7990 2927 6968 ...
##  $ percent_change_private_nonfarm_employment: num  16.6 17.4 -27 -14 -11.4 -18.5 2.1 -5.6 -45.8 5.4 ...
##  $ nonemployment_establishments             : num  2971 14175 1527 1192 3501 ...
##  $ firms                                    : num  4067 19035 1667 1385 4458 ...
##  $ black_owned_firms                        : num  15.2 2.7 NA 14.9 NA NA NA 7.2 NA NA ...
##  $ native_owned_firms                       : num  NA 0.4 NA NA NA NA NA NA NA NA ...
##  $ asian_owned_firms                        : num  1.3 1 NA NA NA NA 3.3 1.6 NA NA ...
##  $ pac_isl_owned_firms                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ hispanic_owned_firms                     : num  0.7 1.3 NA NA NA NA NA 0.5 NA NA ...
##  $ women_owned_firms                        : num  31.7 27.3 27 NA 23.2 38.8 NA 24.7 29.3 14.5 ...
##  $ manufacturer_shipments_2007              : num  NA 1410273 NA 0 341544 ...
##  $ mercent_whole_sales_2007                 : num  NA NA NA NA NA ...
##  $ sales                                    : num  598175 2966489 188337 124707 319700 ...
##  $ sales_per_capita                         : num  12003 17166 6334 5804 5622 ...
##  $ accommodation_food_service               : num  88157 436955 NA 10757 20941 ...
##  $ building_permits                         : num  191 696 10 8 18 1 3 107 10 6 ...
##  $ fed_spending                             : num  331142 1119082 240308 163201 294114 ...
##  $ area                                     : num  594 1590 885 623 645 ...
##  $ density                                  : num  91.8 114.6 31 36.8 88.9 ...

ggplot(data = countyComplete, aes(x = bachelors, y = per_capita_income)) +

geom_point() +

geom_smooth(method = "lm", se = FALSE)


lm(per_capita_income ~ bachelors, data = countyComplete)
## 
## Call:
## lm(formula = per_capita_income ~ bachelors, data = countyComplete)
## 
## Coefficients:
## (Intercept)    bachelors  
##     13087.7        494.8

lm(log(per_capita_income) ~ log(bachelors), data = countyComplete)
## 
## Call:
## lm(formula = log(per_capita_income) ~ log(bachelors), data = countyComplete)
## 
## Coefficients:
##    (Intercept)  log(bachelors)  
##         8.8050          0.4167

mod <- lm(per_capita_income ~ bachelors, data = countyComplete)

coef(mod)
## (Intercept)   bachelors 
##  13087.6795    494.7534

summary(mod)
## 
## Call:
## lm(formula = per_capita_income ~ bachelors, data = countyComplete)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18032.7  -1708.2     73.8   1748.0  21756.5 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13087.680    142.091   92.11   <2e-16 ***
## bachelors     494.753      6.795   72.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3299 on 3141 degrees of freedom
## Multiple R-squared:  0.628,  Adjusted R-squared:  0.6279 
## F-statistic:  5302 on 1 and 3141 DF,  p-value: < 2.2e-16

Q2. See model 1. Is the coefficient of bachelors statistically significant at 5%? Interpret the coefficient.

Answer: No, it is not significant. The coefficient of 494.7 shows that for every 494 bachelors there is 13087 non-bachelors. The value of Pr show that the coefficient is not statistically significant at 5% because Pr<0.001

Q4. See model 1. What is the reported residual standard error? What does it mean?

Answer: 3299 on 3141 degrees of freedom means that 3141 of all the observations are free to vary.

Q5. See model 1. What is the reported adjusted R squared? What does it mean?

Answer: The adjusted R squared of 0.62 tells us that a 62.7% of the variations in the per_capita_income variable are explained by the regression model above.

Run a second regression model for income (per_capita_income) with two explanatory variables: education (bachelors) and density (density), and answer Q6.

Q6. Compare model 1 and model 2. Which of the two models better fits the data? Discuss your answer by comparing the residual standard error and the adjusted R squared between the two models.

Answer: The 2nd model explains the data set better its adjusted R squared of 0.63 is higher than 0.62 is not much greater than model 1 but adding density to the formula doesnt provide more accurate information, since the residual standard error for model 2 is 3140 is not much different from the 3131 from the 1st model.


# Create a linear model 1

mod_1 <- lm(per_capita_income ~ bachelors, data = countyComplete)

# View summary of model 1

summary(mod_1)
## 
## Call:
## lm(formula = per_capita_income ~ bachelors, data = countyComplete)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18032.7  -1708.2     73.8   1748.0  21756.5 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13087.680    142.091   92.11   <2e-16 ***
## bachelors     494.753      6.795   72.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3299 on 3141 degrees of freedom
## Multiple R-squared:  0.628,  Adjusted R-squared:  0.6279 
## F-statistic:  5302 on 1 and 3141 DF,  p-value: < 2.2e-16

# Create a linear model 2

mod_2 <- lm(per_capita_income ~ bachelors + density, data = countyComplete)

# View summary of model 2

summary(mod_2)
## 
## Call:
## lm(formula = per_capita_income ~ bachelors + density, data = countyComplete)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18257.9  -1707.7     71.5   1749.6  22101.1 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.319e+04  1.433e+02  92.042  < 2e-16 ***
## bachelors   4.872e+02  6.963e+00  69.973  < 2e-16 ***
## density     1.623e-01  3.499e-02   4.639 3.65e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3289 on 3140 degrees of freedom
## Multiple R-squared:  0.6305, Adjusted R-squared:  0.6303 
## F-statistic:  2679 on 2 and 3140 DF,  p-value: < 2.2e-16