Dependencies.

First, I’ll load the state dataset. One of the datasets it contains, state.x77, is a matrix of 50 rows (one per state) with 8 columns: population, income, illiteracy, life expectancy, murder rate, high school graduation rate, mean number of days below freezing, and area in square miles.

library(tidyr)
library(ggplot2)
data("state")
df <- as.data.frame(state.x77)
head(df)
##            Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20
## Alaska            365   6315        1.5    69.31   11.3    66.7   152
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65
## California      21198   5114        1.1    71.71   10.3    62.6    20
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166
##              Area
## Alabama     50708
## Alaska     566432
## Arizona    113417
## Arkansas    51945
## California 156361
## Colorado   103766
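
As a quick check on the description above, the shape and column names of the matrix can be read directly from it. This is only a sketch; the variable names `shape` and `cols` are my own and are not used elsewhere in this write-up.

```r
data("state")

# state.x77 is a numeric matrix with one row per state
shape <- dim(state.x77)       # number of rows, number of columns
cols  <- colnames(state.x77)  # the eight variable names

print(shape)
print(cols)
```

The first number printed is the row count (50 states) and the second is the column count (8 variables).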

Next, I have to convert a column into a boolean variable. In this case, I will split states into ‘cold’ and ‘not cold’, represented by 1 and 0 respectively, using the median number of frost days as the cutoff (a state exactly at the median counts as cold). This creates our ‘dichotomous’ or ‘binary’ variable. I then do the same for murder rate.

# Frost: 1 = 'cold' (at or above the median number of frost days), 0 = 'not cold'
df$Frost <- as.numeric(df$Frost >= median(df$Frost))

# Murder rate: 1 = at or above the median murder rate, 0 = below it
df$Murder <- as.numeric(df$Murder >= median(df$Murder))

df
##                Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama              3615   3624        2.1    69.05      1    41.3     0
## Alaska                365   6315        1.5    69.31      1    66.7     1
## Arizona              2212   4530        1.8    70.55      1    58.1     0
## Arkansas             2110   3378        1.9    70.66      1    39.9     0
## California          21198   5114        1.1    71.71      1    62.6     0
## Colorado             2541   4884        0.7    72.06      0    63.9     1
## Connecticut          3100   5348        1.1    72.48      0    56.0     1
## Delaware              579   4809        0.9    70.06      0    54.6     0
## Florida              8277   4815        1.3    70.66      1    52.6     0
## Georgia              4931   4091        2.0    68.54      1    40.6     0
## Hawaii                868   4963        1.9    73.60      0    61.9     0
## Idaho                 813   4119        0.6    71.87      0    59.5     1
## Illinois            11197   5107        0.9    70.14      1    52.6     1
## Indiana              5313   4458        0.7    70.88      1    52.9     1
## Iowa                 2861   4628        0.5    72.56      0    59.0     1
## Kansas               2280   4669        0.6    72.58      0    59.9     0
## Kentucky             3387   3712        1.6    70.10      1    38.5     0
## Louisiana            3806   3545        2.8    68.76      1    42.2     0
## Maine                1058   3694        0.7    70.39      0    54.7     1
## Maryland             4122   5299        0.9    70.22      1    52.3     0
## Massachusetts        5814   4755        1.1    71.83      0    58.5     0
## Michigan             9111   4751        0.9    70.63      1    52.8     1
## Minnesota            3921   4675        0.6    72.96      0    57.6     1
## Mississippi          2341   3098        2.4    68.09      1    41.0     0
## Missouri             4767   4254        0.8    70.69      1    48.8     0
## Montana               746   4347        0.6    70.56      0    59.2     1
## Nebraska             1544   4508        0.6    72.60      0    59.3     1
## Nevada                590   5149        0.5    69.03      1    65.2     1
## New Hampshire         812   4281        0.7    71.23      0    57.6     1
## New Jersey           7333   5237        1.1    70.93      0    52.5     1
## New Mexico           1144   3601        2.2    70.32      1    55.2     1
## New York            18076   4903        1.4    70.55      1    52.7     0
## North Carolina       5441   3875        1.8    69.21      1    38.5     0
## North Dakota          637   5087        0.8    72.78      0    50.3     1
## Ohio                10735   4561        0.8    70.82      1    53.2     1
## Oklahoma             2715   3983        1.1    71.42      0    51.6     0
## Oregon               2284   4660        0.6    72.13      0    60.0     0
## Pennsylvania        11860   4449        1.0    70.43      0    50.2     1
## Rhode Island          931   4558        1.3    71.90      0    46.4     1
## South Carolina       2816   3635        2.3    67.96      1    37.8     0
## South Dakota          681   4167        0.5    72.08      0    53.3     1
## Tennessee            4173   3821        1.7    70.11      1    41.8     0
## Texas               12237   4188        2.2    70.90      1    47.4     0
## Utah                 1203   4022        0.6    72.90      0    67.3     1
## Vermont               472   3907        0.6    71.64      0    57.1     1
## Virginia             4981   4701        1.4    70.08      1    47.8     0
## Washington           3559   4864        0.6    71.72      0    63.5     0
## West Virginia        1799   3617        1.4    69.48      0    41.6     0
## Wisconsin            4589   4468        0.7    72.48      0    54.5     1
## Wyoming               376   4566        0.6    70.29      1    62.9     1
##                  Area
## Alabama         50708
## Alaska         566432
## Arizona        113417
## Arkansas        51945
## California     156361
## Colorado       103766
## Connecticut      4862
## Delaware         1982
## Florida         54090
## Georgia         58073
## Hawaii           6425
## Idaho           82677
## Illinois        55748
## Indiana         36097
## Iowa            55941
## Kansas          81787
## Kentucky        39650
## Louisiana       44930
## Maine           30920
## Maryland         9891
## Massachusetts    7826
## Michigan        56817
## Minnesota       79289
## Mississippi     47296
## Missouri        68995
## Montana        145587
## Nebraska        76483
## Nevada         109889
## New Hampshire    9027
## New Jersey       7521
## New Mexico     121412
## New York        47831
## North Carolina  48798
## North Dakota    69273
## Ohio            40975
## Oklahoma        68782
## Oregon          96184
## Pennsylvania    44966
## Rhode Island     1049
## South Carolina  30225
## South Dakota    75955
## Tennessee       41328
## Texas          262134
## Utah            82096
## Vermont          9267
## Virginia        39780
## Washington      66570
## West Virginia   24070
## Wisconsin       54464
## Wyoming         97203
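
A median split of 50 states should put 25 states in each group as long as no state sits exactly on the median, and that is easy to confirm. The sketch below works on a fresh copy of the data so `df` is left alone; the names `fresh`, `is_cold`, `high_murder`, and `cold_label` are my own, and the labelled factor is just one possible way to make the table easier to read.

```r
data("state")
fresh <- as.data.frame(state.x77)  # untouched copy of the original data

# TRUE when a state is at or above the median (same tie rule as the split above)
is_cold     <- fresh$Frost  >= median(fresh$Frost)
high_murder <- fresh$Murder >= median(fresh$Murder)

# A labelled factor reads more clearly in a table than bare 0/1 values
cold_label <- factor(ifelse(is_cold, "cold", "not cold"))

split_counts <- table(cold_label)                              # group sizes
cross_tab    <- table(cold = is_cold, high_murder = high_murder)  # 2 x 2 counts

print(split_counts)
print(cross_tab)
```

Both groups come out at 25 states, which agrees with the 0/1 columns printed above.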

I will model income across states. I will test the murder rate as a squared term, frost as the dichotomous term, and the product of illiteracy and high school graduation rate as the interaction term. I will test life expectancy as a linear term.

plot(df$Murder, df$Income) 

plot(df$Frost, df$Income) 

plot(df$Illiteracy*df$`HS Grad`, df$Income) 

plot(df$`Life Exp`, df$Income)

hist(df$Murder)

hist(df$Frost)

hist(df$Illiteracy*df$`HS Grad`)

hist(df$`Life Exp`)

product <- df$Illiteracy*df$`HS Grad`

Then I created a multiple regression model using the terms prescribed in the assignment.

linear.model <- lm(df$Income~ I(df$Murder ** 2) + df$Frost + product + df$`Life Exp`, df)
linear.model
## 
## Call:
## lm(formula = df$Income ~ I(df$Murder^2) + df$Frost + product + 
##     df$`Life Exp`, data = df)
## 
## Coefficients:
##    (Intercept)  I(df$Murder^2)        df$Frost         product  
##     -10116.252         344.292         220.121          -1.639  
##  df$`Life Exp`  
##        202.691
plot(linear.model)

summary(linear.model)
## 
## Call:
## lm(formula = df$Income ~ I(df$Murder^2) + df$Frost + product + 
##     df$`Life Exp`, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1048.0  -293.2  -101.2   371.1  1982.3 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    -10116.252   6182.117  -1.636   0.1087  
## I(df$Murder^2)    344.292    237.101   1.452   0.1534  
## df$Frost          220.121    190.373   1.156   0.2537  
## product            -1.639      4.000  -0.410   0.6840  
## df$`Life Exp`     202.691     85.943   2.358   0.0228 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 581.1 on 45 degrees of freedom
## Multiple R-squared:  0.1786, Adjusted R-squared:  0.1056 
## F-statistic: 2.447 on 4 and 45 DF,  p-value: 0.05994
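
For comparison, the same model can be written with bare column names and `data =`, which keeps the `df$` prefixes out of the coefficient labels. This is a sketch rather than the code that produced the output above: it rebuilds the data in a separate frame (`df2`, my own name) and wraps the product in `I()` instead of using the external `product` vector. The estimates should be the same; only the labels change.

```r
data("state")
df2 <- as.data.frame(state.x77)

# Same median splits as above: 1 = at or above the median, 0 = below
df2$Frost  <- as.numeric(df2$Frost  >= median(df2$Frost))
df2$Murder <- as.numeric(df2$Murder >= median(df2$Murder))

# Backticks are needed for column names that contain a space
fit <- lm(Income ~ I(Murder^2) + Frost + I(Illiteracy * `HS Grad`) + `Life Exp`,
          data = df2)

print(coef(fit))

# Murder is already coded 0/1 here, so squaring it leaves it unchanged
squaring_is_noop <- all(df2$Murder^2 == df2$Murder)
print(squaring_is_noop)
```

Passing `data = df2` also means `predict()` can later be given new data with the same column names, which does not work cleanly when the formula refers to `df$...` vectors.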

This is a terrible model. The only term that even remotely predicts average income level is life expectancy (p = 0.0228), which makes sense because every marginal year of life adds a marginal year of income. An arbitrarily squared murder rate (which, being already coded 0/1, is unchanged by squaring) is about as good an indicator as a coin flip, and the nonsense education metric is just that. Whether or not a state has more cold days than the median also has little to do with income.
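
To put a number on how little the model explains, the adjusted R-squared in the summary can be reproduced by hand from the reported multiple R-squared. The sketch below uses the standard adjustment formula with n = 50 states and p = 4 predictors; the variable names are my own.

```r
r2 <- 0.1786  # Multiple R-squared reported by summary(linear.model)
n  <- 50      # number of states
p  <- 4       # number of predictors, excluding the intercept

# Adjusted R-squared penalises R-squared for the number of predictors;
# n - p - 1 is the residual degrees of freedom (45 in the summary)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adj_r2, 4))
```

This gives 0.1056, matching the summary: after accounting for the four predictors, the model explains only about a tenth of the variation in income.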