Extended Example: Regression Analysis of Infections

Let us create a dataframe by using one of the files available in Canvas, PopInf.

infections <- read.table("PopInf.txt",header=FALSE)

Does our dataset have column headers? Why?

#View(infections)

We just noticed that our dataset does have headers, so we read the file again with header=TRUE.

infections <- read.table("PopInf.txt",header=TRUE)
#View(infections)
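
To see why the header argument matters, the following sketch reads the same text twice. The two-column contents are made up for illustration and are not the actual contents of PopInf.txt; the text argument of read.table is used only so that the example runs without the file.

```r
# Hypothetical contents standing in for a file with a header line
txt <- "pop infections
1000 12
2500 30"

no_header   <- read.table(text = txt, header = FALSE)
with_header <- read.table(text = txt, header = TRUE)

names(no_header)     # default names V1, V2: the header line was read as data
names(with_header)   # names taken from the first line
nrow(no_header)      # one row more than with_header
sapply(with_header, class)  # numeric columns once the header is recognized
```

With header=FALSE the header text ends up in the first data row, so the columns are no longer numeric and cannot be used directly in a regression.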

What is the class of the dataframe infections?

class(infections)
## [1] "data.frame"

Inspect the dataframe infections using the head function. Describe the syntax and the output.

head(infections)
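
As a small sketch of the syntax, head returns the first rows of a dataframe (six by default), and the n argument changes how many. The built-in cars dataset is used here as a stand-in, since it ships with R; the same calls should apply to infections.

```r
# Built-in cars data used as a stand-in for the infections dataframe
first_six   <- head(cars)          # default: first 6 rows
first_three <- head(cars, n = 3)   # n controls how many rows are returned

nrow(first_six)
nrow(first_three)
str(cars)   # column names, types, and a preview of the values
```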

Define a variable lm1 as a linear regression model explaining how the population may or may not affect the number of infections. Describe the syntax and the output.

lm1<-lm(infections$infections~infections$pop)
lm1
## 
## Call:
## lm(formula = infections$infections ~ infections$pop)
## 
## Coefficients:
##    (Intercept)  infections$pop  
##      6.275e+02       3.601e-03

Let us run a chunk of code returning a summary of the model.

summary(lm1)
## 
## Call:
## lm(formula = infections$infections ~ infections$pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1242.9  -635.5  -537.3  -367.5  6085.0 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.275e+02  3.126e+02   2.007 0.054826 .  
## infections$pop 3.601e-03  9.668e-04   3.725 0.000912 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1528 on 27 degrees of freedom
## Multiple R-squared:  0.3394, Adjusted R-squared:  0.315 
## F-statistic: 13.87 on 1 and 27 DF,  p-value: 0.0009122

What was the null hypothesis? Did we reject or fail to reject the null?
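
For each row of the coefficient table, the t value tests the null hypothesis that the corresponding coefficient is zero. As a rough check, the sketch below recomputes the t value and the two-sided p-value for the slope from the rounded numbers printed in the summary. The 5% significance level in the last line is an assumption, not something fixed by the output.

```r
# Values copied from the printed summary(lm1) coefficient table
estimate  <- 3.601e-03   # slope for infections$pop
std_error <- 9.668e-04   # its standard error
df_resid  <- 27          # residual degrees of freedom

# t statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

t_value          # close to the reported 3.725
p_value          # close to the reported 0.000912
p_value < 0.05   # comparison against an assumed 5% level
```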

Run a function returning the different attributes available for our linear regression model.

attributes(lm1)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"        
## 
## $class
## [1] "lm"
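
The components listed under $names can be pulled out with $, and several of them also have accessor functions. A minimal sketch, fitted on the built-in cars dataset as a stand-in for the course data:

```r
# Built-in cars data used as a stand-in; not the course data
fit <- lm(dist ~ speed, data = cars)

names(fit)                         # component names of the fitted model

coefs_dollar <- fit$coefficients   # direct access with $
coefs_fun    <- coef(fit)          # accessor function
identical(coefs_dollar, coefs_fun)

res <- residuals(fit)              # one residual per observation
length(res) == nrow(cars)
```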

What are the values of the coefficients? Please interpret the coefficient for the population and its meaning for our model (problem).

lm1$coefficients
##    (Intercept) infections$pop 
##   6.275345e+02   3.601114e-03
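
One way to read these numbers is to plug them into the fitted line, infections = intercept + slope * pop. The sketch below does this by hand with the coefficients printed above. The population of 100000 is a made-up input, and the interpretation assumes that pop is measured in individual people.

```r
# Coefficients copied from lm1$coefficients
b0 <- 6.275345e+02   # intercept
b1 <- 3.601114e-03   # slope for pop

# Expected change in infections for 1000 additional units of pop
change_per_1000 <- b1 * 1000

# Fitted number of infections for a hypothetical population
pop_new   <- 100000
predicted <- b0 + b1 * pop_new

change_per_1000   # about 3.6
predicted         # about 987.6
```

If the model is instead fitted as lm(infections ~ pop, data = infections), the same prediction can be obtained with predict and a newdata dataframe containing a pop column; with the infections$pop form of the formula, predict cannot match newdata to the predictor.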

Above, we just used the R language to analyze a simple linear regression (SLR) problem. Let us now apply our basic knowledge of R to a multiple linear regression (MLR) scenario.

Let us define the dataframe moreinfections by using the dataset available in Canvas, infections1.

moreinfections <- read.table("infections1.txt",header=TRUE)

Let us inspect the dataframe moreinfections.

head(moreinfections)

Define a new variable that stores the linear regression model describing the number of infections as a function of ufo2010 and pop. Display the summary. Interpret the output.

lm2<-lm(moreinfections$infections~moreinfections$ufo2010+moreinfections$pop)
summary(lm2)
## 
## Call:
## lm(formula = moreinfections$infections ~ moreinfections$ufo2010 + 
##     moreinfections$pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1210.1  -595.9  -510.3  -192.3  6100.0 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)            6.187e+02  3.129e+02   1.977   0.0587 .
## moreinfections$ufo2010 2.235e+01  2.267e+01   0.986   0.3332  
## moreinfections$pop     9.281e-04  2.878e-03   0.322   0.7497  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1528 on 26 degrees of freedom
## Multiple R-squared:  0.3632, Adjusted R-squared:  0.3143 
## F-statistic: 7.416 on 2 and 26 DF,  p-value: 0.002829
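
When comparing the two models, the adjusted R-squared is useful because it penalizes additional predictors: adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where p is the number of predictors. The sketch below recomputes it from the rounded values printed in the two summaries. The sample size n = 29 is inferred from the residual degrees of freedom (27 + 2 in the first model), and adj_r2 is a helper name introduced here for illustration.

```r
# Helper (hypothetical name) implementing the adjusted R-squared formula
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 29   # inferred: 27 residual df + 2 estimated coefficients in lm1

adj_slr <- adj_r2(0.3394, n, p = 1)   # model with pop only
adj_mlr <- adj_r2(0.3632, n, p = 2)   # model with ufo2010 and pop

adj_slr   # close to the reported 0.315
adj_mlr   # close to the reported 0.3143
```

Although the multiple R-squared rises from 0.3394 to 0.3632, the adjusted value does not improve, which suggests that adding ufo2010 explains little beyond what pop already explained.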