Extended Example: Regression Analysis of Infections
Let us create a dataframe from one of the files available in Canvas, PopInf.txt.
infections <- read.table("PopInf.txt",header=FALSE)
Does our dataset have column headers? Why?
#View(infections)
We just noticed that our dataset does have headers, so we read the file again with header=TRUE.
infections <- read.table("PopInf.txt",header=TRUE)
#View(infections)
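To see what the header argument changes, here is a minimal sketch on a small toy file. The file contents below are hypothetical values made up for illustration and are not the PopInf data.

```r
# Write a tiny whitespace-separated file to a temporary location.
# The numbers are hypothetical and only illustrate the mechanics.
tmp <- tempfile(fileext = ".txt")
writeLines(c("pop infections",
             "1000 12",
             "2500 30",
             "4000 41"), tmp)

# header = FALSE treats the first line as data.
no_header <- read.table(tmp, header = FALSE)
# header = TRUE uses the first line as column names.
with_header <- read.table(tmp, header = TRUE)

names(no_header)    # default names V1, V2
nrow(no_header)     # the header line is counted as a row
names(with_header)  # names taken from the first line of the file
nrow(with_header)   # only the data rows
str(with_header)    # columns are now read as numbers
```

When the header line is read as data, every column contains text in its first row, so R cannot store the columns as numbers. That is usually the quickest sign that header=TRUE was needed.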
What is the class of the dataframe infections?
class(infections)
## [1] "data.frame"
Inspect the dataframe infections using the head function. Describe the syntax and the output.
head(infections)
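As a hint for describing the syntax, head takes the object as its first argument and an optional n giving the number of rows to return. The sketch below uses a hypothetical toy dataframe, not the PopInf data.

```r
# Toy dataframe with 10 rows (hypothetical values).
toy <- data.frame(pop = seq(1000, 10000, by = 1000),
                  infections = c(3, 8, 7, 15, 18, 20, 29, 30, 33, 41))

first_six        <- head(toy)          # n defaults to 6
first_three      <- head(toy, n = 3)   # first 3 rows
all_but_last_two <- head(toy, n = -2)  # negative n drops rows from the end

nrow(first_six)
nrow(first_three)
nrow(all_but_last_two)
```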
Define a variable lm1 as a linear regression model explaining how the population may or may not affect the number of infections. Describe the syntax and the output.
lm1<-lm(infections$infections~infections$pop)
lm1
##
## Call:
## lm(formula = infections$infections ~ infections$pop)
##
## Coefficients:
##    (Intercept)  infections$pop
##      6.275e+02       3.601e-03
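The call above refers to each column with the $ operator. An alternative way to write the same kind of model is to use bare column names together with the data argument of lm. The sketch below compares the two styles on a hypothetical toy dataframe (not the PopInf data). The fitted coefficients agree, but the data form gives shorter coefficient names and works with predict on new data.

```r
# Toy data (hypothetical values, for illustration only).
toy <- data.frame(pop = c(1000, 2500, 4000, 6000, 9000),
                  infections = c(12, 30, 41, 70, 95))

# Style used in this example: columns referenced with $.
fit_dollar <- lm(toy$infections ~ toy$pop)

# Alternative style: bare column names plus the data argument.
fit_data <- lm(infections ~ pop, data = toy)

# Same estimates, different coefficient names.
all.equal(unname(coef(fit_dollar)), unname(coef(fit_data)))
names(coef(fit_dollar))
names(coef(fit_data))

# With the data form, predict() can look up pop in a new dataframe.
predict(fit_data, newdata = data.frame(pop = 5000))
```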
Let us run a chunk of code returning a summary of the model.
summary(lm1)
##
## Call:
## lm(formula = infections$infections ~ infections$pop)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1242.9  -635.5  -537.3  -367.5  6085.0
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    6.275e+02  3.126e+02   2.007 0.054826 .
## infections$pop 3.601e-03  9.668e-04   3.725 0.000912 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1528 on 27 degrees of freedom
## Multiple R-squared: 0.3394, Adjusted R-squared: 0.315
## F-statistic: 13.87 on 1 and 27 DF, p-value: 0.0009122
What was the null hypothesis? Did we reject or fail to reject the null?
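As a hint, each row of the coefficient table tests the null hypothesis that the corresponding coefficient equals zero. The sketch below recomputes the t value and the two-sided p-value for the slope from the rounded estimate and standard error printed in the summary. The 5% significance level is an assumption, since the example does not state one.

```r
# Values copied from the summary(lm1) output above (rounded as printed).
estimate  <- 3.601e-03   # slope for infections$pop
std_error <- 9.668e-04   # its standard error
df_resid  <- 27          # residual degrees of freedom

# t statistic for H0: slope = 0.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Assumed significance level (not given in the example).
alpha <- 0.05
reject_null <- p_value < alpha

t_value
p_value
reject_null
```

Because the inputs are rounded, the results match the printed t value and p-value only approximately.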
Run a function returning the different attributes available for our linear regression model.
attributes(lm1)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"
##  [5] "fitted.values" "assign"        "qr"            "df.residual"
##  [9] "xlevels"       "call"          "terms"         "model"
##
## $class
## [1] "lm"
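Each name listed under $names can be pulled out of the model with the $ operator, and several of them also have accessor functions. The sketch below uses a hypothetical toy dataframe (not the PopInf data) to show both routes and to check that fitted values plus residuals give back the observed response.

```r
# Toy data (hypothetical values, for illustration only).
toy <- data.frame(pop = c(1000, 2500, 4000, 6000, 9000),
                  infections = c(12, 30, 41, 70, 95))
fit <- lm(infections ~ pop, data = toy)

# Component names of the fitted model object.
names(fit)

# Two equivalent ways to get the coefficients.
fit$coefficients
coef(fit)

# Fitted values and residuals add up to the observed response.
reconstructed <- unname(fitted(fit) + residuals(fit))
all.equal(reconstructed, toy$infections)
```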
What are the values of the coefficients? Please interpret the coefficient for the population and explain what it means for our problem.
lm1$coefficients
##    (Intercept) infections$pop
##   6.275345e+02   3.601114e-03
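One way to read the slope is as the change in the predicted number of infections for each one-unit increase in pop. The sketch below does that arithmetic with the coefficients printed above. It assumes pop is a raw head count, which the example does not confirm, and the population value of 100,000 is a hypothetical input chosen for illustration.

```r
# Coefficients copied from lm1$coefficients above.
intercept <- 6.275345e+02
slope     <- 3.601114e-03

# Hypothetical population size (an assumption, not from the dataset).
pop_new <- 100000

# Predicted infections for that population: intercept + slope * pop.
predicted <- intercept + slope * pop_new

# Change in predicted infections for 100,000 additional units of pop.
extra_per_100k <- slope * 100000

predicted
extra_per_100k
```

Under these assumptions, 100,000 additional people are associated with roughly 360 more predicted infections.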
Above, we just used the R language to analyze a simple linear regression (SLR) problem. Let us now apply our basic knowledge of R to a multiple linear regression (MLR) scenario.
Let us define the dataframe moreinfections by using the dataset available in Canvas, infections1.txt.
moreinfections <- read.table("infections1.txt",header=TRUE)
Let us inspect the dataframe moreinfections.
head(moreinfections)
Define a new variable, lm2, that stores the linear regression model describing the number of infections as a function of ufo2010 and pop. Display the summary. Interpret the output.
lm2<-lm(moreinfections$infections~moreinfections$ufo2010+moreinfections$pop)
summary(lm2)
##
## Call:
## lm(formula = moreinfections$infections ~ moreinfections$ufo2010 +
## moreinfections$pop)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1210.1  -595.9  -510.3  -192.3  6100.0
##
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)            6.187e+02  3.129e+02   1.977   0.0587 .
## moreinfections$ufo2010 2.235e+01  2.267e+01   0.986   0.3332
## moreinfections$pop     9.281e-04  2.878e-03   0.322   0.7497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1528 on 26 degrees of freedom
## Multiple R-squared: 0.3632, Adjusted R-squared: 0.3143
## F-statistic: 7.416 on 2 and 26 DF, p-value: 0.002829
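As a hint for the interpretation, notice that neither predictor is individually significant even though the overall F-test is, and that the adjusted R-squared (0.3143) is slightly lower than in the SLR model (0.315). One possible explanation, which the output alone does not confirm, is that ufo2010 and pop carry overlapping information. The sketch below recomputes the adjusted R-squared values and the F-test p-value from the rounded numbers printed in the two summaries. The sample size of 29 is inferred from the residual degrees of freedom plus the number of estimated coefficients.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# n inferred from the output: 27 residual df + 2 coefficients in the SLR model.
n <- 29

adj_slr <- adj_r2(0.3394, n, p = 1)  # SLR: pop only
adj_mlr <- adj_r2(0.3632, n, p = 2)  # MLR: ufo2010 and pop

# p-value of the overall F-test for the MLR model.
f_p_value <- pf(7.416, df1 = 2, df2 = 26, lower.tail = FALSE)

adj_slr
adj_mlr
f_p_value
```

Because the inputs are rounded as printed, the results agree with the summaries only to about three decimal places.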