Set the working directory, confirm it, and load the data
setwd("C:/1rdata")
getwd()
## [1] "C:/1rdata"
popinf <- read.table("popinf.txt", header = TRUE)
Inspect the head and class. The dataset has 2 columns, ‘pop’ and ‘infections’; its class is data.frame.
head(popinf)
class(popinf)
## [1] "data.frame"
Define a variable ‘lm1’ that stores a linear regression model (LRM) with one dependent variable, ‘infections’, and one independent variable, ‘pop’.
‘lm1’ - name of the new variable, which stores the result of the LRM
lm() - the LRM function
popinf$infections - dependent variable; takes all rows from that column
popinf$pop - independent variable; takes all rows from that column
The output is explained below, where lm1$coefficients is displayed.
lm1<-lm(popinf$infections~popinf$pop)
lm1
##
## Call:
## lm(formula = popinf$infections ~ popinf$pop)
##
## Coefficients:
## (Intercept) popinf$pop
## 6.275e+02 3.601e-03
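The same kind of model can also be written with the `data` argument of lm(), which keeps the coefficient names short. The sketch below is only an illustration: `toy` is a hypothetical data frame with invented values standing in for popinf.txt, and `fit_a` and `fit_b` are names chosen here.

```r
# Hypothetical stand-in for popinf; the values are invented for illustration.
toy <- data.frame(
  pop        = c(10000, 50000, 120000, 300000, 750000),
  infections = c(640, 820, 1100, 1650, 3400)
)

# Formula interface: column names are looked up in `data`.
fit_a <- lm(infections ~ pop, data = toy)

# Equivalent fit using the $ notation from the text.
fit_b <- lm(toy$infections ~ toy$pop)

# Same estimates, different coefficient names.
names(coef(fit_a))   # "(Intercept)" "pop"
names(coef(fit_b))   # "(Intercept)" "toy$pop"
all.equal(unname(coef(fit_a)), unname(coef(fit_b)))
```

One practical reason to prefer the `data` form is that predict() can then match a `pop` column in new data by name.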
Display the summary of ‘lm1’ (an LRM with two variables: one dependent, one independent)
summary(lm1)
##
## Call:
## lm(formula = popinf$infections ~ popinf$pop)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1242.9 -635.5 -537.3 -367.5 6085.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.275e+02 3.126e+02 2.007 0.054826 .
## popinf$pop 3.601e-03 9.668e-04 3.725 0.000912 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1528 on 27 degrees of freedom
## Multiple R-squared: 0.3394, Adjusted R-squared: 0.315
## F-statistic: 13.87 on 1 and 27 DF, p-value: 0.0009122
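The null hypothesis was that the number of infections is not influenced by population size (the slope of ‘pop’ is zero). The p-value for the slope is 0.000912, well below 0.05, so the null hypothesis is rejected. Note that R-squared is 0.34, so population explains only about a third of the variation in infections.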
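The decision above can also be made programmatically by reading the p-value out of the coefficient table rather than from the printed summary. This is a minimal sketch that uses R's built-in `cars` dataset as a stand-in for popinf (an assumption made only so the example runs without the data file); `fit`, `coefs`, `p_slope` and `alpha` are illustrative names.

```r
# Built-in dataset used as a stand-in: stopping distance vs. speed.
fit <- lm(dist ~ speed, data = cars)

# Coefficient table as a matrix with columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)".
coefs <- coef(summary(fit))

# p-value for the slope (row named after the predictor).
p_slope <- coefs["speed", "Pr(>|t|)"]

# Reject the null hypothesis of a zero slope at the 5% level?
alpha <- 0.05
p_slope < alpha
```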
List the attributes available for the LRM object.
attributes(lm1)
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
##
## $class
## [1] "lm"
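Most of the components listed above can be reached either with `$` or with an extractor function such as coef(), residuals() or fitted(). A short sketch, again assuming the built-in `cars` dataset as a stand-in for popinf; `fit` is an illustrative name.

```r
# Built-in dataset used as a stand-in for popinf.
fit <- lm(dist ~ speed, data = cars)

# Component names, as reported under $names by attributes().
names(fit)

# Extractor functions return the same values as the $ components.
all.equal(coef(fit), fit$coefficients)
all.equal(residuals(fit), fit$residuals)
all.equal(fitted(fit), fit$fitted.values)

# Residual degrees of freedom: observations minus estimated coefficients.
df.residual(fit) == nrow(cars) - length(coef(fit))
```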
Determine the values of the coefficients for LRM ‘lm1’
lm1$coefficients
## (Intercept) popinf$pop
## 6.275345e+02 3.601114e-03
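Intercept = 627.5 means that at population = 0 the model predicts about 627.5 infections. popinf$pop = 0.0036 means that each additional unit of ‘pop’ adds about 0.0036 infections, i.e. about 3.6 additional infections for each 1000 people (assuming ‘pop’ is a head count).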
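As a worked check of this interpretation, the fitted line can be evaluated by hand from the two coefficients printed by lm1$coefficients. The population value `new_pop` is hypothetical and chosen only for illustration; `b0`, `b1` and `predicted` are names introduced here.

```r
# Coefficients as printed by lm1$coefficients in the text.
b0 <- 6.275345e+02   # intercept, i.e. 627.5345
b1 <- 3.601114e-03   # slope: infections per one unit of pop

# Hypothetical population value, for illustration only.
new_pop <- 100000

# Fitted line: infections = b0 + b1 * pop
predicted <- b0 + b1 * new_pop
predicted            # about 987.6

# Increase in predicted infections for 1000 extra units of pop.
b1 * 1000            # about 3.6
```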
Add new data
moreinfections <- read.table("infections1.txt",header=TRUE)
Inspect headers. 3 columns: ‘infections’, ‘ufo2010’, ‘pop’
head(moreinfections)
Define a new variable that stores the linear regression model describing the number of infections as a function of ‘ufo2010’ and ‘pop’. Display the summary.
lm2<-lm(moreinfections$infections~moreinfections$ufo2010+moreinfections$pop)
summary(lm2)
##
## Call:
## lm(formula = moreinfections$infections ~ moreinfections$ufo2010 +
## moreinfections$pop)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1210.1 -595.9 -510.3 -192.3 6100.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.187e+02 3.129e+02 1.977 0.0587 .
## moreinfections$ufo2010 2.235e+01 2.267e+01 0.986 0.3332
## moreinfections$pop 9.281e-04 2.878e-03 0.322 0.7497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1528 on 26 degrees of freedom
## Multiple R-squared: 0.3632, Adjusted R-squared: 0.3143
## F-statistic: 7.416 on 2 and 26 DF, p-value: 0.002829
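Neither model parameter is individually significant at the 0.05 level (p = 0.33 for ‘ufo2010’, p = 0.75 for ‘pop’), so for each one we fail to reject the null hypothesis of no effect. This does not confirm the null hypothesis: the overall F-test is still significant (p = 0.0028), and ‘pop’ was significant on its own in ‘lm1’. The standard error of ‘pop’ roughly tripled (from 9.668e-04 to 2.878e-03), which suggests that ‘ufo2010’ and ‘pop’ are strongly correlated, so the model cannot separate their effects.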
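One way to check whether two predictors overlap in this way is to look at their correlation and to compare the nested models with anova(). The sketch below runs on simulated data, not on infections1.txt: the data frame `toy`, its values and the names `small` and `full` are all assumptions, constructed so that ‘ufo2010’ closely tracks ‘pop’.

```r
# Simulated stand-in data (invented): ufo2010 is built to follow pop closely.
set.seed(42)
n <- 29
pop <- runif(n, 1e4, 1e6)
ufo2010 <- pop / 8000 + rnorm(n, sd = 5)
infections <- 600 + 0.0036 * pop + rnorm(n, sd = 1500)
toy <- data.frame(infections, ufo2010, pop)

# Correlation between the two predictors (close to 1 by construction).
cor(toy$ufo2010, toy$pop)

# Nested models: pop alone, then pop plus ufo2010.
small <- lm(infections ~ pop, data = toy)
full  <- lm(infections ~ ufo2010 + pop, data = toy)

# Standard error of the pop slope in each model; it grows when a
# correlated predictor is added.
coef(summary(small))["pop", "Std. Error"]
coef(summary(full))["pop", "Std. Error"]

# F-test of whether adding ufo2010 improves on the pop-only model.
anova(small, full)
```

On the real data, the same three steps would show whether the loss of significance for ‘pop’ in ‘lm2’ comes from its overlap with ‘ufo2010’.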