Set the working directory, confirm it, and load the data

setwd("C:/1rdata")
getwd()
## [1] "C:/1rdata"
popinf <- read.table("popinf.txt", header = TRUE)

Inspect head & class. The dataset has 2 columns: ‘pop’ & ‘infections’. Its class is data.frame.

head(popinf)
class(popinf)
## [1] "data.frame"

Define a variable ‘lm1’ as a linear regression model (LRM) with one dependent variable, ‘infections’, and one independent variable, ‘pop’.

‘lm1’ - name of the new variable, which stores the result of the LRM

lm() - LRM function

popinf$infections - dependent variable; takes all rows from the ‘infections’ column

popinf$pop - independent variable; takes all rows from the ‘pop’ column

The output is explained further below, where the coefficients are extracted.

lm1<-lm(popinf$infections~popinf$pop)
lm1
## 
## Call:
## lm(formula = popinf$infections ~ popinf$pop)
## 
## Coefficients:
## (Intercept)   popinf$pop  
##   6.275e+02    3.601e-03
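
The same kind of model can also be written with the `data` argument of lm(), which keeps the coefficient names short and lets predict() accept new data. The following is a minimal sketch, assuming a small made-up data frame `demo` (hypothetical values, not the ‘popinf’ data):

```r
# Hypothetical data standing in for popinf (values invented for illustration)
demo <- data.frame(
  pop        = c(10000, 50000, 120000, 300000, 750000),
  infections = c(500, 900, 1100, 1600, 3400)
)

# Formula interface: column names are looked up in 'data'
fit_data <- lm(infections ~ pop, data = demo)

# Equivalent fit using the $ notation from the text
fit_dollar <- lm(demo$infections ~ demo$pop)

# Same estimates; only the coefficient labels differ
coef(fit_data)
coef(fit_dollar)

# With the 'data' form, predictions for new populations are straightforward
predict(fit_data, newdata = data.frame(pop = 200000))
```

Both calls fit the same model. The `$` form used in this document works, but predict() with `newdata` is only reliable with the `data` form.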

Display the summary of ‘lm1’ (the two-variable LRM)

summary(lm1)
## 
## Call:
## lm(formula = popinf$infections ~ popinf$pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1242.9  -635.5  -537.3  -367.5  6085.0 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.275e+02  3.126e+02   2.007 0.054826 .  
## popinf$pop  3.601e-03  9.668e-04   3.725 0.000912 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1528 on 27 degrees of freedom
## Multiple R-squared:  0.3394, Adjusted R-squared:  0.315 
## F-statistic: 13.87 on 1 and 27 DF,  p-value: 0.0009122

The null hypothesis is that the number of infections does not depend on population size, i.e. that the slope for ‘pop’ is zero. With p = 0.000912 for the slope, the null hypothesis is rejected at the 0.05 level.
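
As a rough check, the p-value for the slope can be recomputed from the t value and residual degrees of freedom printed in the summary. The sketch below uses only the rounded numbers from that output, so the result agrees with the printed p-value only approximately; the variable names are chosen here for illustration.

```r
# Values copied from the summary(lm1) output above (rounded as printed)
t_value <- 3.725   # t value for the slope
df_res  <- 27      # residual degrees of freedom

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_res)
p_value

# With a single predictor, the F-statistic equals the squared t value of the slope
f_value <- t_value^2
f_value
```

This is why the p-value on the F-statistic line matches the p-value of the slope when the model has only one predictor.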

Run the function that returns the attributes available for the LRM object.

attributes(lm1)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"        
## 
## $class
## [1] "lm"
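
Most of these components also have accessor functions, which are usually preferred over reaching into the object with `$`. A minimal sketch, again assuming a made-up data frame `demo` (hypothetical values, not the ‘popinf’ data):

```r
# Hypothetical data, invented for illustration only
demo <- data.frame(
  pop        = c(10000, 50000, 120000, 300000, 750000),
  infections = c(500, 900, 1100, 1600, 3400)
)
fit <- lm(infections ~ pop, data = demo)

names(fit)            # the component names that attributes() reports under $names

coef(fit)             # same values as fit$coefficients
residuals(fit)        # same values as fit$residuals
fitted(fit)           # same values as fit$fitted.values
df.residual(fit)      # number of observations minus number of coefficients

# Sanity check: fitted values plus residuals reproduce the response
all.equal(unname(fitted(fit) + residuals(fit)), demo$infections)
```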

Determine the values of the coefficients for LRM ‘lm1’

lm1$coefficients
##  (Intercept)   popinf$pop 
## 6.275345e+02 3.601114e-03

The intercept of 627.5 means that at population = 0 the model predicts about 627.5 infections. The ‘popinf$pop’ coefficient of 0.0036 means that each additional person adds about 0.0036 expected infections, i.e. about 3.6 additional infections for each 1000 people.
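
The interpretation can be checked by computing a prediction by hand from the two coefficients. In this sketch the coefficients are copied from the output above, while the population of 100000 is a hypothetical value chosen only for illustration.

```r
# Coefficients copied from lm1$coefficients above
intercept <- 6.275345e+02   # predicted infections at pop = 0
slope     <- 3.601114e-03   # additional infections per additional person

# Hypothetical population size, chosen only for illustration
pop_new <- 100000

# Effect of 1000 additional people: about 3.6 infections
per_thousand <- slope * 1000
per_thousand

# Predicted infections = intercept + slope * population: about 987.6
predicted <- intercept + slope * pop_new
predicted
```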

Add new data

moreinfections <- read.table("infections1.txt",header=TRUE)

Inspect headers. 3 columns: ‘infections’, ‘ufo2010’, ‘pop’

head(moreinfections)

Define a new variable that stores the linear regression model describing the number of infections as a function of ‘ufo2010’ and ‘pop’. Display the summary.

lm2<-lm(moreinfections$infections~moreinfections$ufo2010+moreinfections$pop)
summary(lm2)
## 
## Call:
## lm(formula = moreinfections$infections ~ moreinfections$ufo2010 + 
##     moreinfections$pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1210.1  -595.9  -510.3  -192.3  6100.0 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)            6.187e+02  3.129e+02   1.977   0.0587 .
## moreinfections$ufo2010 2.235e+01  2.267e+01   0.986   0.3332  
## moreinfections$pop     9.281e-04  2.878e-03   0.322   0.7497  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1528 on 26 degrees of freedom
## Multiple R-squared:  0.3632, Adjusted R-squared:  0.3143 
## F-statistic: 7.416 on 2 and 26 DF,  p-value: 0.002829

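Neither coefficient is individually significant (p = 0.3332 for ‘ufo2010’, p = 0.7497 for ‘pop’), so for each one we fail to reject the null hypothesis that it is zero given the other predictor. This does not confirm the null hypothesis: the overall F-test is significant (p = 0.002829), so the two predictors together do explain part of the variation in infections. A possible explanation is that ‘ufo2010’ and ‘pop’ are strongly correlated with each other, which inflates the standard errors of both coefficients.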
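
Whether correlated predictors are behind this pattern can be checked with the correlation between them and with a partial F-test of the nested models. The sketch below runs on simulated data in which ‘ufo2010’ is constructed to track ‘pop’; the data frame `sim` and all of its values are assumptions for illustration, not the contents of infections1.txt.

```r
set.seed(1)  # reproducible simulated data; not the infections1.txt values

n   <- 29
pop <- runif(n, min = 1e4, max = 1e6)
# ufo2010 is simulated to follow pop closely, which makes the two collinear
ufo2010    <- pop / 4e4 + rnorm(n, sd = 1)
infections <- 600 + 0.0036 * pop + rnorm(n, sd = 1500)
sim <- data.frame(infections, ufo2010, pop)

# Correlation between the two predictors (close to 1 by construction)
r_pred <- cor(sim$ufo2010, sim$pop)
r_pred

fit_pop  <- lm(infections ~ pop, data = sim)
fit_both <- lm(infections ~ ufo2010 + pop, data = sim)

# Partial F-test: does ufo2010 add anything once pop is in the model?
anova(fit_pop, fit_both)

# Standard error of the pop slope, without and with the collinear predictor
se_pop  <- summary(fit_pop)$coefficients["pop", "Std. Error"]
se_both <- summary(fit_both)$coefficients["pop", "Std. Error"]
c(pop_only = se_pop, with_ufo2010 = se_both)
```

On the real data, the same two checks would be cor(moreinfections$ufo2010, moreinfections$pop) and an anova() comparison of the one-predictor and two-predictor models fitted to the same data frame.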