Activity 3 - Infections

Importing Data

Read the dataset into the infections variable as a data frame, without treating the first line as column headers

infections <- read.table("C:/Users/marxm/OneDrive/Documents/ProgrammingForData_R/acti3/data/PopInf.txt",header=FALSE)

View dataset

#View(infections)

Read the dataset again, this time treating the first line as column headers

infections <- read.table("C:/Users/marxm/OneDrive/Documents/ProgrammingForData_R/acti3/data/PopInf.txt",header=TRUE)

View data

#View(infections)
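The effect of the header argument can be checked on a small inline sample. This is a minimal sketch: sample_text is a hypothetical stand-in whose whitespace-separated layout is assumed to match PopInf.txt, and the text argument of read.table is used so that no file is needed.

```r
# Hypothetical sample; the layout is assumed to match PopInf.txt
sample_text <- "pop infections
25101 245
61912 215"

# header = FALSE: the header line is read as an ordinary data row
no_header <- read.table(text = sample_text, header = FALSE)
names(no_header)            # default column names V1 and V2
nrow(no_header)             # 3, because the header line counts as data
is.numeric(no_header$V1)    # FALSE, the text "pop" prevents numeric parsing

# header = TRUE: the first line supplies the column names
with_header <- read.table(text = sample_text, header = TRUE)
names(with_header)          # pop and infections
nrow(with_header)           # 2
is.numeric(with_header$pop) # TRUE
```

Reading with header=FALSE therefore leaves columns that cannot be used directly in a regression, which is why the data is read again with header=TRUE.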

Return the class of the variable (data.frame)

class(infections)
## [1] "data.frame"

View first 6 records in dataframe

The head function returns the first 6 records by default. The headers set earlier show the pop (population) and infections columns.

head(infections)
##      pop infections
## 1  25101        245
## 2  61912        215
## 3  33341       2076
## 4 409061       5023
## 5   7481        189
## 6  18675        195
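The number of rows returned by head can be changed with its n argument. A short sketch on a hypothetical data frame called toy (not part of the activity data):

```r
# Hypothetical data frame with 10 rows, used only for illustration
toy <- data.frame(id = 1:10, value = (1:10)^2)

nrow(head(toy))        # 6, the default
nrow(head(toy, n = 5)) # 5, when exactly five records are wanted
nrow(tail(toy, n = 3)) # 3, tail works the same way from the end
```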

Linear Regression

A linear regression model is fitted. In the formula, the dependent variable (the one to be predicted) comes first: infections. The independent variable (the one used to predict) comes second: population. The model tries to predict the number of infections from the population.

lm1<-lm(infections$infections~infections$pop)
lm1
## 
## Call:
## lm(formula = infections$infections ~ infections$pop)
## 
## Coefficients:
##    (Intercept)  infections$pop  
##      6.275e+02       3.601e-03
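The same kind of model can also be written with bare column names and a data argument, which is the more common style and makes predict easier to use. The sketch below uses only the six rows printed by head above (the name six_rows is an assumption for illustration), so its coefficients will differ from the full fit.

```r
# Only the six rows shown by head(infections); NOT the full dataset
six_rows <- data.frame(
  pop        = c(25101, 61912, 33341, 409061, 7481, 18675),
  infections = c(245, 215, 2076, 5023, 189, 195)
)

# Style used in this activity: columns referenced with $
fit_dollar <- lm(six_rows$infections ~ six_rows$pop)

# Alternative style: bare column names plus the data argument
fit_data <- lm(infections ~ pop, data = six_rows)

# Both styles give the same estimates (only the coefficient labels differ)
all.equal(unname(coef(fit_dollar)), unname(coef(fit_data)))

# With the data argument, predict can match newdata columns by name
new_county <- data.frame(pop = 50000)  # hypothetical population value
predict(fit_data, newdata = new_county)
```

With the $ style, predict usually cannot match the columns of newdata to the model terms, so the data argument is worth considering whenever predictions for new populations are needed.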

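Show the summary. Ho: the slope for population is zero (no linear relationship between population and infections). The p-value for the slope, 0.000912, is less than 0.05, so Ho is rejected -> there is a statistically significant linear relationship. The R-squared of 0.3394 means population explains about 34% of the variance in infections.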

summary(lm1)
## 
## Call:
## lm(formula = infections$infections ~ infections$pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1242.9  -635.5  -537.3  -367.5  6085.0 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.275e+02  3.126e+02   2.007 0.054826 .  
## infections$pop 3.601e-03  9.668e-04   3.725 0.000912 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1528 on 27 degrees of freedom
## Multiple R-squared:  0.3394, Adjusted R-squared:  0.315 
## F-statistic: 13.87 on 1 and 27 DF,  p-value: 0.0009122
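The numbers in the summary are linked by simple arithmetic, which can be checked by hand. The sketch below starts from the rounded values printed above, so the results agree with the summary only up to rounding. The variable names are chosen for illustration.

```r
# Values copied from the summary output above (rounded as printed)
estimate  <- 3.601e-03  # slope for population
std_error <- 9.668e-04  # standard error of the slope
df_resid  <- 27         # residual degrees of freedom

# t value = estimate / standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, F = t^2 and R-squared = F / (F + df)
f_value   <- t_value^2
r_squared <- f_value / (f_value + df_resid)

round(c(t = t_value, p = p_value, F = f_value, R2 = r_squared), 4)
```

The results should be close to the t value of 3.725, the p-value of 0.000912, the F-statistic of 13.87 and the R-squared of 0.3394 reported in the summary.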

Return the attributes of the model object: the names of its components and its class

attributes(lm1)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"        
## 
## $class
## [1] "lm"
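The components listed under $names can be read with $ or with accessor functions such as coef, fitted and residuals. A brief sketch, using R's built-in cars dataset instead of the activity data so that it runs without the input file:

```r
# Built-in example data: stopping distance (dist) against speed
fit <- lm(dist ~ speed, data = cars)

# attributes() returns the component names and the class, as above
names(attributes(fit))  # "names" "class"
length(names(fit))      # 12 components

# Accessor functions return the same values as the $ components
all.equal(coef(fit), fit$coefficients)
all.equal(fitted(fit), fit$fitted.values)

# Fitted values plus residuals reproduce the observed response
all.equal(unname(fitted(fit) + residuals(fit)), cars$dist)
```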

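Coefficients: the intercept of 6.275e+02 and the slope of 3.601e-03 give the fitted line: infections = 627.5 + 0.003601 * pop. Each additional person in the population is associated with about 0.0036 more predicted infections.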

lm1$coefficients
##    (Intercept) infections$pop 
##   6.275345e+02   3.601114e-03
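The fitted line can be applied by hand to any population value. As a sketch, the coefficients printed above are used to predict the first record shown by head (pop = 25101, infections = 245); predict_infections is a helper name introduced here for illustration.

```r
# Coefficients copied from lm1$coefficients above
intercept <- 6.275345e+02
slope     <- 3.601114e-03

# Hypothetical helper: apply the fitted line to a population value
predict_infections <- function(pop) intercept + slope * pop

predicted <- predict_infections(25101)  # about 717.9 infections
residual  <- 245 - predicted            # observed minus predicted

predicted
residual
```

The residual is negative, meaning the model overestimates the number of infections for this record.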

Import second dataset

moreinfections <- read.table("C:/Users/marxm/OneDrive/Documents/ProgrammingForData_R/acti3/data/infections1.txt",header=TRUE)

First 6 rows of second dataset

head(moreinfections)
##   infections ufo2010    pop
## 1        245       2  25101
## 2        215       6  61912
## 3       2076       2  33341
## 4       5023      59 409061
## 5        189       0   7481
## 6        195       1  18675

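Train a second linear regression model and show the summary. Dependent variable -> infections. Independent variables (predictors) -> UFO sightings (ufo2010) and population (pop). Ho for the overall F-test -> no predictor has an effect; its p-value, 0.002829, is less than 0.05, so Ho is rejected -> the model as a whole is significant. However, neither individual coefficient is significant at 0.05 (ufo2010: p = 0.3332, pop: p = 0.7497), and the adjusted R-squared (0.3143) is no better than for the population-only model (0.315). This pattern can occur when the predictors are strongly correlated with each other. Correlation does not imply causation!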

lm2<-lm(moreinfections$infections~moreinfections$ufo2010+moreinfections$pop)
summary(lm2)
## 
## Call:
## lm(formula = moreinfections$infections ~ moreinfections$ufo2010 + 
##     moreinfections$pop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1210.1  -595.9  -510.3  -192.3  6100.0 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)            6.187e+02  3.129e+02   1.977   0.0587 .
## moreinfections$ufo2010 2.235e+01  2.267e+01   0.986   0.3332  
## moreinfections$pop     9.281e-04  2.878e-03   0.322   0.7497  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1528 on 26 degrees of freedom
## Multiple R-squared:  0.3632, Adjusted R-squared:  0.3143 
## F-statistic: 7.416 on 2 and 26 DF,  p-value: 0.002829
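A significant overall F-test combined with non-significant individual coefficients is often a sign that the predictors overlap. One quick check is the correlation between the predictors. The sketch below uses only the six rows printed by head(moreinfections) above (six_rows2 is a name chosen for illustration), so it indicates the idea rather than the result for the full dataset.

```r
# Only the six rows shown by head(moreinfections); NOT the full dataset
six_rows2 <- data.frame(
  infections = c(245, 215, 2076, 5023, 189, 195),
  ufo2010    = c(2, 6, 2, 59, 0, 1),
  pop        = c(25101, 61912, 33341, 409061, 7481, 18675)
)

# Correlation between the two predictors
predictor_cor <- cor(six_rows2$ufo2010, six_rows2$pop)
predictor_cor

# Full correlation matrix for all three columns
cor(six_rows2)
```

On these six rows the two predictors are very strongly correlated, largely because of one populous record with many sightings. If the same holds for the full data, UFO sightings carry almost the same information as population, which would explain why neither coefficient stands out on its own.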