Read the dataset into a data frame named infections, without treating the first line as column headers
infections <- read.table("C:/Users/marxm/OneDrive/Documents/ProgrammingForData_R/acti3/data/PopInf.txt",header=FALSE)
View dataset
#View(infections)
Read the dataset again, this time using the first line as column headers
infections <- read.table("C:/Users/marxm/OneDrive/Documents/ProgrammingForData_R/acti3/data/PopInf.txt",header=TRUE)
View data
#View(infections)
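The effect of the header argument can be checked without the file on disk. The following is a minimal sketch, assuming PopInf.txt is whitespace-separated with a first line reading pop infections; the inline text and the names sample_text, no_header and with_header are hypothetical stand-ins, and the three data rows are copied from the head() output printed in this document.

```r
# Hypothetical inline stand-in for PopInf.txt: a header line plus three data rows.
sample_text <- "pop infections
25101 245
61912 215
33341 2076"

# header = FALSE: the header line is read as data and columns get default names.
no_header <- read.table(text = sample_text, header = FALSE)
names(no_header)            # "V1" "V2"
nrow(no_header)             # 4, because the header line counts as a row
is.numeric(no_header$V1)    # FALSE, the text "pop" prevents a numeric column

# header = TRUE: the first line supplies the column names and the columns stay numeric.
with_header <- read.table(text = sample_text, header = TRUE)
names(with_header)          # "pop" "infections"
nrow(with_header)           # 3
is.numeric(with_header$pop) # TRUE
```

This is why the second read.table() call is the one the rest of the analysis relies on: the model formulas refer to the pop and infections columns by name.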
Return the class of the variable (data.frame)
class(infections)
## [1] "data.frame"
head() returns the first six rows by default; the headers set earlier appear as the pop (population) and infections columns
head(infections)
##      pop infections
## 1  25101        245
## 2  61912        215
## 3  33341       2076
## 4 409061       5023
## 5   7481        189
## 6  18675        195
Fit a linear regression model. In the formula, the dependent variable (the one to be predicted), infections, goes first; the independent variable (the predictor), pop, goes second. The model tries to predict the number of infections from the population.
lm1<-lm(infections$infections~infections$pop)
lm1
##
## Call:
## lm(formula = infections$infections ~ infections$pop)
##
## Coefficients:
##    (Intercept)  infections$pop
##      6.275e+02       3.601e-03
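The same model can also be written with bare column names and a data argument. The sketch below is an illustration rather than the original analysis: sample_inf is a hypothetical data frame holding only the six rows printed by head(infections), so its estimates will differ from the full-data fit shown here.

```r
# Six rows copied from the head(infections) output; the full file has more rows.
sample_inf <- data.frame(
  pop        = c(25101, 61912, 33341, 409061, 7481, 18675),
  infections = c(245, 215, 2076, 5023, 189, 195)
)

# The same model written two ways.
fit_dollar <- lm(sample_inf$infections ~ sample_inf$pop)
fit_data   <- lm(infections ~ pop, data = sample_inf)

# The estimates agree; only the coefficient labels differ.
all.equal(unname(coef(fit_dollar)), unname(coef(fit_data)))

# With the data = form, predict() can score populations that were not in the sample.
new_pops <- data.frame(pop = c(10000, 50000))
predictions <- predict(fit_data, newdata = new_pops)
predictions
```

The data = form keeps the coefficient labels short (pop instead of infections$pop) and lets predict() match the newdata column to the formula by name.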
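Show the summary. H0: there is no linear relationship between population and infections (the slope is zero). The p-value for the slope is 0.000912, which is below 0.05, so H0 is rejected: population and infections are correlated. The multiple R-squared of 0.3394 means population explains about 34% of the variance in infections.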
summary(lm1)
##
## Call:
## lm(formula = infections$infections ~ infections$pop)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1242.9  -635.5  -537.3  -367.5  6085.0
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    6.275e+02  3.126e+02   2.007 0.054826 .
## infections$pop 3.601e-03  9.668e-04   3.725 0.000912 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1528 on 27 degrees of freedom
## Multiple R-squared: 0.3394, Adjusted R-squared: 0.315
## F-statistic: 13.87 on 1 and 27 DF, p-value: 0.0009122
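The numbers used in the hypothesis test can be pulled out of the summary object instead of being read off the printout. The following is a rough sketch on a hypothetical six-row sample (the rows printed by head(infections)), so the values will not match the full-data summary. It also shows that, for a single predictor, the slope test gives the same p-value as a Pearson correlation test.

```r
# Six rows copied from the head(infections) output; the full file has more rows.
sample_inf <- data.frame(
  pop        = c(25101, 61912, 33341, 409061, 7481, 18675),
  infections = c(245, 215, 2076, 5023, 189, 195)
)
fit <- lm(infections ~ pop, data = sample_inf)

# coef(summary(fit)) is a matrix with columns Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(fit))
slope_p    <- unname(coef_table["pop", "Pr(>|t|)"])
r_squared  <- summary(fit)$r.squared

# Decision at the 0.05 level used in the text.
reject_h0 <- slope_p < 0.05

# With one predictor, the slope t-test and the Pearson correlation test coincide.
cor_p <- cor.test(sample_inf$pop, sample_inf$infections)$p.value
all.equal(slope_p, cor_p)
```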
attributes() returns the names of the components stored in the model object, along with its class
attributes(lm1)
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
##
## $class
## [1] "lm"
Coefficients: the intercept of 6.275e+02 and the slope of 3.601e-03 give the fitted line: infections = 6.275e+02 + 3.601e-03 * pop
lm1$coefficients
##    (Intercept) infections$pop
##   6.275345e+02   3.601114e-03
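As a worked example of the fitted line, the coefficients printed above can be applied to the first row of the dataset (pop = 25101, infections = 245). The variable names below are hypothetical; the numbers are copied from the output in this document.

```r
# Coefficients copied from the lm1$coefficients output.
b0 <- 6.275345e+02   # intercept
b1 <- 3.601114e-03   # slope: extra infections per additional person

# First row of the dataset, as printed by head(infections).
pop_row1      <- 25101
observed_row1 <- 245

# Fitted value from the regression line, and the residual (observed minus fitted).
predicted_row1 <- b0 + b1 * pop_row1              # about 717.9
residual_row1  <- observed_row1 - predicted_row1  # about -472.9
```

The model overestimates this row by roughly 473 infections, which fits the residual range reported in the summary (median residual -537.3).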
Import second dataset
moreinfections <- read.table("C:/Users/marxm/OneDrive/Documents/ProgrammingForData_R/acti3/data/infections1.txt",header=TRUE)
First six rows of the second dataset
head(moreinfections)
##   infections ufo2010    pop
## 1        245       2  25101
## 2        215       6  61912
## 3       2076       2  33341
## 4       5023      59 409061
## 5        189       0   7481
## 6        195       1  18675
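Fit a second linear regression model and show the summary. Dependent variable: infections. Independent variables (predictors): UFO sightings (ufo2010) and population (pop). H0: none of the predictors is related to infections. The overall F-test p-value is 0.002829, below 0.05, so H0 is rejected for the model as a whole. Neither predictor is significant on its own, though (p = 0.3332 for ufo2010, p = 0.7497 for pop), which typically happens when the predictors are strongly correlated with each other. Correlation does not imply causation!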
lm2<-lm(moreinfections$infections~moreinfections$ufo2010+moreinfections$pop)
summary(lm2)
##
## Call:
## lm(formula = moreinfections$infections ~ moreinfections$ufo2010 +
## moreinfections$pop)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1210.1  -595.9  -510.3  -192.3  6100.0
##
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)            6.187e+02  3.129e+02   1.977   0.0587 .
## moreinfections$ufo2010 2.235e+01  2.267e+01   0.986   0.3332
## moreinfections$pop     9.281e-04  2.878e-03   0.322   0.7497
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1528 on 26 degrees of freedom
## Multiple R-squared: 0.3632, Adjusted R-squared: 0.3143
## F-statistic: 7.416 on 2 and 26 DF, p-value: 0.002829
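One possible reason for a significant overall F-test alongside non-significant individual predictors is that the two predictors carry nearly the same information. The sketch below checks this on a hypothetical six-row sample copied from head(moreinfections); the full dataset may behave differently, so treat the result as a hint rather than a finding.

```r
# Six rows copied from the head(moreinfections) output; the full file has more rows.
sample_more <- data.frame(
  infections = c(245, 215, 2076, 5023, 189, 195),
  ufo2010    = c(2, 6, 2, 59, 0, 1),
  pop        = c(25101, 61912, 33341, 409061, 7481, 18675)
)

# Pearson correlation between the two predictors.
predictor_cor <- cor(sample_more$ufo2010, sample_more$pop)
predictor_cor

# With exactly two predictors, the variance inflation factor is 1 / (1 - r^2).
# Large values mean the standard errors of both coefficients are inflated.
vif <- 1 / (1 - predictor_cor^2)
vif
```

In this sample UFO sightings and population move together very closely (more people, more reported sightings), which is consistent with the standard error of the pop coefficient rising from 9.668e-04 in the first model to 2.878e-03 in the second.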