Welcome to R. R is very different from SAS. R is an interpreted language, not a compiled one. This means, you type something into R and it does it. There is no data step. There are no procs. The SAS and R book is very useful for going between the two programs. R uses libraries to do different types of analysis, so we will need to install lots of different libraries to do different things. These need to be downloaded from the internet, using the install.packages() command. You only need to install a package once. E.g.
install.packages("lme4") will install the lme4 library. To use the functions within it, type
library(lme4)
Now you have access to those fuctions.
Below we will go through a simple R session where we basically review much of what we did last semester. We will load a dataset, print some cases, do some descriptice statistics and plots, t tests, and some linear models and diagnostics of those models.
#Load some libraries that I need
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library (car)
library(Hmisc)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: splines
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
library(sandwich)
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: TH.data
#Read in a Commma separated values file, In this case, I am using the Population Reference Bureau's Population Data sheet from 2008.
dat<-read.csv("~/Google Drive/dem7283/PRB2008_All.csv")
#print all of the variable names in the data set
names(dat)
## [1] "Y"
## [2] "X"
## [3] "ID"
## [4] "Country"
## [5] "Continent"
## [6] "Region"
## [7] "Year"
## [8] "Population."
## [9] "CBR"
## [10] "CDR"
## [11] "Rate.of.natural.increase"
## [12] "Net.Migration.Rate"
## [13] "ProjectedPopMid2025"
## [14] "ProjectedPopMid2050"
## [15] "ProjectedPopChange_08_50Perc"
## [16] "IMR"
## [17] "WomandLifeTimeRiskMaternalDeath"
## [18] "TFR"
## [19] "PercPopLT15"
## [20] "PercPopGT65"
## [21] "e0Total"
## [22] "e0Male"
## [23] "e0Female"
## [24] "PercUrban"
## [25] "PercPopinUrbanGT750k"
## [26] "PercPop1549HIVAIDS2001"
## [27] "PercPop1549HIVAIDS2007"
## [28] "PercMarWomContraALL"
## [29] "PercMarWomContraModern"
## [30] "PercPpUnderNourished0204"
## [31] "MotorVehper1000Pop0005"
## [32] "PercPopwAccessImprovedWaterSource"
## [33] "GNIPPPperCapitaUSDollars"
## [34] "PopDensPerSqKM"
## [35] "PopDensPerSqMile"
#look at the first 5 cases
head(dat, n=5)
## Y X ID Country Continent Region Year Population. CBR
## 1 1 1 115 Afghanistan Asia South Central Asia 2008 32.7 47
## 2 2 2 178 Albania Europe Southern Europe 2008 3.2 13
## 3 3 3 1 Algeria Africa NORTHERN AFRICA 2008 34.7 22
## 4 4 4 179 Andorra Europe Southern Europe 2008 0.1 10
## 5 5 5 43 Angola Africa MIDDLE AFRICA 2008 16.8 47
## CDR Rate.of.natural.increase Net.Migration.Rate ProjectedPopMid2025
## 1 21 2.6 0 50.3
## 2 6 0.7 -3 3.5
## 3 4 1.8 -1 43.3
## 4 3 0.7 26 0.1
## 5 21 2.7 2 26.2
## ProjectedPopMid2050 ProjectedPopChange_08_50Perc IMR
## 1 81.9 150 163.0
## 2 3.6 11 8.0
## 3 50.1 44 27.0
## 4 0.1 -4 2.5
## 5 42.7 155 132.0
## WomandLifeTimeRiskMaternalDeath TFR PercPopLT15 PercPopGT65 e0Total
## 1 8 6.8 45 2 43
## 2 490 1.6 27 8 75
## 3 220 2.3 30 5 72
## 4 NA 1.2 15 12 NA
## 5 12 6.8 46 2 43
## e0Male e0Female PercUrban PercPopinUrbanGT750k PercPop1549HIVAIDS2001
## 1 43 43 20 12 NA
## 2 72 79 45 NA NA
## 3 71 74 63 12 0.1
## 4 NA NA 90 NA NA
## 5 41 44 57 27 1.6
## PercPop1549HIVAIDS2007 PercMarWomContraALL PercMarWomContraModern
## 1 NA 10 9
## 2 NA 75 8
## 3 0.1 61 52
## 4 NA NA NA
## 5 2.1 6 5
## PercPpUnderNourished0204 MotorVehper1000Pop0005
## 1 NA 6
## 2 6 85
## 3 4 91
## 4 NA 750
## 5 35 NA
## PercPopwAccessImprovedWaterSource GNIPPPperCapitaUSDollars
## 1 22 NA
## 2 97 6580
## 3 85 5490
## 4 100 NA
## 5 51 4400
## PopDensPerSqKM PopDensPerSqMile
## 1 50 129.50
## 2 113 292.67
## 3 15 38.85
## 4 182 471.38
## 5 13 33.67
#Frequency Table of # of Contries by Continent
table(dat$Continent)
##
## Africa Asia Europe North America Oceania
## 56 51 45 27 17
## South America
## 13
#basic summary statistics
summary(dat)
## Y X ID Country
## Min. : 1 Min. : 1 Min. : 1.0 Afghanistan : 1
## 1st Qu.: 53 1st Qu.: 53 1st Qu.: 53.0 Albania : 1
## Median :105 Median :105 Median :104.0 Algeria : 1
## Mean :105 Mean :105 Mean :105.2 Andorra : 1
## 3rd Qu.:157 3rd Qu.:157 3rd Qu.:158.0 Angola : 1
## Max. :209 Max. :209 Max. :210.0 Antigua and Barbuda: 1
## (Other) :203
## Continent Region Year
## Africa :56 EASTERN AFRICA : 19 Min. :2008
## Asia :51 Western Asia : 18 1st Qu.:2008
## Europe :45 Carribean : 17 Median :2008
## North America:27 Oceania : 17 Mean :2008
## Oceania :17 WESTERN AFRICA : 16 3rd Qu.:2008
## South America:13 Southern Europe : 15 Max. :2008
## (Other) :107
## Population. CBR CDR
## Min. : 0.01 Min. : 8.00 Min. : 2.00
## 1st Qu.: 0.90 1st Qu.:13.00 1st Qu.: 6.00
## Median : 5.90 Median :21.00 Median : 8.00
## Mean : 32.08 Mean :23.17 Mean : 9.11
## 3rd Qu.: 20.70 3rd Qu.:31.00 3rd Qu.:10.00
## Max. :1324.70 Max. :50.00 Max. :31.00
##
## Rate.of.natural.increase Net.Migration.Rate ProjectedPopMid2025
## Min. :-0.600 Min. :-18 Min. : 0.01
## 1st Qu.: 0.500 1st Qu.: -1 1st Qu.: 1.10
## Median : 1.400 Median : 0 Median : 7.70
## Mean : 1.407 Mean : 1 Mean : 38.27
## 3rd Qu.: 2.200 3rd Qu.: 2 3rd Qu.: 26.20
## Max. : 3.600 Max. : 41 Max. :1476.00
## NA's :1
## ProjectedPopMid2050 ProjectedPopChange_08_50Perc IMR
## Min. : 0.02 Min. :-35.00 Min. : 1.30
## 1st Qu.: 1.30 1st Qu.: 6.00 1st Qu.: 7.20
## Median : 8.80 Median : 38.00 Median : 20.00
## Mean : 44.74 Mean : 47.84 Mean : 35.39
## 3rd Qu.: 34.70 3rd Qu.: 75.00 3rd Qu.: 58.50
## Max. :1755.20 Max. :263.00 Max. :163.00
## NA's :2
## WomandLifeTimeRiskMaternalDeath TFR PercPopLT15
## Min. : 7.0 Min. :1.000 Min. :13.00
## 1st Qu.: 46.5 1st Qu.:1.775 1st Qu.:20.00
## Median : 290.0 Median :2.500 Median :30.00
## Mean : 3462.9 Mean :3.032 Mean :29.93
## 3rd Qu.: 4450.0 3rd Qu.:4.000 3rd Qu.:39.00
## Max. :47600.0 Max. :7.100 Max. :49.00
## NA's :38 NA's :1
## PercPopGT65 e0Total e0Male e0Female
## Min. : 1.000 Min. :33.00 Min. :33.0 Min. :34.00
## 1st Qu.: 4.000 1st Qu.:61.50 1st Qu.:59.0 1st Qu.:63.50
## Median : 6.000 Median :72.00 Median :69.0 Median :74.00
## Mean : 7.512 Mean :67.85 Mean :65.6 Mean :70.15
## 3rd Qu.:11.000 3rd Qu.:75.00 3rd Qu.:73.0 3rd Qu.:79.00
## Max. :22.000 Max. :82.00 Max. :80.0 Max. :86.00
## NA's :2 NA's :2 NA's :2
## PercUrban PercPopinUrbanGT750k PercPop1549HIVAIDS2001
## Min. : 10.00 Min. : 3.00 Min. : 0.100
## 1st Qu.: 35.75 1st Qu.: 12.00 1st Qu.: 0.100
## Median : 57.50 Median : 19.00 Median : 0.500
## Mean : 55.94 Mean : 24.12 Mean : 2.434
## 3rd Qu.: 74.50 3rd Qu.: 31.00 3rd Qu.: 1.725
## Max. :100.00 Max. :100.00 Max. :26.500
## NA's :1 NA's :87 NA's :81
## PercPop1549HIVAIDS2007 PercMarWomContraALL PercMarWomContraModern
## Min. : 0.100 Min. : 3.00 Min. : 1.00
## 1st Qu.: 0.100 1st Qu.:31.00 1st Qu.:17.75
## Median : 0.500 Median :53.00 Median :39.00
## Mean : 2.145 Mean :49.44 Mean :38.61
## 3rd Qu.: 1.700 3rd Qu.:70.00 3rd Qu.:58.00
## Max. :26.100 Max. :90.00 Max. :90.00
## NA's :69 NA's :50 NA's :41
## PercPpUnderNourished0204 MotorVehper1000Pop0005
## Min. : 2.50 Min. : 1.0
## 1st Qu.: 3.00 1st Qu.: 54.5
## Median : 8.00 Median :159.0
## Mean :14.83 Mean :249.8
## 3rd Qu.:23.00 3rd Qu.:454.5
## Max. :75.00 Max. :787.0
## NA's :36 NA's :78
## PercPopwAccessImprovedWaterSource GNIPPPperCapitaUSDollars
## Min. : 22.00 Min. : 290
## 1st Qu.: 73.00 1st Qu.: 2110
## Median : 91.00 Median : 5950
## Mean : 84.04 Mean :11878
## 3rd Qu.: 99.00 3rd Qu.:15185
## Max. :100.00 Max. :64400
## NA's :31 NA's :33
## PopDensPerSqKM PopDensPerSqMile
## Min. : 2.0 Min. : 5.18
## 1st Qu.: 31.0 1st Qu.: 80.29
## Median : 77.0 Median : 199.43
## Mean : 472.9 Mean : 1224.67
## 3rd Qu.: 195.0 3rd Qu.: 505.05
## Max. :34000.0 Max. :88060.00
##
#just want a mean
mean(dat$IMR, na.rm=T)
## [1] 35.38744
#standard deviation
sd(dat$IMR, na.rm=T)
## [1] 35.4421
#Quantiles
quantile(dat$IMR, na.rm=T)
## 0% 25% 50% 75% 100%
## 1.3 7.2 20.0 58.5 163.0
#histogram of the infant mortality rate
hist(dat$IMR, main="Histogram of Infant Mortality Rate")
#Box plot for IMR* Continent
boxplot(IMR~Continent, dat,main="Boxplot of Infant Mortality Rate by continent")
#scatter plot of IMR * TFR
plot(IMR~TFR, data=dat, main="Bivariate Association between TFR and IMR")
#t-test for Africa vs Rest of the world
#Useing the I() funciton, which generates a T/F value, i.e. 2 groups, Africa or Not Africa
t.test(IMR~I(Continent=="Africa"), dat)
##
## Welch Two Sample t-test
##
## data: IMR by I(Continent == "Africa")
## t = -11.5883, df = 74.547, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -64.50901 -45.58185
## sample estimates:
## mean in group FALSE mean in group TRUE
## 20.76184 75.80727
#Simple ANOVA model version of a t-test
fit<-lm(IMR~I(Continent=="Africa"), dat)
summary(fit)
##
## Call:
## lm(formula = IMR ~ I(Continent == "Africa"), data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -67.807 -15.612 -4.762 11.238 142.238
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.762 2.092 9.924 <2e-16 ***
## I(Continent == "Africa")TRUE 55.045 4.059 13.562 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.79 on 205 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.4729, Adjusted R-squared: 0.4703
## F-statistic: 183.9 on 1 and 205 DF, p-value: < 2.2e-16
#ANOVA model for IMR by continent
fit<-lm(IMR~Continent, dat)
anova(fit)
## Analysis of Variance Table
##
## Response: IMR
## Df Sum Sq Mean Sq F value Pr(>F)
## Continent 5 139806 27961.1 47.245 < 2.2e-16 ***
## Residuals 201 118960 591.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Post hoc tests using Tukey comparisons
glht(fit, linfct=mcp(Continent="Tukey"))
##
## General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Linear Hypotheses:
## Estimate
## Asia - Africa == 0 -43.084
## Europe - Africa == 0 -69.716
## North America - Africa == 0 -57.874
## Oceania - Africa == 0 -50.384
## South America - Africa == 0 -52.538
## Europe - Asia == 0 -26.633
## North America - Asia == 0 -14.790
## Oceania - Asia == 0 -7.300
## South America - Asia == 0 -9.454
## North America - Europe == 0 11.842
## Oceania - Europe == 0 19.333
## South America - Europe == 0 17.178
## Oceania - North America == 0 7.490
## South America - North America == 0 5.336
## South America - Oceania == 0 -2.154
#Basic OLS models
#fit the basic regression model
fit1<-lm(IMR ~ TFR + log(GNIPPPperCapitaUSDollars)+ log(PopDensPerSqMile), data=dat)
summary(fit1)
##
## Call:
## lm(formula = IMR ~ TFR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile),
## data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.374 -7.184 0.466 6.207 61.161
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 65.4360 16.1225 4.059 7.48e-05 ***
## TFR 14.1539 1.1307 12.518 < 2e-16 ***
## log(GNIPPPperCapitaUSDollars) -7.9888 1.3964 -5.721 4.60e-08 ***
## log(PopDensPerSqMile) -0.5812 0.7857 -0.740 0.46
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.42 on 172 degrees of freedom
## (33 observations deleted due to missingness)
## Multiple R-squared: 0.8331, Adjusted R-squared: 0.8302
## F-statistic: 286.3 on 3 and 172 DF, p-value: < 2.2e-16
#do some diagnostic plots
plot(fit1)
#Normality of residuals from the first fit?
shapiro.test(rstudent(fit1))
##
## Shapiro-Wilk normality test
##
## data: rstudent(fit1)
## W = 0.9562, p-value = 2.683e-05
plot(density(rstudent(fit1)))
#test for heteroskedasticity
bptest(fit1)
##
## studentized Breusch-Pagan test
##
## data: fit1
## BP = 26.6895, df = 3, p-value = 6.839e-06
#variance inflation factors
vif(fit1)
## TFR log(GNIPPPperCapitaUSDollars)
## 2.836245 2.720001
## log(PopDensPerSqMile)
## 1.071165
#make White-corrected t-statistics and p values
coeftest(fit1, vcov=vcovHC(fit1, type = "HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 65.43597 17.17594 3.8097 0.0001934 ***
## TFR 14.15386 1.39263 10.1634 < 2.2e-16 ***
## log(GNIPPPperCapitaUSDollars) -7.98885 1.53293 -5.2115 5.318e-07 ***
## log(PopDensPerSqMile) -0.58116 0.63402 -0.9166 0.3606190
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#now we fit a model that includes differences between continents
fit2<-lm(IMR ~ TFR + log(GNIPPPperCapitaUSDollars)+ log(PopDensPerSqMile)+ Continent, data=dat)
summary(fit2)
##
## Call:
## lm(formula = IMR ~ TFR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile) +
## Continent, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.344 -5.561 -0.035 5.857 58.758
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 83.6778 15.1184 5.535 1.19e-07 ***
## TFR 11.6411 1.1912 9.772 < 2e-16 ***
## log(GNIPPPperCapitaUSDollars) -7.8487 1.3118 -5.983 1.29e-08 ***
## log(PopDensPerSqMile) -0.8502 0.7691 -1.105 0.27054
## ContinentAsia -10.9400 3.4574 -3.164 0.00185 **
## ContinentEurope -13.1368 4.3141 -3.045 0.00270 **
## ContinentNorth America -17.6359 4.1083 -4.293 2.98e-05 ***
## ContinentOceania -25.8905 4.7398 -5.462 1.68e-07 ***
## ContinentSouth America -17.3111 4.8089 -3.600 0.00042 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.15 on 167 degrees of freedom
## (33 observations deleted due to missingness)
## Multiple R-squared: 0.8653, Adjusted R-squared: 0.8588
## F-statistic: 134.1 on 8 and 167 DF, p-value: < 2.2e-16
#We can compare how model 2 performs compared to model 1 by using an F test
anova (fit1, fit2, test="F")
## Analysis of Variance Table
##
## Model 1: IMR ~ TFR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile)
## Model 2: IMR ~ TFR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile) +
## Continent
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 172 35745
## 2 167 28859 5 6886.1 7.9698 9.276e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Now we fit a model with an interaction term between continent and GDP, in R this is easy, all we have to do is use the * operation
fit3<-lm(IMR ~ TFR + log(GNIPPPperCapitaUSDollars)*Continent+ log(PopDensPerSqMile), data=dat)
summary(fit3)
##
## Call:
## lm(formula = IMR ~ TFR + log(GNIPPPperCapitaUSDollars) * Continent +
## log(PopDensPerSqMile), data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.432 -6.263 0.448 5.030 59.455
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 69.6034 23.2830
## TFR 11.9535 1.3409
## log(GNIPPPperCapitaUSDollars) -6.4823 2.2798
## ContinentAsia 23.6205 22.9200
## ContinentEurope -23.0881 38.1155
## ContinentNorth America -49.9613 34.0903
## ContinentOceania -24.8931 36.7178
## ContinentSouth America 63.9326 71.9055
## log(PopDensPerSqMile) -0.3578 0.8147
## log(GNIPPPperCapitaUSDollars):ContinentAsia -4.0986 2.7276
## log(GNIPPPperCapitaUSDollars):ContinentEurope 0.7370 3.9652
## log(GNIPPPperCapitaUSDollars):ContinentNorth America 3.4640 3.9116
## log(GNIPPPperCapitaUSDollars):ContinentOceania -0.1979 4.4117
## log(GNIPPPperCapitaUSDollars):ContinentSouth America -9.2290 8.1168
## t value Pr(>|t|)
## (Intercept) 2.989 0.00323 **
## TFR 8.914 9.83e-16 ***
## log(GNIPPPperCapitaUSDollars) -2.843 0.00504 **
## ContinentAsia 1.031 0.30428
## ContinentEurope -0.606 0.54554
## ContinentNorth America -1.466 0.14471
## ContinentOceania -0.678 0.49877
## ContinentSouth America 0.889 0.37526
## log(PopDensPerSqMile) -0.439 0.66116
## log(GNIPPPperCapitaUSDollars):ContinentAsia -1.503 0.13488
## log(GNIPPPperCapitaUSDollars):ContinentEurope 0.186 0.85278
## log(GNIPPPperCapitaUSDollars):ContinentNorth America 0.886 0.37716
## log(GNIPPPperCapitaUSDollars):ContinentOceania -0.045 0.96427
## log(GNIPPPperCapitaUSDollars):ContinentSouth America -1.137 0.25720
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.11 on 162 degrees of freedom
## (33 observations deleted due to missingness)
## Multiple R-squared: 0.87, Adjusted R-squared: 0.8596
## F-statistic: 83.39 on 13 and 162 DF, p-value: < 2.2e-16
#We can compare out model 2 with the interaction model to see if there is a significant difference in how GDP operates across continents
anova (fit2, fit3, test="F")
## Analysis of Variance Table
##
## Model 1: IMR ~ TFR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile) +
## Continent
## Model 2: IMR ~ TFR + log(GNIPPPperCapitaUSDollars) * Continent + log(PopDensPerSqMile)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 167 28858
## 2 162 27848 5 1010.7 1.176 0.3232
#This test suggests that there is not a significant interaction between GDP and Continent
#examine a log transformed outcome, this can be easily done within the lm() function
lm2<-lm(log(IMR) ~ TFR + log(GNIPPPperCapitaUSDollars)+ log(PopDensPerSqMile), data=dat)
summary(lm2)
##
## Call:
## lm(formula = log(IMR) ~ TFR + log(GNIPPPperCapitaUSDollars) +
## log(PopDensPerSqMile), data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.80051 -0.29549 -0.00936 0.28241 1.61830
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.76378 0.54737 14.184 < 2e-16 ***
## TFR 0.21874 0.03839 5.698 5.16e-08 ***
## log(GNIPPPperCapitaUSDollars) -0.57417 0.04741 -12.111 < 2e-16 ***
## log(PopDensPerSqMile) -0.08209 0.02667 -3.078 0.00243 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4894 on 172 degrees of freedom
## (33 observations deleted due to missingness)
## Multiple R-squared: 0.8292, Adjusted R-squared: 0.8263
## F-statistic: 278.4 on 3 and 172 DF, p-value: < 2.2e-16
#Normality of errors
shapiro.test(rstudent(lm2))
##
## Shapiro-Wilk normality test
##
## data: rstudent(lm2)
## W = 0.9815, p-value = 0.01955
#Heterosckedasticity test
bptest(lm2)
##
## studentized Breusch-Pagan test
##
## data: lm2
## BP = 14.9121, df = 3, p-value = 0.001893
#diagnostic plots
plot(lm2)