Climate Change

There have been many studies documenting that the average global temperature has been increasing over the last century. The consequences of a continued rise in global temperature will be dire. Rising sea levels and an increased frequency of extreme weather events will affect billions of people.

In this problem, we will attempt to study the relationship between average global temperature and several other factors.

The file climate_change.csv contains climate data from May 1983 to December 2008. The available variables include:

Year: the observation year.
Month: the observation month.
Temp: the difference in degrees Celsius between the average global temperature in that period and a reference value. This data comes from the Climatic Research Unit at the University of East Anglia.
CO2, N2O, CH4, CFC.11, CFC.12: atmospheric concentrations of carbon dioxide (CO2), nitrous oxide (N2O), methane  (CH4), trichlorofluoromethane (CCl3F; commonly referred to as CFC-11) and dichlorodifluoromethane (CCl2F2; commonly referred to as CFC-12), respectively. This data comes from the ESRL/NOAA Global Monitoring Division.
    CO2, N2O and CH4 are expressed in ppmv (parts per million by volume  -- i.e., 397 ppmv of CO2 means that CO2 constitutes 397 millionths of the total volume of the atmosphere)
    CFC.11 and CFC.12 are expressed in ppbv (parts per billion by volume). 
Aerosols: the mean stratospheric aerosol optical depth at 550 nm. This variable is linked to volcanoes, as volcanic eruptions result in new particles being added to the atmosphere, which affect how much of the sun's energy is reflected back into space. This data is from the Godard Institute for Space Studies at NASA.
TSI: the total solar irradiance (TSI) in W/m2 (the rate at which the sun's energy is deposited per unit area). Due to sunspots and other solar phenomena, the amount of energy that is given off by the sun varies substantially with time. This data is from the SOLARIS-HEPPA project website.
MEI: multivariate El Nino Southern Oscillation index (MEI), a measure of the strength of the El Nino/La Nina-Southern Oscillation (a weather effect in the Pacific Ocean that affects global temperatures). This data comes from the ESRL/NOAA Physical Sciences Division.
clim_change <- read.csv("climate_change.csv")
str(clim_change)
## 'data.frame':    308 obs. of  11 variables:
##  $ Year    : int  1983 1983 1983 1983 1983 1983 1983 1983 1984 1984 ...
##  $ Month   : int  5 6 7 8 9 10 11 12 1 2 ...
##  $ MEI     : num  2.556 2.167 1.741 1.13 0.428 ...
##  $ CO2     : num  346 346 344 342 340 ...
##  $ CH4     : num  1639 1634 1633 1631 1648 ...
##  $ N2O     : num  304 304 304 304 304 ...
##  $ CFC.11  : num  191 192 193 194 194 ...
##  $ CFC.12  : num  350 352 354 356 357 ...
##  $ TSI     : num  1366 1366 1366 1366 1366 ...
##  $ Aerosols: num  0.0863 0.0794 0.0731 0.0673 0.0619 0.0569 0.0524 0.0486 0.0451 0.0416 ...
##  $ Temp    : num  0.109 0.118 0.137 0.176 0.149 0.093 0.232 0.078 0.089 0.013 ...

Splitting test and train data

Let’s make them separated. We are interested in how changes in these variables affect future temperatures, as well as how well these variables explain temperature changes so far. To do this, first read the dataset climate_change.csv into R.

Then, split the data into a training set, consisting of all the observations up to and including 2006, and a testing set consisting of the remaining years (hint: use subset). A training set refers to the data that will be used to build the model (this is the data we give to the lm() function), and a testing set refers to the data we will use to test our predictive ability.

Next, build a linear regression model to predict the dependent variable Temp, using MEI, CO2, CH4, N2O, CFC.11, CFC.12, TSI, and Aerosols as independent variables (Year and Month should NOT be used in the model). Use the training set to build the model.

Enter the model R2 (the “Multiple R-squared” value):

train_cc <- subset(clim_change, Year <=2006)
test_cc <- subset(clim_change, Year > 2006)

RegTemp <- lm(Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols,
              data = train_cc)
summary(RegTemp)
## 
## Call:
## lm(formula = Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + 
##     TSI + Aerosols, data = train_cc)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25888 -0.05913 -0.00082  0.05649  0.32433 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.246e+02  1.989e+01  -6.265 1.43e-09 ***
## MEI          6.421e-02  6.470e-03   9.923  < 2e-16 ***
## CO2          6.457e-03  2.285e-03   2.826  0.00505 ** 
## CH4          1.240e-04  5.158e-04   0.240  0.81015    
## N2O         -1.653e-02  8.565e-03  -1.930  0.05467 .  
## CFC.11      -6.631e-03  1.626e-03  -4.078 5.96e-05 ***
## CFC.12       3.808e-03  1.014e-03   3.757  0.00021 ***
## TSI          9.314e-02  1.475e-02   6.313 1.10e-09 ***
## Aerosols    -1.538e+00  2.133e-01  -7.210 5.41e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09171 on 275 degrees of freedom
## Multiple R-squared:  0.7509, Adjusted R-squared:  0.7436 
## F-statistic: 103.6 on 8 and 275 DF,  p-value: < 2.2e-16

Understanding the model

Current scientific opinion is that nitrous oxide and CFC-11 are greenhouse gases: gases that are able to trap heat from the sun and contribute to the heating of the Earth. However, the regression coefficients of both the N2O and CFC-11 variables are negative, indicating that increasing atmospheric concentrations of either of these two compounds is associated with lower global temperatures.

Which of the following is the simplest correct explanation for this contradiction? - All variables are correlated with the others :)

Understanding the model 2

Compute the correlations between all the variables in the training set. Which of the following independent variables is N2O highly correlated with (absolute correlation greater than 0.7)? Select all that apply.

cor(train_cc)
##                 Year         Month           MEI         CO2         CH4
## Year      1.00000000 -0.0279419602 -0.0369876842  0.98274939  0.91565945
## Month    -0.02794196  1.0000000000  0.0008846905 -0.10673246  0.01856866
## MEI      -0.03698768  0.0008846905  1.0000000000 -0.04114717 -0.03341930
## CO2       0.98274939 -0.1067324607 -0.0411471651  1.00000000  0.87727963
## CH4       0.91565945  0.0185686624 -0.0334193014  0.87727963  1.00000000
## N2O       0.99384523  0.0136315303 -0.0508197755  0.97671982  0.89983864
## CFC.11    0.56910643 -0.0131112236  0.0690004387  0.51405975  0.77990402
## CFC.12    0.89701166  0.0006751102  0.0082855443  0.85268963  0.96361625
## TSI       0.17030201 -0.0346061935 -0.1544919227  0.17742893  0.24552844
## Aerosols -0.34524670  0.0148895406  0.3402377871 -0.35615480 -0.26780919
## Temp      0.78679714 -0.0998567411  0.1724707512  0.78852921  0.70325502
##                  N2O      CFC.11        CFC.12         TSI    Aerosols
## Year      0.99384523  0.56910643  0.8970116635  0.17030201 -0.34524670
## Month     0.01363153 -0.01311122  0.0006751102 -0.03460619  0.01488954
## MEI      -0.05081978  0.06900044  0.0082855443 -0.15449192  0.34023779
## CO2       0.97671982  0.51405975  0.8526896272  0.17742893 -0.35615480
## CH4       0.89983864  0.77990402  0.9636162478  0.24552844 -0.26780919
## N2O       1.00000000  0.52247732  0.8679307757  0.19975668 -0.33705457
## CFC.11    0.52247732  1.00000000  0.8689851828  0.27204596 -0.04392120
## CFC.12    0.86793078  0.86898518  1.0000000000  0.25530281 -0.22513124
## TSI       0.19975668  0.27204596  0.2553028138  1.00000000  0.05211651
## Aerosols -0.33705457 -0.04392120 -0.2251312440  0.05211651  1.00000000
## Temp      0.77863893  0.40771029  0.6875575483  0.24338269 -0.38491375
##                 Temp
## Year      0.78679714
## Month    -0.09985674
## MEI       0.17247075
## CO2       0.78852921
## CH4       0.70325502
## N2O       0.77863893
## CFC.11    0.40771029
## CFC.12    0.68755755
## TSI       0.24338269
## Aerosols -0.38491375
## Temp      1.00000000

Simplifying the model

Given that the correlations are so high, let us focus on the N2O variable and build a model with only MEI, TSI, Aerosols and N2O as independent variables. Remember to use the training set to build the model.

Enter the coefficient of N2O in this reduced model: unanswered

(How does this compare to the coefficient in the previous model with all of the variables?)

Enter the model R2:

RegN2O <- lm(Temp ~ MEI + TSI + Aerosols + N2O, data = train_cc)
summary(RegN2O)
## 
## Call:
## lm(formula = Temp ~ MEI + TSI + Aerosols + N2O, data = train_cc)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27916 -0.05975 -0.00595  0.05672  0.34195 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.162e+02  2.022e+01  -5.747 2.37e-08 ***
## MEI          6.419e-02  6.652e-03   9.649  < 2e-16 ***
## TSI          7.949e-02  1.487e-02   5.344 1.89e-07 ***
## Aerosols    -1.702e+00  2.180e-01  -7.806 1.19e-13 ***
## N2O          2.532e-02  1.311e-03  19.307  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09547 on 279 degrees of freedom
## Multiple R-squared:  0.7261, Adjusted R-squared:  0.7222 
## F-statistic: 184.9 on 4 and 279 DF,  p-value: < 2.2e-16

Automatically building the model

We have many variables in this problem, and as we have seen above, dropping some from the model does not decrease model quality. R provides a function, step, that will automate the procedure of trying different combinations of variables to find a good compromise of model simplicity and R2. This trade-off is formalized by the Akaike information criterion (AIC) - it can be informally thought of as the quality of the model with a penalty for the number of variables in the model.

The step function has one argument - the name of the initial model. It returns a simplified model. Use the step function in R to derive a new model, with the full model as the initial model (HINT: If your initial full model was called “climateLM”, you could create a new model with the step function by typing step(climateLM). Be sure to save your new model to a variable name so that you can look at the summary. For more information about the step function, type ?step in your R console.)

Enter the R2 value of the model produced by the step function:

Reg_wStep <- step(RegTemp)
## Start:  AIC=-1348.16
## Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols
## 
##            Df Sum of Sq    RSS     AIC
## - CH4       1   0.00049 2.3135 -1350.1
## <none>                  2.3130 -1348.2
## - N2O       1   0.03132 2.3443 -1346.3
## - CO2       1   0.06719 2.3802 -1342.0
## - CFC.12    1   0.11874 2.4318 -1335.9
## - CFC.11    1   0.13986 2.4529 -1333.5
## - TSI       1   0.33516 2.6482 -1311.7
## - Aerosols  1   0.43727 2.7503 -1301.0
## - MEI       1   0.82823 3.1412 -1263.2
## 
## Step:  AIC=-1350.1
## Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI + Aerosols
## 
##            Df Sum of Sq    RSS     AIC
## <none>                  2.3135 -1350.1
## - N2O       1   0.03133 2.3448 -1348.3
## - CO2       1   0.06672 2.3802 -1344.0
## - CFC.12    1   0.13023 2.4437 -1336.5
## - CFC.11    1   0.13938 2.4529 -1335.5
## - TSI       1   0.33500 2.6485 -1313.7
## - Aerosols  1   0.43987 2.7534 -1302.7
## - MEI       1   0.83118 3.1447 -1264.9
summary(Reg_wStep)
## 
## Call:
## lm(formula = Temp ~ MEI + CO2 + N2O + CFC.11 + CFC.12 + TSI + 
##     Aerosols, data = train_cc)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25770 -0.05994 -0.00104  0.05588  0.32203 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.245e+02  1.985e+01  -6.273 1.37e-09 ***
## MEI          6.407e-02  6.434e-03   9.958  < 2e-16 ***
## CO2          6.402e-03  2.269e-03   2.821 0.005129 ** 
## N2O         -1.602e-02  8.287e-03  -1.933 0.054234 .  
## CFC.11      -6.609e-03  1.621e-03  -4.078 5.95e-05 ***
## CFC.12       3.868e-03  9.812e-04   3.942 0.000103 ***
## TSI          9.312e-02  1.473e-02   6.322 1.04e-09 ***
## Aerosols    -1.540e+00  2.126e-01  -7.244 4.36e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09155 on 276 degrees of freedom
## Multiple R-squared:  0.7508, Adjusted R-squared:  0.7445 
## F-statistic: 118.8 on 7 and 276 DF,  p-value: < 2.2e-16

Testing new data

We have developed an understanding of how well we can fit a linear regression to the training data, but does the model quality hold when applied to unseen data?

Using the model produced from the step function, calculate temperature predictions for the testing data set, using the predict function.

Enter the testing set R2:

Pred_new <- predict(Reg_wStep, newdata = test_cc)
SSE <- sum((test_cc$Temp - Pred_new)^2)
SST <- sum((test_cc$Temp - mean(train_cc$Temp))^2)
R_sq <- 1 - SSE/SST
R_sq
## [1] 0.6286051

Part 2: Reading test scores

The Programme for International Student Assessment (PISA) is a test given every three years to 15-year-old students from around the world to evaluate their performance in mathematics, reading, and science. This test provides a quantitative way to compare the performance of students from different parts of the world. In this homework assignment, we will predict the reading scores of students from the United States of America on the 2009 PISA exam.

The datasets pisa2009train.csv and pisa2009test.csv contain information about the demographics and schools for American students taking the exam, derived from 2009 PISA Public-Use Data Files distributed by the United States National Center for Education Statistics (NCES). While the datasets are not supposed to contain identifying information about students taking the test, by using the data you are bound by the NCES data use agreement, which prohibits any attempt to determine the identity of any student in the datasets.

Each row in the datasets pisa2009train.csv and pisa2009test.csv represents one student taking the exam. The datasets have the following variables:

grade: The grade in school of the student (most 15-year-olds in America are in 10th grade)

male: Whether the student is male (1/0)

raceeth: The race/ethnicity composite of the student

preschool: Whether the student attended preschool (1/0)

expectBachelors: Whether the student expects to obtain a bachelor’s degree (1/0)

motherHS: Whether the student’s mother completed high school (1/0)

motherBachelors: Whether the student’s mother obtained a bachelor’s degree (1/0)

motherWork: Whether the student’s mother has part-time or full-time work (1/0)

fatherHS: Whether the student’s father completed high school (1/0)

fatherBachelors: Whether the student’s father obtained a bachelor’s degree (1/0)

fatherWork: Whether the student’s father has part-time or full-time work (1/0)

selfBornUS: Whether the student was born in the United States of America (1/0)

motherBornUS: Whether the student’s mother was born in the United States of America (1/0)

fatherBornUS: Whether the student’s father was born in the United States of America (1/0)

englishAtHome: Whether the student speaks English at home (1/0)

computerForSchoolwork: Whether the student has access to a computer for schoolwork (1/0)

read30MinsADay: Whether the student reads for pleasure for 30 minutes/day (1/0)

minutesPerWeekEnglish: The number of minutes per week the student spend in English class

studentsInEnglish: The number of students in this student’s English class at school

schoolHasLibrary: Whether this student’s school has a library (1/0)

publicSchool: Whether this student attends a public school (1/0)

urban: Whether this student’s school is in an urban area (1/0)

schoolSize: The number of students in this student’s school

readingScore: The student’s reading score, on a 1000-point scale

Loading dataset and size

Load the training and testing sets using the read.csv() function, and save them as variables with the names pisaTrain and pisaTest.

How many students are there in the training set?

pisaTrain <- read.csv("pisa2009train.csv")
pisaTest <- read.csv("pisa2009test.csv")
nrow(pisaTrain)
## [1] 3663

Summarizing dataset

Using tapply() on pisaTrain, what is the average reading test score of males? Check this link for tapply: http://makemeanalyst.com/r-programming/loop-functions/tapply/ 1 = male 0 = female

str(pisaTrain)
## 'data.frame':    3663 obs. of  24 variables:
##  $ grade                : int  11 11 9 10 10 10 10 10 9 10 ...
##  $ male                 : int  1 1 1 0 1 1 0 0 0 1 ...
##  $ raceeth              : chr  NA "White" "White" "Black" ...
##  $ preschool            : int  NA 0 1 1 1 1 0 1 1 1 ...
##  $ expectBachelors      : int  0 0 1 1 0 1 1 1 0 1 ...
##  $ motherHS             : int  NA 1 1 0 1 NA 1 1 1 1 ...
##  $ motherBachelors      : int  NA 1 1 0 0 NA 0 0 NA 1 ...
##  $ motherWork           : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ fatherHS             : int  NA 1 1 1 1 1 NA 1 0 0 ...
##  $ fatherBachelors      : int  NA 0 NA 0 0 0 NA 0 NA 0 ...
##  $ fatherWork           : int  1 1 1 1 0 1 NA 1 1 1 ...
##  $ selfBornUS           : int  1 1 1 1 1 1 0 1 1 1 ...
##  $ motherBornUS         : int  0 1 1 1 1 1 1 1 1 1 ...
##  $ fatherBornUS         : int  0 1 1 1 0 1 NA 1 1 1 ...
##  $ englishAtHome        : int  0 1 1 1 1 1 1 1 1 1 ...
##  $ computerForSchoolwork: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ read30MinsADay       : int  0 1 0 1 1 0 0 1 0 0 ...
##  $ minutesPerWeekEnglish: int  225 450 250 200 250 300 250 300 378 294 ...
##  $ studentsInEnglish    : int  NA 25 28 23 35 20 28 30 20 24 ...
##  $ schoolHasLibrary     : int  1 1 1 1 1 1 1 1 0 1 ...
##  $ publicSchool         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ urban                : int  1 0 0 1 1 0 1 0 1 0 ...
##  $ schoolSize           : int  673 1173 1233 2640 1095 227 2080 1913 502 899 ...
##  $ readingScore         : num  476 575 555 458 614 ...
tapply(pisaTrain$readingScore, pisaTrain$male, FUN = mean)
##        0        1 
## 512.9406 483.5325
summary(pisaTrain)
##      grade            male          raceeth            preschool     
##  Min.   : 8.00   Min.   :0.0000   Length:3663        Min.   :0.0000  
##  1st Qu.:10.00   1st Qu.:0.0000   Class :character   1st Qu.:0.0000  
##  Median :10.00   Median :1.0000   Mode  :character   Median :1.0000  
##  Mean   :10.09   Mean   :0.5111                      Mean   :0.7228  
##  3rd Qu.:10.00   3rd Qu.:1.0000                      3rd Qu.:1.0000  
##  Max.   :12.00   Max.   :1.0000                      Max.   :1.0000  
##                                                      NA's   :56      
##  expectBachelors     motherHS    motherBachelors    motherWork    
##  Min.   :0.0000   Min.   :0.00   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:1.00   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :1.00   Median :0.0000   Median :1.0000  
##  Mean   :0.7859   Mean   :0.88   Mean   :0.3481   Mean   :0.7345  
##  3rd Qu.:1.0000   3rd Qu.:1.00   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.00   Max.   :1.0000   Max.   :1.0000  
##  NA's   :62       NA's   :97     NA's   :397      NA's   :93      
##     fatherHS      fatherBachelors    fatherWork       selfBornUS    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:1.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :1.0000  
##  Mean   :0.8593   Mean   :0.3319   Mean   :0.8531   Mean   :0.9313  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :245      NA's   :569      NA's   :233      NA's   :69      
##   motherBornUS     fatherBornUS    englishAtHome    computerForSchoolwork
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000       
##  1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:1.0000       
##  Median :1.0000   Median :1.0000   Median :1.0000   Median :1.0000       
##  Mean   :0.7725   Mean   :0.7668   Mean   :0.8717   Mean   :0.8994       
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000       
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000       
##  NA's   :71       NA's   :113      NA's   :71       NA's   :65           
##  read30MinsADay   minutesPerWeekEnglish studentsInEnglish schoolHasLibrary
##  Min.   :0.0000   Min.   :   0.0        Min.   : 1.0      Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.: 225.0        1st Qu.:20.0      1st Qu.:1.0000  
##  Median :0.0000   Median : 250.0        Median :25.0      Median :1.0000  
##  Mean   :0.2899   Mean   : 266.2        Mean   :24.5      Mean   :0.9676  
##  3rd Qu.:1.0000   3rd Qu.: 300.0        3rd Qu.:30.0      3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :2400.0        Max.   :75.0      Max.   :1.0000  
##  NA's   :34       NA's   :186           NA's   :249       NA's   :143     
##   publicSchool        urban          schoolSize    readingScore  
##  Min.   :0.0000   Min.   :0.0000   Min.   : 100   Min.   :168.6  
##  1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.: 712   1st Qu.:431.7  
##  Median :1.0000   Median :0.0000   Median :1212   Median :499.7  
##  Mean   :0.9339   Mean   :0.3849   Mean   :1369   Mean   :497.9  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1900   3rd Qu.:566.2  
##  Max.   :1.0000   Max.   :1.0000   Max.   :6694   Max.   :746.0  
##                                    NA's   :162

Removing missing Values with na.omit

Linear regression discards observations with missing data, so we will remove all such observations from the training and testing sets. Later in the course, we will learn about imputation, which deals with missing data by filling in missing values with plausible information.

Type the following commands into your R console to remove observations with any missing value from pisaTrain and pisaTest:

pisaTrain = na.omit(pisaTrain)

pisaTest = na.omit(pisaTest)

How many observations are now in the training set?

pisaTrain <- na.omit(pisaTrain)
pisaTest <- na.omit(pisaTest)

Factor variables

Factor variables are variables that take on a discrete set of values, like the “Region” variable in the WHO dataset from the second lecture of Unit 1. This is an unordered factor because there isn’t any natural ordering between the levels. An ordered factor has a natural ordering between the levels (an example would be the classifications “large,” “medium,” and “small”).

Which of the following variables is an unordered factor with at least 3 levels? (Select all that apply.)

str(pisaTrain)
## 'data.frame':    2414 obs. of  24 variables:
##  $ grade                : int  11 10 10 10 10 10 10 10 11 9 ...
##  $ male                 : int  1 0 1 0 1 0 0 0 1 1 ...
##  $ raceeth              : chr  "White" "Black" "Hispanic" "White" ...
##  $ preschool            : int  0 1 1 1 1 1 1 1 1 1 ...
##  $ expectBachelors      : int  0 1 0 1 1 1 1 0 1 1 ...
##  $ motherHS             : int  1 0 1 1 1 1 1 0 1 1 ...
##  $ motherBachelors      : int  1 0 0 0 1 0 0 0 0 1 ...
##  $ motherWork           : int  1 1 1 0 1 1 1 0 0 1 ...
##  $ fatherHS             : int  1 1 1 1 0 1 1 0 1 1 ...
##  $ fatherBachelors      : int  0 0 0 0 0 0 1 0 1 1 ...
##  $ fatherWork           : int  1 1 0 1 1 0 1 1 1 1 ...
##  $ selfBornUS           : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ motherBornUS         : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ fatherBornUS         : int  1 1 0 1 1 0 1 0 1 1 ...
##  $ englishAtHome        : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ computerForSchoolwork: int  1 1 1 1 1 0 1 1 1 1 ...
##  $ read30MinsADay       : int  1 1 1 1 0 1 1 1 0 0 ...
##  $ minutesPerWeekEnglish: int  450 200 250 300 294 232 225 270 275 225 ...
##  $ studentsInEnglish    : int  25 23 35 30 24 14 20 25 30 15 ...
##  $ schoolHasLibrary     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ publicSchool         : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ urban                : int  0 1 1 0 0 0 0 1 1 1 ...
##  $ schoolSize           : int  1173 2640 1095 1913 899 1733 149 1400 1988 915 ...
##  $ readingScore         : num  575 458 614 439 466 ...
##  - attr(*, "na.action")= 'omit' Named int [1:1249] 1 3 6 7 9 11 13 21 29 30 ...
##   ..- attr(*, "names")= chr [1:1249] "1" "3" "6" "7" ...
pisaTrain$raceeth <- as.factor(pisaTrain$raceeth)

class(pisaTrain$raceeth)
## [1] "factor"
str(pisaTrain)
## 'data.frame':    2414 obs. of  24 variables:
##  $ grade                : int  11 10 10 10 10 10 10 10 11 9 ...
##  $ male                 : int  1 0 1 0 1 0 0 0 1 1 ...
##  $ raceeth              : Factor w/ 7 levels "American Indian/Alaska Native",..: 7 3 4 7 5 4 7 4 7 7 ...
##  $ preschool            : int  0 1 1 1 1 1 1 1 1 1 ...
##  $ expectBachelors      : int  0 1 0 1 1 1 1 0 1 1 ...
##  $ motherHS             : int  1 0 1 1 1 1 1 0 1 1 ...
##  $ motherBachelors      : int  1 0 0 0 1 0 0 0 0 1 ...
##  $ motherWork           : int  1 1 1 0 1 1 1 0 0 1 ...
##  $ fatherHS             : int  1 1 1 1 0 1 1 0 1 1 ...
##  $ fatherBachelors      : int  0 0 0 0 0 0 1 0 1 1 ...
##  $ fatherWork           : int  1 1 0 1 1 0 1 1 1 1 ...
##  $ selfBornUS           : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ motherBornUS         : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ fatherBornUS         : int  1 1 0 1 1 0 1 0 1 1 ...
##  $ englishAtHome        : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ computerForSchoolwork: int  1 1 1 1 1 0 1 1 1 1 ...
##  $ read30MinsADay       : int  1 1 1 1 0 1 1 1 0 0 ...
##  $ minutesPerWeekEnglish: int  450 200 250 300 294 232 225 270 275 225 ...
##  $ studentsInEnglish    : int  25 23 35 30 24 14 20 25 30 15 ...
##  $ schoolHasLibrary     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ publicSchool         : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ urban                : int  0 1 1 0 0 0 0 1 1 1 ...
##  $ schoolSize           : int  1173 2640 1095 1913 899 1733 149 1400 1988 915 ...
##  $ readingScore         : num  575 458 614 439 466 ...
##  - attr(*, "na.action")= 'omit' Named int [1:1249] 1 3 6 7 9 11 13 21 29 30 ...
##   ..- attr(*, "names")= chr [1:1249] "1" "3" "6" "7" ...
summary(pisaTrain)
##      grade            male                                          raceeth    
##  Min.   : 8.00   Min.   :0.0000   American Indian/Alaska Native         :  20  
##  1st Qu.:10.00   1st Qu.:0.0000   Asian                                 :  95  
##  Median :10.00   Median :1.0000   Black                                 : 228  
##  Mean   :10.13   Mean   :0.5012   Hispanic                              : 500  
##  3rd Qu.:10.00   3rd Qu.:1.0000   More than one race                    :  81  
##  Max.   :12.00   Max.   :1.0000   Native Hawaiian/Other Pacific Islander:  20  
##                                   White                                 :1470  
##    preschool      expectBachelors     motherHS     motherBachelors 
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:1.000   1st Qu.:0.0000  
##  Median :1.0000   Median :1.0000   Median :1.000   Median :0.0000  
##  Mean   :0.7274   Mean   :0.8343   Mean   :0.896   Mean   :0.3637  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.000   Max.   :1.0000  
##                                                                    
##    motherWork        fatherHS      fatherBachelors    fatherWork    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:1.0000  
##  Median :1.0000   Median :1.0000   Median :0.0000   Median :1.0000  
##  Mean   :0.7357   Mean   :0.8741   Mean   :0.3484   Mean   :0.8571  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                                                     
##    selfBornUS      motherBornUS   fatherBornUS    englishAtHome   
##  Min.   :0.0000   Min.   :0.00   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:1.00   1st Qu.:1.0000   1st Qu.:1.0000  
##  Median :1.0000   Median :1.00   Median :1.0000   Median :1.0000  
##  Mean   :0.9362   Mean   :0.79   Mean   :0.7854   Mean   :0.8815  
##  3rd Qu.:1.0000   3rd Qu.:1.00   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.00   Max.   :1.0000   Max.   :1.0000  
##                                                                   
##  computerForSchoolwork read30MinsADay   minutesPerWeekEnglish studentsInEnglish
##  Min.   :0.0000        Min.   :0.0000   Min.   :   0.0        Min.   : 1.00    
##  1st Qu.:1.0000        1st Qu.:0.0000   1st Qu.: 225.0        1st Qu.:20.00    
##  Median :1.0000        Median :0.0000   Median : 250.0        Median :25.00    
##  Mean   :0.9155        Mean   :0.3016   Mean   : 269.8        Mean   :24.56    
##  3rd Qu.:1.0000        3rd Qu.:1.0000   3rd Qu.: 300.0        3rd Qu.:30.00    
##  Max.   :1.0000        Max.   :1.0000   Max.   :1680.0        Max.   :75.00    
##                                                                                
##  schoolHasLibrary  publicSchool        urban          schoolSize  
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   : 100  
##  1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.: 712  
##  Median :1.0000   Median :1.0000   Median :0.0000   Median :1233  
##  Mean   :0.9714   Mean   :0.9176   Mean   :0.3629   Mean   :1372  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1900  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :6694  
##                                                                   
##   readingScore  
##  Min.   :244.5  
##  1st Qu.:455.8  
##  Median :520.2  
##  Mean   :518.0  
##  3rd Qu.:581.4  
##  Max.   :746.0  
## 

Underodered factors in regression models

To include unordered factors in a linear regression model, we define one level as the “reference level” and add a binary variable for each of the remaining levels. In this way, a factor with n levels is replaced by n-1 binary variables. The reference level is typically selected to be the most frequently occurring level in the dataset.

As an example, consider the unordered factor variable “color”, with levels “red”, “green”, and “blue”. If “green” were the reference level, then we would add binary variables “colorred” and “colorblue” to a linear regression problem. All red examples would have colorred=1 and colorblue=0. All blue examples would have colorred=0 and colorblue=1. All green examples would have colorred=0 and colorblue=0.

Now, consider the variable “raceeth” in our problem, which has levels “American Indian/Alaska Native”, “Asian”, “Black”, “Hispanic”, “More than one race”, “Native Hawaiian/Other Pacific Islander”, and “White”. Because it is the most common in our population, we will select White as the reference level.

Which binary variables will be included in the regression model? (Select all that apply.)

Building a model

Because the race variable takes on text values, it was loaded as a factor variable when we read in the dataset with read.csv() – you can see this when you run str(pisaTrain) or str(pisaTest). However, by default R selects the first level alphabetically (“American Indian/Alaska Native”) as the reference level of our factor instead of the most common level (“White”). Set the reference level of the factor by typing the following two lines in your R console:

pisaTrain\(raceeth = relevel(pisaTrain\)raceeth, “White”)

pisaTest\(raceeth = relevel(pisaTest\)raceeth, “White”)

Now, build a linear regression model (call it lmScore) using the training set to predict readingScore using all the remaining variables.

It would be time-consuming to type all the variables, but R provides the shorthand notation “readingScore ~ .” to mean “predict readingScore using all the other variables in the data frame.” The period is used to replace listing out all of the independent variables. As an example, if your dependent variable is called “Y”, your independent variables are called “X1”, “X2”, and “X3”, and your training data set is called “Train”, instead of the regular notation:

LinReg = lm(Y ~ X1 + X2 + X3, data = Train)

You would use the following command to build your model:

LinReg = lm(Y ~ ., data = Train)

What is the Multiple R-squared value of lmScore on the training set?

# RELEVELLING   

str(pisaTrain)
## 'data.frame':    2414 obs. of  24 variables:
##  $ grade                : int  11 10 10 10 10 10 10 10 11 9 ...
##  $ male                 : int  1 0 1 0 1 0 0 0 1 1 ...
##  $ raceeth              : Factor w/ 7 levels "American Indian/Alaska Native",..: 7 3 4 7 5 4 7 4 7 7 ...
##  $ preschool            : int  0 1 1 1 1 1 1 1 1 1 ...
##  $ expectBachelors      : int  0 1 0 1 1 1 1 0 1 1 ...
##  $ motherHS             : int  1 0 1 1 1 1 1 0 1 1 ...
##  $ motherBachelors      : int  1 0 0 0 1 0 0 0 0 1 ...
##  $ motherWork           : int  1 1 1 0 1 1 1 0 0 1 ...
##  $ fatherHS             : int  1 1 1 1 0 1 1 0 1 1 ...
##  $ fatherBachelors      : int  0 0 0 0 0 0 1 0 1 1 ...
##  $ fatherWork           : int  1 1 0 1 1 0 1 1 1 1 ...
##  $ selfBornUS           : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ motherBornUS         : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ fatherBornUS         : int  1 1 0 1 1 0 1 0 1 1 ...
##  $ englishAtHome        : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ computerForSchoolwork: int  1 1 1 1 1 0 1 1 1 1 ...
##  $ read30MinsADay       : int  1 1 1 1 0 1 1 1 0 0 ...
##  $ minutesPerWeekEnglish: int  450 200 250 300 294 232 225 270 275 225 ...
##  $ studentsInEnglish    : int  25 23 35 30 24 14 20 25 30 15 ...
##  $ schoolHasLibrary     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ publicSchool         : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ urban                : int  0 1 1 0 0 0 0 1 1 1 ...
##  $ schoolSize           : int  1173 2640 1095 1913 899 1733 149 1400 1988 915 ...
##  $ readingScore         : num  575 458 614 439 466 ...
##  - attr(*, "na.action")= 'omit' Named int [1:1249] 1 3 6 7 9 11 13 21 29 30 ...
##   ..- attr(*, "names")= chr [1:1249] "1" "3" "6" "7" ...
str(pisaTest)
## 'data.frame':    990 obs. of  24 variables:
##  $ grade                : int  10 10 10 10 11 10 10 10 10 10 ...
##  $ male                 : int  0 0 0 0 0 1 0 1 1 0 ...
##  $ raceeth              : chr  "White" "White" "American Indian/Alaska Native" "White" ...
##  $ preschool            : int  1 1 1 1 0 1 0 1 1 1 ...
##  $ expectBachelors      : int  0 1 0 0 0 1 1 0 1 1 ...
##  $ motherHS             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ motherBachelors      : int  1 0 0 0 1 1 0 0 1 0 ...
##  $ motherWork           : int  1 0 0 1 1 1 0 1 1 1 ...
##  $ fatherHS             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ fatherBachelors      : int  0 1 0 0 1 0 0 0 1 1 ...
##  $ fatherWork           : int  0 1 0 1 1 1 1 0 1 1 ...
##  $ selfBornUS           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ motherBornUS         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ fatherBornUS         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ englishAtHome        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ computerForSchoolwork: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ read30MinsADay       : int  0 0 1 1 1 1 0 0 0 1 ...
##  $ minutesPerWeekEnglish: int  240 240 240 270 270 350 350 360 350 360 ...
##  $ studentsInEnglish    : int  30 30 30 35 30 25 27 28 25 27 ...
##  $ schoolHasLibrary     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ publicSchool         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ urban                : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ schoolSize           : int  808 808 808 808 808 899 899 899 899 899 ...
##  $ readingScore         : num  355 454 405 665 605 ...
##  - attr(*, "na.action")= 'omit' Named int [1:580] 2 3 4 6 12 16 17 19 22 23 ...
##   ..- attr(*, "names")= chr [1:580] "2" "3" "4" "6" ...
pisaTrain$raceeth <- relevel(pisaTrain$raceeth, "White")
pisaTest$raceeth <- as.factor(pisaTest$raceeth)

pisaTest$raceeth <- relevel(pisaTest$raceeth, "White")

# https://stackoverflow.com/questions/33180058/coerce-multiple-columns-to-factors-at-once
str(pisaTrain)
## 'data.frame':    2414 obs. of  24 variables:
##  $ grade                : int  11 10 10 10 10 10 10 10 11 9 ...
##  $ male                 : int  1 0 1 0 1 0 0 0 1 1 ...
##  $ raceeth              : Factor w/ 7 levels "White","American Indian/Alaska Native",..: 1 4 5 1 6 5 1 5 1 1 ...
##  $ preschool            : int  0 1 1 1 1 1 1 1 1 1 ...
##  $ expectBachelors      : int  0 1 0 1 1 1 1 0 1 1 ...
##  $ motherHS             : int  1 0 1 1 1 1 1 0 1 1 ...
##  $ motherBachelors      : int  1 0 0 0 1 0 0 0 0 1 ...
##  $ motherWork           : int  1 1 1 0 1 1 1 0 0 1 ...
##  $ fatherHS             : int  1 1 1 1 0 1 1 0 1 1 ...
##  $ fatherBachelors      : int  0 0 0 0 0 0 1 0 1 1 ...
##  $ fatherWork           : int  1 1 0 1 1 0 1 1 1 1 ...
##  $ selfBornUS           : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ motherBornUS         : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ fatherBornUS         : int  1 1 0 1 1 0 1 0 1 1 ...
##  $ englishAtHome        : int  1 1 1 1 1 0 1 0 1 1 ...
##  $ computerForSchoolwork: int  1 1 1 1 1 0 1 1 1 1 ...
##  $ read30MinsADay       : int  1 1 1 1 0 1 1 1 0 0 ...
##  $ minutesPerWeekEnglish: int  450 200 250 300 294 232 225 270 275 225 ...
##  $ studentsInEnglish    : int  25 23 35 30 24 14 20 25 30 15 ...
##  $ schoolHasLibrary     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ publicSchool         : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ urban                : int  0 1 1 0 0 0 0 1 1 1 ...
##  $ schoolSize           : int  1173 2640 1095 1913 899 1733 149 1400 1988 915 ...
##  $ readingScore         : num  575 458 614 439 466 ...
##  - attr(*, "na.action")= 'omit' Named int [1:1249] 1 3 6 7 9 11 13 21 29 30 ...
##   ..- attr(*, "names")= chr [1:1249] "1" "3" "6" "7" ...
str(pisaTest)
## 'data.frame':    990 obs. of  24 variables:
##  $ grade                : int  10 10 10 10 11 10 10 10 10 10 ...
##  $ male                 : int  0 0 0 0 0 1 0 1 1 0 ...
##  $ raceeth              : Factor w/ 7 levels "White","American Indian/Alaska Native",..: 1 1 2 1 1 5 1 5 1 5 ...
##  $ preschool            : int  1 1 1 1 0 1 0 1 1 1 ...
##  $ expectBachelors      : int  0 1 0 0 0 1 1 0 1 1 ...
##  $ motherHS             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ motherBachelors      : int  1 0 0 0 1 1 0 0 1 0 ...
##  $ motherWork           : int  1 0 0 1 1 1 0 1 1 1 ...
##  $ fatherHS             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ fatherBachelors      : int  0 1 0 0 1 0 0 0 1 1 ...
##  $ fatherWork           : int  0 1 0 1 1 1 1 0 1 1 ...
##  $ selfBornUS           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ motherBornUS         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ fatherBornUS         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ englishAtHome        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ computerForSchoolwork: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ read30MinsADay       : int  0 0 1 1 1 1 0 0 0 1 ...
##  $ minutesPerWeekEnglish: int  240 240 240 270 270 350 350 360 350 360 ...
##  $ studentsInEnglish    : int  30 30 30 35 30 25 27 28 25 27 ...
##  $ schoolHasLibrary     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ publicSchool         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ urban                : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ schoolSize           : int  808 808 808 808 808 899 899 899 899 899 ...
##  $ readingScore         : num  355 454 405 665 605 ...
##  - attr(*, "na.action")= 'omit' Named int [1:580] 2 3 4 6 12 16 17 19 22 23 ...
##   ..- attr(*, "names")= chr [1:580] "2" "3" "4" "6" ...
lmScore <- lm(readingScore ~ ., data = pisaTrain)

summary(lmScore)
## 
## Call:
## lm(formula = readingScore ~ ., data = pisaTrain)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -247.44  -48.86    1.86   49.77  217.18 
## 
## Coefficients:
##                                                 Estimate Std. Error t value
## (Intercept)                                   143.766333  33.841226   4.248
## grade                                          29.542707   2.937399  10.057
## male                                          -14.521653   3.155926  -4.601
## raceethAmerican Indian/Alaska Native          -67.277327  16.786935  -4.008
## raceethAsian                                   -4.110325   9.220071  -0.446
## raceethBlack                                  -67.012347   5.460883 -12.271
## raceethHispanic                               -38.975486   5.177743  -7.528
## raceethMore than one race                     -16.922522   8.496268  -1.992
## raceethNative Hawaiian/Other Pacific Islander  -5.101601  17.005696  -0.300
## preschool                                      -4.463670   3.486055  -1.280
## expectBachelors                                55.267080   4.293893  12.871
## motherHS                                        6.058774   6.091423   0.995
## motherBachelors                                12.638068   3.861457   3.273
## motherWork                                     -2.809101   3.521827  -0.798
## fatherHS                                        4.018214   5.579269   0.720
## fatherBachelors                                16.929755   3.995253   4.237
## fatherWork                                      5.842798   4.395978   1.329
## selfBornUS                                     -3.806278   7.323718  -0.520
## motherBornUS                                   -8.798153   6.587621  -1.336
## fatherBornUS                                    4.306994   6.263875   0.688
## englishAtHome                                   8.035685   6.859492   1.171
## computerForSchoolwork                          22.500232   5.702562   3.946
## read30MinsADay                                 34.871924   3.408447  10.231
## minutesPerWeekEnglish                           0.012788   0.010712   1.194
## studentsInEnglish                              -0.286631   0.227819  -1.258
## schoolHasLibrary                               12.215085   9.264884   1.318
## publicSchool                                  -16.857475   6.725614  -2.506
## urban                                          -0.110132   3.962724  -0.028
## schoolSize                                      0.006540   0.002197   2.977
##                                               Pr(>|t|)    
## (Intercept)                                   2.24e-05 ***
## grade                                          < 2e-16 ***
## male                                          4.42e-06 ***
## raceethAmerican Indian/Alaska Native          6.32e-05 ***
## raceethAsian                                   0.65578    
## raceethBlack                                   < 2e-16 ***
## raceethHispanic                               7.29e-14 ***
## raceethMore than one race                      0.04651 *  
## raceethNative Hawaiian/Other Pacific Islander  0.76421    
## preschool                                      0.20052    
## expectBachelors                                < 2e-16 ***
## motherHS                                       0.32001    
## motherBachelors                                0.00108 ** 
## motherWork                                     0.42517    
## fatherHS                                       0.47147    
## fatherBachelors                               2.35e-05 ***
## fatherWork                                     0.18393    
## selfBornUS                                     0.60331    
## motherBornUS                                   0.18182    
## fatherBornUS                                   0.49178    
## englishAtHome                                  0.24153    
## computerForSchoolwork                         8.19e-05 ***
## read30MinsADay                                 < 2e-16 ***
## minutesPerWeekEnglish                          0.23264    
## studentsInEnglish                              0.20846    
## schoolHasLibrary                               0.18749    
## publicSchool                                   0.01226 *  
## urban                                          0.97783    
## schoolSize                                     0.00294 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 73.81 on 2385 degrees of freedom
## Multiple R-squared:  0.3251, Adjusted R-squared:  0.3172 
## F-statistic: 41.04 on 28 and 2385 DF,  p-value: < 2.2e-16

RMSE?

What is the training-set root-mean squared error (RMSE) of lmScore?

lmScore$residuals
##             2             4             5             8            10 
##   36.31787211  -37.51605148  177.01187510 -126.77397761  -34.73973862 
##            12            14            15            16            17 
##   10.14267892   75.75542736  -55.89032825 -139.54354721  119.08321504 
##            18            19            20            22            23 
##  -66.52046990    8.63531209   -7.52648922   23.72319105    0.88260460 
##            24            25            26            27            28 
##  -45.62525483  -89.85661256   83.02843828  -12.84770452   84.60587680 
##            32            34            35            37            38 
##   65.36940977   -6.22218538   48.70770989   18.48115249   -3.31047121 
##            39            41            43            45            46 
##   21.54795278  -55.07285292   69.24180067  -79.50116079    7.75050492 
##            47            48            50            51            52 
##   34.51041974   55.29728852   44.12451045  -13.36162980  -93.33446606 
##            55            56            60            63            64 
##   96.87617639   58.06520171   66.43937544  -70.43294270  -53.36549600 
##            65            67            69            72            73 
##   -3.48872921   12.76479675 -159.90209303  -35.72666271   31.57567105 
##            74            75            76            78            81 
##   54.14567243  104.09705410    8.28203027  161.94638490  -38.50267426 
##            84            86            87            88            91 
##   -8.66301052  -22.54719902  112.47110662   16.84323140   68.90706521 
##            92            93            94            96            98 
##    5.13611687   52.64118258  -10.06174103  -49.15935243  -32.52087660 
##           100           101           105           106           108 
##   83.66733324 -146.53431499 -241.48934454    4.91665336   56.46303695 
##           109           113           115           116           117 
##   40.58349883  -37.24847895   17.58375527   19.31208030 -113.94238772 
##           118           120           121           123           125 
##   -8.88604135   25.97646262 -172.40067398   98.76252815   -5.46353639 
##           126           128           130           131           132 
##   84.09914317 -210.82840924 -105.66286358  132.71596754  -15.84528431 
##           133           134           136           137           140 
##   -0.23922325  114.98960886 -178.47462583   52.65959796  -64.38743490 
##           141           143           145           147           151 
##   70.78621159  167.43499719  -91.33335425   -5.91170689   48.89601250 
##           152           153           154           157           158 
##  -17.11816783  -18.96561327  -46.16953240   34.15578771   54.83678483 
##           162           164           165           166           167 
##  103.09449182  -29.81480533   10.25581508   -6.93385782  -39.38314279 
##           168           169           170           171           173 
##  -60.23855093   23.26897510  128.39675660  -51.56540430   96.80204021 
##           174           176           177           179           180 
##  -15.18885989  110.59582985  -44.65614946   36.74500074   -2.66591281 
##           183           185           186           187           189 
##  -63.61779935   53.50476370   57.85049005   59.37974111 -105.53628053 
##           191           192           196           197           198 
##   99.41543189   21.71580603  -36.73259710  146.44040987   48.15744117 
##           199           200           201           203           205 
##   71.77257697   63.99002031   20.44220312  -41.50839875    5.53275280 
##           206           207           208           209           210 
##  -16.40252659   46.58617286   49.72800711   72.61397207 -115.84887642 
##           211           212           213           214           215 
##   12.08885624  -70.55087514 -117.48568915  -84.36491766 -102.05302174 
##           216           217           218           219           221 
##  104.04202915   61.13522649  -40.07783292  -94.75282613   22.93543839 
##           222           223           224           225           226 
##   20.82751419  -78.89645624  -11.91477756   27.21161063   -5.22077464 
##           228           229           230           232           234 
##    3.88042725   -5.97244869   -7.89630439  131.80517545   -0.15651413 
##           238           239           240           241           242 
##   -2.15171386   64.57694025   48.69876527 -161.94749820   -2.69858880 
##           243           246           247           248           251 
##  104.85689846  117.51589027   44.83397030   32.28799238   10.86060865 
##           252           253           254           257           258 
##  -46.07642644    4.62171923   68.45015961   24.15220085 -206.54553019 
##           259           260           262           265           266 
##    5.16563130   -7.03015107   -4.86134293  -43.62964637  -56.22659202 
##           268           269           270           271           273 
## -185.00046642  181.40886306 -169.98059658  -12.52240177   17.00012757 
##           275           276           277           278           280 
##  -58.30516981  -20.60534708  -56.08922505  -72.97686849   77.86558472 
##           281           282           283           284           285 
##  -30.83528337  -99.16004608   40.78722846   -8.11743705  -33.51096294 
##           286           288           289           291           292 
##  -20.11294587   51.04820974  198.11285741  -64.60004095  121.23464326 
##           293           294           295           296           297 
##   91.17824732  -30.04303693  -88.47977316 -126.99286783   41.47434080 
##           298           299           301           302           303 
##   31.90595983  119.30487969   30.85423324  151.35591415   14.32081854 
##           305           308           312           313           315 
## -123.10230609  -67.58786172  -33.95346598    7.27829704   12.63180748 
##           319           321           322           325           327 
##  -60.99938804   75.00158344  -49.22303388   25.48693868  -63.19726081 
##           328           329           330           333           335 
##   41.55264981   11.51201901 -158.31810830   10.72047890  -75.63750718 
##           337           339           340           341           342 
##  -66.70511899  115.74211886  -27.45700658   95.06555909   80.74264358 
##           343           344           345           347           348 
##  -23.92279318  -82.41413942    2.55723957  -10.13887695  -21.23597431 
##           349           350           351           353           354 
##   15.43692523  -55.61705939  -50.82242797  -21.01844889  -16.96045847 
##           355           356           358           363           366 
##  151.58619932   16.31255619  130.85575629  -32.52943092  -30.25800763 
##           367           368           369           370           375 
##  -33.39223977  -27.22553462  -72.06448412    5.85837006  133.65672848 
##           376           379           383           386           387 
##  -13.42550032   30.24790027   36.60859827   18.58508451  -86.85900574 
##           389           390           392           393           395 
##   52.58823229   39.16618221  124.62458310   75.92767981   31.83691965 
##           396           397           398           399           400 
##   88.47701063 -113.38358367   67.09618588  -27.14416007 -113.51944602 
##           403           404           405           407           408 
##  168.58384505   16.85439908   99.29272741  -10.00289021  165.89791685 
##           409           411           412           413           416 
##   46.71335791  -50.99941091  -37.05676255   54.21406591   -3.30544639 
##           418           419           421           423           424 
##   41.19076020  -78.32231517    7.14111113  -80.82197293    2.66612334 
##           425           426           427           428           431 
##   21.55935394 -118.41546642   78.46411310   59.74757277   21.57915589 
##           432           433           434           435           436 
## -165.32273716  131.02678718  -39.83912026   20.64022158   -0.13674453 
##           438           439           443           444           445 
##   85.35453697   11.07076419   93.22164419  -30.11185615   43.70636921 
##           446           447           448           449           452 
##  -12.72781591 -126.97543026   26.51014581   23.22077808   61.03806470 
##           453           454           455           457           458 
##  -13.63343340  -70.61356114   29.27218719   19.20161639    3.60367353 
##           459           461           462           463           464 
##  -62.29082338    5.21128297  -39.65706262  -20.90843407   82.17748637 
##           465           466           467           469           470 
##   22.52826968   38.93478890  -93.29352567   23.76664599  -41.31253300 
##           474           475           476           478           479 
##   29.70960038  125.76420536  -63.48949768   34.44634498  -25.40178019 
##           481           482           483           485           486 
##   59.15622035  -16.88642487  118.51626665  -68.11912613   62.44471870 
##           487           488           489           491           493 
##  109.22039792 -172.58851612  -16.73079663   27.37296601  -21.97287197 
##           494           496           498           500           501 
##   21.08406726  -13.71048238   94.33458074   73.81514050  -18.93671564 
##           502           504           506           507           508 
## -152.86085311   31.05221852   47.39558011  -59.60838642  -76.29150805 
##           510           511           512           517           518 
##   17.50178209  -27.43596883  -60.57104012 -166.43372210  -57.24232083 
##           519           521           524           525           526 
## -116.49500057   24.92419042   10.06699354   40.53386084  -19.68627186 
##           530           531           532           534           536 
##   67.56799094  152.82850981    0.89489809  -94.11119133   36.89681625 
##           537           540           541           543           544 
##  -35.98195163 -115.60724146    0.20210622   23.35474138  -48.87174604 
##           545           546           547           548           549 
##   19.96912779   23.96163298   66.49065496   32.56051636   -9.72132600 
##           550           552           554           555           556 
## -129.10609851   -1.39372632   58.47489771   14.40994318   99.67624266 
##           557           558           559           561           562 
##  149.97717920  -45.86713150  142.60286673   58.52247155  -80.01646666 
##           563           564           565           566           567 
##   17.52715072  -27.20846079  -26.46207047   14.60388632   48.33068167 
##           569           571           572           574           575 
##  -76.20562093  -29.69490773 -119.90601619    0.10457971   50.11671945 
##           576           579           580           582           587 
##  -44.78434284   20.66737591   45.47362107    3.95806635   37.07351642 
##           589           590           592           593           594 
##   -5.40396185 -119.44990748  -99.81781711  -75.36791811  -42.76076949 
##           597           599           600           601           602 
##   51.53458985   47.42856062  -50.60243893  105.66662253  -15.18253926 
##           603           604           605           607           608 
##   -3.78389326   -8.52299669 -131.31217222  -17.31328815  -51.21514185 
##           612           615           616           617           621 
##  -92.51079042 -125.07549425  -50.59382922   51.27872325  -54.12715803 
##           622           623           625           626           627 
##  -57.78279472   86.93285080 -105.20826215   62.79579233  -31.43112759 
##           628           629           633           634           635 
##  120.15112497    5.14200131  -34.39397345   14.47960676   37.18634559 
##           637           638           640           644           645 
##  -82.74569906 -160.46927997   -6.94487755  -11.12093705  -42.68022834 
##           646           647           648           650           651 
##   74.57449983  -21.92423887  -28.29176088   36.14612045  -39.32356863 
##           652           655           656           658           659 
##   58.22965251  -53.86099072  -63.16834601    6.32355279  -56.07909516 
##           661           662           664           669           671 
##  102.55009028  117.23300618   75.85492351 -110.45723532   -1.27601566 
##           672           673           674           675           676 
##  -88.80476583  -80.39648897    4.90177901  -52.85484241   -0.59186978 
##           677           678           679           683           685 
##  -25.49219547 -148.91490797  104.53269604  133.23702825  101.25960447 
##           686           687           688           690           691 
##   46.50303234   23.34009009  125.28066905  -19.53479975  -52.79990571 
##           692           693           696           697           698 
##   40.69600837  -10.96791718   44.42569685   97.91429948   70.82130403 
##           701           702           703           705           706 
##   57.07133501  -17.67384468  -29.90513927   11.62398055  -96.31738843 
##           707           708           709           710           711 
##  -21.83819073  -49.11758685  -22.81172496  106.94528681  -28.99383391 
##           713           716           718           721           722 
## -186.62795544 -130.70421997  -46.65671538   25.17226119 -167.08604468 
##           724           725           726           727           728 
##   10.11726317  113.64416257  -78.59313096   66.29640856   -6.63437783 
##           730           731           732           734           735 
##  -38.83736710  -26.78838876  -38.25261691  -75.77579694   85.73088367 
##           736           738           739           740           741 
##    8.87409239   43.49269767   99.70375594  -16.18747933   70.97910121 
##           742           744           745           747           749 
##  114.75210892  -31.88860726   44.71518490  -34.12028923   11.88172385 
##           750           751           752           753           755 
##  -19.37215506    2.66968510  107.10241787    1.14647415    7.71757359 
##           756           757           758           759           760 
## -107.87754041 -113.80326067   41.96318811   81.44587001   34.78988622 
##           761           763           764           765           767 
##   95.07428644   71.34019304   19.55796351  172.70343581   84.38032356 
##           768           769           770           771           772 
##  -29.68187219  -31.09271886  -84.87625817  -14.83763318  191.09120883 
##           773           774           775           778           779 
##   42.86620493  130.66903539   -4.64114889  -61.92353900  -64.26665823 
##           785           786           788           789           790 
##  -42.46655112  -38.23384154   -9.90907315   38.16719009    1.63353211 
##           792           794           797           798           799 
##  -29.77057415  -70.72310074  -13.38060038   34.72569012  -85.14674773 
##           801           802           803           805           806 
##    0.84946547   15.58802747  -56.75740580   44.37281283   14.84856397 
##           807           808           809           811           813 
## -121.59068851   67.31197287  -19.71728091  -61.31131349   75.66540588 
##           815           817           818           819           820 
##  -11.89230389   -4.25822254    0.36848090   47.34313247    1.70520499 
##           823           824           825           826           828 
##    8.56982203  -19.35189986  -22.82712861   26.70595759   67.29147063 
##           830           831           835           836           839 
##  -16.10950145  -38.20539380   31.79280022   61.90552792 -123.88649728 
##           841           842           843           844           845 
##  -29.29067902   44.74326926   21.29859506  -31.37777299  114.59894979 
##           846           847           849           850           853 
##   19.20043539  -79.00483446  -16.88553984   47.69966214    0.18686818 
##           855           856           858           859           862 
##  185.31782893  -33.64718902   -9.08592538 -121.71426653  124.98611438 
##           863           864           866           867           868 
##   98.88806891  -47.89726190   19.25044546   24.51994824  -57.74773858 
##           869           871           872           874           875 
##  -37.26549894  -78.69933414   40.50752779  -54.14861478   55.38529315 
##           878           880           885           886           887 
##  -32.55468472    4.34633891  -47.28644846   -1.14981348   -2.81909509 
##           890           893           894           895           896 
## -105.14539684    2.51278442   15.04865182  -25.04896029  -72.70514304 
##           897           898           899           901           903 
##  -45.19433730   11.67551342  102.62431404 -122.16693233   51.22825666 
##           905           906           907           908           910 
##   46.78908576  117.74715193 -111.14410937   43.92360067  -42.14949758 
##           911           913           914           915           916 
##  105.92045191  -60.40019514  -56.64145487  -80.65832369  116.57507762 
##           917           918           919           921           923 
##  107.40574892  -19.49250832  -77.24476720  -84.55196530 -114.44844857 
##           924           925           927           928           929 
##   37.31654122  -90.40379677   -3.32977808   18.49989390  -52.40497380 
##           930           932           933           935           936 
##   22.09188443  -76.63213185 -123.87313117   30.25154014   40.37117700 
##           937           938           940           941           942 
##   -8.20621163   49.78252616   11.94033951   77.11998124   63.81697925 
##           947           951           952           953           954 
##   91.81283808   18.82096447  -14.78948355  -30.44412259  151.41540930 
##           955           956           957           959           960 
##  176.63210955   -0.57903653   73.14837829   10.59480869 -107.95053786 
##           963           964           965           967           968 
##  -53.88793804  106.33210561 -107.45068872   94.57317905  -47.69581429 
##           970           971           973           974           977 
##    9.73335197  -51.78760372   -4.15899437  -82.93363395  -71.14314629 
##           978           980           981           982           984 
##  -29.91488167   85.53817014  115.84868364   86.35934886    3.86637897 
##           986           987           989           990           992 
##   -5.91903508  -52.11301618   43.16588335  -64.16613041  -23.16281099 
##           994           995           996           997           999 
##  117.29792574  121.71090365   37.91865240  -18.25247388   24.68852348 
##          1003          1004          1006          1007          1008 
##   31.57499580  -28.50986477   23.95068006   26.50422806 -107.89939936 
##          1009          1010          1011          1012          1013 
##   12.34636880   -7.43732173  -18.05376435    2.37547272  -43.07494915 
##          1015          1016          1017          1018          1020 
##   44.03596570   13.14448132  -69.23641812  112.66876954  -66.99120291 
##          1022          1025          1026          1028          1029 
## -113.38540647   13.85314897 -111.66741178   54.14181957   53.00357098 
##          1030          1032          1033          1034          1035 
##    1.83842022   49.90235576   21.17762582   98.14697910   60.29191386 
##          1036          1037          1038          1040          1041 
##  124.06950686   20.68224801   -4.12308307   42.29431765   36.72707468 
##          1042          1043          1045          1046          1047 
##  -73.86911671  146.06745390  114.48883468   82.39698178   51.16416291 
##          1051          1052          1054          1056          1058 
##  -34.94586008  -65.00529779   13.01777662 -181.21402520 -135.78656192 
##          1059          1063          1064          1065          1066 
##   41.53815204  -55.25798261   34.22000277  151.97155827   31.09952735 
##          1067          1068          1069          1070          1071 
##  -77.80692982   25.19685555  139.88584345 -164.08147615   50.29214675 
##          1073          1075          1077          1081          1084 
##   65.39495879  -11.16405154 -172.67640592 -126.26264926   92.61406285 
##          1086          1087          1088          1089          1090 
##   55.97055173 -148.35586401   43.94394924   86.11150823   40.45461308 
##          1091          1092          1093          1094          1095 
##   51.97079305 -105.85689806  -16.58320887  -79.90203005   62.12093493 
##          1096          1097          1098          1099          1100 
##   90.15492733   46.60350711 -141.49223485 -128.48348907  -63.23922835 
##          1102          1104          1105          1106          1107 
##  -76.42065472  -47.25340676   32.04514593  -32.31463938   52.97203964 
##          1108          1109          1110          1112          1114 
##    7.16569118   44.58458017 -128.17470018  -40.51733243   93.34700549 
##          1117          1119          1120          1122          1123 
## -128.69049102    9.99330647  120.62138005  -69.49795415   47.92494386 
##          1124          1125          1126          1127          1128 
## -107.39631501 -143.26177936  -44.39915824   56.37808044   41.12164046 
##          1132          1133          1135          1136          1138 
##  -54.36000789   -4.73615844   58.21803552   -5.66130729   83.68540944 
##          1142          1143          1144          1145          1146 
##  -10.36948970   68.94772348  -48.72358427  -22.08495235  -13.76224241 
##          1147          1150          1151          1153          1154 
## -126.55026113   44.45887345   40.98172507   39.74395062  -67.15116278 
##          1157          1158          1162          1163          1164 
##  -51.88745122   -4.65104672    8.81117096   52.75668495  -34.94982069 
##          1165          1166          1168          1170          1172 
##   72.04597233    2.60677026   14.52420282   51.37233372  138.48800659 
##          1173          1174          1176          1177          1178 
##  -70.48614131 -116.09631493  -28.23841859  105.97254175   18.68647404 
##          1179          1181          1182          1186          1187 
##   -0.80585825   81.62357828  -26.41234832  -96.92442573  -38.46006329 
##          1189          1190          1191          1192          1193 
##  -10.24779125   51.02963163   51.46062527  -35.69583068  -66.53789983 
##          1195          1198          1200          1201          1202 
##   -0.43721727    6.95919741  -38.52130629   -9.67630406  -67.93344213 
##          1204          1205          1210          1211          1213 
##  108.99194947   70.59565735   15.99038938  -28.34137109  -32.98035093 
##          1215          1216          1217          1218          1219 
##  100.58449112    2.16841425   -7.41280064 -103.61114084    7.55855639 
##          1221          1223          1225          1226          1228 
##  -18.07819374   64.63505842  -15.24468966   27.66589412   95.22793294 
##          1230          1231          1232          1233          1235 
##  -53.76798001   74.37561087   31.59032639   11.69625728   -0.55834700 
##          1236          1237          1238          1239          1240 
##  166.94097815   72.85148586   98.99809728   -2.61680323  -93.83001135 
##          1243          1245          1246          1247          1248 
##   30.27913876   82.52404484   50.74161880    7.56386084   21.13523232 
##          1249          1251          1253          1254          1255 
##   22.88563879   28.47765198  -45.91924374   -7.21653157  -29.60743997 
##          1258          1259          1260          1261          1262 
##   59.74421926  101.39802352   58.45216708 -143.19086149  -63.12731017 
##          1263          1265          1267          1268          1269 
##   71.90708644  -65.17521511   18.85758805  -11.49922934  -77.68214530 
##          1270          1271          1272          1273          1274 
## -106.20357374  -76.78856046  -30.00021982   34.74916032  -52.72723518 
##          1277          1278          1279          1280          1282 
##  101.72830681  119.44100230  -60.50979104  -88.89537162   75.76409589 
##          1284          1285          1288          1289          1290 
##   54.94605832   -0.12549430  -70.37639019   14.26688378  -54.57050478 
##          1291          1292          1293          1294          1295 
##  205.06691370   14.87357097  -45.38064118  -57.16468158  -53.95310037 
##          1296          1297          1298          1299          1300 
##  -18.79110076    8.20945067   56.92727021  -52.84757136   35.59369503 
##          1301          1303          1304          1305          1307 
##  -24.37164384  -17.54386281  -58.87299875  146.79753834   65.15084715 
##          1308          1311          1312          1314          1315 
##  -87.63645820   12.44083276   88.67826530   64.83738291    3.16020303 
##          1316          1317          1318          1320          1322 
##  -92.88458719   38.39071164   41.79682795   87.35230846  -66.67920369 
##          1324          1325          1326          1327          1328 
## -119.72419083   97.88463395   59.89983442  -44.81228804   82.28378507 
##          1329          1330          1331          1334          1335 
##   36.62046252    1.56450505   42.49340366  117.86957135  -46.95058946 
##          1336          1337          1338          1339          1340 
##  115.74484162  142.34259310   49.58040452 -119.76940803  -38.14369414 
##          1342          1343          1344          1345          1347 
##  119.30993192  -11.31823522  -41.18398075  -11.22749718  138.95756217 
##          1349          1350          1353          1354          1355 
##  -58.54791189   13.36432676  -54.82758400   -0.74027036  -29.65411553 
##          1356          1357          1358          1359          1361 
##  -10.11199071  -17.97273507   13.19397722  -63.34591130  -73.26128881 
##          1362          1363          1364          1367          1368 
##  -20.25024108   10.08123558   -7.09205242  -86.04523259   95.62911926 
##          1369          1371          1372          1373          1375 
##   50.54737572  217.18038996    9.15634388   -3.62995744  132.92962739 
##          1376          1378          1380          1381          1382 
## -173.60542406   23.04929777   26.27917586  111.87573251  -13.80826466 
##          1383          1385          1386          1388          1391 
##   37.73661607  111.30293910  -24.43373624  110.82121131  -87.14595592 
##          1392          1394          1395          1398          1399 
##   26.80556405    3.16079438  -29.92574651   29.21203365  -47.52961329 
##          1401          1402          1403          1404          1405 
##  -13.95430104   93.82849502   33.11848317 -129.99309933  114.52623586 
##          1406          1408          1409          1411          1415 
##   64.57785919   83.14952991   94.54651481  121.48797672   41.68624340 
##          1416          1417          1419          1423          1424 
##   85.41535485  103.71476283    1.90518148  -76.65263358  -20.17449623 
##          1425          1426          1427          1428          1429 
## -131.08378533  -24.89462548   10.24237265  -28.81146923  114.33638404 
##          1430          1432          1434          1435          1438 
##  -24.11927605   24.68010030 -128.92024250  146.72090434  111.50362812 
##          1439          1442          1443          1444          1445 
##   41.59208684  -44.53906479   86.13938174   74.00323079   13.07834093 
##          1446          1447          1450          1451          1452 
##   68.97110563   81.61049019   13.75374079  125.56973895  -32.00456117 
##          1453          1454          1455          1456          1457 
##  -59.53309933  -15.37374467  -49.85099817   -6.02880537  -24.10006916 
##          1459          1460          1463          1464          1465 
##  -12.66408608  -79.05568465   18.41987438  -31.91297202  -50.53605773 
##          1466          1468          1469          1470          1471 
##   71.72461576  -25.08692507   31.17332570   25.60123808   -7.32420904 
##          1473          1474          1476          1477          1480 
##   64.91500964 -115.65161079   41.43343324   -6.81758213  -64.69969167 
##          1481          1483          1485          1488          1491 
##  107.17586613  109.34161088  -58.08059870  -88.76875076   34.89324042 
##          1492          1493          1494          1495          1497 
##   20.95094673  -63.38081393  -16.94874906   70.48403090   57.27411724 
##          1501          1502          1503          1504          1505 
## -109.70069771    8.89165276  -54.81237606 -123.69819213   36.08082141 
##          1506          1507          1509          1510          1511 
##  -67.35253913  -79.14632421 -159.98711612  -62.59341621  -66.93405137 
##          1512          1513          1515          1519          1521 
## -114.84121953   73.05051164  -15.44498865   -8.00986570  -87.43748859 
##          1522          1524          1525          1526          1527 
##   43.72946324  -33.29804827   40.86055288 -174.29622691  -72.18473081 
##          1529          1530          1531          1532          1533 
##  -39.59325455    6.38658636   85.32251917   76.74701983  121.81942183 
##          1538          1539          1540          1541          1542 
##  -74.56121822  -73.89009112  -58.43307382  -69.38454509   -8.15751966 
##          1544          1545          1546          1548          1549 
##  -48.07005197   41.61906012   34.08290161   98.78079339  118.61874990 
##          1551          1552          1553          1557          1559 
##   75.46783663   52.75899249 -125.06675580   60.83540944  -59.80542657 
##          1561          1563          1564          1566          1567 
##   -9.94720566  132.68851025   56.45077198   60.51350387  106.75210880 
##          1568          1570          1573          1574          1575 
##   99.34095807 -113.13520270  -14.72417754   -2.89465630  -33.53604576 
##          1577          1578          1579          1580          1581 
##  -45.09963188   16.28021461  -93.63454026  -48.82505431  -16.17129325 
##          1583          1584          1585          1587          1594 
##   -0.19187065   62.85682494   78.75508549   35.17703403 -136.42815156 
##          1596          1598          1599          1600          1601 
##   39.48320966   24.13110944  -76.34963210  -51.30559209   80.83238470 
##          1602          1604          1605          1608          1609 
##   32.39885511 -110.45375821  131.18962578   36.67876991   53.44932693 
##          1610          1611          1612          1616          1618 
##    5.72094192   77.39520984  -19.50544551 -150.32710163 -156.59110451 
##          1619          1620          1621          1623          1625 
##  -22.30990278   14.32881748  105.52129573  -44.29635401  -36.54111030 
##          1627          1628          1629          1630          1631 
## -131.52724707  -14.57228342   30.27010337   50.18153882  -28.55998393 
##          1632          1634          1636          1638          1639 
##   28.85907129  -27.80762137 -112.30629111   80.20196910  108.05209430 
##          1640          1641          1642          1643          1644 
##  -48.29490944  -63.83027409  -98.21496499  -25.14082216  -33.70091238 
##          1645          1649          1650          1651          1652 
##  -49.90781326   24.32805846   54.80651696  -78.30417434   45.44740168 
##          1656          1657          1659          1660          1662 
##   -6.26310357   42.75196163 -131.29076261   30.92693464  -84.20431855 
##          1663          1664          1665          1666          1667 
##  -30.24103195  -68.70444141   33.84307619   50.25205574   66.53317361 
##          1668          1669          1670          1672          1673 
##  -10.32885777   16.83573308  -11.38678063   42.50965937    1.88239196 
##          1674          1676          1677          1678          1682 
##   50.72879236   11.05238947  -71.80247901  -78.56075583   71.28114022 
##          1684          1685          1686          1687          1689 
##  -45.30160767    6.99122356 -116.12789057  -62.05891985   61.04286329 
##          1691          1692          1695          1696          1698 
## -104.56653727   34.39875830   37.59128763 -139.41635528   51.65086065 
##          1699          1701          1702          1707          1708 
##    5.63783777   90.56157990   83.64536273  -43.22256056  -89.89296224 
##          1714          1715          1716          1717          1718 
##    0.01101409  -42.46741975  -56.56279830   25.48497394  -53.94864772 
##          1719          1720          1721          1722          1723 
##   72.44150458  -33.21189496   31.24364444   31.68737355   16.29690758 
##          1724          1725          1727          1728          1729 
##   66.34511289  -13.50911751 -137.97589309  -54.52063972   49.96488293 
##          1730          1731          1732          1733          1734 
##  -41.70353674 -110.98919447   18.26500384 -152.14098542  101.81799496 
##          1736          1738          1740          1742          1743 
##  124.10968228    8.53714536   -2.58564921  -51.47279840 -135.86490792 
##          1744          1747          1748          1749          1750 
##    9.31963767   56.48006818  -93.48559161  -73.71781557   83.18451175 
##          1751          1752          1753          1754          1755 
##  -19.43722991    0.76702842   40.81708059   -2.95744997  -52.18243023 
##          1756          1757          1759          1762          1763 
##   44.14615013  -63.24952141 -247.43523893  -18.03552306   73.33407235 
##          1764          1766          1767          1768          1769 
##   88.64803869   -8.36904399  -72.22132957  -17.64484470   67.59513211 
##          1770          1771          1772          1776          1777 
##  -49.61333757    1.78523026  -15.93887471 -113.97142849 -103.96894412 
##          1778          1779          1780          1782          1783 
##   36.50438743  -76.88161329   56.68759866   -7.89678134   68.59988190 
##          1792          1793          1794          1795          1797 
##  -66.37280682  101.21751480   56.97424958   71.64124322   93.23454094 
##          1799          1801          1802          1803          1804 
##  -50.73755801  101.01469597   20.74623648   24.54623688   39.80140982 
##          1807          1809          1810          1811          1812 
##   10.64522569   45.37840659   12.67052841  -28.81863802 -138.24625705 
##          1814          1816          1817          1818          1823 
##  -79.79064251  123.09680393  -20.68575940   22.02130390   22.88957294 
##          1824          1825          1826          1827          1828 
##  -91.32391939  -67.87948011  -26.95373537  -52.95038797    1.20496543 
##          1830          1831          1833          1834          1835 
##   97.13234652   -9.73134125   74.61780253  -20.32176848   31.65873014 
##          1836          1837          1838          1839          1841 
##   45.83643235  -11.47853641 -102.51732663  118.61724136  -47.78962950 
##          1842          1844          1845          1846          1847 
##  -10.07866353 -103.57398032   29.09066867   -4.18115290   43.15044083 
##          1848          1849          1850          1851          1852 
##  -36.55009552   20.15772326 -110.77480977   -0.22921545  -68.86496581 
##          1854          1855          1856          1857          1858 
##  -91.35362869  -35.99724606 -155.38904343   32.28121992  -79.06783609 
##          1859          1860          1862          1863          1864 
##   13.85663968  -32.47884377   -3.03222270  -31.98227477  -51.98700338 
##          1867          1868          1869          1871          1872 
##   80.54235732  117.08769085  172.72301355  -61.32245243  152.72372799 
##          1875          1876          1877          1879          1880 
##   30.92129805   26.97879654   52.25932447   19.10290807   38.23304190 
##          1881          1882          1883          1884          1885 
##   89.59286869  -81.32719536  125.23953079  -82.40121943  -12.85382635 
##          1886          1887          1889          1892          1893 
##   75.28715598  -46.35377698 -162.06407322   68.09942412   78.38323712 
##          1894          1895          1896          1897          1898 
##   16.29197620   38.07052132  -11.89427747    7.66069858   10.10895566 
##          1899          1900          1901          1904          1905 
##  114.70573416  -34.47127263  -96.95116782    6.58925285 -123.49720921 
##          1906          1908          1911          1916          1917 
##    3.89509147  -32.03157386   23.45833852  135.53613202  -39.30271273 
##          1920          1921          1922          1923          1924 
##  -45.24461256  108.71075931    2.74090720   35.45093889  -31.66934715 
##          1926          1927          1930          1932          1934 
##   22.90339355   20.22331144  -11.63826989  -48.62665003 -146.99543480 
##          1935          1939          1940          1942          1944 
##  -27.10921412   -1.08136872  -16.07462997  -28.35040593 -140.75719713 
##          1946          1948          1949          1950          1951 
##   36.79085953  -65.51588086  -46.14480894   -3.93079549   -1.10422877 
##          1952          1953          1954          1955          1956 
##   31.93288894   28.86327966  -58.50599835  -11.18192498   53.82563636 
##          1957          1959          1960          1961          1962 
##   37.28327539   73.74490243   63.95092243   24.56253281  -59.80330404 
##          1963          1964          1965          1966          1968 
##  -74.68125639    0.14724106   31.49556516   86.27235753  -34.07620723 
##          1969          1973          1974          1975          1976 
## -103.25911289  127.77510937  144.90952466   75.88455761  -77.48364431 
##          1978          1979          1980          1981          1982 
##   23.62183156  -53.45924076   -2.29548368   87.15949954  140.21529955 
##          1986          1988          1989          1990          1993 
##   21.53191833  -76.37611092    2.62549251  -41.94962580   10.62282736 
##          1997          1998          2001          2002          2004 
##    3.20234983  -55.12934608   39.53409039   43.14594742  -33.81985608 
##          2009          2011          2012          2014          2015 
##   40.75649643  -81.28368581  -64.47132124    0.95285042  121.59480712 
##          2017          2019          2020          2021          2023 
##   15.10293961    3.89550863   11.02832172  -12.15883955  -48.82914516 
##          2024          2026          2028          2030          2031 
##   17.80296232 -105.78715410  -37.12611782  -67.75791297   -4.81823999 
##          2032          2033          2034          2035          2036 
##    2.63402802  -21.68348822 -152.44785238   10.77431259  -10.80492742 
##          2037          2038          2039          2040          2041 
##  -24.85414763  -48.32840010   19.37226975   23.31906863   -6.35281314 
##          2042          2043          2045          2048          2050 
## -147.04380134    0.79664929  -87.84082049  -34.00200727  -51.82671995 
##          2051          2053          2054          2056          2057 
## -103.93418528 -113.68172009  -52.70784024   46.23181514   59.03482172 
##          2058          2059          2061          2062          2063 
##   83.51376723   82.09349696  -17.19687867   66.39163327  -36.17882698 
##          2066          2067          2068          2070          2071 
##  -33.86048149  -14.79869890 -195.11933862   12.77211671  -53.59347440 
##          2072          2074          2078          2081          2082 
## -116.43922907  -20.00043170   22.40766127  -59.43369364   11.17029274 
##          2083          2084          2085          2086          2087 
##   85.62598836  -55.18622232 -179.43927439 -109.44506569  116.13882978 
##          2088          2090          2092          2093          2094 
##   91.99851479   17.50799278   15.80766557  -72.98689743   42.70613848 
##          2095          2096          2097          2099          2100 
##   52.07635641    4.91677395  -62.26701123  -85.18438374   67.45994015 
##          2101          2102          2103          2104          2105 
##   75.73700925   33.88523285   48.61704938    7.35294747    2.59525413 
##          2106          2107          2108          2110          2111 
##    1.35484387   56.28003188  -17.58539787   39.41577049   36.16730955 
##          2114          2115          2116          2118          2121 
##   55.76294176  202.00162053   32.83455621   27.83247416  124.46712672 
##          2122          2124          2125          2126          2127 
##    4.46163087   32.69053150   27.49658724  -27.00405909  -14.24191843 
##          2129          2130          2131          2132          2133 
##   24.28097797  -37.88588466   97.22496257   -1.06109473  146.30360739 
##          2134          2137          2138          2139          2140 
##  -75.62576708   73.58291737   13.79119955  -96.17869841 -117.39998282 
##          2141          2145          2146          2147          2148 
##  -80.78059967   -0.57859660  101.14460690   40.21253202  148.04479821 
##          2149          2150          2151          2152          2153 
##   22.49801743   50.45308046 -106.39801720   70.39374056   30.04186666 
##          2154          2156          2157          2158          2159 
##  129.86994984   85.21536446  131.20743925  136.62014762   31.44642890 
##          2162          2163          2165          2167          2172 
##   23.90102119   -6.97847790   51.17806456  -17.41593953  -77.13726642 
##          2175          2177          2178          2179          2181 
##    8.48763679 -110.55459878   28.51322075   76.53279689  -78.45372612 
##          2184          2185          2187          2188          2189 
##  -36.16116317  -38.28861906  -22.27690672  -56.84348090   35.89025155 
##          2191          2193          2194          2195          2196 
## -122.73052852  -14.50380120  -88.96226845  -81.45545657   16.69323757 
##          2201          2205          2207          2208          2210 
##  118.74500430   47.74660120   18.58864953  -17.78632560  -26.38000049 
##          2211          2212          2215          2216          2218 
##   28.70411221  -10.50812734  -60.63433134   22.63481282    0.83671868 
##          2220          2224          2225          2227          2228 
##   -5.18816701 -128.84793175   43.19608355  -85.88825616  123.96457169 
##          2233          2236          2237          2238          2241 
##  -12.49567665  113.70278980   81.93967758   12.31227642   71.27061932 
##          2242          2244          2245          2246          2247 
##  -20.56086549   71.72576598   90.91647865    2.74750164    7.34515109 
##          2248          2250          2251          2252          2253 
##   40.66273589   45.58009453   15.69008168  -41.36693273  -77.83129849 
##          2254          2255          2256          2258          2260 
## -104.41593992  -12.41394447    9.86793103 -101.34135383 -130.79799055 
##          2263          2264          2265          2266          2267 
##   18.49343426  144.87545745 -132.05203732   84.17109818  -46.23005396 
##          2268          2270          2272          2273          2274 
##   13.02743325 -161.63534858   92.42858485  -23.61790418   -1.28351779 
##          2275          2276          2278          2279          2280 
##   99.79512072   25.58426079  -62.85243639   58.11023246   62.85480506 
##          2281          2282          2284          2285          2287 
##  -77.09302011 -101.70858471   46.32426828  106.53447311   -9.52089132 
##          2289          2290          2292          2294          2295 
##  106.18270688 -108.13889063  128.75643441   13.13630883   55.64635259 
##          2296          2300          2301          2302          2303 
##   10.79332996  -65.56578966   35.17451182  -37.23912397   71.90387548 
##          2304          2305          2306          2309          2310 
##  -12.37770993   82.36707200 -108.05408798   17.14000162   10.29764885 
##          2311          2312          2313          2316          2318 
##    4.43402863  -64.74084203  -97.86722395   19.55822620  124.94465534 
##          2319          2321          2324          2326          2328 
## -126.65478506   18.32582106  -10.16695311  -11.83610626 -145.22802427 
##          2330          2331          2332          2333          2334 
##  -63.35997882 -100.40567855 -138.98485638   -0.92886322 -151.93644957 
##          2335          2337          2339          2340          2341 
##  -25.61285986  -67.39170445   43.07963233 -191.28113523  -27.04539363 
##          2342          2343          2344          2346          2348 
##  -83.36096166  133.46808968    1.25083008  -54.17513316   76.39085837 
##          2349          2350          2351          2353          2355 
## -217.51820373  -86.60088724   11.88973713   73.97094899 -112.14334369 
##          2356          2359          2360          2361          2365 
##  -42.94358500   -8.93174839  -36.96534172  139.99312864   -8.54792150 
##          2366          2367          2369          2371          2372 
##   14.49564429   53.89329699    1.35699835   52.08667206 -121.45015874 
##          2374          2375          2376          2377          2379 
##  -40.02869937  -22.31838952   54.46422182  -30.91039531   44.71964382 
##          2381          2382          2386          2387          2392 
##    6.15437761    1.57382633  -28.06517653  -61.21606473  -20.54049607 
##          2394          2398          2399          2400          2401 
##   50.59521255  113.81272838   74.31419504   16.68256654 -118.59569069 
##          2402          2403          2405          2408          2410 
##  -69.08293361  125.58362755 -173.88107026  -43.60591447  -18.96336636 
##          2411          2413          2414          2415          2416 
##   18.07392923   24.65843503 -103.27693971   -1.61910668  -58.06123943 
##          2417          2419          2420          2421          2422 
##  -46.07620595   23.87384476  -50.31699755   55.24505641 -157.42333879 
##          2423          2424          2426          2427          2428 
##   32.37315803  -53.80976741  -43.46447564 -120.38566969   84.86120458 
##          2429          2430          2431          2432          2433 
##   -9.08774819  102.97407772   76.52428815  -36.52111081  -11.38978994 
##          2434          2437          2438          2439          2440 
##   83.81734575   36.22665121  -63.56405945   25.43901519   83.08565081 
##          2441          2442          2443          2444          2447 
##  -38.78709652  -14.08212355 -147.83169815  -51.30745465  -40.43931320 
##          2448          2449          2451          2453          2454 
##   57.53316605   81.06114394   -2.64758213  -12.72839413  -16.20561542 
##          2455          2460          2463          2464          2466 
##    3.11489753 -218.10861412   -5.22694673   43.99674884   31.76921647 
##          2467          2469          2471          2472          2473 
##  117.73479821  -81.29154875   52.02642736  102.71739342  -49.01888819 
##          2474          2476          2477          2478          2480 
##  -72.69952908   -1.69228114   84.89814471   24.20451060   37.92383323 
##          2483          2484          2485          2486          2488 
##  -11.51558160 -170.35822553   -7.73696432   75.84676033  -79.35075749 
##          2490          2491          2492          2494          2495 
##  -89.98140773   56.10002502   67.28613031   -1.65097336   31.58811772 
##          2496          2497          2499          2500          2501 
##  -26.45908144  -19.03505694  -52.81484755  161.12396787   38.81628512 
##          2502          2504          2505          2506          2510 
## -100.89152466   25.65799278   77.55031184  -37.08682767   65.03742606 
##          2512          2513          2514          2515          2516 
##  -32.62047501  -67.77961068   38.57374256  -51.23576337  -18.17603922 
##          2517          2521          2522          2524          2527 
##   31.51562065  -26.57119635  -12.03839118   18.22568705   72.67882822 
##          2528          2530          2531          2532          2533 
##  -64.04427560  -13.05821651   30.51374734   48.74582620   41.53111096 
##          2534          2535          2536          2538          2539 
##   64.25655513   41.93942245   24.73359765  183.42345398  -66.40674129 
##          2540          2541          2542          2543          2544 
##  -39.09780098  -76.74460232   -4.93575690  -98.56955926  103.19044850 
##          2545          2546          2547          2548          2549 
##   24.53222654   64.31947330   68.51457644  -14.21331836  -72.50105066 
##          2551          2552          2554          2557          2558 
##  -48.30867091   64.40888511   35.97646646  -95.17873414   16.27054297 
##          2560          2561          2563          2564          2565 
##  116.32027086 -106.30395856    9.68817186  -36.22283677  -57.19807776 
##          2566          2567          2568          2570          2571 
##  -31.55036028   35.75290266  -40.40340699  -15.03410222 -134.25029158 
##          2572          2574          2575          2576          2579 
##   14.46188660   20.55672741    5.73427794   48.29199671  -85.29716642 
##          2580          2581          2582          2583          2584 
##  -25.49573109  -33.12258042  -71.09347218   40.88868172  -56.48736616 
##          2585          2586          2587          2588          2590 
##  102.44799802   18.14197061   63.19779658 -129.89771054  -56.42243486 
##          2593          2594          2595          2596          2597 
##  -26.57029242  -38.75514826  -21.78665673   62.48720527  104.80437667 
##          2598          2600          2602          2605          2606 
##   -1.15681087 -158.53150754  -15.96188792   58.44757828  -41.41069332 
##          2607          2609          2611          2613          2615 
##  109.20685079   47.77883228   70.07790958  127.09069784  -37.47222831 
##          2617          2618          2619          2620          2622 
##  -17.81559827  -42.33282916  -35.08066977  -67.72807230    6.50178371 
##          2623          2624          2625          2626          2627 
## -145.68668721   86.12474093  -24.11672802 -153.85527292 -104.33693373 
##          2628          2629          2632          2633          2639 
##  -98.05461061  -76.58660358  -60.03248050  -32.07456136   81.77109920 
##          2640          2642          2643          2644          2646 
## -114.44782923 -187.10805233   39.75938816   21.52984826 -108.45838878 
##          2647          2648          2649          2650          2653 
##  116.14818888   40.28397486  -70.96316480    5.16838294  -28.63620617 
##          2654          2655          2656          2657          2660 
## -119.81103119   62.71236074    3.93623817  123.75349953    8.19072495 
##          2661          2663          2664          2667          2668 
##  -46.03207523  -75.36633025  -47.69172774   33.82753900   -0.66222800 
##          2669          2672          2674          2675          2676 
##  -52.03873267 -127.39231415  -63.04404295   85.64820725  -34.57114177 
##          2677          2681          2682          2683          2684 
##  110.10719208    8.98068865   18.06196940  -71.03945070  -39.31740071 
##          2688          2689          2690          2691          2693 
##  -19.37246989   71.74355710  -28.22747064  129.14094965  -48.81603751 
##          2694          2696          2699          2701          2702 
##   70.75362778   92.11423602  -11.32139536 -160.92156287  110.95009466 
##          2704          2705          2708          2712          2716 
##   26.41317969 -101.42253034   37.13592208 -201.11210744  -38.89815191 
##          2717          2718          2719          2720          2722 
##  -85.65286151  -84.68192610   57.14428203   31.69321943  -89.66838764 
##          2725          2727          2728          2729          2732 
##   22.11410225   45.20909532   22.01626331   97.29451338   82.15040144 
##          2733          2735          2737          2738          2739 
##  110.77718929   64.32553297    1.93890897   30.18073521  166.12881114 
##          2742          2743          2745          2746          2748 
## -116.30702409   52.92649725   28.59366452    8.38036010   15.92520645 
##          2750          2751          2752          2754          2755 
##   -0.12561928   76.91997193   -7.34560385  -32.43100570  139.90778719 
##          2756          2757          2758          2760          2763 
##  -16.26765856  122.77577965   -5.00147734  -86.22944559  135.37676241 
##          2765          2768          2769          2770          2771 
##  -64.69577194   29.86374819  -40.04468311   90.99922237   88.17535111 
##          2772          2773          2774          2775          2776 
##  -65.53392523   66.35306757  -46.83091720   24.75913988   88.47608806 
##          2777          2778          2780          2781          2782 
##   97.51765809   48.76625810   -2.88016616  210.75774707  -64.45594383 
##          2785          2786          2787          2790          2791 
##   42.37419563   -1.08602988  -98.17688704  -31.28135490   97.94043247 
##          2792          2794          2796          2797          2803 
##    9.96021899   38.42463463  -32.05985144  145.95418049  -69.78515398 
##          2804          2805          2807          2809          2810 
## -105.97094501   33.67402034   12.33358299  -47.60188221  -80.46945334 
##          2811          2812          2813          2814          2815 
##    7.39940569   -6.48201060   56.85483412   19.81192128  -65.05620062 
##          2816          2818          2819          2820          2821 
##   -5.18737349   24.16774798    3.67113475   72.43560213  -19.55131020 
##          2824          2825          2826          2827          2830 
##  106.32918001   39.60159495  -89.95580344   -3.09240735   16.89111795 
##          2831          2832          2833          2834          2837 
##   -0.75919887   81.84951510  126.39629987  -48.96589663 -107.79879336 
##          2838          2839          2840          2841          2843 
##  -23.68056438  -50.72277882   77.22033143    1.30549470  141.79492610 
##          2844          2845          2846          2848          2849 
##  -26.08615698   53.42462605   21.17982513  -78.39361044  117.58951594 
##          2850          2851          2852          2854          2855 
##  -28.63924679  -45.91234136  -83.15574656   26.11304830    7.14164108 
##          2856          2857          2859          2860          2862 
##  -72.49978752  -54.26423402   15.02601495  -98.03737015   11.02408054 
##          2863          2864          2865          2869          2871 
##  -83.47617098  -87.50983603    7.22838649   68.04703921  141.11797276 
##          2872          2874          2877          2878          2879 
##  -51.72013737  -56.04559588  135.73026534 -110.52602935  -12.24079970 
##          2881          2882          2884          2885          2887 
##   74.10982779  -51.32540170  -72.09284323  -11.45326885   59.76678667 
##          2888          2889          2890          2891          2892 
##  -83.08797582    3.33608982  -16.46833214  153.71176369  -44.48275992 
##          2893          2895          2896          2899          2905 
##  -94.02717501   29.61582192   70.70872684   74.15869595  -95.51699022 
##          2906          2907          2908          2909          2910 
##   14.15177486   22.47124574 -106.92871042 -140.49187140  -70.39203019 
##          2912          2913          2914          2915          2916 
##   14.00801465   22.29333198   21.14187932  101.90174740  -16.94265150 
##          2917          2918          2919          2920          2921 
##   37.58242919  -41.32472691    3.09868213    0.07276469   61.34614682 
##          2922          2925          2926          2927          2929 
##   53.19840167   38.24224772  -80.50612093   76.56909323   -3.27240116 
##          2930          2931          2933          2934          2935 
##    3.54847220   82.98114055   27.97164445   25.88227585   44.60120733 
##          2936          2937          2938          2941          2942 
##    4.12783878  104.44455079  -23.98506983    8.23818888 -151.21986255 
##          2944          2948          2949          2950          2952 
##   17.29752511   52.91391045   -0.46369824   49.10687556 -145.18041633 
##          2953          2954          2955          2956          2957 
##  -29.94636029   74.21516126  106.38679437   65.50177477  -13.23375612 
##          2958          2959          2961          2962          2963 
##   70.50063452 -131.30518703   43.90221078   77.39393005  -79.33137797 
##          2964          2965          2967          2968          2970 
##   77.48058462   25.42716832   91.17615782   35.27276057  -35.94971234 
##          2972          2976          2977          2978          2980 
##  -61.83384781   -1.12911900   62.26012112   39.68146582   84.16584537 
##          2981          2982          2985          2986          2988 
## -172.85974294  -50.36400114 -100.39109312   66.86510701   19.42111949 
##          2989          2990          2993          2994          2996 
##  -31.21567694  -57.90877381  -49.45139847   36.66893400  129.42082245 
##          2998          2999          3001          3003          3004 
##  192.45905636  -44.19050624  -31.16328549   39.59421055   97.16973416 
##          3005          3006          3010          3012          3013 
##  -70.96444385   -8.78560407  136.41783375  117.92252852   18.47893809 
##          3014          3017          3018          3019          3022 
##  -55.47103017  -83.01860310   31.61988796  -34.04627554  -48.66290416 
##          3023          3024          3025          3026          3027 
##    3.23290353   38.68175156  118.04380922   56.66995544  -57.17765098 
##          3028          3030          3032          3034          3036 
##  -17.50620537  -45.85565145   55.69026261   96.76011856   48.98079020 
##          3037          3039          3040          3043          3044 
##  -36.06826966   -8.88729635  -38.27882966   73.53483379    2.28117175 
##          3045          3046          3047          3048          3049 
##    9.15939412  -11.04385862  109.32412329   13.12148450   43.92272425 
##          3053          3054          3055          3056          3057 
##  -27.73456521   38.46774198  -89.49974813  -43.36277083   68.67327224 
##          3058          3059          3060          3061          3062 
##  -21.78994563 -169.57077623   31.32202113  -28.48798906  157.24952799 
##          3064          3065          3066          3067          3069 
##  -58.67889746   -3.29194161   44.15314609   51.03292243   80.88739006 
##          3070          3072          3074          3075          3076 
##    7.30971247   49.86111541   38.82714087   60.09761164  -65.42786737 
##          3077          3078          3079          3080          3084 
## -103.28040973  -99.43801695   18.59560461   92.43558997 -153.59202683 
##          3086          3088          3091          3092          3093 
##  161.88829672 -118.22400800   80.89403447   -7.34244274  135.82047524 
##          3095          3097          3099          3101          3102 
##  -31.46529259  193.60582701 -231.57897253   -8.77482642  -53.68689616 
##          3103          3107          3109          3110          3111 
##  -61.56422025  -34.00302211   58.63749278  -63.47156160   13.12023017 
##          3112          3113          3114          3115          3116 
##   74.89616666  -35.15265309    4.27505636   35.27834853  -35.79735104 
##          3120          3121          3123          3124          3125 
##   14.13333562  -58.41135685  -27.80520452   -1.55719680   17.68948488 
##          3127          3128          3130          3132          3133 
##  -34.03050874 -104.48699109  -76.67921747   32.36119636   15.68393728 
##          3134          3135          3136          3137          3138 
## -133.49118065   67.30730890   37.38983014   79.54351353  -47.09781050 
##          3139          3140          3143          3144          3146 
##   43.21654332   72.61743189   48.44352642   -5.55179046  142.34229612 
##          3148          3149          3150          3152          3156 
##   -0.88337911 -108.55619230   66.25275888  -45.92961068   32.46449917 
##          3157          3158          3160          3161          3162 
##   84.72601329 -136.17343964   20.89486031   28.44232926  -22.74917889 
##          3163          3164          3165          3166          3171 
##  141.05557217   24.61193443   57.50818710  -10.53510436   56.16629181 
##          3172          3173          3174          3175          3176 
## -146.90349530  101.09044706  -21.82726403  -11.21160695  -15.75927719 
##          3177          3178          3179          3180          3181 
##    2.97977874  -28.07278390  -41.17646863 -126.12766576  -62.47283820 
##          3182          3184          3186          3188          3191 
##  -75.10560005   86.96100747   36.74130095    0.50482107   -1.23330669 
##          3192          3193          3194          3198          3199 
##  -12.52604566   13.88994899   88.56368236   67.90670511    8.57314471 
##          3200          3202          3204          3207          3208 
##  -76.63699242   18.38578397   15.14929631   37.93429326   36.79895251 
##          3210          3211          3212          3213          3216 
##  -49.10491536   66.21025933   72.77609570  -67.22025834  -64.77378841 
##          3218          3221          3222          3226          3227 
##  -67.47354651  105.06865521   65.66870000   47.32293102 -155.31887016 
##          3228          3229          3230          3232          3234 
##  -87.97506714   59.88601744   79.47598819   30.10907306  -15.59507077 
##          3235          3236          3237          3238          3239 
##   60.20334124  100.45629867    7.79127515   15.90126892   17.74691852 
##          3240          3241          3242          3243          3245 
##   32.96234980 -111.10016764  -36.77951845  -58.83588394   80.25509814 
##          3246          3247          3248          3249          3250 
##  -45.82301023  -99.42909857    4.25805091  -12.39466050   45.45579935 
##          3251          3252          3253          3255          3256 
##  -98.03921443   45.94713835   -6.02424675   81.21678718 -135.16470020 
##          3257          3258          3259          3260          3261 
##   17.82906964   11.45636872   68.43043598  100.87557341   31.49854417 
##          3265          3266          3267          3269          3271 
##   17.84206196  -11.84663659   40.63796051   89.99672281 -215.81717392 
##          3272          3275          3276          3278          3280 
##   40.14279028   66.92385797   72.93447457  -10.17277279   73.33089001 
##          3281          3282          3283          3284          3285 
## -169.98627602  -31.57488375   70.21548132  -94.85975885    2.52112718 
##          3286          3287          3288          3290          3291 
##  -62.62577861  -70.62955033 -142.21838927   17.87876308  153.96602197 
##          3292          3295          3297          3299          3300 
## -132.07931255  -16.17541978  -30.90275882  -26.71589275   65.09276042 
##          3301          3303          3304          3305          3306 
##   56.36293632   31.20821716  -84.10458511   23.08422806  -46.40531370 
##          3307          3308          3310          3312          3313 
##   13.65225573  -28.57651373  -14.95281923  -10.19402496   42.43015577 
##          3315          3317          3318          3319          3320 
##   69.21619228    4.07216650    0.25751692  133.55543848  -60.86602798 
##          3321          3323          3324          3325          3327 
##   94.43225015   45.93147587  -12.84286495    3.59488451  -21.24199428 
##          3331          3332          3333          3335          3337 
## -116.29847041 -107.01266493  -62.98304665  -30.41496024   58.65430844 
##          3338          3340          3342          3343          3344 
##  -30.59293553   34.72217256 -166.52131516  -21.22359154 -126.20277738 
##          3347          3350          3351          3352          3353 
##  120.89819200  -33.26813185   57.31767616   11.66863843 -168.33443950 
##          3354          3356          3357          3359          3360 
##  -49.00193749 -127.25973125   71.03634492  -33.71783082    4.25168991 
##          3365          3366          3367          3368          3370 
##   34.63429248  -83.45855578  -26.92586643   30.84353091  -17.80461761 
##          3371          3375          3376          3377          3380 
##  -37.85068683   11.52096286   52.14437107   49.44464199   78.95677892 
##          3381          3382          3383          3385          3386 
##   77.39335216  -63.05724561  -92.01292759   32.60475516   58.67016658 
##          3387          3389          3390          3391          3393 
##   42.28193203  -47.29933192   83.61310396   97.67839943 -206.90639519 
##          3394          3395          3397          3398          3399 
##  164.23506068  -57.48958822 -122.83667346  -88.95750924  129.92737691 
##          3402          3403          3407          3408          3409 
##  -47.87743257  -81.18138197  -44.74784560   92.61345023  -22.84111946 
##          3410          3411          3412          3414          3415 
##   84.80248857   59.06691526   44.38343908   20.87291820   84.27423287 
##          3417          3419          3420          3425          3426 
##   51.89591797  153.67484435  148.31738115   50.40470162   75.36607909 
##          3427          3428          3430          3432          3433 
##  -33.21673214   21.05135009  -25.22466375   20.56653026   50.22891007 
##          3435          3436          3437          3439          3440 
##  -39.41465767 -231.21266826   45.09945999  -67.68847611   28.59038890 
##          3441          3442          3445          3447          3449 
##   44.43236043  -77.33296659    8.29457415  -56.96394364  -18.99141864 
##          3450          3452          3454          3455          3456 
##  167.18220585  -52.11848248  -79.46463393   66.44930735   24.75314409 
##          3459          3460          3461          3462          3463 
##  203.00350610   40.67209412 -104.84886263   22.43098615  115.96544373 
##          3464          3465          3467          3468          3469 
##   64.05528019  -67.80333508   19.93790403  158.82026089  -18.50900804 
##          3470          3472          3473          3475          3479 
##   56.13268933   75.35863902   25.13226382   45.90008992 -132.79441158 
##          3482          3483          3485          3486          3488 
##  -23.40159749   13.22521838 -109.54723474   45.15945609  118.89758285 
##          3490          3491          3493          3494          3495 
##    9.16064016  -61.17900604 -114.38555407  -79.83886653 -124.63740262 
##          3498          3502          3503          3504          3505 
##  -67.68539590 -202.52748468   33.32338110  -10.09688491  -73.21397899 
##          3506          3508          3510          3511          3513 
##  122.11188069  109.49097510   77.56993007   20.69876039 -130.84343532 
##          3514          3517          3520          3521          3526 
##   45.51527849  -68.32113877  101.90745678   12.01509346   93.90963000 
##          3527          3528          3531          3532          3533 
##   19.72059517  -58.07203761   79.76859341   28.62799924   -7.69132860 
##          3536          3537          3538          3539          3540 
##  -99.90063296 -168.41577723   69.16818888  -88.38450003   72.37103453 
##          3541          3542          3543          3544          3546 
## -131.69474438 -112.58899443   90.19301757   48.76782212 -189.41261101 
##          3548          3549          3550          3551          3553 
##   49.55883043  -32.00655643   35.02084229   30.01341404  -81.98550662 
##          3559          3560          3561          3562          3563 
##  -81.34755788  -51.42304119   66.19962942   10.44945109   -5.95144419 
##          3565          3567          3568          3569          3571 
## -184.37188995  -12.08513829  -45.10160497  -80.35313276  -15.13839988 
##          3573          3576          3577          3578          3580 
## -112.96489476 -109.07535587  -36.08806557  102.06141895   -8.96246806 
##          3582          3583          3584          3585          3586 
##  -27.64277600  -87.41209917   59.24241445  -40.23420453   57.35833510 
##          3588          3591          3593          3594          3596 
##   15.02839138  -39.05423604  -24.03239490   -4.10719458  -14.65402609 
##          3597          3598          3599          3600          3603 
##   35.49430230   -9.07491924 -136.73885862   75.74881959  132.68561508 
##          3604          3605          3606          3609          3612 
##   61.80560461   80.61911789  -71.96844883  -11.35975159  -83.92607768 
##          3613          3614          3615          3616          3618 
##   77.74935795  -75.50896176  -82.63048526  -85.16222908  -24.90692088 
##          3619          3620          3621          3622          3623 
##   13.14622035  -90.27817317   77.41464562  -83.29736207   33.39253620 
##          3624          3625          3626          3628          3629 
##  -55.57408547  -42.44761461  -92.33594669   52.46404267  -94.77250424 
##          3631          3632          3633          3636          3638 
## -104.80861360  105.30688034   68.34500681 -137.13614744  -42.77113340 
##          3640          3641          3643          3645          3646 
##   68.47115959 -127.78364818  -32.66147133   33.07570230  -73.40799729 
##          3648          3650          3654          3655          3656 
##   56.05384070  -26.96547365   -8.34856124  -24.38091974   85.68133728 
##          3658          3660          3661          3663 
##   -2.69405972   -3.08990547    7.97710526   -5.51336867
# I would calculate RMSE by sqrt(sum(res$residuals^2) / res$df): https://stackoverflow.com/questions/43123462/how-to-obtain-rmse-out-of-lm-result
RMSE <- sqrt(sum(lmScore$residuals^2) / lmScore$df.residual)
RMSE
## [1] 73.81024

Predict

Using the “predict” function and supplying the “newdata” argument, use the lmScore model to predict the reading scores of students in pisaTest. Call this vector of predictions “predTest”. Do not change the variables in the model (for example, do not remove variables that we found were not significant in the previous part of this problem). Use the summary function to describe the test set predictions.

What is the range between the maximum and minimum predicted reading score on the test set?

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
predicTed <- predict(lmScore, newdata = pisaTest)

summary(predicTed)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   353.2   482.0   524.0   516.7   555.7   637.7
max(predicTed) - min(predicTed)
## [1] 284.4683
# SSE on testing dataset

SSE_1 <- sum((predicTed - pisaTest$readingScore)^2)
SSE_1
## [1] 5762082
RMSE_1 <- sqrt(mean((predicTed - pisaTest$readingScore)^2))
RMSE_1
## [1] 76.29079
baseline <- mean(pisaTrain$readingScore)
baseline
## [1] 517.9629
SST <- sum((pisaTest$readingScore - mean(pisaTrain$readingScore))^2)
SST
## [1] 7802354
R_sq <- 1 - SSE_1/SST
R_sq
## [1] 0.2614944

Flue epidemics chapter

Before applying analytics tools on the training set, we first need to understand the data at hand. Load “FluTrain.csv” into a data frame called FluTrain. Looking at the time period 2004-2011, which week corresponds to the highest percentage of ILI-related physician visits? Select the day of the month corresponding to the start of this week.

FluTrain <- read.csv("FluTrain.csv")
str(FluTrain)
## 'data.frame':    417 obs. of  3 variables:
##  $ Week   : chr  "2004-01-04 - 2004-01-10" "2004-01-11 - 2004-01-17" "2004-01-18 - 2004-01-24" "2004-01-25 - 2004-01-31" ...
##  $ ILI    : num  2.42 1.81 1.71 1.54 1.44 ...
##  $ Queries: num  0.238 0.22 0.226 0.238 0.224 ...
hi_per <- subset(FluTrain, ILI == max(ILI))
hi_per
##                        Week      ILI Queries
## 303 2009-10-18 - 2009-10-24 7.618892       1

Understanding the data

Let us now understand the data at an aggregate level. Plot the histogram of the dependent variable, ILI. What best describes the distribution of values of ILI?

hist(FluTrain$ILI)

# plot nat log of ILI vs queries

When handling a skewed dependent variable, it is often useful to predict the logarithm of the dependent variable instead of the dependent variable itself – this prevents the small number of unusually large or small observations from having an undue influence on the sum of squared errors of predictive models. In this problem, we will predict the natural log of the ILI variable, which can be computed in R using the log() function.

Plot the natural logarithm of ILI versus Queries. What does the plot suggest?.

plot(x = FluTrain$Queries, y = log(FluTrain$ILI))

Linear Regression Model!

Based on the plot we just made, it seems that a linear regression model could be a good modeling choice. Based on our understanding of the data from the previous subproblem, which model best describes our estimation problem?

FluTrend1 <- lm(log(ILI) ~ Queries, data = FluTrain)
summary(FluTrend1)
## 
## Call:
## lm(formula = log(ILI) ~ Queries, data = FluTrain)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.76003 -0.19696 -0.01657  0.18685  1.06450 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.49934    0.03041  -16.42   <2e-16 ***
## Queries      2.96129    0.09312   31.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2995 on 415 degrees of freedom
## Multiple R-squared:  0.709,  Adjusted R-squared:  0.7083 
## F-statistic:  1011 on 1 and 415 DF,  p-value: < 2.2e-16
cor(x = FluTrain$Queries, y = log(FluTrain$ILI))
## [1] 0.8420333
0.8420333^2
## [1] 0.7090201

Test set

The csv file FluTest.csv provides the 2012 weekly data of the ILI-related search queries and the observed weekly percentage of ILI-related physician visits. Load this data into a data frame called FluTest.

Normally, we would obtain test-set predictions from the model FluTrend1 using the code

PredTest1 = predict(FluTrend1, newdata=FluTest)

However, the dependent variable in our model is log(ILI), so PredTest1 would contain predictions of the log(ILI) value. We are instead interested in obtaining predictions of the ILI value. We can convert from predictions of log(ILI) to predictions of ILI via exponentiation, or the exp() function. The new code, which predicts the ILI value, is

PredTest1 = exp(predict(FluTrend1, newdata=FluTest))

What is our estimate for the percentage of ILI-related physician visits for the week of March 11, 2012? (HINT: You can either just output FluTest$Week to find which element corresponds to March 11, 2012, or you can use the “which” function in R. To learn more about the which function, type ?which in your R console.)

FluTest <- read.csv("FluTest.csv")
PredTest1 <- exp(predict(FluTrend1, newdata = FluTest))

FluTest$Week
##  [1] "2012-01-01 - 2012-01-07" "2012-01-08 - 2012-01-14"
##  [3] "2012-01-15 - 2012-01-21" "2012-01-22 - 2012-01-28"
##  [5] "2012-01-29 - 2012-02-04" "2012-02-05 - 2012-02-11"
##  [7] "2012-02-12 - 2012-02-18" "2012-02-19 - 2012-02-25"
##  [9] "2012-02-26 - 2012-03-03" "2012-03-04 - 2012-03-10"
## [11] "2012-03-11 - 2012-03-17" "2012-03-18 - 2012-03-24"
## [13] "2012-03-25 - 2012-03-31" "2012-04-01 - 2012-04-07"
## [15] "2012-04-08 - 2012-04-14" "2012-04-15 - 2012-04-21"
## [17] "2012-04-22 - 2012-04-28" "2012-04-29 - 2012-05-05"
## [19] "2012-05-06 - 2012-05-12" "2012-05-13 - 2012-05-19"
## [21] "2012-05-20 - 2012-05-26" "2012-05-27 - 2012-06-02"
## [23] "2012-06-03 - 2012-06-09" "2012-06-10 - 2012-06-16"
## [25] "2012-06-17 - 2012-06-23" "2012-06-24 - 2012-06-30"
## [27] "2012-07-01 - 2012-07-07" "2012-07-08 - 2012-07-14"
## [29] "2012-07-15 - 2012-07-21" "2012-07-22 - 2012-07-28"
## [31] "2012-07-29 - 2012-08-04" "2012-08-05 - 2012-08-11"
## [33] "2012-08-12 - 2012-08-18" "2012-08-19 - 2012-08-25"
## [35] "2012-08-26 - 2012-09-01" "2012-09-02 - 2012-09-08"
## [37] "2012-09-09 - 2012-09-15" "2012-09-16 - 2012-09-22"
## [39] "2012-09-23 - 2012-09-29" "2012-09-30 - 2012-10-06"
## [41] "2012-10-07 - 2012-10-13" "2012-10-14 - 2012-10-20"
## [43] "2012-10-21 - 2012-10-27" "2012-10-28 - 2012-11-03"
## [45] "2012-11-04 - 2012-11-10" "2012-11-11 - 2012-11-17"
## [47] "2012-11-18 - 2012-11-24" "2012-11-25 - 2012-12-01"
## [49] "2012-12-02 - 2012-12-08" "2012-12-09 - 2012-12-15"
## [51] "2012-12-16 - 2012-12-22" "2012-12-23 - 2012-12-29"
which(FluTest$Week == "2012-03-11 - 2012-03-17")
## [1] 11
PredTest1[which(FluTest$Week == "2012-03-11 - 2012-03-17")]
##       11 
## 2.187378

Relative Error

What is the relative error betweeen the estimate (our prediction) and the observed value for the week of March 11, 2012? Note that the relative error is calculated as

(Observed ILI - Estimated ILI)/Observed ILI

Obs_ILI <- FluTest$ILI[which(FluTest$Week == "2012-03-11 - 2012-03-17")]
Est_ILI <- PredTest1[which(FluTest$Week == "2012-03-11 - 2012-03-17")]
Rel_error <- (Obs_ILI - Est_ILI)/ Obs_ILI
Rel_error
##         11 
## 0.04623827

RMSE

What is the Root Mean Square Error (RMSE) between our estimates and the actual observations for the percentage of ILI-related physician visits, on the test set?

rmse_2 <- sqrt(mean((PredTest1 - FluTest$ILI)^2))
rmse_2
## [1] 0.7490645

Time series in the model

To do so, we will use the “zoo” package, which provides a number of helpful methods for time series models. While many functions are built into R, you need to add new packages to use some functions. New packages can be installed and loaded easily in R, and we will do this many times in this class. Run the following two commands to install and load the zoo package. In the first command, you will be prompted to select a CRAN mirror to use for your download. Select a mirror near you geographically.

install.packages(“zoo”)

library(zoo)

After installing and loading the zoo package, run the following commands to create the ILILag2 variable in the training set:

ILILag2 = lag(zoo(FluTrain$ILI), -2, na.pad=TRUE)

FluTrain$ILILag2 = coredata(ILILag2)

In these commands, the value of -2 passed to lag means to return 2 observations before the current one; a positive value would have returned future observations. The parameter na.pad=TRUE means to add missing values for the first two weeks of our dataset, where we can’t compute the data from 2 weeks earlier.

How many values are missing in the new ILILag2 variable?

library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
ILILag2 <- lag(zoo(FluTrain$ILI), -2, na.pad=TRUE)
FluTrain$ILILag2 <- coredata(ILILag2)

plot(x = log(FluTrain$ILI), y = log(FluTrain$ILILag2))

# Training a time series model

Train a linear regression model on the FluTrain dataset to predict the log of the ILI variable using the Queries variable as well as the log of the ILILag2 variable. Call this model FluTrend2.

Which coefficients are significant at the p=0.05 level in this regression model? (Select all that apply.)

FluTrend2 <- lm(log(ILI) ~ Queries + log(ILILag2), data = FluTrain)
summary(FluTrend2)
## 
## Call:
## lm(formula = log(ILI) ~ Queries + log(ILILag2), data = FluTrain)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52209 -0.11082 -0.01819  0.08143  0.76785 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.24064    0.01953  -12.32   <2e-16 ***
## Queries       1.25578    0.07910   15.88   <2e-16 ***
## log(ILILag2)  0.65569    0.02251   29.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1703 on 412 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.9063, Adjusted R-squared:  0.9059 
## F-statistic:  1993 on 2 and 412 DF,  p-value: < 2.2e-16
summary(FluTrend1)
## 
## Call:
## lm(formula = log(ILI) ~ Queries, data = FluTrain)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.76003 -0.19696 -0.01657  0.18685  1.06450 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.49934    0.03041  -16.42   <2e-16 ***
## Queries      2.96129    0.09312   31.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2995 on 415 degrees of freedom
## Multiple R-squared:  0.709,  Adjusted R-squared:  0.7083 
## F-statistic:  1011 on 1 and 415 DF,  p-value: < 2.2e-16

Problem 5.1 Time series on Test set

So far, we have only added the ILILag2 variable to the FluTrain data frame. To make predictions with our FluTrend2 model, we will also need to add ILILag2 to the FluTest data frame (note that adding variables before splitting into a training and testing set can prevent this duplication of effort).

Modify the code from the previous subproblem to add an ILILag2 variable to the FluTest data frame. How many missing values are there in this new variable?

ILILag2.1 <- lag(zoo(FluTest$ILI), -2, na.pad=TRUE)
FluTest$ILILag2.1 <- coredata(ILILag2.1)

Evaluating time series

Fill in the missing values for ILILag2 in FluTest. In terms of syntax, you could set the value of ILILag2 in row “x” of the FluTest data frame to the value of ILI in row “y” of the FluTrain data frame with “FluTest\(ILILag2[x] = FluTrain\)ILI[y]”. Use the answer to the previous questions to determine the appropriate values of “x” and “y”. It may be helpful to check the total number of rows in FluTrain using str(FluTrain) or nrow(FluTrain).

What is the new value of the ILILag2 variable in the first row of FluTest?

str(FluTrain)
## 'data.frame':    417 obs. of  4 variables:
##  $ Week   : chr  "2004-01-04 - 2004-01-10" "2004-01-11 - 2004-01-17" "2004-01-18 - 2004-01-24" "2004-01-25 - 2004-01-31" ...
##  $ ILI    : num  2.42 1.81 1.71 1.54 1.44 ...
##  $ Queries: num  0.238 0.22 0.226 0.238 0.224 ...
##  $ ILILag2: num  NA NA 2.42 1.81 1.71 ...
nrow(FluTrain)
## [1] 417
#Thus

FluTest$ILILag2.1[1] = FluTrain$ILI[416]
FluTrain$ILI[416]
## [1] 1.852736
FluTest$ILILag2.1[2] = FluTrain$ILI[417]
FluTrain$ILI[417]
## [1] 2.12413

Time series perform test

Obtain test set predictions of the ILI variable from the FluTrend2 model, again remembering to call the exp() function on the result of the predict() function to obtain predictions for ILI instead of log(ILI).

What is the test-set RMSE of the FluTrend2 model?

colnames(FluTest)[colnames(FluTest) == 'ILILag2.1'] <- 'ILILag2'

PredTest2 <- exp(predict(FluTrend2, newdata = FluTest))

RMSE_2 <- sqrt(mean((PredTest2 - FluTest$ILI)^2))
RMSE_2
## [1] 0.2942029

In this problem, we used a simple time series model with a single lag term. ARIMA models are a more general form of the model we built, which can include multiple lag terms as well as more complicated combinations of previous values of the dependent variable. If you’re interested in learning more, check out ?arima or the available online tutorials for these sorts of models.

ASSIGNMENT OFFICIALLY ENDS HERE – Optional tasks 2 more! if you wish