QUESTION 1

United Nations (Data file: UN11) The data in the file UN11 contains several variables, including ppgdp, the gross national product per person in U.S. dollars, and fertility, the birth rate per 1000 females, both from the year 2009. The data are for 199 localities, mostly UN member countries, but also other areas such as Hong Kong that are not independent countries. The data were collected from the United Nations (2011). We will study the dependence of fertility on ppgdp.

1.1.1. Identify the predictor and the response.

Predictor (x): PPGDP

Response (y): Fertility

1.1.2 Draw the scatterplot of fertility on the vertical axis versus ppgdp on the horizontal axis and summarize the information in this graph.

library(alr4)
?UN11

plot(x=UN11$ppgdp, y=UN11$fertility)

The scatterplot seems to suggest that the higher the PPGDP, the lower the number of children per woman.

Does a straight-line mean function seem to be plausible for a summary of this graph?

library(ggplot2)
ggplot(data=UN11, aes(x=ppgdp, y=fertility))+
  geom_point()+
  geom_smooth(method="lm", se=FALSE)

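No. The fitted line confirms that fertility falls as ppgdp rises, but the points follow a sharply curved pattern: fertility drops steeply at low ppgdp and then levels off at high ppgdp. A straight-line mean function does not seem plausible for this graph.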

1.1.3 Draw the scatterplot of log(fertility) versus log(ppgdp) using natural logarithms. Does the simple linear regression model seem plausible for a summary of this graph? If you use a different base of logarithms, the shape of the graph won’t change, but the values on the axes will change.

UN11_log_ppgdp<- log(UN11$ppgdp)
UN11_log_fertility<- log(UN11$fertility)

plot(x=UN11_log_ppgdp, y=UN11_log_fertility)

The scatterplot using logs still suggests that the number of children per woman decreases as PPGDP increases, and on the log scale the points fall roughly along a straight line. The simple linear regression model therefore seems plausible for this graph.
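The remark in the question about the base of the logarithm can be checked numerically. The sketch below uses simulated positive values as a stand-in for ppgdp and fertility (they are not the UN11 data, and the coefficients 2.5 and -0.2 are arbitrary assumptions). It fits the log-log regression with natural and base-10 logarithms and compares the results.

```r
# Simulated positive values standing in for ppgdp and fertility (NOT the UN11 data)
set.seed(1)
x <- exp(runif(50, min = 5, max = 11))
y <- exp(2.5 - 0.2 * log(x) + rnorm(50, sd = 0.3))

fit_ln    <- lm(log(y) ~ log(x))       # natural logarithms
fit_log10 <- lm(log10(y) ~ log10(x))   # base-10 logarithms

slope_ln        <- unname(coef(fit_ln)[2])
slope_log10     <- unname(coef(fit_log10)[2])
intercept_ln    <- unname(coef(fit_ln)[1])
intercept_log10 <- unname(coef(fit_log10)[1])

# Same slope and same correlation; only the intercept (the axis values) is rescaled
c(slope_ln = slope_ln, slope_log10 = slope_log10)
c(cor_ln = cor(log(x), log(y)), cor_log10 = cor(log10(x), log10(y)))
c(intercept_ln_rescaled = intercept_ln / log(10), intercept_log10 = intercept_log10)
```

Changing the base divides both axes by the same constant, log(10), so the shape of the point cloud, the slope and the correlation are all unchanged.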

QUESTION 2

Annual income, in dollars, is an explanatory variable in a regression analysis. For a British version of the report on the analysis, all responses are converted to British pounds sterling (1 pound equals about 1.33 dollars, as of 2016).

(a) How, if at all, does the slope of the prediction equation change?

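# Sample amounts on two scales that differ by the conversion factor 1.33
usdollar <- 1:10
pound <- seq(1.33, 13.3, length.out = 10)

# Element-wise ratio of the two scales (constant, equal to 1/1.33)
slope <- usdollar / pound
slope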

 [1] 0.7518797 0.7518797 0.7518797 0.7518797 0.7518797 0.7518797
 [7] 0.7518797 0.7518797 0.7518797 0.7518797

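I didn’t initially know how to solve this, so I put together a sample of values for dollars and pounds. The ratio between the two scales is constant at about 0.752 (that is, 1/1.33), which is just the conversion factor. Because income is the explanatory variable, expressing it in pounds divides every x value by 1.33, so the slope of the prediction equation is multiplied by 1.33 while the predictions themselves stay the same.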
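To see the effect on a fitted slope directly, here is a minimal sketch with made-up data: the incomes, the response and its coefficients are all hypothetical, and only the 1.33 conversion rate comes from the question. The same response is regressed on income in dollars and then on income in pounds.

```r
set.seed(42)
income_usd <- runif(40, min = 20000, max = 90000)    # hypothetical incomes in dollars
response   <- 10 + 0.0005 * income_usd + rnorm(40)   # hypothetical response variable

income_gbp <- income_usd / 1.33                      # 1 pound is about 1.33 dollars

slope_usd <- unname(coef(lm(response ~ income_usd))[2])
slope_gbp <- unname(coef(lm(response ~ income_gbp))[2])

# Ratio of the two slopes equals the conversion factor
ratio <- slope_gbp / slope_usd
ratio
```

A one-pound increase in income is the same as a 1.33-dollar increase, which is why the slope per pound is 1.33 times the slope per dollar.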

(b) How, if at all, does the correlation change?

cor.test(usdollar,pound)


    Pearson's product-moment correlation

data:  usdollar and pound
t = 189812531, df = 8, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 1 1
sample estimates:
cor 
  1

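The correlation test on the sample data shows a correlation of 1 between the dollar and pound values, which is expected because one is an exact multiple of the other. Correlation does not depend on the units of measurement, so converting income from dollars to pounds leaves its correlation with the response unchanged.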
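The unit-free property can also be checked against a response variable rather than between the two currency scales. This is a sketch on simulated data (the incomes and the response are hypothetical; only the 1.33 rate is from the question).

```r
set.seed(7)
income_usd <- runif(40, min = 20000, max = 90000)          # hypothetical incomes in dollars
response   <- 10 + 0.0005 * income_usd + rnorm(40, sd = 5)  # hypothetical response
income_gbp <- income_usd / 1.33                             # same incomes in pounds

cor_usd <- cor(income_usd, response)
cor_gbp <- cor(income_gbp, response)
c(cor_usd = cor_usd, cor_gbp = cor_gbp)

# Standardised incomes (z-scores) are identical in either currency,
# which is why the correlation cannot change
z_usd <- as.numeric(scale(income_usd))
z_gbp <- as.numeric(scale(income_gbp))
all.equal(z_usd, z_gbp)
```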

QUESTION 3

Water runoff in the Sierras (Data file: water) Can Southern California’s water supply in future years be predicted from past data? One factor affecting water availability is stream runoff. If runoff could be predicted, engineers, planners, and policy makers could do their jobs more efficiently. The data file contains 43 years’ worth of precipitation measurements taken at six sites in the Sierra Nevada mountains (labeled APMAM, APSAB, APSLAKE, OPBPC, OPRC, and OPSLAKE) and stream runoff volume at a site near Bishop, California, labeled BSAAM. Draw the scatterplot matrix for these data and summarize the information available from these plots.

?water

summary(water)

      Year          APMAM            APSAB           APSLAKE     
 Min.   :1948   Min.   : 2.700   Min.   : 1.450   Min.   : 1.77  
 1st Qu.:1958   1st Qu.: 4.975   1st Qu.: 3.390   1st Qu.: 3.36  
 Median :1969   Median : 7.080   Median : 4.460   Median : 4.62  
 Mean   :1969   Mean   : 7.323   Mean   : 4.652   Mean   : 4.93  
 3rd Qu.:1980   3rd Qu.: 9.115   3rd Qu.: 5.685   3rd Qu.: 5.83  
 Max.   :1990   Max.   :18.080   Max.   :11.960   Max.   :13.02  
     OPBPC             OPRC           OPSLAKE           BSAAM       
 Min.   : 4.050   Min.   : 4.350   Min.   : 4.600   Min.   : 41785  
 1st Qu.: 7.975   1st Qu.: 7.875   1st Qu.: 8.705   1st Qu.: 59857  
 Median : 9.550   Median :11.110   Median :12.140   Median : 69177  
 Mean   :12.836   Mean   :12.002   Mean   :13.522   Mean   : 77756  
 3rd Qu.:16.545   3rd Qu.:14.975   3rd Qu.:16.920   3rd Qu.: 92206  
 Max.   :43.370   Max.   :24.850   Max.   :33.070   Max.   :146345

head(water)

  Year APMAM APSAB APSLAKE OPBPC  OPRC OPSLAKE  BSAAM
1 1948  9.13  3.58    3.91  4.10  7.43    6.47  54235
2 1949  5.28  4.82    5.20  7.55 11.11   10.26  67567
3 1950  4.20  3.77    3.67  9.52 12.20   11.35  66161
4 1951  4.60  4.46    3.93 11.14 15.15   11.13  68094
5 1952  7.15  4.99    4.88 16.34 20.05   22.81 107080
6 1953  9.70  5.65    4.91  8.88  8.15    7.41  67594

pairs(~APMAM + APSAB + APSLAKE + OPBPC + OPRC + OPSLAKE + BSAAM, data=water)

plot(y=water$BSAAM,x=water$APMAM)

plot(y=water$BSAAM,x=water$APSAB)

plot(y=water$BSAAM,x=water$APSLAKE)

plot(y=water$BSAAM,x=water$OPBPC)

plot(y=water$BSAAM,x=water$OPRC)

plot(y=water$BSAAM,x=water$OPSLAKE)

I successfully created a scatterplot matrix above, but I had trouble reading it, so I also ran each of the scatterplots separately. There seems to be a very strong positive correlation between snowfall (in inches) and stream runoff at OPBPC, OPRC and OPSLAKE. For the other three sites, more snowfall does seem to indicate more stream runoff, but the correlation is weaker.
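The visual impression from the plots can be backed up with numbers. The helper below, cor_with_response, is a hypothetical name introduced here for illustration; it computes the correlation of every other numeric column with a chosen response and sorts the result. It is applied to water only if the alr4 package is installed, and no output is shown because I have not reproduced it here.

```r
# Correlation of every other numeric column with the response column,
# sorted from largest to smallest
cor_with_response <- function(df, response) {
  numeric_cols <- names(df)[sapply(df, is.numeric)]
  predictors <- setdiff(numeric_cols, response)
  r <- sapply(predictors, function(p) cor(df[[p]], df[[response]]))
  sort(r, decreasing = TRUE)
}

# Apply to the water data (Year is dropped because it is not a precipitation site)
if (requireNamespace("alr4", quietly = TRUE)) {
  water <- alr4::water
  print(round(cor_with_response(water[, names(water) != "Year"], "BSAAM"), 2))
}
```

Sites whose names start with "O" would be expected near the top of the sorted list if the reading of the scatterplots is right.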

QUESTION 4

Professor ratings (Data file: Rateprof) In the website and online forum RateMyProfessors.com, students rate and comment on their instructors. Launched in 1999, the site includes millions of ratings on thousands of instructors. The data file includes the summaries of the ratings of 364 instructors at a large campus in the Midwest (Bleske-Rechek and Fritsch, 2011). Each instructor included in the data had at least 10 ratings over a several year period. Students provided ratings of 1–5 on quality, helpfulness, clarity, easiness of instructor’s courses, and raterInterest in the subject matter covered in the instructor’s courses. The data file provides the averages of these five ratings. Use R to reproduce the scatterplot matrix in Figure 1.13 in the ALR book (page 20). Provide a brief description of the relationships between the five ratings. (The variables don’t have to be in the same order)

summary(Rateprof)

    gender       numYears        numRaters       numCourses    
 female:159   Min.   : 1.000   Min.   :10.00   Min.   : 1.000  
 male  :207   1st Qu.: 6.000   1st Qu.:15.00   1st Qu.: 3.000  
              Median :10.000   Median :24.00   Median : 4.000  
              Mean   : 8.347   Mean   :28.58   Mean   : 4.251  
              3rd Qu.:11.000   3rd Qu.:37.00   3rd Qu.: 5.000  
              Max.   :11.000   Max.   :86.00   Max.   :12.000  
                                                               
 pepper       discipline          dept        quality     
 no :320   Hum     :134   English   : 49   Min.   :1.409  
 yes: 46   SocSci  : 66   Math      : 34   1st Qu.:2.936  
           STEM    :103   Biology   : 20   Median :3.612  
           Pre-prof: 63   Chemistry : 20   Mean   :3.575  
                          Psychology: 20   3rd Qu.:4.250  
                          Spanish   : 20   Max.   :4.981  
                          (Other)   :203                  
  helpfulness       clarity         easiness     raterInterest  
 Min.   :1.364   Min.   :1.333   Min.   :1.391   Min.   :1.098  
 1st Qu.:3.069   1st Qu.:2.871   1st Qu.:2.548   1st Qu.:2.934  
 Median :3.662   Median :3.600   Median :3.148   Median :3.305  
 Mean   :3.631   Mean   :3.525   Mean   :3.135   Mean   :3.310  
 3rd Qu.:4.351   3rd Qu.:4.214   3rd Qu.:3.692   3rd Qu.:3.692  
 Max.   :5.000   Max.   :5.000   Max.   :4.900   Max.   :4.909  
                                                                
   sdQuality       sdHelpfulness      sdClarity        sdEasiness    
 Min.   :0.09623   Min.   :0.0000   Min.   :0.0000   Min.   :0.3162  
 1st Qu.:0.87508   1st Qu.:0.9902   1st Qu.:0.9085   1st Qu.:0.9045  
 Median :1.15037   Median :1.2860   Median :1.1712   Median :1.0247  
 Mean   :1.05610   Mean   :1.1719   Mean   :1.0970   Mean   :1.0196  
 3rd Qu.:1.28730   3rd Qu.:1.4365   3rd Qu.:1.3328   3rd Qu.:1.1485  
 Max.   :1.67739   Max.   :1.8091   Max.   :1.8091   Max.   :1.6293  
                                                                     
 sdRaterInterest 
 Min.   :0.3015  
 1st Qu.:1.0848  
 Median :1.2167  
 Mean   :1.1965  
 3rd Qu.:1.3326  
 Max.   :1.7246

head(Rateprof)

  gender numYears numRaters numCourses pepper discipline
1   male        7        11          5     no        Hum
2   male        6        11          5     no        Hum
3   male       10        43          2     no        Hum
4   male       11        24          5     no        Hum
5   male       11        19          7     no        Hum
6   male       10        15          9     no        Hum
               dept  quality helpfulness  clarity easiness
1           English 4.636364    4.636364 4.636364 4.818182
2 Religious Studies 4.318182    4.545455 4.090909 4.363636
3               Art 4.790698    4.720930 4.860465 4.604651
4           English 4.250000    4.458333 4.041667 2.791667
5           Spanish 4.684211    4.684211 4.684211 4.473684
6           Spanish 4.233333    4.266667 4.200000 4.533333
  raterInterest sdQuality sdHelpfulness sdClarity sdEasiness
1      3.545455 0.5518564     0.6741999 0.5045250  0.4045199
2      4.000000 0.9020179     0.9341987 0.9438798  0.5045250
3      3.432432 0.4529343     0.6663898 0.4129681  0.5407021
4      3.181818 0.9325048     0.9315329 0.9990938  0.5882300
5      4.214286 0.6500112     0.8200699 0.5823927  0.6117753
6      3.916667 0.8632717     1.0327956 0.7745967  0.6399405
  sdRaterInterest
1       1.1281521
2       1.0744356
3       1.2369438
4       1.3322506
5       0.9749613
6       0.6685579

?Rateprof

pairs(~quality + clarity + helpfulness + easiness + raterInterest, data=Rateprof)

Please note that this video helped me create this scatterplot matrix: https://www.youtube.com/watch?v=AY9PYzJtCNA

It looks like quality, clarity and helpfulness are strongly related: professors who excel at one of these tend to excel at all three. Perhaps a professor is rated as high quality when they teach with great clarity and helpfulness. There looks to be at most a weak relationship between easiness and quality, clarity or helpfulness, and likewise between raterInterest and quality, clarity or helpfulness.

QUESTION 5

For the student.survey data file in the smss package

# install.packages("smss")
# install.packages("alr4")
# install.packages("car")
# install.packages("effects")
# install.packages("carData")

install.packages("smss", repos = "http://cran.us.r-project.org")


library(smss)

data(student.survey)

summary(student.survey)

      subj       ge           ag              hi       
 Min.   : 1.00   f:31   Min.   :22.00   Min.   :2.000  
 1st Qu.:15.75   m:29   1st Qu.:24.00   1st Qu.:3.000  
 Median :30.50          Median :26.50   Median :3.350  
 Mean   :30.50          Mean   :29.17   Mean   :3.308  
 3rd Qu.:45.25          3rd Qu.:31.00   3rd Qu.:3.625  
 Max.   :60.00          Max.   :71.00   Max.   :4.000  
                                                       
       co              dh             dr               tv        
 Min.   :2.600   Min.   :   0   Min.   : 0.200   Min.   : 0.000  
 1st Qu.:3.175   1st Qu.: 205   1st Qu.: 1.450   1st Qu.: 3.000  
 Median :3.500   Median : 640   Median : 2.000   Median : 6.000  
 Mean   :3.453   Mean   :1232   Mean   : 3.818   Mean   : 7.267  
 3rd Qu.:3.725   3rd Qu.:1350   3rd Qu.: 5.000   3rd Qu.:10.000  
 Max.   :4.000   Max.   :8000   Max.   :20.000   Max.   :37.000  
                                                                 
       sp               ne               ah             ve         
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Mode :logical  
 1st Qu.: 3.000   1st Qu.: 2.000   1st Qu.: 0.000   FALSE:60       
 Median : 5.000   Median : 3.000   Median : 0.500                  
 Mean   : 5.483   Mean   : 4.083   Mean   : 1.433                  
 3rd Qu.: 7.000   3rd Qu.: 5.250   3rd Qu.: 2.000                  
 Max.   :16.000   Max.   :14.000   Max.   :11.000                  
                                                                   
 pa                         pi                re         ab         
 d:21   very liberal         : 8   never       :15   Mode :logical  
 i:24   liberal              :24   occasionally:29   FALSE:60       
 r:15   slightly liberal     : 6   most weeks  : 7                  
        moderate             :10   every week  : 9                  
        slightly conservative: 6                                    
        conservative         : 4                                    
        very conservative    : 2                                    
     aa              ld         
 Mode :logical   Mode :logical  
 FALSE:59        FALSE:44       
 NA's :1         NA's :16

head(student.survey)

  subj ge ag  hi  co   dh   dr tv sp ne ah    ve pa           pi
1    1  m 32 2.2 3.5    0  5.0  3  5  0  0 FALSE  r conservative
2    2  f 23 2.1 3.5 1200  0.3 15  7  5  6 FALSE  d      liberal
3    3  f 27 3.3 3.0 1300  1.5  0  4  3  0 FALSE  d      liberal
4    4  f 35 3.5 3.2 1500  8.0  5  5  6  3 FALSE  i     moderate
5    5  m 23 3.1 3.5 1600 10.0  6  6  3  0 FALSE  i very liberal
6    6  m 39 3.5 3.5  350  3.0  4  5  7  0 FALSE  d      liberal
            re    ab    aa    ld
1   most weeks FALSE FALSE FALSE
2 occasionally FALSE FALSE    NA
3   most weeks FALSE FALSE    NA
4 occasionally FALSE FALSE FALSE
5        never FALSE FALSE FALSE
6 occasionally FALSE FALSE    NA

?student.survey

conduct regression analyses relating (i) y = political ideology and x = religiosity,

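Initially I received a lot of errors when I tried to plot this data, so I tried various ways to remove NA values and settled on na.omit. Note that na.omit drops every row with a missing value in any column, so 17 of the 60 students are removed because of NAs in aa and ld, even though pi and re themselves have no missing values. Then I realized that the data was not numerical, so I recoded it: “Very Conservative” to “Very Liberal” became 1 through 7 for political ideology, and “Never” to “Every Week” became 0 through 3 for how often you attend religious services.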

Then I could plot the data.

# install.packages("dplyr")

install.packages("dplyr", repos = "http://cran.us.r-project.org")

data("student.survey")

is.na(student.survey)

       subj    ge    ag    hi    co    dh    dr    tv    sp    ne
 [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[14,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[16,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[18,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[19,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[20,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[21,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[22,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[23,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[24,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[26,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[27,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[28,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[29,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[30,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[31,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[32,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[34,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[35,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[36,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[38,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[39,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[40,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[41,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[42,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[43,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[44,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[45,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[46,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[47,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[48,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[50,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[51,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[52,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[53,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[54,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[55,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[56,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[57,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[58,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[59,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[60,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
         ah    ve    pa    pi    re    ab    aa    ld
 [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
 [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
 [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
 [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[13,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[14,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[16,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
[18,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[19,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[20,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[21,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[22,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[23,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[24,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[26,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[27,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[28,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[29,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[30,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[31,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[32,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[34,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[35,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[36,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[38,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[39,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[40,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[41,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[42,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[43,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[44,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[45,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[46,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[47,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[48,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[50,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[51,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[52,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[53,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[54,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[55,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[56,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[57,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[58,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[59,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[60,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

student.survey<- na.omit(student.survey)
summary(student.survey)

      subj       ge           ag              hi       
 Min.   : 1.00   f:22   Min.   :22.00   Min.   :2.200  
 1st Qu.:17.00   m:21   1st Qu.:24.00   1st Qu.:3.000  
 Median :31.00          Median :27.00   Median :3.300  
 Mean   :31.19          Mean   :29.23   Mean   :3.305  
 3rd Qu.:46.50          3rd Qu.:31.00   3rd Qu.:3.650  
 Max.   :60.00          Max.   :71.00   Max.   :4.000  
                                                       
       co              dh             dr               tv        
 Min.   :2.600   Min.   :   0   Min.   : 0.200   Min.   : 0.000  
 1st Qu.:3.200   1st Qu.: 180   1st Qu.: 1.500   1st Qu.: 2.000  
 Median :3.500   Median : 630   Median : 2.000   Median : 5.000  
 Mean   :3.493   Mean   :1333   Mean   : 4.186   Mean   : 6.756  
 3rd Qu.:3.800   3rd Qu.:1650   3rd Qu.: 5.000   3rd Qu.: 8.000  
 Max.   :4.000   Max.   :8000   Max.   :20.000   Max.   :37.000  
                                                                 
       sp               ne               ah             ve         
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Mode :logical  
 1st Qu.: 3.000   1st Qu.: 2.000   1st Qu.: 0.000   FALSE:43       
 Median : 5.000   Median : 3.000   Median : 1.000                  
 Mean   : 6.023   Mean   : 3.953   Mean   : 1.465                  
 3rd Qu.: 7.500   3rd Qu.: 5.000   3rd Qu.: 2.000                  
 Max.   :16.000   Max.   :14.000   Max.   :11.000                  
                                                                   
 pa                         pi                re         ab         
 d:11   very liberal         : 6   never       :12   Mode :logical  
 i:19   liberal              :14   occasionally:18   FALSE:43       
 r:13   slightly liberal     : 4   most weeks  : 5                  
        moderate             : 8   every week  : 8                  
        slightly conservative: 6                                    
        conservative         : 3                                    
        very conservative    : 2                                    
     aa              ld         
 Mode :logical   Mode :logical  
 FALSE:43        FALSE:43

head(student.survey)

   subj ge ag  hi  co   dh   dr tv sp ne ah    ve pa               pi
1     1  m 32 2.2 3.5    0  5.0  3  5  0  0 FALSE  r     conservative
4     4  f 35 3.5 3.2 1500  8.0  5  5  6  3 FALSE  i         moderate
5     5  m 23 3.1 3.5 1600 10.0  6  6  3  0 FALSE  i     very liberal
7     7  m 24 3.6 3.7    0  0.2  5 12  4  2 FALSE  i          liberal
8     8  f 31 3.0 3.0 5000  1.5  5  3  3  1 FALSE  i          liberal
10   10  m 28 4.0 3.1  900  2.0  1  1  2  1 FALSE  i slightly liberal
             re    ab    aa    ld
1    most weeks FALSE FALSE FALSE
4  occasionally FALSE FALSE FALSE
5         never FALSE FALSE FALSE
7  occasionally FALSE FALSE FALSE
8  occasionally FALSE FALSE FALSE
10        never FALSE FALSE FALSE

library(dplyr)


# recode() takes old = new pairs, so each label is mapped to its numeric code
student.survey$pi <- recode(as.character(student.survey$pi),
                  "very conservative" = 1,
                  "conservative" = 2,
                  "slightly conservative" = 3,
                  "moderate" = 4,
                  "slightly liberal" = 5,
                  "liberal" = 6,
                  "very liberal" = 7)

student.survey$re <- recode(as.character(student.survey$re),
                  "never" = 0,
                  "occasionally" = 1,
                  "most weeks" = 2,
                  "every week" = 3)

plot(x = student.survey$re, y = student.survey$pi)

It was not clear exactly what this plot showed, so I tried another way.

(a) Use graphical ways to portray the individual variables and their relationship.

ggplot(data=student.survey, aes(x=re, y=pi))+
  geom_point()

(b) Interpret descriptive statistics for summarizing the individual variables and their relationship.

This plot is unusual because both variables are ordered categories, but it does seem to indicate that as the frequency of attending religious services increases, political ideology shifts toward the conservative end of the scale.

(c) Summarize and interpret results of inferential analyses.

People who attend religious services often are more likely to lean conservative than people who never attend or attend only occasionally.

conduct regression analyses relating (ii) y = high school GPA and x = hours of TV watching.

plot(x = student.survey$tv, y = student.survey$hi)

(a) Use graphical ways to portray the individual variables and their relationship.

ggplot(data=student.survey, aes(x=tv, y=hi))+
  geom_point()+
  geom_smooth(method="lm", se=FALSE)

(b) Interpret descriptive statistics for summarizing the individual variables and their relationship.

I do not see a clear relationship in this plot.

(c) Summarize and interpret results of inferential analyses.

While a few outliers who watch a lot of TV seem to have below-average high school GPAs, most people in this study fall between a 3.0 and a 4.0 GPA, and the average number of hours of TV does not seem to affect GPA.
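For the inferential part of both analyses, a simple linear regression can be fitted with lm() and its slope, p-value and R-squared read from the summary. The sketch below is one way to do that, not a definitive analysis: slr_summary is a hypothetical helper name, the numeric codes for pi and re are the ones described earlier (1 to 7 and 0 to 3), and treating ordered categories as numbers is itself an assumption. The data step runs only if smss is installed, and it loads a fresh copy of the full 60-row data set into a local name.

```r
# Fit y ~ x and return the quantities needed for interpretation
slr_summary <- function(y, x) {
  fit <- lm(y ~ x)
  s <- summary(fit)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    p_value   = unname(s$coefficients[2, "Pr(>|t|)"]),
    r_squared = s$r.squared)
}

if (requireNamespace("smss", quietly = TRUE)) {
  env <- new.env()
  data("student.survey", package = "smss", envir = env)
  survey <- env$student.survey

  # Numeric codes matching the recoding described above
  ideology_levels <- c("very conservative", "conservative", "slightly conservative",
                       "moderate", "slightly liberal", "liberal", "very liberal")
  attend_levels <- c("never", "occasionally", "most weeks", "every week")
  pi_num <- match(as.character(survey$pi), ideology_levels)    # 1 to 7
  re_num <- match(as.character(survey$re), attend_levels) - 1  # 0 to 3

  print(slr_summary(pi_num, re_num))        # (i) political ideology on religiosity
  print(slr_summary(survey$hi, survey$tv))  # (ii) high school GPA on hours of TV
}
```

The sign of the slope gives the direction of the relationship, the p-value tests whether the slope differs from zero, and R-squared shows how much of the variation in y the predictor explains.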

QUESTION 6

For a class of 100 students, the teacher takes the 10 students who perform poorest on the midterm exam and enrolls them in a special tutoring program. The overall class mean is 70 on both the midterm and final, but the mean for the specially tutored students increases from 50 to 60. Use the concept of regression toward the mean to explain why this is not sufficient evidence to imply that the tutoring program was successful.

Maybe the 10 students who performed the poorest on the midterm would have improved from 50 to 60 on the final regardless of the tutoring. Perhaps they were so aware of and concerned about their poor midterm performance that they would have made an extra effort to do better on the next (and final) exam, while the students who did well or just average for the class would not have felt the same drive to try harder.

https://www.youtube.com/watch?v=1tSqSMOyNFE

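There is always a level of chance. “You should not expect them to be as unlucky when you test them a second time…their scores should improve just based on random chance.” The 10 students were picked precisely because their midterm scores were the most extreme, and an extreme score partly reflects bad luck, so on the final their scores would be expected to move back toward the class mean of 70 even without tutoring.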
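Regression toward the mean can be illustrated with a small simulation in which there is no tutoring at all. This is a sketch under stated assumptions: each score is a stable ability plus independent exam-day luck, both normally distributed with a standard deviation of 8 (my choice, not from the question), and the class mean is 70 on both exams. The function name simulate_bottom10 is hypothetical.

```r
set.seed(2024)

# One simulated class: pick the 10 lowest midterm scorers and
# return their mean midterm and mean final scores (no tutoring effect)
simulate_bottom10 <- function(n = 100, class_mean = 70,
                              ability_sd = 8, noise_sd = 8) {
  ability <- rnorm(n, mean = class_mean, sd = ability_sd)  # stable skill
  midterm <- ability + rnorm(n, sd = noise_sd)             # skill plus luck
  final   <- ability + rnorm(n, sd = noise_sd)             # same skill, new luck
  bottom  <- order(midterm)[1:10]
  c(midterm = mean(midterm[bottom]), final = mean(final[bottom]))
}

# Average over many simulated classes
results <- replicate(1000, simulate_bottom10())
avg <- rowMeans(results)
avg
```

Under these assumptions the bottom 10 improve on the final purely by chance while still staying below the class mean, so an improvement of this kind cannot by itself be credited to the tutoring program.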