Loading packages and data

library(smss)
library(car)

## Loading required package: carData

library(alr4)

## Loading required package: effects

## lattice theme set by effectsTheme()
## See ?effectsTheme for details.

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:car':
## 
##     recode

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# loading data
data(UN11)
head(UN11)

##                 region  group fertility   ppgdp lifeExpF pctUrban
## Afghanistan       Asia  other     5.968   499.0    49.49       23
## Albania         Europe  other     1.525  3677.2    80.40       53
## Algeria         Africa africa     2.142  4473.0    75.00       67
## Angola          Africa africa     5.135  4321.9    53.17       59
## Anguilla     Caribbean  other     2.000 13750.1    81.10      100
## Argentina   Latin Amer  other     2.172  9162.1    79.89       93

1.1

ppgdp is the predictor and fertility is the response variable.
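In R's model formula syntax, the response goes on the left of `~` and the predictor goes on the right. Here is a minimal sketch using a small hypothetical data frame `toy` (made-up numbers, not the UN11 values) that shows which variable plays which role:

```r
# Hypothetical toy data (not the UN11 values), used only to show formula syntax
toy <- data.frame(
  ppgdp     = c(500, 1500, 4000, 12000, 40000),
  fertility = c(5.9, 4.1, 2.6, 2.0, 1.7)
)

# Response on the left of ~, predictor on the right
fit <- lm(fertility ~ ppgdp, data = toy)

# The only non-intercept coefficient belongs to the predictor
names(coef(fit))
```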

1.2

# Scatter plot of fertility vs ppgdp
plot(UN11$ppgdp, UN11$fertility,
     main = 'Fertility Versus PPGDP',
     xlab = 'ppgdp', ylab = 'Fertility')

A straight line would not be a plausible summary of this graph; the relationship appears curvilinear rather than linear.
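One way to check whether a log transformation helps with a curved scatterplot is to compare how well a straight line fits before and after taking logs of both variables. Here is a minimal sketch on simulated data. It assumes a power-law relationship (the constants 8 and -0.3 are illustrative, not estimates from UN11), which is the kind of curve a log-log plot straightens out:

```r
# Assumed power-law relationship y = 8 * x^(-0.3) with small multiplicative noise;
# the constants are illustrative, not estimates from UN11
set.seed(1)
x <- 10^seq(2, 5, length.out = 50)
y <- 8 * x^(-0.3) * exp(rnorm(length(x), sd = 0.05))

r2_raw <- summary(lm(y ~ x))$r.squared            # straight line on the original scale
r2_log <- summary(lm(log(y) ~ log(x)))$r.squared  # straight line on the log-log scale

round(c(raw = r2_raw, log_log = r2_log), 3)
```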

1.3

# Scatterplot using natural logs
plot(log(UN11$ppgdp), log(UN11$fertility),
     main = 'Fertility Versus PPGDP Using Natural Log',
     xlab = 'log(ppgdp)', ylab = 'log(Fertility)')

After taking natural logs of both variables, the points fall roughly along a straight line, so a linear model is a plausible summary of this graph.

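# Fit the log-log model of fertility on ppgdp; plot() on an lm object draws
# diagnostic plots (residuals vs fitted, Q-Q, scale-location, leverage)
fit_log <- lm(log(fertility) ~ log(ppgdp), data = UN11)
plot(fit_log)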
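In a log-log model like the one fitted above, the slope has a percent-change interpretation. If the fitted slope is b, a p% increase in ppgdp is associated with fertility being multiplied by (1 + p/100)^b, which is roughly a b·p% change when p is small. Here is a sketch of that arithmetic. `pct_change_y` is a hypothetical helper, and the slope passed to it is hypothetical because the fitted coefficients are not printed here:

```r
# Hypothetical helper: percent change in the response implied by a log-log
# slope b when the predictor increases by pct_x percent
pct_change_y <- function(b, pct_x) {
  ((1 + pct_x / 100)^b - 1) * 100
}

# Example with a hypothetical slope of -0.2 and a 10% increase in ppgdp
pct_change_y(b = -0.2, pct_x = 10)

# Small-change approximation for comparison: b * pct_x
-0.2 * 10
```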

2

How, if at all, does the slope of the prediction equation change?

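The slope changes. Converting income from dollars to pounds divides every income value by the same constant, and rescaling the explanatory variable rescales the slope. With income as the explanatory variable and one pound equal to r dollars, income in pounds = income in dollars / r. The slope of the new prediction equation is therefore r times the original slope, because a one-pound increase is a larger change than a one-dollar increase. The intercept stays the same.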

How, if at all, does the correlation change?

The correlation does not change after converting USD to pounds. Correlation has no units, so multiplying a variable by a positive constant leaves it unchanged.
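A quick simulation shows what the conversion does to the fitted line and the correlation, assuming income is the explanatory variable. The incomes, the response `y`, and the exchange rate `r` are all hypothetical values chosen for illustration:

```r
# Simulated incomes in dollars (hypothetical values) and a response y
set.seed(42)
income_usd <- runif(100, 20000, 120000)
y <- 10 + 0.0002 * income_usd + rnorm(100)

r <- 1.25                       # hypothetical exchange rate: dollars per pound
income_gbp <- income_usd / r    # the same incomes expressed in pounds

fit_usd <- lm(y ~ income_usd)
fit_gbp <- lm(y ~ income_gbp)

coef(fit_usd)
coef(fit_gbp)                               # slope is r times larger; intercept unchanged
c(cor(income_usd, y), cor(income_gbp, y))   # the two correlations are identical
```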

3

# loading data
data(water)
head(water)

##   Year APMAM APSAB APSLAKE OPBPC  OPRC OPSLAKE  BSAAM
## 1 1948  9.13  3.58    3.91  4.10  7.43    6.47  54235
## 2 1949  5.28  4.82    5.20  7.55 11.11   10.26  67567
## 3 1950  4.20  3.77    3.67  9.52 12.20   11.35  66161
## 4 1951  4.60  4.46    3.93 11.14 15.15   11.13  68094
## 5 1952  7.15  4.99    4.88 16.34 20.05   22.81 107080
## 6 1953  9.70  5.65    4.91  8.88  8.15    7.41  67594

# Creating matrix of sites with runoff
pairs(water[2:8], pch = 19)

This scatterplot matrix shows precipitation at the six sites and runoff volume (BSAAM). Precipitation at OPSLAKE, OPRC, and OPBPC appears to have a linear relationship with runoff. The relationship is weaker at APSLAKE, APSAB, and APMAM, so precipitation at these sites appears less useful for predicting runoff.
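You can check the scatterplot matrix pattern numerically by ranking each site's correlation with runoff (BSAAM). Here is a minimal sketch. `cor_with_response` is a hypothetical helper, and the last step runs only if the alr4 package is installed:

```r
# Hypothetical helper: correlation of each column with a response column,
# ordered from strongest to weakest by absolute value
cor_with_response <- function(df, response) {
  r <- cor(df)[, response]
  r <- r[names(r) != response]
  r[order(abs(r), decreasing = TRUE)]
}

# Applied to the water data (runs only if the alr4 package is installed)
if (requireNamespace("alr4", quietly = TRUE)) {
  data(water, package = "alr4")
  print(round(cor_with_response(water[2:8], "BSAAM"), 2))
}
```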

4

# Loading data
data("Rateprof")
head(Rateprof)

##   gender numYears numRaters numCourses pepper discipline              dept
## 1   male        7        11          5     no        Hum           English
## 2   male        6        11          5     no        Hum Religious Studies
## 3   male       10        43          2     no        Hum               Art
## 4   male       11        24          5     no        Hum           English
## 5   male       11        19          7     no        Hum           Spanish
## 6   male       10        15          9     no        Hum           Spanish
##    quality helpfulness  clarity easiness raterInterest sdQuality sdHelpfulness
## 1 4.636364    4.636364 4.636364 4.818182      3.545455 0.5518564     0.6741999
## 2 4.318182    4.545455 4.090909 4.363636      4.000000 0.9020179     0.9341987
## 3 4.790698    4.720930 4.860465 4.604651      3.432432 0.4529343     0.6663898
## 4 4.250000    4.458333 4.041667 2.791667      3.181818 0.9325048     0.9315329
## 5 4.684211    4.684211 4.684211 4.473684      4.214286 0.6500112     0.8200699
## 6 4.233333    4.266667 4.200000 4.533333      3.916667 0.8632717     1.0327956
##   sdClarity sdEasiness sdRaterInterest
## 1 0.5045250  0.4045199       1.1281521
## 2 0.9438798  0.5045250       1.0744356
## 3 0.4129681  0.5407021       1.2369438
## 4 0.9990938  0.5882300       1.3322506
## 5 0.5823927  0.6117753       0.9749613
## 6 0.7745967  0.6399405       0.6685579

# Creating matrix of quality, helpfulness, clarity, easiness and interest by raters
pairs(Rateprof[8:12], pch = 1)

Quality, helpfulness, and clarity show strong positive linear correlations with one another. Easiness is moderately correlated with each of clarity, helpfulness, and quality. Rater interest appears only weakly correlated with the other variables, and most weakly with easiness.
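The weak/moderate/strong descriptions can be made explicit by listing every pair of variables with its correlation. This is a sketch under stated assumptions. The cutoffs 0.3 and 0.7 are a common rule of thumb, not a fixed standard. `label_strength` and `pair_strengths` are hypothetical helpers. The last step runs only if the alr4 package is installed:

```r
# Hypothetical cutoffs for describing |r|: weak <= 0.3 < moderate <= 0.7 < strong
label_strength <- function(r) {
  as.character(cut(abs(r), breaks = c(0, 0.3, 0.7, Inf),
                   labels = c("weak", "moderate", "strong"),
                   include.lowest = TRUE))
}

# Every pair of columns with its rounded correlation and strength label
pair_strengths <- function(df) {
  r <- cor(df, use = "pairwise.complete.obs")
  idx <- which(upper.tri(r), arr.ind = TRUE)
  data.frame(
    var1     = rownames(r)[idx[, 1]],
    var2     = colnames(r)[idx[, 2]],
    r        = round(r[idx], 2),
    strength = label_strength(r[idx])
  )
}

# Applied to the rating variables (runs only if the alr4 package is installed)
if (requireNamespace("alr4", quietly = TRUE)) {
  data(Rateprof, package = "alr4")
  print(pair_strengths(Rateprof[8:12]))
}
```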

5

# Loading data
data("student.survey")
head(student.survey)

##   subj ge ag  hi  co   dh   dr tv sp ne ah    ve pa           pi           re
## 1    1  m 32 2.2 3.5    0  5.0  3  5  0  0 FALSE  r conservative   most weeks
## 2    2  f 23 2.1 3.5 1200  0.3 15  7  5  6 FALSE  d      liberal occasionally
## 3    3  f 27 3.3 3.0 1300  1.5  0  4  3  0 FALSE  d      liberal   most weeks
## 4    4  f 35 3.5 3.2 1500  8.0  5  5  6  3 FALSE  i     moderate occasionally
## 5    5  m 23 3.1 3.5 1600 10.0  6  6  3  0 FALSE  i very liberal        never
## 6    6  m 39 3.5 3.5  350  3.0  4  5  7  0 FALSE  d      liberal occasionally
##      ab    aa    ld
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE    NA
## 3 FALSE FALSE    NA
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE    NA

?student.survey
str(student.survey)

## 'data.frame':    60 obs. of  18 variables:
##  $ subj: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ ge  : Factor w/ 2 levels "f","m": 2 1 1 1 2 2 2 1 2 2 ...
##  $ ag  : int  32 23 27 35 23 39 24 31 34 28 ...
##  $ hi  : num  2.2 2.1 3.3 3.5 3.1 3.5 3.6 3 3 4 ...
##  $ co  : num  3.5 3.5 3 3.2 3.5 3.5 3.7 3 3 3.1 ...
##  $ dh  : int  0 1200 1300 1500 1600 350 0 5000 5000 900 ...
##  $ dr  : num  5 0.3 1.5 8 10 3 0.2 1.5 2 2 ...
##  $ tv  : num  3 15 0 5 6 4 5 5 7 1 ...
##  $ sp  : int  5 7 4 5 6 5 12 3 5 1 ...
##  $ ne  : int  0 5 3 6 3 7 4 3 3 2 ...
##  $ ah  : int  0 6 0 3 0 0 2 1 0 1 ...
##  $ ve  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ pa  : Factor w/ 3 levels "d","i","r": 3 1 1 2 2 1 2 2 2 2 ...
##  $ pi  : Ord.factor w/ 7 levels "very liberal"<..: 6 2 2 4 1 2 2 2 1 3 ...
##  $ re  : Ord.factor w/ 4 levels "never"<"occasionally"<..: 3 2 3 2 1 2 2 2 2 1 ...
##  $ ab  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ aa  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ ld  : logi  FALSE NA NA FALSE FALSE NA ...

a

# renaming variables in dataframe 
student.survey.gg <- rename(student.survey, Religiosity = re, Political_Ideology = pi)

# Proportional stacked bar graph of religiosity and political ideology
ggplot(data = student.survey.gg, aes(x = Religiosity, fill = Political_Ideology)) +
  geom_bar(position = "fill") +
  ylab("Proportion") +
  ggtitle("Religiosity vs Political Ideology")
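`geom_bar(position = "fill")` stacks each bar to a height of 1. Each segment therefore shows the share of students at a given religiosity level who hold each political ideology. You can tabulate the same row proportions directly. `ideology_by_religiosity` is a hypothetical helper name, and the last step runs only if the smss package is installed:

```r
# Hypothetical helper: within each religiosity level, the share of students
# at each political-ideology level (row proportions, rounded to 2 decimals)
ideology_by_religiosity <- function(religiosity, ideology) {
  round(prop.table(table(religiosity, ideology), margin = 1), 2)
}

# Applied to the survey data (runs only if the smss package is installed)
if (requireNamespace("smss", quietly = TRUE)) {
  data(student.survey, package = "smss")
  print(ideology_by_religiosity(student.survey$re, student.survey$pi))
}
```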

# renaming variables in dataframe 
student.survey.gg <- rename(student.survey, Hours_of_TV = tv, Highschool_GPA = hi)

# Scatterplot of hours of TV and high school GPA
ggplot(data = student.survey.gg, aes(x = Hours_of_TV, y = Highschool_GPA)) +
  geom_point() + 
  ggtitle("Hours of Television vs High School GPA")

b

# Linear model for political ideology and religiosity
summary(lm(as.numeric(pi) ~ as.numeric(re), 
         data = student.survey))

## 
## Call:
## lm(formula = as.numeric(pi) ~ as.numeric(re), data = student.survey)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.81243 -0.87160  0.09882  1.12840  3.09882 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.9308     0.4252   2.189   0.0327 *  
## as.numeric(re)   0.9704     0.1792   5.416 1.22e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.345 on 58 degrees of freedom
## Multiple R-squared:  0.3359, Adjusted R-squared:  0.3244 
## F-statistic: 29.34 on 1 and 58 DF,  p-value: 1.221e-06
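`as.numeric()` on an ordered factor returns each value's level position: 1 for the lowest level, 2 for the next, and so on. This model therefore treats the religiosity and ideology codes as equally spaced numeric scores, which is an assumption about the ordinal scales. Here is a small demonstration with a hypothetical factor `attendance`:

```r
# Hypothetical ordered factor; levels listed from lowest to highest
attendance <- factor(c("often", "never", "sometimes", "never"),
                     levels = c("never", "sometimes", "often"),
                     ordered = TRUE)

# Level positions, not the labels, become the numeric scores
as.numeric(attendance)   # 3 1 2 1
```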

# Linear model for hours of television and GPA
summary(lm(hi ~ tv, data = student.survey))

## 
## Call:
## lm(formula = hi ~ tv, data = student.survey)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2583 -0.2456  0.0417  0.3368  0.7051 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.441353   0.085345  40.323   <2e-16 ***
## tv          -0.018305   0.008658  -2.114   0.0388 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4467 on 58 degrees of freedom
## Multiple R-squared:  0.07156,    Adjusted R-squared:  0.05555 
## F-statistic: 4.471 on 1 and 58 DF,  p-value: 0.03879

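Religiosity has a positive, statistically significant coefficient (estimate 0.97, p = 1.22e-06, significant even at the 0.001 level). Higher religiosity is therefore associated with a more conservative political ideology. Hours of television has a negative coefficient that is statistically significant at the 0.05 level (estimate -0.018, p = 0.039). Students who watch more television tend to have slightly lower high school GPAs. However, hours of television explain only about 7% of the variation in GPA (R-squared = 0.072), so this relationship is weak.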
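The two estimated slopes can be made concrete by plugging values into the fitted equations. Here is a sketch that uses the coefficients printed in the summaries above. It treats religiosity codes 1 to 4 as numeric, as the model does:

```r
# Coefficients copied from the two model summaries above
b0_pi <- 0.9308;   b1_pi <- 0.9704     # as.numeric(pi) ~ as.numeric(re)
b0_hi <- 3.441353; b1_hi <- -0.018305  # hi ~ tv

# Fitted political-ideology score at each religiosity code (1 to 4)
re_code <- 1:4
pi_hat <- b0_pi + b1_pi * re_code
round(pi_hat, 2)   # 1.90 2.87 3.84 4.81

# Fitted change in high school GPA for 10 more hours of TV
10 * b1_hi         # -0.18305
```

Each step up in religiosity moves the fitted ideology score by about one level toward the conservative end. Ten extra hours of television corresponds to a fitted GPA drop of less than 0.2 points.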

HW_3_quant

Keith Bell

11/28/2022
