library(smss)
library(car)
## Loading required package: carData
library(alr4)
## Loading required package: effects
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# loading data
data(UN11)
head(UN11)
## region group fertility ppgdp lifeExpF pctUrban
## Afghanistan Asia other 5.968 499.0 49.49 23
## Albania Europe other 1.525 3677.2 80.40 53
## Algeria Africa africa 2.142 4473.0 75.00 67
## Angola Africa africa 5.135 4321.9 53.17 59
## Anguilla Caribbean other 2.000 13750.1 81.10 100
## Argentina Latin Amer other 2.172 9162.1 79.89 93
ppgdp is the predictor, while fertility is the response variable.
# Scatter plot of fertility vs ppgdp
plot(UN11$ppgdp, UN11$fertility,
main='Fertility Versus PPGDP',
xlab='ppgdp',ylab='Fertility')
A straight-line function would not be a plausible summary of this graph. The relationship appears curvilinear.
# Scatter plot using natural logs
plot(log(UN11$ppgdp), log(UN11$fertility),
main='Fertility Versus PPGDP Using Natural Log',
xlab='log(ppgdp)',ylab='log(Fertility)')
A linear model would be a plausible summary given the shape of this graph.
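As a rough numerical check on the visual impression, one could compare the linear correlation before and after the log transformation. The sketch below uses simulated stand-in data; the names `x` and `y` and all parameter values are assumptions for illustration, not values from `UN11`. With the real data, the same comparison would use `UN11$ppgdp` and `UN11$fertility`.

```r
# Simulated stand-in data (hypothetical): y follows a power law in x
# with multiplicative noise, so the relationship is curved on the raw scale.
set.seed(1)
x <- exp(runif(200, log(100), log(100000)))
y <- 50 * x^(-0.3) * exp(rnorm(200, sd = 0.2))

# Pearson correlation on the raw scale and on the log-log scale
r_raw <- cor(x, y)
r_log <- cor(log(x), log(y))
c(raw = r_raw, log = r_log)
```

A correlation that is clearly stronger on the log-log scale is consistent with a straight line fitting the transformed data better than the raw data.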
# Diagnostic plots for the linear model of log(fertility) on log(ppgdp)
plot(lm(formula = log(fertility) ~ log(ppgdp), data = UN11))
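The slope of the prediction equation would remain the same but the intercept would change, provided the converted variable enters the model on the log scale as above: multiplying by the exchange rate then only adds a constant to the logged variable. On the original scale, the slope would instead be rescaled by the exchange rate.
The correlation would not change after converting USD to pounds, because correlation is unaffected by multiplying a variable by a positive constant.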
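The effect of a currency conversion on the fit can be checked directly. The following is a minimal sketch on simulated data; the variable names, the model parameters, and the rate of 1.33 USD per pound are all assumptions made for illustration.

```r
# Simulated stand-in data (hypothetical values)
set.seed(42)
rate <- 1.33                      # assumed USD-per-pound rate, illustration only
usd <- runif(100, 1000, 50000)    # predictor measured in USD
y <- exp(3 - 0.2 * log(usd) + rnorm(100, sd = 0.1))
gbp <- usd / rate                 # the same predictor measured in pounds

# Correlation is unchanged by rescaling a variable by a positive constant
same_cor <- isTRUE(all.equal(cor(usd, y), cor(gbp, y)))
same_cor

# Log-log model: the slope is unchanged, the intercept shifts by slope * log(rate)
fit_usd <- lm(log(y) ~ log(usd))
fit_gbp <- lm(log(y) ~ log(gbp))
rbind(usd = coef(fit_usd), gbp = coef(fit_gbp))

# Raw-scale model: the intercept is unchanged, the slope is multiplied by rate
raw_usd <- lm(y ~ usd)
raw_gbp <- lm(y ~ gbp)
rbind(usd = coef(raw_usd), gbp = coef(raw_gbp))
```

Which coefficient moves therefore depends on whether the converted variable is logged in the model.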
# loading data
data(water)
head(water)
## Year APMAM APSAB APSLAKE OPBPC OPRC OPSLAKE BSAAM
## 1 1948 9.13 3.58 3.91 4.10 7.43 6.47 54235
## 2 1949 5.28 4.82 5.20 7.55 11.11 10.26 67567
## 3 1950 4.20 3.77 3.67 9.52 12.20 11.35 66161
## 4 1951 4.60 4.46 3.93 11.14 15.15 11.13 68094
## 5 1952 7.15 4.99 4.88 16.34 20.05 22.81 107080
## 6 1953 9.70 5.65 4.91 8.88 8.15 7.41 67594
# Scatterplot matrix of precipitation at each site and runoff
pairs(water[2:8], pch = 19)
Based on this scatterplot matrix of precipitation measurements at various sites versus runoff volume (BSAAM), a linear relationship between precipitation and runoff appears for OPSLAKE, OPRC, and OPBPC. This means the other sites, APSLAKE, APSAB, and APMAM, are at greater risk of drought.
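The visual reading of the scatterplot matrix could be backed up by the correlation of each site with runoff. Below is a small sketch; the helper `cor_with_response` and the data frame `demo` are hypothetical and exist only for illustration.

```r
# Hypothetical helper: correlation of every other column with one response column
cor_with_response <- function(df, response) {
  predictors <- setdiff(names(df), response)
  sapply(df[predictors], function(col) cor(col, df[[response]]))
}

# Simulated stand-in data (not the water data): siteA drives runoff, siteB does not
set.seed(1)
demo <- data.frame(siteA = rnorm(40), siteB = rnorm(40))
demo$runoff <- 3 * demo$siteA + rnorm(40, sd = 0.5)

r <- cor_with_response(demo, "runoff")
sort(r, decreasing = TRUE)

# With the data loaded above, the equivalent call would be:
# cor_with_response(water[2:8], "BSAAM")
```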
# Loading data
data("Rateprof")
head(Rateprof)
## gender numYears numRaters numCourses pepper discipline dept
## 1 male 7 11 5 no Hum English
## 2 male 6 11 5 no Hum Religious Studies
## 3 male 10 43 2 no Hum Art
## 4 male 11 24 5 no Hum English
## 5 male 11 19 7 no Hum Spanish
## 6 male 10 15 9 no Hum Spanish
## quality helpfulness clarity easiness raterInterest sdQuality sdHelpfulness
## 1 4.636364 4.636364 4.636364 4.818182 3.545455 0.5518564 0.6741999
## 2 4.318182 4.545455 4.090909 4.363636 4.000000 0.9020179 0.9341987
## 3 4.790698 4.720930 4.860465 4.604651 3.432432 0.4529343 0.6663898
## 4 4.250000 4.458333 4.041667 2.791667 3.181818 0.9325048 0.9315329
## 5 4.684211 4.684211 4.684211 4.473684 4.214286 0.6500112 0.8200699
## 6 4.233333 4.266667 4.200000 4.533333 3.916667 0.8632717 1.0327956
## sdClarity sdEasiness sdRaterInterest
## 1 0.5045250 0.4045199 1.1281521
## 2 0.9438798 0.5045250 1.0744356
## 3 0.4129681 0.5407021 1.2369438
## 4 0.9990938 0.5882300 1.3322506
## 5 0.5823927 0.6117753 0.9749613
## 6 0.7745967 0.6399405 0.6685579
# Scatterplot matrix of quality, helpfulness, clarity, easiness, and rater interest
pairs(Rateprof[8:12], pch = 1)
Strong positive linear correlations appear among quality, helpfulness, and clarity. Moderate correlations appear between easiness and each of clarity, helpfulness, and quality. Rater interest appears to be only weakly correlated with the other variables, especially easiness.
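A correlation matrix gives a numerical companion to the scatterplot matrix. The sketch below runs on simulated stand-in ratings; the column names `rating_a`, `rating_b`, `rating_c`, and `interest` and their values are hypothetical. With the data above, the call would be `round(cor(Rateprof[8:12]), 2)`.

```r
# Simulated stand-in data (hypothetical): three closely related ratings
# and one unrelated rating
set.seed(7)
n <- 100
rating_a <- rnorm(n, mean = 3.5, sd = 0.8)
ratings <- data.frame(
  rating_a = rating_a,
  rating_b = rating_a + rnorm(n, sd = 0.2),
  rating_c = rating_a + rnorm(n, sd = 0.2),
  interest = rnorm(n, mean = 3.3, sd = 0.5)
)

# Pairwise Pearson correlations, rounded for readability
cm <- round(cor(ratings), 2)
cm
```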
# Loading data
data("student.survey")
head(student.survey)
## subj ge ag hi co dh dr tv sp ne ah ve pa pi re
## 1 1 m 32 2.2 3.5 0 5.0 3 5 0 0 FALSE r conservative most weeks
## 2 2 f 23 2.1 3.5 1200 0.3 15 7 5 6 FALSE d liberal occasionally
## 3 3 f 27 3.3 3.0 1300 1.5 0 4 3 0 FALSE d liberal most weeks
## 4 4 f 35 3.5 3.2 1500 8.0 5 5 6 3 FALSE i moderate occasionally
## 5 5 m 23 3.1 3.5 1600 10.0 6 6 3 0 FALSE i very liberal never
## 6 6 m 39 3.5 3.5 350 3.0 4 5 7 0 FALSE d liberal occasionally
## ab aa ld
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE NA
## 3 FALSE FALSE NA
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE NA
?student.survey
str(student.survey)
## 'data.frame': 60 obs. of 18 variables:
## $ subj: int 1 2 3 4 5 6 7 8 9 10 ...
## $ ge : Factor w/ 2 levels "f","m": 2 1 1 1 2 2 2 1 2 2 ...
## $ ag : int 32 23 27 35 23 39 24 31 34 28 ...
## $ hi : num 2.2 2.1 3.3 3.5 3.1 3.5 3.6 3 3 4 ...
## $ co : num 3.5 3.5 3 3.2 3.5 3.5 3.7 3 3 3.1 ...
## $ dh : int 0 1200 1300 1500 1600 350 0 5000 5000 900 ...
## $ dr : num 5 0.3 1.5 8 10 3 0.2 1.5 2 2 ...
## $ tv : num 3 15 0 5 6 4 5 5 7 1 ...
## $ sp : int 5 7 4 5 6 5 12 3 5 1 ...
## $ ne : int 0 5 3 6 3 7 4 3 3 2 ...
## $ ah : int 0 6 0 3 0 0 2 1 0 1 ...
## $ ve : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ pa : Factor w/ 3 levels "d","i","r": 3 1 1 2 2 1 2 2 2 2 ...
## $ pi : Ord.factor w/ 7 levels "very liberal"<..: 6 2 2 4 1 2 2 2 1 3 ...
## $ re : Ord.factor w/ 4 levels "never"<"occasionally"<..: 3 2 3 2 1 2 2 2 2 1 ...
## $ ab : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ aa : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ ld : logi FALSE NA NA FALSE FALSE NA ...
# renaming variables in dataframe
student.survey.gg <- rename(student.survey, Religiosity = re, Political_Ideology = pi)
# Bar graph of religiosity and political ideology
ggplot(data = student.survey.gg, aes(x = Religiosity, fill = Political_Ideology)) +
geom_bar(position = "fill") +
ggtitle("Religiosity vs Political Ideology")
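With `position = "fill"`, each bar is rescaled to height 1, so the segments show the proportion of each fill category within each x category. The same proportions can be tabulated with `prop.table()`. A minimal sketch follows, using made-up responses; the vectors `relig` and `ideol` are hypothetical.

```r
# Hypothetical responses (illustrative values only)
relig <- factor(c("never", "never", "often", "often", "often"))
ideol <- factor(c("liberal", "conservative", "conservative",
                  "conservative", "liberal"))

# Counts, then row-wise proportions (each row sums to 1, like each filled bar)
counts <- table(relig, ideol)
props <- prop.table(counts, margin = 1)
round(props, 2)

# With the data above, the equivalent call would be:
# prop.table(table(student.survey$re, student.survey$pi), margin = 1)
```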
# renaming variables in dataframe
student.survey.gg <- rename(student.survey, Hours_of_TV = tv, Highschool_GPA = hi)
# Scatter plot of hours of TV and high school GPA
ggplot(data = student.survey.gg, aes(x = Hours_of_TV, y = Highschool_GPA)) +
geom_point() +
ggtitle("Hours of Television vs High School GPA")
# Linear model for political ideology and religiosity
summary(lm(as.numeric(pi) ~ as.numeric(re),
data = student.survey))
##
## Call:
## lm(formula = as.numeric(pi) ~ as.numeric(re), data = student.survey)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.81243 -0.87160 0.09882 1.12840 3.09882
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9308 0.4252 2.189 0.0327 *
## as.numeric(re) 0.9704 0.1792 5.416 1.22e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.345 on 58 degrees of freedom
## Multiple R-squared: 0.3359, Adjusted R-squared: 0.3244
## F-statistic: 29.34 on 1 and 58 DF, p-value: 1.221e-06
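The model above relies on `as.numeric()` turning each ordered factor into its integer level code, which treats adjacent levels as equally spaced. A small sketch of that conversion follows; the factor `attendance` and its levels are hypothetical and are not the levels of `re` or `pi`.

```r
# Hypothetical ordered factor (illustrative levels only)
attendance <- factor(c("high", "low", "medium", "low"),
                     levels = c("low", "medium", "high"),
                     ordered = TRUE)

# as.numeric() returns the position of each value in the level ordering
codes <- as.numeric(attendance)
codes
# codes are 3 1 2 1
```

Under this coding, the slope is the estimated change in the response code per one-level step in the predictor.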
# Linear model for hours of television and GPA
summary(lm(hi ~ tv, data = student.survey))
##
## Call:
## lm(formula = hi ~ tv, data = student.survey)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2583 -0.2456 0.0417 0.3368 0.7051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.441353 0.085345 40.323 <2e-16 ***
## tv -0.018305 0.008658 -2.114 0.0388 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4467 on 58 degrees of freedom
## Multiple R-squared: 0.07156, Adjusted R-squared: 0.05555
## F-statistic: 4.471 on 1 and 58 DF, p-value: 0.03879
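The fitted equation from this output can be used for point predictions. The sketch below copies the two coefficient estimates from the summary; the function name `predict_gpa` and the chosen TV hours are assumptions for illustration.

```r
# Coefficient estimates copied from the summary output above
b0 <- 3.441353    # intercept
b1 <- -0.018305   # slope for tv

# Hypothetical helper: predicted high school GPA for given weekly TV hours
predict_gpa <- function(tv_hours) b0 + b1 * tv_hours

preds <- predict_gpa(c(0, 10, 20))
preds
```

For example, 10 hours of television gives a predicted GPA of about 3.26, roughly 0.18 points below the intercept.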
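The coefficient on religiosity is positive and statistically significant at the 0.01 level; therefore, an increase in religiosity is associated with an increase in conservatism. The coefficient on hours of television is negative and statistically significant at the 0.05 level, meaning that as hours of television increase, high school GPA tends to decrease, although this model explains only about 7% of the variation in GPA (R-squared = 0.07156).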