DACSS 603, Spring 2022
United Nations (Data file: UN11) The data in the file UN11 contains several variables, including ppgdp, the gross national product per person in U.S. dollars, and fertility, the birth rate per 1000 females, both from the year 2009. The data are for 199 localities, mostly UN member countries, but also other areas such as Hong Kong that are not independent countries. The data were collected from the United Nations (2011). We will study the dependence of fertility on ppgdp.
1.1.1. Identify the predictor and the response.
Predictor (x): PPGDP
Response (y): Fertility
1.1.2 Draw the scatterplot of fertility on the vertical axis versus ppgdp on the horizontal axis and summarize the information in this graph.
The scatterplot seems to suggest that the higher the PPGDP, the lower the number of children per woman.
Does a straight-line mean function seem to be plausible for a summary of this graph?
library(alr4)     # provides the UN11, water and Rateprof data sets
library(ggplot2)
ggplot(data=UN11, aes(x=ppgdp, y=fertility))+
geom_point()+
geom_smooth(method="lm", se=FALSE)
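No. The fitted line captures the overall negative trend, but the points are strongly curved: fertility falls steeply at low ppgdp and then flattens out at high ppgdp, so a straight-line mean function is not a plausible summary of this graph.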
1.1.3 Draw the scatterplot of log(fertility) versus log(ppgdp) using natural logarithms. Does the simple linear regression model seem plausible for a summary of this graph? If you use a different base of logarithms, the shape of the graph won’t change, but the values on the axes will change.
UN11_log_ppgdp<- log(UN11$ppgdp)
UN11_log_fertility<- log(UN11$fertility)
plot(x=UN11_log_ppgdp, y=UN11_log_fertility)
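The scatterplot using logs still suggests that the number of children per woman decreases as PPGDP increases, but now the points fall roughly along a straight line with fairly even scatter, so the simple linear regression model seems plausible for log(fertility) versus log(ppgdp).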
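The remark in the question about the base of the logarithm can be checked numerically. The sketch below uses simulated data, not UN11; `x` and `y` are hypothetical variables. It fits the log-log regression with natural logs and with base-10 logs. Since log10(v) = log(v) / log(10) for both variables, the slope and the correlation should be identical and only the axis values (and the intercept) change.

```r
# Hypothetical positive data with a roughly linear log-log relationship
set.seed(11)
x <- exp(runif(100, log(100), log(100000)))
y <- exp(2.5 - 0.2 * log(x) + rnorm(100, sd = 0.3))

# Slope of the log-log regression under two different bases
slope_ln <- coef(lm(log(y) ~ log(x)))[[2]]
slope_10 <- coef(lm(log10(y) ~ log10(x)))[[2]]

# Both comparisons check that the shape of the relationship is unchanged
all.equal(slope_ln, slope_10)
all.equal(cor(log(x), log(y)), cor(log10(x), log10(y)))
```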
Annual income, in dollars, is an explanatory variable in a regression analysis. For a British version of the report on the analysis, all responses are converted to British pounds sterling (1 pound equals about 1.33 dollars, as of 2016).
(a) How, if at all, does the slope of the prediction equation change?
usdollar<- (1:10)
pound<- seq(1.33,13.3, length.out = 10)
slope<-(usdollar/pound)
slope
[1] 0.7518797 0.7518797 0.7518797 0.7518797 0.7518797 0.7518797
[7] 0.7518797 0.7518797 0.7518797 0.7518797
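I didn’t initially know how to solve this, so I put together a sample of values for dollars and pounds. The ratio computed above is constant, but it is only the conversion factor between the two currencies (and the sample has it inverted: an amount in pounds is the dollar amount divided by 1.33, not multiplied by it). It is not the slope of the prediction equation. Changing the units of income rescales the slope: with income in pounds, x in pounds equals x in dollars divided by 1.33, so the slope is multiplied by 1.33. The predictions themselves do not change.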
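A small simulation can make the effect on the slope concrete. This is only a sketch with made-up data: `income_usd` and the response `y` are hypothetical, and the 1.33 rate is the one quoted in the question. The same response is regressed on income in dollars and on income in pounds, and the two slopes are compared.

```r
# Hypothetical incomes (in dollars) and a hypothetical response
set.seed(1)
income_usd <- runif(50, 20000, 100000)
y <- 5 + 0.0002 * income_usd + rnorm(50)

# Same incomes expressed in pounds (1 pound = 1.33 dollars)
income_gbp <- income_usd / 1.33

slope_usd <- coef(lm(y ~ income_usd))[["income_usd"]]
slope_gbp <- coef(lm(y ~ income_gbp))[["income_gbp"]]

# Ratio of the two slopes: the slope in pounds is 1.33 times the slope in dollars
slope_gbp / slope_usd
```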
(b) How, if at all, does the correlation change?
cor.test(usdollar,pound)
Pearson's product-moment correlation
data: usdollar and pound
t = 189812531, df = 8, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
1 1
sample estimates:
cor
1
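The test above correlates the dollar amounts with their own pound equivalents, which is an exact linear relationship, so the correlation is 1 by construction. The question is about the correlation between income and the other variable in the analysis. That correlation does not change, because the correlation is unit-free: multiplying a variable by a positive constant leaves it the same.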
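The same kind of check works for the correlation. Again this is a sketch on simulated data (the variable names are hypothetical), comparing the correlation between income and a response before and after converting income to pounds.

```r
# Hypothetical incomes (in dollars) and a hypothetical response
set.seed(2)
income_usd <- runif(50, 20000, 100000)
y <- 5 + 0.0002 * income_usd + rnorm(50)
income_gbp <- income_usd / 1.33

# The correlation with the response is the same in either currency
r_usd <- cor(income_usd, y)
r_gbp <- cor(income_gbp, y)
all.equal(r_usd, r_gbp)
```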
Water runoff in the Sierras (Data file: water) Can Southern California’s water supply in future years be predicted from past data? One factor affecting water availability is stream runoff. If runoff could be predicted, engineers, planners, and policy makers could do their jobs more efficiently. The data file contains 43 years’ worth of precipitation measurements taken at six sites in the Sierra Nevada mountains (labeled APMAM, APSAB, APSLAKE, OPBPC, OPRC, and OPSLAKE) and stream runoff volume at a site near Bishop, California, labeled BSAAM. Draw the scatterplot matrix for these data and summarize the information available from these plots.
?water
summary(water)
Year APMAM APSAB APSLAKE
Min. :1948 Min. : 2.700 Min. : 1.450 Min. : 1.77
1st Qu.:1958 1st Qu.: 4.975 1st Qu.: 3.390 1st Qu.: 3.36
Median :1969 Median : 7.080 Median : 4.460 Median : 4.62
Mean :1969 Mean : 7.323 Mean : 4.652 Mean : 4.93
3rd Qu.:1980 3rd Qu.: 9.115 3rd Qu.: 5.685 3rd Qu.: 5.83
Max. :1990 Max. :18.080 Max. :11.960 Max. :13.02
OPBPC OPRC OPSLAKE BSAAM
Min. : 4.050 Min. : 4.350 Min. : 4.600 Min. : 41785
1st Qu.: 7.975 1st Qu.: 7.875 1st Qu.: 8.705 1st Qu.: 59857
Median : 9.550 Median :11.110 Median :12.140 Median : 69177
Mean :12.836 Mean :12.002 Mean :13.522 Mean : 77756
3rd Qu.:16.545 3rd Qu.:14.975 3rd Qu.:16.920 3rd Qu.: 92206
Max. :43.370 Max. :24.850 Max. :33.070 Max. :146345
head(water)
Year APMAM APSAB APSLAKE OPBPC OPRC OPSLAKE BSAAM
1 1948 9.13 3.58 3.91 4.10 7.43 6.47 54235
2 1949 5.28 4.82 5.20 7.55 11.11 10.26 67567
3 1950 4.20 3.77 3.67 9.52 12.20 11.35 66161
4 1951 4.60 4.46 3.93 11.14 15.15 11.13 68094
5 1952 7.15 4.99 4.88 16.34 20.05 22.81 107080
6 1953 9.70 5.65 4.91 8.88 8.15 7.41 67594
pairs(~APMAM + APSAB + APSLAKE + OPBPC + OPRC + OPSLAKE + BSAAM, data=water)
plot(y=water$BSAAM,x=water$APMAM)
plot(y=water$BSAAM,x=water$APSAB)
plot(y=water$BSAAM,x=water$APSLAKE)
plot(y=water$BSAAM,x=water$OPBPC)
plot(y=water$BSAAM,x=water$OPRC)
plot(y=water$BSAAM,x=water$OPSLAKE)
I successfully created a scatterplot matrix above, but I had trouble reading it, so I also drew each scatterplot of BSAAM against a site separately. Stream runoff has a very strong positive relationship with snowfall (in inches) at OPBPC, OPRC and OPSLAKE. At the other three sites (APMAM, APSAB and APSLAKE), more snowfall still seems to go with more runoff, but the relationship is much weaker.
Professor ratings (Data file: Rateprof) In the website and online forum RateMyProfessors.com, students rate and comment on their instructors. Launched in 1999, the site includes millions of ratings on thousands of instructors. The data file includes the summaries of the ratings of 364 instructors at a large campus in the Midwest (Bleske-Rechek and Fritsch, 2011). Each instructor included in the data had at least 10 ratings over a several year period. Students provided ratings of 1–5 on quality, helpfulness, clarity, easiness of instructor’s courses, and raterInterest in the subject matter covered in the instructor’s courses. The data file provides the averages of these five ratings. Use R to reproduce the scatterplot matrix in Figure 1.13 in the ALR book (page 20). Provide a brief description of the relationships between the five ratings. (The variables don’t have to be in the same order)
summary(Rateprof)
gender numYears numRaters numCourses
female:159 Min. : 1.000 Min. :10.00 Min. : 1.000
male :207 1st Qu.: 6.000 1st Qu.:15.00 1st Qu.: 3.000
Median :10.000 Median :24.00 Median : 4.000
Mean : 8.347 Mean :28.58 Mean : 4.251
3rd Qu.:11.000 3rd Qu.:37.00 3rd Qu.: 5.000
Max. :11.000 Max. :86.00 Max. :12.000
pepper discipline dept quality
no :320 Hum :134 English : 49 Min. :1.409
yes: 46 SocSci : 66 Math : 34 1st Qu.:2.936
STEM :103 Biology : 20 Median :3.612
Pre-prof: 63 Chemistry : 20 Mean :3.575
Psychology: 20 3rd Qu.:4.250
Spanish : 20 Max. :4.981
(Other) :203
helpfulness clarity easiness raterInterest
Min. :1.364 Min. :1.333 Min. :1.391 Min. :1.098
1st Qu.:3.069 1st Qu.:2.871 1st Qu.:2.548 1st Qu.:2.934
Median :3.662 Median :3.600 Median :3.148 Median :3.305
Mean :3.631 Mean :3.525 Mean :3.135 Mean :3.310
3rd Qu.:4.351 3rd Qu.:4.214 3rd Qu.:3.692 3rd Qu.:3.692
Max. :5.000 Max. :5.000 Max. :4.900 Max. :4.909
sdQuality sdHelpfulness sdClarity sdEasiness
Min. :0.09623 Min. :0.0000 Min. :0.0000 Min. :0.3162
1st Qu.:0.87508 1st Qu.:0.9902 1st Qu.:0.9085 1st Qu.:0.9045
Median :1.15037 Median :1.2860 Median :1.1712 Median :1.0247
Mean :1.05610 Mean :1.1719 Mean :1.0970 Mean :1.0196
3rd Qu.:1.28730 3rd Qu.:1.4365 3rd Qu.:1.3328 3rd Qu.:1.1485
Max. :1.67739 Max. :1.8091 Max. :1.8091 Max. :1.6293
sdRaterInterest
Min. :0.3015
1st Qu.:1.0848
Median :1.2167
Mean :1.1965
3rd Qu.:1.3326
Max. :1.7246
head(Rateprof)
gender numYears numRaters numCourses pepper discipline
1 male 7 11 5 no Hum
2 male 6 11 5 no Hum
3 male 10 43 2 no Hum
4 male 11 24 5 no Hum
5 male 11 19 7 no Hum
6 male 10 15 9 no Hum
dept quality helpfulness clarity easiness
1 English 4.636364 4.636364 4.636364 4.818182
2 Religious Studies 4.318182 4.545455 4.090909 4.363636
3 Art 4.790698 4.720930 4.860465 4.604651
4 English 4.250000 4.458333 4.041667 2.791667
5 Spanish 4.684211 4.684211 4.684211 4.473684
6 Spanish 4.233333 4.266667 4.200000 4.533333
raterInterest sdQuality sdHelpfulness sdClarity sdEasiness
1 3.545455 0.5518564 0.6741999 0.5045250 0.4045199
2 4.000000 0.9020179 0.9341987 0.9438798 0.5045250
3 3.432432 0.4529343 0.6663898 0.4129681 0.5407021
4 3.181818 0.9325048 0.9315329 0.9990938 0.5882300
5 4.214286 0.6500112 0.8200699 0.5823927 0.6117753
6 3.916667 0.8632717 1.0327956 0.7745967 0.6399405
sdRaterInterest
1 1.1281521
2 1.0744356
3 1.2369438
4 1.3322506
5 0.9749613
6 0.6685579
?Rateprof
pairs(~quality + clarity + helpfulness + easiness + raterInterest, data=Rateprof)
Please note that this video helped me create the scatterplot matrix: https://www.youtube.com/watch?v=AY9PYzJtCNA
It looks like quality, clarity and helpfulness are strongly related. Professors who excel at one of these seem to excel at all three. Perhaps students rate the quality of a professor highly when the professor is very clear and helpful. There does not look to be more than a weak relationship between easiness and quality, clarity or helpfulness. The same is true of raterInterest: it shows at most a weak relationship with the quality, clarity or helpfulness of a professor.
For the student.survey data file in the smss package
# install.packages(c("smss", "alr4", "car", "effects", "carData"), repos = "http://cran.us.r-project.org")
library(smss)
data(student.survey)
summary(student.survey)
subj ge ag hi
Min. : 1.00 f:31 Min. :22.00 Min. :2.000
1st Qu.:15.75 m:29 1st Qu.:24.00 1st Qu.:3.000
Median :30.50 Median :26.50 Median :3.350
Mean :30.50 Mean :29.17 Mean :3.308
3rd Qu.:45.25 3rd Qu.:31.00 3rd Qu.:3.625
Max. :60.00 Max. :71.00 Max. :4.000
co dh dr tv
Min. :2.600 Min. : 0 Min. : 0.200 Min. : 0.000
1st Qu.:3.175 1st Qu.: 205 1st Qu.: 1.450 1st Qu.: 3.000
Median :3.500 Median : 640 Median : 2.000 Median : 6.000
Mean :3.453 Mean :1232 Mean : 3.818 Mean : 7.267
3rd Qu.:3.725 3rd Qu.:1350 3rd Qu.: 5.000 3rd Qu.:10.000
Max. :4.000 Max. :8000 Max. :20.000 Max. :37.000
sp ne ah ve
Min. : 0.000 Min. : 0.000 Min. : 0.000 Mode :logical
1st Qu.: 3.000 1st Qu.: 2.000 1st Qu.: 0.000 FALSE:60
Median : 5.000 Median : 3.000 Median : 0.500
Mean : 5.483 Mean : 4.083 Mean : 1.433
3rd Qu.: 7.000 3rd Qu.: 5.250 3rd Qu.: 2.000
Max. :16.000 Max. :14.000 Max. :11.000
pa pi re ab
d:21 very liberal : 8 never :15 Mode :logical
i:24 liberal :24 occasionally:29 FALSE:60
r:15 slightly liberal : 6 most weeks : 7
moderate :10 every week : 9
slightly conservative: 6
conservative : 4
very conservative : 2
aa ld
Mode :logical Mode :logical
FALSE:59 FALSE:44
NA's :1 NA's :16
head(student.survey)
subj ge ag hi co dh dr tv sp ne ah ve pa pi
1 1 m 32 2.2 3.5 0 5.0 3 5 0 0 FALSE r conservative
2 2 f 23 2.1 3.5 1200 0.3 15 7 5 6 FALSE d liberal
3 3 f 27 3.3 3.0 1300 1.5 0 4 3 0 FALSE d liberal
4 4 f 35 3.5 3.2 1500 8.0 5 5 6 3 FALSE i moderate
5 5 m 23 3.1 3.5 1600 10.0 6 6 3 0 FALSE i very liberal
6 6 m 39 3.5 3.5 350 3.0 4 5 7 0 FALSE d liberal
re ab aa ld
1 most weeks FALSE FALSE FALSE
2 occasionally FALSE FALSE NA
3 most weeks FALSE FALSE NA
4 occasionally FALSE FALSE FALSE
5 never FALSE FALSE FALSE
6 occasionally FALSE FALSE NA
?student.survey
conduct regression analyses relating (i) y = political ideology and x = religiosity,
Initially I received a lot of errors when I tried to plot this data, so I tried various ways to remove the NA values, opting for na.omit. Then I realized that the data were not numerical, so I recoded them. For “Political Ideology”, “Very Liberal” to “Very Conservative” became 1 through 7. For “how often you attend religious services”, “Never” to “Every Week” became 0 through 3.
Then I could plot the data.
# install.packages("dplyr", repos = "http://cran.us.r-project.org")
data("student.survey")
colSums(is.na(student.survey))
subj ge ag hi co dh dr tv sp ne ah ve pa pi re ab aa ld
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 16
# na.omit() drops every row with an NA in any column (17 of the 60 rows here)
student.survey <- na.omit(student.survey)
summary(student.survey)
subj ge ag hi
Min. : 1.00 f:22 Min. :22.00 Min. :2.200
1st Qu.:17.00 m:21 1st Qu.:24.00 1st Qu.:3.000
Median :31.00 Median :27.00 Median :3.300
Mean :31.19 Mean :29.23 Mean :3.305
3rd Qu.:46.50 3rd Qu.:31.00 3rd Qu.:3.650
Max. :60.00 Max. :71.00 Max. :4.000
co dh dr tv
Min. :2.600 Min. : 0 Min. : 0.200 Min. : 0.000
1st Qu.:3.200 1st Qu.: 180 1st Qu.: 1.500 1st Qu.: 2.000
Median :3.500 Median : 630 Median : 2.000 Median : 5.000
Mean :3.493 Mean :1333 Mean : 4.186 Mean : 6.756
3rd Qu.:3.800 3rd Qu.:1650 3rd Qu.: 5.000 3rd Qu.: 8.000
Max. :4.000 Max. :8000 Max. :20.000 Max. :37.000
sp ne ah ve
Min. : 0.000 Min. : 0.000 Min. : 0.000 Mode :logical
1st Qu.: 3.000 1st Qu.: 2.000 1st Qu.: 0.000 FALSE:43
Median : 5.000 Median : 3.000 Median : 1.000
Mean : 6.023 Mean : 3.953 Mean : 1.465
3rd Qu.: 7.500 3rd Qu.: 5.000 3rd Qu.: 2.000
Max. :16.000 Max. :14.000 Max. :11.000
pa pi re ab
d:11 very liberal : 6 never :12 Mode :logical
i:19 liberal :14 occasionally:18 FALSE:43
r:13 slightly liberal : 4 most weeks : 5
moderate : 8 every week : 8
slightly conservative: 6
conservative : 3
very conservative : 2
aa ld
Mode :logical Mode :logical
FALSE:43 FALSE:43
head(student.survey)
subj ge ag hi co dh dr tv sp ne ah ve pa pi
1 1 m 32 2.2 3.5 0 5.0 3 5 0 0 FALSE r conservative
4 4 f 35 3.5 3.2 1500 8.0 5 5 6 3 FALSE i moderate
5 5 m 23 3.1 3.5 1600 10.0 6 6 3 0 FALSE i very liberal
7 7 m 24 3.6 3.7 0 0.2 5 12 4 2 FALSE i liberal
8 8 f 31 3.0 3.0 5000 1.5 5 3 3 1 FALSE i liberal
10 10 m 28 4.0 3.1 900 2.0 1 1 2 1 FALSE i slightly liberal
re ab aa ld
1 most weeks FALSE FALSE FALSE
4 occasionally FALSE FALSE FALSE
5 never FALSE FALSE FALSE
7 occasionally FALSE FALSE FALSE
8 occasionally FALSE FALSE FALSE
10 never FALSE FALSE FALSE
library(dplyr)
# recode() takes old = new pairs, so each label goes on the left and its score on the right
# political ideology: 1 = very liberal ... 7 = very conservative
student.survey$pi<- recode(student.survey$pi,
"very liberal" = 1,
"liberal" = 2,
"slightly liberal" = 3,
"moderate" = 4,
"slightly conservative" = 5,
"conservative" = 6,
"very conservative" = 7)
# religiosity: 0 = never ... 3 = every week
student.survey$re<- recode(student.survey$re,
"never" = 0,
"occasionally" = 1,
"most weeks" = 2,
"every week" = 3)
plot(x = student.survey$re, y = student.survey$pi)
It was not clear exactly what this plot meant, so I tried another way.
(a) Use graphical ways to portray the individual variables and their relationship.
ggplot(data=student.survey, aes(x=re, y=pi))+
geom_point()
(b) Interpret descriptive statistics for summarizing the individual variables and their relationship.
This plot is unusual, but it does seem to indicate that as the frequency of attending religious services increases, so does the number associated with political ideology.
(c) Summarize and interpret results of inferential analyses.
People who attend church often are more likely to lean conservative than people who never attend or attend only occasionally.
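An inferential summary of this kind is usually backed by a test of the association. The sketch below shows one way that could look, using simulated scores rather than the student.survey data: `re_score` (0 to 3) and `pi_score` (1 to 7) are hypothetical stand-ins for the recoded variables, and the positive relationship is built into the simulation. It reports a Spearman rank correlation test, which suits ordinal scores, and the slope test from a simple linear regression.

```r
# Hypothetical ordinal scores, simulated with a built-in positive relationship
set.seed(7)
n <- 60
re_score <- sample(0:3, n, replace = TRUE)
pi_score <- pmin(7, pmax(1, round(2 + re_score + rnorm(n))))

# Rank-based test of association (exact = FALSE because the scores have ties)
cor.test(re_score, pi_score, method = "spearman", exact = FALSE)

# Simple linear regression: estimate, standard error, t value and p-value
fit <- lm(pi_score ~ re_score)
summary(fit)$coefficients
```

With the real data, the same two calls on the recoded columns would give the slope estimate and p-value to report under part (c).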
conduct regression analyses relating (ii) y = high school GPA and x = hours of TV watching.
plot(x = student.survey$tv, y = student.survey$hi)
(a) Use graphical ways to portray the individual variables and their relationship.
ggplot(data=student.survey, aes(x=tv, y=hi))+
geom_point()+
geom_smooth(method="lm", se=FALSE)
(b) Interpret descriptive statistics for summarizing the individual variables and their relationship.
I do not see a strong relationship in this plot.
(c) Summarize and interpret results of inferential analyses.
A few outliers who watch a lot of TV seem to have below-average high school GPAs, but most people in this study fall between a 3.0 and a 4.0 GPA, and the average number of hours of TV does not seem to affect GPA.
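Because the conclusion leans on a few heavy TV watchers, it can help to check how much they drive the fitted line. This sketch uses simulated data, not the survey: `tv` and `hi` are hypothetical, and the cutoff of 20 hours is an arbitrary choice for illustration. It fits the regression with and without the heavy viewers and prints the slope test for the full fit.

```r
# Hypothetical data: 40 typical viewers plus 3 heavy viewers
set.seed(42)
n <- 43
tv <- c(round(runif(n - 3, 0, 12)), 25, 30, 37)
hi <- pmin(4, pmax(2, 3.4 - 0.01 * tv + rnorm(n, sd = 0.4)))

# Fit with all points, then again without the heavy viewers
fit_all  <- lm(hi ~ tv)
keep     <- tv <= 20
fit_trim <- lm(hi ~ tv, subset = keep)

# Compare intercepts and slopes side by side
cmp <- rbind(all = coef(fit_all), trimmed = coef(fit_trim))
cmp

# Slope estimate, standard error, t value and p-value for the full fit
summary(fit_all)$coefficients["tv", ]
```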
For a class of 100 students, the teacher takes the 10 students who perform poorest on the midterm exam and enrolls them in a special tutoring program. The overall class mean is 70 on both the midterm and final, but the mean for the specially tutored students increases from 50 to 60. Use the concept of regression toward the mean to explain why this is not sufficient evidence to imply that the tutoring program was successful.
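Regression toward the mean says that when two measurements are imperfectly correlated, cases selected for being extreme on the first tend to be closer to the mean on the second. The 10 tutored students were chosen because they had the lowest midterm scores, which reflect some bad luck on that exam as well as ability, so their final scores would be expected to move up toward the class mean of 70 even with no tutoring. An increase from 50 to 60 is therefore what we might see by chance alone. It is also possible that these students, aware that they had performed poorly on the midterm, made an extra effort on the final, while students who did well or about average did not feel the same drive to try harder. Either way, without a comparison group of similar students who were not tutored, the improvement is not sufficient evidence that the program was successful.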
https://www.youtube.com/watch?v=1tSqSMOyNFE
There is always a level of chance. “You should not expect them to be as unlucky when you test them a second time…their scores should improve just based on random chance.”
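The chance explanation can be illustrated with a simulation in which no tutoring happens at all. This is a sketch under stated assumptions: scores are normal with mean 70, the standard deviation (11.4) and the midterm-final correlation (0.5) are made-up values chosen for illustration, and `one_class` is a hypothetical helper. For each simulated class of 100, it takes the 10 lowest midterm scorers and records their mean on both exams.

```r
set.seed(603)
n_students <- 100
rho   <- 0.5    # assumed midterm-final correlation
mu    <- 70     # class mean on both exams
sigma <- 11.4   # assumed standard deviation of scores

# Simulate one class with no tutoring and return the bottom-10 means
one_class <- function() {
  ability <- rnorm(n_students)
  midterm <- mu + sigma * (sqrt(rho) * ability + sqrt(1 - rho) * rnorm(n_students))
  final   <- mu + sigma * (sqrt(rho) * ability + sqrt(1 - rho) * rnorm(n_students))
  lowest  <- order(midterm)[1:10]
  c(midterm = mean(midterm[lowest]), final = mean(final[lowest]))
}

# Average over many simulated classes
sims <- replicate(2000, one_class())
avg  <- rowMeans(sims)
avg
```

In this setup the bottom-10 group's average final score sits well above its average midterm score, even though nothing was done for those students between the two exams.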