This was our original survey.
myCSVLink = "https://docs.google.com/spreadsheet/pub?key=0AsXterp7HII8dF90VjZTOU1iQl9qV0JXV3RXd291UVE&output=csv"
d = fetchGoogle(myCSVLink)
### Put in NA for empty or blank answers
for (k in seq_along(d)) {
  temp = d[[k]]
  # treat empty and whitespace-only answers as missing
  temp[!is.na(temp) & trimws(as.character(temp)) == ""] = NA
  d[[k]] = temp
}
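The same cleanup can be packaged as a small reusable function. This is a minimal sketch that is not part of the original analysis: `blank_to_na()` and `blanks_to_na()` are hypothetical helper names. The sketch assumes survey answers arrive as character or factor columns and leaves other columns untouched. Unlike the loop, it also drops unused levels from factor columns, so the empty-string level disappears.

```r
# Hypothetical helper: replace empty or whitespace-only answers with NA in one column.
blank_to_na <- function(x) {
  if (is.character(x) || is.factor(x)) {
    blank <- !is.na(x) & trimws(as.character(x)) == ""
    x[blank] <- NA
    if (is.factor(x)) x <- droplevels(x)  # drop levels no longer used (e.g. "")
  }
  x
}

# Hypothetical helper: apply blank_to_na() to every column, keeping a data frame.
blanks_to_na <- function(df) {
  df[] <- lapply(df, blank_to_na)
  df
}

# Made-up example data, not survey responses.
toy <- data.frame(a = c("x", "", "  ", NA),
                  b = factor(c("", "y", " ", "y")),
                  n = 1:4,
                  stringsAsFactors = FALSE)
out <- blanks_to_na(toy)
```

With the survey data, the equivalent call would be `d = blanks_to_na(d)`.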
The variable names generated by Google Forms are too verbose.
origNames = names(d)
origNames
## [1] "Timestamp"
## [2] "How.many.AP.IB.credits.did.you.have.coming.to.Macalester."
## [3] "Which.division.of.study.do.you.most.strongly.affiliate.yourself.with."
## [4] "What.is.your.gender."
## [5] "How.many.hours.do.you.typically.spend.per.week.working.an.on.or.off.campus.job.during.the.academic.school.year."
## [6] "Where.are.you.from."
## [7] "What.year.student.are.you.currently.at.Macalester."
## [8] "How.many.hours.do.you.spend.on.extracurricular.activities..varsity.sports..clubs..choir..etc..per.week.."
## [9] "How.many.credits.are.you.currently.taking.this.semester."
## [10] "Are.you.interested.in.taking.more.than.18.credits.if.there.s.no.extra.charge..If.so..how.many.credits.do.you.want.to.take."
## [11] "Academically..how.qualified.did.you.feel.coming.to.Macalester."
## [12] "Would.you.be.interested.to.graduate.in.less.than.4.years."
## [13] "Are.you.interested.in.taking.summer.or.J.term.courses.for.credits."
## [14] "If.you.are.interested.in.taking.more.than.18.credits.per.semester..which.reason.most.likely.applies.to.you."
## [15] "Do.you.agree.with.the.current.academic.credit.limit...18.credits.per.semester."
As you can see, there are 15 different variables. You're going to rename each of them. Use the following statements to do so:
names(d)[2] = "APIB"
names(d)[3] = "Major"
## fix the bogus natural science level
d[[3]] = as.character(d[[3]])
hoo = grepl("^Natural.+", d[[3]])
d[hoo, 3] = "Natural Sciences"
names(d)[4] = "Gender"
Make sure to change the number in each line and that you choose an appropriate mnemonic short name.
Use this style to complete the change of names, e.g.
names(d)[5] = "Work"
names(d)[6] = "Origin"
names(d)[7] = "Year"
names(d)[8] = "Extracurr"
names(d)[9] = "Currentcreds"
names(d)[10] = "Interest"
names(d)[11] = "Qualified"
names(d)[12] = "Graduate"
names(d)[13] = "Summerjterm"
names(d)[14] = "Reason"
names(d)[15] = "Agree"
# Put the rest of your commands here.
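Renaming by position is easy to get wrong if a column is skipped or miscounted. A hedged alternative is to assign all 15 short names at once and stop if the count does not match the number of columns. `rename_all()` is a hypothetical helper. `short_names` simply lists the names chosen above, in column order.

```r
# Short names in the same column order as origNames above.
short_names <- c("Timestamp", "APIB", "Major", "Gender", "Work", "Origin",
                 "Year", "Extracurr", "Currentcreds", "Interest", "Qualified",
                 "Graduate", "Summerjterm", "Reason", "Agree")

# Hypothetical helper: rename every column at once. It stops with an error if
# the number of names differs from the number of columns or if a name repeats.
rename_all <- function(df, new_names) {
  stopifnot(length(new_names) == ncol(df), !anyDuplicated(new_names))
  names(df) <- new_names
  df
}
```

With the survey data this would be `d = rename_all(d, short_names)`. The fix for the natural science level in Major would still be done separately.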
Often, the levels produced by Google Forms are too verbose for convenience. After all, they were designed for another purpose: to be informative to a human completing your survey. It's helpful to change the names to be more convenient for display. To do this, construct a vector that tells what should be the new level for each existing level. You need to be careful to get the spelling exactly right. Also, make sure to list every possible level from your form, even if there are some that nobody selected in your survey.
require(plyr) # just need to do once, like require(mosaic)
newLevels = c("Male"="M","Female"="F",
"Other"="O")
originLevels = c("Minnesota"="MN","Domestic, but other US state"="US",
"Out of US (international)"="INTL")
yearLevels = c("First Year"="Fir","Sophomore"="Sop",
"Junior"="Jun","Senior"="Sen")
reasonLevels = c("I would like the option to graduate in less than 4 years"="grad","I want to be on track when I come back from study abroad"="abr","I find the current semester credit limit load too easy"="eas","The current semester credit limit doesn’t allow me to take all the classes I find interesting"="int","I need to take more credits to make up the previous semester(s)"="mak")
graduateLevels = c("Yes, I definitely want to obtain a degree in less than 4 years"="Yes","Maybe, but at least Macalester should give me the option"="Maybe","No, it takes away from my experience at Macalester as a liberal arts college"="No")
summerjtermLevels = c("Yes, I would already have money set aside for summer/J-term courses"="Yes","Maybe, I at least want the option"="Maybe","No, I plan to do other things like internships/other paid jobs or travel"="No")
majorLevels = c("Social sciences"="SS","Natural Sciences"="NS",
"Fine arts"="FA","Humanities"="H","Don't Know Yet"="DKY")
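The next step applies plyr's `revalue()` followed by `factor()` to each variable. Below is a rough base-R sketch of their combined effect; `recode_levels()` is a hypothetical helper and is not part of plyr. Each answer is looked up in the translation vector, and the result becomes a factor whose levels are the short names in the order listed. Any answer not spelled exactly as in the vector ends up as NA, which is why the spelling has to match the form.

```r
# Hypothetical helper: look each answer up in a named translation vector
# (names = form wording, values = short names) and return a factor whose
# levels are all the short names, in the order given.
recode_levels <- function(x, map, ordered = FALSE) {
  short <- unname(map[as.character(x)])   # unmatched or missing answers give NA
  factor(short, levels = unname(map), ordered = ordered)
}

gender_map <- c("Male" = "M", "Female" = "F", "Other" = "O")
g <- recode_levels(c("Female", "Male", NA, "Female"), gender_map)
```

For example, `recode_levels(d$Gender, newLevels)` should give the same factor as the `revalue()` and `factor()` pair used for Gender below.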
Now you will assign these new levels to your variable:
d$Gender = revalue(d$Gender, newLevels)
d$Gender = factor(d$Gender, levels = newLevels)
d$Major = revalue(d$Major, majorLevels)
d$Major = factor(d$Major, levels = majorLevels)
d$Origin = revalue(d$Origin, originLevels)
d$Origin = factor(d$Origin, levels = originLevels)
d$Year = revalue(d$Year, yearLevels)
d$Year = factor(d$Year, levels = yearLevels)
d$Reason = revalue(d$Reason, reasonLevels)
d$Reason = factor(d$Reason, levels = reasonLevels)
d$Graduate = revalue(d$Graduate, graduateLevels)
d$Graduate = factor(d$Graduate, levels = graduateLevels)
d$Summerjterm = revalue(d$Summerjterm, summerjtermLevels)
d$Summerjterm = factor(d$Summerjterm, levels = summerjtermLevels)
This involves two commands for each variable. The first renames the levels. The second is a little more obscure. It turns the result into a factor whose levels are the full set of new names, in the order listed, so that every possible level (including any that nobody selected) is available for graphics, models, etc.
You may also want to set the reference level explicitly with relevel(). Note that relevel() returns a new factor rather than changing the variable in place, so to keep the change you must assign it back, e.g. d$Gender = relevel(d$Gender, ref = "F"). The statements below only print the releveled factors. The models later in this report therefore still use the first level of each translation vector as the reference (Yes for Graduate, SS for Major).
relevel(d$Gender, ref = "F")
## [1] F F F M F F F M F F M F F F F M F F F F M F F F F M F M F M F M F F F
## [36] M M M M M O M M F F F M F F M M F F M F F F F F M M M F M F F F F M F
## [71] F F F M F F F M F M F F F F F M F F F F M F F F M M F M F F F M F M M
## [106] M F F F F F M M M M F F F F F F F M M F F M F M F F F F F F F F M F F
## [141] O M F M M F F M M F M M M M M M M F F F F F F F M M F F F F F M F F F
## [176] F F M M F F F M M F F F F F F F F F F M F M F F M M F F F M
## Levels: F M O
relevel(d$Origin, ref = "MN")
## [1] INTL US US US US MN US US US US US US US US
## [15] US US US MN US US INTL US US US US INTL US US
## [29] US US INTL US US US US MN US US US US US INTL
## [43] US US US US INTL INTL MN MN US US INTL MN US US
## [57] US US US US US INTL US MN US US US US INTL INTL
## [71] INTL MN US US US US US INTL US INTL US US US US
## [85] MN INTL US INTL US US INTL US INTL INTL INTL MN US US
## [99] US INTL US US INTL INTL US INTL <NA> US US US US MN
## [113] US US US US INTL US MN US US US MN US MN US
## [127] INTL MN MN MN US US US US US US US US US INTL
## [141] US US US US US US INTL US US US MN MN US US
## [155] INTL MN US US US US INTL US US US US MN US MN
## [169] MN US US US US US US US US US US INTL US US
## [183] US US INTL US US MN US US US US US MN US INTL
## [197] US INTL US MN US MN US US US
## Levels: MN US INTL
relevel(d$Year, ref = "Fir")
## [1] Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen
## [15] Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen Sen
## [29] Sen Sen Sen Sen Sen Sen Jun Sen Sen Sen Sen Sen Sen Sop
## [43] Sop Sop Sop Sop Sop Sop Sop Sop Sop Fir Sop Sop Sop Sop
## [57] Sop Sop Sop Sop Fir <NA> Sop Sop Sop Sop Sen Fir Sop Sop
## [71] Sop Fir Jun Fir Sop Sop Sop Sen Fir Jun Jun Fir Sop Fir
## [85] Fir Sop Jun Fir Sen Fir Sop Sop Fir Fir Fir Sop Sop Sop
## [99] Fir Sop Sop Sop Sop Sop Fir Sop Fir Sop Jun Sen Sop Sop
## [113] Sop Fir Sop Sop Sop Jun Sop Sop Fir Sop Sop Fir Jun Fir
## [127] Jun Sop Sop Sop Jun Fir Sop Sop Jun Fir Sen Fir Sen Sop
## [141] Fir Sop Sop Fir Fir Fir Sop Fir Fir Jun Fir Fir Jun Sop
## [155] Jun Sop Jun Sop Fir Jun Sop Jun Jun Sop Sop Sop Sop Jun
## [169] Fir Fir Sop Sop Sop Sop Sop Sop Sop Jun Sop Fir Sop Sop
## [183] Fir Sop Sop Sop Sop Jun Sop Fir Sop Fir Sen Fir Fir Fir
## [197] Sop Fir Jun Sen Sop Sop Sop Jun Fir
## Levels: Fir Sop Jun Sen
relevel(d$Reason, ref = "grad")
## [1] <NA> int int int int int abr int <NA> int int int <NA> int
## [15] <NA> <NA> mak grad int <NA> grad int grad abr int int <NA> int
## [29] int int <NA> int int int int <NA> int abr <NA> grad int eas
## [43] int int <NA> abr int int grad int int int int int abr grad
## [57] int int <NA> mak int <NA> <NA> mak int int abr int int mak
## [71] int <NA> abr <NA> int <NA> abr grad int <NA> int int <NA> grad
## [85] int grad abr int <NA> int grad int int int int int int <NA>
## [99] int int int int int int int int int int int int abr abr
## [113] <NA> <NA> int mak int eas mak int int int int <NA> <NA> <NA>
## [127] int int int mak int mak int abr int abr int abr <NA> <NA>
## [141] int int <NA> <NA> <NA> abr int <NA> grad int abr abr int abr
## [155] int int int int int int int <NA> int int <NA> mak <NA> <NA>
## [169] int <NA> eas <NA> int int abr <NA> abr grad grad mak abr int
## [183] abr mak int <NA> int abr int abr int abr grad int int int
## [197] mak grad <NA> <NA> int int abr int grad
## Levels: grad abr eas int mak
relevel(d$Graduate, ref = "Yes")
## [1] No No Maybe Yes No No Maybe Maybe No Maybe No
## [12] Maybe Maybe Maybe No No No Maybe Maybe No Yes Maybe
## [23] Maybe Maybe No Maybe No Maybe Maybe No No Maybe Maybe
## [34] Maybe No Maybe No Maybe Maybe Yes Maybe No Maybe No
## [45] No Maybe No Maybe Yes No Maybe No No No No
## [56] Yes Maybe Maybe No Maybe Maybe No No No No Maybe
## [67] Maybe No No No No No Yes No No Maybe Maybe
## [78] Yes Maybe Maybe No No Maybe Maybe No Yes No Maybe
## [89] No Maybe Yes No Maybe No No Maybe Maybe Maybe Maybe
## [100] Maybe No Maybe No No No No Yes No Maybe Maybe
## [111] Maybe Maybe Maybe No No No Maybe No Yes Maybe No
## [122] Yes No Maybe No Maybe Yes Maybe Maybe Maybe Maybe No
## [133] No No No Maybe Maybe Maybe Maybe Maybe No No No
## [144] No Maybe No No No Yes Maybe No No Maybe Maybe
## [155] Maybe Maybe Maybe No Maybe Maybe No No Maybe No Maybe
## [166] Maybe Maybe Maybe Maybe No Maybe Yes No No Maybe No
## [177] Maybe Maybe Maybe Maybe No Maybe Maybe No No No Maybe
## [188] Maybe Maybe No No No Maybe No Maybe No No Maybe
## [199] Maybe No Yes Maybe No Maybe Yes
## Levels: Yes Maybe No
relevel(d$Summerjterm, ref = "Yes")
## [1] No Maybe No <NA> Maybe No Maybe No Maybe <NA> No
## [12] Maybe No Maybe No No <NA> No Maybe Maybe <NA> No
## [23] Maybe Maybe Maybe Maybe No Maybe Maybe Maybe No <NA> <NA>
## [34] <NA> <NA> Maybe No <NA> No Yes Maybe Yes Maybe No
## [45] Maybe Maybe No Maybe Maybe No Maybe Maybe Maybe No No
## [56] Yes Maybe Maybe Maybe No Yes No No Yes Maybe Yes
## [67] No No No No No Maybe Yes Yes Maybe Maybe Maybe
## [78] Maybe Maybe Maybe Maybe Yes Maybe Maybe Maybe Maybe Yes Maybe
## [89] Maybe Maybe Yes Maybe Maybe Maybe No Yes Yes Maybe Maybe
## [100] Maybe Maybe Yes Maybe No Maybe No No No No No
## [111] No Maybe No Maybe No Maybe Yes Maybe Maybe Maybe Maybe
## [122] Yes No Maybe Maybe Maybe No Maybe Yes Yes Maybe Maybe
## [133] Maybe Yes Maybe No Maybe Maybe Maybe Maybe No Maybe Maybe
## [144] Maybe No Maybe Maybe No Maybe Maybe Maybe Maybe Maybe Maybe
## [155] No No Yes Yes Yes No No Maybe Yes Maybe Yes
## [166] Maybe Maybe No Maybe Maybe No Maybe Maybe Maybe Maybe Maybe
## [177] No Maybe No Maybe Yes Yes Maybe Yes No No Maybe
## [188] Yes Maybe Maybe Maybe Yes Maybe Maybe Yes Maybe Maybe Yes
## [199] <NA> Maybe Maybe Yes Maybe Maybe Yes
## Levels: Yes Maybe No
relevel(d$Major, ref = "FA")
## [1] SS SS H NS SS SS SS NS SS SS SS SS NS NS SS NS SS
## [18] SS SS SS SS SS H NS FA SS SS SS SS H H SS SS H
## [35] SS SS SS SS SS NS SS NS NS NS SS NS NS NS H NS SS
## [52] DKY SS NS SS SS FA H H NS DKY NS NS SS NS SS SS SS
## [69] SS NS SS H SS H NS NS SS NS DKY SS NS H H DKY NS
## [86] SS NS SS H DKY SS H SS SS SS NS SS SS NS SS H H
## [103] SS SS SS SS SS SS SS H H SS NS NS NS NS SS FA NS
## [120] SS NS NS H NS NS DKY SS H NS NS SS DKY NS SS H NS
## [137] H SS H SS DKY H SS SS NS DKY NS NS NS H NS NS SS
## [154] DKY NS NS SS NS DKY H SS NS NS NS NS H NS NS DKY DKY
## [171] SS NS SS NS NS SS SS SS SS NS SS NS SS SS SS SS NS
## [188] SS SS H NS H NS H NS SS NS FA H FA NS NS FA SS
## [205] SS
## Levels: FA SS NS H DKY
Many of the survey questions use a Likert scale or other ordered answer choices. You will want to simplify the level names and also tell R that there is a natural order. For example, the Qualified, Currentcreds, APIB, Interest, Work and Extracurr variables in our survey all have a natural ordering.
Here's the renaming step:
likertLevels = c("Overqualified"="Over" ,
"Qualified"="Qual",
"It's my reach school"="Reach",
"Prefer not to say"="Notsay")
d$Qualified = revalue(d$Qualified, likertLevels)
d$Qualified = factor(d$Qualified, ordered=TRUE,levels=likertLevels)
likertLevels = c("< 12 credits"="<12" ,
"12-13 credits"="12-13",
"14-15 credits"="14-15",
"16-17 credits"="16-17","18 credits"="18","> 18 credits"=">18")
d$Currentcreds = revalue(d$Currentcreds, likertLevels)
d$Currentcreds = factor(d$Currentcreds, ordered=TRUE,levels=likertLevels)
likertLevels = c("0"="0" ,
"1-2"="1-2",
"3-4"="3-4",
"5+"="5+","Other:"="Other")
d$APIB = revalue(d$APIB, likertLevels)
## The following `from` values were not present in `x`: Other:
d$APIB = factor(d$APIB, ordered=TRUE,levels=likertLevels)
likertLevels = c("No, I'm good with 18 credits upper limit"="Good" ,
"Yes, 20-21 credits"="20-21",
"Yes, 22-23 credits"="22-23",
"Yes, 24 credits"="24","Yes, >24 credits( Woo good luck~)"=">24")
d$Interest = revalue(d$Interest, likertLevels)
d$Interest = factor(d$Interest, ordered=TRUE,levels=likertLevels)
likertLevels = c("0 hours"="0" ,
"1- 4 hours"="1-4",
"5-10 hours"="5-10",
"11-15 hours"="11-15","> 15 hours"=">15")
d$Work = revalue(d$Work, likertLevels)
d$Work = factor(d$Work, ordered=TRUE,levels=likertLevels)
likertLevels = c("0 hour (Not doing any activities)"="0" ,
"1-5 Hours"="1-5",
"6-10 Hours"="6-10",
"11-15 Hours"="11-15","16-20 Hours"="16-20","More than 20 Hours"=">20")
d$Extracurr = revalue(d$Extracurr, likertLevels)
## The following `from` values were not present in `x`: 0 hour (Not doing any
## activities)
d$Extracurr = factor(d$Extracurr, ordered=TRUE,levels=likertLevels)
When you construct the translation (here called likertLevels), make sure to order it in the natural way, from one end to the other.
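Ordered factors support comparisons against a level, which is handy for questions such as "who is taking 16 or more credits?". Below is a small base-R sketch using the Currentcreds levels from above; the example values are made up. It also shows one optional choice that this report does not make. Responses such as "Prefer not to say" are not really a point on the scale, so one could drop that level before using the ordering in a model, which turns those answers into NA.

```r
cred_levels <- c("<12", "12-13", "14-15", "16-17", "18", ">18")
# Made-up example answers, not survey data.
creds <- factor(c("12-13", "18", "16-17", "14-15"),
                levels = cred_levels, ordered = TRUE)
heavy <- creds >= "16-17"        # comparison follows the level order
rank_pos <- as.integer(creds)    # position on the scale: 1 = "<12"

qual <- factor(c("Qual", "Notsay", "Over"),
               levels = c("Over", "Qual", "Reach", "Notsay"), ordered = TRUE)
# Keep only the scale points; "Notsay" answers become NA.
qual_scale <- factor(qual, levels = setdiff(levels(qual), "Notsay"),
                     ordered = TRUE)
```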
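Setting ordered=TRUE tells R that each of these variables is ordered. Looking at the first few values confirms this: the < signs in each Levels line show the ordering. (Agree is already stored as an integer score, so it has no levels.)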
head(d$Agree)
## [1] 2 -2 -1 0 1 1
head(d$Currentcreds)
## [1] 12-13 12-13 18 16-17 16-17 14-15
## Levels: <12 < 12-13 < 14-15 < 16-17 < 18 < >18
head(d$Qualified)
## [1] Qual Qual Qual Qual Qual Qual
## Levels: Over < Qual < Reach < Notsay
head(d$APIB)
## [1] 0 5+ 3-4 5+ 5+ 5+
## Levels: 0 < 1-2 < 3-4 < 5+ < Other
head(d$Interest)
## [1] <NA> 20-21 22-23 <NA> 20-21 20-21
## Levels: Good < 20-21 < 22-23 < 24 < >24
head(d$Work)
## [1] >15 11-15 5-10 0 0 11-15
## Levels: 0 < 1-4 < 5-10 < 11-15 < >15
head(d$Extracurr)
## [1] 1-5 1-5 1-5 16-20 6-10 1-5
## Levels: 0 < 1-5 < 6-10 < 11-15 < 16-20 < >20
with(d, class(Agree))
## [1] "integer"
As a liberal arts college, Macalester College has a class registration upper limit of 18 credits per semester. Students may participate in research, fellowships, and internships over the January term, but no formal classes are offered during that time, and apart from a summer physics program, no summer classes are available to students either. This survey explores the attitudes of Macalester students toward the 18-credit-per-semester registration limit and their satisfaction with it. We want to learn what our peers think of this limit because, in our experience, students have said that they are already taking 18 credits or want to take more. We hope the survey gives some insight into students' satisfaction with the current semester credit limit and their reasons for wanting to raise or keep it. Ultimately, this could provide a foundation for future discussion of changes to credit policy that would accommodate more students.
We had several hypotheses when designing the survey; two of them are listed here:
1) Students who want to graduate earlier than 4 years have a negative attitude towards the current academic limit of 18 credits.
2) Natural science majors disagree the most with the current academic limit of 18 credits.
Don't be afraid to state hypotheses that you think are obvious. Even if it's obvious, you'll still want to try to demonstrate them from your data.
The survey consisted of 14 multiple-choice questions. We distributed it by posting a link on our personal Facebook pages and on the Facebook wall of each class group (class of 2014, class of 2015, etc.). We tried to be considerate and polite when asking for responses and said the survey should take no more than 5 minutes.
“Agree” is the response variable for both hypotheses. It measures the student’s attitude towards the current academic credit policy. It was originally a categorical variable with 5 levels: Strongly Agree, Agree, Neutral, Disagree and Strongly Disagree. We transformed it into a quantitative variable from -2 to 2, with -2 being “Strongly Disagree” and 2 being “Strongly Agree”.
“Graduate” is a categorical variable that shows students’ desire to graduate earlier than 4 years. There are three levels for the Graduate variable: Yes, Maybe, No.
“Major” is a categorical variable denoting a student’s major. There are five levels: NS stands for Natural Sciences, SS for Social Sciences, FA for Fine Arts, H for Humanities and DKY for Don't Know Yet.
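The code that turns the Agree labels into scores is not shown in this report. One way to do it is a named lookup vector, sketched below. The label spellings are assumptions taken from the level names described above and would need to match the form exactly. Integer scores keep the result an integer vector, which matches the class reported for Agree earlier.

```r
# Assumed label spellings; they must match the form's wording exactly.
agree_scores <- c("Strongly Disagree" = -2L, "Disagree" = -1L, "Neutral" = 0L,
                  "Agree" = 1L, "Strongly Agree" = 2L)

# Look each label up; unknown or missing labels become NA.
to_score <- function(x, scores) unname(scores[as.character(x)])

example <- to_score(c("Agree", "Strongly Disagree", NA, "Neutral"), agree_scores)
```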
Most students who answered the survey were sophomores, followed by similar numbers of seniors and first-years; relatively few juniors responded:
barchart(tally(~Year, data = d, margins = FALSE, format = "count"), auto.key = TRUE)
Most respondents were social science (SS in the graph) and natural science (NS in the graph) majors.
barchart(tally(~Major, data = d, margins = FALSE, format = "count"), auto.key = TRUE)
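The counts that tally() passes to barchart() can also be computed with base R's `table()`, and `prop.table()` turns them into shares. Below is a minimal sketch on made-up values; with the survey data one would pass `d$Year` or `d$Major` instead. By default `table()` leaves out missing answers, and `useNA = "ifany"` adds a count for them.

```r
# Made-up example answers, not survey data.
year <- factor(c("Sop", "Fir", "Sop", NA, "Sen", "Sop"),
               levels = c("Fir", "Sop", "Jun", "Sen"))
counts <- table(year)                    # NA left out; unused levels kept as 0
shares <- prop.table(counts)             # proportions of non-missing answers
with_na <- table(year, useNA = "ifany")  # adds a count for missing answers
```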
Hypothesis 1
From the two graphs below it seems that those who want to graduate early tend to disagree with the current academic policy of 18 credits.
mosaicplot(Agree ~ Graduate, data = d, las = 2, col = rainbow(5))
bwplot(Agree ~ Graduate, data = d)
Hypothesis 2
From the following two graphs it seems that natural science majors disagree the most with the current academic limit of 18 credits.
mosaicplot(Agree ~ Major, data = d, las = 2, col = rainbow(5))
bwplot(Agree ~ Major, data = d)
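The plots show the distribution of Agree within each group. A table of group means and group sizes is a useful numeric companion, because a group with very few responses gives an unreliable mean. Below is a minimal base-R sketch. `group_means()` is a hypothetical helper. With the survey data the call would be `group_means(d$Agree, d$Major)` or `group_means(d$Agree, d$Graduate)`.

```r
# Hypothetical helper: mean score and number of usable responses per group.
group_means <- function(score, group) {
  ok <- !is.na(score) & !is.na(group)
  m <- tapply(score[ok], group[ok], mean)   # NA for groups with no responses
  counts <- table(group[ok])
  data.frame(mean = as.vector(m), n = as.vector(counts), row.names = names(m))
}

# Made-up example values, not survey data.
score <- c(2L, -1L, 0L, NA, 1L, -2L)
grp <- factor(c("SS", "NS", "SS", "NS", "NS", "H"),
              levels = c("SS", "NS", "FA", "H"))
gm <- group_means(score, grp)
```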
TASK 3:
(1) Do students who want to graduate early tend to disagree with the credit upper limit? Here's a linear regression model of a student's agreement with this policy (2 = strongly agree, -2 = strongly disagree) on whether they want to graduate early:
mod1=lm(Agree ~ Graduate,data=d)
summary(mod1)
##
## Call:
## lm(formula = Agree ~ Graduate, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.044 -1.044 -0.044 0.956 2.765
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.765 0.300 -2.55 0.012 *
## GraduateMaybe 0.538 0.326 1.65 0.100
## GraduateNo 0.809 0.327 2.47 0.014 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.24 on 202 degrees of freedom
## Multiple R-squared: 0.0325, Adjusted R-squared: 0.0229
## F-statistic: 3.4 on 2 and 202 DF, p-value: 0.0354
anova(mod1)
## Analysis of Variance Table
##
## Response: Agree
## Df Sum Sq Mean Sq F value Pr(>F)
## Graduate 2 10.4 5.21 3.4 0.035 *
## Residuals 202 309.9 1.53
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The intercept, -0.765, is the mean agreement score for the reference level of Graduate, Yes (students who want to graduate early). Because it is below 0, it suggests that these students tend to disagree with the credit upper limit policy. The coefficient on GraduateNo, 0.809, is the difference between students who don’t want to graduate early and the Yes group: their mean score is -0.765 + 0.809 = 0.044, close to neutral. So students who don’t want to graduate early agree with the limit more than those who do. The p-value for GraduateNo (0.014) is below 0.05, so we can reject the null hypothesis that these two groups have the same mean agreement. (The intercept's p-value of 0.012 answers a different question: whether the Yes group's mean differs from 0.) The ANOVA F-test for Graduate as a whole also gives p = 0.035 < 0.05, indicating a statistically significant relationship between plans to graduate early and agreement with the credit policy.
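Why is the intercept a group mean and each other coefficient a difference? With a single factor predictor and R's default treatment coding, `lm()` fits one mean per group. The intercept therefore equals the reference group's mean, and each other coefficient equals that group's mean minus the reference mean. Below is a short check on simulated data (not the survey), using base R only.

```r
set.seed(42)
# Simulated data, not the survey: a 3-level factor and integer scores -2..2.
grad  <- factor(sample(c("Yes", "Maybe", "No"), 60, replace = TRUE),
                levels = c("Yes", "Maybe", "No"))
agree <- sample(-2:2, 60, replace = TRUE)

fit <- lm(agree ~ grad)                 # treatment coding, "Yes" = reference
means <- tapply(agree, grad, mean)      # one mean per group: Yes, Maybe, No
# Intercept for the reference group, then intercept + difference for the others.
from_coefs <- coef(fit)[1] + c(0, coef(fit)[-1])
```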
(2) Do natural science majors tend to disagree with the current academic limit of 18 credits? Here's a linear regression model of a student's agreement with this policy (2 = strongly agree, -2 = strongly disagree) on their major:
mod2=lm(Agree ~ Major,data=d)
summary(mod2)
##
## Call:
## lm(formula = Agree ~ Major, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1667 -0.9195 0.0805 1.0805 2.3824
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0805 0.1340 -0.60 0.55
## MajorNS -0.3019 0.2023 -1.49 0.14
## MajorFA 0.2471 0.5274 0.47 0.64
## MajorH 0.2418 0.2614 0.92 0.36
## MajorDKY -0.2272 0.3716 -0.61 0.54
##
## Residual standard error: 1.25 on 200 degrees of freedom
## Multiple R-squared: 0.025, Adjusted R-squared: 0.00554
## F-statistic: 1.28 on 4 and 200 DF, p-value: 0.277
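The coefficient on MajorNS is -0.30, the largest in magnitude of the Major coefficients. Each coefficient is measured relative to the reference level, social science majors (SS). The negative value therefore indicates that natural science majors have the lowest estimated mean agreement and appear to disagree the most with the credit limit. However, its p-value is 0.14, which is too large to reject the null hypothesis. The overall F-test for Major (p = 0.277) is not significant either.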
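The Major coefficients are differences from the social science mean, so it helps to convert them into group means before comparing majors. The arithmetic below uses the estimates printed by summary(mod2) above. Those estimates are rounded to four decimals, so the resulting means are approximate.

```r
# Estimates copied from summary(mod2); SS is the reference level.
est <- c(Intercept = -0.0805, NS = -0.3019, FA = 0.2471, H = 0.2418, DKY = -0.2272)

# Mean for SS is the intercept; every other group adds its coefficient.
major_means <- c(SS = est[["Intercept"]], est[["Intercept"]] + est[-1])
lowest <- names(which.min(major_means))
```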
If your p-values are too large to reject the null, it's helpful to give some guidance to future researchers. Select a sample size that will give you a p-value below 0.01 and report that. To do this, you'll need to vary the sample size until you find one that works reliably. You don't have to show the calculations you do, just give the result. (Your instructor can check it out by using that sample size!)
(1) For hypothesis 1: wanting to graduate early and disagreement with the limit
largerSample = resample(d,size=600)
mod3=lm(Agree ~ Graduate,data=largerSample)
summary(mod3)
##
## Call:
## lm(formula = Agree ~ Graduate, data = largerSample)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0591 -1.0591 0.0731 1.0731 2.5556
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.556 0.183 -3.03 0.0025 **
## GraduateMaybe 0.482 0.196 2.46 0.0143 *
## GraduateNo 0.615 0.199 3.09 0.0021 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.23 on 597 degrees of freedom
## Multiple R-squared: 0.016, Adjusted R-squared: 0.0127
## F-statistic: 4.86 on 2 and 597 DF, p-value: 0.00807
In this run, a resampled data set of 600 responses was enough to get an overall p-value below 0.01 (F-test p = 0.00807).
(2) For hypothesis 2: natural science majors and disagreement with the limit
largerSample = resample(d,size=1000)
mod4=lm(Agree ~ Major,data=largerSample)
summary(mod4)
##
## Call:
## lm(formula = Agree ~ Major, data = largerSample)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.111 -0.845 -0.598 1.155 2.402
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1553 0.0597 -2.60 0.0095 **
## MajorNS -0.2469 0.0884 -2.79 0.0053 **
## MajorFA 0.2664 0.2964 0.90 0.3690
## MajorH 0.2546 0.1197 2.13 0.0337 *
## MajorDKY -0.2068 0.1724 -1.20 0.2307
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.23 on 995 degrees of freedom
## Multiple R-squared: 0.0202, Adjusted R-squared: 0.0163
## F-statistic: 5.13 on 4 and 995 DF, p-value: 0.000426
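In this run, a resampled data set of 1000 responses was enough to get p-values below 0.01, both for the overall F-test (p = 0.000426) and for the MajorNS coefficient (p = 0.0053).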
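A single resample can land above or below 0.01 by chance. One way to judge whether a sample size "works reliably" is to repeat the resampling many times and count how often p < 0.01. That proportion estimates statistical power, under the assumption that the population looks like our sample. Below is a base-R sketch. `resample_power()` and `overall_p()` are hypothetical helpers, and drawing rows with `sample(..., replace = TRUE)` plays the role of mosaic's `resample(d, size = n)`. The demonstration at the end uses simulated data only.

```r
# Overall F-test p-value of a fitted lm (the last line of summary()).
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic          # named: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# Fraction of B resamples of n rows (drawn with replacement) whose overall
# p-value is below alpha. Rows with NA in the model variables are dropped by
# lm(), so the effective size can be a little smaller than n.
resample_power <- function(data, formula, n, B = 200, alpha = 0.01) {
  hits <- replicate(B, {
    rows <- sample(nrow(data), size = n, replace = TRUE)
    overall_p(lm(formula, data = data[rows, , drop = FALSE])) < alpha
  })
  mean(hits)
}

# Simulated demonstration, not survey data.
set.seed(1)
toy <- data.frame(g = factor(rep(c("a", "b"), each = 50)),
                  y = c(rnorm(50, 0), rnorm(50, 1)))
p_big   <- resample_power(toy, y ~ g, n = 200, B = 50)
p_small <- resample_power(toy, y ~ g, n = 20,  B = 50)
```

With the survey data one could try, for example, `sapply(c(600, 800, 1000, 1200), function(n) resample_power(d, Agree ~ Major, n))` and report the smallest size whose estimate is high (say 80% or more). Results vary from run to run unless `set.seed()` is called first, and they treat the differences observed in our 205 responses as if they were the true ones.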
(1) Hypothesis 1: students who want to graduate in fewer than 4 years have a negative attitude towards the current academic limit. Our data support this: there is a statistically significant association between wanting to graduate early and agreement with the limit. In the linear model of Agree on Graduate, students who want to graduate early (the reference level, Yes) have a mean agreement score of -0.765. Students who do not want to graduate early score 0.809 higher on average (p = 0.014), and the overall F-test gives p = 0.035. The effect is small, though: Graduate explains only about 3% of the variation in Agree (R-squared = 0.0325). The result seems reasonable, because students who want to graduate early have more incentive to take more classes every semester, and therefore to take more credits than the upper limit allows.
(2) Hypothesis 2: natural science majors disagree the most with the current academic limit of 18 credits. Our data are consistent with this but do not establish it. In the linear model of Agree on Major, the MajorNS coefficient is negative (-0.30 relative to social science majors, the reference level), and natural science majors have the lowest mean agreement of any major. However, its p-value is 0.14 (overall F-test p = 0.277), so we cannot reject the null hypothesis. When we resampled our data to 1000 responses, the p-value fell below 0.01. This suggests that a new sample of roughly that size could detect the difference if the pattern holds, but resampling our own responses cannot by itself establish statistical significance. A plausible explanation for the pattern is that science majors tend to have more required courses, so they may want to take more classes to fulfill their major requirements as early as possible, or to take classes outside their major for fun.
We collected very few responses from Fine Arts students (the standard error for MajorFA is 0.53, compared with 0.20 for MajorNS), so we cannot reliably estimate the relationship between that major and students' preferences about the credit limit. It is therefore possible that Fine Arts students are more likely than natural science students to disagree with the 18-credit limit.