Remember our in-class survey?
We've already looked at how to read in the data from Google and reformat it to make it easier to apply the operations we have learned in R.
You typically won't want to include this part in your report, so I've consolidated all the commands here. If you look at the R fences in the source Rmd document, you'll see the option echo=FALSE, which tells the system not to print the commands. They are still executed in the background.
## Error: could not find function "revalue"
As a liberal arts college, Macalester College caps class registration at 18 credits per semester. Students are allowed to participate in research, fellowships, and internships over the January term, but no formal classes are available during this time except a summer physics program. There are also no summer classes available to students. This survey explores the attitudes and satisfaction of Macalester students regarding the 18-credit-per-semester registration limit. We want to learn what our peers think of this limit because, in our personal experience, students have said that they are taking 18 credits or want to take more. We hope to gain some insight into how satisfied students are and into their reasons for wanting to raise or keep the current limit. Ultimately, this could provide a foundation for future considerations and changes in credit policies to accommodate more students.
Some of our initial hypotheses about the results of this survey are: students who would like to take summer or Jterm courses, and students who want to graduate in fewer than 4 years, will want a limit higher than 18 credits per semester; NEED ANOTHER HYPOTHESIS; NEED ANOTHER HYPOTHESIS.
The survey consisted of 14 multiple-choice questions. We distributed it by posting a link on our personal Facebook pages and on the Facebook wall of each class group (class of 2014, class of 2015, etc.). We tried to be considerate and polite when asking for responses, and we said the survey should take no more than 5 minutes.
Give a short description of the important variables individually. You don't need to be exhaustive; just orient the reader to the important bits.
The students answering the survey were primarily in the natural and social sciences:
barchart(tally(~Division, data = d, margins = FALSE, format = "count"), auto.key = TRUE)
Many students perceive themselves as lacking ability in computing (“Unable”):
barchart(tally(~StudyMore, data = d, margins = FALSE, format = "count"), auto.key = TRUE)
## Error: object 'StudyMore' not found
Make a simple graphic to display how students regard the importance of computing in data analysis.
# Your graphics statements go here
You may sometimes want to drop sparsely populated levels.
Natural and social science students seem to be about the same regarding their inclination to study more.
dd = droplevels(subset(d, !Division %in% c("Art", "Hum")))
mosaicplot(Division ~ StudyMore, data = dd, las = 2, col = rainbow(5))
## Error: object 'StudyMore' not found
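To see what dropping levels does, here is a minimal sketch on a made-up data frame. The division labels and counts are illustrative assumptions, not the survey data. The point is that subset() removes the rows but leaves the empty levels attached to the factor, and droplevels() then discards them so they do not show up as empty categories in a plot.

```r
# Toy data: hypothetical divisions, NOT the class survey
toy <- data.frame(
  Division = factor(c("NatSci", "NatSci", "SocSci", "SocSci", "SocSci", "Art", "Hum"))
)

# subset() removes the Art and Hum rows, but the factor keeps all four levels
kept <- subset(toy, !Division %in% c("Art", "Hum"))
levels(kept$Division)

# droplevels() discards the levels that no longer occur in the data
trimmed <- droplevels(kept)
levels(trimmed$Division)
```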
Make a simple graphic that's relevant to the two hypotheses given above.
# Your graphics statements go here
Does the inclination to study more depend on the student's rating of the importance of data analysis? Here's a logistic regression model of whether a student will study more based on division and whether they think computation is important in data analysis:
mod = glm( StudyMore=="WillDo" ~ as.numeric(Data) + Division,
data=d, family="binomial")
## Error: object 'StudyMore' not found
The regression table:
## Error: error in evaluating the argument 'x' in selecting a method for
## function 'print': Error: could not find function "xtable"
The coefficient on “Data” (the importance of computation in data analysis) is positive. This suggests that students who rate its importance more highly are more inclined to study more computation. But the p-value is so high that we cannot reject the null hypothesis.
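For readers who want to see where such a coefficient and p-value come from, the following is a minimal sketch on made-up data. The variable names rating and will_do and the counts are assumptions for illustration only. It fits a logistic regression and reads the estimate and p-value from the coefficient table using base R.

```r
# Made-up data: 8 hypothetical students at each rating from 1 to 5,
# with the number answering "yes" rising from 2 to 6 as the rating increases
rating <- rep(1:5, each = 8)
will_do <- unlist(lapply(1:5, function(r) c(rep(1, r + 1), rep(0, 7 - r))))

fit <- glm(will_do ~ rating, family = "binomial")

# The coefficient table holds the estimate, standard error, z value, and p-value
tab <- coef(summary(fit))
tab["rating", c("Estimate", "Pr(>|z|)")]
```

A positive estimate means the log-odds of answering "yes" rise with the rating. Whether that is distinguishable from no effect is what the p-value column addresses.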
Note that as.numeric(Data) was used so that the ordinal properties of the variable were considered. This treats the ordinal variable as quantitative.
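As a small illustration of that conversion, here is a sketch with a hypothetical ordered factor (the level names are assumptions). as.numeric() returns the position of each value in the level order, so the levels must be declared in the intended order, and the model then treats the steps between adjacent levels as equally sized.

```r
# Hypothetical ordinal ratings; levels are listed from lowest to highest
ratings <- factor(c("Low", "High", "Medium", "Low"),
                  levels = c("Low", "Medium", "High"), ordered = TRUE)

# Each value becomes its position in the level order
codes <- as.numeric(ratings)
codes  # 1 3 2 1
```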
You may want to modify your models. For instance, Division doesn't show up as significant in the above. You do not need to include every model you try in your write-up. But give a short summary, e.g. Division doesn't seem to be related to inclination to study more computing.
Build and interpret a model of whether students with a higher skill level are more likely to intend to study more computing.
If your p-values are too large to reject the null, it's helpful to give some guidance to future researchers. Select a sample size that will give you a p-value of 0.01 and report that. To do this, you'll need to vary the sample size until you find one that works reliably. You don't have to show the calculations you do, just give the result. (Your instructor can check it out by using that sample size!)
Here's an example of the calculation:
largerSample = resample(d,size=1000)
mod = glm( StudyMore=="WillDo" ~ as.numeric(Data) + Division,
data=largerSample, family="binomial")
## Error: object 'StudyMore' not found
## Error: object 'mod' not found
Remember, you don't have to include the table in your report, just the conclusion.
Repeat the above lines with different sample sizes to find a small one that gives a p-value of about 0.01.
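One way to organize that search is sketched below, under stated assumptions: the data are simulated rather than taken from the survey, the names rating, will_do, and p_at_size are made up, the effect size is arbitrary, and base R's sample() with replacement stands in for resample(). Because a single resample is noisy, the sketch repeats each size several times and reports the median p-value.

```r
set.seed(1)

# Simulated stand-in for the survey: a 1-5 rating and a yes/no outcome.
# The slope of 0.3 is an arbitrary assumption.
n0 <- 40
toy <- data.frame(rating = sample(1:5, n0, replace = TRUE))
toy$will_do <- rbinom(n0, 1, plogis(-1 + 0.3 * toy$rating))

# p-value on the rating coefficient after resampling the data to a given size
p_at_size <- function(data, size) {
  rows <- sample(nrow(data), size, replace = TRUE)  # base-R stand-in for resample()
  fit <- glm(will_do ~ rating, data = data[rows, , drop = FALSE],
             family = "binomial")
  coef(summary(fit))["rating", "Pr(>|z|)"]
}

# Try several candidate sizes, repeating each 20 times
sizes <- c(50, 100, 200, 400, 800)
median_p <- sapply(sizes, function(s) median(replicate(20, p_at_size(toy, s))))
data.frame(size = sizes, median_p = median_p)
```

Reading down the resulting table, the smallest size whose median p-value is near 0.01 is a reasonable candidate to report. Increasing the number of repetitions gives a steadier answer.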
Summarize your conclusions briefly here. You don't need to present more statistical analysis; you've already done that.
Only students with computers in class were able to do the survey. Perhaps students without computers find computing less important, in which case the results may be biased toward students who have a stronger interest in computing.
Very few arts and humanities students are enrolled in the class.
State weaknesses in your methodology. This won't detract from your grade; indeed, the opposite.