Your Survey Report

Wed May 1 11:12:10 2013

Daniel Kaplan

Math 155 Survey Project

Remember our in-class survey?

We've already looked at how to read in the data from Google and reformat it to make it easier to apply the operations we have learned in R.

Reading in the Data

You typically won't want to include this part in your report. I've consolidated all the commands here. If you look at the R fences in the source Rmd document, you'll see a statement echo=FALSE which signals to the system not to print out the commands. They get executed in the background.

Background

An important component of modern statistics is computational. Accordingly, we invest a considerable amount of time and effort — both for students and faculty — in using computation intensively in our courses. This survey attempts to examine student attitudes toward computation. (Of course, mainly, it's a simple survey for collecting data to illustrate data processing and writing up a survey report.)

You should also include one or more hypotheses here …

We had several hypotheses in designing the survey:

* Students who think that skills in data analysis are important are more likely to be inclined to study more about computing, unless they think it's beyond their ability.
* Students with a higher current skill level are more likely to be inclined to study more about computing.

Don't be afraid to state hypotheses that you think are obvious. Even if they're obvious, you'll still want to try to demonstrate them from your data.

Methods

Describe how you distributed your survey

This survey was displayed in class in Math 155 and students were given time to complete it during the class.

Description of the Variables

Give a short description of the important variables individually. You don't need to be exhaustive, just orient the reader to the important bits.

The students answering the survey were primarily in the natural and social sciences:

barchart(tally(~Division, data = d, margins = FALSE, format = "count"), auto.key = TRUE)


Many students perceive themselves as lacking ability in computing (“Unable”):

barchart(tally(~StudyMore, data = d, margins = FALSE, format = "count"), auto.key = TRUE)


YOUR TASK 1

Make a simple graphic to display how students regard the importance of computing in data analysis.

# Your graphics statements go here

Graphical descriptions of relationships between variables

You may sometimes want to drop some ill-populated levels

Natural and social science students seem to be about the same regarding their inclination to study more.

dd = droplevels(subset(d, !Division %in% c("Art", "Hum")))
mosaicplot(Division ~ StudyMore, data = dd, las = 2, col = rainbow(5))

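The reason for wrapping the subset in droplevels() is that subset() removes rows but leaves the factor's now-empty levels in place, and those would still show up as empty categories in a plot or table. A minimal sketch with a made-up toy data frame (not the survey data) illustrates the difference:

```r
# Toy data frame, invented for illustration; it only mimics the Division variable
toy <- data.frame(Division = factor(c("Sci", "SS", "Art", "Sci", "Hum", "SS")))

# Remove the sparsely populated divisions
kept <- subset(toy, !Division %in% c("Art", "Hum"))

# The rows are gone, but the factor still carries all four levels
levels_before <- levels(kept$Division)
print(levels_before)

# droplevels() discards the levels that no longer occur in the data
levels_after <- levels(droplevels(kept)$Division)
print(levels_after)
```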

YOUR TASK 2

Make a simple graphic that's relevant to the two hypotheses given above

# Your graphics statements go here

Modeling Analysis

Does the inclination to study more depend on the student's rating of the importance of data analysis? Here's a logistic regression model of whether a student will study more based on division and whether they think computation is important in data analysis:

mod = glm( StudyMore=="WillDo" ~ as.numeric(Data) + Division, 
           data=d, family="binomial")

The regression table:

                 Estimate Std. Error z value Pr(>|z|)
(Intercept)      -16.9672  2399.5448   -0.01   0.9944
as.numeric(Data)   0.4011     0.4408    0.91   0.3629
DivisionSci       14.8617  2399.5450    0.01   0.9951
DivisionSS        14.3239  2399.5451    0.01   0.9952

The coefficient on “Data” (the importance of computation in data analysis) is positive. This suggests that students with a higher estimation of importance are more likely to be inclined to study more computation. But the p-value (0.36) is too large for us to reject the null hypothesis.
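One way to read a logistic regression coefficient is to exponentiate it, which gives an odds ratio. The sketch below does this for the as.numeric(Data) row, with the estimate and standard error typed in by hand from the regression table above. The 95% interval is a simple Wald approximation, one common choice rather than the only one.

```r
# Values copied by hand from the as.numeric(Data) row of the regression table
estimate  <- 0.4011
std_error <- 0.4408

# Odds ratio for a one-step increase in the importance rating
odds_ratio <- exp(estimate)
print(round(odds_ratio, 2))   # about 1.49

# Approximate 95% Wald interval: computed on the log-odds scale, then exponentiated
ci_log_odds   <- estimate + c(-1, 1) * qnorm(0.975) * std_error
ci_odds_ratio <- exp(ci_log_odds)
print(round(ci_odds_ratio, 2))
```

The interval runs from below 1 to above 1, which is the odds-ratio version of the same conclusion: the data do not rule out "no relationship."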

Note that as.numeric(Data) was used so that the ordinal properties of the variable were considered. This treats the ordinal variable as quantitative.
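The conversion relies on the order of the factor's levels, so it is worth checking that order before modeling. A small sketch with hypothetical level names (the actual levels of Data are not shown in this document):

```r
# Hypothetical responses; the level names are illustrative only
answers <- c("Low", "High", "Medium", "High")

# Levels listed in their natural order: the numeric codes follow that order
ordered_rating <- factor(answers, levels = c("Low", "Medium", "High"))
ordered_codes  <- as.numeric(ordered_rating)
print(ordered_codes)   # 1 3 2 3

# Default levels are sorted alphabetically, so the codes no longer track the ordering
default_rating <- factor(answers)
default_codes  <- as.numeric(default_rating)
print(default_codes)   # 2 1 3 1
```

If levels(d$Data) turns out not to be in the intended order, re-creating the factor with an explicit levels= argument before calling as.numeric() is one way to fix it.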

You may want to modify your models. For instance, Division doesn't show up as significant in the above. You do not need to include every model you try in your write-up. But give a short summary, e.g. Division doesn't seem to be related to inclination to study more computing.

YOUR TASK 3

Build and interpret a model of whether students with a higher skill level are more likely to intend to study more computing.

Sample Size

If your p-values are too large to reject the null, it's helpful to give some guidance to future researchers. Select a sample size that will give you a p-value of 0.01 and report that. To do this, you'll need to vary the sample size until you find one that works reliably. You don't have to show the calculations you do, just give the result. (Your instructor can check it out by using that sample size!)

Here's an example of the calculation:

# resample() comes from the mosaic package; it draws rows with replacement
largerSample = resample(d, size=1000)
mod = glm( StudyMore=="WillDo" ~ as.numeric(Data) + Division, 
           data=largerSample, family="binomial")
summary(mod)
## 
## Call:
## glm(formula = StudyMore == "WillDo" ~ as.numeric(Data) + Division, 
##     family = "binomial", data = largerSample)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.025  -0.876  -0.778   1.338   1.862  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -15.9558   352.9858   -0.05     0.96    
## as.numeric(Data)   0.3897     0.0882    4.42    1e-05 ***
## DivisionSci       14.0277   352.9858    0.04     0.97    
## DivisionSS        13.3572   352.9859    0.04     0.97    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1150.4  on 957  degrees of freedom
## Residual deviance: 1106.5  on 954  degrees of freedom
##   (42 observations deleted due to missingness)
## AIC: 1115
## 
## Number of Fisher Scoring iterations: 14

Remember, you don't have to include the table in your report, just the conclusion.
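Because each resample is random, a single run at a given size can land above or below the target by chance. One way to judge whether a size "works reliably" is to repeat the resampling several times and see how often the p-value falls below 0.01. The sketch below is only an illustration under stated assumptions: it uses simulated data in place of d, a simplified one-predictor model, and base R's sample() in place of resample(). The names pilot, rating, will_study and p_at_size are invented for the example.

```r
set.seed(155)

# Simulated stand-in for the survey (hypothetical, not the class data):
# `rating` plays the role of as.numeric(Data),
# `will_study` the role of StudyMore == "WillDo".
n_pilot    <- 60
rating     <- sample(1:5, n_pilot, replace = TRUE)
will_study <- rbinom(n_pilot, size = 1, prob = plogis(-2 + 0.4 * rating))
pilot      <- data.frame(rating = rating, will_study = will_study)

# p-value on the rating coefficient after resampling the pilot data to `size` rows
p_at_size <- function(data, size) {
  rows <- sample(nrow(data), size = size, replace = TRUE)
  fit  <- glm(will_study ~ rating, data = data[rows, ], family = "binomial")
  coef(summary(fit))["rating", "Pr(>|z|)"]
}

# For each candidate size, the fraction of 50 resamples that reach p < 0.01
sizes <- c(100, 200, 400, 800)
reliability <- sapply(sizes, function(s) {
  mean(replicate(50, p_at_size(pilot, s)) < 0.01)
})
names(reliability) <- sizes
print(reliability)
```

A size where this fraction is close to 1 is a reasonable one to report. The same loop could be pointed at the real data and model by swapping in d and the formula used above.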

YOUR TASK 4

Repeat the above lines, varying the size argument, to find a small sample size that gives a p-value of about 0.01.

Conclusions

Summarize your conclusions briefly here. You don't need to present more statistical analysis; you've already done that.

Comments

Only students with computers in class were able to do the survey. Perhaps students without computers find computing less important, so the results may be biased toward students who have a stronger interest in computing.

Very few arts and humanities students are enrolled in the class.

State weaknesses in your methodology. This won't detract from your grade; if anything, it will help.