Problem Set 4 (without answers)

Seth J. Chandler

September 11, 2014

Introduction

This problem set will make sure you understand the concept of a p-value and that you know how to calculate a p-value for various statistical tests. It will also make sure you understand which tests are appropriate to use.

Conceptual

The “Chandler Test” for whether a sample of length n comes from a uniform distribution on the interval 0 to 1 is as follows. (1) Sort the values in your sample; call the result s. (2) Make a vector of values from 1/n to 1 in increments of 1/n; call the result q; this vector should have the same length as s. (3) Subtract the values of s from q and square the differences; call this result d2. (4) Compute the mean value of d2 and the maximum value of d2; take the average of those two results. This is your “c-statistic.”
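
In symbols, reading step (2) as the evenly spaced grid 1/n, 2/n, ..., 1 (which is what gives q the same length as s), the four steps can be summarized as below. The subscript notation is introduced here only for illustration; s_(i) denotes the i-th smallest value in the sample.

$$
q_i = \frac{i}{n}, \qquad
d_i^2 = \left(q_i - s_{(i)}\right)^2, \qquad
c = \frac{1}{2}\left(\frac{1}{n}\sum_{i=1}^{n} d_i^2 \;+\; \max_{1 \le i \le n} d_i^2\right)
$$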

Write a function cstat that computes the “c-statistic” for a sample. Test your function on the sample c(0.83, 0.78, 0.06, 0.51, 0.75, 0.9, 0.55, 0.95, 0.54, 0.35). You should get 0.02818. Hint: here are some built-in commands I used in my solution: sort, length, seq, mean, max.

#YOUR CODE HERE

Set a random seed of 9112014. Create 1000 samples, each a draw of 20 values from a uniform distribution on 0 to 1. Compute the “c-statistic” for each of these samples. Call the result cdist. Make a simple histogram of cdist. Hint: here are some commands I used in my answer: set.seed, replicate, runif, apply, hist.
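
The general pattern of simulating many samples and computing one statistic per sample can be sketched as follows. This is only an illustration under stated assumptions: `rangestat` is a hypothetical stand-in statistic (not the c-statistic), and the seed, sample size, and number of replications are arbitrary choices that differ from the ones the exercise asks for.

```r
# Hypothetical stand-in statistic (NOT the c-statistic): the spread of a sample.
rangestat <- function(x) max(x) - min(x)

set.seed(1)  # arbitrary seed, used for this illustration only

# replicate() returns a 5 x 200 matrix here: each COLUMN is one sample of size 5.
draws <- replicate(200, runif(5))

# apply(..., 2, f) runs f over the columns, giving one statistic per sample.
rangedist <- apply(draws, 2, rangestat)

# A basic histogram of the simulated statistics.
hist(rangedist)
```

Note that `replicate` stacks the samples as columns, which is why the margin argument to `apply` is 2 rather than 1.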

#YOUR CODE HERE

Here is another draw of 20.

mysample<-c(0.76, 0.32, 0.27, 0.39, 0.61, 0.54, 0.44, 0.67, 0.33, 0.42, 0.37,
0.54, 0.64, 0.65, 0.72, 0.51, 0.51, 0.73, 0.31, 0.71)

You want to “prove” that it was not drawn from a uniform distribution on 0 to 1. How might you use your previous work to establish this? Can you say that, if the sample had been drawn from a uniform distribution on 0 to 1, the probability of getting a c-statistic at least as high as the one we have here is only x? What is x? (Your answer should include a number and an explanation of what that number means.) Hint: here are some R built-in commands I used in my answer: sapply, length. Another hint: the numeric component of your answer should be between 0.05 and 0.2.
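
One way to turn a simulated distribution into a probability of this kind is to count how often the simulated statistic is at least as large as the observed one. The sketch below shows that idea on toy inputs; `rangestat`, the made-up `observed` sample, and the simulation settings are all assumptions for illustration and are not the quantities from this problem.

```r
# Hypothetical stand-in statistic (NOT the c-statistic).
rangestat <- function(x) max(x) - min(x)

set.seed(1)  # arbitrary seed for this illustration

# Simulated distribution of the statistic when the data really are uniform.
nulldist <- replicate(200, rangestat(runif(5)))

# Statistic for a made-up "observed" sample of the same size.
observed <- rangestat(c(0.05, 0.50, 0.62, 0.71, 0.98))

# TRUE wherever a simulated value is at least as large as the observed one.
exceeds <- sapply(nulldist, function(v) v >= observed)

# Share of simulated values at least as extreme: an empirical p-value.
pvalue <- sum(exceeds) / length(nulldist)
pvalue
```

Because the comparison is vectorized in R, `mean(nulldist >= observed)` would give the same number in one step.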

#YOUR CODE HERE

The T-Test

Here are the scores of University of Houston students on the February administration of the Texas bar exam. I also include the scores of University of Texas students on the same exam.[1]

uh<-c(771, 785, 690, 777, 748, 692, 739, 780, 656, 713, 793, 803, 781, 
696, 686, 696, 794, 802, 711, 696)
ut<-c(719, 750, 743, 813, 718, 800, 698, 768, 857, 781, 785, 829, 905, 
727, 799, 801, 773, 809, 824, 742, 662, 731, 753, 716, 728, 687, 798)

The dean at the University of Houston would like to be able to say that, sure, the mean score of UT students is a little higher than that of UH students, but that is just statistical noise. What can you say about the data using the t-test?
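
As a reminder of the mechanics, here is a minimal sketch of a two-sample t-test in R. The vectors `groupa` and `groupb` are made-up numbers, not the UH and UT scores. By default `t.test` does not assume equal variances (it runs the Welch version); whether that default suits this problem is for you to decide.

```r
# Made-up scores for two hypothetical groups (NOT the UH/UT data).
groupa <- c(70, 72, 68, 75, 71, 69)
groupb <- c(74, 78, 73, 77, 80, 76, 75)

# Two-sample t-test; var.equal = FALSE is the default (Welch test).
result <- t.test(groupa, groupb)

result$statistic  # the t statistic
result$parameter  # the (approximate) degrees of freedom
result$p.value    # two-sided p-value under the null of equal means
result$conf.int   # confidence interval for the difference in means
```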

#YOUR CODE HERE

Chi Squared Test and Fisher Exact Test

You believe that jurors in your county court are being struck by the prosecutor’s office on the basis of race. You find that there were 210 non-white jurors in a survey, of whom 43 were subjected to a peremptory challenge (167 were not). There were 700 white jurors, of whom 105 were subjected to a peremptory challenge (595 were not). Using R, do a chi-squared analysis and a Fisher Exact analysis and tell me whether the difference in the use of peremptory challenges between the two groups is “statistically significant.”
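
Both tests take a table of counts, so most of the work is laying out the 2 x 2 matrix correctly. The sketch below uses made-up counts and hypothetical group labels (not the juror data) to show one way of doing that. Note that for a 2 x 2 table `chisq.test` applies a continuity correction by default (`correct = TRUE`).

```r
# Made-up counts (NOT the juror data): rows are groups, columns are outcomes.
counts <- matrix(c(12, 28,
                   15, 85),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("struck", "not struck")))
counts

# Chi-squared test of independence between group and outcome.
chi <- chisq.test(counts)
chi$p.value

# Fisher's exact test on the same table.
fish <- fisher.test(counts)
fish$p.value
fish$estimate  # estimated odds ratio
```

Setting `byrow = TRUE` lets the matrix be typed in the same order you would read the table aloud, which helps avoid transposing the counts by accident.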

#YOUR CODE HERE

Kolmogorov Smirnov Test

Remember our University of Houston dean? Now he wants to say that not only do University of Houston students “do the same” as UT students on the bar, but that their distribution of scores is exactly the same. Using R, run a Kolmogorov Smirnov test on the data to see what it shows.
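
For reference, a two-sample Kolmogorov Smirnov test in R takes the two samples as its first two arguments. The sketch below runs it on simulated toy data (the vectors `x` and `y` and the seed are assumptions for illustration, not the bar exam scores). If your samples contain tied values, some versions of R may warn that an exact p-value cannot be computed.

```r
set.seed(2)  # arbitrary seed for this illustration

# Two simulated samples of different sizes (NOT the UH/UT data).
x <- rnorm(30)
y <- rnorm(40, mean = 0.5)

# Two-sample test: the null hypothesis is that x and y come from
# the same continuous distribution.
ks <- ks.test(x, y)

ks$statistic  # D, the largest gap between the two empirical CDFs
ks$p.value
```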

#YOUR CODE HERE

Does the result of your use of R show that the distribution of scores among UH students and UT students is the same? If not, what can properly be said about the results?


  1. This data is entirely imaginary.