Lab 8

Name: Sonora Williams

Section: 01L

Date: November 5, 2013

Exercises

Load data & inference function:

source("http://stat.duke.edu/~kkl13/courses/sta102F13/labs/inference.R")
cont = read.csv("http://stat.duke.edu/~kkl13/courses/sta102F13/labs/contributions.csv")

Exercise 1:

The candidates with contributions are Bachmann, Cain, Gingrich, Huntsman, Johnson, McCotter, Obama, Paul, Pawlenty, Perry, Roemer, Romney, and Santorum. Barack Obama had the most, with 7,454. Thaddeus McCotter had the fewest, with only one.

table(cont$cand_nm)

## 
##              Bachmann, Michele                   Cain, Herman 
##                             34                             64 
##                 Gingrich, Newt                  Huntsman, Jon 
##                            171                             16 
##             Johnson, Gary Earl           McCotter, Thaddeus G 
##                              8                              1 
##                  Obama, Barack                      Paul, Ron 
##                           7454                            445 
##              Pawlenty, Timothy                    Perry, Rick 
##                             15                             46 
## Roemer, Charles E. 'Buddy' III                   Romney, Mitt 
##                             14                           1579 
##                 Santorum, Rick 
##                            153
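
The largest and smallest counts can also be picked out programmatically instead of by eye. A minimal sketch, assuming a small made-up factor `cand` in place of `cont$cand_nm` (the labels and counts are illustrative only):

```r
# Hypothetical stand-in for cont$cand_nm; not the real data
cand <- factor(c("A", "B", "B", "C", "B", "C"))

counts <- table(cand)                 # number of rows per candidate
most   <- names(which.max(counts))    # label with the largest count
fewest <- names(which.min(counts))    # label with the smallest count

most
fewest
sort(counts, decreasing = TRUE)       # full ranking, largest first
```

On the real data the same three lines would be applied to `table(cont$cand_nm)`.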

Exercise 2:

Interestingly, all four candidates have some negative contribution amounts, though the number varies from candidate to candidate.

# subset for major Republican candidates
rep_mjr = subset(cont, (cont$cand_nm == "Romney, Mitt" | cont$cand_nm == "Paul, Ron" | 
    cont$cand_nm == "Gingrich, Newt" | cont$cand_nm == "Santorum, Rick"))
# subset for primary election
rep_mjr_pri = subset(rep_mjr, rep_mjr$election_tp == "P2012")
table(rep_mjr_pri$cand_nm)

## 
##              Bachmann, Michele                   Cain, Herman 
##                              0                              0 
##                 Gingrich, Newt                  Huntsman, Jon 
##                            165                              0 
##             Johnson, Gary Earl           McCotter, Thaddeus G 
##                              0                              0 
##                  Obama, Barack                      Paul, Ron 
##                              0                            445 
##              Pawlenty, Timothy                    Perry, Rick 
##                              0                              0 
## Roemer, Charles E. 'Buddy' III                   Romney, Mitt 
##                              0                            952 
##                 Santorum, Rick 
##                            151

pri = droplevels(rep_mjr_pri)
table(pri$cand_nm)

## 
## Gingrich, Newt      Paul, Ron   Romney, Mitt Santorum, Rick 
##            165            445            952            151
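
The chained `==` comparisons and the two-step subsetting above could be condensed with `%in%`. This is only a sketch on a made-up data frame `toy`: the column names mirror the lab's, but the rows are hypothetical.

```r
# Hypothetical toy data; column names mirror the lab's, values are made up
toy <- data.frame(
  cand_nm = factor(c("Romney, Mitt", "Obama, Barack", "Paul, Ron", "Cain, Herman")),
  election_tp = c("P2012", "G2012", "P2012", "P2012"),
  contb_receipt_amt = c(250, 50, 25, 100)
)

major <- c("Romney, Mitt", "Paul, Ron", "Gingrich, Newt", "Santorum, Rick")

# One subset() call: major candidates in the primary, then drop unused levels
toy_pri <- droplevels(subset(toy, cand_nm %in% major & election_tp == "P2012"))

table(toy_pri$cand_nm)
```

Inside `subset()` the column names can be used directly, so the `cont$` prefix is not needed.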

par(mfrow = c(2, 2))
boxplot(pri$contb_receipt_amt[pri$cand_nm == "Romney, Mitt"], main = "Romney")
boxplot(pri$contb_receipt_amt[pri$cand_nm == "Paul, Ron"], main = "Paul")
boxplot(pri$contb_receipt_amt[pri$cand_nm == "Gingrich, Newt"], main = "Gingrich")
boxplot(pri$contb_receipt_amt[pri$cand_nm == "Santorum, Rick"], main = "Santorum")


Exercise 3:

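Romney has the highest total contributions, 519,044.3. The negative amounts are described as redesignations to the general election, reattributions, or refunds; several have no description.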

neg_index = which(pri$contb_receipt_amt < 0)
pri$receipt_desc[neg_index]

##  [1]                          REDESIGNATION TO GENERAL
##  [3] REDESIGNATION TO GENERAL REATTRIBUTION TO SPOUSE 
##  [5] Refund                   REDESIGNATION TO GENERAL
##  [7] REDESIGNATION TO GENERAL REDESIGNATION TO GENERAL
##  [9] REDESIGNATION TO GENERAL REATTRIBUTED BELOW      
## [11] REATTRIBUTION TO SPOUSE  REDESIGNATION TO GENERAL
## [13] REATTRIBUTION TO SPOUSE  REDESIGNATION TO GENERAL
## [15] REATTRIBUTION TO SPOUSE                          
## [17]                          Refund                  
## [19] Refund                   REDESIGNATION TO GENERAL
## [21] REDESIGNATION TO GENERAL                         
## [23]                          Refund                  
## [25] REDESIGNATION TO GENERAL REDESIGNATION TO GENERAL
## [27]                          REATTRIBUTION TO SPOUSE 
## [29] Refund                   REDESIGNATION TO GENERAL
## [31] REDESIGNATION TO GENERAL REDESIGNATION TO GENERAL
## [33] REDESIGNATION TO GENERAL REDESIGNATION TO GENERAL
## [35] REDESIGNATION TO GENERAL REDESIGNATION TO GENERAL
## [37] REDESIGNATION TO GENERAL
## 9 Levels:  REATTRIBUTED BELOW ... SEE REATTRIBUTION

sum(pri$contb_receipt_amt[pri$cand_nm == "Romney, Mitt"])

## [1] 519044

sum(pri$contb_receipt_amt[pri$cand_nm == "Paul, Ron"])

## [1] 67228

sum(pri$contb_receipt_amt[pri$cand_nm == "Gingrich, Newt"])

## [1] 23638

sum(pri$contb_receipt_amt[pri$cand_nm == "Santorum, Rick"])

## [1] 31747

Exercise 4:

Romney also has the highest average contribution, 545.2146.

mean(pri$contb_receipt_amt[pri$cand_nm == "Romney, Mitt"])

## [1] 545.2

mean(pri$contb_receipt_amt[pri$cand_nm == "Paul, Ron"])

## [1] 151.1

mean(pri$contb_receipt_amt[pri$cand_nm == "Gingrich, Newt"])

## [1] 143.3

mean(pri$contb_receipt_amt[pri$cand_nm == "Santorum, Rick"])

## [1] 210.2
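
The per-candidate totals (Exercise 3) and means (Exercise 4) can also be computed in one call each with `tapply()`. A sketch on a made-up data frame `toy` standing in for `pri`; the values are hypothetical:

```r
# Hypothetical toy data standing in for pri; values are made up
toy <- data.frame(
  cand_nm = c("A", "A", "B", "B", "B"),
  contb_receipt_amt = c(100, 300, 50, -50, 60)
)

# Apply sum() and mean() to the amounts within each candidate
totals <- tapply(toy$contb_receipt_amt, toy$cand_nm, sum)
means  <- tapply(toy$contb_receipt_amt, toy$cand_nm, mean)

totals
means
names(which.max(totals))   # candidate with the largest total
```

Negative amounts are included in both summaries, just as in the `sum()` and `mean()` calls above.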

Exercise 5:

H0: The mean contribution amount is the same for all four candidates; any variation between the group means is no greater than what chance alone would produce. HA: At least one candidate's mean contribution amount differs, so the variation between groups is greater than chance alone would produce and depends on the candidate.

Exercise 6:

The conditions for an ANOVA test are: (1) it must be reasonable to regard the groups of observations as random samples from their respective populations (Samuels et al.); (2) the I samples must be independent of each other; (3) the I population distributions must be normal with equal standard deviations.

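The data look roughly normal, although every normal probability plot shows signs of short tails. The other conditions appear to be met, with one caveat: the group standard deviations reported in Exercise 7 range from 277.3 to 968.5, so the equal-standard-deviation condition is questionable.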
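
The equal-standard-deviation condition can be checked numerically from the group standard deviations that the Exercise 7 output reports. The cutoff of 2 used below is a common rule of thumb and an assumption here, not something taken from the lab handout:

```r
# Group standard deviations as reported in the Exercise 7 summary statistics
sds <- c(Gingrich = 432.8, Paul = 277.3, Romney = 968.5, Santorum = 411.1)

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- max(sds) / min(sds)
sd_ratio

# Rule-of-thumb check (assumed threshold): TRUE would suggest similar spreads
sd_ratio < 2
```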

qqnorm(pri$contb_receipt_amt[pri$cand_nm == "Romney, Mitt"], main = "Romney")
qqline(pri$contb_receipt_amt[pri$cand_nm == "Romney, Mitt"])


qqnorm(pri$contb_receipt_amt[pri$cand_nm == "Paul, Ron"], main = "Paul")
qqline(pri$contb_receipt_amt[pri$cand_nm == "Paul, Ron"])


qqnorm(pri$contb_receipt_amt[pri$cand_nm == "Gingrich, Newt"], main = "Gingrich")
qqline(pri$contb_receipt_amt[pri$cand_nm == "Gingrich, Newt"])


qqnorm(pri$contb_receipt_amt[pri$cand_nm == "Santorum, Rick"], main = "Santorum")
qqline(pri$contb_receipt_amt[pri$cand_nm == "Santorum, Rick"])


Exercise 7:

The new significance level is 0.05/6 = 0.008333333, a Bonferroni correction for the 6 pairwise comparisons among the 4 candidates.

inference(data = pri$contb_receipt_amt, group = pri$cand_nm, est = "mean", type = "ht", 
    alternative = "greater", method = "theoretical")

## Response variable: numerical, Explanatory variable: categorical
## ANOVA
## 
## Summary statistics:
## n_Gingrich, Newt = 165, mean_Gingrich, Newt = 143.3, sd_Gingrich, Newt = 432.8
## n_Paul, Ron = 445, mean_Paul, Ron = 151.1, sd_Paul, Ron = 277.3
## n_Romney, Mitt = 952, mean_Romney, Mitt = 545.2, sd_Romney, Mitt = 968.5
## n_Santorum, Rick = 151, mean_Santorum, Rick = 210.2, sd_Santorum, Rick = 411.1

## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
## 
## Response: data
##             Df   Sum Sq  Mean Sq F value Pr(>F)
## group        3 6.29e+07 20951773    36.5 <2e-16
## Residuals 1709 9.82e+08   574777               
## 
## Pairwise tests: t tests with pooled SD 
##                Gingrich, Newt Paul, Ron Romney, Mitt
## Paul, Ron              0.9100        NA           NA
## Romney, Mitt           0.0000    0.0000           NA
## Santorum, Rick         0.4328    0.4074            0
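
The F value in the table is the ratio of the two mean squares, and the p-value is the upper tail of an F distribution with 3 and 1709 degrees of freedom (4 groups minus 1, and 1713 observations minus 4 groups). A short check using only the numbers printed above:

```r
# Values copied from the Analysis of Variance table
msg <- 20951773    # mean square for group
mse <- 574777      # mean square for residuals
df_group <- 3
df_resid <- 1709

f_stat <- msg / mse                                           # F statistic
p_val  <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)  # upper-tail p-value

round(f_stat, 1)
p_val
```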

0.05/6

## [1] 0.008333

inference(data = pri$contb_receipt_amt, group = pri$cand_nm, est = "mean", type = "ht", 
    alternative = "greater", method = "theoretical", siglevel = 0.008333333)

## Response variable: numerical, Explanatory variable: categorical
## ANOVA
## 
## Summary statistics:
## n_Gingrich, Newt = 165, mean_Gingrich, Newt = 143.3, sd_Gingrich, Newt = 432.8
## n_Paul, Ron = 445, mean_Paul, Ron = 151.1, sd_Paul, Ron = 277.3
## n_Romney, Mitt = 952, mean_Romney, Mitt = 545.2, sd_Romney, Mitt = 968.5
## n_Santorum, Rick = 151, mean_Santorum, Rick = 210.2, sd_Santorum, Rick = 411.1

## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
## 
## Response: data
##             Df   Sum Sq  Mean Sq F value Pr(>F)
## group        3 6.29e+07 20951773    36.5 <2e-16
## Residuals 1709 9.82e+08   574777               
## 
## Pairwise tests: t tests with pooled SD 
##                Gingrich, Newt Paul, Ron Romney, Mitt
## Paul, Ron              0.9100        NA           NA
## Romney, Mitt           0.0000    0.0000           NA
## Santorum, Rick         0.4328    0.4074            0


At both significance levels, Romney is the only candidate whose average contribution differs significantly from those of the other candidates.
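
The comparison at both significance levels can be reproduced from the pairwise p-values printed above. The pair labels below are made up for readability; the p-values are copied from the output:

```r
k <- 4                          # candidates being compared
n_pairs <- choose(k, 2)         # number of pairwise comparisons
alpha_star <- 0.05 / n_pairs    # Bonferroni-corrected significance level

# Pairwise p-values copied from the output above
p <- c("Paul-Gingrich" = 0.9100, "Romney-Gingrich" = 0.0000,
       "Santorum-Gingrich" = 0.4328, "Romney-Paul" = 0.0000,
       "Santorum-Paul" = 0.4074, "Santorum-Romney" = 0)

alpha_star
names(p)[p < 0.05]          # pairs significant at the original level
names(p)[p < alpha_star]    # pairs significant at the corrected level
```

Base R's `pairwise.t.test()` with `p.adjust.method = "bonferroni"` takes the other route and inflates the p-values instead of shrinking the significance level.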

General Election

# subset for general elections and Obama, Romney, and Johnson
pres_temp1 = subset(cont, cont$election_tp == "G2012")
pres_temp2 = subset(pres_temp1, (pres_temp1$cand_nm == "Obama, Barack" | pres_temp1$cand_nm == 
    "Romney, Mitt" | pres_temp1$cand_nm == "Johnson, Gary Earl"))
# drop unused candidate levels (the inference() calls below still use pres_temp2)
pres = droplevels(pres_temp2)
inference(data = pres_temp2$contb_receipt_amt, group = pres_temp2$cand_nm, est = "mean", 
    type = "ht", alternative = "greater", method = "theoretical")

## Response variable: numerical, Explanatory variable: categorical
## ANOVA
## 
## Summary statistics:
## n_Bachmann, Michele = NA, mean_Bachmann, Michele = NA, sd_Bachmann, Michele = NA
## n_Cain, Herman = NA, mean_Cain, Herman = NA, sd_Cain, Herman = NA
## n_Gingrich, Newt = NA, mean_Gingrich, Newt = NA, sd_Gingrich, Newt = NA
## n_Huntsman, Jon = NA, mean_Huntsman, Jon = NA, sd_Huntsman, Jon = NA
## n_Johnson, Gary Earl = 6, mean_Johnson, Gary Earl = 230, sd_Johnson, Gary Earl = 226.1
## n_McCotter, Thaddeus G = NA, mean_McCotter, Thaddeus G = NA, sd_McCotter, Thaddeus G = NA
## n_Obama, Barack = 2008, mean_Obama, Barack = 159.1, sd_Obama, Barack = 441.4
## n_Paul, Ron = NA, mean_Paul, Ron = NA, sd_Paul, Ron = NA
## n_Pawlenty, Timothy = NA, mean_Pawlenty, Timothy = NA, sd_Pawlenty, Timothy = NA
## n_Perry, Rick = NA, mean_Perry, Rick = NA, sd_Perry, Rick = NA
## n_Roemer, Charles E. 'Buddy' III = NA, mean_Roemer, Charles E. 'Buddy' III = NA, sd_Roemer, Charles E. 'Buddy' III = NA
## n_Romney, Mitt = 627, mean_Romney, Mitt = 500.1, sd_Romney, Mitt = 795.2
## n_Santorum, Rick = NA, mean_Santorum, Rick = NA, sd_Santorum, Rick = NA

## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
## 
## Response: data
##             Df   Sum Sq  Mean Sq F value Pr(>F)
## group        2 5.56e+07 27782585    93.1 <2e-16
## Residuals 2638 7.87e+08   298380               
## 
## Pairwise tests: t tests with pooled SD 
##               Johnson, Gary Earl Obama, Barack
## Obama, Barack             0.7510            NA
## Romney, Mitt              0.2281             0


Exercise 8:

Obama and Romney differ significantly from each other in average contribution (the pairwise p-value is reported as 0), but neither differs significantly from Johnson (p-values of 0.7510 and 0.2281).

inference(data = pres_temp2$contb_receipt_amt, group = pres_temp2$cand_nm, est = "mean", 
    type = "ht", alternative = "greater", method = "theoretical")

## Response variable: numerical, Explanatory variable: categorical
## ANOVA
## 
## Summary statistics:
## n_Bachmann, Michele = NA, mean_Bachmann, Michele = NA, sd_Bachmann, Michele = NA
## n_Cain, Herman = NA, mean_Cain, Herman = NA, sd_Cain, Herman = NA
## n_Gingrich, Newt = NA, mean_Gingrich, Newt = NA, sd_Gingrich, Newt = NA
## n_Huntsman, Jon = NA, mean_Huntsman, Jon = NA, sd_Huntsman, Jon = NA
## n_Johnson, Gary Earl = 6, mean_Johnson, Gary Earl = 230, sd_Johnson, Gary Earl = 226.1
## n_McCotter, Thaddeus G = NA, mean_McCotter, Thaddeus G = NA, sd_McCotter, Thaddeus G = NA
## n_Obama, Barack = 2008, mean_Obama, Barack = 159.1, sd_Obama, Barack = 441.4
## n_Paul, Ron = NA, mean_Paul, Ron = NA, sd_Paul, Ron = NA
## n_Pawlenty, Timothy = NA, mean_Pawlenty, Timothy = NA, sd_Pawlenty, Timothy = NA
## n_Perry, Rick = NA, mean_Perry, Rick = NA, sd_Perry, Rick = NA
## n_Roemer, Charles E. 'Buddy' III = NA, mean_Roemer, Charles E. 'Buddy' III = NA, sd_Roemer, Charles E. 'Buddy' III = NA
## n_Romney, Mitt = 627, mean_Romney, Mitt = 500.1, sd_Romney, Mitt = 795.2
## n_Santorum, Rick = NA, mean_Santorum, Rick = NA, sd_Santorum, Rick = NA

## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
## 
## Response: data
##             Df   Sum Sq  Mean Sq F value Pr(>F)
## group        2 5.56e+07 27782585    93.1 <2e-16
## Residuals 2638 7.87e+08   298380               
## 
## Pairwise tests: t tests with pooled SD 
##               Johnson, Gary Earl Obama, Barack
## Obama, Barack             0.7510            NA
## Romney, Mitt              0.2281             0


Exercise 9:

table(pres$cand_nm)

## 
## Johnson, Gary Earl      Obama, Barack       Romney, Mitt 
##                  6               2008                627

While Obama and Romney each have plenty of contributions, Johnson has only six, far too few to meet the conditions, which makes the ANOVA test unreliable.

Exercise 10:

# subset for the general election, Obama and Romney only
pres_temp3 = subset(cont, cont$election_tp == "G2012")
pres_temp4 = subset(pres_temp3, (pres_temp3$cand_nm == "Obama, Barack" | pres_temp3$cand_nm == 
    "Romney, Mitt"))
# droplevels
pres2 = droplevels(pres_temp4)

Exercise 11:

Romney has a larger average contribution than Obama (500.1 vs. 159.1) but a lower total (313,580 vs. 319,497). Romney's individual contributions tend to be larger, while Obama has many more contributions, positive or negative, which adds up to a larger total.

neg_index = which(pres2$contb_receipt_amt < 0)
pres2$receipt_desc[neg_index]

##  [1] Refund Refund Refund               Refund Refund Refund Refund Refund
## [11] Refund
## 5 Levels:  REATTRIBUTION FROM SPOUSE ... SEE REATTRIBUTION

sum(pres2$contb_receipt_amt[pres2$cand_nm == "Romney, Mitt"])

## [1] 313580

sum(pres2$contb_receipt_amt[pres2$cand_nm == "Obama, Barack"])

## [1] 319497

mean(pres2$contb_receipt_amt[pres2$cand_nm == "Romney, Mitt"])

## [1] 500.1

mean(pres2$contb_receipt_amt[pres2$cand_nm == "Obama, Barack"])

## [1] 159.1
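
A total is the number of contributions times the average, which is how a smaller average can still produce a larger total. A quick check using the counts from `table(pres$cand_nm)` and the rounded means above (because the means are rounded, the products only approximate the reported sums):

```r
# Counts from table(pres$cand_nm) and means as reported (rounded to one decimal)
n   <- c(Obama = 2008, Romney = 627)
avg <- c(Obama = 159.1, Romney = 500.1)

approx_total <- n * avg    # count times mean approximates each total
approx_total
```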

Exercise 12:

Because we are comparing only two groups, Obama and Romney, and because the population standard deviation is unknown, we should use a t test.

Exercise 13:

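The p-value is essentially 0. Because the call below passes no group argument, however, it runs a single-mean test of H0: mu = 0 on the pooled Obama and Romney contributions, so on its own it does not compare the two candidates. The evidence of a significant difference between Romney's and Obama's average contribution amounts comes from the pairwise test in Exercise 8, where the Romney-Obama p-value is also reported as 0.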

inference(data = pres2$contb_receipt_amt, est = "mean", siglevel = 0.05, null = 0, 
    alternative = "twosided", type = "ht", method = "theoretical")

## Single mean 
## Summary statistics:

## mean = 240.257 ;  sd = 565.5362 ;  n = 2635 
## H0: mu = 0 
## HA: mu != 0 
## Standard error = 11.02 
## Test statistic: Z = 21.808 
## p-value =  0
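
For comparison, a test of the difference between two group means can be sketched in base R with `t.test()`, which performs a Welch two-sample t test by default. The data below are simulated and hypothetical, not the lab's contributions, so the numbers it prints say nothing about the real candidates:

```r
set.seed(1)  # reproducible simulated data; these are NOT the lab's contributions

# Hypothetical amounts for two groups
sim <- data.frame(
  cand_nm = rep(c("Obama, Barack", "Romney, Mitt"), times = c(200, 60)),
  amt = c(rnorm(200, mean = 160, sd = 440), rnorm(60, mean = 500, sd = 800))
)

# Welch two-sample t test of H0: the two group means are equal
res <- t.test(amt ~ cand_nm, data = sim, alternative = "two.sided")

res$estimate    # the two sample means
res$p.value     # p-value for the difference in means
res$conf.int    # 95% confidence interval for the difference
```

The formula `amt ~ cand_nm` is what tells `t.test()` to compare the groups; passing the amounts alone would give a one-sample test.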


Exercise 14:

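The confidence interval, ( 218.6637 , 261.8503 ), is for the mean of the pooled Obama and Romney contributions (240.257), again because no group argument was passed. It excludes 0, consistent with the test in Exercise 13, but it is not an interval for the difference between the two candidates' means (500.1 - 159.1 = 341.0), which lies outside it.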

inference(data = pres2$contb_receipt_amt, est = "mean", siglevel = 0.05, null = 0, 
    alternative = "twosided", type = "ci", method = "theoretical")

## Single mean 
## Summary statistics:


## mean = 240.257 ;  sd = 565.5362 ;  n = 2635 
## Standard error = 11.0172 
## 95 % Confidence interval = ( 218.6637 , 261.8503 )
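
The standard error, the Z statistic from Exercise 13, and the interval above all follow from the three printed summary statistics. A worked check, assuming the normal critical value `qnorm(0.975)` is what the `inference` function uses for a 95% interval:

```r
# Summary statistics copied from the output above
xbar <- 240.257
s    <- 565.5362
n    <- 2635

se <- s / sqrt(n)                            # standard error of the mean
z  <- (xbar - 0) / se                        # test statistic for H0: mu = 0
ci <- xbar + c(-1, 1) * qnorm(0.975) * se    # 95% confidence interval

round(se, 4)
round(z, 3)
round(ci, 4)
```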