Introducing… R Markdown Files!

File > New File > select either R Markdown or R Notebook

If you have output: html_notebook, you can preview the document and refresh the preview before saving it as a Word or PDF document. If this interests you, look up the difference between R Notebook and R Markdown files. They have subtle differences, but both are saved as .Rmd files. Knit runs the whole document and saves the output. Preview shows the output only for the chunks you have run during the session. I kind of like the notebook over the markdown file type.

If you have output: word_document, knitting exports a Word document with the same name to your working directory. If you have output: pdf_document, knitting exports a PDF with the same name to your working directory. Knitting to PDF requires LaTeX to be installed; otherwise it will fail.

Otherwise you can click on the options for “Preview” and select the format you want to knit/export it as.

I personally liked using the Notebook and preview feature to frequently refresh changes and then “printed” the html preview as a PDF once I liked how it looked. This way I could also delete any output I didn’t want before saving it as a final version. That’s just me and it’s probably not the best way to do it but it did the job and kept the formatting I wanted.

You can NOT submit html files on Blackboard.

There are countless resources for using Markdown/Notebook files. One great resource: https://rmarkdown.rstudio.com/lesson-9.html

Another resource: https://bookdown.org/yihui/rmarkdown-cookbook/notebook.html

Switching between formats can make RStudio grumpy, just a heads up. Usually saving, closing, and reopening R fixes the problem.

require("knitr")

## Loading required package: knitr

## Warning: package 'knitr' was built under R version 3.6.3

opts_knit$set(root.dir= ("C:/Users/aleaw/OneDrive/Desktop/PhD Fall 2020/TA_402/Week 10"))
knitr::opts_chunk$set(echo=TRUE) # echo=TRUE keeps the code that creates the output visible in the knitted document. With echo=FALSE, only the output is shown, without the code.
# echo=TRUE for all chunks is useful for teaching and showing your code in homework.

## Warning: package 'tidyverse' was built under R version 3.6.3

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.1     v purrr   0.3.4
## v tibble  3.0.1     v dplyr   1.0.0
## v tidyr   1.1.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## Warning: package 'ggplot2' was built under R version 3.6.3

## Warning: package 'tibble' was built under R version 3.6.3

## Warning: package 'tidyr' was built under R version 3.6.3

## Warning: package 'purrr' was built under R version 3.6.3

## Warning: package 'dplyr' was built under R version 3.6.3

## Warning: package 'forcats' was built under R version 3.6.3

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

## Warning: package 'haven' was built under R version 3.6.3

## Warning: package 'car' was built under R version 3.6.3

## Loading required package: carData

## Warning: package 'carData' was built under R version 3.6.3

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following object is masked from 'package:purrr':
## 
##     some

## Warning: package 'psych' was built under R version 3.6.3

## 
## Attaching package: 'psych'

## The following object is masked from 'package:car':
## 
##     logit

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

## Warning: package 'descr' was built under R version 3.6.3

## Warning: package 'DescTools' was built under R version 3.6.3

## 
## Attaching package: 'DescTools'

## The following objects are masked from 'package:psych':
## 
##     AUC, ICC, SD

## The following object is masked from 'package:car':
## 
##     Recode

For anyone that knits, enjoy the entertaining command names in the knitr package :)

In case you don’t like this format, you can go from .Rmd (which is a markdown file) to .R (which is a script file) by using the purl() command below:

knitr::purl(input='Week10Lab.Rmd', output='Week10Script.R', documentation = 2)

documentation=0 keeps only the R code that is in the chunks
documentation=1 is the default option; it adds the chunk headers to the code
documentation=2 keeps all the text and turns it into comments in the script file
- I want all the comments to stay with the code for this situation, but depending on what you are doing, you might want something else.

Just in case this isn’t working for you… I will upload an .R file to Blackboard too.

# knitr::purl(input='Week10Lab.Rmd', output='Week10Script.R', documentation = 2)
# delete pound sign in the above line of code if you want it to run

Chapter 2’s Appendix in the Fogarty textbook goes over Markdown too. Plus countless resources online for using Markdown or R Notebooks if that interests you.

Chi Squared Tests of Independence

Fogarty Chapter 10 has a really great section dedicated to both Measures of Association and Chi-Squared examples with interpretations.

This test is used to test whether two variables are related to one another or not. Crosstabulations and Chi-Squared Tests are for NOMINAL and ORDINAL data with few categories. If you use an ordinal variable, the options will be treated as categories without order.

Don’t use interval or ratio variables. Those are best for regressions.

There are a lot of nonparametric tests of association:

Pearson Chi-Squared Contingency Test - 2 categorical variables
Cramer’s V - for strength of relationship between 2 Categorical Variables or 1 categorical with 1 ordinal variable
Kendall’s Tau-B - Strength of relationship between ordinal variables with equal number of rows and columns
Goodman Kruskal Test - Strength of relationship between ordinal variables with unequal number of rows and columns
Plus a lot more…
Chapter 10, Table 1 has a nice summary of what to use and when to use it.

The chi-squared test looks at the crosstab’s cell frequencies (called the observed frequencies) and compares them to the expected frequencies, meaning the counts we would see if the two variables were unrelated. The differences between the expected and observed frequencies are used in the calculation of the Chi-Square statistic (Fogarty Chapter 10).

Null Hypothesis is that there is no relationship.

Alternative hypothesis is that there is a relationship.

Remember: The Chi-Squared Contingency Test does not tell you whether the relationship makes sense, only whether there is an association between two categorical variables. You still need common sense!

Variable Prep

V161342: Gender

Originally coded as 1=Male, 2=Female

table(ANES$V161342)

## 
##   -9    1    2    3 
##   41 1987 2231   11

ANES$gender <- recode(as.numeric(ANES$V161342), "1='Male';2='Female'; else=NA")  #else=NA gets rid of the 3 and -9
table(ANES$gender)

## 
## Female   Male 
##   2231   1987
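
For comparison, the same "keep 1 and 2, turn everything else into NA" idea can be sketched in base R with factor(). This is only an illustration: the raw vector below is made up (it is not the ANES data), and the lab itself uses recode() as shown above.

```r
# Toy vector of raw codes, invented for illustration (not the ANES data)
raw <- c(1, 2, 2, -9, 3, 1, 2)

# Any code that is not listed in `levels` (here -9 and 3) becomes NA
gender <- factor(raw, levels = c(1, 2), labels = c("Male", "Female"))

# useNA = "ifany" shows how many values were turned into NA
table(gender, useNA = "ifany")
```

Checking the NA count after a recode is a quick way to confirm that only the codes you meant to drop were dropped.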

V161115: Healthy - Would you say that in general your health is excellent, very good, good, fair, or poor? where 1=Excellent and 5=Poor

table(ANES$V161115)

## 
##   -9    1    2    3    4    5 
##    8  742 1429 1341  604  146

ANES$healthy <- recode(ANES$V161115, "-9=NA")
table(ANES$healthy)

## 
##    1    2    3    4    5 
##  742 1429 1341  604  146

freq(ANES$healthy)

## PRE: Self-evaluation of R health 
##       Frequency  Percent Valid Percent
## 1           742  17.3770        17.410
## 2          1429  33.4660        33.529
## 3          1341  31.4052        31.464
## 4           604  14.1452        14.172
## 5           146   3.4192         3.426
## NA's          8   0.1874              
## Total      4270 100.0000       100.000

V161232: Opinions on Abortion, 4 options

attributes(ANES$V161232)

## $label
## [1] "PRE: STD Abortion: self-placement"
## 
## $format.stata
## [1] "%93.0g"
## 
## $class
## [1] "haven_labelled" "vctrs_vctr"     "double"        
## 
## $labels
##                                                                                   -9. Refused 
##                                                                                            -9 
##                                                                     -8. Don't know (FTF only) 
##                                                                                            -8 
##                                                1. By law, abortion should never be permitted. 
##                                                                                             1 
##                           2. By law, only in case of rape, incest, or woman's life in danger. 
##                                                                                             2 
## 3. By law, for reasons other than rape, incest, or woman's life in danger if need established 
##                                                                                             3 
##                                           4. By law, abortion as a matter of personal choice. 
##                                                                                             4 
##                                                                              5. Other SPECIFY 
##                                                                                             5

table(ANES$V161232)

## 
##   -9   -8    1    2    3    4    5 
##   48    9  544 1115  616 1932    6

# else=NA turns the negative codes (-9, -8) and 5 ("Other") into NA, so no separate ifelse() step is needed
ANES$choice <- recode(as.numeric(ANES$V161232), "1='By law Never'; 2='Law Permits Extreme Cases'; 3='Law Permits If Need Established'; 4='By law Always Personal Choice'; else=NA")
table(ANES$choice)

## 
##   By law Always Personal Choice                    By law Never 
##                            1932                             544 
##       Law Permits Extreme Cases Law Permits If Need Established 
##                            1115                             616

V161232: Opinions on Abortion - Simplified to Never and Conditional Yes/Yes

Make a dummy variable where 0 = “Never allow abortion” and 1 = options 2, 3, and 4, which allow abortion to some extent.

ANES$choice01 <- recode(as.numeric(ANES$V161232), "1=0; 2=1;3=1;4=1; else=NA")
table(ANES$choice01)

## 
##    0    1 
##  544 3663

ANES$choice01 <- recode(ANES$choice01, "1='Yes/Conditional Yes'; 0='Never'")
table(ANES$choice01)

## 
##               Never Yes/Conditional Yes 
##                 544                3663

V161019: Party of registration, Categorical

table(ANES$V161019)

## 
##   -9   -8   -1    1    2    4    5 
##    9   11 2151  924  682  471   22

ANES$regparty <- ANES$V161019
ANES$regparty <- ifelse(ANES$regparty <0, NA, ANES$regparty)
table(ANES$regparty)

## 
##   1   2   4   5 
## 924 682 471  22

ANES$regpartylabels <- recode(ANES$regparty, "1='Democrat'; 2='Republican'; 4='Independent'; 5='Other' ")
table(ANES$regpartylabels)

## 
##    Democrat Independent       Other  Republican 
##         924         471          22         682

RECAP: Party of registration / Goodness of Fit

We did this in Week 6 for the Chi-Squared Goodness of Fit lab, but now we will go further into the details and usefulness of the command.

Reminder: Chi-Square goodness of fit is used to compare the observed sample distribution to either a known population distribution or expected probability distribution. 1 Categorical variable.

# V161019: Party of registration
freq(ANES$regpartylabels, plot=FALSE)

## ANES$regpartylabels 
##             Frequency  Percent Valid Percent
## Democrat          924  21.6393        44.021
## Independent       471  11.0304        22.439
## Other              22   0.5152         1.048
## Republican        682  15.9719        32.492
## NA's             2171  50.8431              
## Total            4270 100.0000       100.000

If the US population is 39% Republicans, 40% Democrats, 20% Independents, and 1% Other, does our sample represent the population?

Null hypothesis: Sample and population are not different

freq(ANES$regpartylabels)

## ANES$regpartylabels 
##             Frequency  Percent Valid Percent
## Democrat          924  21.6393        44.021
## Independent       471  11.0304        22.439
## Other              22   0.5152         1.048
## Republican        682  15.9719        32.492
## NA's             2171  50.8431              
## Total            4270 100.0000       100.000

population<- c(.39, .40, .20, .01)
sample <- c(.32492, .44021, .22439, .0148)
chisq.test(sample, p=population)

## Warning in chisq.test(sample, p = population): Chi-squared approximation may be
## incorrect

## 
##  Chi-squared test for given probabilities
## 
## data:  sample
## X-squared = 0.020075, df = 3, p-value = 0.9992

Interpretation of test statistics: X-squared = 0.020075, df=3, p-value=.9992 ??

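The p-value = 0.9992, which is > .05, so we cannot reject the null hypothesis. Based on this output, there does not seem to be a significant difference between the sample and the population. One caution: chisq.test() treats its first argument as counts, so passing proportions (which sum to about 1) makes the statistic very small. Rerun the test with the raw frequencies from the freq() table before relying on this conclusion.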
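
Because chisq.test() expects counts rather than proportions, here is a minimal sketch of the same comparison using the frequencies from the freq() table. The object names (observed, population, gof) are just illustrative, and the result can differ a lot from the proportion-based call, so compare the two before writing up a conclusion.

```r
# Observed counts from the freq() table, in the same order as the population shares
observed <- c(Republican = 682, Democrat = 924, Independent = 471, Other = 22)

# Hypothesized population shares from the question (must sum to 1)
population <- c(0.39, 0.40, 0.20, 0.01)

# Goodness-of-fit test on counts
gof <- chisq.test(observed, p = population)

gof$expected   # expected counts = total n * population share
gof            # test statistic, degrees of freedom, and p-value
```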

Then we pretended that the United States political world drastically changed and resulted in a population distribution of 1% Republicans, 45% Democrats, 37% Independents, and 17% Other.

If the US population is 1% Republicans, 45% Democrats, 37% Independents, and 17% Other, does our sample represent the population?

Null hypothesis: Sample and population are not different

population2<- c(.01, .45, .37, .17)            # from the question
sample <- c(.32492, .44021, .22439, .0148)     # from our freq() table
chisq.test(sample, p=population2)        # compares sample and population

## Warning in chisq.test(sample, p = population2): Chi-squared approximation may be
## incorrect

## 
##  Chi-squared test for given probabilities
## 
## data:  sample
## X-squared = 10.073, df = 3, p-value = 0.01795

Interpretation of test statistics: X-squared = 10.073, df=3, p-value=0.01795 ??

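The p-value = 0.01795, which is < .05, so we can reject the null hypothesis. Therefore, we can infer that our sample is significantly different from the population at the 95% confidence level. Keep in mind that chisq.test() expects counts, so this statistic comes from proportions being treated as counts.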

Both of the examples above used the Chi-Square Goodness of Fit test to determine if the sample distribution was significantly different than the population distribution.

Chi-Square Goodness of Fit: There was one variable observed.

Chi-Square Test of Independence: There are two variables for each observation.

Put into the context of our survey, that means using two survey responses from each individual who responded.

Chi-Squared Test of Independence - Adding another variable!

Chi-Squared with Nominal Variables

Command syntax: CrossTable(Variable 1, Variable 2, prop.c=TRUE, ... add other things if you want). If you want to display extra information when you run the CrossTable() function, include the option in your line of code and set it equal to TRUE. If you don’t want it to appear in your output, set it equal to FALSE. The first variable (x) goes in the rows and should be the dependent variable; the second variable (y) goes in the columns and is the independent variable.

prop.r=TRUE shows row proportions in the table
prop.c=TRUE shows column proportions in the table
prop.t=TRUE shows table proportions in the table
prop.chisq=TRUE includes the chi-square contribution of each cell
total.r=TRUE includes row totals
total.c=TRUE includes column totals
chisq=TRUE conducts the chi-squared test
expected=TRUE includes expected cell counts
resid=TRUE includes Pearson residual

Full details on the package and command options: https://cran.r-project.org/web/packages/descr/descr.pdf

Reminder: If you were to calculate it by hand, the expected frequency for each cell = (row total × column total) / overall total.

partybygender <- CrossTable(ANES$regpartylabels, ANES$gender, prop.c=FALSE, prop.r=FALSE, prop.t=FALSE, prop.chisq = FALSE)
partybygender

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |-------------------------|
## 
## ============================================
##                        ANES$gender
## ANES$regpartylabels    Female   Male   Total
## --------------------------------------------
## Democrat                  555    354     909
## --------------------------------------------
## Independent               218    247     465
## --------------------------------------------
## Other                      14      8      22
## --------------------------------------------
## Republican                316    360     676
## --------------------------------------------
## Total                    1103    969    2072
## ============================================

This creates a crosstabulation of registered party and gender and saves it as partybygender. This should look familiar to you. It is just like the table() function, but now with two variables, so it shows the number of people who responded for each row and column combination.

Now let’s add some more details to the table:

partybygender2 <- CrossTable(ANES$regpartylabels, ANES$gender, prop.c=TRUE, prop.r=TRUE, prop.t=FALSE, prop.chisq = FALSE, chisq = TRUE, total.r = TRUE, total.c = TRUE, expected = TRUE)
partybygender2

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |              Expected N | 
## |           N / Row Total | 
## |           N / Col Total | 
## |-------------------------|
## 
## =============================================
##                        ANES$gender
## ANES$regpartylabels    Female    Male   Total
## ---------------------------------------------
## Democrat                  555     354     909
##                         483.9   425.1        
##                         0.611   0.389   0.439
##                         0.503   0.365        
## ---------------------------------------------
## Independent               218     247     465
##                         247.5   217.5        
##                         0.469   0.531   0.224
##                         0.198   0.255        
## ---------------------------------------------
## Other                      14       8      22
##                          11.7    10.3        
##                         0.636   0.364   0.011
##                         0.013   0.008        
## ---------------------------------------------
## Republican                316     360     676
##                         359.9   316.1        
##                         0.467   0.533   0.326
##                         0.286   0.372        
## ---------------------------------------------
## Total                    1103     969    2072
##                         0.532   0.468        
## =============================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 42.26516      d.f. = 3      p = 3.52e-09

Lots of information now… Let’s look at it for a bit. Based on lecture, what can we say about the output?
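
To connect the output to the by-hand formula, here is a small sketch that rebuilds the Expected N values and the Chi^2 statistic from the observed counts in the table above. The matrix is typed in by hand from that table, and the object names (obs, expected, chi2) are only illustrative.

```r
# Observed counts copied from the party-by-gender crosstab
obs <- matrix(c(555, 354,
                218, 247,
                 14,   8,
                316, 360),
              ncol = 2, byrow = TRUE,
              dimnames = list(party  = c("Democrat", "Independent", "Other", "Republican"),
                              gender = c("Female", "Male")))

# Expected count for each cell = (row total * column total) / overall total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected, 1)

# Chi-square statistic = sum over all cells of (observed - expected)^2 / expected
chi2 <- sum((obs - expected)^2 / expected)
chi2

# Degrees of freedom = (rows - 1) * (columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
pchisq(chi2, df = df, lower.tail = FALSE)   # p-value
```

The cells with the largest gaps between observed and expected counts (here the Democrat and Republican rows) contribute the most to the statistic.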

What about the Pearson’s Chi-Squared Test statistics?

Null Hypothesis: The two variables are not related. Alternative Hypothesis: The two variables are related.

The p-value < 0.01, therefore we can reject the null hypothesis in favor of the alternative hypothesis. There is a statistically significant relationship between gender and registered political party.

Okay but… what is the relationship? What can we say about it?

Measures of association!

Use measures of association to determine the direction and/or strength of the relationship. Remember: The correct test of association depends on variables you are using.

Cramer’s V

We have 2 categorical variables with unequal numbers of rows and columns (gender with 2 options and registered political party with 4 options). Using the nice table in the Fogarty textbook, we can see that we should use Cramer’s V as the measure of association.

Because they are categorical variables, we should NOT be talking about the direction of the relationship, only the strength of the relationship.

CramerV(ANES$regpartylabels, ANES$gender)

## [1] 0.1428224

Cramer’s V ranges from 0 to 1 and indicates the strength of the relationship. Based on the output of the Cramer’s V test, there is a weak relationship between gender and political party.

There is a weak but statistically significant relationship between the two variables representing gender and registered political party.

Again, Cramer’s V Test of association is for 2 Nominal Variables OR 1 Nominal and 1 Ordinal Variable.
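
If you are curious where the number comes from, Cramer’s V can be sketched by hand from the chi-squared statistic: V = sqrt(Chi^2 / (n × (min(rows, columns) - 1))). The helper name cramers_v() below is made up for illustration; CramerV() from DescTools is what the lab actually uses. The inputs are the Chi^2 value and table total from the party-by-gender output.

```r
# Hypothetical helper: Cramer's V from a chi-squared statistic and table dimensions
cramers_v <- function(chi2, n, n_rows, n_cols) {
  sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))
}

# Party of registration (4 rows) by gender (2 columns): Chi^2 = 42.26516, n = 2072
v_party_gender <- cramers_v(chi2 = 42.26516, n = 2072, n_rows = 4, n_cols = 2)
v_party_gender
```

Because n sits in the denominator, a large sample can produce a tiny p-value and a small V at the same time, which is why the relationship here is both significant and weak.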

Opinions on Abortion by Party of Registration

CrossTable(ANES$choice01, ANES$regpartylabels, chisq = TRUE)

## Warning in chisq.test(tab, correct = FALSE, ...): Chi-squared approximation may
## be incorrect

##    Cell Contents 
## |-------------------------|
## |                       N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## ==========================================================================
##                        ANES$regpartylabels
## ANES$choice01          Democrat   Independent   Other   Republican   Total
## --------------------------------------------------------------------------
## Never                        69            49       1          123     242
##                          13.087         0.576   0.957       25.186        
##                           0.285         0.202   0.004        0.508   0.117
##                           0.076         0.105   0.045        0.183        
##                           0.033         0.024   0.000        0.059        
## --------------------------------------------------------------------------
## Yes/Conditional Yes         842           419      21          550    1832
##                           1.729         0.076   0.126        3.327        
##                           0.460         0.229   0.011        0.300   0.883
##                           0.924         0.895   0.955        0.817        
##                           0.406         0.202   0.010        0.265        
## --------------------------------------------------------------------------
## Total                       911           468      22          673    2074
##                           0.439         0.226   0.011        0.324        
## ==========================================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 45.06389      d.f. = 3      p = 8.97e-10

CramerV(ANES$choice01, ANES$regpartylabels) # Yes or No to Abortion and registered political party, both are nominal.

## [1] 0.1474042

Based on the output from the chi-squared test and the measure of association, what can we say?

Null Hypothesis: There is no relationship between opinions on abortion and political party. Alternative Hypothesis: There is a relationship between opinions on abortion and political party.

Test Statistics: Chi^2 = 45.06389, d.f. = 3, p = 8.97e-10 Cramer’sV: 0.1474042

Because the p-value is below 0.05, we can reject the null hypothesis at the 95% confidence level. We can then say that there is a weak but statistically significant relationship between opinions on abortion and political party.
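
A note on the warning printed with this table: chisq.test() warns that the "Chi-squared approximation may be incorrect" when at least one expected count is below 5. The sketch below types in the observed counts from the table above and checks the expected counts. The object names are illustrative, and suppressWarnings() is used only to keep the output quiet.

```r
# Observed counts copied from the abortion-opinion-by-party crosstab
obs <- matrix(c( 69,  49,  1, 123,
                842, 419, 21, 550),
              nrow = 2, byrow = TRUE,
              dimnames = list(choice = c("Never", "Yes/Conditional Yes"),
                              party  = c("Democrat", "Independent", "Other", "Republican")))

# Pull the expected counts out of the test object
test <- suppressWarnings(chisq.test(obs, correct = FALSE))
round(test$expected, 1)

# Which cells fall below the usual rule-of-thumb cutoff of 5?
which(test$expected < 5, arr.ind = TRUE)
```

With only 22 respondents in the "Other" column, one option (an assumption on my part, not something the lab does) would be to drop or combine that category before testing.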

Opinions on Abortion by Gender

CrossTable(ANES$choice01, ANES$gender, prop.c = TRUE, chisq = TRUE)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## =============================================
##                        ANES$gender
## ANES$choice01          Female    Male   Total
## ---------------------------------------------
## Never                     296     242     538
##                         0.484   0.542        
##                         0.550   0.450   0.129
##                         0.135   0.123        
##                         0.071   0.058        
## ---------------------------------------------
## Yes/Conditional Yes      1901    1719    3620
##                         0.072   0.081        
##                         0.525   0.475   0.871
##                         0.865   0.877        
##                         0.457   0.413        
## ---------------------------------------------
## Total                    2197    1961    4158
##                         0.528   0.472        
## =============================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 1.179248      d.f. = 1      p = 0.278 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 = 1.080875      d.f. = 1      p = 0.299

Because the p-value is greater than 0.05, we cannot reject the null hypothesis. We do not have evidence of a relationship between the two variables.
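
The output above shows two versions of the test because this is a 2 X 2 table, where a continuity correction (Yates) is often reported alongside the plain Pearson statistic. As a rough sketch, both can be reproduced with base R's chisq.test() by switching its correct argument; the counts are typed in from the table above and the object names are illustrative.

```r
# Observed counts copied from the 2 x 2 abortion-opinion-by-gender crosstab
obs <- matrix(c( 296,  242,
                1901, 1719),
              nrow = 2, byrow = TRUE,
              dimnames = list(choice = c("Never", "Yes/Conditional Yes"),
                              gender = c("Female", "Male")))

plain <- chisq.test(obs, correct = FALSE)  # Pearson's chi-squared test
yates <- chisq.test(obs, correct = TRUE)   # with Yates' continuity correction

# Compare the two statistics and p-values side by side
rbind(statistic = c(plain = unname(plain$statistic), yates = unname(yates$statistic)),
      p.value   = c(plain = plain$p.value,           yates = yates$p.value))
```

With a sample this large the correction barely changes the result, and both versions lead to the same decision.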

# Gender and Choice with 4 options 
CrossTable(ANES$choice, ANES$gender, chisq = TRUE)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## =========================================================
##                                    ANES$gender
## ANES$choice                        Female    Male   Total
## ---------------------------------------------------------
## By law Always Personal Choice        1044     862    1906
##                                     1.353   1.516        
##                                     0.548   0.452   0.458
##                                     0.475   0.440        
##                                     0.251   0.207        
## ---------------------------------------------------------
## By law Never                          296     242     538
##                                     0.484   0.542        
##                                     0.550   0.450   0.129
##                                     0.135   0.123        
##                                     0.071   0.058        
## ---------------------------------------------------------
## Law Permits Extreme Cases             563     542    1105
##                                     0.745   0.835        
##                                     0.510   0.490   0.266
##                                     0.256   0.276        
##                                     0.135   0.130        
## ---------------------------------------------------------
## Law Permits If Need Established       294     315     609
##                                     2.399   2.687        
##                                     0.483   0.517   0.146
##                                     0.134   0.161        
##                                     0.071   0.076        
## ---------------------------------------------------------
## Total                                2197    1961    4158
##                                     0.528   0.472        
## =========================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 10.56123      d.f. = 3      p = 0.0144

Because the p-value is less than 0.05, we can reject the null hypothesis that there is no relationship between the two variables. We can then infer that there is a statistically significant relationship between one’s opinion on abortion and their gender. But how strong is the relationship?

CramerV(ANES$choice, ANES$gender)

## [1] 0.0503982

There is a weak association between one’s opinions on abortion and their gender.

Note: When abortion is coded as Never vs. Yes/Conditional Yes, the test does not find a significant relationship between opinions and gender. However, if the variable keeps its 4 survey options describing different scenarios, there is a statistically significant (though weak) relationship between opinions on abortion and one’s gender. I just thought it was kind of interesting…

Gender and Health: Another quick example

CrossTable(ANES$healthy, ANES$gender, chisq = TRUE)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## ======================================
##                 ANES$gender
## ANES$healthy    Female    Male   Total
## --------------------------------------
## 1                  402     334     736
##                  0.413   0.464        
##                  0.546   0.454   0.175
##                  0.180   0.168        
##                  0.095   0.079        
## --------------------------------------
## 2                  751     660    1411
##                  0.029   0.032        
##                  0.532   0.468   0.335
##                  0.337   0.333        
##                  0.178   0.157        
## --------------------------------------
## 3                  674     644    1318
##                  0.770   0.865        
##                  0.511   0.489   0.313
##                  0.303   0.325        
##                  0.160   0.153        
## --------------------------------------
## 4                  324     279     603
##                  0.079   0.089        
##                  0.537   0.463   0.143
##                  0.145   0.141        
##                  0.077   0.066        
## --------------------------------------
## 5                   77      67     144
##                  0.009   0.010        
##                  0.535   0.465   0.034
##                  0.035   0.034        
##                  0.018   0.016        
## --------------------------------------
## Total             2228    1984    4212
##                  0.529   0.471        
## ======================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 2.761416      d.f. = 4      p = 0.599

There is not a statistically significant relationship between gender and perceptions of one’s health.

Chi-Squared Tests with Ordinal Variables

If you are working with ordinal variables, use different measures of association. These can tell you the direction and strength of the relationship.

Reminder: Ordinal variables are variables that are categorized in an ordered format, so that the different categories can be ranked from smallest to largest or from less to more on a particular characteristic.

Kendall Tau-B

Indicates strength and direction of relationship. NOT for nominal variables.

Use with 2 ordinal variables that have an equal number of categories (2 X 2, 3 X 3, etc.). This works for examples like Defense Spending and Service Spending, where both variables have 7 options.

Hypothesis?

Tau-B = -0.1903 => weak negative relationship between preferences for spending on defense and providing social services.

As respondents become more willing to spend money on defense, they prefer that the government provides fewer social services.
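
To get a feel for what the sign means, here is a toy sketch with made-up responses (not the ANES data). It uses base R's cor() with method = "kendall", which, as far as I know, handles ties the way Tau-B does. DescTools also has a KendallTauB() function you could use instead.

```r
# Made-up 7-point responses for eight hypothetical respondents (not ANES data)
defense  <- c(1, 2, 2, 4, 5, 6, 7, 7)   # 1 = spend less, 7 = spend more
services <- c(7, 6, 6, 4, 4, 3, 1, 2)   # 1 = provide fewer, 7 = provide more

# Kendall rank correlation: negative when one variable rises as the other falls
tau <- cor(defense, services, method = "kendall")
tau
```

A value near -1 or 1 means the two rankings almost always move in opposite or the same directions; a value near 0 means there is little consistent pattern.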

# V161178: Provide Fewer -> More on Services; 7 pt scale
ANES$servicespend <- recode(ANES$V161178, "-10:-1=NA; 99=NA")
table(ANES$servicespend)

## 
##   1   2   3   4   5   6   7 
## 378 445 598 908 637 366 295

ANES$servicespend <- recode(as.numeric(ANES$servicespend), "1='1. Provide Fewer'; 7='7. Provide More'")
table(ANES$servicespend)

## 
## 1. Provide Fewer                2                3                4 
##              378              445              598              908 
##                5                6  7. Provide More 
##              637              366              295

# V161181: Spend Less -> More on Defense; 7 pt scale
ANES$defensespend <- recode(ANES$V161181, "-10:-1=NA; 99=NA")
table(ANES$defensespend)

## 
##    1    2    3    4    5    6    7 
##  184  249  411 1008  787  594  450

ANES$defensespend <- recode(as.numeric(ANES$defensespend), "1='1. Spend Less'; 7='7. Spend More'")
table(ANES$defensespend)

## 
## 1. Spend Less             2             3             4             5 
##           184           249           411          1008           787 
##             6 7. Spend More 
##           594           450

CrossTable(ANES$defensespend, ANES$servicespend, chisq = TRUE)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## | Chi-square contribution | 
## |           N / Row Total | 
## |           N / Col Total | 
## |         N / Table Total | 
## |-------------------------|
## 
## ===============================================================================
##           ANES$servicespend
## ANES$d    1. P F        2        3        4        5        6   7. P M    Total
## -------------------------------------------------------------------------------
## 1. S L        27       13        6       22       25       29       50      172
##            3.846    3.607   18.249    9.692    0.751    8.152   99.608         
##            0.157    0.076    0.035    0.128    0.145    0.169    0.291    0.051
##            0.075    0.030    0.011    0.027    0.043    0.087    0.191         
##            0.008    0.004    0.002    0.007    0.007    0.009    0.015         
## -------------------------------------------------------------------------------
## 2             13       20       31       46       52       46       20      228
##            5.465    2.798    1.445    1.780    4.026   23.732    0.272         
##            0.057    0.088    0.136    0.202    0.228    0.202    0.088    0.068
##            0.036    0.047    0.055    0.056    0.090    0.137    0.076         
##            0.004    0.006    0.009    0.014    0.015    0.014    0.006         
## -------------------------------------------------------------------------------
## 3             11       32       54       93       88       65       33      376
##           21.541    5.245    1.397    0.005    8.153   20.101    0.453         
##            0.029    0.085    0.144    0.247    0.234    0.173    0.088    0.112
##            0.030    0.075    0.095    0.113    0.152    0.194    0.126         
##            0.003    0.010    0.016    0.028    0.026    0.019    0.010         
## -------------------------------------------------------------------------------
## 4             67       85      151      300      178       70       63      914
##           10.122    8.420    0.064   25.458    2.542    4.943    0.978         
##            0.073    0.093    0.165    0.328    0.195    0.077    0.069    0.272
##            0.185    0.199    0.267    0.364    0.307    0.209    0.240         
##            0.020    0.025    0.045    0.089    0.053    0.021    0.019         
## -------------------------------------------------------------------------------
## 5             64      131      135      183      131       52       23      719
##            2.369   17.071    1.556    0.237    0.365    5.447   19.556         
##            0.089    0.182    0.188    0.255    0.182    0.072    0.032    0.214
##            0.177    0.307    0.239    0.222    0.226    0.155    0.088         
##            0.019    0.039    0.040    0.055    0.039    0.015    0.007         
## -------------------------------------------------------------------------------
## 6             76       93      136      110       65       41       18      539
##            5.486    8.696   22.371    3.771    8.508    3.047   13.779         
##            0.141    0.173    0.252    0.204    0.121    0.076    0.033    0.161
##            0.210    0.218    0.240    0.133    0.112    0.122    0.069         
##            0.023    0.028    0.041    0.033    0.019    0.012    0.005         
## -------------------------------------------------------------------------------
## 7. S M       104       53       53       70       41       32       55      408
##           81.774    0.023    3.633    9.090   12.352    1.870   16.822         
##            0.255    0.130    0.130    0.172    0.100    0.078    0.135    0.122
##            0.287    0.124    0.094    0.085    0.071    0.096    0.210         
##            0.031    0.016    0.016    0.021    0.012    0.010    0.016         
## -------------------------------------------------------------------------------
## Total        362      427      566      824      580      335      262     3356
##            0.108    0.127    0.169    0.246    0.173    0.100    0.078         
## ===============================================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 530.6723      d.f. = 36      p <2e-16

KendallTauB(ANES$defensespend, ANES$servicespend) # 7 rows by 7 columns

## [1] -0.1902998

The chi-squared test shows a statistically significant relationship (p < 2e-16), and Tau-B (-0.19) shows that it is weak and negative. Together this implies that the more someone is willing to spend on defense, the less they want the government to provide social services.
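To see where a Tau-B value comes from, it can be built by hand from concordant and discordant pairs. This is a minimal sketch on made-up data, not the ANES variables; tau_b_by_hand is a hypothetical helper written for illustration, not the function called above. Base R's cor() with method = "kendall" also adjusts for ties, so the two numbers should agree:

```r
# Toy ordinal data (made up for illustration, not from ANES)
x <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
y <- c(5, 4, 4, 3, 3, 3, 2, 1, 2, 1)

# Hypothetical helper: Tau-B from concordant and discordant pairs
tau_b_by_hand <- function(x, y) {
  pairs <- combn(length(x), 2)                  # every pair of observations
  dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])     # 0 means tied on x
  dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])     # 0 means tied on y
  concordant <- sum(dx * dy > 0)                # ordered the same way on both
  discordant <- sum(dx * dy < 0)                # ordered in opposite ways
  # Denominator: pairs not tied on x, and pairs not tied on y
  (concordant - discordant) / sqrt(sum(dx != 0) * sum(dy != 0))
}

tau_hand <- tau_b_by_hand(x, y)
tau_base <- cor(x, y, method = "kendall")       # base R equivalent

c(by_hand = tau_hand, base_r = tau_base)

# cor.test() adds a p-value for the null hypothesis that tau is 0.
# With ties, R warns that it cannot compute an exact p-value and
# uses a normal approximation instead.
suppressWarnings(cor.test(x, y, method = "kendall"))
```

A negative value means discordant pairs outnumber concordant ones: higher values on one variable tend to go with lower values on the other.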

Goodman Kruskal Gamma

Goodman Kruskal Gamma is for ordinal variables with unequal rows and columns. This would be like looking at preferences for spending on defense (7 options: Spend less -> spend more) by party id (3 options: Democrat -> Republican).

# V161158x: Strong Democrat to Strong Republican, 7 point scale. Ordinal.
ANES$partyid <- ifelse(ANES$V161158x < 1, NA, ANES$V161158x) # codes below 1 are treated as missing
table(ANES$partyid)

## 
##   1   2   3   4   5   6   7 
## 890 559 490 579 500 508 721

hist(ANES$partyid)

Now let’s consolidate this scale. (I wouldn’t normally do this because you lose valuable information regarding how strongly someone considers themselves to be one thing or the other, but it makes a good example.)

# V161158x: Democrat -> Republican, 3 consolidated options, Ordinal

ANES$partyid3 <- plyr::mapvalues(as.numeric(ANES$partyid), c(1,2,3,4,5,6,7), c('Democrat', 'Democrat', 'Democrat', 'Independent', 'Republican', 'Republican', 'Republican'))
table(ANES$partyid3)

## 
##    Democrat Independent  Republican 
##        1939         579        1729
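The table above lists the categories alphabetically, which happens to match Democrat < Independent < Republican. If you would rather state the order explicitly, one option is base R's cut() with ordered_result = TRUE. This is a sketch on toy codes, not the ANES column, and the break points are an assumption that mirrors the 1-3 / 4 / 5-7 grouping used above:

```r
# Toy 7-point party ID codes (made up for illustration), with one missing value
partyid <- c(1, 4, 7, 2, 6, NA, 3, 5)

# Intervals are (0,3], (3,4], (4,7], so 1-3, 4, and 5-7 fall into separate bins
partyid3 <- cut(
  partyid,
  breaks = c(0, 3, 4, 7),
  labels = c("Democrat", "Independent", "Republican"),
  ordered_result = TRUE
)

table(partyid3, useNA = "ifany")   # categories appear in the declared order
levels(partyid3)
```

An ordered factor keeps the intended ranking even if the labels are later renamed to something that sorts differently.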

CrossTable(ANES$choice, ANES$partyid3, chisq = FALSE, prop.c = TRUE, prop.t = FALSE, prop.r = FALSE, prop.chisq = FALSE, total.r = FALSE, total.c = FALSE)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |           N / Col Total | 
## |-------------------------|
## 
## ======================================================================
##                                    ANES$partyid3
## ANES$choice                        Democrat   Independent   Republican
## ----------------------------------------------------------------------
## By law Always Personal Choice          1205           245          477
##                                       0.629         0.436        0.278
## ----------------------------------------------------------------------
## By law Never                            149            66          327
##                                       0.078         0.117        0.191
## ----------------------------------------------------------------------
## Law Permits Extreme Cases               303           157          649
##                                       0.158         0.279        0.378
## ----------------------------------------------------------------------
## Law Permits If Need Established         258            94          262
##                                       0.135         0.167        0.153
## ======================================================================

Reminder: chisq defaults to FALSE in the CrossTable() command, so the test is not run unless you ask for it. Let’s add chisq = TRUE to run the test.

CrossTable(ANES$choice, ANES$partyid3, chisq = TRUE, prop.c = TRUE, prop.t = FALSE, prop.r = FALSE, prop.chisq = FALSE, total.r = FALSE, total.c = FALSE)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |           N / Col Total | 
## |-------------------------|
## 
## ======================================================================
##                                    ANES$partyid3
## ANES$choice                        Democrat   Independent   Republican
## ----------------------------------------------------------------------
## By law Always Personal Choice          1205           245          477
##                                       0.629         0.436        0.278
## ----------------------------------------------------------------------
## By law Never                            149            66          327
##                                       0.078         0.117        0.191
## ----------------------------------------------------------------------
## Law Permits Extreme Cases               303           157          649
##                                       0.158         0.279        0.378
## ----------------------------------------------------------------------
## Law Permits If Need Established         258            94          262
##                                       0.135         0.167        0.153
## ======================================================================
## 
## Statistics for All Table Factors
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 = 503.5621      d.f. = 6      p <2e-16
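The same test can be run with base R's chisq.test() directly on a table of counts, which also exposes the expected counts under independence. The sketch below re-enters the counts printed above by hand; the shortened row labels are only for readability:

```r
# Counts copied from the CrossTable() output above
# (rows: abortion opinion, columns: party)
counts <- matrix(
  c(1205, 245, 477,
     149,  66, 327,
     303, 157, 649,
     258,  94, 262),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    choice = c("Always Personal Choice", "Never",
               "Extreme Cases", "If Need Established"),
    party  = c("Democrat", "Independent", "Republican")
  )
)

test <- chisq.test(counts)
test$statistic            # Pearson chi-squared statistic
test$parameter            # degrees of freedom: (4 - 1) * (3 - 1)
round(test$expected, 1)   # counts expected if the variables were independent
```

Comparing the observed counts with test$expected shows which cells drive the statistic, for example far more Democrats in the "Always Personal Choice" row than independence would predict.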

GoodmanKruskalGamma(ANES$choice, ANES$partyid3)

## [1] 0.3489139

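Gamma is about 0.35, which indicates a moderate positive association. Note that the call above passes ANES$choice (opinions on abortion), not ANES$defensespend, so this value describes abortion opinion by party rather than defense spending by party. The choice categories are also tabulated alphabetically rather than from least to most permissive, so the sign is hard to interpret as it stands. To test the defense spending example described at the start of this section, pass ANES$defensespend as the first argument instead. Either way, treating party as ordinal assumes that Independents are actually between Democrats and Republicans on a scale, which could be disputed.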
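Gamma is built from the same concordant and discordant pairs as Tau-B, but it drops tied pairs from the denominator, which is why it tends to be larger in magnitude than Tau-B on the same data. A minimal sketch on made-up data (gamma_by_hand is a hypothetical helper for illustration, not the function used above):

```r
# Toy ordinal data (made up for illustration, not from ANES)
x <- c(1, 1, 2, 2, 3, 3)
y <- c(1, 2, 1, 2, 2, 3)

# Hypothetical helper: Gamma = (C - D) / (C + D), ignoring tied pairs
gamma_by_hand <- function(x, y) {
  pairs <- combn(length(x), 2)                  # every pair of observations
  dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])
  dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])
  concordant <- sum(dx * dy > 0)
  discordant <- sum(dx * dy < 0)
  (concordant - discordant) / (concordant + discordant)
}

gamma_hand <- gamma_by_hand(x, y)        # 7 concordant, 1 discordant: 6 / 8
tau_b <- cor(x, y, method = "kendall")   # Tau-B on the same data, for comparison

c(gamma = gamma_hand, tau_b = tau_b)
```

With many ties, as in a 7 x 3 table, the gap between the two measures can be substantial, so report which one you used.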

Lab Practice

Problem 1: Climate Change and Political Party

Is there a relationship between opinions on climate change (V161222) and political party (V161019)?

What is the Null Hypothesis? What is the alternative hypothesis? What kind of variables are these? What tests should you use? What can you say about the relationship?

Problem 2: Democracy and Millionaires

Is there a relationship between opinions on taxation of millionaires (V162140) and overall happiness with democracy in the US (V162290)?

What is the Null Hypothesis? What is the alternative hypothesis? What kind of variables are these? What tests should you use? What can you say about the relationship?

Problem 3: Conspiracy Theories

Is there a relationship between believing that the government knew about 9/11 (V162254) and faith in vaccines (V162161)?

What is the Null Hypothesis? What is the alternative hypothesis? What kind of variables are these? What tests should you use? What can you say about the relationship?

Week 10: Bivariate Tests of Association - Chi-Squared
