File > New File > select either Markdown or Notebook
If you have output: html_notebook, you can preview the document and refresh the preview before saving it as a Word or PDF document. If this interests you, look up the difference between R Notebook and R Markdown files. The differences are subtle, and both are saved as .Rmd files. Knit runs the whole document and saves the output. Preview shows the output only for the chunks you have run during the session. I kind of like the notebook over the markdown file type.
If you have output: word_document then it will export it as a word document with the same name in your working directory. If you have output: pdf_document then it will export it as a pdf with the same name in your working directory. Knitting the PDF requires LaTeX to be installed, otherwise it will fail.
Otherwise you can click on the options for “Preview” and select the format you want to knit/export it as.
I personally liked using the Notebook and preview feature to frequently refresh changes and then “printed” the html preview as a PDF once I liked how it looked. This way I could also delete any output I didn’t want before saving it as a final version. That’s just me and it’s probably not the best way to do it but it did the job and kept the formatting I wanted.
You can NOT submit html files on Blackboard.
There are countless resources for using Markdown/Notebook files. One great resource: https://rmarkdown.rstudio.com/lesson-9.html
Another resource: https://bookdown.org/yihui/rmarkdown-cookbook/notebook.html
Switching between formats can make RStudio grumpy, just a heads up. Usually saving, closing, and reopening R fixes the problem.
require("knitr")
## Loading required package: knitr
## Warning: package 'knitr' was built under R version 3.6.3
opts_knit$set(root.dir= ("C:/Users/aleaw/OneDrive/Desktop/PhD Fall 2020/TA_402/Week 10"))
knitr::opts_chunk$set(echo=TRUE) # echo=TRUE keeps the code in the chunks that creates output visible. Otherwise if echo=FALSE, only the output is shown without the code.
# echo=TRUE for all chunks is useful for teaching and showing your code in homework.
## Warning: package 'tidyverse' was built under R version 3.6.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.1 v purrr 0.3.4
## v tibble 3.0.1 v dplyr 1.0.0
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tibble' was built under R version 3.6.3
## Warning: package 'tidyr' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.3
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Warning: package 'haven' was built under R version 3.6.3
## Warning: package 'car' was built under R version 3.6.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.6.3
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
## Warning: package 'psych' was built under R version 3.6.3
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
## Warning: package 'descr' was built under R version 3.6.3
## Warning: package 'DescTools' was built under R version 3.6.3
##
## Attaching package: 'DescTools'
## The following objects are masked from 'package:psych':
##
## AUC, ICC, SD
## The following object is masked from 'package:car':
##
## Recode
For anyone that knits, enjoy the entertaining command names in the knitr package :)
In case you don’t like this format, you can go from .Rmd (which is a markdown file) to .R (which is a script file) by using the purl() command below:
knitr::purl(input='Week10Lab.Rmd', output='Week10Script.R', documentation = 2)
Just in case this isn’t working for you… I will upload an .R file to blackboard too.
# knitr::purl(input='Week10Lab.Rmd', output='Week10Script.R', documentation = 2)
# delete pound sign in the above line of code if you want it to run
Chapter 2’s Appendix in the Fogarty textbook goes over Markdown too. Plus countless resources online for using Markdown or R Notebooks if that interests you.
Fogarty Chapter 10 has a really great section dedicated to both Measures of Association and Chi-Squared examples with interpretations.
This test is used to test whether two variables are related to one another or not. Crosstabulations and Chi-Squared Tests are for NOMINAL and ORDINAL data with few categories. If you use an ordinal variable, the options will be treated as categories without order.
Don’t use interval or ratio variables. Those are best for regressions.
There are a lot of nonparametric tests of association. The chi-squared test looks at the crosstab’s cell frequencies (called the observed frequencies) and compares them to expected frequencies. The differences between the expected and observed frequencies are used in the calculation of the Chi-Square statistic (Fogarty Chapter 10).
Null Hypothesis is that there is no relationship.
Alternative hypothesis is that there is a relationship.
Remember: For the Chi-Squared Contingency Test, this test does not tell you if it is the right relationship, only if there is an association between two categorical variables. You still need common sense!
Originally coded as 1=Male, 2=Female
table(ANES$V161342)
##
## -9 1 2 3
## 41 1987 2231 11
ANES$gender <- recode(as.numeric(ANES$V161342), "1='Male';2='Female'; else=NA") #else=NA gets rid of the 3 and -9
table(ANES$gender)
##
## Female Male
## 2231 1987
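For reference, here is a minimal sketch of the same kind of recode using only base R, on a small made-up vector (raw is hypothetical, not ANES data). It assumes you want every code other than 1 and 2 to become NA, which is what else=NA does in the recode() call above.

```r
# Hypothetical raw codes that mimic V161342: 1 = Male, 2 = Female,
# -9 and 3 are codes we want to treat as missing.
raw <- c(1, 2, 2, -9, 3, 1, 2)

# factor() keeps only the listed levels; every other value becomes NA.
gender <- factor(raw, levels = c(1, 2), labels = c("Male", "Female"))

table(gender)                   # counts for Male and Female only
table(gender, useNA = "ifany")  # also shows how many values became NA
```

Checking the table with useNA = "ifany" is a quick way to confirm that the number of NAs matches the number of codes you meant to drop.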
table(ANES$V161115)
##
## -9 1 2 3 4 5
## 8 742 1429 1341 604 146
ANES$healthy <- recode(ANES$V161115, "-9=NA")
table(ANES$healthy)
##
## 1 2 3 4 5
## 742 1429 1341 604 146
freq(ANES$healthy)
## PRE: Self-evaluation of R health
## Frequency Percent Valid Percent
## 1 742 17.3770 17.410
## 2 1429 33.4660 33.529
## 3 1341 31.4052 31.464
## 4 604 14.1452 14.172
## 5 146 3.4192 3.426
## NA's 8 0.1874
## Total 4270 100.0000 100.000
attributes(ANES$V161232)
## $label
## [1] "PRE: STD Abortion: self-placement"
##
## $format.stata
## [1] "%93.0g"
##
## $class
## [1] "haven_labelled" "vctrs_vctr" "double"
##
## $labels
## -9. Refused
## -9
## -8. Don't know (FTF only)
## -8
## 1. By law, abortion should never be permitted.
## 1
## 2. By law, only in case of rape, incest, or woman's life in danger.
## 2
## 3. By law, for reasons other than rape, incest, or woman's life in danger if need established
## 3
## 4. By law, abortion as a matter of personal choice.
## 4
## 5. Other SPECIFY
## 5
table(ANES$V161232)
##
## -9 -8 1 2 3 4 5
## 48 9 544 1115 616 1932 6
ANES$choice <- ifelse(ANES$V161232 <1, NA, ANES$V161232)
ANES$choice <- recode(as.numeric(ANES$V161232), "1='By law Never'; 2='Law Permits Extreme Cases'; 3='Law Permits If Need Established'; 4='By law Always Personal Choice'; else=NA")
table(ANES$choice)
##
## By law Always Personal Choice By law Never
## 1932 544
## Law Permits Extreme Cases Law Permits If Need Established
## 1115 616
Make a dummy variable where 0 = “Never allow abortion” and 1 = options 2, 3, and 4, which allow abortion to some extent.
ANES$choice01 <- recode(as.numeric(ANES$V161232), "1=0; 2=1;3=1;4=1; else=NA")
table(ANES$choice01)
##
## 0 1
## 544 3663
ANES$choice01 <- recode(ANES$choice01, "1='Yes/Conditional Yes'; 0='Never'")
table(ANES$choice01)
##
## Never Yes/Conditional Yes
## 544 3663
table(ANES$V161019)
##
## -9 -8 -1 1 2 4 5
## 9 11 2151 924 682 471 22
ANES$regparty <- ANES$V161019
ANES$regparty <- ifelse(ANES$regparty <0, NA, ANES$regparty)
table(ANES$regparty)
##
## 1 2 4 5
## 924 682 471 22
ANES$regpartylabels <- recode(ANES$regparty, "1='Democrat'; 2='Republican'; 4='Independent'; 5='Other' ")
table(ANES$regpartylabels)
##
## Democrat Independent Other Republican
## 924 471 22 682
We did this in Week 6 for the Chi-Squared Goodness of Fit lab, but now we will go further into the details and usefulness of the command.
Reminder: Chi-Square goodness of fit is used to compare the observed sample distribution to either a known population distribution or expected probability distribution. 1 Categorical variable.
# V161019: Party of registration
freq(ANES$regpartylabels, plot=FALSE)
## ANES$regpartylabels
## Frequency Percent Valid Percent
## Democrat 924 21.6393 44.021
## Independent 471 11.0304 22.439
## Other 22 0.5152 1.048
## Republican 682 15.9719 32.492
## NA's 2171 50.8431
## Total 4270 100.0000 100.000
Null hypothesis: Sample and population are not different
freq(ANES$regpartylabels)
## ANES$regpartylabels
## Frequency Percent Valid Percent
## Democrat 924 21.6393 44.021
## Independent 471 11.0304 22.439
## Other 22 0.5152 1.048
## Republican 682 15.9719 32.492
## NA's 2171 50.8431
## Total 4270 100.0000 100.000
population<- c(.39, .40, .20, .01)
sample <- c(.32492, .44021, .22439, .0148)
chisq.test(sample, p=population)
## Warning in chisq.test(sample, p = population): Chi-squared approximation may be
## incorrect
##
## Chi-squared test for given probabilities
##
## data: sample
## X-squared = 0.020075, df = 3, p-value = 0.9992
Interpretation of test statistics: X-squared = 0.020075, df=3, p-value=.9992 ??
The p-value =0.9992, which is >.05, so we cannot reject the null hypothesis. Therefore there does not seem to be a significant difference between the sample and the population.
Then we pretended that the United States political world drastically changed and resulted in a population distribution of 1% Republicans, 45% Democrats, 37% Independents, and 17% Other.
Null hypothesis: Sample and population are not different
population2<- c(.01, .45, .37, .17) # from the question
sample <- c(.32492, .44021, .22439, .0148) # from our freq() table
chisq.test(sample, p=population2) # compares sample and population
## Warning in chisq.test(sample, p = population2): Chi-squared approximation may be
## incorrect
##
## Chi-squared test for given probabilities
##
## data: sample
## X-squared = 10.073, df = 3, p-value = 0.01795
Interpretation of test statistics: X-squared = 10.073, df=3, p-value=0.01795 ??
The p-value = 0.01795, which is < .05, so we can reject the null hypothesis. Therefore, we can infer that our sample is significantly different from the population at the 95% confidence level.
Both of the examples above used the Chi-Square Goodness of Fit test to determine if the sample distribution was significantly different than the population distribution.
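One caution about the two tests above: chisq.test() treats its first argument as counts, and both calls were given proportions. That makes every expected value far below 5, which is why R printed the “approximation may be incorrect” warning. A hedged sketch of the first comparison using the observed counts from the freq() table is below. The category order (Republican, Democrat, Independent, Other) is an assumption based on how population2 is described, and the names observed and population_p are made up for this example. The statistic and p-value will differ from the output above because the test now knows the sample size.

```r
# Observed counts from the freq() table, in the assumed order
# Republican, Democrat, Independent, Other
observed <- c(Republican = 682, Democrat = 924, Independent = 471, Other = 22)

# Hypothesized population proportions in the same order (must sum to 1)
population_p <- c(0.39, 0.40, 0.20, 0.01)

gof <- chisq.test(observed, p = population_p)
gof$expected  # expected counts = sample size * population proportion
gof           # test statistic, df, and p-value based on counts
```

If you only have proportions, multiply them by the number of valid cases to get back to counts before running the test.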
Chi-Square Goodness of Fit: There was one variable observed.
Chi-Square Test of Independence: There are two variables for each observation.
Command syntax: CrossTable(Variable 1, Variable 2, prop.c = TRUE, ... add other options if you want). If you want to display extra information when you run the CrossTable() function, include the option in your line of code and set it equal to TRUE. If you don’t want it to appear in your output, set it equal to FALSE. The first variable (x) forms the rows and should be the dependent variable; the second variable (y) forms the columns and should be the independent variable.
Full details on the package and command options: https://cran.r-project.org/web/packages/descr/descr.pdf
Reminder: Expected frequency for each cell = (row total × column total) / overall total, if you were to calculate it by hand.
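To make the by-hand calculation concrete, here is a small sketch that applies the formula to the party-by-gender counts used in this lab. The counts are typed in directly from the crosstab, and the object names (observed, expected, chi2) are just for this example.

```r
# Observed counts copied from the party-by-gender crosstab in this lab
observed <- matrix(
  c(555, 354,
    218, 247,
     14,   8,
    316, 360),
  nrow = 4, byrow = TRUE,
  dimnames = list(party  = c("Democrat", "Independent", "Other", "Republican"),
                  gender = c("Female", "Male"))
)

# Expected count = (row total * column total) / overall total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)

# Chi-squared statistic = sum of (observed - expected)^2 / expected
chi2 <- sum((observed - expected)^2 / expected)
chi2
```

The rounded expected counts should line up with the “Expected N” rows that CrossTable() prints when expected = TRUE.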
partybygender <- CrossTable(ANES$regpartylabels, ANES$gender, prop.c=FALSE, prop.r=FALSE, prop.t=FALSE, prop.chisq = FALSE)
partybygender
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
## ============================================
## ANES$gender
## ANES$regpartylabels Female Male Total
## --------------------------------------------
## Democrat 555 354 909
## --------------------------------------------
## Independent 218 247 465
## --------------------------------------------
## Other 14 8 22
## --------------------------------------------
## Republican 316 360 676
## --------------------------------------------
## Total 1103 969 2072
## ============================================
This creates a crosstabulation of registered party and gender and then saves it as partybygender. This should look familiar to you. It is just like the table() function but now with two variables, so it has the number of people who responded for each row and column combination.
Now let’s add some more details to the table:
partybygender2 <- CrossTable(ANES$regpartylabels, ANES$gender, prop.c=TRUE, prop.r=TRUE, prop.t=FALSE, prop.chisq = FALSE, chisq = TRUE, total.r = TRUE, total.c = TRUE, expected = TRUE)
partybygender2
## Cell Contents
## |-------------------------|
## | N |
## | Expected N |
## | N / Row Total |
## | N / Col Total |
## |-------------------------|
##
## =============================================
## ANES$gender
## ANES$regpartylabels Female Male Total
## ---------------------------------------------
## Democrat 555 354 909
## 483.9 425.1
## 0.611 0.389 0.439
## 0.503 0.365
## ---------------------------------------------
## Independent 218 247 465
## 247.5 217.5
## 0.469 0.531 0.224
## 0.198 0.255
## ---------------------------------------------
## Other 14 8 22
## 11.7 10.3
## 0.636 0.364 0.011
## 0.013 0.008
## ---------------------------------------------
## Republican 316 360 676
## 359.9 316.1
## 0.467 0.533 0.326
## 0.286 0.372
## ---------------------------------------------
## Total 1103 969 2072
## 0.532 0.468
## =============================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 42.26516 d.f. = 3 p = 3.52e-09
Lots of information now… Let’s look at it for a bit. Based on lecture, what can we say about the output?
What about the Pearson’s Chi-Squared Test statistics?
Null Hypothesis: The two variables are not related. Alternative Hypothesis: The two variables are related.
Use measures of association to determine the direction and/or strength of the relationship. Remember: The correct test of association depends on variables you are using.
We have 2 categorical variables with unequal numbers of rows and columns (gender with 2 options and registered political party with 4 options). Using the nice table in the Fogarty textbook, we can see that we should use Cramer’s V test of association.
CramerV(ANES$regpartylabels, ANES$gender)
## [1] 0.1428224
Cramer’s V ranges from 0 to 1 and indicates the strength of the relationship. Based on the output of the Cramer’s V test, there is a weak relationship between gender and political party.
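If you are curious where that number comes from, Cramer’s V can be computed by hand from the chi-squared statistic. This is a sketch using the values printed in the CrossTable() output for party by gender; the variable names are made up for the example.

```r
# Values taken from the CrossTable() output for party by gender
chi2 <- 42.26516  # Pearson chi-squared statistic
n    <- 2072      # total number of valid cases in the table
rows <- 4         # party categories
cols <- 2         # gender categories

# Cramer's V = sqrt( chi2 / (n * (min(rows, cols) - 1)) )
cramers_v <- sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v
```

Dividing by the sample size is what keeps V small here even though the p-value is tiny: with over 2,000 cases, even a weak relationship is statistically significant.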
Again, Cramer’s V Test of association is for 2 Nominal Variables OR 1 Nominal and 1 Ordinal Variable.
CrossTable(ANES$choice01, ANES$regpartylabels, chisq = TRUE)
## Warning in chisq.test(tab, correct = FALSE, ...): Chi-squared approximation may
## be incorrect
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ==========================================================================
## ANES$regpartylabels
## ANES$choice01 Democrat Independent Other Republican Total
## --------------------------------------------------------------------------
## Never 69 49 1 123 242
## 13.087 0.576 0.957 25.186
## 0.285 0.202 0.004 0.508 0.117
## 0.076 0.105 0.045 0.183
## 0.033 0.024 0.000 0.059
## --------------------------------------------------------------------------
## Yes/Conditional Yes 842 419 21 550 1832
## 1.729 0.076 0.126 3.327
## 0.460 0.229 0.011 0.300 0.883
## 0.924 0.895 0.955 0.817
## 0.406 0.202 0.010 0.265
## --------------------------------------------------------------------------
## Total 911 468 22 673 2074
## 0.439 0.226 0.011 0.324
## ==========================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 45.06389 d.f. = 3 p = 8.97e-10
CramerV(ANES$choice01, ANES$regpartylabels) # Yes or No to Abortion and registered political party, both are nominal.
## [1] 0.1474042
Based on the output from the chi-squared test and the measure of association, what can we say?
Null Hypothesis: There is no relationship between opinions on abortion and political party. Alternative Hypothesis: There is a relationship between opinions on abortion and political party.
Test Statistics: Chi^2 = 45.06389, d.f. = 3, p = 8.97e-10 Cramer’sV: 0.1474042
Because the p-value is below 0.05, we can reject the null hypothesis at the 95% confidence level. We can then say that there is a weak but statistically significant relationship between opinions on abortion and political party.
CrossTable(ANES$choice01, ANES$gender, prop.c = TRUE, chisq = TRUE)
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## =============================================
## ANES$gender
## ANES$choice01 Female Male Total
## ---------------------------------------------
## Never 296 242 538
## 0.484 0.542
## 0.550 0.450 0.129
## 0.135 0.123
## 0.071 0.058
## ---------------------------------------------
## Yes/Conditional Yes 1901 1719 3620
## 0.072 0.081
## 0.525 0.475 0.871
## 0.865 0.877
## 0.457 0.413
## ---------------------------------------------
## Total 2197 1961 4158
## 0.528 0.472
## =============================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 1.179248 d.f. = 1 p = 0.278
##
## Pearson's Chi-squared test with Yates' continuity correction
## ------------------------------------------------------------
## Chi^2 = 1.080875 d.f. = 1 p = 0.299
Because the p-value (0.278, or 0.299 with Yates’ continuity correction) is greater than 0.05, we cannot reject the null hypothesis. We do not have evidence of a relationship between the two variables.
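The output above shows two versions of the test because the table is 2 X 2, where a continuity correction is commonly applied. As a rough sketch, both versions can be reproduced with base R’s chisq.test() by typing in the counts from the table; the object names here are just for illustration.

```r
# Counts copied from the Never / Yes-Conditional Yes by gender table
observed <- matrix(
  c( 296,  242,
    1901, 1719),
  nrow = 2, byrow = TRUE,
  dimnames = list(choice01 = c("Never", "Yes/Conditional Yes"),
                  gender   = c("Female", "Male"))
)

# correct = FALSE gives the plain Pearson chi-squared test
plain <- chisq.test(observed, correct = FALSE)

# correct = TRUE applies Yates' continuity correction (2 x 2 tables only)
yates <- chisq.test(observed, correct = TRUE)

c(pearson = unname(plain$statistic), yates = unname(yates$statistic))
c(pearson = plain$p.value, yates = yates$p.value)
```

The correction shrinks the statistic slightly, so its p-value is a bit larger. Here both versions lead to the same conclusion.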
# Gender and Choice with 4 options
CrossTable(ANES$choice, ANES$gender, chisq = TRUE)
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## =========================================================
## ANES$gender
## ANES$choice Female Male Total
## ---------------------------------------------------------
## By law Always Personal Choice 1044 862 1906
## 1.353 1.516
## 0.548 0.452 0.458
## 0.475 0.440
## 0.251 0.207
## ---------------------------------------------------------
## By law Never 296 242 538
## 0.484 0.542
## 0.550 0.450 0.129
## 0.135 0.123
## 0.071 0.058
## ---------------------------------------------------------
## Law Permits Extreme Cases 563 542 1105
## 0.745 0.835
## 0.510 0.490 0.266
## 0.256 0.276
## 0.135 0.130
## ---------------------------------------------------------
## Law Permits If Need Established 294 315 609
## 2.399 2.687
## 0.483 0.517 0.146
## 0.134 0.161
## 0.071 0.076
## ---------------------------------------------------------
## Total 2197 1961 4158
## 0.528 0.472
## =========================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 10.56123 d.f. = 3 p = 0.0144
Because the p-value is less than 0.05, we can reject the null hypothesis that there is no relationship between the two variables. We can then infer that there is a statistically significant relationship between one’s opinion on abortion and their gender. But how strong is the relationship?
CramerV(ANES$choice, ANES$gender)
## [1] 0.0503982
There is a weak association between one’s opinions on abortion and their gender.
Note: When abortion is coded as Never/Conditional Yes, we do not find a statistically significant relationship between opinions and gender. However, if the variable keeps its 4 survey options describing different scenarios, there is a statistically significant relationship between opinions on abortion and one’s gender. I just thought it was kind of interesting…
CrossTable(ANES$healthy, ANES$gender, chisq = TRUE)
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ======================================
## ANES$gender
## ANES$healthy Female Male Total
## --------------------------------------
## 1 402 334 736
## 0.413 0.464
## 0.546 0.454 0.175
## 0.180 0.168
## 0.095 0.079
## --------------------------------------
## 2 751 660 1411
## 0.029 0.032
## 0.532 0.468 0.335
## 0.337 0.333
## 0.178 0.157
## --------------------------------------
## 3 674 644 1318
## 0.770 0.865
## 0.511 0.489 0.313
## 0.303 0.325
## 0.160 0.153
## --------------------------------------
## 4 324 279 603
## 0.079 0.089
## 0.537 0.463 0.143
## 0.145 0.141
## 0.077 0.066
## --------------------------------------
## 5 77 67 144
## 0.009 0.010
## 0.535 0.465 0.034
## 0.035 0.034
## 0.018 0.016
## --------------------------------------
## Total 2228 1984 4212
## 0.529 0.471
## ======================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 2.761416 d.f. = 4 p = 0.599
There is not a statistically significant relationship between gender and perceptions of one’s health.
If you are working with ordinal variables, use different measures of association. These can tell you the direction and strength of the relationship.
Reminder: Ordinal variables are variables that are categorized in an ordered format, so that the different categories can be ranked from smallest to largest or from less to more on a particular characteristic.
Kendall’s Tau-b indicates the strength and direction of the relationship. It is NOT for nominal variables.
Use with 2 Ordinal Variables w/ Equal Categories (2 X 2, 3 X 3 etc. ). This would work with previous examples like when we looked at Defense Spending and Service Spending where both had 7 options.
Hypothesis?
As respondents become more willing to spend money on defense, they prefer that the government provides fewer social services.
# V161178: Provide Fewer -> More on Services; 7 pt scale
ANES$servicespend <- recode(ANES$V161178, "-10:-1=NA; 99=NA")
table(ANES$servicespend)
##
## 1 2 3 4 5 6 7
## 378 445 598 908 637 366 295
ANES$servicespend <- recode(as.numeric(ANES$servicespend), "1='1. Provide Fewer'; 7='7. Provide More'")
table(ANES$servicespend)
##
## 1. Provide Fewer 2 3 4
## 378 445 598 908
## 5 6 7. Provide More
## 637 366 295
# V161181: Spend Less -> More on Defense; 7 pt scale
ANES$defensespend <- recode(ANES$V161181, "-10:-1=NA; 99=NA")
table(ANES$defensespend)
##
## 1 2 3 4 5 6 7
## 184 249 411 1008 787 594 450
ANES$defensespend <- recode(as.numeric(ANES$defensespend), "1='1. Spend Less'; 7='7. Spend More'")
table(ANES$defensespend)
##
## 1. Spend Less 2 3 4 5
## 184 249 411 1008 787
## 6 7. Spend More
## 594 450
CrossTable(ANES$defensespend, ANES$servicespend, chisq = TRUE)
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
## ===============================================================================
## ANES$servicespend
## ANES$d 1. P F 2 3 4 5 6 7. P M Total
## -------------------------------------------------------------------------------
## 1. S L 27 13 6 22 25 29 50 172
## 3.846 3.607 18.249 9.692 0.751 8.152 99.608
## 0.157 0.076 0.035 0.128 0.145 0.169 0.291 0.051
## 0.075 0.030 0.011 0.027 0.043 0.087 0.191
## 0.008 0.004 0.002 0.007 0.007 0.009 0.015
## -------------------------------------------------------------------------------
## 2 13 20 31 46 52 46 20 228
## 5.465 2.798 1.445 1.780 4.026 23.732 0.272
## 0.057 0.088 0.136 0.202 0.228 0.202 0.088 0.068
## 0.036 0.047 0.055 0.056 0.090 0.137 0.076
## 0.004 0.006 0.009 0.014 0.015 0.014 0.006
## -------------------------------------------------------------------------------
## 3 11 32 54 93 88 65 33 376
## 21.541 5.245 1.397 0.005 8.153 20.101 0.453
## 0.029 0.085 0.144 0.247 0.234 0.173 0.088 0.112
## 0.030 0.075 0.095 0.113 0.152 0.194 0.126
## 0.003 0.010 0.016 0.028 0.026 0.019 0.010
## -------------------------------------------------------------------------------
## 4 67 85 151 300 178 70 63 914
## 10.122 8.420 0.064 25.458 2.542 4.943 0.978
## 0.073 0.093 0.165 0.328 0.195 0.077 0.069 0.272
## 0.185 0.199 0.267 0.364 0.307 0.209 0.240
## 0.020 0.025 0.045 0.089 0.053 0.021 0.019
## -------------------------------------------------------------------------------
## 5 64 131 135 183 131 52 23 719
## 2.369 17.071 1.556 0.237 0.365 5.447 19.556
## 0.089 0.182 0.188 0.255 0.182 0.072 0.032 0.214
## 0.177 0.307 0.239 0.222 0.226 0.155 0.088
## 0.019 0.039 0.040 0.055 0.039 0.015 0.007
## -------------------------------------------------------------------------------
## 6 76 93 136 110 65 41 18 539
## 5.486 8.696 22.371 3.771 8.508 3.047 13.779
## 0.141 0.173 0.252 0.204 0.121 0.076 0.033 0.161
## 0.210 0.218 0.240 0.133 0.112 0.122 0.069
## 0.023 0.028 0.041 0.033 0.019 0.012 0.005
## -------------------------------------------------------------------------------
## 7. S M 104 53 53 70 41 32 55 408
## 81.774 0.023 3.633 9.090 12.352 1.870 16.822
## 0.255 0.130 0.130 0.172 0.100 0.078 0.135 0.122
## 0.287 0.124 0.094 0.085 0.071 0.096 0.210
## 0.031 0.016 0.016 0.021 0.012 0.010 0.016
## -------------------------------------------------------------------------------
## Total 362 427 566 824 580 335 262 3356
## 0.108 0.127 0.169 0.246 0.173 0.100 0.078
## ===============================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 530.6723 d.f. = 36 p <2e-16
KendallTauB(ANES$defensespend, ANES$servicespend) # 7 rows by 7 columns
## [1] -0.1902998
There is a statistically significant, weak negative relationship between preferences for spending more on defense and providing more social services. Together this implies that the more someone is willing to spend on defense, the less they want to provide more social services.
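Note that KendallTauB() as called above returns only the coefficient, so the “statistically significant” part of the interpretation comes from the chi-squared test. If you want a p-value for the rank correlation itself, one option is base R’s cor.test() with method = "kendall". The sketch below uses ten made-up respondents (defense and services are hypothetical vectors, not ANES data), and as far as I know base R’s Kendall estimate is the tie-adjusted tau-b, but treat that as an assumption to check against the documentation.

```r
# Hypothetical 7-point responses for ten made-up respondents (not ANES data)
defense  <- c(1, 2, 2, 3, 4, 4, 5, 6, 7, 7)
services <- c(7, 6, 5, 5, 4, 4, 3, 2, 1, 2)

# Kendall rank correlation; negative values mean that higher defense
# scores tend to go with lower services scores.
tau <- cor(defense, services, method = "kendall")
tau

# cor.test() adds a p-value; exact = FALSE avoids a warning about ties
kt <- cor.test(defense, services, method = "kendall", exact = FALSE)
kt$p.value
```

With real survey data you would pass the numeric versions of the two 7-point variables instead of these toy vectors.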
Goodman Kruskal Gamma is for ordinal variables with unequal rows and columns. This would be like looking at preferences for spending on defense (7 options: Spend less -> spend more) by party id (3 options: Democrat -> Republican).
# V161158x: Strong Democrat to Strong Republican, 7 point scale. Ordinal.
ANES$partyid <- ifelse(ANES$V161158x <1,NA, ANES$V161158x)
table(ANES$partyid)
##
## 1 2 3 4 5 6 7
## 890 559 490 579 500 508 721
hist(ANES$partyid)
Now let’s consolidate this scale. (I wouldn’t normally do this because you lose valuable information regarding how strongly someone considers themselves to be one thing or the other, but it makes a good example.)
# V161158x: Democrat -> Republican, 3 consolidated options, Ordinal
ANES$partyid3 <- plyr::mapvalues(as.numeric(ANES$partyid), c(1,2,3,4,5,6,7), c('Democrat', 'Democrat', 'Democrat', 'Independent', 'Republican', 'Republican', 'Republican'))
table(ANES$partyid3)
##
## Democrat Independent Republican
## 1939 579 1729
CrossTable(ANES$choice, ANES$partyid3,chisq = FALSE, prop.c = TRUE, prop.t = FALSE, prop.r = FALSE, prop.chisq = FALSE, total.r = FALSE, total.c = FALSE)
## Cell Contents
## |-------------------------|
## | N |
## | N / Col Total |
## |-------------------------|
##
## ======================================================================
## ANES$partyid3
## ANES$choice Democrat Independent Republican
## ----------------------------------------------------------------------
## By law Always Personal Choice 1205 245 477
## 0.629 0.436 0.278
## ----------------------------------------------------------------------
## By law Never 149 66 327
## 0.078 0.117 0.191
## ----------------------------------------------------------------------
## Law Permits Extreme Cases 303 157 649
## 0.158 0.279 0.378
## ----------------------------------------------------------------------
## Law Permits If Need Established 258 94 262
## 0.135 0.167 0.153
## ======================================================================
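Reminder: chisq defaults to FALSE in the CrossTable() command, so the test is not run unless you ask for it. (The prop.r, prop.c, prop.t, and prop.chisq options default to TRUE, which is why the earlier tables showed them when only chisq = TRUE was set.) Let’s add chisq = TRUE to run the test.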
CrossTable(ANES$choice, ANES$partyid3,chisq = TRUE, prop.c = TRUE, prop.t = FALSE, prop.r = FALSE, prop.chisq = FALSE, total.r = FALSE, total.c = FALSE)
## Cell Contents
## |-------------------------|
## | N |
## | N / Col Total |
## |-------------------------|
##
## ======================================================================
## ANES$partyid3
## ANES$choice Democrat Independent Republican
## ----------------------------------------------------------------------
## By law Always Personal Choice 1205 245 477
## 0.629 0.436 0.278
## ----------------------------------------------------------------------
## By law Never 149 66 327
## 0.078 0.117 0.191
## ----------------------------------------------------------------------
## Law Permits Extreme Cases 303 157 649
## 0.158 0.279 0.378
## ----------------------------------------------------------------------
## Law Permits If Need Established 258 94 262
## 0.135 0.167 0.153
## ======================================================================
##
## Statistics for All Table Factors
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 503.5621 d.f. = 6 p <2e-16
GoodmanKruskalGamma(ANES$choice, ANES$partyid3)
## [1] 0.3489139
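There is a moderate positive association (gamma = 0.349) between the two variables in the call above, which are abortion opinion (ANES$choice) and three-category party ID (ANES$partyid3), not defense spending. Be careful with the sign: the choice labels are sorted alphabetically rather than from most to least restrictive, so the direction is hard to interpret until the categories are put in a meaningful order. (This also assumes that Independents actually fall between Democrats and Republicans on the scale, which could be disputed.)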
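Because gamma depends on the order of the categories, it can help to check what order R is using. The sketch below rebuilds the abortion-by-party table from the counts printed above and computes gamma by hand as (concordant - discordant) / (concordant + discordant). gk_gamma() is a helper written for this example, not a DescTools function, and the restrictive-to-permissive row order is my assumption about the intended ordinal scale.

```r
# Hand-rolled Goodman-Kruskal gamma for a two-way table of counts.
# Rows and columns are assumed to already be in their ordinal order.
gk_gamma <- function(tab) {
  r <- nrow(tab)
  k <- ncol(tab)
  conc <- 0  # concordant pairs: lower row and further right
  disc <- 0  # discordant pairs: lower row and further left
  for (i in seq_len(r - 1)) {
    below <- (i + 1):r
    for (j in seq_len(k)) {
      if (j < k) conc <- conc + tab[i, j] * sum(tab[below, (j + 1):k])
      if (j > 1) disc <- disc + tab[i, j] * sum(tab[below, 1:(j - 1)])
    }
  }
  (conc - disc) / (conc + disc)
}

# Counts copied from the choice-by-partyid3 table, rows in alphabetical order
alpha_tab <- matrix(
  c(1205, 245, 477,
     149,  66, 327,
     303, 157, 649,
     258,  94, 262),
  nrow = 4, byrow = TRUE,
  dimnames = list(choice = c("Always", "Never", "Extreme Cases", "Need Established"),
                  party  = c("Democrat", "Independent", "Republican"))
)
gk_gamma(alpha_tab)  # should match the 0.349 reported above

# Same counts, rows reordered from most to least restrictive (assumed order)
ordered_tab <- alpha_tab[c("Never", "Extreme Cases", "Need Established", "Always"), ]
gk_gamma(ordered_tab)
```

In practice you would set the order once with factor(ANES$choice, levels = ...) before calling GoodmanKruskalGamma(), then interpret the sign against that order.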
Is there a relationship between opinions on climate change (V161222) and political party (V161019)?
What is the Null Hypothesis? What is the alternative hypothesis? What kind of variables are these? What tests should you use? What can you say about the relationship?
Is there a relationship between opinions on taxation of millionaires (V162140) and overall happiness with democracy in the US (V162290)?
What is the Null Hypothesis? What is the alternative hypothesis? What kind of variables are these? What tests should you use? What can you say about the relationship?
Is there a relationship between believing that the government knew about 9/11 (V162254) and faith in vaccines (V162161)?
What is the Null Hypothesis? What is the alternative hypothesis? What kind of variables are these? What tests should you use? What can you say about the relationship?