Setting up RMarkdown enables you to create dynamic, reproducible, and visually appealing reports, presentations, and documents that help you communicate your data analysis and research findings more effectively.
if(!require(dplyr)){install.packages('dplyr')} # install dplyr if it is not already available
library(stargazer)
library(dplyr) #loading the library
library(gtsummary)
library(magrittr)
library(ggplot2)
library(knitr)
library(gmodels)
my_survey <- read.csv("C:\\Users\\user\\Downloads\\Music Preferance Survey (Responses) (1).csv")
head(my_survey,5)
RACE What.is.your.gender.Identity.
1 White Female
2 Asian Male
3 Other Female
4 Hispanic or Latino Female
5 White Male
music.preference.
1 Pop
2 Hip Hop, K-pop, Pop, R&B/Soul, c and k rnb
3 Pop, Rave/ Electropop
4 Hip Hop, Pop, R&B/Soul, Some edm
5 Country, Rap
Analyze your data with the goal of rejecting or failing to reject the null hypothesis. Answer the following questions (short answers are best now – you will write a narrative for the final paper):
The survey data contain three categorical variables: race, gender, and music preference. A chi-square test of independence is therefore appropriate. The test assesses whether two categorical variables are related to each other. If two variables are independent (unrelated), the probability of belonging to a given category of one variable is not affected by the category of the other. Because the test handles two variables at a time, it is applied here to each pair of interest: race with music preference, and gender with music preference. In R, the test is run with chisq.test().
A contingency table, also known as a cross-tabulation table, is a table that displays the distribution of two or more categorical variables. The table presents the frequency or count of each combination of the variables in rows and columns.
Each row in the table represents a level of one categorical variable, while each column represents a level of the other categorical variable. The cells in the table contain the number of observations that fall into each combination of the two variables. The table can also display percentages or proportions instead of counts, depending on the purpose of the analysis.
Contingency tables are commonly used in statistical analysis to examine the relationship between two or more categorical variables, such as gender and occupation or region of residence and political party affiliation. They can help identify patterns and associations between variables, and can be used to test hypotheses about the relationship between them. Chi-square tests and Fisher’s exact test are commonly used statistical tests for analyzing contingency tables.
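As a minimal sketch of the idea described above, the following builds a small contingency table with counts, row proportions, and margins. The data frame `toy` and its values are hypothetical and are not taken from the survey.

```r
# Hypothetical toy data (not from the survey)
toy <- data.frame(
  gender = c("Female", "Male", "Female", "Male", "Female", "Male"),
  genre  = c("Pop", "Rock", "Pop", "Pop", "Rock", "Rock")
)

# Two-way table of counts: rows are gender, columns are genre
tab <- table(toy$gender, toy$genre)
tab

# Row proportions: share of each genre within each gender
row_props <- prop.table(tab, margin = 1)
round(row_props, 2)

# Counts with row and column totals added
addmargins(tab)
```

Row proportions are often easier to compare across groups than raw counts when the groups differ in size.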
attach(my_survey)
Dealing with outliers and missing values is important in statistical analysis because they can affect the accuracy and validity of the results of the analysis. Outliers are data points that are significantly different from other data points in the dataset, while missing values are data points that are not available for certain variables or participants.
It is important to deal with missing values in a way that minimizes bias and preserves the integrity of the data. There are several ways to deal with missing values, including accepting, removing, or recreating the missing data. Accepting missing data involves leaving the missing cells blank or recoding all missing values with a label such as “N/A” so that they are consistent throughout the dataset. This is the most conservative option. Removing missing data involves deleting cases with missing data from the analysis, which can be done using listwise or pairwise deletion. However, this option can reduce the sample size and the statistical power of the analysis. Recreating missing data involves imputing the missing values with plausible estimates based on the observed data, using methods such as mean imputation, hot deck imputation, or regression imputation. However, imputation can introduce bias and distort the variability of the data, so the method should be chosen according to the type and amount of missing data.
In summary, dealing with outliers and missing values is important in statistical analysis to ensure the accuracy and validity of the results. Outliers can affect the central tendency and variability of the data, while missing values can bias the results and make them less generalizable. Various methods can be used to deal with outliers and missing values, depending on the type and amount of data missing, and it is important to choose the appropriate method that minimizes bias and preserves the integrity of the data.
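The options above can be sketched on a small example. The data frame `toy` and its `score` column are hypothetical (the survey itself has only categorical columns); the sketch counts missing values, applies listwise deletion, and applies mean imputation.

```r
# Hypothetical toy data with one missing value in each column
toy <- data.frame(
  race  = c("White", NA, "Asian", "Other"),
  score = c(4, 5, NA, 3)
)

# Missing values per column, as counts and as percentages
n_missing   <- colSums(is.na(toy))
pct_missing <- round(colMeans(is.na(toy)) * 100, 1)

# Listwise deletion: keep only rows with no missing values
complete <- na.omit(toy)

# Mean imputation for the numeric column (an assumption for illustration;
# it shrinks the variance and is not suitable for categorical columns)
imputed <- toy
imputed$score[is.na(imputed$score)] <- mean(toy$score, na.rm = TRUE)
```

Listwise deletion here drops half of the rows, which illustrates how quickly it can reduce the sample size.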
cat("Missing by count\n")
Missing by count
sapply(my_survey[,1:3], function(x) round(sum(is.na(x)),2))
RACE What.is.your.gender.Identity.
0 0
music.preference.
0
cat("Missing by percentage\n")
Missing by percentage
sapply(my_survey[,1:3], function(x) round(sum(is.na(x))/nrow(my_survey),2))
RACE What.is.your.gender.Identity.
0 0
music.preference.
0
Since there are no missing observations, we can proceed with our test.
dat <- my_survey[,c(2,3)] %>%
tbl_summary(by = music.preference.) %>%
add_p() %>%
add_overall() %>%
bold_labels()
dat
| Characteristic | Overall, N = 76 | Afrobeats, N = 2 | Arabic, N = 1 | City Pop, N = 1 | Classical, N = 1 | Classical, Country, N = 1 | Classical, Pop, N = 3 | Country, N = 1 | Country, Pop, Indie, N = 1 | Country, Pop, R&B/Soul, N = 1 | Country, Rap, N = 1 | Hip Hop, N = 2 | Hip Hop, Classical, Pop, R&B/Soul, N = 1 | Hip Hop, Country, Pop, N = 1 | Hip Hop, Country, Pop, R&B/Soul, Reggaeton, N = 1 | Hip Hop, Drill Music, N = 1 | Hip Hop, K-pop, N = 2 | Hip Hop, K-pop, Classical, Pop, N = 1 | Hip Hop, K-pop, Classical, Pop, R&B/Soul, N = 1 | Hip Hop, K-pop, Pop, R&B/Soul, N = 1 | Hip Hop, K-pop, Pop, R&B/Soul, c and k rnb, N = 1 | Hip Hop, K-pop, Pop, R&B/Soul, k and c r&b, N = 1 | Hip Hop, K-pop, R&B/Soul, N = 1 | Hip Hop, Pop, N = 2 | Hip Hop, Pop, Edm, N = 1 | Hip Hop, Pop, Latin music, N = 1 | Hip Hop, Pop, R&B/Soul, N = 1 | Hip Hop, Pop, R&B/Soul, Some edm, N = 1 | Hip Hop, R&B/Soul, N = 2 | Hip Hop, Rock Music, Classical, Country, Pop, R&B/Soul, N = 1 | Hip Hop, Rock Music, Classical, Pop, N = 1 | Hip Hop, Rock Music, Classical, Pop, R&B/Soul, Indie, N = 1 | Hip Hop, Rock Music, Country, Pop, R&B/Soul, N = 1 | Hip Hop, Rock Music, Folk rock, indie, N = 1 | Hip Hop, Rock Music, K-pop, Classical, Country, Pop, R&B/Soul, Honestly everything but EDM Punk metal fusion jazz grime trap everything, N = 1 | Hip Hop, Rock Music, K-pop, Country, Pop, R&B/Soul, N = 1 | Hip Hop, Rock Music, K-pop, Country, Pop, R&B/Soul, Jazz, N = 1 | Hip Hop, Rock Music, K-pop, Pop, R&B/Soul, N = 1 | Hip Hop, Rock Music, Pop, N = 2 | Hip Hop, Rock Music, Pop, R&B/Soul, N = 4 | Hip Hop, Rock Music, R&B/Soul, N = 2 | K-pop, Pop, N = 1 | K-pop, Pop, R&B/Soul, N = 1 | Pop, N = 3 | Pop, R&B/Soul, N = 1 | Pop, R&B/Soul, Indie, Alternative, Reggaaeton, N = 1 | Pop, R&B/Soul, Indie/Alternative/Latin, N = 1 | Pop, R&B/Soul, Spanish / Bollywood / sometimes Arabic, N = 1 | Pop, Rave/ Electropop, N = 1 | R&B/Soul, N = 1 | Rock Music, N = 2 | Rock Music, Classical, Pop, N = 2 | Rock Music, Country, N = 1 | Rock Music, K-pop, R&B/Soul, N = 1 | Rock Music, Metal, N = 1 | Rock Music, Metal, indie, N = 1 | Rock Music, Metal, punk, N = 1 | Rock Music, Pop, N = 2 | Rock Music, Pop, dance, edm, N = 1 | Rock Music, R&B/Soul, bossa nova, bachata classics, and habibi funk, N = 1 | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| What.is.your.gender.Identity. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Female | 31 (41%) | 2 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (33%) | 0 (0%) | 1 (100%) | 1 (100%) | 0 (0%) | 1 (50%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 1 (50%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 1 (100%) | 1 (50%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 2 (67%) | 1 (100%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (100%) | 1 (50%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (50%) | 1 (100%) | 1 (100%) | |
| Male | 44 (58%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (100%) | 1 (100%) | 2 (67%) | 1 (100%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (50%) | 1 (100%) | 1 (100%) | 0 (0%) | 1 (100%) | 1 (50%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (100%) | 2 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (100%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 1 (100%) | 1 (100%) | 0 (0%) | 1 (100%) | 0 (0%) | 1 (50%) | 4 (100%) | 2 (100%) | 0 (0%) | 0 (0%) | 1 (33%) | 0 (0%) | 1 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (50%) | 2 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (50%) | 0 (0%) | 0 (0%) | |
| Non-Binary | 1 (1.3%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| Values are n (%) |
dat2 <- my_survey[,c(1,3)] %>%
tbl_summary(by = music.preference.) %>%
add_p() %>%
add_overall() %>%
bold_labels()
dat2
| Characteristic | Overall, N = 76 | Afrobeats, N = 2 | Arabic, N = 1 | City Pop, N = 1 | Classical, N = 1 | Classical, Country, N = 1 | Classical, Pop, N = 3 | Country, N = 1 | Country, Pop, Indie, N = 1 | Country, Pop, R&B/Soul, N = 1 | Country, Rap, N = 1 | Hip Hop, N = 2 | Hip Hop, Classical, Pop, R&B/Soul, N = 1 | Hip Hop, Country, Pop, N = 1 | Hip Hop, Country, Pop, R&B/Soul, Reggaeton, N = 1 | Hip Hop, Drill Music, N = 1 | Hip Hop, K-pop, N = 2 | Hip Hop, K-pop, Classical, Pop, N = 1 | Hip Hop, K-pop, Classical, Pop, R&B/Soul, N = 1 | Hip Hop, K-pop, Pop, R&B/Soul, N = 1 | Hip Hop, K-pop, Pop, R&B/Soul, c and k rnb, N = 1 | Hip Hop, K-pop, Pop, R&B/Soul, k and c r&b, N = 1 | Hip Hop, K-pop, R&B/Soul, N = 1 | Hip Hop, Pop, N = 2 | Hip Hop, Pop, Edm, N = 1 | Hip Hop, Pop, Latin music, N = 1 | Hip Hop, Pop, R&B/Soul, N = 1 | Hip Hop, Pop, R&B/Soul, Some edm, N = 1 | Hip Hop, R&B/Soul, N = 2 | Hip Hop, Rock Music, Classical, Country, Pop, R&B/Soul, N = 1 | Hip Hop, Rock Music, Classical, Pop, N = 1 | Hip Hop, Rock Music, Classical, Pop, R&B/Soul, Indie, N = 1 | Hip Hop, Rock Music, Country, Pop, R&B/Soul, N = 1 | Hip Hop, Rock Music, Folk rock, indie, N = 1 | Hip Hop, Rock Music, K-pop, Classical, Country, Pop, R&B/Soul, Honestly everything but EDM Punk metal fusion jazz grime trap everything, N = 1 | Hip Hop, Rock Music, K-pop, Country, Pop, R&B/Soul, N = 1 | Hip Hop, Rock Music, K-pop, Country, Pop, R&B/Soul, Jazz, N = 1 | Hip Hop, Rock Music, K-pop, Pop, R&B/Soul, N = 1 | Hip Hop, Rock Music, Pop, N = 2 | Hip Hop, Rock Music, Pop, R&B/Soul, N = 4 | Hip Hop, Rock Music, R&B/Soul, N = 2 | K-pop, Pop, N = 1 | K-pop, Pop, R&B/Soul, N = 1 | Pop, N = 3 | Pop, R&B/Soul, N = 1 | Pop, R&B/Soul, Indie, Alternative, Reggaaeton, N = 1 | Pop, R&B/Soul, Indie/Alternative/Latin, N = 1 | Pop, R&B/Soul, Spanish / Bollywood / sometimes Arabic, N = 1 | Pop, Rave/ Electropop, N = 1 | R&B/Soul, N = 1 | Rock Music, N = 2 | Rock Music, Classical, Pop, N = 2 | Rock Music, Country, N = 1 | Rock Music, K-pop, R&B/Soul, N = 1 | Rock Music, Metal, N = 1 | Rock Music, Metal, indie, N = 1 | Rock Music, Metal, punk, N = 1 | Rock Music, Pop, N = 2 | Rock Music, Pop, dance, edm, N = 1 | Rock Music, R&B/Soul, bossa nova, bachata classics, and habibi funk, N = 1 | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RACE | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Asian | 24 (32%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 1 (100%) | 1 (33%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (50%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (100%) | 1 (100%) | 1 (100%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 2 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (100%) | 1 (25%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 1 (50%) | 1 (50%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| Black or African American | 9 (12%) | 2 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (33%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (50%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | |
| Hispanic or Latino | 9 (12%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (33%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| Other | 3 (3.9%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (25%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | |
| White | 31 (41%) | 0 (0%) | 1 (100%) | 0 (0%) | 1 (100%) | 0 (0%) | 1 (33%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (100%) | 1 (100%) | 1 (100%) | 0 (0%) | 1 (100%) | 0 (0%) | 1 (100%) | 0 (0%) | 2 (50%) | 2 (100%) | 0 (0%) | 1 (100%) | 2 (67%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (50%) | 1 (50%) | 1 (100%) | 1 (100%) | 1 (100%) | 1 (100%) | 1 (100%) | 2 (100%) | 1 (100%) | 0 (0%) | |
| Values are n (%) |
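The null hypothesis is that the two variables (for example, race and music preference) are independent, meaning that the distribution of one variable is the same across the categories of the other. The alternative hypothesis, on the other hand, is that there is an association between the two variables, meaning that the distribution of one variable depends on the other variable.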
Note that the alternative hypothesis does not specify the direction of the association (i.e., whether certain races are more or less likely to prefer certain types of music). It simply states that there is some type of association present. The direction of the association can be explored further through post-hoc analyses or additional tests.
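Written formally, the two hypotheses of the test of independence can be stated as follows. The symbols R (race category) and M (music preference category) are notation introduced here for illustration; they do not appear elsewhere in this report.

```latex
\[
H_0:\; P(R = i,\, M = j) = P(R = i)\,P(M = j) \quad \text{for all } i, j
\]
\[
H_1:\; P(R = i,\, M = j) \neq P(R = i)\,P(M = j) \quad \text{for at least one pair } (i, j)
\]
```

The same pair of hypotheses applies to gender and music preference, with gender taking the place of R.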
### Create a two-way table
my_table<-table(RACE, music.preference.)
### Run the chi-square test
result <- chisq.test(my_table)
result
Pearson's Chi-squared test
data: my_table
X-squared = 252.11, df = 232, p-value = 0.174
The Pearson’s Chi-squared test is a statistical test that is used to determine if there is a significant association between two categorical variables. It works by comparing the observed frequencies in a contingency table to the expected frequencies under the null hypothesis of independence between the two variables.
The output above shows the results of a Pearson’s Chi-squared test conducted on the contingency table my_table (race by music preference). The test statistic is X-squared = 252.11, which measures the discrepancy between the observed frequencies and the frequencies expected under the null hypothesis.
The degrees of freedom (df) is 232, which is calculated as (number of rows - 1) * (number of columns - 1). The p-value is 0.174, which is the probability of obtaining a test statistic as extreme or more extreme than the observed value under the null hypothesis.
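To make the calculation concrete, here is a small worked sketch that computes the statistic, degrees of freedom, and p-value by hand and compares them with `chisq.test()`. The 2 × 2 matrix `obs` is hypothetical toy data; only the final df line uses dimensions reported in this document (5 race categories and 59 music preference categories).

```r
# Hypothetical observed counts (not from the survey)
obs <- matrix(c(20, 10,
                10, 20), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"), choice = c("X", "Y")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((obs - expected)^2 / expected)

# Degrees of freedom and upper-tail p-value
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Same test with the built-in function (no continuity correction,
# so that it matches the hand calculation for a 2 x 2 table)
builtin <- chisq.test(obs, correct = FALSE)

# Degrees of freedom for the survey's race by music preference table
(5 - 1) * (59 - 1)
```

The last line gives 232, the df reported in the test output above.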
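Since the p-value (0.174) is greater than the commonly used significance level of 0.05, we fail to reject the null hypothesis: there is not enough evidence to suggest an association between race and music preference. One caution applies. With 76 respondents spread over 59 music preference categories, every expected cell count is well below 5, so the chi-square approximation behind this p-value is unreliable.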
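When a table has many sparsely filled cells, it can help to inspect the expected counts and to use a p-value that does not rely on the chi-square approximation. The sketch below uses a hypothetical 3 × 3 matrix `obs`, not the survey table, and shows two base R options: a Monte Carlo p-value from `chisq.test()` and Fisher's exact test.

```r
set.seed(1)  # the simulated p-value depends on the random seed

# Hypothetical sparse table (not from the survey)
obs <- matrix(c(3, 1, 0,
                1, 2, 1,
                0, 1, 3), nrow = 3, byrow = TRUE)

# Expected counts under independence; the warning about the
# approximation is suppressed because we only want the counts
expected <- suppressWarnings(chisq.test(obs))$expected

# Share of cells with an expected count below 5
share_small <- mean(expected < 5)

# Monte Carlo p-value based on 2000 simulated tables
sim <- chisq.test(obs, simulate.p.value = TRUE, B = 2000)

# Fisher's exact test as an alternative for small tables
exact <- fisher.test(obs)
```

For a table as large as 5 × 59, Fisher's exact test may be too slow or memory-hungry, so the simulated p-value is usually the more practical of the two.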
The myth behind race and music preference is the idea that people’s musical tastes are determined by their race or ethnicity. This myth suggests that certain genres of music are inherently associated with certain racial or ethnic groups, and that people of different races or ethnicities are less likely to enjoy or appreciate music that is traditionally associated with another group.
However, there is no evidence to support the idea that race or ethnicity has any intrinsic relationship with musical taste. Research has shown that people’s musical preferences are shaped by a complex array of factors, including personal experiences, cultural background, family influences, and individual differences in personality and cognitive style.
While it is true that certain genres of music may have stronger associations with certain cultural or ethnic groups, these associations are largely a result of historical and social factors, rather than any inherent relationship between race and music preference. In reality, people of all races and ethnicities are capable of enjoying a wide variety of musical genres and styles, and musical tastes can vary widely even within the same cultural or ethnic group.
Effect size is an important concept in statistical analysis, as it helps to quantify the magnitude or strength of a relationship between two variables. In the context of a chi-square test of independence, effect size can help to assess the practical significance of the observed association between the two categorical variables, beyond simply whether or not the association is statistically significant.
Two common effect size measures for a chi-square test of independence are Cohen’s w and Cramer’s V, which quantify the strength of association between two nominal variables. Cohen’s w is the square root of the chi-square statistic divided by the sample size. Cramer’s V is the square root of the chi-square statistic divided by the product of the sample size and k, where k is one less than the smaller of the number of rows and the number of columns in the contingency table.
Cramer’s V ranges from 0 to 1, with higher values indicating a stronger association between the two variables; Cohen’s w can exceed 1 in tables larger than 2 × 2. For Cohen’s w, values of 0.1, 0.3, and 0.5 are conventionally considered small, medium, and large effect sizes, respectively.
In addition to helping to quantify the strength of association between two categorical variables, effect size can also be useful for comparing the results of different studies, as it allows researchers to compare the magnitude of the observed associations across studies, even when the sample sizes and statistical significance levels differ.
Overall, effect size is an important consideration in the interpretation of the results of a chi-square test of independence, as it provides additional information beyond simply whether or not the observed association is statistically significant.
# calculate Cramer's V (Cohen's w divided by sqrt(k))
n <- sum(my_table)
k <- min(dim(my_table)) - 1
X2 <- chisq.test(my_table)$statistic
w <- sqrt(X2 / (n * k))
w
X-squared
0.9106648
The quantity computed above divides the chi-square statistic by n × k, so it is Cramer’s V rather than Cohen’s w (Cohen’s w divides by n only). Cramer’s V is a measure of association between two nominal variables in a contingency table, and it ranges from 0 to 1. A value of 0 indicates no association between the variables, while a value of 1 indicates a perfect association.
In this case, Cramer’s V is 0.9106648, which is nominally very high. It should not be read as evidence of a strong association, however. The chi-square test above was not significant, and Cramer’s V is biased upward in sparse tables like this one, where most of the 59 music preference categories contain a single respondent.
It’s also important to note that even a genuinely high value of Cramer’s V would not imply causality. Further research would be needed to establish causality and to explore the underlying mechanisms that might drive an association.
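The difference between the two effect size measures can be sketched with the statistic reported for the race table (X-squared = 252.11, n = 76, 5 race categories, 59 music preference categories). The helper name `effect_sizes` is hypothetical and introduced only for this illustration.

```r
# Cohen's w and Cramer's V from a chi-square statistic.
# effect_sizes() is a hypothetical helper, not part of any package.
effect_sizes <- function(x2, n, n_rows, n_cols) {
  k <- min(n_rows, n_cols) - 1   # one less than the smaller table dimension
  w <- sqrt(x2 / n)              # Cohen's w
  v <- sqrt(x2 / (n * k))        # Cramer's V, equal to w / sqrt(k)
  c(cohens_w = w, cramers_v = v)
}

# Values taken from the chi-square output for the race table
es <- effect_sizes(x2 = 252.11, n = 76, n_rows = 5, n_cols = 59)
round(es, 4)
```

With k = 4 here, Cohen's w is exactly twice Cramer's V, which is why the two measures should not be used interchangeably.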
The power of a statistical test is the probability of correctly rejecting the null hypothesis when it is false. In other words, it is the probability of detecting a true effect or difference between groups when one actually exists.
The power of a test is affected by several factors, including the sample size, the effect size, the level of significance, and the variability of the data. The power of a test increases as the sample size increases, as the effect size increases, and as the level of significance increases (for example, from 0.01 to 0.05). Conversely, the power of a test decreases as the variability of the data increases.
The power of a test is an important consideration when designing a study or experiment, as it determines the ability of the study to detect an effect if one exists. A higher power means that the study is more likely to detect a true effect, while a lower power means that the study is less likely to detect an effect even if one exists.
The power of a test can also be used to determine the sample size needed for a study to achieve a desired level of power. This can be helpful in designing studies that are both efficient and effective, as it ensures that the study is adequately powered to detect an effect if one exists.
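The dependence of power on sample size and significance level can be illustrated with base R alone. The helper `chisq_power` below is a hypothetical name; it computes power from the noncentral chi-square distribution with noncentrality parameter n × w², which is the standard formulation for this test. The inputs (w = 0.3, df = 4) are illustrative values, not survey results.

```r
# Power of a chi-square test for effect size w (Cohen's w),
# sample size n, degrees of freedom df, and significance level alpha.
# chisq_power() is a hypothetical helper written for this sketch.
chisq_power <- function(w, n, df, alpha = 0.05) {
  crit <- qchisq(alpha, df = df, lower.tail = FALSE)        # critical value
  pchisq(crit, df = df, ncp = n * w^2, lower.tail = FALSE)  # P(reject | effect w)
}

# Illustrative inputs: medium effect, 4 degrees of freedom
p_n100    <- chisq_power(w = 0.3, n = 100, df = 4)
p_n200    <- chisq_power(w = 0.3, n = 200, df = 4)                # larger sample
p_alpha01 <- chisq_power(w = 0.3, n = 100, df = 4, alpha = 0.01)  # stricter alpha

c(n100 = p_n100, n200 = p_n200, alpha01 = p_alpha01)
```

Doubling the sample size raises the power, while tightening alpha from 0.05 to 0.01 lowers it.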
library(pwr)
# set the parameters
n <- 76 # sample size
alpha <- 0.05 # significance level
effect_size <- 0.9106648 # effect size computed above (Cramer's V), passed as w
# calculate power
pwr.chisq.test(w = effect_size, N = n, df = 232, sig.level = alpha, power = NULL)
Chi squared power calculation
w = 0.9106648
N = 76
df = 232
sig.level = 0.05
power = 0.8391083
NOTE: N is the number of observations
The results above indicate that a chi-squared test of independence with a sample size of 76, an effect size of 0.9106648, and a significance level of 0.05 would have a power of 0.8391083 (83.91%) to detect an association between the two variables. Because the effect size is the one observed in this same sample, this is a post hoc power estimate and should be interpreted cautiously.
my_table2 <-table(What.is.your.gender.Identity.,music.preference.)
results <-chisq.test(my_table2)
results
Pearson's Chi-squared test
data: my_table2
X-squared = 135.98, df = 116, p-value = 0.09914
The Pearson’s chi-squared test was conducted to assess the association between two categorical variables, gender and music preference. The test yielded a chi-squared statistic of 135.98 with 116 degrees of freedom, resulting in a p-value of 0.09914.
Based on the p-value, we fail to reject the null hypothesis that there is no association between gender and music preference at the 0.05 significance level. This means that the observed association between the two variables is not statistically significant.
However, the p-value is below 0.10, so the data do not rule out a modest association that this sample is too small to detect. An effect size measure (e.g., Cramer’s V) can help gauge the practical significance of the observed association.
Overall, based on the results of the Pearson’s chi-squared test, we can conclude that there is no statistically significant association between gender and music preference. However, it is important to consider the limitations of the test and the potential for other analyses to provide additional insight into the relationship between these variables.
# calculate Cramer's V (Cohen's w divided by sqrt(k))
n <- sum(my_table2)
k <- min(dim(my_table2)) - 1
X2 <- chisq.test(my_table2)$statistic
w <- sqrt(X2 / (n * k))
w
X-squared
0.945839
The value computed above is Cramer’s V, since the chi-square statistic is divided by n × k. A Cramer’s V of 0.945839 is nominally very high, but as with race it is inflated by the sparse table: most music preference categories contain only one respondent. Given the non-significant test, it should not be taken as evidence of a strong or practically significant relationship between gender and music preference.
# set the parameters
n <- 76 # sample size
alpha <- 0.05 # significance level
effect_size <- 0.945839 # effect size computed above (Cramer's V), passed as w
# calculate power
pwr.chisq.test(w = effect_size, N = n, df = 232, sig.level = alpha, power = NULL)
Chi squared power calculation
w = 0.945839
N = 76
df = 232
sig.level = 0.05
power = 0.8793632
NOTE: N is the number of observations
Based on the results above, the power of the chi-squared test of independence is calculated to be 0.8793632. Power is the probability of correctly rejecting the null hypothesis when it is false, here at the 0.05 significance level. Two caveats apply. First, the call uses df = 232, which belongs to the race table; the gender table has df = 116, so the figure should be recomputed with the correct degrees of freedom. Second, the effect size of 0.945839 is the value observed in this sparse table, which overstates the strength of the relationship.
Overall, these results should not be read as showing that the relationship between gender and music preference is practically significant; the chi-square test itself did not detect a significant association.
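The power calculation for the gender table can be checked against its own degrees of freedom. The chi-square output for my_table2 reports df = 116, whereas the call above passes df = 232. The following is a rough sketch in base R; `chisq_power` is a hypothetical helper that uses the noncentral chi-square formulation, and the effect size is the value from the report taken as given.

```r
# Power of a chi-square test from the noncentral chi-square distribution.
# chisq_power() is a hypothetical helper written for this sketch.
chisq_power <- function(w, n, df, alpha = 0.05) {
  crit <- qchisq(alpha, df = df, lower.tail = FALSE)
  pchisq(crit, df = df, ncp = n * w^2, lower.tail = FALSE)
}

# As computed in the report: df taken from the race table
power_df232 <- chisq_power(w = 0.945839, n = 76, df = 232)

# With the degrees of freedom of the gender table
power_df116 <- chisq_power(w = 0.945839, n = 76, df = 116)

c(df232 = power_df232, df116 = power_df116)
```

For a fixed effect size and sample size, fewer degrees of freedom give higher power, so the two figures differ. Either way, the result remains a post hoc estimate based on the observed effect size.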
library(ggplot2)
# Stacked bar chart of gender, filled by race, with the total count above each bar
ggplot(my_survey, aes(x = What.is.your.gender.Identity.)) +
  geom_bar(aes(fill = RACE),
           position = position_stack(reverse = TRUE)) +
  geom_text(aes(label = after_stat(count)), stat = "count",
            color = "green", size = 3, nudge_y = 8, nudge_x = 0) + # size was given twice
  theme_minimal()
library(ggplot2)
# Stacked bar chart of music preference, filled by gender
ggplot(my_survey, aes(x = music.preference.)) +
  geom_bar(aes(fill = What.is.your.gender.Identity.),
           position = position_stack(reverse = FALSE)) +
  geom_text(aes(label = after_stat(count)), stat = "count",
            color = "green", size = 3, nudge_y = 8, nudge_x = 0) + # size was given twice
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0))
The distribution of gender across the music preference categories shows no clear pattern. No gender is visibly concentrated in particular categories, although most categories contain only one or two respondents. This is consistent with the chi-square test above, which failed to reject the independence of gender and music preference.
library(ggplot2)
# Stacked bar chart of music preference, filled by race
ggplot(my_survey, aes(x = music.preference.)) +
  geom_bar(aes(fill = RACE),
           position = position_stack(reverse = FALSE)) +
  geom_text(aes(label = after_stat(count)), stat = "count",
            color = "green", size = 3, nudge_y = 8, nudge_x = 0) + # size was given twice
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0))
Similarly, the distribution of race across the music preference categories shows no clear pattern. No race is visibly concentrated in particular categories, although most categories contain only one or two respondents. This is consistent with the chi-square test above, which failed to reject the independence of race and music preference.