The populations of interest are residents of varying ages within the provinces of China and Mexico.
The researchers obtained a sample of each population rather than surveying everyone. We can tell because the researchers state that they selected representative provinces. A simple random sample (SRS) is, by definition, a subset of the population chosen so that every possible group of members of that size has an equal chance of being selected, which is intended to represent the population in an unbiased manner. Here, the team selected provinces within China and Mexico, and subsets of the population within them were picked at random, so in theory they represent the larger group. (Strictly speaking, choosing provinces first and then sampling people within them is a multistage design rather than a pure SRS.)
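The difference between a simple random sample and a design that first picks provinces can be sketched in R with simulated data. Everything here is hypothetical: the `population` data frame, its ten provinces of 1,000 residents each, and the sample sizes are invented for illustration and are not the study's actual design.

```r
set.seed(1)
# Hypothetical population: 10 provinces with 1,000 residents each
population <- data.frame(
  province = rep(paste0("P", 1:10), each = 1000),
  id = 1:10000
)

# Simple random sample: every set of 200 residents is equally likely
srs <- population[sample(nrow(population), 200), ]

# Two-stage sample: pick 3 provinces at random, then 200 residents within them
chosen <- sample(unique(population$province), 3)
in_chosen <- population[population$province %in% chosen, ]
two_stage <- in_chosen[sample(nrow(in_chosen), 200), ]

table(srs$province)        # respondents can come from any province
table(two_stage$province)  # respondents come only from the 3 chosen provinces
```

In the two-stage sample, residents of unchosen provinces have zero chance of selection in that draw, which is why it is not an SRS even though each stage is random.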
A potential bias is sampling bias. This would occur if the sample does not accurately represent the population it is meant to describe, for example if the selected provinces differ systematically from the rest of the country.
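A small simulation shows how sampling bias distorts an estimate. This is a sketch under made-up assumptions: the urban/rural split and the response probabilities below are invented, and only illustrate what happens when a sample is drawn from an unrepresentative part of the population.

```r
set.seed(2)
# Hypothetical population: efficacy scores (1-5) differ by region
population <- data.frame(
  region = rep(c("urban", "rural"), times = c(3000, 7000)),
  efficacy = c(sample(1:5, 3000, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.2, 0.2)),
               sample(1:5, 7000, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.05, 0.05)))
)

true_mean <- mean(population$efficacy)

# Unbiased: simple random sample drawn from everyone
srs_mean <- mean(sample(population$efficacy, 500))

# Biased: sample drawn only from urban residents
urban <- population$efficacy[population$region == "urban"]
biased_mean <- mean(sample(urban, 500))

round(c(population = true_mean, srs = srs_mean, urban_only = biased_mean), 2)
```

Increasing the biased sample size does not help: it only estimates the urban mean more precisely.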
This quick comparison implies that Chinese respondents report more
political efficacy than Mexican respondents (mean self-assessment of
about 2.62 versus 1.83). This is unexpected, because Mexican nationals
can vote and have a say in removing the ruling party, whereas Chinese
nationals do not have access to fair elections. One would expect the
numbers to be reversed.
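The country means, and whether a gap between them is larger than sampling noise, can be checked with base R alone. A minimal sketch on a small invented data set: only the column names `self` and `china` mirror the survey, and the Welch `t.test()` is a swapped-in base-R check, not the `meanDiff()` call used in this analysis.

```r
# Hypothetical toy data: self-assessment scores (1-5) coded like the survey,
# with china = 1 for China and china = 0 for Mexico
toy <- data.frame(
  self  = c(1, 2, 1, 1, 3, 2, 1, 2,   3, 2, 4, 3, 2, 3, 2, 3),
  china = rep(c(0, 1), each = 8)
)

# Mean self-assessment per country ("0" = Mexico, "1" = China)
group_means <- tapply(toy$self, toy$china, mean)
group_means

# Welch two-sample t-test for the difference in means
welch <- t.test(self ~ china, data = toy)
welch$p.value
```

A small p-value says the gap is unlikely to be chance alone; it does not say the two countries interpret the scale the same way.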
Based on these distributions, responses in Mexico are positively skewed (skewness 1.51): they cluster in the lowest categories, with a tail toward the higher ones. Responses in China are closer to symmetric (skewness 0.30). This suggests that the Chinese sample may better represent its population than the Mexican sample, which aligns with my expectation that the data are not entirely representative.
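To see what the skewness numbers measure, here is a sketch of the moment-based sample skewness, g1 = m3 / m2^(3/2), applied to two invented sets of Likert responses. The helper `sample_skewness()` is hypothetical; some packages (for example `e1071::skewness()` with its default `type`) apply a small-sample adjustment, so their values can differ slightly from this formula.

```r
# Moment-based sample skewness: g1 = m3 / m2^(3/2),
# where m2 and m3 are the 2nd and 3rd central moments
sample_skewness <- function(x) {
  d <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3/2)
}

# Hypothetical Likert responses (1 = "None" ... 5 = "Unlimited")
piled_low <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 5)  # most answers at the bottom
balanced  <- c(1, 2, 2, 3, 3, 3, 3, 4, 4, 5)  # symmetric around 3

sample_skewness(piled_low)  # positive: long right tail
sample_skewness(balanced)   # zero: symmetric
```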
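library(dplyr)         # filter() and the %>% pipe
library(RColorBrewer)  # brewer.pal() colour palettes
# Split respondents by country (china == 1 for China, 0 for Mexico)
china = data %>% filter(china == 1)
mexico = data %>% filter(china == 0)
# Difference in mean self-assessment between the two countries
# (meanDiff() is not base R; it is assumed to be loaded from an add-on package)
data$china = as.factor(data$china)
q2 = meanDiff(data$self ~ data$china)
# Mean self-assessment per country, computed from the data rather than typed in
q3 = data.frame(Country = c("Mexico", "China"),
                `Mean Self Response` = c(mean(mexico$self), mean(china$self)),
                check.names = FALSE)
q3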
# Five-colour qualitative palette, one colour per response category
coul = brewer.pal(5, "Set2")
# Share of Chinese respondents choosing each self-assessment category
barplot(prop.table(table(china$self)),
        names.arg = c("None", "A little", "Some", "A lot", "Unlimited"),
        xlab = "Self-Reported Political Efficacy",
        ylab = "Proportion of Respondents",
        main = "Self-Assessment Responses for Chinese Citizens", col = coul)
# Same plot for Mexican respondents
barplot(prop.table(table(mexico$self)),
        names.arg = c("None", "A little", "Some", "A lot", "Unlimited"),
        xlab = "Self-Reported Political Efficacy",
        ylab = "Proportion of Respondents",
        main = "Self-Assessment Responses for Mexican Citizens", col = coul)
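# Skewness of the self-assessment responses in each country
# (skewness() is not base R; it comes from an add-on package such as e1071 or moments)
skew.c = skewness(china$self)
skew.m = skewness(mexico$self)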
In China, the Pearson correlation between age and self-assessment is 0.186 (p = 0.00167), a weak but statistically significant positive association: older respondents tend to rate their own political efficacy slightly higher. In Mexico, the correlation is -0.0274 (p = 0.542), so there is no evidence of a linear relationship between age and self-assessment.
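The correlation and p-value reported by `cor.test()` can be reproduced by hand, which makes clear what they measure. A sketch on eight invented respondents (the `age` and `self` values are made up): r is the covariance scaled by both standard deviations, and the test uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.

```r
# Hypothetical ages and self-assessment scores for 8 respondents
age  <- c(22, 35, 47, 51, 29, 63, 40, 58)
self <- c(1, 2, 3, 2, 1, 4, 2, 3)

# Pearson r: covariance divided by the product of the standard deviations
r <- sum((age - mean(age)) * (self - mean(self))) /
  sqrt(sum((age - mean(age))^2) * sum((self - mean(self))^2))

# Two-sided test of r = 0 using the t distribution with n - 2 df
n <- length(age)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Compare with the built-in test
ct <- cor.test(age, self, method = "pearson")
c(manual_r = r, builtin_r = unname(ct$estimate),
  manual_p = p_value, builtin_p = ct$p.value)
```

With a large sample, even a weak r such as 0.186 can yield a small p-value, because the t statistic grows with sqrt(n - 2).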
# Age distribution of Chinese respondents (density scale), with the mean marked
hist(china$age, freq = FALSE, ylim = c(0, 0.04), xlab = "Age",
     main = "Age Distribution: China", col = "#12b8c7")
abline(v = mean(china$age), col = "blue", lwd = 3)  # vertical line at the mean
# Label the mean to the right of the line; y is on the density scale (inside ylim)
text(x = mean(china$age), y = 0.038, pos = 4,
     labels = paste("Mean =", round(mean(china$age), 1)),
     col = "blue", cex = 1.2)
# Same plot for Mexican respondents
hist(mexico$age, freq = FALSE, ylim = c(0, 0.04), xlab = "Age",
     main = "Age Distribution: Mexico", col = "#8ff2ba")
abline(v = mean(mexico$age), col = "blue", lwd = 3)
text(x = mean(mexico$age), y = 0.038, pos = 4,
     labels = paste("Mean =", round(mean(mexico$age), 1)),
     col = "blue", cex = 1.2)
## China: Pearson correlation between age and self-assessment
cor.c = cor.test(china$age, china$self, method = "pearson")
## Mexico
cor.m = cor.test(mexico$age, mexico$self, method = "pearson")
The data show that about 56 percent of Chinese respondents rate their own political efficacy lower than they rate Moses, the vignette character who lacks clean drinking water, cannot vote, and has no representation in his government. By comparison, only about 29 percent of Mexican respondents rate themselves below Moses. This shows that the same Likert-scale response carries different weight in the two countries.
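These percentages come from comparing each respondent's self-rating with their own rating of Moses. In R, the comparison yields one TRUE/FALSE per respondent, and `mean()` of a logical vector is the share of TRUE values. A sketch on six invented respondents (the `self` and `moses` values are made up), which also shows that ties are not counted as "lower":

```r
# Hypothetical self and Moses ratings for six respondents (1-5 scale)
self  <- c(1, 2, 3, 1, 2, 4)
moses <- c(2, 2, 2, 3, 1, 3)

# TRUE where a respondent rates themselves strictly below Moses
lower <- self < moses
lower
mean(lower)          # proportion strictly below Moses
mean(self == moses)  # ties, which are not counted as "lower"
```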
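# Share of respondents in each country who rate themselves strictly below
# their own rating of the Moses vignette
q4 = data.frame(Country = c("China", "Mexico"),
                Proportion = c(mean(china$self < china$moses), mean(mexico$self < mexico$moses)))
colnames(q4) = c("Country", "Proportion of Respondents With Lower Self-Assessment Than Moses Assessment")
q4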
These results show why anchoring vignettes are so important in survey
research: the same nominal answer carries different weight, and is
interpreted differently, across populations. Re-scoring each respondent's
self-assessment against their own ratings of the vignettes gives a more
comparable representation of the rankings.
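The re-scoring rule used for `china.2` and `mexico.2` can be written as a small function and tried on two respondents who give the same raw answer. The helper `relative_rank()` and the data frame `resp` are hypothetical; the rule assumes the vignettes are ordered Moses <= Jane <= Alison, as the subsetting step requires.

```r
# Hypothetical helper: place a self-rating relative to the respondent's own
# vignette ratings, assuming moses <= jane <= alison.
# 1 = below Moses, 2 = at/above Moses but below Jane,
# 3 = at/above Jane but below Alison, 4 = at/above Alison
relative_rank <- function(self, moses, jane, alison) {
  1 + (self >= moses) + (self >= jane) + (self >= alison)
}

# Two respondents who both answer "2" for themselves
# but use the scale differently when rating the vignettes
resp <- data.frame(self = c(2, 2), moses = c(1, 3), jane = c(2, 4), alison = c(4, 5))
resp$rank <- relative_rank(resp$self, resp$moses, resp$jane, resp$alison)
resp
```

The first respondent lands at rank 3 and the second at rank 1, even though their raw answers are identical.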
# Keep only respondents who order the vignettes consistently (Alison >= Jane >= Moses);
# inconsistent orderings cannot be placed on a single relative scale
china.2 = subset(china, alison >= jane & jane >= moses)
mexico.2 = subset(mexico, alison >= jane & jane >= moses)
# Rank each self-assessment against the respondent's own vignette ratings:
# 1 = below Moses, 2 = at/above Moses but below Jane,
# 3 = at/above Jane but below Alison, 4 = at/above Alison
china.2$self_rank = 1 +
  (china.2$self >= china.2$moses) +
  (china.2$self >= china.2$jane) +
  (china.2$self >= china.2$alison)
mexico.2$self_rank = 1 +
  (mexico.2$self >= mexico.2$moses) +
  (mexico.2$self >= mexico.2$jane) +
  (mexico.2$self >= mexico.2$alison)
mexico.2$country = "Mexico"
china.2$country = "China"
data2 = rbind(mexico.2, china.2)
# Treat the rank and the country as discrete categories for plotting
data2$self_rank = factor(data2$self_rank, levels = 1:4)
data2$country = as.factor(data2$country)
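library(ggplot2)
# Proportion of each country's respondents in each self rank (group = country
# makes prop a within-country share); bars for the two countries are dodged
ggplot(data2, aes(x = self_rank, group = country)) +
  geom_bar(aes(y = after_stat(prop), fill = country), stat = "count",
           position = position_dodge(width = 0.9)) +
  # Percentage labels, dodged the same way so each sits above its own bar
  geom_text(aes(y = after_stat(prop), label = scales::percent(after_stat(prop))),
            stat = "count", position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_fill_manual(values = c("#540b06", "#d9790b")) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Self Rank", y = "Percent", fill = "Country",
       title = "Self Rank Against Vignettes",
       subtitle = "Mexico + China Residents") +
  theme(axis.text.x = element_text(vjust = -0.2),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))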
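The bar heights in this plot are within-country shares, so each country's bars sum to 100 percent. The same numbers can be checked without ggplot2 using row proportions of a country-by-rank table. A sketch on invented data (the `toy` data frame is hypothetical; only the `country` and `self_rank` column names mirror `data2`):

```r
# Hypothetical toy data with a country label and a 1-4 self rank
toy <- data.frame(
  country   = c("China", "China", "China", "China", "Mexico", "Mexico", "Mexico", "Mexico"),
  self_rank = factor(c(1, 1, 2, 4, 3, 4, 4, 4), levels = 1:4)
)

# Row proportions: within each country, the share of respondents in each rank
shares <- prop.table(table(toy$country, toy$self_rank), margin = 1)
round(shares, 2)
```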