Introduction:

Should the government invest large sums of money in space exploration programs? On the one hand, space exploration may yield significant findings that fundamentally change our way of life in the long run. Think about living on another planet or finding other intelligent beings in the universe. Isn’t it exciting? On the other hand, decisions to invest in space exploration are often based on scientific theory that, in most circumstances, lacks practical evidence. Someday we may find that extraterrestrial intelligence and life on another planet were just a beautiful dream.

People’s faith in science may be associated with their attitude toward investment in space exploration. In the current study, we investigate whether people with low confidence in the scientific community hold a more negative attitude toward spending on the space exploration program.

Data:

The General Social Survey (GSS) collects data on the demographic characteristics and attitudes of residents of the United States. The data set is a cumulative file of surveys conducted between 1972 and 2012, and not all respondents answered all questions in all years. GSS questions cover a diverse range of issues including national spending priorities, marijuana use, crime and punishment, race relations, quality of life, confidence in institutions, and sexual behavior. Cases (units of observation) are individual persons (i.e., respondents). More information about the GSS data can be found in the References.

Two categorical variables are selected from the GSS to investigate the research question “Do people with low confidence in the scientific community hold a more negative attitude toward the space exploration program?” The first variable is natspac, which records respondents’ attitude toward spending on the space exploration program. The second variable is consci, which records respondents’ confidence in the scientific community.

The data cannot be used to establish causal links between the variables of interest. This is an observational study because the data were collected in a way that did not directly interfere with how the data arise. Therefore, we can only establish an association. To establish causal links, we would need to randomly assign participants to treatments. To arrive at my conclusion, I’ll compare the proportion of negative attitudes among people with low confidence in the scientific community with the proportion among people with high confidence.

The population of interest is residents of the United States. As the data come from a random sample of US residents, the findings from this analysis can be generalized to all US residents. However, there are still several potential sources of bias: (1) the data set is a cumulative file of surveys conducted between 1972 and 2012, and not all respondents answered all questions in all years; (2) the sample may over-represent people who volunteer to respond because they have strong opinions on some issues; (3) the sample may over-represent people who are easily accessible.

Exploratory data analysis:

Although the data were collected between 1972 and 2012, I only use the 2012 responses for the two categorical variables so that the analysis reflects the most recent attitudes available.

data <- gss[gss$year == '2012',c("natspac","consci")]
head(data)
##           natspac       consci
## 55088        <NA>         <NA>
## 55089        <NA> A Great Deal
## 55090 About Right A Great Deal
## 55091        <NA>         <NA>
## 55092        <NA> A Great Deal
## 55093 About Right         <NA>

From the first few rows of the data, we can see that there are many missing values. As missing values can affect the analysis results, I only include rows without NAs.

g <- complete.cases(data)
data <- data[g,]
head(data)
##           natspac       consci
## 55090 About Right A Great Deal
## 55094  Too Little    Only Some
## 55097 About Right A Great Deal
## 55103 About Right    Only Some
## 55105    Too Much    Only Some
## 55108  Too Little   Hardly Any

Now, let’s get more details from the data.

dim(data)
## [1] 585   2
table(data$consci, data$natspac)
##               
##                Too Little About Right Too Much
##   A Great Deal         71         123       52
##   Only Some            54         129      113
##   Hardly Any            7          14       22

We can see that the sample size is 585. Attitude toward the space exploration program has three levels: spending too much money on the program (negative), spending about right (neutral), and spending too little (positive). Confidence in the scientific community has three levels: hardly any confidence (low), only some confidence (neutral), and a great deal of confidence (high). Given our research question, I’ll reorganize the data by merging “About Right” and “Too Little” into “Positive Attitude” and merging “Only Some” and “Hardly Any” into “Low Confidence”.

# Recode each variable into two levels (rows with NAs were already dropped)
finalData <- data.frame(
  natspac = ifelse(data$natspac == "Too Much", "Negative Attitude", "Positive Attitude"),
  consci = ifelse(data$consci == "A Great Deal", "High Confidence", "Low Confidence")
)
table(finalData)
##                    consci
## natspac             High Confidence Low Confidence
##   Negative Attitude              52            135
##   Positive Attitude             194            204
par(mar = c(2,2,2,2))
mosaicplot(table(finalData$consci, finalData$natspac))

After the reorganization, both categorical variables have two levels. From the table and figure above, we can see that (1) 246 people have high confidence in the scientific community and 339 people have low confidence; (2) 187 people have a negative attitude toward spending on the space exploration program and 398 people have a positive attitude; (3) the share of negative attitudes is 135/339 (about 40%) in the low confidence group versus 52/246 (about 21%) in the high confidence group, which suggests that there may be an association between confidence in the scientific community and attitude toward spending on the space exploration program.
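The within-group proportions can be read off the table directly. The snippet below is a minimal sketch that rebuilds the 2x2 table from the printed counts (so it does not need the `gss` data) and uses `prop.table` to get row-wise shares; the object names `tab` and `row_props` are illustrative and not part of the original analysis.

```r
# Rebuild the 2x2 table from the counts printed above (rows: confidence group)
tab <- matrix(c(52, 194,
                135, 204),
              nrow = 2, byrow = TRUE,
              dimnames = list(consci = c("High Confidence", "Low Confidence"),
                              natspac = c("Negative Attitude", "Positive Attitude")))

# Row-wise proportions: share of each attitude within a confidence group
row_props <- prop.table(tab, margin = 1)
round(row_props, 3)
```

Comparing the "Negative Attitude" column across the two rows gives the two sample proportions that the inference section works with.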

Inference:

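To answer the research question, we test the following hypotheses, where \(p_{low}\) and \(p_{high}\) are the proportions of negative attitudes in the low and high confidence groups. \(H_{0}\): \(p_{low}\) - \(p_{high}\) = 0 (the two proportions are equal). \(H_{A}\): \(p_{low}\) - \(p_{high}\) ≠ 0 (the two proportions differ).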

Check Conditions

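# total sample size
total <- nrow(finalData)
# SE for confidence interval (unpooled; sample proportions rounded to 0.40 and 0.21)
se_ci <- sqrt(0.40*0.60/339 + 0.21*0.79/246)
# pooled proportion of negative attitude, assuming the two groups do not differ
pool <- (52+135)/total
# high confidence: expected negative and positive counts
hc1 <- (52+194)*pool
hc2 <- (52+194)*(1-pool)
# low confidence: expected negative and positive counts
lc1 <- (135+204)*pool
lc2 <- (135+204)*(1-pool)
# SE for hypothesis testing (pooled)
se <- sqrt(pool*(1-pool)/246 + pool*(1-pool)/339)
# the four expected counts should each be at least 10
c(hc1, hc2, lc1, lc2, se_ci, se)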
## [1]  78.63589744 167.36410256 108.36410256 230.63589744   0.03718003
## [6]   0.03905863

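All conditions are met. Independence: the respondents come from a random sample, and 585 is far less than 10% of all US residents. Sample size: the expected numbers of negative and positive attitudes in each group (about 79, 167, 108 and 231) are all at least 10. We can therefore assume that the sampling distribution of the difference between the two proportions is nearly normal.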
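The success-failure part of this check can be wrapped in a small helper. This is only a sketch: the function name `success_failure` and the threshold argument are hypothetical, and the cutoff of 10 expected cases per cell is the usual rule of thumb rather than something computed from the data.

```r
# Expected counts under the pooled proportion, with a rule-of-thumb threshold
success_failure <- function(x, n, threshold = 10) {
  p_pool <- sum(x) / sum(n)                      # pooled proportion of "successes"
  expected <- rbind(successes = n * p_pool,      # expected negative attitudes
                    failures  = n * (1 - p_pool))  # expected positive attitudes
  list(p_pool = p_pool, expected = expected, ok = all(expected >= threshold))
}

# Counts of negative attitudes (x) and group sizes (n) from the table above
res <- success_failure(x = c(high = 52, low = 135), n = c(high = 246, low = 339))
res$expected
res$ok
```

The `expected` matrix should reproduce the four counts printed above, and `ok` summarizes whether every cell clears the threshold.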

Methods

To perform inference, we can use either a confidence interval or a hypothesis test. Results from the two methods should be consistent with each other. As the conditions are met, we can assume the sampling distribution is nearly normal. Therefore, we can use the normal distribution to calculate the confidence interval and to carry out the hypothesis test.
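For reference, the two standard errors computed in the conditions check correspond to the formulas below. The notation (\(\hat{p}\) for sample proportions, \(n\) for group sizes, \(z^{*}\) for the critical value) is introduced here for illustration and is not from the original write-up. The confidence interval uses the unpooled standard error, while the hypothesis test uses the pooled proportion because \(H_{0}\) assumes both groups share one proportion.

```latex
\[
SE_{CI} = \sqrt{\frac{\hat{p}_{low}(1-\hat{p}_{low})}{n_{low}}
              + \frac{\hat{p}_{high}(1-\hat{p}_{high})}{n_{high}}},
\qquad
CI = (\hat{p}_{low} - \hat{p}_{high}) \pm z^{*} \, SE_{CI}
\]
\[
\hat{p}_{pool} = \frac{52 + 135}{246 + 339},
\qquad
SE_{H_0} = \sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})
           \left(\frac{1}{n_{low}} + \frac{1}{n_{high}}\right)},
\qquad
Z = \frac{(\hat{p}_{low} - \hat{p}_{high}) - 0}{SE_{H_0}}
\]
```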

95% Confidence interval

To calculate the confidence interval, we need the point estimate of the difference between the proportions of negative attitudes in the low confidence and high confidence groups. We also need the standard error calculated from the sample data. The critical value for a 95% confidence interval is 1.96.

# point estimate of p_low - p_high
estimate <- 135/(135+204) - 52/(52+194)
# 95% CI: point estimate +/- critical value * SE
estimate + c(-1,1)*1.96*se_ci
## [1] 0.1139751 0.2597208

The 95% confidence interval is approximately 0.11 to 0.26 and does not include 0. We are 95% confident that, among all US residents, the proportion of negative attitudes in the low confidence group exceeds that in the high confidence group by between 0.11 and 0.26.
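As a cross-check, base R's `prop.test` can compute an interval for the difference of two proportions from the raw counts. With `correct = FALSE` it reports the estimate plus or minus the normal critical value times the unpooled standard error. The sketch below uses unrounded sample proportions, so its result may differ slightly in the later decimals from a hand calculation that rounds them; the object name `ci_check` is illustrative.

```r
# Negative-attitude counts and group sizes, low confidence group first
ci_check <- prop.test(x = c(135, 52), n = c(339, 246),
                      conf.level = 0.95, correct = FALSE)

ci_check$estimate   # sample proportions for the low and high confidence groups
ci_check$conf.int   # 95% interval for p_low - p_high
```

Listing the low confidence group first keeps the sign of the difference the same as in the manual calculation.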

Hypothesis testing: Calculate test statistic and p-value

From the previous sections, we can see that under the null hypothesis \(H_{0}\) that the two proportions are equal, \(p_{low}\) - \(p_{high}\) ~ N(mean = 0, SE = 0.039). Let’s calculate the test statistic and p-value.

# test statistic
Z <- (estimate - 0)/se
# two-sided p-value (abs() keeps this correct if Z is negative)
pvalue <- 2*pnorm(abs(Z), lower.tail = FALSE)
pvalue
## [1] 1.720272e-06

As the p-value is less than 0.05, we reject \(H_{0}\). If the two population proportions were equal, the probability of observing a difference of 0.19 or more in either direction in a random sample of 585 would be far below 5% (about 1.7 in a million). We infer that the proportion of negative attitudes in the low confidence group is not equal to that in the high confidence group.
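The pooled two-proportion z-test is equivalent to a chi-square test on the 2x2 table without continuity correction, so `prop.test` offers an independent check of the p-value. The following is a sketch under that assumption; the names `x`, `n`, `z`, `p_manual` and `test` are introduced here for illustration.

```r
# Negative-attitude counts and group sizes from the 2x2 table
x <- c(low = 135, high = 52)
n <- c(low = 339, high = 246)

# Manual pooled z-test
p_pool  <- sum(x) / sum(n)
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z <- (x[["low"]] / n[["low"]] - x[["high"]] / n[["high"]]) / se_pool
p_manual <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Built-in test without continuity correction
test <- prop.test(x, n, correct = FALSE)

# The chi-square statistic should equal z squared, and the p-values should agree
c(z_squared = z^2, chi_squared = unname(test$statistic))
c(p_manual = p_manual, p_prop_test = test$p.value)
```

If the two rows of output match, the hand-computed statistic and the built-in test are telling the same story.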

Results from the two methods are consistent with each other.

Conclusion:

From the current study, we can conclude that confidence in the scientific community is associated with US residents’ attitude toward spending on the space exploration program. More specifically, people with low confidence in the scientific community express a more negative attitude.

From the research question, I learned that people’s beliefs (e.g., whether they believe in science) are associated with their attitudes (e.g., whether they support or oppose scientific exploration). According to psychology studies, there is evidence that different attitudes come from different beliefs and that attitude is an external expression of belief. Unfortunately, the current study cannot establish a causal link between belief and attitude. Future research could investigate the research question with an experimental design in order to establish causal links.

References:

Appendix:

head(finalData,48)
##              natspac          consci
## 1  Positive Attitude High Confidence
## 2  Positive Attitude  Low Confidence
## 3  Positive Attitude High Confidence
## 4  Positive Attitude  Low Confidence
## 5  Negative Attitude  Low Confidence
## 6  Positive Attitude  Low Confidence
## 7  Negative Attitude  Low Confidence
## 8  Positive Attitude High Confidence
## 9  Negative Attitude  Low Confidence
## 10 Positive Attitude High Confidence
## 11 Positive Attitude High Confidence
## 12 Negative Attitude High Confidence
## 13 Positive Attitude  Low Confidence
## 14 Positive Attitude  Low Confidence
## 15 Positive Attitude  Low Confidence
## 16 Negative Attitude  Low Confidence
## 17 Positive Attitude  Low Confidence
## 18 Positive Attitude High Confidence
## 19 Positive Attitude  Low Confidence
## 20 Negative Attitude  Low Confidence
## 21 Negative Attitude  Low Confidence
## 22 Positive Attitude High Confidence
## 23 Negative Attitude  Low Confidence
## 24 Positive Attitude  Low Confidence
## 25 Positive Attitude  Low Confidence
## 26 Positive Attitude High Confidence
## 27 Negative Attitude  Low Confidence
## 28 Positive Attitude High Confidence
## 29 Positive Attitude High Confidence
## 30 Positive Attitude High Confidence
## 31 Negative Attitude High Confidence
## 32 Negative Attitude High Confidence
## 33 Positive Attitude  Low Confidence
## 34 Positive Attitude  Low Confidence
## 35 Negative Attitude High Confidence
## 36 Negative Attitude High Confidence
## 37 Positive Attitude High Confidence
## 38 Positive Attitude High Confidence
## 39 Negative Attitude  Low Confidence
## 40 Negative Attitude High Confidence
## 41 Positive Attitude  Low Confidence
## 42 Negative Attitude High Confidence
## 43 Positive Attitude  Low Confidence
## 44 Positive Attitude  Low Confidence
## 45 Negative Attitude  Low Confidence
## 46 Positive Attitude High Confidence
## 47 Positive Attitude  Low Confidence
## 48 Negative Attitude High Confidence