The “World Values Survey” has collected data in documented waves to understand how values differ across cultures and change over time. This study focuses on the environment-related part of the survey. People across the world were asked their opinion on “Environmental problems in the world: Global warming or the greenhouse effect”, and the analysis below examines the responses received.
Here we concentrate on data from three countries: the US, China, and India. These three countries are presumably the world’s biggest polluters, so what their people think about the environment will have a major impact across the globe.
library(data.table)
library(plotly) # provides plot_ly(), layout() and the %>% pipe used below
## Warning: package 'data.table' was built under R version 3.2.4
#mydat <- fread('https://raw.githubusercontent.com/chirag-vithlani/606/master/project%20proposal/Environment_Subset.RData')
Environment_Subset_Data <- "Environment_Subset.RData"
load(Environment_Subset_Data)
colnames(Environment_Subset) <- c("Environment_Related_Question",
"Year_Of_survey",
"Country_or_Region",
"Age",
"Highest_educational_level")
# Subset to US, India & China (country codes 840, 356, 156),
# also removing missing/unanswered responses (codes <= 0)
Environment_Subset_US_India_China <- subset(
  Environment_Subset,
  Country_or_Region %in% c(156, 356, 840) & Environment_Related_Question > 0
)
Environment_Subset_US_India_China_Country_Number<-Environment_Subset_US_India_China
#Country specific labeling
Environment_Subset_US_India_China$Country_or_Region[Environment_Subset_US_India_China$Country_or_Region==156] <-"China"
Environment_Subset_US_India_China$Country_or_Region[Environment_Subset_US_India_China$Country_or_Region==356] <-"India"
Environment_Subset_US_India_China$Country_or_Region[Environment_Subset_US_India_China$Country_or_Region==840] <-"US"
aggregate(Environment_Subset_US_India_China[1],list(Environment_Subset_US_India_China$Country_or_Region),mean)
## Group.1 Environment_Related_Question
## 1 China 1.816924
## 2 India 1.709510
## 3 US 1.779605
# Attach each country's mean response to its rows; the values are copied by hand
# from the aggregate() output above and truncated to two decimals
Environment_Subset_US_India_China$avg[Environment_Subset_US_India_China$Country_or_Region=='India']<-1.70
Environment_Subset_US_India_China$avg[Environment_Subset_US_India_China$Country_or_Region=='China']<-1.81
Environment_Subset_US_India_China$avg[Environment_Subset_US_India_China$Country_or_Region=='US']<-1.77
# Plotting (plotly >= 4.0 expects column references as formulas, e.g. ~avg)
plot_ly(Environment_Subset_US_India_China,
  x = ~Country_or_Region,
  y = ~avg,
  name = "Environment",
  type = "bar"
) %>%
  layout(title = "Average response of people country wise", yaxis = list(title = "Average (1|Very serious->4|Not serious at all)", range = c(1.6, 1.9)))
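The per-country averages above are typed in by hand, which is easy to get out of sync with the data. As a minimal sketch, assuming a small made-up data frame (`toy` and its values are hypothetical, not WVS data), the same column can be derived with base R's `ave()`, which returns the group mean for every row:

```r
# Toy data standing in for Environment_Subset_US_India_China (values are made up)
toy <- data.frame(
  Country_or_Region = c("China", "China", "India", "India", "US", "US"),
  Environment_Related_Question = c(1, 2, 1, 3, 2, 4)
)

# ave() applies mean() within each country and repeats the result on each row
toy$avg <- ave(toy$Environment_Related_Question, toy$Country_or_Region)

print(toy)
```

Because the column is computed rather than copied, it stays correct if the subset or the filtering rules change.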
with(Environment_Subset_US_India_China_Country_Number,cor.test(Environment_Related_Question,Country_or_Region))
##
## Pearson's product-moment correlation
##
## data: Environment_Related_Question and Country_or_Region
## t = -0.32547, df = 3789, p-value = 0.7448
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.03711559 0.02655138
## sample estimates:
## cor
## -0.005287463
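The Pearson correlation above treats the numeric country codes (156, 356, 840) as if they were measurements, although they are only labels. One alternative, which is not the test used in this analysis, is a chi-squared test of independence between country and response. A minimal sketch, assuming made-up counts rather than the WVS data (the `toy` data frame and all of its values are hypothetical):

```r
# Hypothetical responses for three countries, 80 respondents each (not WVS data)
toy <- data.frame(
  country = rep(c("China", "India", "US"), each = 80),
  response = c(
    rep(1:4, times = c(30, 30, 14, 6)),
    rep(1:4, times = c(40, 24, 10, 6)),
    rep(1:4, times = c(36, 24, 10, 10))
  )
)

# Contingency table: one row per country, one column per response code
tab <- table(toy$country, toy$response)
print(tab)

# Chi-squared test of independence between country and response
test <- chisq.test(tab)
print(test)
```

With three countries and four response codes the test has (3 - 1) * (4 - 1) = 6 degrees of freedom.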
# Respondents under 25; Age > 0 excludes negative missing-value codes such as -2
Environment_Subset_US_India_China25 <- subset(Environment_Subset_US_India_China, Age > 0 & Age < 25)
grp25<-aggregate(Environment_Subset_US_India_China25[1],list(Environment_Subset_US_India_China25$Country_or_Region),mean)
names(grp25) <- c("Country","Avg")
plot_ly(grp25,
  x = ~Country,
  y = ~Avg,
  name = "Environment",
  type = "bar"
) %>%
  layout(title = "Average response of people country wise (Age under 25)", yaxis = list(range = c(1.6, 1.9)))
Environment_Subset_US_India_China_High_Degree<-subset(Environment_Subset_US_India_China,Environment_Subset_US_India_China$Highest_educational_level==8)
highlyQualified<-aggregate(Environment_Subset_US_India_China_High_Degree[1],list(Environment_Subset_US_India_China_High_Degree$Country_or_Region),mean)
names(highlyQualified) <- c("Country","Avg")
plot_ly(highlyQualified,
  x = ~Country,
  y = ~Avg,
  name = "Environment",
  type = "bar"
) %>%
  layout(title = "Avg response of people country wise (Higher education)", yaxis = list(range = c(1, 2)))
Environment_Subset_US_India_China_Less_serious<-subset(Environment_Subset_US_India_China,Environment_Subset_US_India_China$Environment_Related_Question==4)
table(Environment_Subset_US_India_China_Less_serious$Country_or_Region)
##
## China India US
## 26 75 68
table(Environment_Subset_US_India_China_Less_serious$Highest_educational_level)
##
## -3 1 2 3 4 5 6 7 8
## 38 5 13 6 21 23 34 13 16
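The counts above are raw totals, so they also reflect how many people were interviewed in each country. A minimal sketch of turning such counts into within-country shares, assuming a small made-up data frame (`toy` and its values are hypothetical):

```r
# Hypothetical responses (not WVS data); code 4 means "Not serious at all"
toy <- data.frame(
  country = c(rep("China", 10), rep("India", 8), rep("US", 12)),
  response = c(
    rep(c(1, 2, 4), times = c(5, 4, 1)),
    rep(c(1, 2, 4), times = c(4, 2, 2)),
    rep(c(1, 3, 4), times = c(6, 3, 3))
  )
)

# Share of respondents in each country who answered 4:
# the mean of a logical vector is the proportion of TRUE values
share <- tapply(toy$response == 4, toy$country, mean)
print(round(share, 2))
```

Comparing shares rather than counts keeps a country with a larger sample from looking more sceptical simply because more people were asked.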
Although there is no strong relationship between the variables studied, some points are still noticeable.
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
How do people in three different countries, China, the US, and India (the world’s biggest polluters**), think about the environment?
If time permits, other variables such as age and level of education will be explored.
What are the cases, and how many are there?
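Each case is one survey respondent from China, India, or the US who gave a valid answer to the question. There are 3,791 cases in total.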
Describe the method of data collection.
As per the “Collection Procedures” section at http://www.worldvaluessurvey.org/WVSContents.jsp:
“The mode of data collection for WVS surveys is face-to-face interviewing. Other modes (e.g., telephone, mail, internet) are not acceptable except under very exceptional circumstances and only on an experimental basis”
The R data file was downloaded from http://www.worldvaluessurvey.org/WVSDocumentationWVL.jsp and then filtered as required.
What type of study is this (observational/experiment)?
This is an observational study.
Link :
http://www.worldvaluessurvey.org/WVSDocumentationWVL.jsp
Citation : WORLD VALUES SURVEY 1981-2014 LONGITUDINAL AGGREGATE v.20150418. World Values Survey Association (www.worldvaluessurvey.org). Aggregate File Producer: JDSystems, Madrid SPAIN.
The response variable is the answer to the question “Environmental problems in the world: Global warming or the greenhouse effect.” It is an ordinal categorical variable with the following codes:
| Value | Description |
|---|---|
| 1 | Very serious |
| 2 | Somewhat serious |
| 3 | Not very serious |
| 4 | Not serious at all |
We will check how the answer to this question varies with region and possibly also with age and education.
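Since the response codes are ordered categories rather than measurements, it can help to store them as an ordered factor so that tables carry the labels. A minimal sketch, assuming a handful of made-up responses (`codes` is hypothetical; the labels are taken from the table above):

```r
# Hypothetical response codes (not WVS data)
codes <- c(1, 2, 2, 4, 3, 1)

# Labels from the codebook table, in order from code 1 to code 4
response_labels <- c("Very serious", "Somewhat serious",
                     "Not very serious", "Not serious at all")

# Ordered factor: keeps the ranking 1 < 2 < 3 < 4 and attaches readable labels
response <- factor(codes, levels = 1:4, labels = response_labels, ordered = TRUE)

# Frequency table with labelled categories
print(table(response))

# The median category is a summary that respects the ordinal scale
median_code <- median(as.integer(response))
print(median_code)
```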
What is the explanatory variable, and what type is it (numerical/categorical)?
The explanatory variables are region (US, China, and India; categorical), educational qualification (ordinal categorical), and age (numerical).
Country or region codes:
| Value | Description |
|---|---|
| 156 | China |
| 356 | India |
| 840 | US |

Highest educational level codes:
| Value | Description |
|---|---|
| 1 | Inadequately completed elementary education |
| 2 | Completed (compulsory) elementary education |
| 3 | Incomplete secondary school: technical/vocational type/(Compulsory) elementary education and basic vocational qualification |
| 4 | Complete secondary school: technical/vocational type/Secondary, intermediate vocational qualification |
| 5 | Incomplete secondary: university-preparatory type/Secondary, intermediate general qualification |
| 6 | Complete secondary: university-preparatory type/Full secondary, maturity level certificate |
| 7 | Some university without degree/Higher education - lower-level tertiary certificate |
| 8 | University with degree/Higher education - upper-level tertiary certificate |
| -5 | Missing; Unknown |
| -4 | Not asked in survey |
| -3 | Not applicable; No formal education |
| -2 | No answer |
| -1 | Don’t know |

Age codes:
| Value | Description |
|---|---|
| 15 to 98 | Age in years |
| -5 | Missing; Unknown |
| -4 | Not asked in survey |
| -3 | Not applicable |
| -2 | No answer |
| -1 | Don’t know |
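The negative codes in these tables are placeholders, not real ages or education levels, so they distort means and filters if left in place. A minimal sketch of recoding them to `NA`, assuming a small made-up data frame (`toy` and its values are hypothetical; treating education code -3 as a real category is an assumption based on its "No formal education" description):

```r
# Hypothetical rows (not WVS data) containing some negative placeholder codes
toy <- data.frame(
  Age = c(23, -2, 45, 67, -1),
  Highest_educational_level = c(8, 5, -3, 2, -5)
)

# Every negative age code (-5 to -1) is a form of missing value
toy$Age[toy$Age < 0] <- NA

# For education, -3 also covers "No formal education", so only the other
# negative codes are recoded here (an assumption, see the lead-in)
edu_missing <- c(-5, -4, -2, -1)
toy$Highest_educational_level[toy$Highest_educational_level %in% edu_missing] <- NA

# Summaries now need na.rm = TRUE, but are no longer pulled down by the codes
mean_age <- mean(toy$Age, na.rm = TRUE)
print(mean_age)
```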
**Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.**
The following questions can be asked of the available data.
## Environment_Related_Question Year_Of_survey Country_or_Region
## Min. :1.000 Min. :2006 Length:3791
## 1st Qu.:1.000 1st Qu.:2006 Class :character
## Median :2.000 Median :2006 Mode :character
## Mean :1.767 Mean :2006
## 3rd Qu.:2.000 3rd Qu.:2007
## Max. :4.000 Max. :2007
## Age Highest_educational_level avg
## Min. :-2.00 Min. :-3.000 Min. :1.700
## 1st Qu.:31.00 1st Qu.: 2.000 1st Qu.:1.700
## Median :42.00 Median : 5.000 Median :1.770
## Mean :43.57 Mean : 3.838 Mean :1.758
## 3rd Qu.:54.00 3rd Qu.: 6.000 3rd Qu.:1.810
## Max. :93.00 Max. : 8.000 Max. :1.810
The mean response to the question “Environmental problems in the world: Global warming or the greenhouse effect” is 1.82 in China, 1.71 in India, and 1.78 in the US.
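The prompt for this section asks for means, SDs, and sample sizes of each group. A minimal sketch of computing all three per country in base R, assuming a small made-up data frame (`toy`, its column names, and its values are hypothetical, not WVS data):

```r
# Hypothetical responses (not WVS data)
toy <- data.frame(
  country = rep(c("China", "India", "US"), times = c(4, 3, 5)),
  response = c(1, 2, 2, 3,  1, 1, 2,  1, 2, 2, 3, 4)
)

# Split the responses into one vector per country
groups <- split(toy$response, toy$country)

# One row per country with mean, standard deviation and sample size
stats <- data.frame(
  country = names(groups),
  mean = sapply(groups, mean),
  sd = sapply(groups, sd),
  n = sapply(groups, length),
  row.names = NULL
)

print(stats)
```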
** World’s biggest polluters - China, U.S. and India : **