dplyr. All operations in points 3-10 should be done with dplyr functions.
Load a CSV file with the Comparative Political Data Set using this link. Look at the structure of the resulting data frame. Does it seem to be correct?
Add two options to read.csv():
dec = "," so that R treats numbers with decimal commas as numbers, not strings;
stringsAsFactors = FALSE so that R treats text values as character vectors, not factors.
Make sure that now everything is correct.
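The effect of the two options can be checked on a tiny example before touching the real file. The sketch below is only an illustration: the CSV text and its values are made up, and it assumes the decimal-comma values are quoted so they do not clash with the comma field separator.

```r
# Toy CSV text with made-up values (not taken from the real data set).
# Decimal commas are quoted so they do not clash with the field separator.
csv_text <- 'year,country,womenpar
1990,Austria,"19,7"
1991,Austria,"21,9"
1990,Poland,"13,5"'

# Without the options, womenpar is not read as a numeric column.
raw <- read.csv(text = csv_text)
str(raw)

# With the two options, womenpar is numeric and country is character.
toy <- read.csv(text = csv_text, dec = ",", stringsAsFactors = FALSE)
str(toy)
```

Comparing the two str() outputs shows which columns change type once the options are set.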
Select columns year, country, iso, poco, eu, gov_right1, gov_cent1, gov_left1, gov_party, gov_type, womenpar, pop and save them to the data frame small.
Create a column log_pop with values of the natural logarithm of population and add it to small.
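As a rough illustration of the two steps above, here is a minimal sketch with select() and mutate() on a made-up data frame. The object names toy and small_toy, the column extra and all values are hypothetical; only the column names year, country and pop come from the task.

```r
library(dplyr)

# Hypothetical stand-in for the full data set (made-up values).
toy <- data.frame(
  year    = c(1990, 1991, 1990),
  country = c("Austria", "Austria", "Poland"),
  pop     = c(7678, 7755, 38119),
  extra   = c("a", "b", "c")
)

small_toy <- toy %>%
  select(year, country, pop) %>%  # keep only the listed columns
  mutate(log_pop = log(pop))      # log() is the natural logarithm in R

small_toy
```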
How many observations in small correspond to post-communist and non-post-communist states?
How many post-communist countries are in small? Hint: n_distinct() in dplyr combined with summarise() might be helpful.
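The difference between counting observations and counting countries can be seen on a small made-up example. This is a sketch, assuming poco is coded 1 for post-communist states and 0 otherwise; the toy data frame and its values are hypothetical.

```r
library(dplyr)

# Made-up data: several rows (years) per country.
toy <- data.frame(
  country = c("Austria", "Austria", "Poland", "Poland", "Poland", "Estonia"),
  poco    = c(0, 0, 1, 1, 1, 1)  # assumed coding: 1 = post-communist
)

# Number of observations (rows) in each group.
obs_per_group <- toy %>%
  count(poco)

# Number of distinct post-communist countries.
n_poco <- toy %>%
  filter(poco == 1) %>%
  summarise(n_countries = n_distinct(country))

obs_per_group
n_poco
```

Here the post-communist group has four rows but only two distinct countries, which is why n_distinct() is needed for the second question.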
Calculate the mean percentage of right-wing, left-wing and center parties in legislative bodies. Do the same, but separately for EU-members and states that are not in the European Union.
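One possible way to get several means at once, overall and by group, is summarise() with across() (available in dplyr 1.0.0 and later). The sketch below uses a made-up data frame. The na.rm = TRUE argument is an assumption that the real columns may contain missing values, and eu is assumed to be coded 1 for members and 0 otherwise.

```r
library(dplyr)

# Made-up values; one NA included to show why na.rm = TRUE may be needed.
toy <- data.frame(
  eu         = c(1, 1, 0, 0),
  gov_right1 = c(40, 60, 20, NA),
  gov_cent1  = c(30, 10, 50, 40),
  gov_left1  = c(30, 30, 30, 60)
)

# Means over all rows.
overall <- toy %>%
  summarise(across(c(gov_right1, gov_cent1, gov_left1),
                   ~ mean(.x, na.rm = TRUE)))

# The same means, computed separately for each value of eu.
by_eu <- toy %>%
  group_by(eu) %>%
  summarise(across(c(gov_right1, gov_cent1, gov_left1),
                   ~ mean(.x, na.rm = TRUE)))

overall
by_eu
```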
Choose rows that correspond to post-communist countries with the percentage of right-wing parties greater than 50. Look at them. How many rows are there? And how many countries?
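Row selection on two conditions can be sketched with filter(), where conditions separated by commas are combined with a logical AND. The data below are made up, and poco is again assumed to be coded 1 for post-communist states.

```r
library(dplyr)

# Made-up values for illustration only.
toy <- data.frame(
  country    = c("Poland", "Poland", "Estonia", "Estonia", "Austria"),
  poco       = c(1, 1, 1, 1, 0),
  gov_right1 = c(75, 60, 100, 40, 80)
)

# Both conditions must hold for a row to be kept.
right_poco <- toy %>%
  filter(poco == 1, gov_right1 > 50)

n_rows      <- nrow(right_poco)                 # number of matching rows
n_countries <- n_distinct(right_poco$country)   # number of distinct countries
```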
Consider the following example:
# calculate a Pearson's correlation coef and test its significance
cor.test(small$womenpar, small$gov_right1)
# save results in test
test <- cor.test(small$womenpar, small$gov_right1)
# look at this structure
str(test)
Now you can extract any element from test, for example, the correlation coefficient itself or the corresponding p-value.
test$estimate
test$p.value
Now use small in the following way: calculate the correlation coefficient between womenpar and gov_right1 for post-communist and non-post-communist states separately, and report the coefficient and the corresponding p-value for each group.
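One way to combine the cor.test() example with grouping is to call cor.test() inside summarise() and keep only the elements of interest. This is a sketch on made-up data, assuming poco is coded 1 for post-communist states and 0 otherwise; the names toy, res, estimate and p_value are hypothetical.

```r
library(dplyr)

# Made-up values; five observations per group.
toy <- data.frame(
  poco       = rep(c(0, 1), each = 5),
  womenpar   = c(10, 15, 20, 30, 35, 5, 8, 12, 9, 14),
  gov_right1 = c(60, 50, 45, 20, 10, 30, 70, 40, 55, 20)
)

res <- toy %>%
  group_by(poco) %>%
  summarise(
    # unname() drops the "cor" name attached to the estimate
    estimate = unname(cor.test(womenpar, gov_right1)$estimate),
    p_value  = cor.test(womenpar, gov_right1)$p.value
  )

res
```

The test is run twice per group here to keep the code short. Storing the result of cor.test() once per group and extracting both elements from it would avoid the repetition.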