Lab 1. Data manipulation with dplyr.

All operations in points 3-10 should be done with dplyr functions.

  1. Load a CSV file with the Comparative Political Data Set using this link. Look at the structure of this data frame. Does it look correct?

  2. Add two options to read.csv():

  3. Select the columns year, country, iso, poco, eu, gov_right1, gov_cent1, gov_left1, gov_party, gov_type, womenpar and pop, and save them to a data frame called small.

  4. Create a column log_pop containing the natural logarithm of population (pop) and add it to small.

  5. How many observations in small correspond to post-communist states, and how many to non-post-communist states?

  6. How many post-communist countries are in small? Hint: n_distinct() from dplyr combined with summarise() might be helpful.

  7. Calculate the mean percentage of right-wing, left-wing and center parties in legislative bodies. Then do the same separately for EU members and non-EU states.

  8. Select the rows that correspond to post-communist countries where the percentage of right-wing parties is greater than 50. Look at them. How many rows are there? And how many countries?

  9. Consider the following example:

# calculate Pearson's correlation coefficient and test its significance
cor.test(small$womenpar, small$gov_right1)
# save the results in test
test <- cor.test(small$womenpar, small$gov_right1)
# look at the structure of this object
str(test)

Now you can extract any element of test, for example the correlation coefficient itself or the corresponding p-value:

test$estimate
test$p.value

  10. Summarise small in the following way: for post-communist and non-post-communist states separately, calculate the correlation coefficient between womenpar and gov_right1, and report the coefficient and the corresponding p-value for each group.
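One possible way to approach the dplyr points is sketched below; it is not an official solution. Because the data link is not reproduced here, the sketch builds a small hypothetical data frame, cpds, with the column names from the lab but entirely invented values, and it assumes that poco and eu are 0/1 dummies (1 = post-communist / EU member). The object names cpds, obs_by_poco, n_pc, party_cols, means_all, means_eu, right_pc, right_pc_summary and cor_by_group are illustrative only. across() and all_of() assume dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy stand-in for the Comparative Political Data Set:
# column names follow the lab, but every value is invented, and poco/eu
# are ASSUMED to be 0/1 dummies (1 = post-communist / EU member).
cpds <- data.frame(
  year       = rep(c(2000, 2001), times = 4),
  country    = rep(c("Country A", "Country B", "Country C", "Country D"), each = 2),
  iso        = rep(c("AAA", "BBB", "CCC", "DDD"), each = 2),
  poco       = rep(c(1, 1, 0, 0), each = 2),
  eu         = rep(c(1, 0, 1, 0), each = 2),
  gov_right1 = c(60, 70, 20, 10, 55, 40, 0, 30),
  gov_cent1  = c(10, 0, 30, 40, 15, 20, 50, 20),
  gov_left1  = c(30, 30, 50, 50, 30, 40, 50, 50),
  gov_party  = c(2, 2, 4, 4, 2, 3, 4, 3),
  gov_type   = c(1, 1, 2, 2, 1, 2, 3, 2),
  womenpar   = c(15, 18, 12, 14, 30, 33, 40, 38),
  pop        = c(5000, 5050, 10000, 9900, 8000, 8100, 4500, 4600)
)

# Point 3: keep only the listed columns
small <- cpds %>%
  select(year, country, iso, poco, eu, gov_right1, gov_cent1, gov_left1,
         gov_party, gov_type, womenpar, pop)

# Point 4: add the natural logarithm of population
small <- small %>% mutate(log_pop = log(pop))

# Point 5: number of observations for each value of poco
obs_by_poco <- small %>% count(poco)

# Point 6: number of distinct post-communist countries
n_pc <- small %>%
  filter(poco == 1) %>%
  summarise(n_countries = n_distinct(country))

# Point 7: mean party shares overall, then split by EU membership
party_cols <- c("gov_right1", "gov_left1", "gov_cent1")
means_all <- small %>%
  summarise(across(all_of(party_cols), ~ mean(.x, na.rm = TRUE)))
means_eu <- small %>%
  group_by(eu) %>%
  summarise(across(all_of(party_cols), ~ mean(.x, na.rm = TRUE)))

# Point 8: post-communist rows with more than 50% right-wing parties,
# plus the number of such rows and of distinct countries among them
right_pc <- small %>% filter(poco == 1, gov_right1 > 50)
right_pc_summary <- right_pc %>%
  summarise(n_rows = n(), n_countries = n_distinct(country))

# Point 10: correlation test of womenpar vs gov_right1 within each poco group
cor_by_group <- small %>%
  group_by(poco) %>%
  summarise(
    estimate = unname(cor.test(womenpar, gov_right1)$estimate),
    p_value  = cor.test(womenpar, gov_right1)$p.value
  )

print(obs_by_poco)
print(means_eu)
print(right_pc)
print(cor_by_group)
```

With the real data, cpds would be replaced by the data frame read in points 1-2. Note that cor.test() needs at least three complete pairs of values in each group; calling it inside summarise() after group_by() is what makes it run once per poco group.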