DASS. Lab 1.

Lab 1. Data manipulation with `dplyr`.

All operations in points 3-10 should be done with `dplyr` functions.

Load the CSV file with the Comparative Political Data Set using this link. Look at the structure of the resulting data frame. Does it look correct?
Add two options to read.csv():

dec = "," so that R treats numbers written with a decimal comma as numbers, not strings
stringsAsFactors = FALSE so that R treats text values as character vectors, not factors.

Make sure that now everything is correct.
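
To see what these two options do, here is a minimal sketch on a tiny inline CSV rather than the real file. The text in csv_text and the column names are made up for illustration, and it assumes the decimal commas sit inside quoted fields so they do not clash with the column separator.

```r
# Toy CSV text (made-up values); decimal commas are inside quoted fields
csv_text <- 'country,pop\n"A","8,5"\n"B","10,25"'

# dec = "," makes "8,5" parse as the number 8.5;
# stringsAsFactors = FALSE keeps text columns as character vectors
toy <- read.csv(text = csv_text, dec = ",", stringsAsFactors = FALSE)

str(toy)
```

If pop still shows up as chr in str(), the decimal separator was probably not picked up.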

Select columns year, country, iso, poco, eu, gov_right1, gov_cent1, gov_left1, gov_party, gov_type, womenpar, pop and save them to the data frame small.
Create a column log_pop with values of the natural logarithm of population and add it to small.
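
A hedged hint for the two steps above: the sketch below shows select() followed by mutate() on a toy data frame. The data frame toy and its columns are hypothetical and only illustrate the pattern, not the lab data.

```r
library(dplyr)

# Toy data frame with made-up values; names are illustrative only
toy <- data.frame(
  id    = 1:3,
  group = c("a", "b", "a"),
  size  = c(1, 10, 100),
  extra = c(TRUE, FALSE, TRUE)
)

toy_small <- toy %>%
  select(id, group, size) %>%    # keep only the listed columns
  mutate(log_size = log(size))   # log() is the natural logarithm by default

toy_small
```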
How many observations in small correspond to post-communist and not post-communist states?
How many post-communist countries are in small? Hint: n_distinct() in dplyr combined with summarise() might be helpful.
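
Counting rows and counting distinct units are different questions. A small sketch of both, assuming a toy data frame with a hypothetical 0/1 indicator flag and an identifier unit:

```r
library(dplyr)

# Toy data: several rows per unit (made-up values)
toy <- data.frame(
  unit = c("A", "A", "B", "C", "C", "C"),
  flag = c(1, 1, 0, 1, 1, 1)
)

# Number of rows for each value of the indicator
rows_per_flag <- toy %>%
  count(flag)

# Number of distinct units among the flagged rows
units_flagged <- toy %>%
  filter(flag == 1) %>%
  summarise(n_units = n_distinct(unit))

rows_per_flag
units_flagged
```

Here five rows are flagged, but they belong to only two distinct units.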
Calculate the mean percentage of right-wing, left-wing and center parties in legislative bodies. Do the same, but separately for EU-members and states that are not in the European Union.
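
The pattern for this kind of task is summarise() for the overall figures and group_by() plus summarise() for the per-group ones. A sketch on hypothetical columns member, right and left; na.rm = TRUE is included on the assumption that real data may contain missing values:

```r
library(dplyr)

# Toy data with made-up percentages
toy <- data.frame(
  member = c(1, 1, 0, 0),
  right  = c(20, 40, 60, 80),
  left   = c(50, 30, 10, 10)
)

# Means over all rows
overall <- toy %>%
  summarise(mean_right = mean(right, na.rm = TRUE),
            mean_left  = mean(left,  na.rm = TRUE))

# The same means, computed separately for each value of member
by_member <- toy %>%
  group_by(member) %>%
  summarise(mean_right = mean(right, na.rm = TRUE),
            mean_left  = mean(left,  na.rm = TRUE))

overall
by_member
```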
Choose rows that correspond to post-communist countries with the percentage of right-wing parties greater than 50. Look at them. How many rows are there? And how many countries?
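
A sketch of filtering on two conditions at once and then counting rows and distinct units. The columns unit, flag and share are hypothetical stand-ins:

```r
library(dplyr)

# Toy data (made-up values)
toy <- data.frame(
  unit  = c("A", "A", "B", "C"),
  flag  = c(1, 1, 1, 0),
  share = c(60, 55, 40, 90)
)

# Conditions separated by commas in filter() are combined with AND
picked <- toy %>%
  filter(flag == 1, share > 50)

picked
nrow(picked)              # number of rows: 2
n_distinct(picked$unit)   # number of distinct units: 1
```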
Consider the following example:

# calculate Pearson's correlation coefficient and test its significance
cor.test(small$womenpar, small$gov_right1)

# save the results in test
test <- cor.test(small$womenpar, small$gov_right1)
# look at its structure
str(test)

Now you can extract any element from test, for example, the correlation coefficient itself or the corresponding p-value.

test$estimate

test$p.value

Summarise small in the following way: calculate the correlation coefficient between womenpar and gov_right1 for post-communist and not post-communist states separately, and report the coefficient and the corresponding p-value for each group.
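
One possible way to combine cor.test() with grouped summarise(), shown on toy data rather than on small. The columns flag, x and y are hypothetical; unname() is used only to drop the "cor" name that cor.test() attaches to the estimate.

```r
library(dplyr)

# Toy data: two groups of five observations each (made-up values)
toy <- data.frame(
  flag = rep(c(0, 1), each = 5),
  x    = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  y    = c(2, 1, 4, 3, 5, 5, 3, 4, 1, 2)
)

# Inside summarise(), x and y refer to the values of the current group
by_flag <- toy %>%
  group_by(flag) %>%
  summarise(
    r       = unname(cor.test(x, y)$estimate),
    p_value = cor.test(x, y)$p.value
  )

by_flag
```

This calls cor.test() twice per group, which is fine for a small data set; storing the test result once per group is an alternative if the repetition bothers you.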