Instructions

Please prepare your answers as clearly as possible. Label every chart completely and clearly indicate which answer corresponds to which question. Submit your answers as a PDF file.

Question 1

Download ps2q1.csv and read it into memory to answer the following questions.

Question 1 (A)

Summarize all of the variables (give the mean, median, and range). Report each value.
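
One way to produce these summaries is sketched below. The helper name summarize_numeric and the toy data frame are illustrative assumptions, not part of the assignment. The commented lines show where ps2q1.csv would be read in, assuming the file sits in your working directory.

```r
# Hypothetical helper: mean, median, and range (min, max) of every numeric column.
summarize_numeric <- function(df) {
  num <- df[sapply(df, is.numeric)]          # keep numeric columns only
  t(sapply(num, function(v) {
    c(mean   = mean(v, na.rm = TRUE),
      median = median(v, na.rm = TRUE),
      min    = min(v, na.rm = TRUE),
      max    = max(v, na.rm = TRUE))
  }))
}

# With the assignment file in the working directory:
# ps2 <- read.csv("ps2q1.csv")
# summarize_numeric(ps2)

# Demonstration on made-up values (not the assignment data).
toy <- data.frame(x1 = c(1, 2, 3, 4), y1 = c(2, 4, 6, 20))
res <- summarize_numeric(toy)
print(res)
```

The built-in summary() function gives similar information in one call; the helper simply returns it as a table with one row per variable.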

Question 1 (B)

Use the cor() command to correlate each variable pair (x1,y1). Report your correlations and interpret them as far as you can. See Bailey pp. 9-11 for more on how to interpret correlations.
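
As a rough illustration of cor(), the sketch below uses a small made-up data frame in place of the assignment file. The column names x1 and y1 follow the pair named above; the values are invented for the example.

```r
# Toy data standing in for one (x, y) pair; values are made up for illustration.
toy <- data.frame(x1 = c(1, 2, 3, 4, 5),
                  y1 = c(2.1, 3.9, 6.2, 8.1, 9.8))

# Pearson correlation for a single pair of variables.
r_pair <- cor(toy$x1, toy$y1)
print(r_pair)

# Correlation matrix for every pair of numeric columns at once.
r_all <- cor(toy)
print(r_all)
```

If a column contains missing values, cor() returns NA by default; passing use = "complete.obs" drops incomplete rows first.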

Question 1 (C)

Plot each variable pair (x1,y1). What stands out to you? Tell me why this might be important or interesting.
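
A minimal sketch of plotting several pairs side by side follows. It assumes the pairs are named x1/y1, x2/y2, and so on, which may not match the file exactly, and it uses invented values.

```r
# Made-up data with two (x, y) pairs; not the assignment data.
toy <- data.frame(x1 = c(1, 2, 3, 4, 5), y1 = c(2, 4, 5, 4, 6),
                  x2 = c(1, 2, 3, 4, 5), y2 = c(1, 1, 2, 8, 9))

# Draw the pairs in a 1 x 2 grid, remembering the previous layout.
old_par <- par(mfrow = c(1, 2))
for (i in 1:2) {
  x <- toy[[paste0("x", i)]]
  y <- toy[[paste0("y", i)]]
  plot(x, y,
       xlab = paste0("x", i),
       ylab = paste0("y", i),
       main = paste("Pair", i))
}
par(old_par)  # restore the previous layout
```

Placing the panels next to each other makes it easier to compare the shape of each relationship than viewing the plots one at a time.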

Question 2

Consider the plots above. Based on Bailey Chapter 3, which of the two samples would allow us to more reliably produce unbiased estimates of \(\beta_1\)? Why?

Question 3

Consider the plots above. Based on Bailey Chapter 3, which of the two samples would allow us to more reliably produce unbiased estimates of \(\beta_1\)? Why?

Question 4

For this question, use the state.x77 dataset preloaded in R. We first have to convert the matrix state.x77 to a dataframe:

state.df <- data.frame(state.x77)

Use the tools we have discussed so far to explore the state.df dataframe: check its variable names and get to know the mean, median, and range of each variable. Use ?state.x77 to see what each variable measures.
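
A first look at the dataframe might go something like this. These are all base R calls; which ones you use is a matter of taste.

```r
# Convert the built-in matrix to a dataframe, as above.
state.df <- data.frame(state.x77)

names(state.df)          # variable names
str(state.df)            # type and first few values of each variable
summary(state.df)        # min, quartiles, median, mean, max per variable
sapply(state.df, range)  # range of each variable as a 2-row table
```

Note that data.frame() replaces the spaces in the original column names with periods, so "Life Exp" becomes Life.Exp and "HS Grad" becomes HS.Grad.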

Question 4(A)

Estimate the model \(Income = \beta_0 + \beta_1 Illiteracy + \epsilon\). Report your estimates for \(\beta_0\) and \(\beta_1\).
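
The sketch below shows the mechanics of a bivariate regression with lm(). To leave the exercise to you, it deliberately uses a different pair of variables from the same dataframe (Life.Exp on Murder); that choice is purely illustrative.

```r
state.df <- data.frame(state.x77)

# Bivariate OLS: Life.Exp = b0 + b1 * Murder + error (illustrative pair only).
fit <- lm(Life.Exp ~ Murder, data = state.df)
print(coef(fit))

b0 <- coef(fit)[["(Intercept)"]]  # estimated intercept
b1 <- coef(fit)[["Murder"]]       # estimated slope

# In the bivariate case the slope equals cov(x, y) / var(x),
# and the intercept follows from the sample means.
b1_manual <- cov(state.df$Murder, state.df$Life.Exp) / var(state.df$Murder)
b0_manual <- mean(state.df$Life.Exp) - b1_manual * mean(state.df$Murder)
```

Swapping in the outcome and predictor named in the question gives the estimates you are asked to report.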

Question 4(B)

Plot the variables and label them appropriately. Include the trend line you calculated in 4(A).
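
A labeled scatterplot with a fitted line can be sketched as follows. As an illustration it again uses Life.Exp and Murder, not the variables in the question, and the colors and line width are arbitrary choices.

```r
state.df <- data.frame(state.x77)
fit <- lm(Life.Exp ~ Murder, data = state.df)  # illustrative pair only

# Scatterplot with axis labels and a title.
plot(state.df$Murder, state.df$Life.Exp,
     xlab = "Murder rate (per 100,000 population)",
     ylab = "Life expectancy (years)",
     main = "Life expectancy and murder rate, U.S. states")

# abline() accepts a fitted lm object and draws its regression line.
abline(fit, col = "red", lwd = 2)
```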

Question 4(C)

Interpret your results substantively. What does the association imply about the relationship between Illiteracy and Income? Be as precise as you can.

Question 4(D)

Interpret the results statistically. Is your estimate of \(\beta_1\) statistically significant? Provide an explanation of why or why not, based on Bailey.
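
The quantities needed to judge statistical significance can be pulled out of the regression summary. The sketch below does so for an illustrative model (Life.Exp on Murder), not the one in the question. The 0.05 threshold is a common convention, assumed here for the example.

```r
state.df <- data.frame(state.x77)
fit <- lm(Life.Exp ~ Murder, data = state.df)  # illustrative pair only

# Coefficient table: estimate, standard error, t statistic, p-value.
coefs <- summary(fit)$coefficients
print(coefs)

slope <- coefs["Murder", ]

# The t statistic is the estimate divided by its standard error.
t_manual <- slope[["Estimate"]] / slope[["Std. Error"]]
p_value  <- slope[["Pr(>|t|)"]]

# Compare the p-value with the conventional 0.05 threshold.
significant <- p_value < 0.05
print(significant)
```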

Question 4(E)

How reliable do you think the estimate of the relationship between Illiteracy and Income is in a causal sense? That is, do you think this is a reliable guide to the causal effect of Illiteracy on Income? Justify your answer. Explain how we might produce a more reliable estimate.

Question 5

Use the state.df dataframe again. This time, repeat 4(A)-4(E) but for estimating the equation \(Murder = \beta_0 + \beta_1 Frost + \epsilon\).

Question 6

For this question, use QOGSelectPS2.dta and the read.dta() command in the foreign package.

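library(foreign)                       # provides read.dta() for Stata .dta files
data <- read.dta("QOGSelectPS2.dta")   # file must be in the working directory
names(data)                            # list the variable names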
##  [1] "ccode"            "cname"            "chga_demo"       
##  [4] "ht_region"        "ihme_mmr"         "pwt_pop"         
##  [7] "undp_gii"         "wdi_co2"          "wdi_gdpc"        
## [10] "oil_income_pc_1k"
plot(data$ihme_mmr,data$oil_income_pc_1k)

This is a selection of data drawn from the Quality of Government project.

Read the data into R and create a plot of maternal mortality (ihme_mmr), which measures the rate of deaths of mothers per 100,000 live births, versus oil income per capita (oil_income_pc_1k), which gives the average oil and natural gas-derived income per person per country. Fully label the graph and interpret it.

Question 7

Create a barplot of Oil Income Per Capita by region of the world (ht_region). Fully label the chart and provide two or three sentences describing what you see. Second, plot oil income per capita versus a binary measure of whether a state is democratic (1) or autocratic (0), chga_demo. Again, fully label the graph and provide two or three sentences describing what you see.
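
One possible approach is sketched below on a tiny made-up data frame that borrows the column names from names(data) above. The values and region labels are invented, and plotting regional means is only one reading of "barplot by region". In the real data, ht_region may be stored as a factor or as numeric codes, so check it before labeling.

```r
# Made-up rows for illustration; not drawn from QOGSelectPS2.dta.
toy <- data.frame(
  ht_region        = c("Region A", "Region A", "Region B", "Region B", "Region C"),
  chga_demo        = c(1, 0, 0, 0, 1),
  oil_income_pc_1k = c(0.1, 0.5, 4.0, 6.0, 0.2)
)

# Mean oil income per capita within each region.
region_means <- tapply(toy$oil_income_pc_1k, toy$ht_region, mean, na.rm = TRUE)

barplot(region_means,
        xlab = "Region (ht_region)",
        ylab = "Mean oil income per capita (oil_income_pc_1k)",
        main = "Oil income per capita by region")

# Oil income by regime type: one box per value of the binary measure.
boxplot(oil_income_pc_1k ~ chga_demo, data = toy,
        names = c("Autocracy (0)", "Democracy (1)"),
        xlab = "Regime type (chga_demo)",
        ylab = "Oil income per capita (oil_income_pc_1k)",
        main = "Oil income per capita by regime type")
```

A boxplot is used for the second chart because the predictor takes only two values; a scatterplot of the same variables would stack all points in two vertical columns.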

Question 8

The United Nations has developed a measure of gender inequality, the Gender Inequality Index (GII). Higher values of the GII indicate more unequal development between men and women, and lower values indicate smaller gaps in the development of men and women. (For more, see the UN website.)

This measure is included in your dataset as undp_gii. Please (A) plot GII against oil income per capita and (B) regress GII on oil income per capita. Substantively interpret the regression and plot: what do these numbers suggest about the relationship between oil income and gender equality? What alternative explanations could affect our interpretation of that relationship? Is the relationship statistically significant? Does that affect our understanding of the causal effect of oil on gender disparities in development?