Submit your HTML output (not your .rmd file) to TurnitIn by Midnight on Tuesday, 9 Feb

Introduction: Reminders about R and Rmarkdown

Please make sure you have downloaded this file (pset2.rmd) to your computer and opened it in R Studio. By download, we do not mean you just clicked on it in your browser – we mean you have saved the actual file to a directory on your computer, and then opened it with R Studio. You should now be looking at the “raw” text of the .rmd file.

If you need to re-orient yourself, please review the introductory material that pset 1 began with describing how to include R code “chunks” into this .rmd file. Remember that when you “knit” the rmd file, only the code written into code “chunks” will be executed and have its results integrated into the output html file.

Please also remember that you will want to use the console to “try out” code to get it working. Once you get it working, copy the code that worked (not the results) over into a code chunk in your rmd. Remember that the code within your rmd file has to be self-contained and include all the steps – your rmd file will not “remember” what you did on your own in the console. When you click knit, it can only execute the code that was present in the rmd. Do not copy the results from your console into your RMD file. In addition, do not include large amounts of output in your writeup (i.e. don’t print full datasets to the screen).

Finally, it is best to work with small amounts of code at a time: get some code working, copy it into the rmd as a code chunk, write your text answer (outside the code chunk) if needed, and check that the file will still knit properly. Do not proceed to answer more questions until you get the first bit working. This will save you huge headaches. While we were generous in grading the first pset for people whose rmd files would not knit, we will not be so generous this time.

Question 1.

First, we will load a dataset (derived from Fearon and Laitin, 2003). The following code should work for most people. This assumes you have used install.packages("RCurl") at some point before (which you did for pset2!).

library(RCurl)

## Loading required package: bitops

myCsv <- getURL("https://dl.dropboxusercontent.com/u/22563946/fl2csv.csv")
fl2 <- read.csv(textConnection(myCsv))

If that does not work for you, remove it from the RMD (so your file will knit), and instead please go to and download the file fl2.RData to a directory on your computer. You can then load it using load("fl2.RData") in a code chunk once you have set your working directory (i.e. how we loaded data in the first pset). Note that the file name must be in quotes.
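As a minimal sketch of the fallback route: load() takes the file name as a quoted string and restores objects under the names they were saved with. The toy data frame and the temporary file path below are hypothetical stand-ins for illustration, not the actual Fearon and Laitin data.

```r
# Toy stand-in for the real dataset (hypothetical values, for illustration only)
fl2_toy <- data.frame(cname = c("A", "B", "C"), war = c(0, 2, 5))

# Save it to a temporary .RData file, mimicking a downloaded fl2.RData
rdata_path <- file.path(tempdir(), "fl2_toy.RData")
save(fl2_toy, file = rdata_path)

# Remove the object, then restore it from disk
rm(fl2_toy)
loaded_names <- load(rdata_path)  # the file name is a character string

loaded_names   # load() returns the names of the objects it restored
dim(fl2_toy)   # the data frame is back under its original name
```

With the real file, the equivalent call would be load("fl2.RData") after setting the working directory.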

(1a) What is the name of the variable that was created as the dataset in the above step? What are the names of the variables stored in this dataset? How many observations are there? (Please do not print the whole dataset in your output!)

The name of the variable created as the dataset is “fl2”, and the names of the variables stored in the dataset are: “cname”, “ethfrac”, “gdpenl”, “instab”, “lmtnest”, “lpopl1”, “ncontig”, “numyears”, “nwstate”, “Oil”, “polity2l”, “relfrac”, “war”, “war_prop”, “warl”, “X”, “year”. There are 156 observations.

head(fl2)

##     X    cname year warl war gdpenl    lpopl1  lmtnest ncontig Oil nwstate
## 1   1      USA 1945    0   0  7.626 11.856296 3.214868       1   0       0
## 2  56   CANADA 1945    0   0  5.188  9.424968 2.797281       0   0       0
## 3 113     CUBA 1947    0   2  2.127  8.524963 1.694107       0   0       0
## 4 168    HAITI 1947    0   5  0.765  7.992945 2.797281       0   0       0
## 5 223 DOMINICA 1947    0   1  0.668  7.567863 2.856470       0   0       0
## 6 276  JAMAICA 1962    0   0  1.810  7.419980 1.335001       0   0       1
##   instab polity2l    ethfrac    relfrac   war_prop numyears
## 1      0       10 0.35695010 0.59599996 0.00000000       55
## 2      0       10 0.75499403 0.63119996 0.00000000       55
## 3      0        3 0.03572363 0.25500000 0.04347826       46
## 4      1       -1 0.01359123 0.33279997 0.09433962       53
## 5      0       -9 0.03698879 0.09500003 0.01886792       53
## 6      0       10 0.04576665 0.50380003 0.00000000       38

names(fl2)

##  [1] "X"        "cname"    "year"     "warl"     "war"      "gdpenl"  
##  [7] "lpopl1"   "lmtnest"  "ncontig"  "Oil"      "nwstate"  "instab"  
## [13] "polity2l" "ethfrac"  "relfrac"  "war_prop" "numyears"

dim(fl2)

## [1] 156  17
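The same three facts (variable names, number of observations, number of variables) can be pulled out individually without printing any rows. The sketch below uses a small hypothetical data frame called toy in place of fl2.

```r
# Toy data frame standing in for fl2 (hypothetical values)
toy <- data.frame(
  cname = c("USA", "CANADA", "CUBA", "HAITI"),
  year  = c(1945, 1945, 1947, 1947),
  war   = c(0, 0, 2, 5)
)

n_obs     <- nrow(toy)   # number of observations (rows)
n_vars    <- ncol(toy)   # number of variables (columns)
var_names <- names(toy)  # variable names

n_obs
n_vars
var_names

# str() gives a compact one-line-per-variable summary instead of the full data
str(toy)
```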

(1b) The variable gdpenl is GDP per capita, measured in thousands of dollars (using 1985 prices). Show the sample distribution of this variable. Specifically, do a density plot, and a boxplot.

plot(density(fl2$gdpenl))

boxplot(fl2$gdpenl)

(1c) Remark on the shape of this distribution. Compute the median and mean.

Median:

median(fl2$gdpenl)

## [1] 1.091

Mean:

mean(fl2$gdpenl)

## [1] 2.46391
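The mean (2.464) is more than twice the median (1.091), which is the usual signature of a right-skewed distribution: a long upper tail pulls the mean up while the median stays put. The sketch below illustrates that pattern on simulated log-normal data; the simulated variable is an assumption chosen for illustration and is not the real gdpenl.

```r
set.seed(1)
# Simulated right-skewed "income-like" variable (log-normal); not the real gdpenl
x <- rlnorm(1000, meanlog = 0, sdlog = 1)

m   <- mean(x)
med <- median(x)

c(mean = m, median = med)

# In a right-skewed sample the long upper tail pulls the mean above the median
m > med
```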

(1d) Repeat (1c), but this time show the distribution of log(gdpenl) using a density plot and a boxplot. Remark on the difference in shape when using the log.

plot(density(log(fl2$gdpenl)))

boxplot(log(fl2$gdpenl))
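Taking logs compresses the long upper tail, so the distribution usually looks much more symmetric. One rough way to put numbers on what the two boxplots show is to compare the mean-median gap and the number of points a boxplot would flag as outliers, before and after the log. The sketch below uses simulated log-normal data as an assumed stand-in for gdpenl.

```r
set.seed(2)
x <- rlnorm(1000, meanlog = 0, sdlog = 1)  # simulated, strictly positive

# Gap between mean and median: large when skewed, near zero when symmetric
gap_raw <- mean(x) - median(x)
gap_log <- mean(log(x)) - median(log(x))

# Points a boxplot would draw beyond the whiskers (1.5 * IQR rule)
out_raw <- length(boxplot.stats(x)$out)
out_log <- length(boxplot.stats(log(x))$out)

c(gap_raw = gap_raw, gap_log = gap_log)
c(out_raw = out_raw, out_log = out_log)
```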

Question 2.

In the same dataset, the variable Oil describes whether each country in the dataset is an oil exporter (=1) or not (=0). The variable war describes how many years from 1945 to 1999 that country had a civil war.

We are interested in whether the number of years in civil war (war) is on average the same for oil exporters as for oil non-exporters. Even though war is technically a count variable, we will treat it as an essentially continuous variable, and work with means.

ow.data.zero=subset(fl2, select = c("Oil", "war"), subset = (Oil=="0"))

ow.data.one=subset(fl2, select = c("Oil", "war"), subset = (Oil=="1"))

(2a) Let xbar_oil and xbar_noil be the mean of war for the oil exporters and oil non-exporters, respectively. Compute the difference in means and save it as a variable.

xbar_oil=mean(ow.data.one$war)

xbar_noil=mean(ow.data.zero$war)

mean.diff= xbar_oil-xbar_noil

mean.diff

## [1] 0.4758454
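An alternative to building two subsets is to compute both group means in one call with tapply(). This is only a sketch on a hypothetical toy data frame; the variable names Oil and war mirror the dataset, but the values are made up.

```r
# Toy data: Oil is a 0/1 indicator, war is a count of war years (hypothetical)
toy <- data.frame(
  Oil = c(0, 0, 0, 1, 1),
  war = c(0, 4, 2, 6, 0)
)

# Group means in one call; the result is named by the Oil values "0" and "1"
group_means <- tapply(toy$war, toy$Oil, mean)

xbar_noil <- unname(group_means["0"])
xbar_oil  <- unname(group_means["1"])
mean_diff <- xbar_oil - xbar_noil

group_means
mean_diff
```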

(2b) We want to know whether the means of war for oil exporters and oil non-exporters are the same or different. Write the null hypothesis and alternative hypothesis corresponding to this question.

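H0: \(\mu_{oil} - \mu_{noil} = 0\) (the population mean of war is the same for oil exporters and non-exporters). H1: \(\mu_{oil} - \mu_{noil} \neq 0\). Here \(\mu_{oil}\) and \(\mu_{noil}\) are the population means; mean.diff is the sample estimate of their difference.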

(2c) Construct and estimate the appropriate z-statistic. Please first construct the numerator, then the denominator, then the z-statistic itself.

Numerator: mean.diff: (xbar_oil - xbar_noil) Denominator: SE.mean.diff: sqrt((var(ow.data.one$war)/18)+(var(ow.data.zero$war)/138))

Z statistic

SE.mean.diff=sqrt((var(ow.data.one$war)/18)+(var(ow.data.zero$war)/138))

mean.diff/SE.mean.diff

## [1] 0.1700467
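The same calculation can be wrapped in a small function that takes the group sizes from the data with length() instead of typing in 18 and 138 by hand. The function name z_two_sample and the toy samples are hypothetical, introduced only for this sketch.

```r
# Two-sample z-statistic for a difference in means (unequal variances)
z_two_sample <- function(x, y) {
  numerator   <- mean(x) - mean(y)
  denominator <- sqrt(var(x) / length(x) + var(y) / length(y))
  numerator / denominator
}

# Hypothetical toy samples
x <- c(2, 4, 6, 8)
y <- c(1, 2, 3, 4, 5)

z <- z_two_sample(x, y)
z
```

Applied to the data above, the call would be z_two_sample(ow.data.one$war, ow.data.zero$war).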

(2d) Compare the z-statistic to the critical values necessary to determine if you can reject the null hypothesis at the 0.05 level and at the 0.10 level.

The z-statistic, 0.1700467, is well below both two-sided critical values, 1.96 (0.05 level) and 1.645 (0.10 level), so we cannot reject the null hypothesis at either level.
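The critical values do not need to be memorized; they can be recovered from the standard normal quantile function qnorm(). A short sketch, using the z-statistic reported above:

```r
# Two-sided critical values of the standard normal
crit_05 <- qnorm(1 - 0.05 / 2)   # about 1.96
crit_10 <- qnorm(1 - 0.10 / 2)   # about 1.645

z <- 0.1700467                   # z-statistic reported above

# Reject only if |z| exceeds the critical value
reject_05 <- abs(z) > crit_05
reject_10 <- abs(z) > crit_10

c(crit_05 = crit_05, crit_10 = crit_10)
c(reject_05 = reject_05, reject_10 = reject_10)
```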

(2e) Estimate the actual p-value. Provide a carefully-worded statement that correctly states what this p-value means.

pnorm(0.1700467)

## [1] 0.5675133

1-pnorm(0.1700467)

## [1] 0.4324867

2*(1-pnorm(0.1700467))

## [1] 0.8649734
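Of the three numbers above, only the last is the two-sided p-value that matches the two-sided alternative. A compact way to compute it directly, which also works when z is negative, is sketched here:

```r
z <- 0.1700467

# Two-sided p-value: probability, under the null, of |Z| at least as large as |z|
p_two_sided <- 2 * pnorm(-abs(z))
p_two_sided
```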

(2f) Carefully state the conclusion you make about the hypotheses, including whether or not you reject the null hypothesis.

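Under the null hypothesis, there is an 86.5% chance that we would get a result as extreme as or more extreme than the one observed. Because this p-value is far above both 0.05 and 0.10, we fail to reject the null hypothesis: the data give no evidence that the mean number of civil war years differs between oil exporters and non-exporters.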

(2g) Under the null hypothesis, what is the distribution of the z-statistic?

Under the null hypothesis, the z-statistic is approximately standard normal, \(N(0, 1)\), by the central limit theorem. It is a continuous distribution centered at 0 because the null hypothesis states that there is no difference in means. (The observed z-statistic is 0.170; 0.865 is the two-sided p-value, not the z-statistic.)

Bonus: (2h) Suppose that rather than computing the z-statistic, we think about the distribution of the difference in means (i.e. xbar_oil-xbar_noil under the null). What is the distribution of this difference in means under the null?

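A continuous distribution: approximately normal, centered at 0, with standard deviation equal to the standard error of the difference in means (estimated by SE.mean.diff above).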
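One way to build intuition for the null distribution is to simulate it. The sketch below draws both groups from the same population (so the null is true by construction), recomputes the z-statistic many times, and checks that the results are centered near 0 with spread near 1. The group sizes 18 and 138 come from the standard error above; the choice of normal draws and 5000 replications is an assumption made for illustration.

```r
set.seed(123)

n_oil  <- 18    # group sizes used in the standard error above
n_noil <- 138

# Simulate the z-statistic many times when both groups share the same mean
sim_z <- replicate(5000, {
  x <- rnorm(n_oil)
  y <- rnorm(n_noil)
  (mean(x) - mean(y)) / sqrt(var(x) / n_oil + var(y) / n_noil)
})

z_mean <- mean(sim_z)
z_sd   <- sd(sim_z)
c(mean = z_mean, sd = z_sd)

# Share of simulated |z| beyond 1.96: near 0.05 if N(0, 1) is a good
# approximation; with a group as small as 18 it can come out somewhat higher
mean(abs(sim_z) > 1.96)
```

A histogram of sim_z, for example hist(sim_z), gives a visual check against the standard normal bell shape.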

Question 3.

Suppose you have a random variable \(X\) with expectation \(E[X]=u\), and variance given by \(s^2\).

(3a) What is \(SD(X)\)?

\(SD(X) = \sqrt{s^2} = s\) (taking \(s \ge 0\))

(3b) What is \(Var(\frac{1}{a}X)\) for a constant \(a\)?

\(Var(\frac{1}{a}X)\) = \(\frac{1}{a^2}Var(X)\)

(3c) What is \(Var(\frac{1}{SD(X)} X)\)?

= \(\frac{1}{SD(X)^2}Var(X)\)

= \(\frac{1}{Var(X)}Var(X)\)

= 1
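The results in (3b) and (3c) can be checked numerically, since the sample variance obeys the same scaling rule. A quick sketch on an arbitrary simulated sample (the distribution and the constant a are assumptions chosen for illustration):

```r
set.seed(42)
x <- rnorm(500, mean = 3, sd = 2)   # arbitrary simulated sample
a <- 4                              # arbitrary nonzero constant

# (3b): Var(X / a) equals Var(X) / a^2
check_3b <- all.equal(var(x / a), var(x) / a^2)

# (3c): dividing by the standard deviation gives variance 1
check_3c <- all.equal(var(x / sd(x)), 1)

check_3b
check_3c
```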

Midterm (Due 9 Feb 2016)

Nidirah Stephens, PS6

5 Feb, 2016
