getwd()
## [1] "C:/Users/Matthew01/Documents/PS15/Problemset2"
setwd("/Users/Matthew01/Documents/PS15/Problemset2/")

Question 1:

load("fl3.RData")
dim(fl3)
## [1] 156  17
names(fl3)
##  [1] "cname"    "year"     "pop1"     "lpopl1"   "warl"     "war"     
##  [7] "gdpenl"   "lmtnest"  "ncontig"  "Oil"      "nwstate"  "instab"  
## [13] "polity2l" "ethfrac"  "relfrac"  "war_prop" "numyears"

There are 17 variables and 156 countries, so the sample size is 156.

median(fl3$pop1)
## [1] 4517.001
mean(fl3$pop1)
## [1] 17586.24

Median: 4517.001 Mean: 17586.24

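The mean and median are not the same because the distribution is not symmetric. The mean is much larger than the median, which indicates the data are skewed right: a few very large populations pull the mean upward while the median stays near the typical country.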
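As a rough illustration of why a right skew separates the two statistics, here is a minimal sketch on a small made-up vector (the values are hypothetical and are not taken from fl3):

```r
# Hypothetical right-skewed sample: seven small values and one very large one
x <- c(1, 2, 2, 3, 3, 4, 5, 100)

mean_x   <- mean(x)    # pulled upward by the single large value
median_x <- median(x)  # middle of the sorted values, barely affected by it

# In a right-skewed sample like this one the mean exceeds the median
mean_x > median_x
```

The same comparison on fl3$pop1 gives the gap reported above.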

boxplot(fl3$pop1, main = "Boxplot of Population Distribution",
        ylab = "Population")
# boxplot() draws vertically by default, so the reference lines must be horizontal (h =)
abline(h = mean(fl3$pop1), lty = 2, lwd = 2, col = "red")
abline(h = median(fl3$pop1), lty = 4, lwd = 2, col = "blue")
legend("topright", legend = c("mean", "median"), lty = c(2, 4), col = c("red", "blue"))

plot(density(fl3$pop1), main = "Density Distribution of Population", xlab = "Population", ylab = "Density")
abline(v = mean(fl3$pop1), lty = 2, lwd = 2, col = "red")
abline(v = median(fl3$pop1), lty = 4, lwd = 2, col = "blue")
legend("topright", legend = c("mean", "median"), lty = c(2,4), col = c("red","blue"))

logpop1 <- log(fl3$pop1)
plot(density(logpop1), main = "Density Distribution of Log Population", xlab = "Log Population", ylab = "Density")
abline(v = mean(logpop1), lty = 2, lwd = 2, col = "red")
abline(v = median(logpop1), lty = 4, lwd = 2, col = "blue")
legend("topright", legend = c("mean", "median"), lty = c(2,4), col = c("red","blue"))

After the log transformation the distribution looks much closer to normal, and the mean and median move much closer together. Taking the log compresses the very large values, which removes most of the right skew.
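The effect of the log can be checked numerically as well as visually. The sketch below uses simulated log-normal data as a stand-in for a right-skewed variable; the parameters are assumptions chosen for illustration, not estimates from fl3.

```r
set.seed(1)

# Simulated right-skewed data (hypothetical stand-in for a population variable)
x <- rlnorm(1000, meanlog = 8, sdlog = 1.5)

# Gap between mean and median, scaled by the standard deviation
gap_raw <- abs(mean(x) - median(x)) / sd(x)

lx <- log(x)
gap_log <- abs(mean(lx) - median(lx)) / sd(lx)

# The scaled gap is smaller after the log transformation
c(raw = gap_raw, logged = gap_log)
```

Scaling the gap by the standard deviation makes the raw and logged versions comparable even though they are on different scales.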

YesOil <- fl3[which(fl3$Oil == 1), ]
NoOil <- fl3[which(fl3$Oil == 0), ]
mean(YesOil$war)
## [1] 6.055556
mean(NoOil$war)
## [1] 5.57971
sd(YesOil$war)
## [1] 11.25884
sd(NoOil$war)
## [1] 10.43003

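Mean of war for countries with oil: 6.056. Mean of war for countries without oil: 5.580. Standard deviation with oil: 11.259. Standard deviation without oil: 10.430. The means and standard deviations of the two groups are close, so there is little difference between them in either level or variation. In both groups the standard deviation is large relative to the mean, so the values vary widely.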
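The same group comparison can be done in one step without creating separate subsets. This is a minimal sketch on a toy data frame; the column names mirror fl3 but the values are made up for illustration.

```r
# Toy data frame with hypothetical values (not the fl3 data)
toy <- data.frame(
  Oil = c(1, 1, 0, 0, 0),
  war = c(4, 8, 0, 3, 9)
)

# tapply() splits war by the Oil indicator and applies the function to each group
group_means <- tapply(toy$war, toy$Oil, mean)
group_sds   <- tapply(toy$war, toy$Oil, sd)

group_means
group_sds
```

With the real data, replacing toy with fl3 should reproduce the four numbers above, assuming Oil is coded 0/1 as in the subsetting code.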

min(fl3$ethfrac)
## [1] 0.001
max(fl3$ethfrac)
## [1] 0.9250348
mean(fl3$ethfrac)
## [1] 0.4082564
sd(fl3$ethfrac)
## [1] 0.2798512

min = 0.001, max = 0.925, mean = 0.408, sd = 0.280. The variable ranges from 0 to 1 because it is measured as a proportion: 0 corresponds to 0% and 1 corresponds to 100%.

model1 <- lm(war ~ ethfrac, data = fl3)
plot(fl3$ethfrac, fl3$war, xlab = "Ethnic Fraction", ylab = "Total Wars", main = "The Relationship Between Ethnic Fraction and War")
abline(model1, col = "blue")

The dependent variable is total wars (war) and the independent variable is the ethnic fraction (ethfrac). The fitted line suggests a positive relationship between ethnic fraction and wars.

Question 2:

1. It is called the expected value: $\frac{1}{N} \cdot N \cdot E[X] = E[X]$.

2. Standard deviation: $\sigma = \left( \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \right)^{1/2}$. The standard deviation is the square root of the variance.
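One practical point about the formula above: it divides by N, while R's built-in sd() divides by N - 1 (the sample standard deviation). A short sketch on a made-up vector shows the difference:

```r
# Hypothetical values chosen so the arithmetic is easy to follow
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
N <- length(x)
mu <- mean(x)

# Population formula: square root of the average squared deviation (divide by N)
pop_sd <- sqrt(sum((x - mu)^2) / N)

# R's sd() uses N - 1 in the denominator, so it is slightly larger
sample_sd <- sd(x)

c(population = pop_sd, sample = sample_sd)
```

For a sample of 156 the two versions differ only slightly, but the distinction matters for small N.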

Question 3:

1. The estimand is the population quantity you are trying to estimate. Ex: the average weight of all women. The estimator is the method (a rule or formula) used to estimate the estimand. Ex: the sample mean. The estimate is the numerical value the estimator produces when it is applied to a particular sample. Ex: the average weight of the women in one sample.
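The three terms can be told apart in a small simulation. Everything here is hypothetical: the population is simulated, and the names estimand, estimator, and estimate are just labels for this sketch.

```r
set.seed(42)

# Hypothetical population of 10,000 weights (made-up parameters)
population <- rnorm(10000, mean = 70, sd = 12)

# Estimand: the population quantity of interest (normally unknown)
estimand <- mean(population)

# Estimator: the rule applied to any sample, here the sample mean
estimator <- function(s) mean(s)

# Estimate: the number the rule returns for one particular sample
smp <- sample(population, 100)
estimate <- estimator(smp)

c(estimand = estimand, estimate = estimate)
```

Drawing a different sample would give a different estimate, while the estimand and the estimator stay the same.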

The larger the sample you draw, the closer the sampling distribution of the sample mean will be to normal and the lower its variance will be.
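A quick simulation can illustrate how the spread of the sample mean changes with sample size. This is a sketch under assumed settings: the exponential distribution, the sample sizes, and the helper name sample_means are all choices made for illustration.

```r
set.seed(123)

# Hypothetical helper: draw `reps` samples of size n from a skewed
# (exponential) distribution and return the mean of each sample
sample_means <- function(n, reps = 2000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_small <- sample_means(5)
means_large <- sample_means(100)

# The variance of the sample mean shrinks as n grows
var_small <- var(means_small)
var_large <- var(means_large)
c(n5 = var_small, n100 = var_large)
```

Plotting density(means_small) and density(means_large) should also show the larger-sample means looking more symmetric and bell-shaped.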