getwd()
## [1] "C:/Users/Matthew01/Documents/PS15/ProblemSet3"
setwd("/Users/Matthew01/Documents/PS15/ProblemSet3/")
load("fl3.Rdata")
load("Tempdata.Rdata")

A. B. C. G.

model1 <- lm(fl3$gdpenl ~ fl3$polity2l, data = fl3)
plot(fl3$polity2l,fl3$gdpenl, 
     main="Level of Democracy versus GDP", 
     xlab = "How Democratic is the country", 
     ylab = "GDP in Thousands of Dollars")
abline(model1, col = "blue")

The independent variable is the level of democracy or autocracy in the country (polity2l). The dependent variable is GDP (gdpenl). The points line up in vertical columns because the X variable takes only whole-number values, with no decimals.

cov(fl3$gdpenl, fl3$polity2l)
## [1] -0.5289266
cor(fl3$gdpenl, fl3$polity2l)
## [1] -0.01359485

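The covariance is -0.529. Its sign is negative, but its size depends on the units of the two variables, so it does not show how strong the relationship is. The correlation is -0.014, which is very close to 0 and shows that the two variables have almost no linear relationship.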
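Covariance is sensitive to units of measurement while correlation is not. A minimal sketch with made-up numbers (the vectors `x` and `y` are illustrative and are not taken from `fl3`) shows that rescaling one variable changes the covariance but leaves the correlation unchanged:

```r
# Illustrative data only; these values are NOT from fl3.
x <- c(-7, -3, 0, 4, 8, 10)
y <- c(1.2, 0.8, 2.5, 1.9, 3.1, 2.7)

# Covariance and correlation in the original units
cov_original <- cov(x, y)
cor_original <- cor(x, y)

# Rescale y (for example, dollars instead of thousands of dollars)
cov_rescaled <- cov(x, y * 1000)
cor_rescaled <- cor(x, y * 1000)

# The covariance is multiplied by 1000; the correlation is the same.
print(c(cov_original, cov_rescaled))
print(c(cor_original, cor_rescaled))
```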

E. Yi = B0 + B1*Xi + Ei, where B0 is the Y-intercept (the value of Y when X is 0), B1 is the slope (the change in Y for a one-unit change in X), and Ei is the error term.

F. The OLS estimate of B1 is Cov(X, Y) / Var(X). This is the slope that minimizes the SSE (sum of squared errors), which is why it gives the best-fitting line. Under the standard OLS assumptions it is also an efficient estimate.
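The slope formula can be checked against `lm()` directly. The sketch below uses simulated data (the variables `x` and `y` and the true coefficients 2 and 0.5 are assumptions chosen for illustration) and compares the hand-computed estimates with the fitted ones:

```r
# Simulated data for illustration; not the fl3 data set.
set.seed(1)
x <- rnorm(50)
y <- 2 + 0.5 * x + rnorm(50)

# Slope and intercept from the textbook formulas
b1_formula <- cov(x, y) / var(x)
b0_formula <- mean(y) - b1_formula * mean(x)

# Same quantities estimated by lm()
fit <- lm(y ~ x)

print(coef(fit))
print(c(b0_formula, b1_formula))
```

The two printed pairs should agree up to floating-point rounding.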

model1 <- lm(fl3$gdpenl ~ fl3$polity2l, data = fl3)
summary(model1)
## 
## Call:
## lm(formula = fl3$gdpenl ~ fl3$polity2l, data = fl3)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.508 -1.812 -1.395  0.215 51.353 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.46268    0.44439   5.542 1.27e-07 ***
## fl3$polity2l -0.01069    0.06339  -0.169    0.866    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.55 on 154 degrees of freedom
## Multiple R-squared:  0.0001848,  Adjusted R-squared:  -0.006307 
## F-statistic: 0.02847 on 1 and 154 DF,  p-value: 0.8662

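The coefficient estimates are B0 (the intercept) and B1 (the slope). The intercept, 2.46, is the predicted GDP (in thousands of dollars) when polity2l is 0. The slope, -0.0107, means that a one-unit increase in polity2l is associated with a decrease in GDP of about 0.01 (thousand dollars). The slope is not statistically significant (p = 0.866).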

H. I.

LogGdp <- log(fl3$gdpenl)
model2 <- lm(LogGdp ~ fl3$polity2l, data = fl3)
plot(fl3$polity2l, LogGdp, 
     main="Level of Democracy versus Log GDP", 
     xlab = "How Democratic is the country", 
     ylab = "Log of GDP (Thousands of Dollars)")
# Draw the fitted line from the log model (model2), not model1
abline(model2, col = "red")

J.

model3 <- lm(LogGdp ~ fl3$polity2l, data = fl3)
summary(model3)
## 
## Call:
## lm(formula = LogGdp ~ fl3$polity2l, data = fl3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7810 -0.6423 -0.0246  0.6165  4.1333 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.24383    0.07981   3.055  0.00265 ** 
## fl3$polity2l  0.04875    0.01138   4.283 3.24e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9967 on 154 degrees of freedom
## Multiple R-squared:  0.1064, Adjusted R-squared:  0.1006 
## F-statistic: 18.34 on 1 and 154 DF,  p-value: 3.237e-05

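The results are now much clearer. With logged GDP the slope is positive (0.049) and statistically significant (p < 0.001), and R-squared rises from about 0.0002 to 0.106. Taking the log reduces the influence of the few countries with very high GDP.

K. The data are not strong enough to support a causal claim. The fact that X and Y are correlated does not rule out an outside influence that also affects Y. The error term has also not been evaluated.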
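Because the dependent variable is a natural log (R's `log()` uses base e by default), the slope can be read as an approximate percentage change in GDP. A short sketch, using the slope 0.04875 reported in the summary output and assuming the usual log-level interpretation applies:

```r
# Slope on polity2l from the log(GDP) regression output
b1_log <- 0.04875

# Percentage change in GDP associated with a one-unit increase in polity2l
pct_change <- (exp(b1_log) - 1) * 100
print(round(pct_change, 2))
```

This describes an association only; it does not turn the estimate into a causal effect.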

load("tempdata.Rdata")
range(tempdata$temp)
## [1] 61.6 74.0
Temphigh1 <- tempdata[which(tempdata$temp>73.9),]
summary(Temphigh1)
##       year           temp   
##  Min.   :2015   Min.   :74  
##  1st Qu.:2015   1st Qu.:74  
##  Median :2015   Median :74  
##  Mean   :2015   Mean   :74  
##  3rd Qu.:2015   3rd Qu.:74  
##  Max.   :2015   Max.   :74
TempLow1 <- tempdata[which(tempdata$temp<61.7),]
summary(TempLow1)
##       year           temp     
##  Min.   :1946   Min.   :61.6  
##  1st Qu.:1946   1st Qu.:61.6  
##  Median :1946   Median :61.6  
##  Mean   :1946   Mean   :61.6  
##  3rd Qu.:1946   3rd Qu.:61.6  
##  Max.   :1946   Max.   :61.6

2015 was the hottest year (74.0) and 1946 was the coldest year (61.6).

B.
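An alternative to filtering with hard-coded cutoffs is to locate the extreme rows with `which.max()` and `which.min()`. The sketch below uses a small hypothetical data frame `temps` with the same column names as `tempdata`; the values are made up:

```r
# Hypothetical stand-in for tempdata (same column names, invented values)
temps <- data.frame(year = c(2011, 2012, 2013, 2014),
                    temp = c(66.1, 69.4, 64.8, 67.0))

# Row with the highest and the lowest temperature
hottest <- temps[which.max(temps$temp), ]
coldest <- temps[which.min(temps$temp), ]

print(hottest$year)
print(coldest$year)
```

Note that both functions return only the first match, so ties would need separate handling.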

# Regress temperature on year and plot the fitted trend line
model3 <- lm(temp ~ year, data = tempdata)
plot(tempdata$year, tempdata$temp, 
     main="Temperature by Year", 
     xlab = "Years", 
     ylab = "Temperature in Degrees Fahrenheit")
abline(model3, col = "blue")

It shows us that the temperature is gradually increasing over time.

Seventies <- tempdata[which(tempdata$year <= 1979 & tempdata$year >= 1970),]  
Eighties <- tempdata[which(tempdata$year <= 1989 & tempdata$year >= 1980),] 
Nineties <- tempdata[which(tempdata$year <= 1999 & tempdata$year >= 1990),] 
Noughts <- tempdata[which(tempdata$year <= 2009 & tempdata$year >= 2000),] 
LateTwenties <- tempdata[which(tempdata$year <= 2015 & tempdata$year >= 2010),] 
mean(Seventies$temp)
## [1] 66.42
mean(Eighties$temp)
## [1] 67.01
mean(Nineties$temp)
## [1] 67.24
mean(Noughts$temp)
## [1] 65.38
mean(LateTwenties$temp)
## [1] 68.13333

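The mean rises from 66.42 in the Seventies to 67.24 in the Nineties, dips to 65.38 in the Noughts, and reaches 68.13 for 2010 to 2015. The overall trend is upward, but it is not a steady increase.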

sd(Seventies$temp)
## [1] 1.309623
sd(Eighties$temp)
## [1] 1.610003
sd(Nineties$temp)
## [1] 1.594574
sd(Noughts$temp)
## [1] 1.842583
sd(LateTwenties$temp)
## [1] 3.583109

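The standard deviation generally increases from decade to decade (1.31, 1.61, 1.59, 1.84), with a small dip in the Nineties, and is much larger for LateTwenties (3.58). The standard deviation measures how much the yearly temperatures vary around the group mean, so LateTwenties contains more unusually hot and cold years than the other subsets. That subset covers only six years (2010 to 2015), so its standard deviation is estimated from fewer observations.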
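The per-decade means and standard deviations can also be computed in one pass by adding a decade column and grouping on it. This is a sketch on a hypothetical data frame `temps` (invented values, same column names as `tempdata`); it also counts the years in each group, which matters when a group such as 2010 to 2015 is shorter than a full decade:

```r
# Hypothetical stand-in for tempdata: a smooth upward trend, invented values
temps <- data.frame(year = 1970:1989)
temps$temp <- 66 + 0.05 * (temps$year - 1970)

# Label each year with the decade it belongs to (1970, 1980, ...)
temps$decade <- floor(temps$year / 10) * 10

# Mean, standard deviation, and number of years per decade
decade_means  <- tapply(temps$temp, temps$decade, mean)
decade_sds    <- tapply(temps$temp, temps$decade, sd)
decade_counts <- tapply(temps$temp, temps$decade, length)

print(decade_means)
print(decade_sds)
print(decade_counts)
```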

E. It is important because it shows how inconsistent the temperature is. A high standard deviation means that temperatures often fall far from the average.

  1. A. Did the butterfly ballot lead to more votes for Buchanan? B. The dependent variable is the proportion of votes for Buchanan. The independent variable is the number of votes cast.

C. Palm Beach is an outlier. D. The main finding is that the butterfly ballot did influence votes. E.

  1. Correlation(X, Y) = Covariance(X, Y) / (Standard deviation of X * Standard deviation of Y), and Covariance(X, Y) = Correlation(X, Y) * Standard deviation of X * Standard deviation of Y. Correlation normalizes the covariance, so it always falls between -1 and 1. B. A random variable is a variable whose value is determined by the outcome of a random process, so it depends on chance rather than on human choice.
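The relationship between covariance and correlation can be confirmed numerically. A minimal sketch with invented vectors `x` and `y`:

```r
# Invented values for illustration
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Correlation rebuilt from the covariance and the two standard deviations
cor_from_cov <- cov(x, y) / (sd(x) * sd(y))

# Covariance rebuilt from the correlation and the two standard deviations
cov_from_cor <- cor(x, y) * sd(x) * sd(y)

print(c(cor_from_cov, cor(x, y)))
print(c(cov_from_cor, cov(x, y)))
```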

C. Linear regression fits a straight line through a data set. The intercept and slope of the line are chosen to minimize the SSE, the sum of squared vertical distances between the data points and the line. One example is the relationship between voting and age: the fitted slope shows the average change in the likelihood of voting for each additional year of age, and a positive slope would show that older people are more likely to vote.
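The idea that the fitted line minimizes the SSE can be illustrated with simulated data. In this sketch the variables `age` and `turnout`, and the relationship between them, are invented for the example; the helper `sse()` is a hypothetical function defined here, not part of R:

```r
# Simulated voting-and-age data; the relationship is made up for illustration
set.seed(42)
age <- runif(100, 18, 80)
turnout <- 0.2 + 0.008 * age + rnorm(100, sd = 0.1)

fit <- lm(turnout ~ age)

# Sum of squared errors for any candidate intercept b0 and slope b1
sse <- function(b0, b1) sum((turnout - (b0 + b1 * age))^2)

# SSE at the OLS estimates versus slightly steeper and flatter lines
sse_ols     <- sse(coef(fit)[1], coef(fit)[2])
sse_steeper <- sse(coef(fit)[1], coef(fit)[2] + 0.001)
sse_flatter <- sse(coef(fit)[1], coef(fit)[2] - 0.001)

print(c(sse_ols, sse_steeper, sse_flatter))
```

Any line other than the fitted one gives a larger SSE, which is what "least squares" refers to.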