getwd()
## [1] "C:/Users/Matthew01/Documents/PS15/ProblemSet3"
setwd("/Users/Matthew01/Documents/PS15/ProblemSet3/")
load("fl3.Rdata")
load("Tempdata.Rdata")
A. B. C. G.
model1 <- lm(fl3$gdpenl ~ fl3$polity2l, data = fl3)
plot(fl3$polity2l,fl3$gdpenl,
main="Level of Democracy versus GDP",
xlab = "How Democratic is the country",
ylab = "GDP in Thousands of Dollars")
abline(model1, col = "blue")
The independent variable (X) is the country's level of democracy versus autocracy, polity2l. The dependent variable (Y) is GDP, gdpenl. The points line up in vertical columns because the Polity score takes only whole-number values (from -10 to 10), so there are no values in between.
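The column pattern can make overlapping points hard to see. A minimal sketch with simulated integer scores (`polity` and `gdp` are hypothetical, not fl3) uses base R's `jitter()` to nudge X slightly for display only, while the regression line is still fit on the original whole-number values.

```r
# Hypothetical data: integer Polity-style scores from -10 to 10
set.seed(42)
polity <- sample(-10:10, 150, replace = TRUE)
gdp <- rexp(150, rate = 0.4)   # skewed, GDP-like values (invented)

# Add uniform noise of at most 0.3 to each score, for plotting only
polity_jit <- jitter(polity, amount = 0.3)

plot(polity_jit, gdp,
     main = "Jittered scatterplot (simulated data)",
     xlab = "Polity score (jittered)",
     ylab = "Simulated GDP")
# The regression is fit on the original, un-jittered scores
abline(lm(gdp ~ polity), col = "blue")
```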
cov(fl3$gdpenl, fl3$polity2l)
## [1] -0.5289266
cor(fl3$gdpenl, fl3$polity2l)
## [1] -0.01359485
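The covariance is -0.529. Its negative sign means Y tends to fall slightly as X rises, but covariance depends on the units of both variables, so its size alone says little about how strong the relationship is. The correlation rescales the covariance to lie between -1 and 1; at -0.014 it is essentially zero, which shows that the two variables have almost no linear relationship.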
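The link between the two statistics can be checked directly. In this minimal sketch with made-up vectors `x` and `y` (not fl3), the correlation equals the covariance divided by the product of the standard deviations, and rescaling `y` from thousands of dollars to dollars multiplies the covariance by 1000 while leaving the correlation unchanged.

```r
# Hypothetical vectors (not from fl3)
x <- c(-7, -3, 0, 2, 5, 8, 10)
y <- c(1.2, 0.8, 2.5, 1.9, 3.1, 2.2, 4.0)

# Correlation is covariance rescaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))

# Covariance depends on units; correlation does not
cov_thousands <- cov(x, y)
cov_dollars <- cov(x, y * 1000)   # same data, y in dollars instead of thousands
print(c(cov_thousands = cov_thousands, cov_dollars = cov_dollars,
        cor_dollars = cor(x, y * 1000)))
```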
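E. $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, where $\beta_0$ is the Y-intercept (the expected value of Y when X is 0), $\beta_1$ is the slope (the expected change in Y for a one-unit increase in X), and $\epsilon_i$ is the error term for observation $i$.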
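F. The OLS estimate of the slope is $\hat{\beta}_1 = \operatorname{Cov}(X, Y) / \operatorname{Var}(X)$, with intercept $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$. These are the values that minimize the sum of squared errors (SSE), $\sum_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$, and under the Gauss-Markov assumptions OLS is the best linear unbiased estimator.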
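As a check on the formula in F, the sketch below uses simulated data (the vectors `x`, `y` and the helper `sse()` are hypothetical, not from fl3) to compute $\hat{\beta}_1$ and $\hat{\beta}_0$ by hand, confirm they match `lm()`, and show that nudging either estimate raises the SSE.

```r
# Simulated data (hypothetical): true intercept 2, true slope 0.5
set.seed(1)
x <- runif(50, -10, 10)
y <- 2 + 0.5 * x + rnorm(50)

# OLS by hand: slope = Cov(X, Y) / Var(X), intercept = mean(Y) - slope * mean(X)
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

# Same estimates from lm()
fit <- lm(y ~ x)
print(rbind(by_hand = c(b0, b1), lm = unname(coef(fit))))

# Sum of squared errors for any candidate intercept a and slope b
sse <- function(a, b) sum((y - a - b * x)^2)
# Moving away from the OLS estimates increases the SSE
print(c(ols = sse(b0, b1),
        slope_nudged = sse(b0, b1 + 0.05),
        intercept_nudged = sse(b0 + 0.05, b1)))
```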
model1 <- lm(fl3$gdpenl ~ fl3$polity2l, data = fl3)
summary(model1)
##
## Call:
## lm(formula = fl3$gdpenl ~ fl3$polity2l, data = fl3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.508 -1.812 -1.395 0.215 51.353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.46268 0.44439 5.542 1.27e-07 ***
## fl3$polity2l -0.01069 0.06339 -0.169 0.866
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.55 on 154 degrees of freedom
## Multiple R-squared: 0.0001848, Adjusted R-squared: -0.006307
## F-statistic: 0.02847 on 1 and 154 DF, p-value: 0.8662
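The intercept (2.46) and the coefficient on fl3$polity2l (-0.011) are the estimates of $\beta_0$ and $\beta_1$. When polity2l increases by 1, predicted GDP (gdpenl) decreases by about 0.011, roughly 11 dollars since GDP is measured in thousands of dollars. The intercept, 2.46, is the predicted GDP for a country with a polity2l score of 0. The slope has a p-value of 0.866 and the R-squared is 0.0002, so this model gives no evidence of a linear relationship between democracy and GDP.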
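To make the interpretation concrete, the sketch below plugs the estimates printed by `summary(model1)` into the fitted line for Polity scores of -10, 0 and 10. It uses only the copied coefficients, so it runs without fl3.

```r
# Coefficients copied from summary(model1) above
b0 <- 2.46268    # intercept: predicted gdpenl when polity2l = 0
b1 <- -0.01069   # slope: change in gdpenl per one-point rise in polity2l

# Predicted gdpenl at the ends and middle of the Polity scale
polity_values <- c(-10, 0, 10)
predicted <- b0 + b1 * polity_values
print(data.frame(polity2l = polity_values, predicted_gdpenl = predicted))
```

Moving from the most autocratic to the most democratic score changes predicted GDP by only about 0.21, consistent with a slope that is statistically indistinguishable from zero.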
H. I.
LogGdp <- log(fl3$gdpenl)
model2 <- lm(LogGdp ~ fl3$polity2l, data = fl3)
plot(fl3$polity2l, LogGdp,
main = "Level of Democracy versus Log GDP",
xlab = "How Democratic is the country",
ylab = "Log of GDP in Thousands of Dollars")
abline(model2, col = "red")
J.
model3 <- lm(LogGdp ~ fl3$polity2l, data = fl3)
summary(model3)
##
## Call:
## lm(formula = LogGdp ~ fl3$polity2l, data = fl3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7810 -0.6423 -0.0246 0.6165 4.1333
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.24383 0.07981 3.055 0.00265 **
## fl3$polity2l 0.04875 0.01138 4.283 3.24e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9967 on 154 degrees of freedom
## Multiple R-squared: 0.1064, Adjusted R-squared: 0.1006
## F-statistic: 18.34 on 1 and 154 DF, p-value: 3.237e-05
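After the log transformation the relationship is much clearer. The slope on polity2l is now 0.049 and highly significant (p < 0.001), and R-squared rises from 0.0002 to 0.106. Logging GDP compresses its long right tail (the residuals of model1 reached 51.4), so a handful of very rich countries no longer dominate the fit. K. The data are not strong enough to support a causal claim. A correlation between X and Y does not rule out an outside variable that affects both, and we have not examined the error term to check whether the regression assumptions hold.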
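Because the dependent variable is now logged, the slope is easiest to read as a percentage change. A minimal sketch using the coefficient copied from `summary(model3)`: a one-point increase in polity2l multiplies predicted GDP by $e^{\hat{\beta}_1}$, which here is roughly a 5% increase.

```r
# Slope on polity2l copied from summary(model3) above (log GDP model)
b1_log <- 0.04875

# In a log(Y) ~ X model, a one-unit rise in X multiplies Y by exp(b1)
pct_change <- 100 * (exp(b1_log) - 1)
print(round(pct_change, 2))   # about 5 percent per Polity point

# For small slopes, 100 * b1 is a close approximation
print(100 * b1_log)
```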
load("tempdata.Rdata")
range(tempdata$temp)
## [1] 61.6 74.0
Temphigh1 <- tempdata[tempdata$temp == max(tempdata$temp), ]
summary(Temphigh1)
## year temp
## Min. :2015 Min. :74
## 1st Qu.:2015 1st Qu.:74
## Median :2015 Median :74
## Mean :2015 Mean :74
## 3rd Qu.:2015 3rd Qu.:74
## Max. :2015 Max. :74
TempLow1 <- tempdata[tempdata$temp == min(tempdata$temp), ]
summary(TempLow1)
## year temp
## Min. :1946 Min. :61.6
## 1st Qu.:1946 1st Qu.:61.6
## Median :1946 Median :61.6
## Mean :1946 Mean :61.6
## 3rd Qu.:1946 3rd Qu.:61.6
## Max. :1946 Max. :61.6
2015 was the hottest year (74.0) and 1946 was the coldest year (61.6). B.
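The cutoffs 73.9 and 61.7 above were read off `range()` by hand. A sketch of a more direct approach uses `which.max()` and `which.min()`; the data frame `tempdata_demo` and its values are invented for illustration, and with the real data the same lines would use `tempdata`.

```r
# Hypothetical stand-in for tempdata (values invented)
tempdata_demo <- data.frame(
  year = 2010:2015,
  temp = c(66.2, 70.1, 64.8, 68.9, 65.0, 74.0)
)

# Year of the largest and smallest temperature (first match if tied)
hottest <- tempdata_demo$year[which.max(tempdata_demo$temp)]
coldest <- tempdata_demo$year[which.min(tempdata_demo$temp)]
print(c(hottest = hottest, coldest = coldest))
```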
model3 <- lm(temp ~ year, data = tempdata)
plot(tempdata$year, tempdata$temp,
main = paste0("Temperatures from ", min(tempdata$year), " to ", max(tempdata$year)),
xlab = "Year",
ylab = "Temperature (degrees Fahrenheit)")
abline(model3, col = "blue")
The fitted line slopes upward, showing that temperature has gradually increased over time.
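To put a number on "gradually", the slope of the temperature regression can be rescaled to degrees per decade. The sketch below simulates a hypothetical series (`years`, `temps`) rather than using tempdata; with the real data, `10 * coef(model3)[2]` gives the change per decade, and `confint(model3)` shows whether the trend is distinguishable from zero.

```r
# Simulated yearly temperatures (hypothetical): slight upward drift plus noise
set.seed(7)
years <- 1946:2015
temps <- 66 + 0.02 * (years - 1946) + rnorm(length(years), sd = 1.5)

trend <- lm(temps ~ years)
slope_per_year <- coef(trend)[["years"]]
slope_per_decade <- 10 * slope_per_year   # degrees per 10 years
print(slope_per_decade)

# 95% confidence interval for the yearly slope
print(confint(trend)["years", ])
```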
Seventies <- tempdata[which(tempdata$year <= 1979 & tempdata$year >= 1970),]
Eighties <- tempdata[which(tempdata$year <= 1989 & tempdata$year >= 1980),]
Nineties <- tempdata[which(tempdata$year <= 1999 & tempdata$year >= 1990),]
Noughts <- tempdata[which(tempdata$year <= 2009 & tempdata$year >= 2000),]
LateTwenties <- tempdata[which(tempdata$year <= 2015 & tempdata$year >= 2010),]
mean(Seventies$temp)
## [1] 66.42
mean(Eighties$temp)
## [1] 67.01
mean(Nineties$temp)
## [1] 67.24
mean(Noughts$temp)
## [1] 65.38
mean(LateTwenties$temp)
## [1] 68.13333
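The decade means rise slightly from the 1970s (66.42) through the 1990s (67.24), dip in the 2000s (65.38), and reach their highest value in 2010-2015 (68.13). The overall direction is upward, but not every decade is warmer than the one before.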
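The five subsets above can also be produced in one step by cutting the year into periods and averaging within each. A minimal sketch with an invented, steadily rising series (`tempdata_demo` is hypothetical); the period breaks mirror the Seventies through LateTwenties subsets.

```r
# Hypothetical yearly data: temperature rises by 0.04 each year
tempdata_demo <- data.frame(
  year = 1970:2015,
  temp = 66 + 0.04 * (0:45)
)

# Intervals are right-closed: (1969, 1979], (1979, 1989], ..., (2009, 2015]
decade <- cut(tempdata_demo$year,
              breaks = c(1969, 1979, 1989, 1999, 2009, 2015),
              labels = c("1970s", "1980s", "1990s", "2000s", "2010-2015"))

# Mean temperature within each period
decade_means <- tapply(tempdata_demo$temp, decade, mean)
print(decade_means)
```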
sd(Seventies$temp)
## [1] 1.309623
sd(Eighties$temp)
## [1] 1.610003
sd(Nineties$temp)
## [1] 1.594574
sd(Noughts$temp)
## [1] 1.842583
sd(LateTwenties$temp)
## [1] 3.583109
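The standard deviation generally rises across the decades, from 1.31 in the 1970s to 3.58 in 2010-2015, although the 1990s (1.59) are slightly below the 1980s (1.61). The standard deviation measures how far each year's temperature typically falls from that period's mean, so the large value for LateTwenties means annual temperatures in 2010-2015 swung more widely around their average, with more unusually hot and unusually cold years, than in earlier decades. LateTwenties covers at most six years, however, so its standard deviation is less precisely estimated than the others.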
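Because the last period is shorter than the others, it helps to report each standard deviation next to its sample size. The sketch below uses simulated temperatures (`years` and `temps` are hypothetical) and the same period breaks as the subsets above.

```r
# Simulated yearly temperatures (hypothetical), one value per year
set.seed(3)
years <- 1970:2015
temps <- 66 + rnorm(length(years), sd = 1.5)

# Same periods as Seventies ... LateTwenties
decade <- cut(years,
              breaks = c(1969, 1979, 1989, 1999, 2009, 2015),
              labels = c("1970s", "1980s", "1990s", "2000s", "2010-2015"))

# Number of years and standard deviation in each period
spread <- data.frame(
  n  = as.vector(table(decade)),
  sd = as.numeric(tapply(temps, decade, sd)),
  row.names = levels(decade)
)
print(spread)
```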
E. The standard deviation matters because it measures how inconsistent the temperature is. A high standard deviation means temperatures often fall far from the average, so unusually hot and unusually cold years are more common.
C. Palm Beach County is an outlier. D. The main finding is that the butterfly ballot did influence votes. E.
C. Linear regression fits a line through a data set. The intercept and slope are chosen to minimize the sum of squared errors (SSE), the squared vertical distances between each data point and the line, so the fitted line deviates from the data as little as possible in that sense. One example is the relationship between age and voting: the slope would show the average change in the likelihood of voting for each additional year of age, and a positive slope would indicate that older people are more likely to vote.
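The age and voting example can be sketched with simulated data. Here `age` and `voted` are hypothetical, with `voted` coded 1 for voters and 0 otherwise, so the slope from `lm()` is the average change in the probability of voting per additional year of age (a linear probability model).

```r
# Simulated survey (hypothetical): ages 18 to 90
set.seed(123)
n <- 5000
age <- sample(18:90, n, replace = TRUE)
# Invented turnout process: older people are more likely to vote
voted <- rbinom(n, size = 1, prob = plogis(-2 + 0.05 * age))

# OLS line through the 0/1 outcomes
turnout_fit <- lm(voted ~ age)
print(coef(turnout_fit))

# Predicted probability of voting at ages 25 and 65
print(predict(turnout_fit, newdata = data.frame(age = c(25, 65))))
```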