This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
AllCountries <- read.csv("/Users/ShelsyChouakong/Downloads/AllCountries.csv")
str(AllCountries)
## 'data.frame': 217 obs. of 26 variables:
## $ Country : chr "Afghanistan" "Albania" "Algeria" "American Samoa" ...
## $ Code : chr "AFG" "ALB" "DZA" "ASM" ...
## $ LandArea : num 652.86 27.4 2381.74 0.2 0.47 ...
## $ Population : num 37.172 2.866 42.228 0.055 0.077 ...
## $ Density : num 56.9 104.6 17.7 277.3 163.8 ...
## $ GDP : int 521 5254 4279 NA 42030 3432 16864 11653 4212 NA ...
## $ Rural : num 74.5 39.7 27.4 12.8 11.9 34.5 75.4 8.1 36.9 56.6 ...
## $ CO2 : num 0.29 1.98 3.74 NA 5.83 1.29 5.74 4.78 1.9 8.41 ...
## $ PumpPrice : num 0.7 1.36 0.28 NA NA 0.97 NA 1.1 0.77 NA ...
## $ Military : num 3.72 4.08 13.81 NA NA ...
## $ Health : num 2.01 9.51 10.73 NA 14.02 ...
## $ ArmedForces : int 323 9 317 NA NA 117 0 105 49 NA ...
## $ Internet : num 11.4 71.8 47.7 NA 98.9 14.3 76 75.8 69.7 97.2 ...
## $ Cell : num 67.4 123.7 111 NA 104.4 ...
## $ HIV : num NA 0.1 0.1 NA NA 1.9 NA 0.4 0.2 NA ...
## $ Hunger : num 30.3 5.5 4.7 NA NA 23.9 NA 3.8 4.3 NA ...
## $ Diabetes : num 9.6 10.1 6.7 NA 8 3.9 13.2 5.5 7.1 11.6 ...
## $ BirthRate : num 32.5 11.7 22.3 NA NA 41.3 16.1 17 13.1 11 ...
## $ DeathRate : num 6.6 7.5 4.8 NA NA 8.4 5.8 7.6 9.7 8.9 ...
## $ ElderlyPop : num 2.6 13.6 6.4 NA NA 2.5 7.2 11.3 11.4 13.6 ...
## $ LifeExpectancy: num 64 78.5 76.3 NA NA 61.8 76.5 76.7 74.8 76 ...
## $ FemaleLabor : num 50.3 55.9 16.4 NA NA 76.4 NA 57.1 55.8 NA ...
## $ Unemployment : num 1.5 13.9 12.1 NA NA 7.3 NA 9.5 17.7 NA ...
## $ Energy : int NA 808 1328 NA NA 545 NA 2030 1016 NA ...
## $ Electricity : int NA 2309 1363 NA NA 312 NA 3075 1962 NA ...
## $ Developed : int NA 1 1 NA NA 1 NA 2 1 NA ...
linear <- lm(LifeExpectancy ~ GDP, data = AllCountries)
summary(linear)
##
## Call:
## lm(formula = LifeExpectancy ~ GDP, data = AllCountries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.352 -3.882 1.550 4.458 9.330
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.842e+01 5.415e-01 126.36 <2e-16 ***
## GDP 2.476e-04 2.141e-05 11.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.901 on 177 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.4304, Adjusted R-squared: 0.4272
## F-statistic: 133.7 on 1 and 177 DF, p-value: < 2.2e-16
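-Based on these results, the intercept (68.42) is the predicted life expectancy for a country with a GDP of 0. That is not a realistic value, so the intercept mainly anchors the line. The slope (2.476e-04) is the change in predicted life expectancy for each one-unit increase in GDP, so an increase of 1,000 in GDP is associated with about 0.25 more years of life expectancy. In other words, countries with a higher GDP tend to have a higher life expectancy, although this association alone does not show that GDP causes it. The R^2 value (0.4304) means that about 43% of the variation in life expectancy across countries is explained by its linear relationship with GDP; a higher R^2 would mean that GDP accounts for more of the differences between countries.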
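To make this interpretation concrete, the sketch below plugs the rounded coefficients printed above into the fitted line, then checks that the "Multiple R-squared" reported by `summary()` is the share of variation explained, 1 - SSE/SST. Because the CSV is not bundled with R, the R-squared check uses R's built-in `cars` data as a stand-in (an assumption for illustration only).

```r
# Rounded coefficients copied from summary(linear) above
b0 <- 68.42      # intercept: predicted life expectancy when GDP = 0
b1 <- 2.476e-04  # slope: change in predicted life expectancy per 1-unit increase in GDP

# Predicted life expectancy at a few illustrative GDP values
gdp  <- c(0, 10000, 50000)
pred <- b0 + b1 * gdp
pred            # 68.420 70.896 80.800

# Change in predicted life expectancy for a 1,000-unit increase in GDP
b1 * 1000       # 0.2476

# R-squared = 1 - SSE/SST, shown on the built-in cars data (stand-in dataset)
fit <- lm(dist ~ speed, data = cars)
sse <- sum(residuals(fit)^2)                 # unexplained variation
sst <- sum((cars$dist - mean(cars$dist))^2)  # total variation
r2_manual <- 1 - sse / sst
c(manual = r2_manual, summary = summary(fit)$r.squared)
```

The same identity holds for `linear`, where 1 - SSE/SST gives the 0.4304 shown in the summary.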
multi <- lm(LifeExpectancy ~ GDP + Health + Internet,
            data = AllCountries)
summary(multi)
##
## Call:
## lm(formula = LifeExpectancy ~ GDP + Health + Internet, data = AllCountries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5662 -1.8227 0.4108 2.5422 9.4161
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.908e+01 8.149e-01 72.499 < 2e-16 ***
## GDP 2.367e-05 2.287e-05 1.035 0.302025
## Health 2.479e-01 6.619e-02 3.745 0.000247 ***
## Internet 1.903e-01 1.656e-02 11.490 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.104 on 169 degrees of freedom
## (44 observations deleted due to missingness)
## Multiple R-squared: 0.7213, Adjusted R-squared: 0.7164
## F-statistic: 145.8 on 3 and 169 DF, p-value: < 2.2e-16
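-The Health coefficient (0.2479) is the estimated change in life expectancy for each one-unit increase in government health spending, holding GDP and Internet constant. The adjusted R^2 (0.7164) is also much larger than that of the model from question 1 (0.4272), so adding Health and Internet explains considerably more of the variation in life expectancy. GDP is no longer significant (p = 0.302) once these variables are included, which suggests that much of its apparent effect in the simple model is shared with them. Because this is an observational association, it suggests, rather than proves, that investment in healthcare and internet access can improve life expectancy.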
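Adjusted R^2 penalizes each extra predictor, which is why it is the fairer number for comparing these two models. As a check, the sketch below recomputes both adjusted values from the printed Multiple R-squared and residual degrees of freedom, using adj = 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors and n = residual df + p + 1 is the number of rows actually used. The helper name `adj_r2` is hypothetical.

```r
# Adjusted R-squared from R-squared, residual df and number of predictors p
adj_r2 <- function(r2, df_resid, p) {
  n <- df_resid + p + 1                  # rows used after dropping NAs
  1 - (1 - r2) * (n - 1) / (n - p - 1)   # note: n - p - 1 equals df_resid
}

# Values copied from summary(linear) and summary(multi)
simple <- adj_r2(0.4304, df_resid = 177, p = 1)  # LifeExpectancy ~ GDP
multi3 <- adj_r2(0.7213, df_resid = 169, p = 3)  # ~ GDP + Health + Internet
round(c(simple = simple, multi = multi3), 4)     # simple 0.4272, multi 0.7164
```

The implied sample sizes (179 and 173) also show that the two models were fit on slightly different sets of countries, because rows with a missing Health or Internet value are dropped.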
model <- lm(LifeExpectancy ~ GDP, data = AllCountries)
plot(fitted(model), residuals(model),
     main = "Residual Plot",
     xlab = "Predicted Values",
     ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero residual
hist(residuals(model),
     main = "Histogram of Residuals",
     xlab = "Residuals")
-To check homoscedasticity, I would plot the residuals against the fitted values and see whether their vertical spread stays roughly constant across the range of predictions. Ideally the points form a random band around zero with no clear pattern, such as a funnel shape or a curve. To check the normality of the residuals, I use the histogram to see whether it is roughly symmetric and bell-shaped; a strong skew or heavy tails would lower the reliability of the p-values and confidence intervals from the model.
model <- lm(LifeExpectancy ~ GDP + Health + Internet,
            data = AllCountries)
rmse <- sqrt(mean(model$residuals^2))
rmse
## [1] 4.056417
-The RMSE measures the typical size of the model's prediction errors in the same units as life expectancy (years): here, the predicted life expectancy is typically off by about 4.06 years. Countries with large residuals are the ones the model predicts least accurately, so I would look into those countries, for example at the strength of their education infrastructure, to see whether a factor missing from the model explains the gap.
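The RMSE above is slightly smaller than the residual standard error (4.104) in `summary(multi)` because it divides the sum of squared residuals by n instead of by the residual degrees of freedom, so RMSE = sigma * sqrt(df / n). The sketch checks that relation on the built-in `cars` data (a stand-in dataset) and then lists the observations with the largest absolute residuals, which is one way to find the worst-predicted cases. The helper name `calc_rmse` is hypothetical.

```r
# cars is a built-in stand-in for AllCountries (assumption for illustration)
fit <- lm(dist ~ speed, data = cars)

# RMSE: square root of the mean squared residual (divides by n)
calc_rmse <- function(res) sqrt(mean(res^2))
rmse_cars <- calc_rmse(residuals(fit))

# Residual standard error divides by the residual df instead of n
sigma(fit) * sqrt(df.residual(fit) / nobs(fit))  # equals rmse_cars

# Same relation with the numbers printed for the Health/Internet model
4.104 * sqrt(169 / 173)  # about 4.056, matching the RMSE computed above

# Observations with the largest absolute residuals (worst predictions)
worst <- head(sort(abs(residuals(fit)), decreasing = TRUE), 3)
worst
```

If `worst` is recomputed from `residuals(model)`, its names are the original row numbers of the rows kept after dropping missing values, so `AllCountries$Country[as.integer(names(worst))]` would map them back to country names.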
hypo <- lm(CO2 ~ Energy + Electricity,
           data = AllCountries)
summary(hypo)
##
## Call:
## lm(formula = CO2 ~ Energy + Electricity, data = AllCountries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.7559 -1.1406 -0.2020 0.7143 7.3751
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.998e-01 2.655e-01 3.012 0.00311 **
## Energy 3.122e-03 1.066e-04 29.290 < 2e-16 ***
## Electricity -7.044e-04 5.526e-05 -12.747 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.331 on 131 degrees of freedom
## (83 observations deleted due to missingness)
## Multiple R-squared: 0.899, Adjusted R-squared: 0.8974
## F-statistic: 582.8 on 2 and 131 DF, p-value: < 2.2e-16
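cor(AllCountries$Energy,
    AllCountries$Electricity)
## [1] NA
# cor() returns NA because Energy and Electricity contain missing values;
# use = "complete.obs" restricts it to countries where both are recorded
cor(AllCountries$Energy, AllCountries$Electricity, use = "complete.obs")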
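-A strong correlation between Energy and Electricity (multicollinearity) can affect the interpretation of the regression coefficients, because when two predictors move almost together it is difficult to separate the effect of each one on CO2. Each coefficient describes the effect of one variable while holding the other constant, which is hard to interpret when the two rarely vary independently; this could explain why the Electricity coefficient is negative even though the model as a whole fits well (R^2 = 0.899). Multicollinearity can also inflate standard errors and make individual coefficients unstable, so the model's coefficients may not be reliable even when its overall fit is good.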
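Two quick checks make this concrete. First, `cor()` propagates missing values unless told otherwise, which is why the output above is NA. Second, collinearity is usually quantified with the variance inflation factor, VIF = 1/(1 - R_j^2), where R_j^2 comes from regressing one predictor on the others; with only two predictors this reduces to 1/(1 - r^2). The sketch uses toy vectors and the built-in `mtcars` data, with `disp` and `hp` as stand-ins for Energy and Electricity (an assumption for illustration, since the CSV is not bundled with R).

```r
# 1) cor() returns NA when any value is missing, unless `use` says otherwise
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, 8)
cor(x, y)                         # NA
cor(x, y, use = "complete.obs")   # 1: uses rows 1, 2 and 4, where y = 2x

# 2) VIF for two correlated predictors (mtcars stands in for AllCountries)
r <- cor(mtcars$disp, mtcars$hp)
vif_from_r <- 1 / (1 - r^2)

# Equivalent definition: regress one predictor on the other
r2_aux <- summary(lm(disp ~ hp, data = mtcars))$r.squared
vif_from_lm <- 1 / (1 - r2_aux)

c(r = r, vif = vif_from_r)
```

A common rule of thumb treats VIF values above about 5 or 10 as a sign of problematic collinearity; applying the same steps to Energy and Electricity (with `use = "complete.obs"`) would show how severe it is in this data.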