For this discussion post I decided to analyze data from the World Bank’s DataBank. I downloaded the data and uploaded it to my GitHub repository.
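library(tidyverse)  # dplyr (%>%), tidyr (pivot_wider), ggplot2
library(gvlma)      # global validation of linear model assumptions
GDP <- read.csv('https://raw.githubusercontent.com/jnaval88/DATA605/main/Week11/Discussion11-GDP_Birth_Rate.csv')
# Convert the 2019 values to numeric; non-numeric entries become NA
GDP$X2019..YR2019. <- as.numeric(GDP$X2019..YR2019.)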
## Warning: NAs introduced by coercion
# Reshape so that each series becomes its own column
GDP <- GDP %>%
  pivot_wider(names_from = "Series.Name", values_from = "X2019..YR2019.")
In this section I plot the crude birth rate against GDP per capita.
ggplot(data = GDP, aes(x = `GDP per capita (current US$)`, y = `Birth rate, crude (per 1,000 people)`)) +
  geom_point()
## Warning: Removed 28 rows containing missing values (`geom_point()`).
Because GDP per capita covers a very wide range of values, the raw scatterplot is hard to read. To get a clearer view of the relationship, I put both axes on a log scale.
ggplot(data = GDP, aes(x = `GDP per capita (current US$)`, y = `Birth rate, crude (per 1,000 people)`)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
## Warning: Removed 28 rows containing missing values (`geom_point()`).
Next, I fit a linear regression of the log-transformed birth rate on the log-transformed GDP per capita. The model uses log1p(), which computes log(1 + x).
GDP_LM = lm( log1p(`Birth rate, crude (per 1,000 people)`) ~ log1p(`GDP per capita (current US$)`), data = GDP)
summary(GDP_LM)
##
## Call:
## lm(formula = log1p(`Birth rate, crude (per 1,000 people)`) ~
##     log1p(`GDP per capita (current US$)`), data = GDP)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.87441 -0.17155  0.02437  0.18867  0.67440
##
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)                            5.42074    0.10510   51.58   <2e-16 ***
## log1p(`GDP per capita (current US$)`) -0.28491    0.01185  -24.05   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2548 on 236 degrees of freedom
## (28 observations deleted due to missingness)
## Multiple R-squared: 0.7102, Adjusted R-squared: 0.709
## F-statistic: 578.4 on 1 and 236 DF, p-value: < 2.2e-16
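To make the slope easier to read: in a log-log fit, the coefficient acts roughly like an elasticity, so a slope of about -0.285 suggests that a 1% higher GDP per capita goes with a birth rate that is about 0.285% lower. The sketch below turns that into the effect of doubling GDP per capita. The helper name `effect_of_ratio` is my own, and the result is only approximate because the model uses log1p() rather than log().

```r
# Slope copied from the summary() output above
slope <- -0.28491

# In a log-log fit, multiplying x by k multiplies the fitted y by about k^slope.
# This is approximate here because log1p(x) is log(1 + x), not log(x).
effect_of_ratio <- function(k, b = slope) k^b - 1

# Relative change in the fitted birth rate when GDP per capita doubles
doubling <- effect_of_ratio(2)
round(100 * doubling, 1)  # percent change
```

Under these assumptions, doubling GDP per capita corresponds to a birth rate that is roughly 18% lower. This describes an association across countries in 2019, not a causal effect.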
Next, I plot the diagnostics for the fitted regression model.
plot(GDP_LM)
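Calling plot() on an lm object draws several diagnostic panels one after another. A minimal sketch for showing only the residuals vs. fitted and normal Q-Q panels side by side is below. It uses synthetic stand-in data (the names `toy` and `toy_fit` are hypothetical), not the World Bank data, so that it runs on its own.

```r
# Synthetic stand-in data (NOT the World Bank data), only to make the sketch runnable
set.seed(605)
toy <- data.frame(x = runif(150, min = 6, max = 11))
toy$y <- 5.4 - 0.28 * toy$x + rnorm(150, sd = 0.25)
toy_fit <- lm(y ~ x, data = toy)

# Draw the residuals-vs-fitted (1) and normal Q-Q (2) panels in one row
old_par <- par(mfrow = c(1, 2))
plot(toy_fit, which = 1:2)
par(old_par)  # restore the previous plotting layout
```

The same two lines around plot() should work with GDP_LM in place of `toy_fit`.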
From the diagnostic plots of GDP_LM, the residuals vs. fitted plot appears to show roughly constant variability, and the Q-Q plot suggests that the residuals are approximately normally distributed. As a more formal check, I run the global validation of linear model assumptions from the gvlma package.
gvlma(GDP_LM)
##
## Call:
## lm(formula = log1p(`Birth rate, crude (per 1,000 people)`) ~
##     log1p(`GDP per capita (current US$)`), data = GDP)
##
## Coefficients:
##                           (Intercept)  log1p(`GDP per capita (current US$)`)
##                                5.4207                                -0.2849
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = GDP_LM)
##
##                      Value  p-value                   Decision
## Global Stat        15.1020 0.004494 Assumptions NOT satisfied!
## Skewness            3.5186 0.060682    Assumptions acceptable.
## Kurtosis            0.5489 0.458779    Assumptions acceptable.
## Link Function       7.0409 0.007967 Assumptions NOT satisfied!
## Heteroscedasticity  3.9936 0.045673 Assumptions NOT satisfied!
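The gvlma output is less favorable than the visual check. The skewness and kurtosis tests are acceptable, which agrees with the Q-Q plot, but the global test (p = 0.0045), the link function test (p = 0.008) and the heteroscedasticity test (p = 0.046) all reject at the 0.05 level. A failed link function test suggests the straight-line form may be missing some curvature. One possible follow-up, sketched here on synthetic stand-in data (NOT the World Bank data; all names are hypothetical), is to compare the straight-line fit against one with a squared term.

```r
# Synthetic stand-in data with mild curvature, used only to illustrate the comparison
set.seed(605)
n <- 200
log_gdp <- runif(n, min = 6, max = 11)
log_birth <- 5.4 - 0.28 * log_gdp - 0.03 * (log_gdp - 8.5)^2 + rnorm(n, sd = 0.25)
toy <- data.frame(log_gdp, log_birth)

# Straight-line model and a model that adds a squared term
fit_linear <- lm(log_birth ~ log_gdp, data = toy)
fit_quad   <- lm(log_birth ~ log_gdp + I(log_gdp^2), data = toy)

# Nested-model F test: a small p-value suggests the straight line misses curvature
cmp <- anova(fit_linear, fit_quad)
print(cmp)
```

Whether a squared term actually helps for the 2019 data would need to be checked by refitting on GDP itself; this sketch only shows the mechanics.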