We wish to investigate the relationship between electricity consumption and the gross domestic product (GDP) for countries of the world. GDP is an indicator of a country’s economic performance, adjusted for purchasing power parities to account for between-country differences in price levels. Information was obtained for a selection of 28 of the most populous countries in the world.
The data is stored in the file electricity.csv and contains the variables:
| Variable | Description |
|---|---|
| Electricity | electricity consumption (in billions of kilowatt-hours) |
| GDP | gross domestic product (in billions of US dollars) |
| Country | name of the country |
We are interested in using a country’s gross domestic product to predict the amount of electricity that they use.
library(s20x)  # provides cooks20x() and modelcheck()
elec.df<-read.csv("electricity.csv")
plot(Electricity~GDP, data=elec.df,xlab = "GDP (Billions of Dollars US)", ylab = "Electricity Consumption (in billions of kilowatt-hours)")
plot(Electricity~GDP, data=elec.df[elec.df$GDP<6000,],xlab = "GDP (Billions of Dollars US)", ylab = "Electricity Consumption (in billions of kilowatt-hours)")
elecfit1.lm=lm(Electricity~GDP,data=elec.df)
cooks20x(elecfit1.lm)
elec.df[elec.df$GDP>6000,]
## Country Electricity GDP
## 4 China 3438 9872
## 27 UnitedStates 3873 14720
elecfit2.lm=lm(Electricity~GDP,data=elec.df[elec.df$GDP<6000,])
modelcheck(elecfit2.lm)
summary(elecfit2.lm)
##
## Call:
## lm(formula = Electricity ~ GDP, data = elec.df[elec.df$GDP <
## 6000, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -115.16 -22.56 -11.25 29.08 122.43
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.05155 15.28109 0.134 0.894
## GDP 0.18917 0.01041 18.170 1.56e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.64 on 24 degrees of freedom
## Multiple R-squared: 0.9322, Adjusted R-squared: 0.9294
## F-statistic: 330.2 on 1 and 24 DF, p-value: 1.561e-15
confint(elecfit2.lm)
## 2.5 % 97.5 %
## (Intercept) -29.4870645 33.5901674
## GDP 0.1676863 0.2106611
confint(elecfit2.lm)*100
## 2.5 % 97.5 %
## (Intercept) -2948.70645 3359.01674
## GDP 16.76863 21.06611
plot(Electricity~GDP, data=elec.df[elec.df$GDP<6000,],xlab = "GDP (Billions of Dollars US)", ylab = "Electricity Consumption (in billions of kilowatt-hours)")
abline(elecfit2.lm)
Since the relationship between GDP and electricity consumption is linear, we fitted a simple linear regression model to the data. We have 28 of the most populous countries, but no information on how they were selected. As the method of sampling is not detailed, there could be doubts about independence. These are likely to be minor, with a bigger concern being how representative the data is of a wider group of countries. The initial residuals and Cook’s plot showed two distinct outliers (USA and China), which had vastly higher GDP than all other countries and could therefore be following a totally different pattern, so we limited our analysis to the 26 countries with GDP under 6,000 billion dollars. After this, the residuals show patternless scatter with fairly constant variability, so there are no problems there. The normality checks don’t show any major problems (slightly long tails, if anything), and the Cook’s plot doesn’t reveal any further unduly influential points. Overall, the model assumptions appear to be satisfied.
Our model is:
\(Electricity_i = \beta_0 + \beta_1 \times GDP_i + \epsilon_i\) where \(\epsilon_i \sim iid ~ N(0,\sigma^2)\)
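As a rough illustration, plugging the estimates from the summary output into this model (rounded to three significant figures) gives the fitted line:
\(\widehat{Electricity}_i = 2.05 + 0.189 \times GDP_i\)
For example, a country with a GDP of 1,000 billion dollars would have an estimated mean electricity consumption of about \(2.05 + 0.189 \times 1000 \approx 191\) billion kilowatt-hours. This should only be used for countries with GDP under 6,000 billion dollars, since that is the range the model was fitted to.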
Our fitted model explains 93.2% of the variability in electricity consumption for these countries.
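The R-squared value can be cross-checked against the rest of the summary output. The sketch below is not part of the original analysis; it copies the slope t value and residual degrees of freedom from the output above (the variable names are just for illustration) and uses two standard facts for simple linear regression: the F-statistic equals the squared slope t-statistic, and R-squared equals F / (F + residual df).

```r
# Values copied by hand from the summary(elecfit2.lm) output above
t_gdp <- 18.170   # t value for the GDP slope
df_resid <- 24    # residual degrees of freedom

# In simple linear regression the F-statistic is the squared slope t-statistic
f_stat <- t_gdp^2

# R-squared recovered from F and the residual degrees of freedom
r_squared <- f_stat / (f_stat + df_resid)

round(f_stat, 1)     # compare with the F-statistic in the summary output
round(r_squared, 4)  # compare with Multiple R-squared in the summary output
```

Both values should agree with the summary output up to rounding of the printed t value.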
It was of interest to see if there is a relationship between electricity consumption and gross domestic product (GDP) for countries.
We restricted our analysis to countries with GDP less than 6,000 billion dollars.
The analysis shows a positive linear relationship between GDP and electricity consumption. Adding the fitted line to the scatter plot makes this positive relationship easier to see. The p-value for the GDP slope is 1.561e-15, which gives very strong evidence of a relationship between the two variables.
For every 100 billion dollar increase in GDP, we estimate that expected electricity consumption increases by between 16.77 and 21.07 billion kilowatt-hours. This interval gives a practical estimate of how economic growth translates into increased energy consumption, and because the whole interval is positive, it confirms the positive relationship between the two variables.
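To show where this interval comes from, here is a minimal sketch that rebuilds it by hand as estimate ± t-multiplier × standard error. The estimate, standard error and degrees of freedom are copied from the summary output above, so the result may differ from `confint()` in the last decimal place because of rounding. The variable names are only for illustration.

```r
# Values copied by hand from the summary(elecfit2.lm) output above
est <- 0.18917    # estimated slope for GDP
se <- 0.01041     # standard error of the slope
df_resid <- 24    # residual degrees of freedom

# t-multiplier for a 95% confidence interval
t_mult <- qt(0.975, df = df_resid)

# 95% confidence interval for the slope (per 1 billion dollars of GDP)
ci_slope <- est + c(-1, 1) * t_mult * se

# Rescale to a 100 billion dollar increase in GDP
ci_per_100 <- 100 * ci_slope
round(ci_per_100, 2)
```

Multiplying the interval by 100 is valid because scaling the predictor's unit scales the slope and both of its confidence limits by the same factor.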
1.3 Comment on the plots
The first plot shows a positive linear relationship between GDP and electricity consumption. There is constant scatter and two outliers, and most of the data points are bunched together at the low end of the x-axis, below a GDP of 5,000 billion dollars. The second plot also shows a positive linear relationship with constant scatter, but this time with only one outlier. It shows the relationship more clearly because the data points are less bunched together. A straight line looks like a reasonable fit in both plots, and both show that electricity consumption tends to increase as GDP increases.