1 Question 1

We wish to investigate the relationship between electricity consumption and gross domestic product (GDP) for countries of the world. GDP, an indicator of a country's economic performance, is adjusted for purchasing power parity to account for between-country differences in price levels. Information was obtained for a selection of 28 of the most populous countries in the world.

The data is stored in the file electricity.csv and contains the variables:

Variable     Description
Electricity  electricity consumption (billions of kilowatt-hours)
GDP          gross domestic product (billions of US dollars)
Country      name of the country

1.1 Question of interest/goal of the study

We are interested in using a country's gross domestic product to predict the amount of electricity it consumes.

1.2 Read in and inspect the data:

# Plot electricity consumption against GDP for all countries
elec.df <- read.csv("electricity.csv")
plot(Electricity ~ GDP, data = elec.df, xlab = "GDP (Billions of Dollars US)", ylab = "Electricity Consumption (in billions of kilowatt-hours)")

# Same plot restricted to countries with GDP under 6000 billion dollars
plot(Electricity ~ GDP, data = elec.df[elec.df$GDP < 6000, ], xlab = "GDP (Billions of Dollars US)", ylab = "Electricity Consumption (in billions of kilowatt-hours)")

1.3 Comment on the plots

We are examining the relationship between a country's GDP and its electricity consumption. We would expect larger economies to consume more electricity. GDP is plotted on the x-axis and electricity consumption on the y-axis.

In the first plot, the relationship appears linear, with two points (the countries with by far the largest GDP) standing well apart from the rest; these may be outliers. The second plot is restricted to countries with GDP under 6000 billion dollars, which removes these two points. It shows more scatter, but the relationship still appears linear, with GDP and electricity consumption positively correlated.

The points slope upwards, indicating a positive relationship, and a rough reading of the plot gives a slope of about 0.2 (a rise of about 800 billion kilowatt-hours over about 4000 billion dollars of GDP). That is, as GDP increases, electricity consumption tends to increase roughly in proportion. Although the second plot shows more scatter, higher GDP is still clearly associated with higher electricity consumption. Because these are observational data, this association does not tell us whether energy consumption drives GDP growth, or the reverse.
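As a quick sanity check, the slope read off the plot can be compared with the least-squares estimate of 0.18917 reported by summary(elecfit2.lm) in the next section. This is only a sketch that hard-codes both numbers; eyeball_slope, fitted_slope and rel_diff are illustrative names, not part of the original analysis.

```r
# Rough slope read off the second scatter plot:
# a rise of about 800 billion kWh over a run of about 4000 billion dollars
eyeball_slope <- 800 / 4000

# Least-squares slope estimate copied from summary(elecfit2.lm)
fitted_slope <- 0.18917

# Relative difference between the eyeballed and fitted slopes
rel_diff <- (eyeball_slope - fitted_slope) / fitted_slope
print(round(c(eyeball = eyeball_slope, fitted = fitted_slope, rel_diff = rel_diff), 3))
```

The eyeballed value is within about 6% of the fitted slope, which is about as close as one can expect from reading a plot.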

1.4 Fit an appropriate linear model, including model checks and relevant output.

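library(s20x)  # provides cooks20x() and modelcheck()
# Fit a simple linear regression to all countries and check for influential points
elecfit1.lm <- lm(Electricity ~ GDP, data = elec.df)
cooks20x(elecfit1.lm)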

elec.df[elec.df$GDP>6000,]
##         Country Electricity   GDP
## 4         China        3438  9872
## 27 UnitedStates        3873 14720
# Refit without China and the United States (GDP > 6000) and check the assumptions
elecfit2.lm <- lm(Electricity ~ GDP, data = elec.df[elec.df$GDP < 6000, ])

modelcheck(elecfit2.lm)

summary(elecfit2.lm)
## 
## Call:
## lm(formula = Electricity ~ GDP, data = elec.df[elec.df$GDP < 
##     6000, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -115.16  -22.56  -11.25   29.08  122.43 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.05155   15.28109   0.134    0.894    
## GDP          0.18917    0.01041  18.170 1.56e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 54.64 on 24 degrees of freedom
## Multiple R-squared:  0.9322, Adjusted R-squared:  0.9294 
## F-statistic: 330.2 on 1 and 24 DF,  p-value: 1.561e-15
confint(elecfit2.lm)
##                   2.5 %     97.5 %
## (Intercept) -29.4870645 33.5901674
## GDP           0.1676863  0.2106611
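The interval from confint() can be reproduced by hand as estimate ± t multiplier × standard error, using the t distribution with 24 residual degrees of freedom. The sketch below copies the rounded estimate and standard error from the summary output, so it agrees with confint() only to about four decimal places. It also shows that, in simple linear regression, the squared t statistic for the slope matches the F-statistic.

```r
# Values copied (rounded) from summary(elecfit2.lm)
est      <- 0.18917  # slope estimate
se       <- 0.01041  # standard error of the slope
resid_df <- 24       # residual degrees of freedom

# 95% confidence interval: estimate +/- t multiplier * standard error
t_mult <- qt(0.975, resid_df)
ci <- est + c(-1, 1) * t_mult * se
print(round(ci, 4))

# t statistic for the slope, its two-sided p-value, and t^2 (compare with F)
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), resid_df)
print(c(t = t_stat, p = p_val, t_squared = t_stat^2))
```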

1.5 Create a scatter plot with the fitted line from your model superimposed over it.

# Scatter plot for countries with GDP under 6000, with the fitted line added
plot(Electricity ~ GDP, data = elec.df[elec.df$GDP < 6000, ], xlab = "GDP (Billions of Dollars US)", ylab = "Electricity Consumption (in billions of kilowatt-hours)")
abline(elecfit2.lm)

1.6 Method and Assumption Checks

Since the relationship between GDP and electricity consumption appears linear, we fitted a simple linear regression model to the data. We have data on 28 of the most populous countries, but no information on how these were chosen. As the method of sampling is not detailed, there could be doubts about independence. These are likely to be minor; a bigger concern is how representative the data are of a wider group of countries. The initial residuals and Cook's plot showed two distinct outliers (the United States and China), which had vastly higher GDP than all other countries and could therefore be following a different pattern, so we limited our analysis to countries with GDP under 6000 billion dollars. After this, the residuals show patternless scatter with fairly constant variability, so there are no concerns about linearity or equal variance. The normality checks do not show any major problems (slightly long tails, if anything), and the Cook's plot does not reveal any further unduly influential points. Overall, the model assumptions appear to be reasonably satisfied, although our conclusions apply only to countries with GDP under 6000 billion dollars.
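To see what data that satisfy these assumptions look like, one option is to simulate from a model with the fitted parameter values and inspect the same kind of residual plot. This is a hypothetical sketch: the GDP values are drawn uniformly at random rather than taken from electricity.csv, the seed is arbitrary, and sim.lm, gdp and elec_sim are illustrative names.

```r
set.seed(1)  # arbitrary seed, for reproducibility

# Hypothetical design: 26 GDP values spread over the range analysed (< 6000)
n   <- 26
gdp <- runif(n, min = 100, max = 6000)

# Parameter values taken from the fitted model, summary(elecfit2.lm)
beta0 <- 2.05155
beta1 <- 0.18917
sigma <- 54.64

# Responses that satisfy the assumptions exactly:
# independent, normally distributed errors with constant variance
elec_sim <- beta0 + beta1 * gdp + rnorm(n, mean = 0, sd = sigma)

sim.lm <- lm(elec_sim ~ gdp)
print(coef(sim.lm))

# Residuals vs fitted values: should show patternless scatter around zero
plot(fitted(sim.lm), residuals(sim.lm), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Repeating the simulation with different seeds gives a feel for how much apparent pattern pure chance can produce in a sample of 26 countries.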

1.6.1 Complete the equation below:

Our model is:

\(Electricity_i = \beta_0 + \beta_1 \times GDP_i + \epsilon_i\) where \(\epsilon_i \overset{iid}{\sim} N(0,\sigma^2)\)

\(\widehat{Electricity}_i = 2.05155 + 0.18917 \times GDP_i\), with estimated error standard deviation \(\hat{\sigma} = 54.64\)
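As an illustration of using the fitted equation, the sketch below predicts electricity consumption for a hypothetical country with a GDP of 2000 billion dollars (a value chosen only for illustration, inside the range used to fit the model). With the data loaded, predict(elecfit2.lm, newdata = data.frame(GDP = 2000)) should give essentially the same value using full-precision coefficients, and adding interval = "prediction" would also give a prediction interval.

```r
# Fitted coefficients copied from summary(elecfit2.lm)
b0 <- 2.05155
b1 <- 0.18917

# Hypothetical GDP (billions of US dollars), inside the fitted range (< 6000)
gdp_new <- 2000

# Point prediction of electricity consumption (billions of kWh)
elec_hat <- b0 + b1 * gdp_new
print(elec_hat)
```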

1.6.2 Complete the statement

Our fitted model explains 93.2% of the variability in electricity consumption for countries with GDP under 6000 billion dollars.
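For simple linear regression, R² can be recovered from the F-statistic as R² = F / (F + residual df), and adjusted R² then follows from R² and the sample size. The sketch below uses the rounded values printed by summary(elecfit2.lm); n = 26 because there are 24 residual degrees of freedom and 2 estimated coefficients.

```r
# Values copied from summary(elecfit2.lm)
f_stat   <- 330.2        # F-statistic on 1 and 24 DF
resid_df <- 24
n        <- resid_df + 2  # simple regression: two estimated coefficients

# For simple linear regression, R^2 = F / (F + residual df)
r2 <- f_stat / (f_stat + resid_df)

# Adjusted R^2 penalises for the single predictor
adj_r2 <- 1 - (1 - r2) * (n - 1) / resid_df
print(round(c(R2 = r2, adjR2 = adj_r2), 4))
```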

1.7 Executive Summary

We are interested in the relationship between electricity consumption and gross domestic product (GDP) for countries.

We restricted our analysis to countries with GDP less than 6,000 billion dollars, which excluded China and the United States.

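There is very strong evidence of a positive linear relationship between GDP and electricity consumption (P-value < 0.001), and GDP explains about 93% of the variability in electricity consumption. We estimate that each additional 1 billion dollars of GDP is associated with an increase in mean electricity consumption of between 0.168 and 0.211 billion kilowatt-hours; equivalently, between 168 and 211 billion kilowatt-hours per additional 1,000 billion dollars of GDP.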
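The slope is measured in billions of kilowatt-hours per billion dollars of GDP, which gives small numbers; rescaling the confidence interval to a change of 1,000 billion dollars makes it easier to read. A minimal sketch using the interval printed by confint(elecfit2.lm):

```r
# 95% confidence interval for the slope, copied from confint(elecfit2.lm)
# (billions of kWh per billion US dollars of GDP)
slope_ci <- c(lower = 0.1676863, upper = 0.2106611)

# Rescale to a change of 1,000 billion dollars of GDP
ci_per_1000 <- slope_ci * 1000
print(round(ci_per_1000))
```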