I decided to use the temp_carbon dataset from the dslabs package to create a scatter plot. The dataset records the global, land, and ocean annual mean temperature anomalies from 1880 to 2018, along with annual carbon emissions in millions of metric tons (MMT) from 1751 to 2014. A scatter plot of the data shows that the temperature variables have a roughly linear relationship with carbon emissions. The temperature variables have missing values from 1751 to 1879. I will use linear regression to make two predictions on the scatter plot. The first is the ocean temperature at the carbon emission levels recorded from 1751 to 1879. The second is the ocean temperature at carbon emission levels from 10,000MMT to 15,000MMT.
Load the required libraries:
library(dslabs)
## Warning: package 'dslabs' was built under R version 4.0.4
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## Warning: package 'stringr' was built under R version 4.0.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(broom)
library(ggfortify)
## Warning: package 'ggfortify' was built under R version 4.0.4
The dataset consists of 5 numerical variables and 268 observations.
#Display a preview of six observations in the dataset
head(temp_carbon)
## year temp_anomaly land_anomaly ocean_anomaly carbon_emissions
## 1 1880 -0.11 -0.48 -0.01 236
## 2 1881 -0.08 -0.40 0.01 243
## 3 1882 -0.10 -0.48 0.00 256
## 4 1883 -0.18 -0.66 -0.04 272
## 5 1884 -0.26 -0.69 -0.14 275
## 6 1885 -0.25 -0.56 -0.17 277
#Display the structure of the dataset
str(temp_carbon)
## 'data.frame': 268 obs. of 5 variables:
## $ year : num 1880 1881 1882 1883 1884 ...
## $ temp_anomaly : num -0.11 -0.08 -0.1 -0.18 -0.26 -0.25 -0.24 -0.28 -0.13 -0.09 ...
## $ land_anomaly : num -0.48 -0.4 -0.48 -0.66 -0.69 -0.56 -0.51 -0.47 -0.41 -0.31 ...
## $ ocean_anomaly : num -0.01 0.01 0 -0.04 -0.14 -0.17 -0.17 -0.23 -0.05 -0.02 ...
## $ carbon_emissions: num 236 243 256 272 275 277 281 295 327 327 ...
Variable descriptions were copied from the temp_carbon dataset details section.
year: The year an observation was recorded. The years range from 1751 to 2018 in common era (CE).
temp_anomaly: Global annual mean temperature anomaly in degrees Celsius relative to the 20th century mean temperature. Temperatures were recorded from 1880 to 2018 and range from -0.45°C to 0.98°C.
land_anomaly: Annual mean temperature anomaly on land in degrees Celsius relative to the 20th century mean temperature. Temperatures were recorded from 1880 to 2018 and range from -0.69°C to 1.50°C.
ocean_anomaly: Annual mean temperature anomaly over the ocean in degrees Celsius relative to the 20th century mean temperature. Temperatures were recorded from 1880 to 2018 and range from -0.46°C to 0.79°C.
carbon_emissions: Annual carbon emissions in millions of metric tons of carbon. Emissions were recorded from 1751 to 2014 and range from 3MMT to just under 10,000MMT.
Correlation:
The correlation between carbon emissions and ocean temperature is 0.86. This signifies that the two variables have a strong positive correlation.
cor(temp_carbon$carbon_emissions, temp_carbon$ocean_anomaly, use = "complete.obs")
## [1] 0.8648896
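The `use = "complete.obs"` argument matters here because the temperature columns are empty before 1880 and the emissions column is empty after 2014. As a minimal sketch on a made-up toy data frame (the values and the names `toy`, `emissions`, and `anomaly` are illustrative assumptions, not taken from temp_carbon), the option is equivalent to dropping the incomplete rows by hand before calling `cor()`:

```r
# Toy data (hypothetical values, not from temp_carbon)
toy <- data.frame(
  emissions = c(10, 20, 30, 40, 50, NA),
  anomaly   = c(NA, -0.2, -0.1, 0.1, 0.3, 0.4)
)

# With the default setting, any NA makes the result NA
r_default <- cor(toy$emissions, toy$anomaly)

# "complete.obs" keeps only the rows where both values are present
r_complete <- cor(toy$emissions, toy$anomaly, use = "complete.obs")

# Same result as filtering the rows by hand
keep <- complete.cases(toy)
r_manual <- cor(toy$emissions[keep], toy$anomaly[keep])

print(c(default = r_default, complete = r_complete, manual = r_manual))
```

Without the argument, the call on temp_carbon would return NA instead of a correlation.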
Visualization of positive linear correlation:
temp_carbon %>%
ggplot(aes(x = carbon_emissions, y = ocean_anomaly)) +
#Create Scatter Plot
geom_point()+
#Create linear regression line
geom_smooth(method = "lm", se = FALSE)+
#label X-axis
xlab("Carbon Emissions (MMT)")+
#label Y-axis
ylab("Ocean Temperature (°C)")+
#Create title
ggtitle("Effects of Carbon Emissions on Ocean Annual Mean Temperature")+
#Display the graph using theme_minimal (applied first so it does not reset the title setting below)
theme_minimal()+
#Center title
theme(plot.title = element_text(hjust = 0.5))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 133 rows containing non-finite values (stat_smooth).
## Warning: Removed 133 rows containing missing values (geom_point).
Linear Regression Model:
Since the data displays a linear correlation, a linear regression model will be helpful to create predictions.
#Create a dataset with observations to remove from the model dataset to improve model metrics
temp_carbon5 <- temp_carbon %>%
filter(year %in% c(1974,1976, 1975,1903,1971, 1904, 1908, 1909, 1910, 1911, 1940, 1941, 1942,1943,1944,1945))
#Conduct anti join to remove the observations from the temp_carbon dataset that match the temp_carbon5 dataset
temp_carbon3 <- temp_carbon %>%
anti_join(temp_carbon5)
## Joining, by = c("year", "temp_anomaly", "land_anomaly", "ocean_anomaly", "carbon_emissions")
#Create a linear regression model using ocean_anomaly as the response variable and carbon emission as the explanatory variable to predict ocean_anomaly temperatures from carbon emission levels
mdl<-lm(ocean_anomaly ~ carbon_emissions, data = temp_carbon3)
#Plot linear model metrics
autoplot(mdl, 1:3, nrow=3, ncol=1)
## Warning: `arrange_()` was deprecated in dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
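Because the model is fit after excluding selected years, it can help to see how much the exclusion changes the fit. The following is only a sketch on made-up toy data (the values, the `drop_years` vector, and the object names are hypothetical); it uses base R row indexing instead of `anti_join()` and compares R-squared with and without the excluded rows:

```r
# Toy data (hypothetical): a linear trend plus two points that sit off the line
toy <- data.frame(
  year = 2001:2010,
  x    = 1:10,
  y    = c(1.1, 2.0, 2.9, 6.5, 5.1, 6.0, 3.2, 8.1, 9.0, 9.9)
)
drop_years <- c(2004, 2007)  # hypothetical years to exclude

# Fit once on all rows and once without the excluded years
fit_all  <- lm(y ~ x, data = toy)
fit_trim <- lm(y ~ x, data = toy[!toy$year %in% drop_years, ])

r2_all  <- summary(fit_all)$r.squared
r2_trim <- summary(fit_trim)$r.squared
print(c(all = r2_all, trimmed = r2_trim))
```

Reporting both numbers makes it clear how much of the final fit quality comes from the exclusion itself.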
Variance explained (R-squared):
The r.squared value was extracted with the glance function to measure how much of the variance the model explains. Eighty-eight percent of the variation in the ocean_anomaly variable can be explained by variation in carbon emissions using the model.
#R-squared of the model (proportion of variance explained)
mdl%>%
glance()%>%
pull(r.squared)
## [1] 0.8774718
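For a model with a single explanatory variable, R-squared equals the squared correlation between the two variables. The correlation of 0.86 reported earlier squares to about 0.75, which is lower than the 0.88 shown here because the correlation was computed on every complete row while the model was fit after removing 16 years. A small sketch on made-up numbers (the vectors `x` and `y` are hypothetical) shows the identity and the definition of R-squared:

```r
# Toy data (hypothetical values)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0.9, 2.3, 2.8, 4.4, 4.9, 6.2, 6.8, 8.5)

fit <- lm(y ~ x)

# R-squared as reported by the model summary
r2_model <- summary(fit)$r.squared

# R-squared from its definition: 1 - SS_residual / SS_total
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)
r2_manual <- 1 - ss_res / ss_tot

# With one predictor, R-squared equals the squared Pearson correlation
r2_cor <- cor(x, y)^2

print(c(model = r2_model, manual = r2_manual, cor_squared = r2_cor))
```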
Tibbles used to conduct predictions:
Three tibbles were created. Two of the tibbles hold the carbon emission values that will be used to create the predictions. The third tibble will be used to plot the linear regression line on the graph.
#Create a dataframe with NA values for ocean_anomaly to isolate carbon emission levels without temperature readings.
t1<- temp_carbon %>%
#filter for observations with NA values for ocean_anomaly
filter(is.na(ocean_anomaly)) %>%
#select the carbon_emissions column
select(carbon_emissions) %>%
#isolate unique values
unique()
#Create a tibble to conduct the first prediction on the carbon_emissions values of the t1 dataframe
explanatory_data1 <-tibble(
carbon_emissions = t1$carbon_emissions
)
#Create a tibble to conduct the second prediction on a sequence of carbon emission values ranging from 10,000 to 15,000 in steps of 2,500
explanatory_data2 <-tibble(
carbon_emissions = seq(10000,15000, 2500)
)
#Create a tibble to create linear regression line in plot by subsetting the carbon emission data from the temp_carbon dataset into a separate tibble
explanatory_data3 <-tibble(
carbon_emissions = temp_carbon$carbon_emissions)
Execute predictions:
The mutate and predict functions are used to generate a column of ocean_anomaly values in the tibbles generated in the previous step. Carbon emission values in the tibbles are being used as an input in the linear regression model to generate the ocean_anomaly values.
#Conduct prediction of ocean_anomaly temperatures using the values in explanatory_data1 with linear regression model
prediction1 <- explanatory_data1 %>%
mutate(ocean_anomaly = predict(mdl,explanatory_data1))
#Conduct prediction of ocean_anomaly temperatures using the values in explanatory_data2 with the linear regression model
prediction2 <- explanatory_data2 %>%
mutate(ocean_anomaly = predict(mdl,explanatory_data2))
#Conduct prediction of ocean_anomaly temperatures using the values in explanatory_data3 with the linear regression model to plot on the graph
prediction3 <- explanatory_data3 %>%
mutate(ocean_anomaly = predict(mdl,explanatory_data3))
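Both predictions appear to apply the model to emission levels outside the range that has temperature readings, so it is worth seeing what `predict()` computes and how uncertain such values are. The sketch below uses made-up toy data (the `train` and `new_data` frames and their values are hypothetical, not from temp_carbon). It reproduces `predict()` from the fitted intercept and slope, requests prediction intervals with `interval = "prediction"`, and flags rows that fall outside the training range:

```r
# Toy data (hypothetical values, not from temp_carbon)
train <- data.frame(
  emissions = c(100, 200, 300, 400, 500, 600),
  anomaly   = c(-0.42, -0.35, -0.31, -0.22, -0.18, -0.09)
)
fit <- lm(anomaly ~ emissions, data = train)

# New emission levels, all above the largest training value
new_data <- data.frame(emissions = seq(700, 900, by = 100))

# predict() evaluates intercept + slope * x for each new row
by_predict <- predict(fit, newdata = new_data)
by_hand <- coef(fit)[["(Intercept)"]] +
  coef(fit)[["emissions"]] * new_data$emissions

# Prediction intervals widen as x moves away from the training data
with_interval <- predict(fit, newdata = new_data, interval = "prediction")
width <- with_interval[, "upr"] - with_interval[, "lwr"]

# Flag rows that lie outside the range the model was fit on
new_data$extrapolated <- new_data$emissions > max(train$emissions) |
  new_data$emissions < min(train$emissions)

print(cbind(new_data, with_interval))
```

The point estimates match the hand calculation, while the growing interval width shows why extrapolated values should be read with caution.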
I created a scatter plot using the carbon_emissions variable as the explanatory variable and the ocean_anomaly variable as the response variable. To depict global temperature, the temp_anomaly variable was used to color the points on a continuous gradient using the inferno palette in viridis. The model's predictions were used to plot a linear regression line, and the prediction points are symbolized by red asterisks.
Prediction 1:
The graph displays the predicted ocean annual mean temperature for the carbon emission levels that have no recorded ocean temperature.
temp_carbon %>%
ggplot(aes(x = carbon_emissions, y = ocean_anomaly )) +
#Color the points based off the temp_anomaly
geom_point(aes(color = temp_anomaly))+
#Plot the ocean_anomaly points predicted in prediction1
geom_point(data = prediction1, col = "red", shape = 8, size = 3, alpha = .5) +
#Plot model linear regression line
geom_smooth(data = prediction3, method = "lm", se = FALSE, size = 0.3)+
#Display the graph using theme_minimal
theme_minimal()+
#Color the points based on the temp_anomaly temperatures and label the legend
scale_color_viridis_c(name = "Global Temperature (°C)", option = "inferno")+
#label X-axis
xlab("Carbon Emission (MMT)")+
#label Y-axis
ylab("Ocean Temperature (°C)")+
#label Title
labs(title = "Prediction of Ocean Annual Mean Temperature\nby Annual Carbon Emissions below 300MMT", caption = "Source: NOAA and Boden, T.A., G. Marland, and R.J. Andres (2017) via CDIAC")+
#Center title
theme(plot.title = element_text(hjust = 0.5))+
#Position the legend at the bottom of the graph
theme(legend.position="bottom")+
#limit X-axis
xlim(0,300)+
#limit Y-axis
ylim(-0.5, 0)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 131 rows containing non-finite values (stat_smooth).
## Warning: Removed 261 rows containing missing values (geom_point).
Prediction 2:
The graph displays the predicted ocean annual mean temperature if carbon emission levels rose to between 10,000MMT and 15,000MMT.
temp_carbon %>%
ggplot(aes(x = carbon_emissions, y = ocean_anomaly )) +
#Color the points based off the temp_anomaly
geom_point(aes(color = temp_anomaly))+
#Plot the ocean_anomaly points predicted in prediction2
geom_point(data = prediction2, col = "red", shape = 8, size = 3) +
#Plot model linear regression line
geom_smooth(data = prediction3, method = "lm", se = FALSE)+
#Display the graph using theme_minimal
theme_minimal()+
#Color the points based on the temp_anomaly temperatures and label the legend
scale_color_viridis_c(name = "Global Temperature (°C)", option = "inferno")+
#label X-axis
xlab("Carbon Emissions (MMT)")+
#label Y-axis
ylab("Ocean Temperature (°C)")+
#label Title
labs(title = "Prediction of Ocean Annual Mean Temperature\nby the Rise of Annual Carbon Emissions", caption = "Source: NOAA and Boden, T.A., G. Marland, and R.J. Andres (2017) via CDIAC")+
#Center title
theme(plot.title = element_text(hjust = 0.5))+
#Position the legend at the bottom of the graph
theme(legend.position="bottom")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 133 rows containing missing values (geom_point).
NOAA and Boden, T.A., G. Marland, and R.J. Andres (2017) via CDIAC