In 1966, the World Meteorological Organization (WMO) put forth the term “climatic change” to refer to climatic variability on time-scales longer than ten years, regardless of the cause for such change. During the next decade, scientists began to suspect that human activities had the potential to drastically alter the global climate in ways that would have negative impacts on our environment. The term evolved into “climate change” and is now used to describe both the process of change and the perceived problem. Sometimes the term “global warming” is used, though in many ways this fails to adequately describe the variability in impact, since climate change can cause both hot and cold extremes in weather. Anthropogenic climate change is change that is caused by human activity, as opposed to the Earth’s natural processes. However, in the context of environmental policy, the term “climate change” is often used to mean anthropogenic climate change.
Mauna Loa Observatory is a world-renowned atmospheric research facility. It has been continuously monitoring and collecting data since the 1950s, and its remote location makes it well suited for monitoring atmospheric components that can contribute to climate change, including the heat-trapping greenhouse gas carbon dioxide (CO2). Proponents of anthropogenic climate change cite carbon overload from burning fossil fuels and deforestation as its primary cause, while opponents assert that natural processes (such as photosynthesis) contribute more to atmospheric CO2 than humans do and that the observed changes are simply part of Earth’s natural cycle.
Create your own version of the plot found here. Do not replicate it, but rather design your own. Use one of the themes found in the ggplot2 or ggthemes packages. You are encouraged to make style adjustments to help you informatively display the data.
# Keep only observations from 2015 onward
CO2_monthly2015 <- co2_monthly %>% filter(year >= 2015)
ggplot(CO2_monthly2015) +
  # Monthly means: yellow line with blue markers
  geom_line(aes(x = date, y = mean_co2), col = "yellow", linetype = 7) +
  geom_point(aes(x = date, y = mean_co2), col = "blue", shape = 7) +
  # Trend values: black line with black markers
  geom_line(aes(x = date, y = trend_mean_co2), col = "black") +
  geom_point(aes(x = date, y = trend_mean_co2), col = "black", shape = 8) +
  # Quarterly tick marks, labeled only at whole years
  scale_x_continuous(breaks = seq(2015, 2020, .25),
                     labels = c("2015", rep("", 3),
                                "2016", rep("", 3),
                                "2017", rep("", 3),
                                "2018", rep("", 3),
                                "2019", rep("", 3),
                                "2020"),
                     limits = c(2015, 2020)) +
  # Tick marks at every ppm, labeled only at multiples of 5
  scale_y_continuous(breaks = 395:415,
                     labels = c("395", rep("", 4),
                                "400", rep("", 4),
                                "405", rep("", 4),
                                "410", rep("", 4),
                                "415"),
                     limits = c(395, 415)) +
  ggtitle(expression("RECENT AVERAGE MONTHLY CO"[2]*" LEVELS AT MAUNA LOA")) +
  ylab("PARTS PER MILLION") +
  xlab("YEAR") +
  theme_classic()
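The hand-written label vectors above, which pad each labeled tick with empty strings, can also be generated programmatically. The following is a minimal sketch rather than part of the original solution; `sparse_labels` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: label every `every`-th break, starting with the first,
# and leave all other breaks blank.
sparse_labels <- function(breaks, every) {
  labels <- rep("", length(breaks))
  keep <- seq(1, length(breaks), by = every)
  labels[keep] <- as.character(breaks[keep])
  labels
}

x_breaks <- seq(2015, 2020, by = 0.25)  # quarterly ticks
y_breaks <- 395:415                     # one tick per ppm

x_labels <- sparse_labels(x_breaks, every = 4)  # label whole years only
y_labels <- sparse_labels(y_breaks, every = 5)  # label multiples of 5 only
```

The resulting vectors could then be passed as `labels = x_labels` and `labels = y_labels` to `scale_x_continuous()` and `scale_y_continuous()` alongside the matching `breaks`.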
An atmospheric CO2 level of 400 ppm is considered by many to be a symbolic threshold with regard to climate change. “In the centuries to come, history books will likely look back on September 2016 as a major milestone for the world’s climate. At a time when atmospheric carbon dioxide is usually at its minimum, the monthly value failed to drop below 400 parts per million.” (source)
Adapt your plot above to include a red dashed line at 400 ppm and a large red dot on September 2016, with appropriate annotations to indicate what these additions represent.
ggplot(CO2_monthly2015) +
  geom_line(aes(x = date, y = mean_co2), col = "yellow", linetype = 7) +
  geom_point(aes(x = date, y = mean_co2), col = "blue", shape = 7) +
  geom_line(aes(x = date, y = trend_mean_co2), col = "black") +
  geom_point(aes(x = date, y = trend_mean_co2), col = "black", shape = 8) +
  scale_x_continuous(breaks = seq(2015, 2020, .25),
                     labels = c("2015", rep("", 3),
                                "2016", rep("", 3),
                                "2017", rep("", 3),
                                "2018", rep("", 3),
                                "2019", rep("", 3),
                                "2020"),
                     limits = c(2015, 2020)) +
  scale_y_continuous(breaks = 395:415,
                     labels = c("395", rep("", 4),
                                "400", rep("", 4),
                                "405", rep("", 4),
                                "410", rep("", 4),
                                "415"),
                     limits = c(395, 415)) +
  ggtitle(expression("RECENT AVERAGE MONTHLY CO"[2]*" LEVELS AT MAUNA LOA")) +
  ylab("PARTS PER MILLION") +
  xlab("YEAR") +
  theme_classic() +
  # Symbolic 400 ppm threshold
  geom_hline(yintercept = 400, color = "red", linetype = "dashed") +
  # Highlight September 2016
  geom_point(data = filter(co2_monthly, year == 2016 & month == 9),
             aes(x = date, y = mean_co2), colour = "red", size = 4.5) +
  # annotate() draws each label once; geom_label() with fixed x and y
  # would redraw it for every row of the data
  annotate("label", x = 2019.8, y = 401, label = "400ppm") +
  annotate("label", x = 2018.2, y = 397,
           label = "September 2016 - Yearly minimum surpasses 400ppm")
Consider the full Mauna Loa CO2 record found here. The overall trend is not linear, but segments of it may be piecewise linear. Filter to remove the incomplete decades 1950s and 2010s and create a scatterplot that shows the interpolated CO2 values with a fitted linear model for each remaining decade. Do not include standard error bands.
# Drop the incomplete decades
co2_nod <- co2_monthly %>%
  filter(decade != "1950s" & decade != "2010s")
# The stray col = "black" argument to ggplot() had no effect and is removed
ggplot(co2_nod, aes(x = date, y = int_mean_co2)) +
  geom_point(size = .1) +
  # One fitted line per decade, without standard error bands
  geom_smooth(aes(color = decade), method = "lm", se = FALSE) +
  labs(title = expression("Atmospheric CO"[2]*" at Mauna Loa Observatory (1960-2010)"),
       y = "PARTS PER MILLION", x = "YEAR")
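Mapping `color = decade` inside `geom_smooth(method = "lm")` amounts to fitting a separate simple linear regression within each decade. The sketch below makes those per-decade fits explicit so that the slopes can be compared numerically. It uses a synthetic, noise-free stand-in for the CO2 record; the data frame `toy` and its curve are assumptions made for illustration, not real Mauna Loa measurements.

```r
# Synthetic stand-in for the Mauna Loa record (NOT real measurements):
# a smooth, accelerating curve sampled monthly from 1960 through 2009.
year  <- rep(1960:2009, each = 12)
month <- rep(1:12, times = 50)
toy <- data.frame(
  date   = year + (month - 0.5) / 12,     # decimal year at mid-month
  decade = paste0(year %/% 10 * 10, "s")  # "1960s", "1970s", ...
)
toy$co2 <- 315 + 0.8 * (toy$date - 1960) + 0.012 * (toy$date - 1960)^2

# Fit one simple linear model per decade and keep its slope (ppm per year).
decade_slopes <- sapply(split(toy, toy$decade), function(d) {
  coef(lm(co2 ~ date, data = d))[["date"]]
})
round(decade_slopes, 2)
```

If the slopes grow from one decade to the next, as they do for this synthetic curve, the overall trend is accelerating even though each segment is close to linear.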
Replicate as closely as possible the annual mean plot found here. Hint: It uses a ggplot theme for some of the formatting.
ggplot(co2_annual) +
  geom_bar(aes(y = mean_co2, x = year), stat = "identity", fill = "light blue", width = .7) +
  geom_smooth(method = "loess", aes(x = year, y = mean_co2)) +
  geom_hline(yintercept = 400, color = "red") +
  annotate("label", x = 1988, y = 400, label = "crisis threshold") +
  geom_hline(yintercept = 280, color = "black") +
  annotate("label", x = 1988, y = 280, label = "pre-industrial mean") +
  geom_hline(yintercept = 200, color = "black") +
  annotate("label", x = 1988, y = 200, label = "ice age mean") +
  labs(title = expression("Annual Mean Atmospheric CO"[2]*" at Mauna Loa Observatory"),
       subtitle = "with loess smoothed trend curve and estimated historical reference values",
       y = expression("CO"[2]*" (ppm)"), x = "measurement year") +
  scale_y_continuous(breaks = seq(0, 400, 50)) +
  scale_x_continuous(breaks = seq(1960, 2020, 5)) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black"))
In what way could these visualizations be used to support the theory of anthropogenic climate change?
ANSWER: These visualizations could be used to support the theory because they show that CO2 levels have been rising throughout the record and have continued to rise in recent years.
Why are data such as these considered evidence rather than proof of anthropogenic climate change?
ANSWER: The data show that CO2 is increasing but do not show that humans are the cause. They also come from a single location rather than from sites across the entire Earth, so they are a limited sample. The data are therefore evidence, not proof.
Are people generally happy with their heights? If not, how tall do they want to be? Dr. Thomley’s anthropometric dataset contains measurements of students’ heights and their self-selected ideal heights. You will fit a parallel slopes model to predict ideal height using measured height and gender, then interpret the results of your model.
Filter the dataset to include only students who self-identified as male or female (there are not enough data points in the other categories to create a model for them). Perform EDA to determine whether you need to perform any transformations or remove any data points before you fit your model. Create your modeling dataset and call it anthro_mod.
summary(anthro)
    gender              ideal            height         armspan
 Length:547         Min.   : 45.00   Min.   :59.75   Min.   :50.00
 Class :character   1st Qu.: 66.00   1st Qu.:65.00   1st Qu.:64.00
 Mode  :character   Median : 70.00   Median :68.00   Median :67.88
                    Mean   : 70.03   Mean   :68.09   Mean   :67.91
                    3rd Qu.: 74.00   3rd Qu.:71.50   3rd Qu.:72.00
                    Max.   :100.00   Max.   :78.00   Max.   :81.00
                    NA's   :5        NA's   :1       NA's   :5
    forearm           hand            leg             foot
 Min.   : 9.00   Min.   :4.000   Min.   :12.00   Min.   : 6.50
 1st Qu.:16.43   1st Qu.:7.000   1st Qu.:17.21   1st Qu.: 9.00
 Median :17.50   Median :7.250   Median :18.50   Median :10.00
 Mean   :17.47   Mean   :7.365   Mean   :18.70   Mean   :10.07
 3rd Qu.:18.50   3rd Qu.:8.000   3rd Qu.:20.00   3rd Qu.:11.00
 Max.   :24.50   Max.   :9.000   Max.   :27.00   Max.   :15.00
 NA's   :1       NA's   :1       NA's   :3       NA's   :4
   semester
 Length:547
 Class :character
 Mode  :character
anthro_mod <- anthro %>%
filter(gender == "female" | gender == 'male') %>%
filter(!is.na(armspan) & !is.na(height) & !is.na(ideal) & !is.na(forearm) & !is.na(hand) & !is.na(leg) & !is.na(foot)) %>%
filter(ideal < 90 & ideal > 55)
ggplot(anthro_mod, aes(x = height, y = ideal)) + geom_point()
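The cutoffs of 55 and 90 used above were chosen by inspection. As a cross-check, one common rule of thumb (an assumption here, not something the assignment prescribes) flags values more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that rule to the quartiles of ideal reported by summary(anthro).

```r
# Quartiles and extremes of `ideal`, copied from the summary(anthro) output.
q1 <- 66
q3 <- 74
ideal_min <- 45
ideal_max <- 100

# Tukey-style fences: 1.5 interquartile ranges beyond each quartile.
iqr         <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# TRUE when the reported extreme falls outside the corresponding fence.
c(min_flagged = ideal_min < lower_fence,
  max_flagged = ideal_max > upper_fence)
```

Under this rule both the reported minimum and the reported maximum of ideal fall outside the fences, which agrees with the decision to drop extreme ideal heights, although the fences themselves are somewhat narrower than the fixed cutoffs.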
Create a scatterplot of ideal height versus measured height showing separate fitted linear models for males and females. Then fit a parallel slopes model with measured height and gender as predictors and save it as ideal_model. Display its summary.
ggplot(anthro_mod, aes(x = height, y = ideal, color = gender)) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point() +
  labs(title = "Ideal Height vs Measured Height for Males and Females",
       x = "Measured Height", y = "Ideal Height")
ideal_model <- lm(ideal ~ height + gender, data = anthro_mod)
summary(ideal_model)
Call:
lm(formula = ideal ~ height + gender, data = anthro_mod)

Residuals:
    Min      1Q  Median      3Q     Max
-5.5876 -1.3481 -0.1123  1.1468 11.1271

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.14842    2.38834    14.3   <2e-16 ***
height       0.49175    0.03669    13.4   <2e-16 ***
gendermale   4.79353    0.29954    16.0   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.257 on 524 degrees of freedom
Multiple R-squared: 0.7708, Adjusted R-squared: 0.7699
F-statistic: 881.1 on 2 and 524 DF, p-value: < 2.2e-16
rp <- get_regression_points(ideal_model)
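In a parallel slopes model, both genders share the height slope and differ only in intercept, with the gendermale coefficient acting as a vertical shift. The sketch below reproduces a prediction by hand from the coefficients printed in the model summary; `predict_ideal` is a hypothetical helper written for illustration, not a function from any package.

```r
# Coefficients copied from the summary(ideal_model) output.
b_intercept <- 34.14842
b_height    <- 0.49175
b_male      <- 4.79353

# Hypothetical helper: predicted ideal height (inches) for a given measured
# height; `male` is TRUE for males and FALSE for females.
predict_ideal <- function(height, male) {
  b_intercept + b_height * height + b_male * as.numeric(male)
}

pred_male_60   <- predict_ideal(60, male = TRUE)
pred_female_60 <- predict_ideal(60, male = FALSE)

# At any given height, the gap between the two lines equals b_male.
pred_male_60 - pred_female_60
```

With the fitted model object available, the same values should come from `predict(ideal_model, newdata = ...)`; the manual version only makes the structure of the equation visible.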
Create a residual scatterplot and histogram for your model.
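# Residuals versus measured height, with a reference line at zero
ggplot(rp, aes(x = height, y = residual)) +
  geom_point(color = "blue") +
  geom_hline(yintercept = 0, col = "yellow", size = .8) +
  labs(title = "Ideal Model Residual Points", x = "Height", y = "Residual")
# Histogram of residuals; geom_histogram() has no `density` argument, so it is
# dropped, and the axis labels now match the axes they describe
ggplot(rp, aes(x = residual)) +
  geom_histogram(binwidth = .5, color = "blue") +
  labs(title = "Distribution of Residuals for Ideal Model", x = "Residual", y = "Count")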
Create a dataset containing heights at one-inch intervals from 60 to 80 for each gender. Use your parallel slopes model to predict the ideal heights for these values. Use mutate to create a new variable in your results tibble that shows whether the ideal height is less than, equal to, or greater than height. The case_when function may be useful here. Display the results.
hi <- c(60:80)
he <- c(60:80)
m <- c(rep.int("male",21))
f <- c(rep.int("female",21))
height2 <- tibble(height = c(hi,he),
gender = c(m,f))
height2_model <- get_regression_points(ideal_model, newdata = height2)
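# Strict inequalities keep the "equal to" branch reachable: case_when() returns
# the first condition that matches
height2_mut <- height2_model %>%
  mutate(comparison = case_when(height > ideal_hat ~ "less than",
                                height == ideal_hat ~ "equal to",
                                height < ideal_hat ~ "greater than"))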
height2_mut
# A tibble: 42 x 5
ID height gender ideal_hat comparison
<int> <dbl> <chr> <dbl> <chr>
1 1 60 male 68.4 greater than
2 2 61 male 68.9 greater than
3 3 62 male 69.4 greater than
4 4 63 male 69.9 greater than
5 5 64 male 70.4 greater than
6 6 65 male 70.9 greater than
7 7 66 male 71.4 greater than
8 8 67 male 71.9 greater than
9 9 68 male 72.4 greater than
10 10 69 male 72.9 greater than
# … with 32 more rows
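Because case_when() evaluates its conditions in order and returns the first one that matches, both the order of the branches and the strictness of the inequalities affect the labels. The following minimal sketch uses a small made-up data frame (`toy`, an assumption for illustration only) so that all three outcomes appear.

```r
library(dplyr)

# Made-up values chosen so that each branch is hit exactly once.
toy <- data.frame(
  height    = c(64, 70, 76),
  ideal_hat = c(66, 70, 75)
)

toy <- toy %>%
  mutate(comparison = case_when(
    ideal_hat <  height ~ "less than",    # ideal is below measured height
    ideal_hat == height ~ "equal to",     # exact match
    ideal_hat >  height ~ "greater than"  # ideal is above measured height
  ))

toy
```

With fitted values from a regression, exact equality is unlikely in practice; a tolerance-based comparison such as dplyr's `near()` may be a reasonable substitute for `==` if near-ties should count as equal.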
Create a plot that shows the same fitted lines for males and females as your scatterplot (but without points), as well as an annotated line indicating the relationship ideal height = measured height. Format this line in some way other than the default (e.g., color, style).
ggplot(anthro_mod, aes(x = height, y = ideal, color = gender)) +
  geom_smooth(method = "lm", se = FALSE) +
  # Reference line ideal = height, drawn directly instead of through the
  # observations that happen to satisfy height == ideal
  geom_abline(slope = 1, intercept = 0, color = "black", linetype = "dashed") +
  annotate("text", x = 70, y = 71, angle = 41.5, label = "Ideal Height = Measured height") +
  labs(title = "Linear Models of Ideal Height vs. Measured Height (Male and Female)",
       subtitle = "Line where ideal height equals measured height for reference",
       x = "Measured Height", y = "Ideal Height")
Explain your rationale for any transformations or deletions you chose to make in the dataset.
ANSWER: I kept only students who identified as male or female, as the instructions required. I removed rows with missing values because NAs can interfere with some function calls. I also noticed two outliers in ideal height (the summary shows a minimum of 45 and a maximum of 100), so I kept only ideal heights strictly between 55 and 90 inches.
Does the model seem appropriate for the data? Be sure to include discussion of the residuals.
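ANSWER: Yes. The R-squared value is about 0.77, so the model explains roughly 77% of the variation in ideal height, and the p-values for all three coefficients are very small. The gendermale coefficient indicates that, at the same measured height, males report an ideal height about 4.8 inches greater than females do. The residuals are centered near zero (median -0.11), although the maximum residual of 11.13 points to at least one unusually large positive residual. Taken together, these results suggest the model is appropriate.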
Do the people in this sample generally seem to be happy with their heights or do their ideal heights differ? Do males and females seem to have the same attitudes regarding what is an ideal height? What group patterns do you notice? Discuss.
ANSWER: No. If everyone were happy with their height, the fitted lines would have a slope of 1; instead, the estimated slope is about 0.49. The model suggests that shorter women want to be a few inches taller and taller women want to be somewhat shorter. Shorter men would prefer to be considerably taller (the predictions show a gap of about 6 to 8 inches for men between 60 and 64 inches tall), while taller men are fairly content with their height.
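The pattern described above can be quantified by finding, for each gender, the measured height at which the fitted line crosses the line ideal = height. Setting ideal = height in the fitted equation and solving gives height = intercept / (1 - slope). The sketch below uses the coefficients printed in the model summary; the variable names are introduced here for illustration.

```r
# Coefficients copied from the summary(ideal_model) output.
b_intercept <- 34.14842
b_height    <- 0.49175
b_male      <- 4.79353

# Solve height = intercept + b_height * height for height.
crossover_female <- b_intercept / (1 - b_height)
crossover_male   <- (b_intercept + b_male) / (1 - b_height)

round(c(female = crossover_female, male = crossover_male), 1)
```

Below the crossover height the model predicts an ideal height greater than the measured height, and above it the prediction is smaller. The crossover is near 67 inches for females and nearly 77 inches for males, which is close to the tallest measured height in the sample (78 inches), so the model predicts that almost all males in the observed range want to be taller.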