ES300: Climate Toolbox Data for PDX Airport
Intro
This lab will help you explore how temperatures in Portland are projected to change over the coming decades under different greenhouse gas emissions scenarios. We’ll download daily historical temperature records alongside two future climate projections from The Climate Toolbox Project. After loading and cleaning the data, we’ll visualize temperature trends and you’ll begin to examine how the frequency of extreme heat and cold events may change in the future.
Finding the Data
To find historical and projected future climate data for Portland we will use the Data Download tool from The Climate Toolbox Project. This tool provides access to both observed historical data and modeled future projections under different greenhouse gas emissions scenarios.
Go to the website and under Select Location, choose Point Location. Enter the address for the Portland International Airport (7000 NE Airport Way, Portland, OR 97218) and choose Set Location. This is the same location that is being used for the NOAA dataset, which will let us compare the two datasets later.
We will get three separate downloads:
Download 1: Historical Temperature (1979–current)
Under Product, select the Historical Climate Daily dataset. Collect data for the entire date range. Choose 3 for the number of columns in the data and select minimum and maximum daily temperature as your variables.
Download 2: Future Projected Temperature - Moderate Emissions
Leave the location the same but select Future Climate Projections. Choose the RCP 4.5 scenario. Choose the same variables as before. Select CCSM4 as the climate model (the third dropdown under the variables). RCP 4.5 is a moderate emissions scenario where emissions peak around 2040 and then decline.
Download 3: Future Projected Temperature - High Emissions
Leave the location the same but select Future Climate Projections. Choose the RCP 8.5 scenario. Choose the same variables as before. Select CCSM4 as the climate model (the third dropdown under the variables). RCP 8.5 is a high emissions scenario where emissions continue rising through the century.
Download each scenario separately. You should end up with three CSV files total.
Organize Your Files
Don’t leave these files in your downloads folder or with uninformative names. Move them into your project folder and give them clear names, for example:
climate_historical.csv
climate_rcp45.csv
climate_rcp85.csv
For more on file structures and why they are important see this page on file organization.
Manual Cleanup
These files all have a header portion that will interfere with loading them in R. Open them in a spreadsheet reader (Excel, Google Sheets, Numbers) and delete everything up until the variable titles.
Save the files with this correction.
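If you would rather not edit the files by hand, a rough alternative is to have R skip the header when reading. The sketch below assumes the row of variable titles is the first line that starts with Year. That is an assumption about the file layout, so check it against your own download. It uses a small made-up file so that it runs on its own.

```r
library(readr)

# Made-up stand-in for a downloaded file: a few header lines, then the data.
# The header text and the values are invented for illustration.
demo_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Example header line 1",
  "Example header line 2",
  "Year,Month,Day,tmmn(degF),tmmx(degF)",
  "1979,1,1,20.5,31.2",
  "1979,1,2,22.1,33.8"
), demo_file)

# Find the first line that starts with "Year" (assumed to be the variable titles)
all_lines <- readLines(demo_file)
header_row <- grep("^Year", all_lines)[1]

# Skip everything above that line when reading the file
demo_raw <- read_csv(demo_file, skip = header_row - 1, show_col_types = FALSE)
demo_raw
```

If you take this route, look at the result with glimpse() to confirm the column names came through as expected.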
Load Data Into RStudio
Open RStudio either from a desktop version or by going to Reed’s R server. From the main menu choose File → New Project. If you have a folder you’d like this project to be in choose Existing Directory, otherwise choose New Directory. Choose the plain R New Project, give it a name, and create the project.
If you are using the server, upload all three CSV files using the Files tab in the lower right panel: click Upload, then Choose File, and repeat for each file.
Now we’ll load the libraries and read in the data. Load {tidyverse} first:
library(tidyverse)

Then read in all three files:
hist_raw <- read_csv("climate_historical.csv")
rcp45_raw <- read_csv("climate_rcp45.csv")
rcp85_raw <- read_csv("climate_rcp85.csv")

Wrangling the Data
To view the datasets, click on their names in the Environment pane and they will open in a spreadsheet view. You can also run glimpse() on each dataset to see what columns you have:
glimpse(hist_raw)
glimpse(rcp45_raw)
glimpse(rcp85_raw)

You’ll notice the three files have similar structures. We can use the function rename() to give the temperature columns shorter, easier-to-understand names. We’ll also add two columns using the mutate() function.
- Add a single date column by using the function make_date() from the package {lubridate}, which loads when we load {tidyverse}. We give that function the three columns we want to combine into a date.
- Add a column called scenario, which we’ll use to track each dataset’s origin. For the historical data, scenario will be "historical". Setting a column equal to a single string makes every row have that value.
hist_data <- hist_raw |>
  rename(tmin = `tmmn(degF)`,
         tmax = `tmmx(degF)`) |>
  mutate(date = make_date(year = Year, month = Month, day = Day)) |>
  mutate(scenario = "historical")

Copy the code above and change it so that it accomplishes the same thing for the two future scenario datasets. scenario should be either "rcp45" or "rcp85". You will need to use glimpse() to ensure that you are entering your column names exactly as they appear in the table.
rcp45_data <- rcp45_raw |>
  rename(tmin = `tasmin-CCSM4(degF)`,
         tmax = `tasmax-CCSM4(degF)`) |>
  mutate(date = make_date(year = Year, month = Month, day = Day)) |>
  mutate(scenario = "rcp45")

rcp85_data <- rcp85_raw |>
  rename(tmin = `tasmin-CCSM4(degF)`,
         tmax = `tasmax-CCSM4(degF)`) |>
  mutate(date = make_date(year = Year, month = Month, day = Day)) |>
  mutate(scenario = "rcp85")

Now we’ll combine all three datasets into one using bind_rows(), which stacks them on top of each other:
climate_data <- bind_rows(hist_data, rcp45_data, rcp85_data)

You can run glimpse() to see that the date column is being treated as a date type <date> and the temperature columns are numbers <dbl> (stands for double, which just means numeric). The scenario column is a character <chr>.
glimpse(climate_data)

Visualizing the Data
We can now graph the temperature data over time. We’ll use {ggplot2}, which was loaded with {tidyverse}.
This package has a simple, common syntax for any type of graph:
ggplot(data = _______, aes(x = _______, y = _______)) +
  geom_TYPEOFPLOT()

Let’s parse that out:
- ggplot() is the base command that makes graphs; it comes from the {ggplot2} package
- data = the data set we want to use
- aes() stands for aesthetics; it’s where you tell R what you want to be on the graph
  - x = is where you name your x variable
  - y = is where you name your y variable
- the line ends with a + showing you that the code continues on the next line
- geom_TYPEOFPLOT() is where you specify what kind of graph you want to make; the most popular options are:
  - geom_point()
  - geom_line()
  - geom_col() or geom_bar()
  - geom_histogram()
  - geom_boxplot()
There are many other add-ons, but just those two lines of code will get you started for most types of graphs.
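As a quick illustration of the template with the blanks filled in, here is a sketch that uses economics, a small example dataset that ships with {ggplot2}. It is unrelated to the climate data and is used only so the code runs on its own.

```r
library(ggplot2)

# economics is an example dataset included with {ggplot2};
# date and unemploy are two of its columns
p <- ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line()

# Printing the plot object draws the graph
p
```

Swapping geom_line() for another geom, such as geom_point(), changes the graph type without touching the first line.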
Let’s start with a simple line graph of daily maximum temperature just for historical data.
Fill in the blanks to plot date on the x-axis and tmax on the y-axis for our hist_data.
ggplot(data = hist_data, aes(x = _____, y = _____)) +
  geom_line()

ggplot(data = hist_data, aes(x = date, y = tmax)) +
  geom_line()

Making Prettier Graphs
The graph shows the data, but we can make a number of improvements to make our graph easier to understand and more professional.
Start by using the labs() command to specify the axis labels. To do this, add a + and another line of code that specifies the labels for each axis.
Add labels to your axes by putting the name inside the quotation marks.
ggplot(data = hist_data, aes(x = date, y = tmax)) +
  geom_line() +
  labs(x = "_____",
       y = "_____")

We can also use color to add information to the graph. This time we’ll graph our entire combined dataset but use the option color to color the data by the value within the scenario column.
ggplot(data = climate_data, aes(x = date, y = tmax, color = scenario)) +
  geom_line()

There’s a lot of overlap here, so we’d need to do some work to make this graph easily interpretable, but you can see that there are three different colors being displayed.
One thing we can do to understand the projected temperatures for our different scenarios is add a trend line to the data. We do this by adding the function geom_smooth() and telling it to use a linear model (method = "lm"), so that it draws a straight trend line.
ggplot(data = climate_data, aes(x = date, y = tmax, color = scenario)) +
  geom_line() +
  geom_smooth(method = "lm")

Still pretty hard to see! One thing we can do is make our data somewhat transparent so we are able to see the overlap more easily. We do this by adding an alpha value to our geom_line() function.
Change the alpha value to make the graph more interpretable. Values can be anywhere between 0 and 1.
ggplot(data = climate_data, aes(x = date, y = tmax, color = scenario)) +
  geom_line(alpha = 1) +
  geom_smooth(method = "lm")

You can also change the overall look of the graph. If you don’t like the gray background, look at the Themes section of this workshop to see how you can change it. If you don’t like the tick placement or labeling, look at the Axis Ticks section of this workshop to change them.
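As a rough sketch of what those two kinds of changes can look like, the example below swaps in theme_minimal() and sets date ticks with scale_x_date(). The data frame is made up so the code runs on its own, and the 20-year spacing is just one possible choice, not a recommendation from the workshop.

```r
library(ggplot2)

# Made-up data so the example runs on its own: one invented value per year
demo <- data.frame(
  date = seq(as.Date("1980-01-01"), as.Date("2060-01-01"), by = "year"),
  tmax = seq(60, 68, length.out = 81)
)

p <- ggplot(data = demo, aes(x = date, y = tmax)) +
  geom_line() +
  theme_minimal() +                       # replaces the default gray background
  scale_x_date(date_breaks = "20 years",  # one tick every 20 years
               date_labels = "%Y")        # label each tick with the 4-digit year
p
```

The same two lines can be added to the end of any of the climate_data graphs above.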
Figure it Out
You can explore extreme values using the filter() function. This function will let you subset the data based on a condition, like being above or below a certain value.
For example, we could filter the tmax column so that we are left with a dataset of only values of 80 degrees or higher. We’ll need to make a new object called over80 in which to store this data.
over80 <- climate_data |>
  filter(tmax >= 80)

Note:
>= means greater than or equal to. <= means less than or equal to.
We can now graph this data. A line graph is not the most appropriate choice now because our data is no longer a continuous time series. Instead, we have scattered days where the temperature is 80 or higher.
Change the code to make the graph a more appropriate type.
ggplot(data = over80, aes(x = date, y = tmax, color = scenario)) +
  ________()

ggplot(data = over80, aes(x = date, y = tmax, color = scenario)) +
  geom_point()

A high temperature of 80 is obviously not very extreme. Using your knowledge of what constitutes extreme weather, change the filtering values to create better graphs for both extreme heat and extreme cold events.
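One possible way to turn a filtered dataset into a frequency is to count the number of qualifying days per year in each scenario. The sketch below uses a handful of made-up rows in place of climate_data, and the threshold of 90 is only illustrative. Choosing a sensible threshold is still up to you.

```r
library(dplyr)
library(lubridate)

# Made-up rows standing in for climate_data; all values are invented
demo <- tibble(
  date = as.Date(c("2020-07-01", "2020-07-02", "2020-12-25",
                   "2050-07-01", "2050-07-02", "2050-07-03")),
  tmax = c(95, 88, 40, 97, 99, 91),
  scenario = c("historical", "historical", "historical",
               "rcp85", "rcp85", "rcp85")
)

# 90 is an illustrative threshold, not a recommended definition of extreme heat
hot_days <- demo |>
  filter(tmax >= 90) |>                    # keep only the hot days
  mutate(year = year(date)) |>             # pull the year out of the date
  count(scenario, year, name = "n_days")   # number of hot days per scenario and year
hot_days
```

A table like this can then be graphed with year on the x-axis and n_days on the y-axis, for example with geom_col().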
Resources: