ES300: Combine Group Data

Each group has now produced different datasets that can be combined into a master file. To do this, you will load all three datasets, make sure they all look in order and change or rename any columns if necessary, then you’ll join the data based on the shared dates between them.

The three datasets: - "pge_data.csv“: This contains energy usage by day of year. - "pdx_daily_noaa.csv”: This contains daily temperatures, wind chill, and heat index for the PDX airport from NOAA. - "pdx_climate_toolbox.csv“: This contains daily min and max temperature values for the PDX airport from the Climate Toolbox. It also contains data on future temperature projections under two different emissions models.

To Do

Look at the columns in each dataset. We call the column that we will be matching things by the “key”. The key is the column that contains a shared unique identifier across all the data. What column in each of our datasets will be the key?

To join the datasets you will load the {tidyverse} package that contains functions for joining. There are different types of joins depending on what you want your resulting dataset to look like.

A left-join keeps every row from your first dataset, which we call the “left”, and adds matching columns from the second dataset, which we call the “right”. If a row in the left dataset has no match in the right, we still keep that row, we just fill it with NA in the new columns. If there is data in the right that doesn’t match the left, it is left alone and not included in the join. The process for left-joining three datasets is to start with one dataset as the left and join the second as the right. Then the joined data becomes the left and the third dataset is the right. The joined data will gain columns, but the row count will be the same as the original left dataset.

A full-join keeps every row from both datasets regardless of whether a match exists. It will match what it can, but if there is no corresponding data it will fill in NA on whichever side is missing. The result will be a dataset that contains all rows and all columns from both datasets. The process of joining is still the same, add the first two together, then add the third to that combination.

To learn more about how to join tables, see this workshop on joins.

To Do

Join your three datasets together.

You’ll see in that workshop that if your key columns are named the same thing you can use simply by = "key". However, if the columns you are trying to match have different names, say key1 and key2, the syntax is by = join_by("key1"=="key2").

To Do

Save the combined data to a file.