Load CSV into R

Load the csv file into R.

library(tidyr)
library(dplyr)
library(DT)
raw_data <- read.csv('https://raw.githubusercontent.com/suswong/DATA-607-Project-2/main/Gasoline_Retail_Prices_Weekly_Average_by_Region__Beginning_2007.csv')
raw_data$Date <- as.Date(raw_data$Date,
                        format = "%m/%d/%Y")
datatable(raw_data)

Tidy the data

newtable <- raw_data
colnames(newtable) <- c("Date", "New.York.State", "Albany","Batavia","Binghamton","Buffalo", "Dutchess", "Elmira,","Glens.Falls","Ithaca", "Kingston","Nassau","New.York.City","Rochester", "Syracuse","Utica","Watertown","White.Plains")

From Wide Format to Long Format

Pivot the table so that each color is in a row.

colnames(raw_data)

##  [1] "Date"                           "New.York.State.Average....gal."
##  [3] "Albany.Average....gal."         "Batavia.Average....gal."       
##  [5] "Binghamton.Average....gal."     "Buffalo.Average....gal."       
##  [7] "Dutchess.Average....gal."       "Elmira.Average....gal."        
##  [9] "Glens.Falls.Average....gal."    "Ithaca.Average....gal."        
## [11] "Kingston.Average....gal."       "Nassau.Average....gal."        
## [13] "New.York.City.Average....gal."  "Rochester.Average....gal."     
## [15] "Syracuse.Average....gal."       "Utica.Average....gal."         
## [17] "Watertown.Average....gal."      "White.Plains.Average....gal."

library(ggplot2)
long_table <- newtable %>% 
  pivot_longer(cols = !c('Date'),names_to = "region", values_to = "average_price" )
datatable(long_table)

colnames(raw_data)

##  [1] "Date"                           "New.York.State.Average....gal."
##  [3] "Albany.Average....gal."         "Batavia.Average....gal."       
##  [5] "Binghamton.Average....gal."     "Buffalo.Average....gal."       
##  [7] "Dutchess.Average....gal."       "Elmira.Average....gal."        
##  [9] "Glens.Falls.Average....gal."    "Ithaca.Average....gal."        
## [11] "Kingston.Average....gal."       "Nassau.Average....gal."        
## [13] "New.York.City.Average....gal."  "Rochester.Average....gal."     
## [15] "Syracuse.Average....gal."       "Utica.Average....gal."         
## [17] "Watertown.Average....gal."      "White.Plains.Average....gal."

library(ggplot2)
long_table <- newtable %>% 
  pivot_longer(cols = !c('Date'), names_to = "region", values_to = "average_price" )
datatable(long_table)

Data Analysis

Create a line graph to view the trend or compare the prices of gasoline for each region.

Overall, we can see most region follows the same trend. Overall, there is a increase in average weekly prices from mid 2007 to around 2012 for all region. There is a decrease in average weekly prices from around 2012 to around 2017. The average weekly prices increase from 2017 to present.

library(ggplot2)
ggplot(long_table, aes(x = Date, y = average_price, colour = region)) +
  geom_smooth() + scale_y_continuous(labels = function(x) format(x, scientific = FALSE))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

## Warning: Removed 3792 rows containing non-finite values (stat_smooth).

Compare the trend with the crude oil prices

Annual US Crude Oil Prices from 1900

The US crude oil annual prices from 1900 can be found here: https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&s=f000000__3&f=a

Load CSV into R

The website provided a XLS file of the crude oil prices. The file was save as a csv file.

crude_oil_raw_data <- read.csv('https://raw.githubusercontent.com/suswong/DATA-607-Project-2/main/crude%20oil%20prices%20from%20eia.gov.csv')

Tidy Data

Remove the first two rows and rename the columns.

colnames(crude_oil_raw_data) <- c("Year", "Price")
crude_oil <- crude_oil_raw_data
tidy_crude_oil <- crude_oil [-1,]
tidy_crude_oil <- tidy_crude_oil [-1,]

Filter out crude oil prices prior to 2007

filtered_crude_oil <- tidy_crude_oil[as.numeric(tidy_crude_oil$Year) >2006,] 
datatable(filtered_crude_oil)

# filtered_crude_oil <- tidy_crude_oil %>% 
#   filter(Year >2006)
# datatable(filtered_crude_oil)

Plot the crude oil

The annual US crude oil prices have similar trends as the average weekly gasoline prices in NY.

ggplot(filtered_crude_oil, aes(x = Year, y = Price, group=1)) +
  geom_line()+ ggtitle("Annual Crude Oil Price from 2007 to 2022") + 
  xlab("Year") + 
  ylab("US Crude Oil Price (Dollars/Barrel)")

ggplot(long_table, aes(x = Date, y = average_price, colour = region)) +
  geom_line() + scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) + ggtitle("Average Weekly Gasoline Prices in Different NY Region from 2007 to Present") + 
  ylab("Average Weekly Gasoline Prices") + 
  xlab("Year")

## Warning: Removed 3792 row(s) containing missing values (geom_path).

DATA 607 dataset 3

Susanna Wong

2023-03-05

Load CSV into R

Tidy the data

From Wide Format to Long Format

Data Analysis

Create a line graph to view the trend or compare the prices of gasoline for each region.

Compare the trend with the crude oil prices

Annual US Crude Oil Prices from 1900

Load CSV into R

Tidy Data

Filter out crude oil prices prior to 2007

Plot the crude oil