The primary motivation for analyzing U.S. fuel prices since 1995 is to uncover long-term trends, understand the factors driving price fluctuations, and explore how major economic and geopolitical events affect these prices. The analysis addresses questions such as whether prices follow cyclical or seasonal patterns and how external events influence them. It builds on previous work with fleet cost data and serves as the next step in this project.
Policymakers can use these insights to help stabilize prices, businesses can optimize operations, and consumers, including car owners, can budget better or consider energy-efficient options such as electric vehicles or public transport during high-price periods.
1. Prepare
Install Packages & Load Libraries
library("readr")
library("lubridate")
library("skimr")
library("dplyr")
library("tidyverse")  # attaches tidyr (pivot_longer), stringr (str_remove, str_trim) and ggplot2
Read Data
data <- read.csv("~/Regulated Conventional.csv", stringsAsFactors = FALSE)
2. Wrangle
To prepare the data for analysis, I will identify and handle missing values, verify consistency against prior team efforts to avoid errors, and check that dates are correctly formatted and sorted for accurate trend analysis.
Identifying missing values in the dataset.
Removing rows or columns with excessive missing data.
Ensuring the Date column is in Date format for time-series analysis.
Aggregating the data by time periods (e.g., monthly averages) or regions.
Together, these steps transform the raw dataset into a structured format: converting dates, handling missing values, grouping by time period or region, and visualizing trends for meaningful insights.
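The missing-value and date steps listed above are not yet spelled out in the wrangling code. A minimal base R sketch of them follows, assuming a small hypothetical wide data frame in place of the real CSV (the prices are made up) and an illustrative 50% cutoff for "excessive" missing data, since the analysis does not specify one. The toy dates are in ISO format; the real file's dates would need `format = "%b %d, %Y"`.

```r
# Hypothetical wide data frame standing in for the raw CSV (values are made up)
raw <- data.frame(
  Date       = c("1990-09-03", "1990-08-20", "1990-09-10", "1990-08-27"),
  US_Region  = c(1.20, 1.10, 1.25, 1.15),
  East_Coast = c(NA, NA, 1.30, NA),
  Midwest    = c(1.22, NA, 1.24, 1.18),
  stringsAsFactors = FALSE
)

# Share of missing values in each column
na_share <- colMeans(is.na(raw))
print(round(na_share, 2))

# Drop columns whose missing share exceeds the cutoff (0.5 is an assumption)
max_na_share <- 0.5
clean <- raw[, na_share <= max_na_share, drop = FALSE]

# Parse the ISO dates and sort rows chronologically for time-series work
clean$Date <- as.Date(clean$Date)
clean <- clean[order(clean$Date), ]
print(clean)
```

Here East_Coast (75% missing) is dropped while Midwest (25% missing) is kept; rows could be screened the same way with `rowMeans(is.na(...))`.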
# Step 1: Keep the first 10 of the 22 columns (Date plus the nine regional series)
subset_data <- data[, 1:10]

# Step 2: Give the columns short, syntactic names
# (renaming after subsetting avoids assigning 10 names to a 22-column data frame)
colnames(subset_data) <- c("Date", "US_Region", "East_Coast", "New_England", "Central_Atlantic",
                           "Lower_Atlantic", "Midwest", "Gulf_Coast", "Rocky_Mountain", "West_Coast")

# Step 3: Reshape the data to long format
long_format <- subset_data %>%
  pivot_longer(
    cols = -Date,         # Exclude the 'Date' column from reshaping
    names_to = "Region",  # New column 'Region' holds the former column names
    values_to = "Price"   # New column 'Price' holds the values
  )

# Step 4: Strip any leftover unit suffix from Region (a no-op once columns are renamed)
long_format$Region <- str_remove(long_format$Region, "\\s\\(Dollars per Gallon\\)") %>%
  str_trim()

# Step 5: Convert the Date column (e.g. "Aug 20, 1990") to Date class
long_format$Date <- as.Date(long_format$Date, format = "%b %d, %Y")

# Step 6: Extract Year, Month, and Day into separate columns
long_format <- long_format %>%
  mutate(
    Year = year(Date),    # Extract the year
    Month = month(Date),  # Extract the month
    Day = day(Date)       # Extract the day
  )

# Step 7: Write the transformed dataset to a new CSV file
write.csv(long_format, "Transformed_Dataset.csv", row.names = FALSE)

# Step 8: Display the first few rows of the transformed dataset
head(long_format)
# A tibble: 6 × 6
Date Region Price Year Month Day
<date> <chr> <dbl> <dbl> <dbl> <int>
1 1990-08-20 US_Region 1.19 1990 8 20
2 1990-08-20 East_Coast NA 1990 8 20
3 1990-08-20 New_England NA 1990 8 20
4 1990-08-20 Central_Atlantic NA 1990 8 20
5 1990-08-20 Lower_Atlantic NA 1990 8 20
6 1990-08-20 Midwest NA 1990 8 20
# View(long_format)  # not run: the dataset is too large to open comfortably in the viewer
# long_format is the name of the reshaped dataset
3. Metrics
To enhance the analysis, I used dplyr's summarise() function to calculate key descriptive metrics for each region: mean, median, variance, standard deviation, range, and the coefficient of variation (the standard deviation divided by the mean). These metrics quantify central tendency and variability. For instance, the variance highlights regions with a wider spread of prices, while the coefficient of variation compares relative variability across regions whose average price levels differ. The range helps flag potential outliers or regions with extreme differences. Together, these summaries provide a foundation for further statistical and visual analysis.
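A small base R sketch of these per-region metrics follows. It uses hypothetical toy prices (not values from the dataset) and mirrors what summarise() computes, but relies only on split() and sapply() so it runs without extra packages; the helper name `describe` is illustrative.

```r
# Hypothetical long-format prices for two regions (one missing value)
prices <- data.frame(
  Region = rep(c("East_Coast", "Midwest"), each = 4),
  Price  = c(2.0, 2.2, 2.4, 2.6, 1.5, 2.5, 3.5, NA)
)

# Descriptive metrics for one group, ignoring missing prices (like na.rm = TRUE)
describe <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  c(Mean = m, Median = median(x), Variance = var(x),
    Range = max(x) - min(x), CV = sd(x) / m)
}

# One row of metrics per region
metrics <- t(sapply(split(prices$Price, prices$Region), describe))
print(round(metrics, 3))
```

The Midwest toy series is far more variable relative to its mean (CV 0.4 versus about 0.11 for East_Coast), which is the kind of comparison the coefficient of variation is meant to support.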
Percent changes (year-over-year and month-over-month) will help identify periods of significant volatility or growth. For trend analysis, regression metrics such as R-squared and coefficients could quantify the relationship between fuel prices and predictors such as crude oil prices or inflation, which would require adding those series to the dataset. Finally, seasonal indices and pre/post-event comparisons could reveal cyclical patterns and the impact of major events; these analyses are exploratory and planned for a later stage of the project.
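Month-over-month and year-over-year changes are both percent changes against an earlier row, differing only in the lag. The base R sketch below assumes hypothetical monthly average prices for 24 consecutive months with no gaps; the helper name `pct_change` is illustrative.

```r
# Hypothetical monthly average prices for 24 consecutive months (no gaps assumed)
monthly <- data.frame(
  Month     = seq(as.Date("2021-01-01"), by = "month", length.out = 24),
  Avg_Price = 2 + 0.1 * (0:23)
)

# Percent change versus the value k rows earlier; NA where no earlier value exists
pct_change <- function(x, k) {
  prev <- c(rep(NA, k), head(x, -k))
  100 * (x - prev) / prev
}

monthly$MoM_pct <- pct_change(monthly$Avg_Price, 1)   # month-over-month
monthly$YoY_pct <- pct_change(monthly$Avg_Price, 12)  # year-over-year
print(tail(monthly, 3))
```

With the real data, monthly averages could first be computed per region (for example with group_by(Region, Year, Month) and summarise()), and the function applied within each region.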
summary(long_format)
Date Region Price Year
Min. :1990-08-20 Length:16119 Min. :0.853 Min. :1990
1st Qu.:1999-03-15 Class :character 1st Qu.:1.338 1st Qu.:1999
Median :2007-10-11 Mode :character Median :2.268 Median :2007
Mean :2007-10-11 Mean :2.276 Mean :2007
3rd Qu.:2016-05-09 3rd Qu.:3.021 3rd Qu.:2016
Max. :2024-12-02 Max. :5.419 Max. :2024
NA's :9 NA's :881 NA's :9
Month Day
Min. : 1.000 Min. : 1.00
1st Qu.: 4.000 1st Qu.: 8.00
Median : 7.000 Median :16.00
Mean : 6.551 Mean :15.72
3rd Qu.:10.000 3rd Qu.:23.00
Max. :12.000 Max. :31.00
NA's :9 NA's :9
# Summary statistics of Price for each region (dplyr is loaded above)
region_summary <- long_format %>%
  group_by(Region) %>%
  summarise(
    Mean_Price     = mean(Price, na.rm = TRUE),    # Mean price for each region
    Median_Price   = median(Price, na.rm = TRUE),  # Median price for each region
    Variance_Price = var(Price, na.rm = TRUE),     # Variance of price
    Std_Dev_Price  = sd(Price, na.rm = TRUE),      # Standard deviation of price
    Min_Price      = min(Price, na.rm = TRUE),     # Minimum price
    Max_Price      = max(Price, na.rm = TRUE),     # Maximum price
    Price_Range    = Max_Price - Min_Price,        # Difference between max and min
    CV_Price       = Std_Dev_Price / Mean_Price    # Coefficient of variation (relative spread)
  )

# View the summary
print(region_summary)
This R code analyzes gas prices in 2008, a year marked by the financial crash. It first filters the data for 2008 and calculates the average gas price by region, highlighting regional differences in pricing. The code then visualizes these averages using a bar plot, making it easy to compare regions. A time series plot tracks price fluctuations throughout the year, providing insights into trends and changes during the crisis. Additionally, a boxplot is used to display the distribution of prices within each region, identifying any outliers.
The code filters data for 2008 and calculates the average gas price by region.
It uses a bar plot to compare average prices across regions.
A time series plot visualizes price trends throughout the year.
A boxplot shows the distribution of prices by region, highlighting outliers.
These steps reveal regional price disparities, trends, and anomalies during a volatile period, and the visualizations give a concise summary of how gas prices behaved during the 2008 financial crisis, informing further analysis.
# Average for 2008
# library(tidyverse) and library(lubridate) are loaded above

# Read the transformed CSV file; read.csv returns Date as character, so convert it back
long_format <- read.csv("Transformed_Dataset.csv") %>%
  mutate(Date = as.Date(Date))

# Filter the dataset for the year 2008
data_2008 <- long_format %>%
  filter(Year == 2008)

# Summarize average price by Region for 2008
avg_price_2008 <- data_2008 %>%
  group_by(Region) %>%
  summarise(Average_Price = mean(Price, na.rm = TRUE)) %>%
  arrange(desc(Average_Price))

# Print the summarized data for inspection
print(avg_price_2008)

# Plot the average price for each region in 2008
ggplot(avg_price_2008, aes(x = reorder(Region, -Average_Price), y = Average_Price, fill = Region)) +
  geom_bar(stat = "identity") +
  labs(title = "Average Gas Prices by Region in 2008",
       x = "Region", y = "Average Price (Dollars per Gallon)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Plot a time series of gas prices for 2008 by Region
# (with Date stored as a Date rather than text, each region forms one continuous line)
ggplot(data_2008, aes(x = Date, y = Price, color = Region)) +
  geom_line() +
  labs(title = "Gas Prices in 2008 by Region",
       x = "Date", y = "Price (Dollars per Gallon)") +
  theme_minimal() +
  theme(legend.position = "bottom")

# Plot a boxplot to show price distribution by Region for 2008
ggplot(data_2008, aes(x = Region, y = Price, fill = Region)) +
  geom_boxplot() +
  labs(title = "Price Distribution by Region in 2008",
       x = "Region", y = "Price (Dollars per Gallon)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
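The pre/post-event analysis mentioned in the Metrics section can be sketched as a comparison of mean prices in equal windows before and after an event date. The base R sketch below assumes hypothetical weekly prices for one region, uses September 15, 2008 (the Lehman Brothers bankruptcy filing) as an illustrative event date, and picks an 8-week window; none of these choices come from the original analysis.

```r
# Hypothetical weekly prices for one region in the second half of 2008
weekly <- data.frame(
  Date  = seq(as.Date("2008-07-07"), by = "week", length.out = 20),
  Price = c(rep(4.0, 10), rep(3.0, 10))  # made-up values with a step change
)

event_date   <- as.Date("2008-09-15")  # assumption: illustrative event date
window_weeks <- 8                      # assumption: window length on each side

# Equal-length windows immediately before and after the event
in_pre  <- weekly$Date <  event_date & weekly$Date >= event_date - 7 * window_weeks
in_post <- weekly$Date >= event_date & weekly$Date <  event_date + 7 * window_weeks

pre_mean  <- mean(weekly$Price[in_pre])
post_mean <- mean(weekly$Price[in_post])

cat("Weeks before:", sum(in_pre), " Weeks after:", sum(in_post), "\n")
cat("Mean change:", post_mean - pre_mean,
    sprintf("(%.1f%%)\n", 100 * (post_mean - pre_mean) / pre_mean))
```

A difference in means like this is descriptive only; attributing it to the event would need care, since prices also move for seasonal and market reasons.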
Visual aids like line charts will illustrate long-term trends in fuel prices, while box plots will highlight price variations across years or seasons. Scatter plots can be used to depict correlations between fuel prices and factors like crude oil or inflation rates. A heat map could show regional or seasonal differences if data allows. Additionally, an annotated timeline of major economic or geopolitical events could help contextualize significant changes in price trends.
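For the heat map idea, one hedged approach is to average prices into a region-by-month matrix and pass it to base R's heatmap(). The sketch uses simulated weekly prices (the seasonal pattern and noise level are assumptions for illustration); with the real data, long_format's Region, Month, and Price columns would take their place.

```r
set.seed(42)  # reproducible simulated data
regions <- c("East_Coast", "Midwest", "West_Coast")
dates   <- seq(as.Date("2023-01-02"), by = "week", length.out = 104)

# Simulated long-format data: one price per region per week
toy <- data.frame(
  Region = rep(regions, times = length(dates)),
  Date   = rep(dates, each = length(regions))
)
toy$Month <- as.integer(format(toy$Date, "%m"))
toy$Price <- 3 + 0.2 * sin(2 * pi * toy$Month / 12) + rnorm(nrow(toy), sd = 0.05)

# Region-by-month matrix of average prices
price_matrix <- tapply(toy$Price, list(toy$Region, toy$Month), mean)

# Draw the heat map without clustering or rescaling the rows
heatmap(price_matrix, Rowv = NA, Colv = NA, scale = "none",
        xlab = "Month", main = "Average Price by Region and Month")
```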
# Weekly data reference for 2008
# library(tidyverse) and library(lubridate) are loaded above

# Read the transformed CSV file and restore the Date class
long_format <- read.csv("Transformed_Dataset.csv") %>%
  mutate(Date = as.Date(Date))

# Filter the data for the year 2008
data_2008 <- long_format %>%
  filter(Year == 2008)

# Step 1: Calculate weekly averages
data_2008_weekly <- data_2008 %>%
  mutate(Week = week(Date),                           # Week number within the year
         YearWeek = paste(Year, Week, sep = "-")) %>%  # Unique Year-Week identifier
  group_by(YearWeek, Region) %>%
  summarise(Weekly_Avg_Price = mean(Price, na.rm = TRUE), .groups = "drop")

# Step 2: Histogram on the density scale with a density line for each region
ggplot(data_2008_weekly, aes(x = Weekly_Avg_Price, fill = Region)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.05, alpha = 0.6,
                 position = "identity", color = "black") +   # Histogram
  geom_density(aes(color = Region), fill = NA, linewidth = 1) +  # Overlay density line
  labs(title = "Distribution of Weekly Average Gas Prices by Region in 2008",
       x = "Weekly Average Price (Dollars per Gallon)", y = "Density") +
  theme_minimal() +
  theme(legend.position = "bottom")
This second block takes a weekly approach, calculating the weekly average gas price for 2008 by region.
The data is aggregated by region and week, using a YearWeek identifier to track weekly price fluctuations.
A histogram with an overlaid density curve visualizes the distribution of weekly average prices for each region in 2008.
This gives a finer temporal view than the annual regional averages in the first block. Because the source series already has one observation per week, each weekly average here equals that week's reported price.
2. East Coast, West Coast & Midwest Comparison
This analysis focuses on fuel price trends for the East Coast, West Coast, and Midwest regions from 2014 to 2024, specifically on December 31st each year. We aim to explore how fuel prices have changed across regions and identify any patterns. After filtering the data for the relevant years and regions, we visualize the trends using a time series plot. However, we acknowledge that the current plot may not fully capture the data’s nuances, and we will reconsider the visualization approach to better represent the trends.
Data Filtering: The dataset is filtered for the regions East Coast, West Coast, and Midwest between 2014 and 2024 and only includes data for December 31st.
Plot Type: A line plot is used with points indicating fuel prices for each year in the selected regions.
Region-based Visualization: Different colors represent the three regions, and the x-axis shows years (2014-2024), with a clear distinction for each region’s fuel price trend.
X-axis Formatting: The x-axis is set to display yearly intervals from 2014 to 2024.
# December 31: East Coast, West Coast, Midwest
# library(tidyverse) and library(lubridate) are loaded above

# Read the transformed CSV file and restore the Date class
long_format <- read.csv("Transformed_Dataset.csv") %>%
  mutate(Date = as.Date(Date))

# Step 1: Filter data for December 31st from 2014 to 2024 for the specified regions
filtered_data <- long_format %>%
  filter(Region %in% c("East_Coast", "West_Coast", "Midwest"),  # Select specific regions
         month(Date) == 12 & day(Date) == 31,                     # Filter for December 31st
         Year >= 2014 & Year <= 2024)                              # Limit years to 2014-2024

# Step 2: Plot prices on December 31st for the selected regions over the years
ggplot(filtered_data, aes(x = Year, y = Price, color = Region, group = Region)) +
  geom_line(linewidth = 1) +  # Line plot
  geom_point(size = 3) +      # Add points to indicate data points
  labs(title = "Fuel Prices on December 31st (2014-2024)",
       x = "Year", y = "Price (Dollars per Gallon)",
       color = "Region") +
  theme_minimal() +
  scale_x_continuous(breaks = seq(2014, 2024, 1)) +  # Ensure all years appear on the x-axis
  theme(legend.position = "bottom")
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
Conclusion:
The line plot, while useful for showing trends over time, is not the ideal choice here. Prices on December 31st are single, discrete observations, one per year, and connecting them with lines can suggest smooth transitions between years that the data does not show, since prices fluctuate with market conditions throughout the year. There is also a practical problem, flagged by the geom_line() warning that each group consists of only one observation: the dates in this weekly series fall on Mondays (for example 1990-08-20 and 2024-12-02), so between 2014 and 2024 December 31st matches only one date, 2018-12-31, leaving a single point per region.

To represent the data more accurately, alternative approaches such as a histogram combined with a density line can be more effective. These views focus on the distribution of prices without implying continuity between years. The filter restricting the analysis to 2014-2024 and to the East Coast, West Coast, and Midwest keeps the focus on a defined timeframe and set of regions. Moving forward, we will use these visualizations to gain a more accurate view of fuel price changes during this period.
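Because the series is weekly, a year-end comparison needs the last available observation of each year rather than an exact December 31st match. The base R sketch below builds a hypothetical Monday-dated weekly series (prices are made up) to contrast the two approaches; it assumes rows are sorted by date.

```r
# Hypothetical weekly, Monday-dated series for one region (prices are made up)
weekly <- data.frame(
  Date = seq(as.Date("2014-01-06"), as.Date("2024-12-30"), by = "week")
)
weekly$Price <- 3 + 0.001 * seq_len(nrow(weekly))
weekly$Year  <- as.integer(format(weekly$Date, "%Y"))

# Exact-date filter: only years in which December 31 is a Monday survive
exact <- weekly[format(weekly$Date, "%m-%d") == "12-31", ]
print(exact)

# Alternative: last observation of each calendar year (rows assumed sorted by Date)
last_idx <- as.vector(tapply(seq_len(nrow(weekly)), weekly$Year, max))
year_end <- weekly[last_idx, c("Year", "Date", "Price")]
print(year_end)
```

The exact filter keeps one row (2018), while the year-end approach keeps one row for each of the 11 years. For several regions, the same idea could be applied with dplyr, for example group_by(Region, Year) followed by slice_max(order_by = Date, n = 1), once Date is converted with as.Date().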
# Justify plot data (illustration with simulated prices)
# library(tidyverse) and library(lubridate) are loaded above

# Create sample filtered data (replace with your dataset)
filtered_data <- data.frame(
  Year = rep(2014:2024, 3),
  Region = rep(c("East_Coast", "West_Coast", "Midwest"), each = 11),
  Price = runif(33, min = 2, max = 4)  # Random prices for illustration
)

# Histogram of observations per year, with the price line overlaid on a scaled axis
ggplot(filtered_data, aes(x = Year, fill = Region)) +
  geom_histogram(binwidth = 1, alpha = 0.5, position = "identity") +  # Histogram remains unscaled
  geom_line(aes(y = Price * 20, group = Region, color = Region),
            linewidth = 1) +                                          # Price scaled by 20 for overlay
  scale_y_continuous(
    name = "Count (Histogram)",                                         # Primary y-axis for histogram
    sec.axis = sec_axis(~ . / 20, name = "Price (Dollars per Gallon)")  # Secondary axis undoes the scaling
  ) +
  labs(
    title = "Fuel Price Trends and Distribution (2014-2024)",
    x = "Year",
    fill = "Region",
    color = "Region"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
A box plot is a more suitable approach. It captures the distribution of December 31st prices for each region through the median, quartiles, and potential outliers, without the misleading continuity a line implies. With the data filtered to 2014-2024 and the selected regions, it gives a clearer and more accurate view of fuel price fluctuations.
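The box plot's summary statistics can also be reproduced by hand. This sketch uses hypothetical prices and the conventional 1.5 × IQR rule, which matches the default whisker coefficient (coef = 1.5) of ggplot2's geom_boxplot(); it is meant to show what the box and the outlier points represent, not to replace the plot.

```r
# Hypothetical December 31 prices for one region (made-up values)
prices <- c(2.1, 2.3, 2.4, 2.5, 2.6, 2.7, 4.9)

# Quartiles using quantile()'s default method (type 7)
q <- quantile(prices, probs = c(0.25, 0.50, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Points more than 1.5 * IQR beyond the box are drawn as outliers
lower <- q[1] - 1.5 * iqr
upper <- q[3] + 1.5 * iqr
outliers <- prices[prices < lower | prices > upper]

cat("Q1:", q[1], " Median:", q[2], " Q3:", q[3], "\n")
cat("Whisker limits:", lower, "to", upper, " Outliers:", outliers, "\n")
```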
# Box plot comparing the three regions
# library(tidyverse) is loaded above

# Create sample filtered data (replace with your dataset)
filtered_data <- data.frame(
  Year = rep(2014:2024, 3),
  Region = rep(c("East_Coast", "West_Coast", "Midwest"), each = 11),
  Price = runif(33, min = 2, max = 4)  # Random prices for illustration
)

# Generate the box plot
ggplot(filtered_data, aes(x = Region, y = Price, fill = Region)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 16, alpha = 0.7) +
  labs(
    title = "Fuel Price Distribution by Region (2014-2024)",
    x = "Region",
    y = "Price (Dollars per Gallon)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(legend.position = "none",                           # Remove redundant legend
        axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels
3. December 31 Price Comparison (2014-2024)
Fuel prices vary across regions due to factors such as demand, supply chain issues, and economic conditions. Travel patterns, especially during peak seasons, can also contribute to these variations.
By analyzing maximum fuel prices for each region from 2014 to 2024 on December 31st, we can better understand how these regional dynamics and travel patterns influence fuel price volatility over time.
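To check whether peak travel seasons show up in prices, a simple ratio-to-mean seasonal index divides each month's average price by the mean of the twelve monthly averages; values above 1 mark months that run above the typical level. This is one of several possible seasonal index definitions, and the base R sketch below uses hypothetical weekly prices with an assumed summer premium.

```r
# Hypothetical weekly prices for one region over two years
dates <- seq(as.Date("2022-01-03"), by = "week", length.out = 104)
mon   <- as.integer(format(dates, "%m"))
price <- 3 + ifelse(mon %in% 6:8, 0.3, 0)  # assumed summer premium, made-up values

# Average price per calendar month, then each month relative to the overall level
monthly_mean   <- tapply(price, mon, mean)
seasonal_index <- as.vector(monthly_mean / mean(monthly_mean))
names(seasonal_index) <- month.abb[as.integer(names(monthly_mean))]
print(round(seasonal_index, 3))
```

By construction the twelve index values average to 1, so they can be compared across regions with different price levels.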
# Maximum December 31 prices for 10 regions
# library(dplyr) and library(tidyr) are loaded above (via tidyverse)

# Create sample filtered data (replace with your dataset)
filtered_data <- data.frame(
  Date = as.Date(paste(rep(2014:2024, 10), "12", "31", sep = "-")),  # Dates for December 31st
  US_Region = rep(c("East_Coast", "New_England", "Central_Atlantic", "Lower_Atlantic", "Midwest",
                    "Gulf_Coast", "Rocky_Mountain", "West_Coast", "Pacific", "Alaska"), each = 11),
  East_Coast = runif(110, min = 2, max = 4),
  New_England = runif(110, min = 2, max = 4),
  Central_Atlantic = runif(110, min = 2, max = 4),
  Lower_Atlantic = runif(110, min = 2, max = 4),
  Midwest = runif(110, min = 2, max = 4),
  Gulf_Coast = runif(110, min = 2, max = 4),
  Rocky_Mountain = runif(110, min = 2, max = 4),
  West_Coast = runif(110, min = 2, max = 4),
  Pacific = runif(110, min = 2, max = 4),
  Alaska = runif(110, min = 2, max = 4)
)

# Find the maximum price for each region and the corresponding dates
max_prices <- filtered_data %>%
  gather(key = "Region", value = "Price", -Date, -US_Region) %>%  # Reshape data to long format
  group_by(Region) %>%
  filter(Price == max(Price)) %>%  # Keep the row(s) with each region's maximum price
  select(Region, Date, Price)

# View the result
max_prices
Computed on the simulated placeholder data above, the maximum December 31st prices show each region peaking in a different year. On the real dataset, such differences could reflect regional economic conditions, travel patterns, and other factors influencing fuel price volatility across the United States, and understanding them could help anticipate future price movements; the calculation should be rerun on the real series before drawing those conclusions.
4. Overall Maximum Prices for the Last 10 Years
Identifying the highest recorded fuel price in each region over the past decade (2014-2024) shows how regional market dynamics, supply and demand, and external factors have influenced fuel costs. This overview provides a basis for comparing the severity of price increases and fluctuations across regions over the last ten years.
# Overall max prices
# library(dplyr) and library(tidyr) are loaded above (via tidyverse)

# Create sample filtered data (replace with your dataset)
filtered_data <- data.frame(
  Date = as.Date(paste(rep(2014:2024, 10),
                       sample(1:12, 110, replace = TRUE),
                       sample(1:28, 110, replace = TRUE), sep = "-")),  # Random dates within 2014-2024
  US_Region = rep(c("East_Coast", "New_England", "Central_Atlantic", "Lower_Atlantic", "Midwest",
                    "Gulf_Coast", "Rocky_Mountain", "West_Coast", "Pacific", "Alaska"), each = 11),
  East_Coast = runif(110, min = 2, max = 4),
  New_England = runif(110, min = 2, max = 4),
  Central_Atlantic = runif(110, min = 2, max = 4),
  Lower_Atlantic = runif(110, min = 2, max = 4),
  Midwest = runif(110, min = 2, max = 4),
  Gulf_Coast = runif(110, min = 2, max = 4),
  Rocky_Mountain = runif(110, min = 2, max = 4),
  West_Coast = runif(110, min = 2, max = 4),
  Pacific = runif(110, min = 2, max = 4),
  Alaska = runif(110, min = 2, max = 4)
)

# Reshape the data from wide to long format
long_data <- filtered_data %>%
  gather(key = "Region", value = "Price", -Date, -US_Region)

# Find the maximum price for each region and the corresponding dates
max_prices <- long_data %>%
  group_by(Region) %>%
  filter(Price == max(Price)) %>%  # Keep the row(s) with each region's maximum price
  select(Region, Date, Price)

# View the result
max_prices
In the run shown, the maximum prices across the 10 regions from 2014 to 2024 point to notable regional spikes: the East Coast's highest price was $3.994 in January 2022, New England peaked at $3.989 in December 2019, and the Central Atlantic and Gulf Coast peaked in 2024 and 2020, respectively, with the Lower Atlantic, West Coast, and Pacific also showing high values. These figures, however, come from the simulated placeholder data in the code above (prices drawn uniformly between $2 and $4 on random dates), and Pacific and Alaska are not among the regions selected during wrangling. Links to supply chain issues, inflation, the COVID-19 pandemic, or other economic and geopolitical factors are therefore hypotheses to test once the calculation runs on the real dataset.
5. Fourth of July Holiday
The code below analyzes fuel prices for specific dates across multiple regions and uses different visualizations to display them. It first checks whether data exists for July 4th, 2024, using the filter() function from dplyr; if it does, the data is reshaped to long format with gather() and plotted with ggplot() as a scatter plot, with regions on the x-axis. No rows are found: the check runs on simulated random dates, and in the real weekly series, which is dated on Mondays, July 4th, 2024 (a Thursday) has no observation. The nearest earlier date, July 1st, 2024, is therefore plotted from long_format as a scatter plot with price labels for each region.

Next, the code filters January 1st of each year from 2014 to 2024 and visualizes the prices as a line plot: geom_line() connects the points for each region, and geom_text() adds price labels. Scatter plots suit the single-day comparisons, while line plots show how prices on the same calendar date change from year to year, so together they support comparisons on specific dates and across regions.
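Because the series is weekly, an exact holiday date often has no row. Two small helpers (hypothetical names nearest_obs and last_on_or_before) pick either the closest available date or the last date on or before the target before filtering; the weekly dates in this sketch are generated, not read from the dataset.

```r
# Closest available date to the target (ties go to the earlier date)
nearest_obs <- function(dates, target) {
  dates  <- dates[!is.na(dates)]
  target <- as.Date(target)
  dates[which.min(abs(as.numeric(dates - target)))]
}

# Last available date on or before the target
last_on_or_before <- function(dates, target) {
  dates  <- dates[!is.na(dates)]
  target <- as.Date(target)
  max(dates[dates <= target])
}

# Hypothetical Monday-dated weekly series around July 2024
weekly_dates <- seq(as.Date("2024-06-03"), by = "week", length.out = 8)

print(nearest_obs(weekly_dates, "2024-07-04"))
print(last_on_or_before(weekly_dates, "2024-07-04"))
```

With the real data, the result could replace the hard-coded date, for example filter(long_format, Date == nearest_obs(unique(long_format$Date), "2024-07-04")) once Date has been converted with as.Date().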
# library(dplyr) and library(ggplot2) are loaded above

# Create sample filtered data (replace with your actual dataset)
filtered_data <- data.frame(
  Date = as.Date(paste(rep(2014:2024, 10),
                       sample(1:12, 110, replace = TRUE),
                       sample(1:28, 110, replace = TRUE), sep = "-")),  # Random dates
  US_Region = rep(c("East_Coast", "New_England", "Central_Atlantic", "Lower_Atlantic", "Midwest",
                    "Gulf_Coast", "Rocky_Mountain", "West_Coast", "Pacific", "Alaska"), each = 11),
  East_Coast = runif(110, min = 2, max = 4),
  New_England = runif(110, min = 2, max = 4),
  Central_Atlantic = runif(110, min = 2, max = 4),
  Lower_Atlantic = runif(110, min = 2, max = 4),
  Midwest = runif(110, min = 2, max = 4),
  Gulf_Coast = runif(110, min = 2, max = 4),
  Rocky_Mountain = runif(110, min = 2, max = 4),
  West_Coast = runif(110, min = 2, max = 4),
  Pacific = runif(110, min = 2, max = 4),
  Alaska = runif(110, min = 2, max = 4)
)

# Check if there is data for July 4th, 2024
july_4_data_check <- filtered_data %>%
  filter(Date == "2024-07-04")

# Print the rows for July 4th, 2024 (empty if no data exists)
print(july_4_data_check)
# If data exists, proceed with reshaping and plotting
if (nrow(july_4_data_check) > 0) {
  # Reshape the data to long format
  july_4_data <- filtered_data %>%
    filter(Date == "2024-07-04") %>%
    gather(key = "Region", value = "Price", -Date, -US_Region)

  # Plot the prices for July 4th, 2024 across regions
  ggplot(july_4_data, aes(x = Region, y = Price, color = Region)) +
    geom_point(size = 4, alpha = 0.7) +  # Scatter plot points
    labs(
      title = "Fuel Prices on July 4th, 2024 Across Regions",
      x = "Region",
      y = "Price (Dollars per Gallon)",
      color = "Region"
    ) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels
} else {
  print("No data found for July 4th, 2024!")
}
[1] "No data found for July 4th, 2024!"
#### ggplot scatter plot
# Ensure the Date column is properly formatted (if it's not already)
# long_format$Date <- as.Date(long_format$Date)

# Step 1: Filter data for July 1st, 2024 for all regions
july_1_data <- long_format %>%
  filter(Date == "2024-07-01")

# Step 2: Scatter plot of fuel prices on July 1st, 2024 for all regions
ggplot(july_1_data, aes(x = Region, y = Price, color = Region)) +
  geom_point(size = 3) +                                               # Scatter plot points
  geom_text(aes(label = round(Price, 2)), vjust = -0.5, size = 3.5) +  # Label points with prices
  labs(
    title = "Fuel Prices on July 1st, 2024 Across Regions",
    x = "Region",
    y = "Price (Dollars per Gallon)",
    color = "Region"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels
####### ggplot line visualization
# Ensure the Date column is properly formatted (if it's not already)
long_format$Date <- as.Date(long_format$Date)

# Step 1: Filter data for January 1st of 2014-2024 for all regions
january_1_data <- long_format %>%
  filter(format(Date, "%m-%d") == "01-01",  # January 1st of each year
         Year >= 2014 & Year <= 2024)       # Restrict to the 2014-2024 window in the title

# Step 2: Line plot for January 1st, 2014 to 2024, across all regions
ggplot(january_1_data, aes(x = as.factor(format(Date, "%Y")), y = Price,
                           group = Region, color = Region)) +
  geom_line(linewidth = 1) +                                            # Connect points within each region
  geom_point(size = 3) +                                                # Add points for each region
  geom_text(aes(label = round(Price, 2)), vjust = -0.5, size = 3.5) +   # Label points with prices
  labs(
    title = "Fuel Prices on January 1st (2014-2024) Across Regions",
    x = "Year",
    y = "Price (Dollars per Gallon)",
    color = "Region"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels
Scatter plots and line plots, both built with ggplot2, served distinct purposes in analyzing fuel price trends across regions.
Scatter plots (geom_point()): These showed fuel prices on a specific date, July 1st, 2024, across regions; the July 4th plot was not drawn because no data exists for that date. Price labels, color, and rotated axis labels made it easy to compare regions on the same day.
Line plots (geom_line()): For trends across years, such as January 1st from 2014 to 2024, the line plot connected each region's price points over time, depicting fluctuations and longer-term regional trends.
Both methods were effective in their own right: scatter plots for discrete values on specific days and line plots for broader trends across years. Together, they give a fuller view of fuel price patterns, making it easier to identify anomalies, trends, and regional differences.
5. Conclusion
In addition to the obvious legal and ethical considerations like data privacy, there are other crucial factors that must be accounted for during data analysis. Focusing solely on a single metric, such as the p-value or statistical significance, may overlook important aspects of the data, such as the quality of the data, sampling methods, and sample size. These factors can heavily influence the results and should not be ignored. For example, biases in data collection or the presence of outliers could skew the analysis, and relying only on one statistical measure might present an incomplete picture of the data. It is essential to consider a broad range of factors to ensure that the findings are both valid and meaningful.
Furthermore, analysis results should be interpreted with transparency and fairness. This means being clear about how the data was collected, cleaned, and analyzed, and acknowledging any assumptions made during the process; only then can the results be trusted and used responsibly. Ignoring or oversimplifying the context of the data can lead to misleading conclusions. By following ethical guidelines, addressing data quality and fairness, and being transparent about the process, we can help ensure that the analysis is responsible, reliable, and valuable for decision-making.